The first failure mode every new OpenClaw owner hits: the agent worked fine last week and is hallucinating this week. The default assumption is that the model broke. The channel's diagnostic is that the context window filled up — same model, same task, fuller window, stupider agent. Three videos anchor the diagnosis, and the structural fix is to treat the context window like working RAM: monitor it, cap it, and restart it before the agent crosses into the "dumb zone."

This article is the foundation for every other article in this course. §5.2 through §5.8 all assume the mental model built here — that the context window is the load-bearing constraint on agent intelligence, that cheap models degrade faster than expensive ones at the same fill percentage, and that the right response to a saturating context is a restart plus a smaller bootstrap, not a model switch.

What you'll learn

  • The context window is working RAM, not long-term storage. It holds the system prompt, bootstrap files, conversation history, tool results, and the current message — everything competes for the same fixed-size container.
  • The critical threshold is ~40% on cheap models (MiniMax 2.5-class). At 200K tokens, that's ~80K tokens of safe working room. Past 80K, the agent starts hallucinating, picking the wrong tools, and "forgetting everything."
  • Expensive models (Opus, Sonnet) tolerate higher fill — they hold up past 100K — but the same shape applies. There is no model that gets smarter with more context; cheaper ones just degrade faster.
  • The "context resets to zero every day" intuition is wrong. The agent process terminates and restarts, but bootstrap files are immediately loaded — a 300-line SOUL.md means you wake up to a 100K-token context at 10 a.m.
  • The right response to a saturating context is a /clear plus a smaller bootstrap, not a model switch. Restarting the agent clears the working set but preserves anything saved to a file. The agent auto-recovers after ~48 hours of inactivity with memory intact.
  • Tool results are the biggest context consumers. Fetching a YouTube transcript via API costs ~50K tokens; uploading the same transcript as a text file costs ~2K. Save 95% on tool-result budget by uploading files instead of fetching them.

Video 1 — My OpenClaw is STUPID (Here's how to Fix It)

The first of the channel's "your agent is dumb" walkthroughs (1,535 views), and the most actionable. The framing is direct: users tweet polished OpenClaw dashboards while in practice the bot "starts gaslighting me." The fix is not better hardware — it is decomposing the task and forcing the agent to test its own work.

The host's worked example: a Stark agent asked to summarise 30 tweets in one shot failed, because browsing, filtering, and summarising were all bundled into a single prompt. The replacement flow — scan → store → retrieve → summarise — pushed the saved tweets into a vector database between steps, exposed exactly which step was broken, and the dashboard started working. Splitting the work into named steps with intermediate persistence is the single most reliable fix in the video.
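A minimal sketch of the scan → store → retrieve → summarise split, assuming a JSON file stands in for the vector database the host used. Every function name and the keyword filter are hypothetical, and the summary step is a placeholder for a model call. The point is only that each step reads or writes an intermediate artifact, so a failure can be pinned to one step.

```python
import json
import tempfile
from pathlib import Path


def scan(raw_items):
    """Step 1: collect candidate items, dropping blank entries."""
    return [item.strip() for item in raw_items if item.strip()]


def store(items, path):
    """Step 2: persist scanned items so later steps read from disk."""
    path.write_text(json.dumps(items), encoding="utf-8")
    return len(items)


def retrieve(path, keyword):
    """Step 3: load stored items and keep those containing the keyword."""
    items = json.loads(path.read_text(encoding="utf-8"))
    return [item for item in items if keyword.lower() in item.lower()]


def summarise(items):
    """Step 4: placeholder summary; a real agent would call the model here."""
    return f"{len(items)} matching item(s): " + "; ".join(items)


def run_pipeline(raw_items, keyword, workdir):
    """Run the four steps and report a count per step for debugging."""
    store_path = Path(workdir) / "scanned.json"
    scanned = scan(raw_items)
    stored_count = store(scanned, store_path)
    matches = retrieve(store_path, keyword)
    return {
        "scanned": len(scanned),
        "stored": stored_count,
        "retrieved": len(matches),
        "summary": summarise(matches),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = ["Agents are hard", "  ", "Context windows fill up"]
        print(run_pipeline(sample, "context", tmp))
```

Because each step returns a count, a mismatch between "scanned" and "stored", or a zero under "retrieved", shows which step broke before any summary is produced.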

The second fix is forcing the agent to test the connection before claiming success. OpenClaw "doesn't test" by default. When the host told the agent to fetch YouTube view counts, the bot "built the dashboard and it built it like nothing there — it says all done we're complete guys." The cure: ask the agent to walk through the YouTube API setup, explicitly say "make sure that's connected yes test the connection," and screenshot the result. The same caution applies anywhere an agent has root access to a messaging surface: in the cited Facebook/Meta case, an agent installed with root access "started messaging everyone on her contact list" because no one constrained the scope.

Two workflow rules land in this video.

  • Feed the docs first. The host's standing first prompt is "Here's the API documentation. Learn and understand this first." Pass the doc, confirm the agent read it, then make the API call.
  • Save working flows as skills. Once a workflow runs cleanly end-to-end, tell OpenClaw to "save this as a skill for future reference." That is how repeated tasks stop breaking, because the next session reads the file rather than re-deriving the steps from scratch.

The closing rule is the load-bearing one: agents "will always fail on tasks" regardless of complexity — and "the simpler the task, the more it fails." The right scaling sequence is run a small task, verify the output, then scale up. For data work, demand the actual artifact ("give me the tweets" or a dashboard URL) so you can catch fabricated responses in real time.

Video 2 — Why Your AI Agent Suddenly Gets Stupid And How to Fix It

The diagnostic frame for the whole subtopic. The channel documents a recurring failure mode: once the context window fills, the agent stops calling the right tools, "forgets everything," and starts producing incoherent output. MiniMax 2.5 specifically gets "really stupid" once context exceeds 40% — the host flags this as the same agent, same model, same task, with the drop being purely a context-saturation problem rather than a capability one. The fix is structural, not a prompt rewrite.

The first lever is SOUL.md. Jeff's SOUL.md was scrubbed from roughly 100 lines down to 28 lines, of which about 6 lines are actual instruction. The default file shipped by OpenClaw is longer than that and performs worse in their testing. The rule the channel settled on is 15–30 lines maximum, with characters minimised, and coherence of output (e.g., news roundups where the stories actually relate) jumped immediately after the cut. The intuition is the opposite of what most newcomers assume: a longer identity file does not produce a "deeper" agent, it produces a fuller context window and a stupider one.
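The 15–30 line rule is easy to check mechanically. The sketch below is an illustration only: the function name is hypothetical, and counting non-blank lines is an assumption, since the source does not say whether blank lines count toward the limit.

```python
def soul_line_report(text, max_lines=30):
    """Count non-blank lines and flag the file if it exceeds max_lines.

    Assumption: blank lines are ignored when counting.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    return {"lines": len(lines), "over_budget": len(lines) > max_lines}


if __name__ == "__main__":
    sample = "\n".join(f"rule {i}" for i in range(28))
    print(soul_line_report(sample))
```

To run it against a real file, pass in the contents read with `pathlib.Path("SOUL.md").read_text()`, adjusting the path to wherever your bootstrap files live.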

The second lever is the skills list. The default OpenClaw onboarding ships with more skills than a cheap-model agent can carry — Spotify, Discord player, GitHub, and weather were still installed on Jeff's news agent and were bleeding context (song titles, lyrics, JSON chatter from Spotify) into unrelated tasks. The recommended ceiling is 7–10 skills per agent on cheaper models; more than that makes them "dumber and dumber" because every skill is a passive context leak. The actionable rule: if you do not actively use a skill every week, uninstall it.
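The weekly-use rule and the 7–10 cap can be expressed as a small audit. This is a sketch under stated assumptions: the `last_used` mapping is hypothetical data you would have to collect yourself, since the source does not describe OpenClaw exposing per-skill usage dates. The cap is set to the top of the recommended range.

```python
from datetime import date, timedelta


def audit_skills(last_used, today, max_skills=10, stale_after_days=7):
    """Return skills unused for over a week and whether the rest exceed the cap.

    last_used: hypothetical mapping of skill name -> date of last use.
    """
    cutoff = today - timedelta(days=stale_after_days)
    stale = sorted(name for name, used in last_used.items() if used < cutoff)
    remaining = len(last_used) - len(stale)
    return {
        "uninstall": stale,
        "remaining": remaining,
        "over_cap": remaining > max_skills,
    }


if __name__ == "__main__":
    usage = {
        "news": date(2026, 3, 9),
        "spotify": date(2026, 1, 1),
        "weather": date(2026, 2, 1),
    }
    print(audit_skills(usage, today=date(2026, 3, 10)))
```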

The pattern repeats on the expensive model. Stark on Opus 4.6 produced a usable daily briefing, took initiative to flag MWC 2026 and Apple as trending "intent," and surfaced crypto and US macro context without being told. But it refused to use Remotion for animations and the slides "look like crap." The fix in progress is to write the presentation workflow as a skill file and force the agent to show its steps for review — a categorically different intervention from "re-prompt it until it cooperates."

The closing principle from the video is specialisation, not generalisation. The team's working setup is hyper-specialised agents doing one task well — news, research, presentations, coding — rather than one general-purpose agent managing everything. Pair this with their prior videos on sub-agents and cron jobs for the long-term architecture, and you have the channel's "why your agent suddenly got stupid" diagnostic in one place: 40% context, bloated SOUL.md, too many skills, generalist scope.

Video 3 — 5 Must Know TIPS for OpenClaw

The general best-practices video, and the most-viewed of the three at 3,473 views. The host opens with the line that ties this subtopic to the rest of the course: OpenClaw's biggest gotcha is silent memory failure, and if you fix that first the rest of the agent starts behaving. The verification line he uses verbatim is to ask the agent directly, "are you using the open AI key, is your memory working?" — if you skip this check, every other tip in the video is fighting upstream.

The remaining four tips are operational.

  • Split work across threads, not one mega-chat. Run one thread per topic and let the agent auto-join each. The host ran one thread for general chat and another that built a dashboard in a single shot; he flags this tip as a "bonus" because it reinforces the topic-per-workflow pattern from §5.7.
  • When it dies, use Claw to fix Claw. If the agent corrupts its own settings file it cannot self-recover, but a freshly started Claw pointed at the openclaw directory can. The prompt he uses is, "study this folder, this is for openclaw, I have an error, it doesn't boot up, help me connect to Discord and fix the errors." It beats the in-app help settings because Claw can read its own codebase.
  • Tame the 30-minute heartbeat. Default heartbeats fire every 30 minutes and "cost a lot of money" over time; ask Claw to renegotiate to roughly one hour and monitor spend via OpenRouter.
  • Force secrets into the .env file. OpenClaw will actively delete passwords from notes on principle; telling it to store API keys and secrets in .env works because the bot treats that path as a coding convention and leaves it alone.
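The heartbeat saving is simple arithmetic. The sketch below counts wake-ups per day at a given interval; the per-call cost is a made-up placeholder, not a figure from the videos, so substitute the number your OpenRouter dashboard reports.

```python
def heartbeats_per_day(interval_minutes):
    """Number of heartbeat wake-ups in 24 hours at the given interval."""
    return (24 * 60) // interval_minutes


def daily_heartbeat_cost(interval_minutes, cost_per_call):
    """Daily spend on heartbeats; cost_per_call is a hypothetical input."""
    return heartbeats_per_day(interval_minutes) * cost_per_call


if __name__ == "__main__":
    for minutes in (30, 60):
        print(minutes, heartbeats_per_day(minutes), daily_heartbeat_cost(minutes, 0.01))
```

Moving from a 30-minute to a 60-minute interval halves the number of billable wake-ups, whatever each one costs.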

The closing line is the same as the opening: "tip number one will be like the one tip that you will always use to get things done." For this subtopic that is memory-via-embeddings; for the "got stupid" failure mode it is SOUL.md size and skill count. The host's framing is consistent: the cheap fix is the structural one.

The context window, by the numbers

The 08-context-window-management.md source file lays out the per-model context sizes and the threshold the channel uses as a diagnostic.

Model         | Context window   | Approximate "dumb zone" entry | Notes
Claude Opus   | 200,000 tokens   | ~80K (40%)                    | Tolerant of higher fill; still degrades past 100K
Claude Sonnet | 200,000 tokens   | ~80K (40%)                    | Same threshold; faster than Opus
GPT-4         | 128,000 tokens   | ~50K (40%)                    | Smaller window means less safe working room
Gemini Pro    | 1,000,000 tokens | ~400K (40%)                   | Big window, same proportional rule
MiniMax       | 200,000 tokens   | ~80K (40%)                    | The channel's "enters the dumb zone faster" reference

Conversion: 1 token ≈ 0.75 words in English. The 40% threshold is the channel's rule of thumb from the §5.1 diagnostic video, applied uniformly — even expensive models degrade past the proportional mark, they just hold up longer.
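The threshold figures follow directly from the 40% rule and the 0.75 words-per-token conversion. A minimal sketch, with hypothetical function names, that reproduces them (note the GPT-4 row works out to 51,200 tokens, which the channel rounds to ~50K):

```python
def dumb_zone_entry(context_window, threshold=0.40):
    """Token count at which the channel's 40% rule of thumb is crossed."""
    return round(context_window * threshold)


def tokens_to_words(tokens, words_per_token=0.75):
    """Rough English word equivalent, using 1 token ≈ 0.75 words."""
    return round(tokens * words_per_token)


if __name__ == "__main__":
    for name, window in [("Claude Opus", 200_000), ("GPT-4", 128_000), ("Gemini Pro", 1_000_000)]:
        entry = dumb_zone_entry(window)
        print(name, entry, "tokens, about", tokens_to_words(entry), "words")
```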

What fills the context window, in order of size:

  1. Tool results — file reads, web fetches, API responses. Fetching a YouTube transcript via API costs ~50K tokens; uploading the same transcript as a file costs ~2K. This is the largest single category and the cheapest to optimise.
  2. Bootstrap files — SOUL.md, agents.md, memory.md. Loaded at every session start. A bloated 200-line SOUL.md is permanent drag.
  3. Conversation history — your back-and-forth with the agent. Grows linearly with turns. Compaction at ~80% summarises the oldest turns; nuance is lost.
  4. System prompts — OpenClaw's own instructions, including any active skill definitions.
  5. Current message — the task being processed. Smallest contributor; not where to optimise.

The compaction trigger is the formula Context Limit − Reserve Tokens − Soft Threshold. For a 200K model with the default 40K reserve and 4K soft threshold, compaction fires at 156K tokens, not 200K. The reserve is space for the agent's response (default 40K, reducible to 20K for large tasks). The soft threshold is a 4K edge-case buffer. If you want more usable room before compaction fires, drop the reserve: at 20K the trigger moves from 156K to 176K. Be aware that a 20K reserve can truncate a long-form output mid-sentence.
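The trigger formula can be checked with a few lines. This is a sketch of the arithmetic only; the function and parameter names are illustrative and are not OpenClaw configuration keys.

```python
def compaction_trigger(context_limit, reserve_tokens=40_000, soft_threshold=4_000):
    """Token count at which compaction fires: limit - reserve - soft threshold."""
    return context_limit - reserve_tokens - soft_threshold


if __name__ == "__main__":
    print(compaction_trigger(200_000))                         # default 40K reserve
    print(compaction_trigger(200_000, reserve_tokens=20_000))  # reduced 20K reserve
```

With the default reserve the trigger sits at 78% of a 200K window, which matches the "~80%" compaction point quoted for conversation history.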

Checking your context usage

Three methods from the source material.

Method 1 — ask the agent directly:

How much context are you using right now?

A working agent reports back: "I'm currently using 136,482 tokens out of 200,000 (68%)." That 68% is well past the 40% threshold: the agent is in the dumb zone right now, and the host's framing is that performance degradation is observable at that fill. If the agent reports more than 40% on a cheap model, run /clear (see "Manual context clearing" below) and resume.
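If you log the agent's replies, the fill check can be automated. The sketch below assumes the reply contains the phrase "N tokens out of M", as in the example above; real replies may be worded differently, so treat the pattern as a starting point. Both function names are hypothetical.

```python
import re

# Assumed phrasing, taken from the example reply quoted in this article.
REPORT_PATTERN = re.compile(r"([\d,]+)\s+tokens\s+out\s+of\s+([\d,]+)")


def parse_context_report(message):
    """Extract (used, limit) token counts, or None if the phrase is absent."""
    match = REPORT_PATTERN.search(message)
    if match is None:
        return None
    used = int(match.group(1).replace(",", ""))
    limit = int(match.group(2).replace(",", ""))
    return used, limit


def past_threshold(used, limit, threshold=0.40):
    """True when the fill fraction exceeds the 40% rule of thumb."""
    return used / limit > threshold


if __name__ == "__main__":
    reply = "I'm currently using 136,482 tokens out of 200,000 (68%)."
    used, limit = parse_context_report(reply)
    print(used, limit, past_threshold(used, limit))
```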

Method 2 — terminal status bar. In OpenClaw's terminal mode, context usage is often displayed automatically in the status bar. Hermes' React/Ink TUI shows the same metric inline (the v0.11.0 "Interface Release" made this a default-visible field).

Method 3 — /context list. This shows per-file character counts on your bootstrap:

soul.md: 15,234 / 20,000 characters
memory.md: 8,456 / 20,000 characters
agents.md: 3,221 / 20,000 characters
Total: 26,911 / 150,000 characters

If any single file exceeds 20,000 characters, it is truncated silently — no warning, no error, the agent just sees incomplete instructions. That is the failure mode §5.5's SOUL.md rules exist to prevent.
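Because the truncation is silent, it is worth checking the counts yourself. The sketch below applies the 20,000-character per-file limit and the 150,000-character total shown in the listing above to a mapping of file names to character counts. The function name is hypothetical and the limits are taken from that listing rather than from any documented OpenClaw setting.

```python
PER_FILE_LIMIT = 20_000   # per-file cap shown in the listing above
TOTAL_LIMIT = 150_000     # overall cap shown in the listing above


def audit_bootstrap(char_counts):
    """Flag files over the per-file cap and report the combined total."""
    truncated = sorted(
        name for name, count in char_counts.items() if count > PER_FILE_LIMIT
    )
    total = sum(char_counts.values())
    return {"total": total, "truncated": truncated, "over_total": total > TOTAL_LIMIT}


if __name__ == "__main__":
    counts = {"soul.md": 15_234, "memory.md": 8_456, "agents.md": 3_221}
    print(audit_bootstrap(counts))
```

The counts can be gathered with `len(pathlib.Path(name).read_text())` for each bootstrap file if you prefer not to rely on the agent's own listing.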

Warning signs of context overload

From the channel's diagnostic, the failure mode is recognisable, not silent:

  • Agent asks "What are we working on again?" mid-session.
  • Forgets instructions given 10 minutes ago.
  • Produces generic, boilerplate responses that ignore your preferences.
  • Fails to follow established patterns the agent was using earlier in the session.
  • Suggests absurd actions ("walk to a car wash") for prompts that have nothing to do with cars.
  • Needs constant reminders of project context.

All six are the same failure: the model is trying to do useful work in a context window where the load-bearing instructions have been compressed away. The fix is not a better prompt; it is /clear plus a smaller bootstrap.

Manual context clearing: the /clear and Restart patterns

When you approach the limit or notice the symptoms, start a new session:

/clear

Or explicitly ask:

Clear your context and start fresh

What happens: the agent "dies" and restarts. Long-term memory files are preserved. The new session starts with a clean context window. Anything you said in chat that was not written to a file is gone — that is why the §5.5 rule "save important instructions to files, not chat" is load-bearing.

The Restart button (MaxClaw / OpenClaw in-app): clears the context window in ~10 seconds, preserves long-term memory, and the agent auto-recovers after ~48 hours of inactivity with memory intact. For casual users, the in-app Restart is the lowest-friction path back to a working agent.

Tool usage as a context optimisation lever

Tool results are the biggest context consumers, and they are the cheapest to optimise. From the channel's example: fetching a YouTube video transcript via the API pulls the full transcript into the context — roughly 50K tokens for a 30-minute video. The same transcript uploaded as a text file costs ~2K tokens.

The pattern:

# Don't do this:
Analyze this YouTube video: [link]
# (Agent fetches full transcript via API - 50K tokens)

# Do this:
1. Get the transcript manually (yt-dlp, youtube-transcript-api, copy/paste)
2. Save it to a text file in your agent's directory
3. Upload the file with your prompt
# Token savings: ~95%

The same rule applies to PDF reports, code dumps, and large dataset outputs. If the data lives in a file the agent can cat, it should not be re-fetched through the API.
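To sanity-check a file before uploading it, a rough token estimate is enough. The sketch below uses the 1 token ≈ 0.75 words heuristic quoted earlier, which only approximates a real tokenizer, and computes the saving between the two figures the channel quotes. Function names are hypothetical.

```python
def estimate_tokens(text, words_per_token=0.75):
    """Rough token estimate from a whitespace word count (heuristic only)."""
    return round(len(text.split()) / words_per_token)


def savings_percent(fetch_tokens, upload_tokens):
    """Percentage of tokens saved by uploading instead of fetching."""
    return round(100 * (1 - upload_tokens / fetch_tokens), 1)


if __name__ == "__main__":
    print(estimate_tokens("one two three"))
    print(savings_percent(50_000, 2_000))
```

With the channel's figures of ~50K tokens fetched against ~2K uploaded, the saving comes out at 96%, consistent with the "~95%" quoted above.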

Model-specific behaviour

High-end models (Claude Opus, Sonnet):

  • Handle high context gracefully, with less performance degradation.
  • More reliable at 100K+ tokens; they maintain intelligence longer.
  • Better memory management; they do not dump context aggressively.
  • Worth the cost for context-heavy workflows.

Cheap models (MiniMax, Qwen, other Chinese models):

  • Aggressive context dumping: they silently remove "unimportant" context to save costs.
  • Sharp performance drop above 120K tokens.
  • May forget mid-task: "What presentation are we making again?"
  • Require careful context management; keep usage under 40%.

The same model that produces a usable 30-tweet summary at 20% context can fail the same task at 60%. The variable is the window. Before you switch to Opus, trim the window.

Try it yourself

The hands-on goal for this subtopic: prove the 40% threshold is real on your own agent, then prove the structural fix works.

  1. Run a 40% context audit on your cheapest agent. Run your routine workload (news brief, deep research) for 2–3 days and watch the context meter. The first time it crosses ~40% on a MiniMax-class model, check whether tool calls and recall have started to degrade. If they have, you have your diagnostic. If they haven't, the threshold is conservative for your workload — note your own threshold and adjust.
  2. Cut SOUL.md to 15–30 lines. Take your current file, count the lines, and keep only the operational instructions: identity, primary purpose, "tell me what failed explicitly," and the hard no-go actions. Delete personality, delete autobiography, delete anything that does not change the agent's output. Re-run the same task and compare coherence.
  3. Audit your skills list. ls your skills/ directory and uninstall anything you did not actively use in the last week. Hard cap at 7–10 for cheap-model agents. If a skill leaks JSON into unrelated tasks (Spotify titles, weather payloads), disable it first.
  4. Decompose your next multi-step workflow. Pick a task you would normally issue as one mega-prompt. Break it into 4 named steps: scan, store, retrieve, summarise. Push the intermediate output into a vector store or a temp file, then ask the agent to query it. Verify the same dashboard that broke last time now works step by step.
  5. Demand a connection test and a raw artifact. For every API integration, ask the agent to "test the connection" and screenshot the result. For every data task, demand the actual artifact ("give me the tweets" or a dashboard URL) so you can catch hallucinated outputs before they ship.
  6. Save the working flow as a skill. Once a workflow runs cleanly end-to-end, tell OpenClaw to "save this as a skill for future reference" so the next session re-reads the file rather than re-deriving the steps.
  7. Test the manual-restart recovery. Saturate the context deliberately (long conversation, several tool results), then run /clear. Verify the agent still has access to your SOUL.md, your memory store, and your saved skills. Verify the conversation history is gone.

Common pitfalls

  • Treating "all done" as success. OpenClaw does not test connections by default. An unchecked "we're complete" is the most common cause of the "stupid" failure mode — the agent built a dashboard of nothing. Force a connection test and a screenshot of the result before declaring victory.
  • Granting root access to a messaging agent. In the Facebook/Meta case cited in §5.1 Video 1, an agent installed with root access "started messaging everyone on her contact list." Constrain scope to specific channels, DMs, or topics before letting the agent anywhere near a real surface.
  • Longer SOUL.md = deeper agent. The opposite is true. A 100-line identity file bloats the context window and produces a stupider agent. 15–30 lines of clear instruction outperforms 100 lines of autobiography on every task the channel tested.
  • Keeping default skills on the agent. Spotify, Discord player, GitHub, weather — the default onboarding ships more skills than a cheap model can carry, and the unused ones leak context into unrelated tasks. Uninstall what you do not use.
  • Running a 30-minute heartbeat by default. Every wake-up is a billable call to the underlying model, and the default 30-minute cadence adds up. Ask Claw to stretch the interval to ~1 hour, and monitor spend via OpenRouter.
  • OpenClaw silently scrubs passwords from notes. Tell the agent explicitly to store API keys and secrets in .env and it will leave them alone; otherwise it will delete them on principle and your integrations will mysteriously break.
  • Blaming the model. The same agent on the same model with the same task can pass on a fresh session and fail on a full one. The variable is almost always context. Before you switch to Opus, trim the window.
  • Asking an Opus-class agent to use a tool it underuses. If a workflow needs Remotion, write it as a skill file and force the agent to show its steps for review. Re-prompting rarely changes tool-selection behaviour across sessions.
  • Trusting the daily "context resets to zero" intuition. The agent process terminates and restarts, but bootstrap files are immediately loaded. A bloated 300-line SOUL.md means you wake up to a 100K-token context at 10 a.m. The reset is not a clean slate; it is a fresh copy of whatever you left on disk.

Sources

  • My OpenClaw is STUPID (Here's how to Fix It) — 1,535 views · 9lcn8ZmqyJ0
  • Why Your AI Agent Suddenly Gets Stupid And How to Fix It · pMgUaqXTge4
  • 5 Must Know TIPS for OpenClaw — 3,473 views · -PT46iH03RQ
  • Source files consolidated into this article: 08-context-window-management.md (full file — context-window architecture, threshold, model-specific behaviour), 37-agent-acting-stupid.md (full file — decomposition + connection-test playbook), and 38-agent-suddenly-stupid.md (full file — 40% threshold diagnostic + skill cap).

External tools, prompts, and services referenced in the videos: OpenClaw /context list and /clear commands, SOUL.md identity file, .env for secrets, an OpenAI key for memory embeddings, OpenRouter for spend monitoring, Discord for topic-per-workflow chat, a vector database for intermediate data between workflow steps, and the underlying models discussed in the channel (MiniMax 2.5 and Opus 4.6).