The 4-layer memory system - Memory & Troubleshooting

The second most-common OpenClaw failure mode (after the dumb zone) is the wrong mental model of memory. Users think "memory" is one thing — the agent either remembers or it does not — and then try to fix a layer-1 problem (bootstrap file) with a layer-4 fix (embeddings) or vice versa. The channel's source material breaks memory into four distinct layers, each with its own lifetime, its own failure mode, and its own fix. This article walks through all four and shows why the right response to "my agent forgot X" depends entirely on which layer X was stored in.

The four-layer frame is from the source file 11-memory-management.md. The "memory is not one mechanism" framing is the channel's most-cited structural insight on memory — it appears in 08-context-window-management.md, 10-memory-embeddings.md, 40-memory-loss-problem.md, and 41-obsidian-memory-fix.md, all of which this article draws from.

What you'll learn

Memory in OpenClaw is four separate layers, not one mechanism. Treating it as one is the root cause of most "my agent forgot" tickets.
Layer 1 (Bootstrap files): permanent identity, loaded at every session start, immune to compaction. The layer to fix with SOUL.md size rules.
Layer 2 (Session transcript): the full conversation log. The file on disk stays complete, but the agent's in-context view of it is summarised when compaction fires. This is the "Walter White problem" layer: a long conversation on Day 1, and by Day 3 the agent asks "who are you?"
Layer 3 (Context window): the active working memory. RAM, not disk. Discussed in §5.1; this article covers the compaction trigger specifically.
Layer 4 (Retrieval index): the searchable archive. Embeddings + keywords + hybrid ranking. The layer that fixes "I told you last week" — provided it is enabled.
The right response to a memory complaint is: which layer did the information live in, and which layer's failure mode matches the symptom? A Layer 1 fix (trim SOUL.md) will not repair a Layer 4 problem (no embeddings configured).
Memory prioritisation is temporal by default: yesterday's work is easily accessible, last week's requires search, last month's needs the retrieval index. Without embeddings, the agent effectively has amnesia past a week.

The four layers, mapped to a computer

The channel's analogy, drawn from 11-memory-management.md:

Layer	Computer analogy	Lifetime	Survives compaction?	Failure mode
Bootstrap files	Hard drive (permanent storage)	Per install	Yes — never summarised	Silent truncation past 20K chars
Session transcript	Disk storage (persisted but summarised)	Per session	No — old turns get summarised	"Walter White problem" — Day 1 detail gone by Day 3
Context window	RAM (active working memory)	Per turn	N/A — it is the working set	Dumb zone at 40%+ on cheap models
Retrieval index	Search index (queryable archive)	Long-term, accumulating	Yes — index is separate	Returns nothing if embeddings not enabled

The architectural insight: each layer serves a different purpose. Durability (bootstrap), history (transcript), performance (context), scalability (retrieval). The agent is the combination. None of these layers replaces any of the others.

Layer 1: Bootstrap files

What they are: permanent identity files loaded from disk at every session start. Located at ~/.openclaw/ or ~/.claude/.

Common files:

SOUL.md — agent personality and core instructions
memory.md — long-term facts and preferences
agents.md — sub-agent configuration
tools.md — tool usage instructions

How they work:

Session starts (daily restart or manual).
Files are read from disk — fresh copy every time, no caching.
Content injected into context — immediately available.
Immune to compaction — never summarised, never lost.

Critical characteristics:

Always loaded — every session start reads these files, no exceptions.
Not in conversation history — changes take effect on the next session, no need to "remind" the agent.
Most durable layer — survives compaction, session restarts, agent crashes.

Size limits (defaults):

20,000 characters per file (silent truncation past this).
150,000 characters total across all bootstrap files.

Check your usage: run /context list. Output shows per-file character counts and total. If any file exceeds 20K characters, content is truncated silently — no warning, the agent just sees incomplete instructions. That is the failure mode §5.5's SOUL.md rules exist to prevent.
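The size limits above can also be checked outside the agent. The following is a minimal sketch, assuming the bootstrap files are plain UTF-8 .md files in a single directory; the function names are hypothetical helpers for illustration and are not part of OpenClaw, and /context list remains the authoritative check.

```python
from pathlib import Path

PER_FILE_LIMIT = 20_000   # characters per bootstrap file (default cited above)
TOTAL_LIMIT = 150_000     # characters across all bootstrap files (default cited above)


def check_bootstrap_sizes(sizes, per_file=PER_FILE_LIMIT, total=TOTAL_LIMIT):
    """Return warnings for a {filename: character_count} mapping."""
    warnings = []
    for name, count in sorted(sizes.items()):
        if count > per_file:
            warnings.append(
                f"{name}: {count} chars, {count - per_file} over the per-file limit"
            )
    combined = sum(sizes.values())
    if combined > total:
        warnings.append(
            f"total: {combined} chars, {combined - total} over the combined limit"
        )
    return warnings


def measure_bootstrap_dir(directory):
    """Count characters (not bytes) in every .md file directly inside `directory`."""
    return {
        path.name: len(path.read_text(encoding="utf-8"))
        for path in Path(directory).expanduser().glob("*.md")
    }
```

A typical call would be check_bootstrap_sizes(measure_bootstrap_dir("~/.openclaw")), with the directory adjusted to wherever your install keeps its bootstrap files. An empty list means no file is at risk of silent truncation.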

Sub-agent behaviour: parallel sub-agents only read agents.md and tools.md — not SOUL.md or memory.md. The implication is that sub-agents lack the main agent's personality; task instructions must be in agents.md or passed explicitly. Sub-agents are "dumber" by design (minimal context). See Course 6 §6.4 for the diagnostic framing.

Layer 2: Session transcript

What it is: the full conversation history, saved to disk as a file. Located at ~/.openclaw/sessions/ or similar.

Contains:

User messages
Assistant messages
Tool calls and results
Timestamps

How it works:

Every message is appended to the transcript file.
Transcript is rebuilt into context when continuing a session.
Persists across restarts — can resume conversations.
Stored as a .jsonl file, so it can be read with ordinary text tools. The vector database belongs to the retrieval index (Layer 4), not to the transcript.

The compaction problem:

When context approaches the limit (typically 200K tokens):

Auto-compaction triggers.
Old messages are summarised into compact form.
Summary replaces detailed history in context.
Original transcript still exists on disk — but the agent cannot see it.

Critical distinction: the raw transcript file is still on disk and complete; the agent's view is the summarised version only. You can read the original transcript yourself with cat ~/.openclaw/sessions/<session-id>.jsonl, but the agent's working memory has been compressed.
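Because the raw transcript is a .jsonl file, it can be inspected with a few lines of Python. This is a sketch under one loud assumption: that each line is a JSON object with a "role" key. The actual record schema is not documented in this article, so check one record of your own transcript before relying on the field name.

```python
import json
from collections import Counter


def summarise_transcript(lines):
    """Count records per role in an iterable of JSONL lines.

    The "role" field name is an assumption for illustration; records
    without it are counted under "unknown".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get("role", "unknown")] += 1
    return dict(counts)
```

To use it on a real file, open the transcript and pass the file handle, which yields one line at a time: with open(path, encoding="utf-8") as fh: print(summarise_transcript(fh)).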

What survives compaction:

Last 20,000 tokens (recent messages).
Anything written to bootstrap files.
General themes and topics.

What is lost:

Exact wording of earlier instructions.
Nuance and context from old messages.
Specific constraints mentioned mid-conversation.
Casual preferences stated in chat.
Images from earlier in the session.
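The survive/lost split above can be illustrated with a toy model of compaction. This is not OpenClaw's summariser: the sketch assumes messages carry a known token count and replaces everything outside the most recent 20,000 tokens with a placeholder, which is enough to show why a constraint stated early in a long chat disappears.

```python
def compact(messages, keep_tokens=20_000):
    """Toy compaction: keep the newest messages verbatim, collapse the rest.

    `messages` is a list of (text, token_count) tuples, oldest first.
    When anything is dropped, the first element of the result is a
    placeholder summary. A real summariser would generate prose here.
    """
    kept = []
    budget = keep_tokens
    # Walk backwards from the newest message until the budget is spent.
    for text, tokens in reversed(messages):
        if tokens > budget:
            break
        kept.append((text, tokens))
        budget -= tokens
    kept.reverse()
    dropped = len(messages) - len(kept)
    if dropped == 0:
        return kept
    summary = (f"[summary of {dropped} earlier messages]", 0)
    return [summary] + kept


history = [
    ("Constraint: never email the client directly", 15_000),
    ("draft plan", 12_000),
    ("latest reply", 6_000),
]
for text, _ in compact(history):
    print(text)
```

In this run the early constraint falls outside the 20,000-token window and is reduced to a one-line placeholder, while the two recent messages survive word for word.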

The Walter White problem. The host's named failure mode: Day 1 you have a long conversation about Project X. Day 2, the context compacts overnight. Day 3, the agent asks "Who are you? What project?" — because the conversation details were in the transcript only, never saved to bootstrap files, and compaction summarised away the specifics.

The fix is structural: "Always save important information to files, not chat." If it is not in a file, it does not exist long-term. The §5.5 SOUL.md rules and the §5.7 Obsidian-vault rule are both expressions of this principle.

Layer 3: Context window

What it is: the active working memory — a fixed-size container where everything competes for space. Covered in detail in §5.1; this article documents only the compaction trigger because that is the bridge to Layer 2.

Compaction trigger formula:

Trigger = Context Limit − Reserve Tokens − Soft Threshold

Example (200K context, defaults):

200,000 − 40,000 − 4,000 = 156,000 tokens

Compaction fires at 156K, not 200K. The reserve is space for the agent's response (default 40K, reducible to 20K for large tasks). The soft threshold is a 4K edge-case buffer.
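The formula is simple enough to check by hand, but a small helper makes it easy to see how changing the reserve moves the trigger. This is a sketch of the arithmetic only; the function name and parameter names are illustrative and do not correspond to OpenClaw configuration keys.

```python
def compaction_trigger(context_limit, reserve_tokens=40_000, soft_threshold=4_000):
    """Token count at which auto-compaction fires, per the formula above."""
    trigger = context_limit - reserve_tokens - soft_threshold
    if trigger <= 0:
        raise ValueError("reserve and soft threshold leave no room for context")
    return trigger


print(compaction_trigger(200_000))                         # 156000
print(compaction_trigger(200_000, reserve_tokens=20_000))  # 176000
```

Dropping the reserve from 40K to 20K buys 20K more tokens of history before compaction, at the cost of less room for the agent's response.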

Biggest consumers of the context window (in order):

Tool results — file reads, web snapshots, API responses.
Long conversations — multi-turn back-and-forth.
Code blocks — full file contents.
Bootstrap files: loaded at session start and resident in context on every turn.

The §5.1 optimisation strategy (upload files instead of fetching via API) is the single biggest lever here, with up to 95% token savings on tool results.

Layer 4: Retrieval index

What it is: the searchable archive. It sits alongside the memory files as a separate index rather than inside them.

Technology:

Vector database (SQLite).
Hybrid search (keyword + semantic).
Embeddings-based retrieval.

How it works:

Write information to memory files.
OpenClaw indexes the content automatically.
Agent searches with the memory_search tool.
Index returns relevant snippets with file paths.
Agent reads full context with memory_get.

This is a two-step process: search → retrieve. Search returns snippets (fast, low token cost); retrieve loads full context (slower, higher token cost). The agent only retrieves what is actually relevant.
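The search-then-retrieve pattern can be sketched with an in-memory stand-in. The functions below are hypothetical and do not reproduce the signatures of the memory_search or memory_get tools; they only show why returning short snippets first keeps token cost low, with the full file loaded only for a hit that looks relevant.

```python
# Hypothetical in-memory stand-in for the memory files; not OpenClaw's storage.
MEMORY_FILES = {
    "memory/projects.md": "Project X ships in March. The client prefers weekly email updates.",
    "memory/preferences.md": "Prefers Pepsi over coffee. Works best in the morning.",
}


def search_snippets(query, files=MEMORY_FILES, width=40):
    """Step 1: return (path, snippet) pairs for files containing the query."""
    hits = []
    for path, text in files.items():
        pos = text.lower().find(query.lower())
        if pos != -1:
            start = max(0, pos - width // 2)
            hits.append((path, text[start:start + width]))
    return hits


def get_full(path, files=MEMORY_FILES):
    """Step 2: load the full content of one file chosen from the search hits."""
    return files[path]


hits = search_snippets("pepsi")
print(hits)                  # short snippets with their file paths
print(get_full(hits[0][0]))  # full text, fetched only for the chosen hit
```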

Enabling embeddings: requires an OpenAI or Gemini API key. To check whether embeddings are enabled, ask the agent "How does your memory embedding system work?" A working agent will mention the vector database, semantic search, and the SQLite file in the memory directory. If it mentions none of these, ask it to configure them with a prompt such as "Set up memory embeddings with OpenAI key..."

Keyword vs. semantic search:

Keyword search: exact word matching. "Pepsi" finds "Pepsi." Fast but limited.
Semantic search: concept matching. "Soda" finds "Pepsi," "Coca-Cola," "soft drink." Understands relationships.

OpenClaw automatically uses both: keyword search for exact matches (fast), semantic search for concept matches (thorough), merged and ranked by relevance, top N returned.
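A toy version of hybrid ranking shows why both signals matter. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-made rather than produced by an embedding model, and the 50/50 weighting is arbitrary. OpenClaw's actual scoring is not described in the source material.

```python
import math

# Hand-made 3-dimensional "embeddings" for illustration only.
# Axes (hypothetical): [beverage, sweetness, vehicle]
DOCS = {
    "Pepsi": [0.9, 0.8, 0.0],
    "Coca-Cola": [0.9, 0.9, 0.0],
    "soft drink": [1.0, 0.6, 0.0],
    "pickup truck": [0.0, 0.0, 1.0],
}
QUERY_VECTORS = {"soda": [1.0, 0.7, 0.0], "Pepsi": [0.9, 0.8, 0.0]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, top_n=3, keyword_weight=0.5):
    """Blend an exact-match keyword score with a cosine-similarity score."""
    scored = []
    for doc, vec in DOCS.items():
        keyword = 1.0 if query.lower() in doc.lower() else 0.0
        semantic = cosine(QUERY_VECTORS[query], vec)
        score = keyword_weight * keyword + (1 - keyword_weight) * semantic
        scored.append((score, doc))
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:top_n]]


print(hybrid_search("soda"))            # no keyword hit; ranked by similarity alone
print(hybrid_search("Pepsi", top_n=1))  # keyword and semantic scores agree
```

With "soda", the keyword score is zero for every document, so only the semantic half ranks the three beverage entries above "pickup truck". With "Pepsi", the exact match pushes that document clear of its near neighbours.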

Storage location: ~/.openclaw/memory/memory.db (SQLite). Not human-readable. Contains vector embeddings.
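Although the file is not human-readable, it is an ordinary SQLite database, so Python's standard sqlite3 module can confirm that it exists and contains tables. This sketch opens the file read-only and lists table names; it makes no assumption about what the tables are called, since the schema is not documented here.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return table names in a SQLite file, opened read-only.

    Raises FileNotFoundError if the file is missing, which is itself a
    useful diagnostic: no database usually means nothing was indexed.
    """
    path = Path(db_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(path)
    # mode=ro prevents accidental writes to the index while inspecting it.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("~/.openclaw/memory/memory.db") on a working install should return a non-empty list. An empty list or a FileNotFoundError suggests the index was never built.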

The memory priority rule (from 11-memory-management.md): OpenClaw prioritises recent memory by default:

Yesterday's work: easily accessible.
Last week: requires search.
Last month: needs retrieval index.

Without embeddings, the agent may not find old information — it relies on bootstrap files and recent transcript only. With embeddings, semantic search finds relevant content regardless of age, scaling to months or years of history. That is the §5.3 case for embeddings, applied to long-term recall.

How the layers work together

Session start flow:

1. Bootstrap files loaded from disk
   ↓
2. Session transcript rebuilt into context
   ↓
3. Context window populated with:
   - System prompt
   - Bootstrap files
   - Recent conversation
   ↓
4. Retrieval index ready for queries

During conversation:

User message
   ↓
Agent checks context window (active memory)
   ↓
If needed: searches retrieval index
   ↓
If needed: reads additional files
   ↓
Generates response
   ↓
Appends to session transcript

When context fills:

Context reaches 156K tokens
   ↓
Auto-compaction triggers
   ↓
Old messages summarised
   ↓
Summary replaces detailed history
   ↓
Bootstrap files remain intact
   ↓
Retrieval index unaffected

The key insight: bootstrap files remain intact, retrieval index is unaffected. Compaction only touches the in-memory view of the transcript. Anything you wrote to a file or indexed in the retrieval layer survives.

The three common memory failures

From 11-memory-management.md, the channel's named failure modes map onto three of the four layers (Layers 1, 2, and 4):

Failure	Symptom	Cause	Solution
Bootstrap file truncation	Agent forgets core instructions	File exceeded 20K char limit	Check `/context list`, trim to under 20K chars, remove unnecessary content
Chat instructions lost	Agent forgets instructions given in conversation	Instructions never saved to file, lost in compaction	"Save this instruction to my SOUL.md file"
Retrieval index not enabled	Agent cannot find old information	No OpenAI/Gemini API key configured	Set up API key, verify SQLite database exists, test memory search

The right response to "my agent forgot X" is to identify which layer X lived in. A Layer 1 fix (trim SOUL.md) will not repair a Layer 4 problem (no embeddings). The §5.3 embeddings article and the §5.7 Obsidian article are the fixes for the Layer 4 cases; the §5.5 SOUL.md rules are the fix for the Layer 1 cases.
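The table above reduces to three yes/no observations, which can be written as a small triage function. This is an illustrative sketch rather than an OpenClaw feature: the inputs are facts you gather yourself (from /context list, from checking whether the instruction was ever written to a file, and from the embeddings check), and the function name is hypothetical.

```python
def diagnose(largest_bootstrap_chars, saved_to_file, embeddings_enabled,
             per_file_limit=20_000):
    """Map three observations to the failure modes in the table above.

    Returns a list because more than one layer can be misconfigured at once.
    """
    findings = []
    if largest_bootstrap_chars > per_file_limit:
        findings.append("Layer 1: bootstrap file truncated; trim it below the limit")
    if not saved_to_file:
        findings.append("Layer 2: instruction lived only in chat; save it to a file")
    if not embeddings_enabled:
        findings.append("Layer 4: retrieval index not enabled; configure an embedding API key")
    return findings


for finding in diagnose(25_000, saved_to_file=False, embeddings_enabled=True):
    print(finding)
```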

The "right" memory strategy by use case

From 40-memory-loss-problem.md, the channel's worked recommendation:

For builders (focus on projects):

Priority: skills and project plans.
Memory: minimal personal context.
Approach: document architecture, save build workflows as skills.

For personal assistants (focus on preferences):

Priority: embeddings and personal context.
Memory: extensive preference tracking.
Approach: daily summaries, preference documentation, routine skills.

For researchers (focus on knowledge):

Priority: vector databases and knowledge graphs.
Memory: source tracking, connection mapping.
Approach: Obsidian integration, citation management.

The point: there is no universal "best" memory setup. The right choice depends on what the agent is for. A coding agent and a personal assistant have different memory needs; the channel's videos explicitly optimise for different mixes.

Try it yourself

The hands-on goal for this subtopic: identify which layer is the bottleneck for your memory complaint, then fix that layer.

Inventory your four layers. Open a terminal and run the diagnostic:

# Layer 1: bootstrap files
ls -la ~/.openclaw/*.md
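wc -m ~/.openclaw/*.md   # -m counts characters; the 20K limit is in characters, not bytes
# In the agent client, not the shell: /context list (or the equivalent in your client)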

# Layer 2: session transcripts
ls -la ~/.openclaw/sessions/

# Layer 3: context window
# Ask the agent: "How much context are you using?"
# Hermes/Ink TUI shows it inline.

# Layer 4: retrieval index
ls -la ~/.openclaw/memory/*.db

If any layer is missing or empty, that is your bottleneck.

Reproduce the Walter White problem. Start a session, have a long detailed conversation about a project, run /clear, start a new session. Note which details the agent retained and which it forgot. The lost details are the ones that lived only in Layer 2 (the transcript) — they are exactly the details the §5.5 rule "save important instructions to files, not chat" exists to capture.
Force a compaction event. Run your routine workload for 2–3 days and watch the context meter. With the defaults, compaction fires at about 78% of the limit (156K of 200K tokens). The first time the meter crosses that point, note whether compaction fired automatically (the agent should report a context summary) and whether anything important was lost. Anything important that was lost should have been in a bootstrap file.
Verify Layer 4 is actually working. Ask the agent: "Search my memory for [concept you mentioned last week]." A working retrieval layer returns the relevant snippet with a file path and similarity score. If the agent returns nothing, the embedding API key is missing or the SQLite database is corrupt.
Pick the memory destination that fits your use case. Builders → skills-first (§5.1 + Course 1 §1.5). Personal assistants → embeddings-first (§5.3 + §5.6 Honcho). Researchers → Obsidian vault (§5.7). Multi-platform users → Honcho (§5.6).

Common pitfalls

Treating memory as one mechanism. It is four layers. A fix for the wrong layer will not work.
Saving important info to chat only. Compaction will summarise it away. The Walter White problem is a Layer 2 failure; the fix is to write to a file (Layer 1) or index it for search (Layer 4).
Ignoring the 20K-character per-file limit. Silent truncation means the agent sees incomplete instructions with no warning. Run /context list regularly.
Trusting the embedding index without verification. "Set up memory embeddings with OpenAI key" is not the same as "the SQLite database exists and semantic search returns relevant results." Test the index end-to-end.
Confusing keyword and semantic search. Both run by default. The hybrid ranking is what makes retrieval work — disabling either one drops the hit rate noticeably.
Loading the whole memory directory into bootstrap files. Bootstrap files are injected into the context window at every session start and occupy it on every turn, so they should stay small. The retrieval index is for large datasets. Use the right layer.
Reading the priority rule as optional. Yesterday's work is in the context window. Last week's work requires search. Last month's work requires the retrieval index. Plan your memory destination around how far back you need to recall.

Sources

OpenClaw Memory Problem SOLVED | Stop Wasting Time Explaining · U9wmg7dMWLM · the three-layer memory stack (embeddings + QMD + skills) walkthrough.
How OpenClaw Memory ACTUALLY Works · HM0ATQCHGP0 · the four-layer frame from the channel's foundational memory video.
OpenClaw Memory Embeddings EXPLAINED · mLsxlYuLafE · the embeddings deep-dive from the same release arc.
Source files consolidated into this article: 11-memory-management.md (full file — four-layer frame, compaction trigger, three failure modes), 40-memory-loss-problem.md (full file — three-layer stack, embedding providers, use-case-specific memory strategy), and the Layer 4 / retrieval-index material from 10-memory-embeddings.md.

External tools and services referenced: OpenClaw bootstrap files (SOUL.md, agents.md, memory.md, tools.md), the /context list and /clear commands, the session transcript directory (~/.openclaw/sessions/), the SQLite vector database (~/.openclaw/memory/memory.db), the memory_search and memory_get tools, and the OpenAI / Gemini embedding APIs.