Embeddings: semantic search for your agent - Memory & Troubleshooting

Embeddings are the technical answer to the "I told you last week, you don't remember" failure mode. They convert text into numerical vectors so the agent can search by meaning — "soda" finds "Pepsi," "authentication" finds "login flow," "API rate limits" finds "429 errors." Without embeddings, the agent's retrieval layer (Layer 4 from §5.2) is keyword-only, which means it misses anything where the wording drifted across sessions.

This article is the technical companion to §5.2 Layer 4 and §5.7 Obsidian. The source files are 10-memory-embeddings.md (full) and the embedding-related material from 40-memory-loss-problem.md. The configuration steps are taken from the channel's walkthroughs; the cost math is from the OpenAI pricing page the channel cites.

What you'll learn

Embeddings convert text into high-dimensional numerical vectors that capture semantic meaning. Two sentences with similar meaning produce vectors that are close in vector space.
Without embeddings, your retrieval is keyword-only: "What's my soda preference?" finds nothing if you wrote "Pepsi" last week. With embeddings, the same query finds the "Pepsi" entry because the vectors are close.
OpenAI's text-embedding-3-small produces 1,536-dimensional vectors at $0.02 per 1M tokens — roughly 150–750× cheaper than the underlying LLM calls. A 10,000-entry memory at 300 tokens average costs ~$0.06 total.
The configuration is one OpenAI or Gemini API key plus a SQLite vector store. OpenClaw handles the indexing automatically once the key is set.
The hybrid search strategy (keyword + semantic, ranked by relevance) is the default — disabling either mode drops the hit rate noticeably.
The memory_search tool returns snippets with file paths and similarity scores (0.0–1.0). Use memory_get to load full context. Two-step process keeps token cost down: search returns cheap snippets, retrieve loads only what's relevant.
The directory layout matters: organise memory into daily/, projects/, knowledge/, preferences/. The structure makes the agent's semantic search more useful and makes your own audit easier.
Obsidian + GitHub is the channel's preferred external-storage pattern when memory outgrows the local SQLite store. Smart Connections (Brian Petro, ~836K downloads at the time of recording) runs the embedding model locally — no API key, fully private.

What embeddings actually are

The simple explanation, from 10-memory-embeddings.md:

Embeddings convert text into numbers so computers can understand meaning.

Example:

"Pepsi" → [0.23, 0.87, 0.45, 0.12, ...] (vector of numbers)
"soda" → [0.25, 0.85, 0.43, 0.14, ...] (similar numbers = similar meaning)
"car" → [0.91, 0.15, 0.08, 0.73, ...] (different numbers = different meaning)

Why it matters: computers are excellent with numbers, poor with words. Similar concepts have similar vector representations. That property — similarity in vector space = similarity in meaning — is what makes semantic search work.

The technical explanation: vector embeddings are high-dimensional numerical representations of text that capture semantic meaning. Modern embedding models (OpenAI's text-embedding-3-small) convert text into vectors of 1,536 dimensions, where:

Distance between vectors indicates semantic similarity.
Clustering reveals related concepts.
Search finds content by meaning, not just keywords.
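
To make "distance between vectors" concrete, here is a minimal sketch that compares the toy vectors from the simple explanation. It assumes cosine similarity as the measure, which is a common choice for embeddings; the source material does not say which metric OpenClaw uses internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional vectors from the example above (real ones have 1,536 dimensions).
pepsi = [0.23, 0.87, 0.45, 0.12]
soda = [0.25, 0.85, 0.43, 0.14]
car = [0.91, 0.15, 0.08, 0.73]

print(f"pepsi vs soda: {cosine_similarity(pepsi, soda):.3f}")
print(f"pepsi vs car:  {cosine_similarity(pepsi, car):.3f}")
```

The Pepsi/soda pair scores far higher than the Pepsi/car pair, which is the whole mechanism behind semantic search: rank stored entries by how close their vectors are to the query's vector.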

Without embeddings (keyword search only)

Query: "What's my soda preference?"

Search: looks for the exact word "soda."

Result: nothing found — you wrote "Pepsi" not "soda."

This is the failure mode every OpenClaw owner hits without realising it. The agent has the memory, but the retrieval layer cannot connect "soda" to "Pepsi" because there is no lexical overlap. The fix is embeddings.
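
A toy illustration of the "no lexical overlap" problem. The matcher below is a hypothetical stand-in, not OpenClaw's actual keyword search; it only shows why a word-based lookup cannot bridge "soda" and "Pepsi".

```python
import re

def keyword_hits(query, entries):
    """Return entries sharing at least one non-trivial word with the query."""
    stopwords = {"what", "s", "my", "is", "the", "a", "with"}

    def words(text):
        return set(re.findall(r"[a-z0-9]+", text.lower())) - stopwords

    query_words = words(query)
    return [entry for entry in entries if query_words & words(entry)]

memory = ["Ron likes Pepsi diet with ice."]
print(keyword_hits("What's my soda preference?", memory))  # no shared words
print(keyword_hits("Does Ron like Pepsi?", memory))         # shares "ron" and "pepsi"
```

The first query comes back empty even though the answer is sitting in memory; only a query that happens to reuse the stored wording gets a hit.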

With embeddings (semantic search)

Query: "What's my soda preference?"

Search: understands "soda" relates to "Pepsi," "Coca-Cola," "soft drink."

Result: finds "Ron likes Pepsi diet with ice."

The same query, the same memory store, the same agent — embeddings turn a 0% hit rate into a high-confidence match.

Real-world impact

Three scenarios from the channel's walkthrough:

Scenario 1 — project recall:

Query: "How did we handle user authentication?"
Without embeddings: must use exact phrase "user authentication"
With embeddings: finds "login system", "auth flow", "credential validation"

Scenario 2 — cross-session knowledge:

Query: "What pricing model did we decide on?"
Without embeddings: lost if you said "cost structure" instead
With embeddings: finds related discussions regardless of exact wording

Scenario 3 — concept exploration:

Query: "Show me everything about API integrations"
Without embeddings: only finds exact phrase "API integrations"
With embeddings: finds "REST endpoints", "webhook handlers", "third-party connections"

The third scenario is the one that makes embeddings indispensable for research workflows — the agent can answer "everything about X" even when you have never written the exact phrase "X" anywhere in memory.

How OpenClaw uses embeddings: the two-layer retrieval

OpenClaw's memory retrieval is two layers deep:

Layer A — daily memory files (keyword search).

Basic text files in ~/.openclaw/memory/.
Fast but limited to exact matches.
Good for recent, explicit information.

Layer B — vector database (semantic search).

SQLite database with embeddings.
Slower but finds related concepts.
Essential for long-term, cross-session recall.

Hybrid search strategy:

Keyword search — fast, exact matches.
Semantic search — slower, concept matches.
Ranking — best results from both methods.

The result is a better balance of speed and accuracy than either layer gives alone. Keyword search misses paraphrase and conceptual drift; semantic search is slower and sometimes returns less precise matches for exact-phrase queries (like ticket numbers or specific names).

Enabling memory embeddings

Requirements: an API key from one of:

OpenAI (recommended).
Google Gemini (alternative).

Setup process:

Step 1 — check current status:

How does your memory embedding system work?
Tell me about your vector database setup.

Expected response (if enabled):

I use a vector database stored in SQLite format at
~/.openclaw/memory/memory.db. When you save information,
it's converted to embeddings using OpenAI's API and
stored for semantic search.

Response if NOT enabled:

I don't currently have embeddings enabled. You'll need
to configure an OpenAI or Gemini API key.

Step 2 — configure API key:

For OpenAI:

Set up memory embeddings with my OpenAI API key: sk-...

For Gemini:

Configure memory embeddings using Gemini API key: AIza...

Step 3 — verify setup.

Check for database file:

ls -la ~/.openclaw/memory/

Look for memory.db or sessions.db (SQLite format). The file grows as you add memories. It is a binary file, so it is not human-readable.
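
If you want more than a directory listing, a small check can confirm the file really is a SQLite database. This sketch relies only on the documented SQLite file header; the helper name is made up for illustration, and the path is the one quoted in this article.

```python
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite 3 database file

def is_sqlite_file(path):
    """True if the file exists and starts with the SQLite 3 header."""
    path = Path(path).expanduser()
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(16) == SQLITE_MAGIC

# Path taken from this article; adjust if your install differs.
print(is_sqlite_file("~/.openclaw/memory/memory.db"))
```

A database that has been created but never written to can be a zero-byte file, so a False here right after setup may simply mean nothing has been indexed yet.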

Test semantic search:

Search my memory for anything related to "project deadlines"

Should find results even if you never used the exact phrase "project deadlines."

How embeddings are generated

The embedding pipeline:

1. You save information
   "Ron prefers Pepsi diet with ice"
   ↓
2. Text is sent to embedding API
   OpenAI text-embedding-3-small
   ↓
3. API returns vector
   [0.23, 0.87, 0.45, 0.12, ..., 0.91]
   (1,536 dimensions)
   ↓
4. Vector stored in SQLite
   Indexed for fast similarity search
   ↓
5. Available for semantic queries
   "What's Ron's soda preference?"

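The pipeline above can be sketched in a few functions. Treat this as an illustration under stated assumptions: the memories table, its columns, and the function names are hypothetical (OpenClaw's real schema is not documented in the source files), and vectors are stored as JSON text for readability rather than in an indexed vector format. The endpoint and request body are the same ones used in the curl test in the troubleshooting section.

```python
import json
import sqlite3
import urllib.request

def openai_embed(text, api_key, model="text-embedding-3-small"):
    """Steps 2-3: call the OpenAI embeddings endpoint, return the vector as a list of floats."""
    request = urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=json.dumps({"input": text, "model": model}).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"][0]["embedding"]

def save_memory(conn, text, embed):
    """Steps 1-4: embed the text and store both in a (hypothetical) memories table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memories (id INTEGER PRIMARY KEY, text TEXT, vector TEXT)"
    )
    vector = embed(text)
    cursor = conn.execute(
        "INSERT INTO memories (text, vector) VALUES (?, ?)", (text, json.dumps(vector))
    )
    conn.commit()
    return cursor.lastrowid

def load_vectors(conn):
    """Step 5: read back (text, vector) pairs ready for similarity comparison."""
    rows = conn.execute("SELECT text, vector FROM memories ORDER BY id")
    return [(text, json.loads(vector)) for text, vector in rows]
```

Passing the embedder in as a function keeps the storage logic testable offline: in real use you would pass something like `lambda t: openai_embed(t, api_key)`, while a test can pass a function that returns a fixed vector.
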
Cost considerations:

OpenAI pricing for text-embedding-3-small:

$0.02 per 1M tokens.
Extremely cheap compared to LLM calls.
Typical memory entry: 100–500 tokens.
Cost per memory: ~$0.000002–$0.00001.

Example: 10,000 memory entries × 300 tokens average = 3M tokens = $0.06. Negligible cost for a massive capability boost.

Verdict: if you are concerned about embedding cost, the math does not bear it out. Embeddings are 150–750× cheaper than the LLM calls they enable you to skip (LLM calls: $3–$15 per 1M tokens depending on model).
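
The arithmetic is easy to rerun for your own memory size. A minimal sketch, using only the prices quoted in this section (the function name is illustrative):

```python
def embedding_cost_usd(entries, avg_tokens, price_per_million=0.02):
    """Total embedding cost in USD for `entries` memories of `avg_tokens` tokens each."""
    return entries * avg_tokens * price_per_million / 1_000_000

total = embedding_cost_usd(10_000, 300)
print(f"10,000 entries x 300 tokens: ${total:.2f}")

# Per-token price ratio against the $3-$15 per 1M token LLM range quoted above.
for llm_price in (3.00, 15.00):
    print(f"LLM at ${llm_price:.2f}/1M tokens is {llm_price / 0.02:.0f}x the embedding price")
```

Swap in your own entry count and average length; even at ten times the example size the total stays well under a dollar.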

The two-step retrieval workflow

Step 1: Search (memory_search)
Agent: memory_search("pricing decision")
Result: [
  {file: "projects/saas-app.md", snippet: "...monthly subscription...", score: 0.89},
  {file: "memory/2026-04-15.md", snippet: "...tiered pricing model...", score: 0.82},
  {file: "memory/2026-03-20.md", snippet: "...cost structure analysis...", score: 0.76}
]

Step 2: Retrieve (memory_get)
Agent: memory_get("projects/saas-app.md")
Result: [Full file content with context]

Why two steps?

Search returns snippets (fast, low token cost).
Retrieve loads full context (slower, higher token cost).
Agent only retrieves what is actually relevant.

This is the cheapest pattern: spend a few tokens on the snippet-level search, then spend more tokens only on the files that actually match. Without the snippet step, you would either load everything (waste tokens on irrelevant content) or rely on the agent to guess which file to open (introduces failure modes).
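
Here is the same search-then-retrieve flow as a runnable sketch. The two functions are hypothetical in-memory stand-ins for the agent's memory_search and memory_get tools (the real tools run inside OpenClaw), the data mirrors the example above, and the min_score and max_files parameters are assumptions added for illustration.

```python
# Hypothetical stand-ins for the agent's tools; the data mirrors the example above.
SEARCH_INDEX = [
    {"file": "projects/saas-app.md", "snippet": "...monthly subscription...", "score": 0.89},
    {"file": "memory/2026-04-15.md", "snippet": "...tiered pricing model...", "score": 0.82},
    {"file": "memory/2026-03-20.md", "snippet": "...cost structure analysis...", "score": 0.76},
]
FILES = {hit["file"]: f"[full content of {hit['file']}]" for hit in SEARCH_INDEX}

def memory_search(query):
    """Step 1: return cheap snippets with scores, best first."""
    return sorted(SEARCH_INDEX, key=lambda hit: hit["score"], reverse=True)

def memory_get(path):
    """Step 2: load one full file."""
    return FILES[path]

def recall(query, min_score=0.8, max_files=1):
    """Search first, then load only the top files that clear min_score."""
    hits = [hit for hit in memory_search(query) if hit["score"] >= min_score]
    return [memory_get(hit["file"]) for hit in hits[:max_files]]

print(recall("pricing decision"))
```

Only one full file is loaded here, even though three snippets matched. That gap between "snippets seen" and "files opened" is where the token savings come from.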

Similarity thresholds

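The memory_search tool returns a similarity score on a 0.0–1.0 scale; a higher score means a closer match. The threshold is the minimum score a result needs in order to be returned: a higher threshold means stricter matching, a lower threshold more permissive matching.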

Typical thresholds:

Range	Interpretation	Use
0.8+	Very similar (strict)	High-confidence matches, fact lookups
0.7–0.8	Related concepts (moderate)	General semantic search
0.6–0.7	Loosely related (permissive)	Exploratory queries
<0.6	Probably not relevant	Filter out

The default OpenClaw threshold is roughly the 0.7–0.8 range. Adjust with:

Adjust memory search threshold to 0.8 for stricter matching

Or use more specific query terms to improve relevance. The threshold is the right knob to turn when you see too many irrelevant results (raise it) or too few results (lower it).
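
To see what the knob does, the sketch below applies three thresholds to the scores from the two-step retrieval example and labels each score with the bands from the table. The function names are illustrative, and treating the band edges as inclusive lower bounds (0.8 counts as "very similar") is an assumption, since the table leaves the boundaries open.

```python
def interpret_score(score):
    """Map a 0.0-1.0 similarity score to the bands in the table above."""
    if score >= 0.8:
        return "very similar"
    if score >= 0.7:
        return "related concepts"
    if score >= 0.6:
        return "loosely related"
    return "probably not relevant"

def apply_threshold(results, threshold):
    """Keep only results whose score meets the threshold."""
    return [r for r in results if r["score"] >= threshold]

results = [
    {"file": "projects/saas-app.md", "score": 0.89},
    {"file": "memory/2026-04-15.md", "score": 0.82},
    {"file": "memory/2026-03-20.md", "score": 0.76},
]
for threshold in (0.7, 0.8, 0.85):
    kept = apply_threshold(results, threshold)
    print(threshold, [r["file"] for r in kept])
```

Raising the threshold from 0.7 to 0.85 shrinks the result set from three files to one, which is the trade-off to keep in mind when tuning.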

Organising memory for embeddings

The recommended directory layout, from 10-memory-embeddings.md:

~/.openclaw/memory/
├── daily/
│   ├── 2026-05-01.md
│   ├── 2026-05-02.md
│   └── 2026-05-03.md
├── projects/
│   ├── project-alpha.md
│   ├── project-beta.md
│   └── project-gamma.md
├── knowledge/
│   ├── api-patterns.md
│   ├── deployment-procedures.md
│   └── troubleshooting-guides.md
├── preferences/
│   ├── coding-style.md
│   ├── communication.md
│   └── workflow.md
└── memory.db (SQLite - auto-generated)

What to store where:

Daily memory (ephemeral): today's tasks, temporary context, quick notes.
Project memory (medium-term): project-specific decisions, implementation details, lessons learned.
Knowledge memory (evergreen): reusable patterns, standard procedures, best practices.
Preferences memory (permanent): personal preferences, communication style, work habits.

The structure does two things at once: it makes the agent's semantic search more useful (related files cluster together in vector space) and it makes your own audit easier (you know where to look when the agent returns something wrong).
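
If you would rather script the layout than create folders by hand, a minimal sketch follows. It only creates the four folders named above and leaves existing ones alone; the function name is illustrative, and moving existing files into the right buckets remains a manual judgment call.

```python
from pathlib import Path

MEMORY_FOLDERS = ("daily", "projects", "knowledge", "preferences")

def create_memory_layout(root):
    """Create the four memory folders under root; existing folders are left untouched."""
    root = Path(root).expanduser()
    created = []
    for name in MEMORY_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

# Example call (not run here): create_memory_layout("~/.openclaw/memory")
```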

Keyword vs. semantic search — when each wins

Keyword search wins for:

Exact names: "Project Apollo."
Specific terms: "API key rotation."
Recent information: "yesterday's meeting."
Unique identifiers: "ticket-1234."

Example:

Query: "Find Project Apollo notes"
Keyword: fast, exact match
Semantic: overkill, slower

Semantic search wins for:

Concept queries: "authentication approaches."
Fuzzy recall: "that pricing thing we discussed."
Cross-domain: "security best practices."
Exploratory: "everything about deployments."

Example:

Query: "How do we handle user login?"
Keyword: misses "authentication", "credentials", "auth flow"
Semantic: finds all related concepts

Hybrid strategy (the default):

Keyword search for exact matches (fast).
Semantic search for concept matches (thorough).
Merge and rank results by relevance.
Return top N.

Best of both worlds: speed + intelligence. Disabling either one drops the hit rate noticeably.
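
The four steps can be sketched as a simple weighted merge. This is one plausible way to combine the two result sets, not OpenClaw's documented ranking formula: the equal weighting, the 0.0–1.0 keyword score, and the function name are all assumptions.

```python
def hybrid_rank(keyword_hits, semantic_hits, top_n=3, keyword_weight=0.5):
    """Merge two {file: score} dicts into one ranked list of (file, score) pairs.

    Scores are assumed to be on a 0.0-1.0 scale already. A file found by only
    one method gets 0.0 from the other.
    """
    combined = {}
    for path in set(keyword_hits) | set(semantic_hits):
        keyword_score = keyword_hits.get(path, 0.0)
        semantic_score = semantic_hits.get(path, 0.0)
        combined[path] = keyword_weight * keyword_score + (1 - keyword_weight) * semantic_score
    # Sort by score (highest first), breaking ties by path for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]

keyword_hits = {"projects/project-alpha.md": 1.0}
semantic_hits = {
    "projects/project-alpha.md": 0.80,
    "knowledge/api-patterns.md": 0.90,
    "daily/2026-05-01.md": 0.60,
}
for path, score in hybrid_rank(keyword_hits, semantic_hits):
    print(f"{score:.2f}  {path}")
```

The file that matches both ways rises to the top even though its semantic score alone was not the highest, which is the behaviour you want for exact-name queries like "Project Apollo."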

Advanced configuration

Embedding model selection:

OpenAI options:

text-embedding-3-small (default, 1,536 dimensions, $0.02/1M tokens).
text-embedding-3-large (3,072 dimensions, more accurate, more expensive).

Gemini options:

text-embedding-004 (768 dimensions).

Configuration example:

Configure embeddings to use text-embedding-3-large for
higher accuracy on technical content

Use text-embedding-3-large for technical domains (code, finance, legal) where precision matters; text-embedding-3-small is the default and is fine for general-purpose memory.

Batch embedding:

For large memory imports:

# Import 100 files at once
openclaw memory import --batch ./knowledge-base/

Benefits:

Faster than one-by-one.
More efficient API usage.
Progress tracking.
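
Why batching helps: the OpenAI embeddings endpoint accepts a list of strings as its input field, so many texts can travel in one request. The sketch below shows the grouping and progress reporting only. How the import command batches internally is not documented in the source files, and embed_batch is a hypothetical callable that takes a list of texts and returns one vector per text.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_in_batches(texts, embed_batch, batch_size=100):
    """Embed texts with one embed_batch call per batch; returns vectors in input order."""
    vectors = []
    for number, batch in enumerate(batched(texts, batch_size), start=1):
        vectors.extend(embed_batch(batch))
        print(f"batch {number}: {len(vectors)}/{len(texts)} embedded")
    return vectors
```

With 250 texts and a batch size of 100, this makes three calls instead of 250.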

Re-indexing:

When to re-index:

Changed embedding model.
Corrupted database.
Major memory reorganization.

How to re-index:

Rebuild my memory embeddings from scratch

Warning: may take time for large memory directories. Plan to run it during a low-activity window.

The Obsidian + GitHub pattern

When memory outgrows the local SQLite store — typically past 100K entries, or when you want a human-readable audit trail — the channel's preferred pattern is Obsidian + GitHub:

OpenClaw Memory (local)
    ↓
Obsidian (editing interface)
    ↓
GitHub (backup + version control)
    ↓
Retrieval Index (searchable)

Workflow:

Agent saves to memory directory.
Obsidian syncs and displays (human-readable).
GitHub backs up (version control).
Embeddings index (searchable).

Benefits:

Edit memories in Obsidian (better UX than cat).
Version control via GitHub.
Backup and sync across devices.
Semantic search via OpenClaw.

Setup:

Step 1 — configure Obsidian vault:

# Point Obsidian to OpenClaw memory directory
Vault location: ~/.openclaw/memory/

Step 2 — initialise Git:

cd ~/.openclaw/memory/
git init
git branch -M main   # the sync script in Step 3 pushes to main
git remote add origin git@github.com:username/openclaw-memory.git

Step 3 — auto-sync script:

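#!/bin/bash
# ~/.openclaw/hooks/memory-sync.sh
# Commit and push memory changes; do nothing when there is nothing new.

cd ~/.openclaw/memory/ || exit 1
git add -A
# git commit fails when nothing is staged, so stop early in that case
git diff --cached --quiet && exit 0
git commit -m "Memory update: $(date)"
git push origin main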

Step 4 — configure OpenClaw hook:

{
  "hooks": {
    "memory_write": "~/.openclaw/hooks/memory-sync.sh"
  }
}

The full Obsidian pattern — including Smart Connections for local embeddings, QMD as MD for clean journal format, and the Discord topic-per-workflow setup — is covered in §5.7.

Use cases

Use case 1 — long-term project memory. Working on multiple projects over months. Query "What authentication approach did we use in Project A?" Returns the relevant section from projects/project-a.md with a similarity score of 0.91. Without embeddings, you would need to remember which file to grep and search manually.

Use case 2 — cross-session learning. Agent learns from past mistakes. Two months ago, you wrote a "Lesson Learned: Database Migrations" note. Today, the agent is asked "How should I handle this database migration?" and retrieves the lesson automatically, recommending a staging test first. Without embeddings, the agent has no way to connect today's task to a two-month-old note.

Use case 3 — news research archive. Daily news scraping with searchable archive. Query "What AI model releases happened last week?" Returns 7 relevant entries spanning multiple daily files. Without embeddings, you would need to read every daily file manually.

Troubleshooting

Embeddings not working:

Symptoms: semantic search returns no results, only exact keyword matches work, no SQLite database file.

Diagnosis:

# Check for database
ls -la ~/.openclaw/memory/*.db

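# Check API key is set (without printing the secret to the terminal)
[ -n "$OPENAI_API_KEY" ] && echo "OPENAI_API_KEY is set" || echo "OPENAI_API_KEY is missing"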

# Test embedding generation
curl https://api.openai.com/v1/embeddings \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "test", "model": "text-embedding-3-small"}'

Solutions:

Verify API key is set correctly.
Check API key has embeddings permission.
Ensure sufficient API credits.
Restart OpenClaw to reload configuration.

Search returns irrelevant results:

Cause: similarity threshold too low.

Solution:

Adjust memory search threshold to 0.8 for stricter matching

Or use more specific query terms to improve relevance.

Database file is huge:

Cause: too many embeddings stored.

Solutions:

Option 1 — archive old memories:

# Move old daily memories to archive
mkdir -p ~/.openclaw/memory/archive/
mv ~/.openclaw/memory/daily/2025-* ~/.openclaw/memory/archive/

Option 2 — selective indexing:

Only index files in projects/ and knowledge/ directories,
skip daily/ memories older than 30 days
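
The rule in that prompt can be written down as a small filter. This sketch reads the prompt as "index everything in projects/ and knowledge/, plus daily/ files from the last 30 days, and nothing else", which is one interpretation. The function name is hypothetical, and it assumes daily files are named YYYY-MM-DD.md as in the directory layout shown earlier.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

def should_index(relative_path, today, max_daily_age_days=30):
    """Decide whether a memory file (path relative to the memory root) gets indexed."""
    path = PurePosixPath(relative_path)
    top = path.parts[0] if path.parts else ""
    if top in ("projects", "knowledge"):
        return True
    if top == "daily":
        try:
            written = date.fromisoformat(path.stem)  # daily files are named YYYY-MM-DD.md
        except ValueError:
            return False  # unrecognised file name: leave it out
        return today - written <= timedelta(days=max_daily_age_days)
    return False

today = date(2026, 5, 3)
for name in ("projects/project-alpha.md", "daily/2026-05-01.md", "daily/2026-03-01.md"):
    print(name, should_index(name, today))
```

If you want preferences/ indexed as well, add it to the first tuple; the prompt as written leaves it out.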

Option 3 — database cleanup:

Rebuild memory database, excluding archived content

Best practices

Enable embeddings early. Don't wait — set up embeddings from day one. Retroactive indexing is slower, you lose semantic search benefits during setup, and you cannot reconstruct which conversations produced the preferences you forgot.
Write descriptive memory entries. Bad: "Meeting notes. Discussed stuff. Made decisions." Good: "Product Roadmap Meeting — 2026-05-06. Decided to prioritize API v2 launch over mobile app. Reasoning: 60% of users access via API, only 20% via mobile." More context = better semantic search results.
Use consistent terminology. Inconsistent: "user login" (file 1), "authentication" (file 2), "sign-in flow" (file 3). Consistent: "authentication" (primary term), "Also known as: login, sign-in" (aliases). Helps embeddings cluster related concepts.
Regular memory maintenance. Monthly: archive old daily memories, consolidate related entries, remove outdated information, update evergreen knowledge.
Test semantic search. After adding important information: "Search my memory for [concept] to verify it's indexed." Ensures embeddings are working, information is findable, search quality is good.

Try it yourself

The hands-on goal: prove the keyword/semantic difference on your own memory store.

Enable embeddings. Set your OpenAI or Gemini API key. Verify the SQLite memory.db file exists in ~/.openclaw/memory/. Cost is ~$0.06 for 10K entries — round down.
Run a side-by-side query. Without embeddings (or with keyword-only search): "What's my soda preference?" — likely returns nothing. With embeddings: returns the entry you wrote about Pepsi. This is the proof.
Test the two-step retrieval. Search for "pricing decision" — note the snippet-level results and similarity scores. Then memory_get the top result — note the full file is loaded. Verify the agent did not have to load every file in the memory directory.
Try the threshold knob. Run a fuzzy query — "that pricing thing we discussed." With the default threshold, you get a relevant match. Raise the threshold to 0.85 — likely nothing returned. Lower to 0.6 — too many matches. Find your threshold for this workload.
Organise the memory directory. Create the four-folder layout (daily/, projects/, knowledge/, preferences/). Move existing memory files into the right buckets. Run a query for "everything about X" — note the cleaner clustering.
Set up the GitHub mirror. For an Obsidian vault or any memory directory you care about durability for: git init in the directory, push to a private repo, set up the memory_write hook for auto-commit. This is the cheapest insurance you will ever buy.

Common pitfalls

Treating embeddings as audit. Vector indexes cannot tell you what the agent retained — they can only tell you what they returned for a query. If you need an audit trail, push a plain-text daily summary to Obsidian (§5.7) and mirror to GitHub.
Enabling embeddings late. Retroactive indexing is slower and you lose the recall benefit for everything you said before enabling. Set up the API key on day one.
Inconsistent terminology. "user login" in one file and "authentication" in another splits the semantic cluster. Pick a primary term per concept and stick to it.
Database bloat. Without archiving, the SQLite file grows indefinitely. Move old daily memories to an archive directory and exclude them from indexing.
Single-file memory dumps. If you dump the entire conversation into one file, embeddings cluster around the most common words, not the most important concepts. Split into topic-specific files for cleaner retrieval.
Trusting the embedding index without testing. "I set up the API key" is not the same as "the SQLite database exists and memory_search returns relevant results." Test the index end-to-end before relying on it.
Raising the threshold past 0.85 for general queries. You will see fewer results, including relevant ones. Keep the default (around 0.7–0.8) for general semantic search; raise to 0.85+ only for high-precision fact lookups.

Sources

OpenClaw Memory Embeddings EXPLAINED — mLsxlYuLafE · the deep-dive on embeddings from the channel's foundational memory walkthrough.
OpenClaw Memory Problem SOLVED | Stop Wasting Time Explaining — U9wmg7dMWLM · the three-layer memory stack (embeddings + QMD + skills).
Source files consolidated into this article: 10-memory-embeddings.md (full file — embedding pipeline, search workflow, directory layout, troubleshooting, best practices) and the embedding-related material from 40-memory-loss-problem.md (the three-layer stack walkthrough).

External tools and services referenced: OpenAI text-embedding-3-small and text-embedding-3-large models, Google Gemini text-embedding-004, the OpenAI embeddings API endpoint, the SQLite vector store at ~/.openclaw/memory/memory.db, the memory_search and memory_get tools, the hybrid keyword+semantic search pipeline, and the Obsidian + Smart Connections + GitHub integration pattern (covered in detail in §5.7).