Building an AI agent that actually remembers
How we fixed the AI agent's amnesia in production
TL;DR
We built an AI agent. It worked brilliantly until it suddenly forgot everything it had just done. Fixed it with three changes: made memory loading architectural (not optional), merged all threads into one session, and added safeguard mode to preserve context during compaction. Now the agent actually remembers.
How it started
If you were on AI Twitter last week, you probably saw the chaos: Clawdbot became Moltbot, then became OpenClaw. Three names in one week.
The open-source AI agent project had gone viral three weeks earlier, then Anthropic sent a trademark request about the “Clawd” name being too close to “Claude.” Developer Peter Steinberger renamed it Moltbot (because lobsters molt, get it?). Then, in the chaos of the rename, crypto scammers hijacked his old username, fake tokens launched, and everything went sideways. Now it’s OpenClaw. Finally.
The project itself is actually useful: an autonomous AI agent that runs locally and executes tasks across messaging platforms. Not just a chatbot, but an agent that actually does things.
We caught the bug early. Downloaded Clawdbot (back when it was still called that), saw what it could do, and thought: “What if we built an agent that does internal audits?”
Not a chatbot that answers audit questions. A full autonomous agent that extracts regulatory requirements, designs test procedures, collaborates across Slack threads, and maintains context over days-long conversations. An AI colleague, not just a tool.
We had the agent running and set it loose. It worked beautifully. Until someone asked a question in a different Slack thread within the same channel.
The agent responded: “I don’t have access to Slack.”
What the actual ****.
You just had a fifty-message conversation. The agent posted detailed findings, created comprehensive reports. And now it claims it’s never seen Slack before?
Welcome to the world of AI agent amnesia. This is the story of how we fixed it and what we learned about building agents that actually remember.
The three problems
1. Memory loading was just advice
Our AGENTS.md file said:
“Before doing anything else: Read SOUL.md, USER.md, and memory/YYYY-MM-DD.md”
These were just instructions - text in a prompt. The agent could ignore them. And it did. Sometimes it would read the memory files. Sometimes it wouldn’t.
Memory loading was advice, not architecture.
2. Threads broke everything
Clawdbot’s default: Each Slack thread = separate session.
Thread 1: agent:main:slack:channel:xxxxxxxxxx:thread:1234
Thread 2: agent:main:slack:channel:xxxxxxxxxx:thread:5678
Makes sense for general chat. Pizza recommendations shouldn’t share context with database migrations.
But for audit work? Threads are just organizational tools. When someone says “Re-extract Chapter 1” in Thread 2, they’re continuing the audit from Thread 1.
Different thread = complete amnesia.
3. Compaction threw away the good stuff
When the context window fills up (~180K tokens), Clawdbot compacts older messages. Necessary - you can’t keep infinite history.
But without saving critical information before compaction, important details just vanished.
The three solutions
Solution 1: Custom memory hooks
The insight: Make memory loading architectural, not optional.
Built a hook that triggers on every message, loads memory files, and injects them before the LLM sees anything.
```javascript
export default async function memoryContextHook(event) {
  // Only act right before an incoming message reaches the LLM
  if (event.type !== 'message' || event.action !== 'before') return;

  // `ctx` (the mutable message context) and `workspaceDir` come from the
  // hook's surrounding setup, elided here for brevity
  const memoryContext = await loadMemoryContextCached(workspaceDir);
  if (memoryContext) {
    // Prepend memory so it's in context before the message itself
    ctx.Body = `${memoryContext}${ctx.Body || ''}`;
  }
}
```
Before: Message → LLM decides → (maybe reads memory) → Respond
After: Message → LOAD MEMORY → Inject into context → Respond
Memory loading is now deterministic. The agent can’t forget because the memory is already there.
Trade-off: Adds ~2-5K tokens per message. Worth it. An agent that remembers is infinitely more useful than one that saves tokens.
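The `loadMemoryContextCached` helper is referenced in the hook but not shown. A minimal sketch of what it could look like — the file names match the ones the hook loads, but the caching scheme, TTL, and separators here are assumptions, not the actual implementation:

```javascript
import { readFile } from 'node:fs/promises';
import path from 'node:path';

// Simple time-based cache so we don't re-read files on every message
let cache = { text: null, loadedAt: 0 };
const CACHE_TTL_MS = 30_000;

export async function loadMemoryContextCached(workspaceDir) {
  if (cache.text !== null && Date.now() - cache.loadedAt < CACHE_TTL_MS) {
    return cache.text; // serve from cache within the TTL window
  }

  const today = new Date().toISOString().slice(0, 10);
  const yesterday = new Date(Date.now() - 86_400_000).toISOString().slice(0, 10);
  const files = [
    'AGENTS.md',
    'MEMORY.md',
    path.join('memory', `${today}.md`),
    path.join('memory', `${yesterday}.md`),
  ];

  const parts = [];
  for (const file of files) {
    try {
      const body = await readFile(path.join(workspaceDir, file), 'utf8');
      parts.push(`<!-- ${file} -->\n${body}`);
    } catch {
      // Missing memory files are normal (e.g. no file for yesterday); skip
    }
  }

  cache = {
    text: parts.length ? parts.join('\n\n') + '\n\n' : '',
    loadedAt: Date.now(),
  };
  return cache.text;
}
```

The cache keeps the per-message overhead down to string concatenation in the common case; the files are only re-read every 30 seconds.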
Solution 2: Channel-wide sessions
The insight: Threads aren’t conversation boundaries — they’re organizational tools.
One line change in session-key.js:
```javascript
// BEFORE:
const useSuffix = params.useSuffix ?? true;

// AFTER:
const useSuffix = params.useSuffix ?? false;
```
Result: All threads in a channel share one session. Full context continuity.
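To see why one boolean does the job, here is a hypothetical sketch of the key derivation — the real `session-key.js` isn't reproduced in this post, so everything except the `useSuffix` flip and the key format shown earlier is an assumption:

```javascript
// Hypothetical sketch of session-key derivation. Only the key format and
// the `useSuffix ?? false` flip come from the article; the rest is illustrative.
function deriveSessionKey(params) {
  const useSuffix = params.useSuffix ?? false; // was `?? true` upstream

  let key = `agent:${params.agent}:slack:channel:${params.channelId}`;
  if (useSuffix && params.threadId) {
    // Thread isolation: each thread gets its own session
    key += `:thread:${params.threadId}`;
  }
  return key;
}

// With the flip, messages from different threads collapse to one session:
deriveSessionKey({ agent: 'main', channelId: 'C123', threadId: '1234' });
// → 'agent:main:slack:channel:C123'
deriveSessionKey({ agent: 'main', channelId: 'C123', threadId: '5678' });
// → 'agent:main:slack:channel:C123'
```

Because the thread suffix is simply never appended, every thread in the channel resolves to the same session key and therefore the same context.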
Solution 3: Safeguard mode + memory flush
The insight: Save critical info before compaction throws it away.
We configured Clawdbot’s compaction to use “safeguard mode”:
```json
{
  "compaction": {
    "mode": "safeguard",
    "memoryFlush": {
      "enabled": true
    }
  }
}
```

Safeguard mode keeps the last 10-15 messages in full detail (active memory) while compressing older messages into a summary. When compaction triggers:
1. Memory flush runs first - the agent extracts critical information: requirements, evidence mappings, test procedures, decisions, findings, ongoing work
2. Saves to memory files - writes to daily memory files (memory/YYYY-MM-DD.md)
3. Active memory preserved - the most recent 10-15 messages stay in full detail
4. Older messages compressed - summarized into compact form
5. Context available - memory files are auto-loaded via hooks on the next message
Think of it like taking notes before clearing your whiteboard. Recent messages stay in full detail, older messages get summarized, but you’ve written down everything that matters.
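The split itself is simple. A toy model of safeguard-style compaction — the names, shapes, and window size here are illustrative assumptions, not Clawdbot's real internals:

```javascript
// Toy model of safeguard-mode compaction (illustrative only).
const ACTIVE_WINDOW = 15; // recent messages kept in full detail

function compact(messages, summarize) {
  if (messages.length <= ACTIVE_WINDOW) {
    // Nothing old enough to compress yet
    return { summary: null, active: messages };
  }
  const older = messages.slice(0, -ACTIVE_WINDOW);
  const active = messages.slice(-ACTIVE_WINDOW);
  // In the real system a memory flush runs first, writing critical facts
  // to memory/YYYY-MM-DD.md, and `summarize` would be an LLM call.
  return { summary: summarize(older), active };
}
```

The key property: compaction is lossy only for the conversational transcript, never for the facts, because the flush writes those to disk before anything is compressed.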
Rolling memory pattern
This creates a rolling memory system:
Active memory - Last 10-15 messages in full detail
Compact summary - Older messages compressed
Memory files - Critical information extracted and saved
Auto-loaded context - Yesterday’s and today’s memory files injected into every message
The agent maintains both short-term context (recent messages) and long-term memory (extracted to files), without losing critical information when the context window fills up.
In practice: A three-day audit with hundreds of messages stays coherent. The agent remembers requirements extracted on Day 1, references evidence mapped on Day 2, and builds on test procedures designed earlier - even though the original messages are long compressed.
The complete architecture
```
Message arrives
    ↓
Derive session: agent:main:slack:channel:xxxxxxxxxx
(All threads share this session)
    ↓
[HOOK] Load memory files:
  - AGENTS.md
  - MEMORY.md
  - memory/today.md
  - memory/yesterday.md
    ↓
Inject memory into message body
    ↓
LLM processes with full context
    ↓
[On compaction]
  Memory flush → Save critical info
  Keep last 10-15 messages in full detail (active memory)
  Compress older messages into summary
```
Four layers of context:
Active memory - Last 10-15 messages in full detail (safeguard mode)
Compact summary - Older messages compressed (safeguard mode)
Memory files - Persistent context loaded via hooks
Memory flush - Critical info saved during compaction
This architecture ensures the agent has:
Immediate context from recent messages (active memory)
Historical context from older messages (compact summary)
Persistent context from workspace files (memory hooks)
Preserved context across compactions (memory flush)
Before and after
Before
Thread 1:
You: "What audit are you working on?"
Agent: "ISO 27001:2022 audit. 317 requirements extracted..."
Thread 2:
Teammate: "Re-extract Chapter 1"
Agent: "I don't have access to Slack or context about this audit."
After
Thread 1:
You: "What audit are you working on?"
Agent: "ISO 27001:2022 audit. 317 requirements extracted..."
Thread 2:
Teammate: "Re-extract Chapter 1"
Agent: "I'll re-extract Chapter 1 from ISO 27001:2022..."
The agent just... works. Conversations across threads, hours-long breaks - it maintains continuity.
What we learned
1. Architecture > Instructions
You can’t rely on prompts. We spent days tweaking instructions: “Remember to read memory.” “Always load context first.”
None of it worked reliably.
The memory hook fixed it immediately. If something is critical, make it architectural.
2. Default behaviors matter
Clawdbot’s thread isolation default makes sense for general chat. But for focused project work, it was completely wrong.
One line change solved our biggest problem. Question your defaults.
3. Context management is a system
None of these solutions worked alone:
Memory hooks without channel-wide sessions → lost context across threads
Channel-wide sessions without memory flush → lost context after compaction
Memory flush without hooks → agent forgot to load the flushed memory
You need all three layers working together.
4. Token costs are worth it
Memory hooks add 2-5K tokens per message. For a high-volume chatbot, that’s expensive.
But for multi-day audit work? The cost is irrelevant compared to having an agent that remembers.
Optimize for usefulness first, efficiency second.
5. Test in real workflows
Everything looked great in testing. Then we started actual audit work and everything broke.
The edge cases only appear under real conditions: multi-thread conversations, day-long sessions, multiple collaborators.
Implementation
Full implementation available on GitHub: clawdbot-memory-continuity
Quick version:
Custom hook: ~/.clawdbot/hooks/memory-context/index.js (~130 lines)
Core modification: /opt/homebrew/lib/node_modules/clawdbot/dist/routing/session-key.js (line 70, one boolean flip)
Configuration: ~/.clawdbot/clawdbot.json (enable safeguard mode + memory flush)
Maintenance: The hook survives updates. The core modification needs re-applying after every npm update (30 seconds with the included script).
See the GitHub repository for:
Complete installation guide
Auto-apply script for core patch
Configuration examples
Troubleshooting guide
Maintenance instructions
When to use this
Works great for:
Focused project channels
Long-running workflows (audits, investigations)
Collaborative work requiring context
Professional agents (auditors, analysts)
Might not work for:
High-traffic general channels
Independent support threads
Cost-sensitive high-volume applications
One-off conversations
The key question: Do threads represent separate conversations or organizational structure within one conversation?
Separate → use thread isolation (Clawdbot default)
Organizational → use channel-wide sessions (our modification)
Maintenance after updates
When you run npm install -g clawdbot@latest:
What survives:
✅ Custom hooks
✅ Configuration
What gets overwritten:
❌ session-key.js modification
To re-apply:
```shell
# 1. Backup new version
cp /opt/homebrew/lib/node_modules/clawdbot/dist/routing/session-key.js \
   /opt/homebrew/lib/node_modules/clawdbot/dist/routing/session-key.js.backup

# 2. Edit line 70: change `useSuffix ?? true` to `useSuffix ?? false`

# 3. Restart gateway
clawdbot gateway stop
clawdbot gateway install
```
Or use the auto-apply script from the GitHub repo:
```shell
cd clawdbot-memory-continuity
./scripts/apply-patch.sh
```
Verification
Check memory loading:

```shell
clawdbot logs --follow | grep memory-context
```

You should see:

```
[memory-context] Injected memory context into message body
```

Check session keys:

```shell
clawdbot logs --follow | grep -E "session.*slack.*channel"
```

You should see the same session key for all messages in a channel, with no :thread: suffix.
Sometimes the best solutions are the simplest: Make memory architectural, not optional. Make threads organizational, not isolating. Make preservation automatic, not manual.
That’s how you build an agent that actually remembers.
Rotimi Akinyele is the Vice President of Security at Deriv.