AXME Code · 9 min read

We Built AXME Code With AXME Code. Here's What the Memory File Looks Like.

Six months of dogfooding. Real memories, real decisions, real safety rules from our own repo. What patterns emerged. What we threw away. What surprised us.

The phrase “dogfooding” is overused but accurate here. AXME Code is a plugin that gives Claude Code persistent memory, decisions, and safety rules. The entire development of AXME Code happened inside Claude Code sessions. With AXME Code loaded. On the AXME Code repo.

This sounds like a recursive joke. It also means the .axme-code/ directory in the AXME Code repo is the most realistic possible example of what six months of real use produces. Not a curated demo. Not a tutorial project. The actual working state of a tool being used to build itself.

I’m going to show you what’s in there, what patterns emerged, what we kept, what we threw away, and what genuinely surprised me.

The structure after 6 months

Here is a tree listing of .axme-code/ in the AXME Code repo, edited for clarity:

.axme-code/
├── oracle/
│   ├── stack.md           # TypeScript, ESM, Node 22, Claude Agent SDK
│   ├── structure.md       # high-level module layout
│   ├── patterns.md        # project-wide coding patterns
│   └── glossary.md        # terms specific to this repo
├── memory/
│   ├── user_georgeb.md    # my preferences
│   ├── patterns/          # 27 pattern files
│   │   ├── audit-worker-lifecycle.md
│   │   ├── claude-sdk-message-format.md
│   │   ├── transcript-parsing-edge-cases.md
│   │   └── ...
│   ├── references/
│   │   ├── claude-code-hooks-docs.md
│   │   └── mcp-sdk-docs.md
│   └── gotchas/
│       └── 14 entries
├── decisions/             # 71 decision files
├── safety/
│   └── rules.yaml         # 23 deny patterns, 4 protected branches
├── sessions/              # 89 session handoffs
├── backlog/               # 38 tracked items
├── worklog.jsonl          # chronological event log
└── test-plan.yaml         # auto-run tests at session start

Some of this exists because it’s part of the schema. Some of it grew from real use. Let me walk through what’s interesting.

Oracle: the stable stuff

oracle/ is the part that looks the most like a traditional CLAUDE.md. It’s the “what is this project” layer, written mostly at setup time and updated infrequently.

oracle/stack.md is 12 lines. Stack, runtime, package manager, test command, build command. That’s it. Writing it took 3 minutes during initial setup. It hasn’t changed in 4 months.
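For flavor, a file at that level of brevity might look something like this (illustrative, not the actual contents of stack.md; the package manager and commands are assumptions):

```markdown
# Stack
- Language: TypeScript (ESM)
- Runtime: Node 22
- Package manager: npm
- Test: npm test
- Build: npm run build
- Key dependency: Claude Agent SDK
```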

oracle/glossary.md is more interesting. It started with 4 terms. It now has 23. Every time a new word took on a specific meaning in this project (“audit worker”, “two-phase audit”, “handoff”, “enforce level”), it went into the glossary. The entries are one line each:

- **audit worker**: The background process that runs after a session ends,
  reads the transcript, and extracts memories/decisions/safety rules via
  an LLM call. Detached from the main MCP server.

- **handoff**: A short markdown block generated by the audit worker
  summarizing what the session was working on and what was left in
  progress. Loaded at the start of the next session.

I did not predict that the glossary would grow this much. I thought glossaries were for big open-source projects. It turns out, in a 6-month project with a consistent voice, words accrete specific meanings fast. Writing them down once saves having to re-negotiate their meaning every session.

Memory/patterns: the accreted wisdom

memory/patterns/ is the biggest category: 27 files. These are things we discovered during use, not things we planned.

Some examples of actual pattern files in the repo:

  • claude-sdk-message-format.md: The Claude Agent SDK’s message format has a gotcha. Messages with tool_use blocks must be followed by a tool_result in the next user message, and both must be properly paired, or the API returns a cryptic error. We hit this three times before we wrote it down. It’s now the first pattern loaded whenever the agent touches SDK code.

  • transcript-parsing-edge-cases.md: When parsing session transcripts, messages can have a content field that is either a string OR an array of blocks. Every once in a while the array has a block with type "text" and an empty string. Handling this wrong causes the audit worker to drop content. We have tests for it now. The pattern file links to the tests.

  • audit-worker-lifecycle.md: The audit worker is a detached subprocess. If you call process.exit() before the worker’s file descriptors are flushed, you lose the audit. We have a specific sequence: flush stdout, wait for a sentinel, then exit. This pattern exists because we shipped a version that skipped the sequence and lost audits silently.

The pattern I want to highlight is transcript-parsing-edge-cases.md, because it represents the single biggest category of these files: the shape of a bug you hit once and don’t want to hit again. Before AXME Code, I would write these down in a scratch file that I’d lose. Now they go into patterns/ and get loaded into every session that touches the transcript parser. Result: the bug really doesn’t recur.

Decisions: the contract layer

decisions/ has 71 files. This surprised me most. I expected maybe 20 over 6 months. 71 is more like a decision every 2-3 days, which is roughly the pace of “we made a real choice with a reason.”

The enforce split is roughly 40% required, 60% advisory. Required decisions are the ones the pre-tool-use hook enforces. A sample of required ones:

  • D-003: Never force push on any branch
  • D-017: Never modify user-owned files outside the repo without explicit request
  • D-029: Synchronized SDK releases — never release one SDK without bumping all five to the same version
  • D-072: After the user merges a PR, switch local checkout to main and pull

Advisory decisions are things we strongly prefer but can bend. Samples:

  • D-014: Default to ESM imports, fall back to CJS only for legacy module compat
  • D-041: Session close should extract at least one memory, one decision, and zero safety rules (unless genuinely nothing new was learned)
  • D-055: Test files live next to source, not in a tests/ directory

Supersede chains exist for 8 decisions. My favorite is D-012 (use SQLite for all local storage) being superseded by D-058 (use plain markdown files instead, SQLite was fighting git and adding no value). The history is preserved. When a new contributor asks “why not SQLite?”, I can point to D-058 and show exactly why we changed direction.

Three months ago, D-012 and D-058 would have been one line in CLAUDE.md that I rewrote in place, losing the history. Having them as separate supersede-linked files makes the past visible.
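A supersede-linked pair might look roughly like this on disk (the field names here are illustrative, not AXME Code's actual decision schema):

```markdown
---
id: D-058
title: Use plain markdown files for local storage
enforce: advisory
supersedes: D-012
---

SQLite was fighting git (binary blobs, merge conflicts) and adding no
value over plain files. Markdown keeps decision history reviewable in PRs.
```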

Safety: small but critical

safety/rules.yaml is surprisingly short. 23 deny patterns. 4 protected branches.

protected_branches: [main, master, develop, release/*]
deny_commands:
  - pattern: "rm -rf /"
    reason: "Catastrophic."
  - pattern: "chmod 777"
    reason: "Security footgun."
  - pattern: "git push --force"
    reason: "See D-003."
  - pattern: "git push --force-with-lease"
    reason: "Lease can go stale under concurrent commits. See D-003."
  - pattern: "curl | sh"
    reason: "Arbitrary code execution from network."
  - pattern: "curl | bash"
    reason: "Same as above."
  - pattern: "npm publish"
    reason: "Releases go through GitHub Actions, not local."
  - pattern: "gh release create"
    reason: "Releases go through GitHub Actions, not local."
  - pattern: "git commit --no-verify"
    reason: "Hooks exist for a reason."
  - pattern: "git rebase -i"
    reason: "Interactive requires TTY the agent doesn't have."
  # ... 13 more

Every single one of these entries exists because of a specific incident. We didn’t predict them. They accumulated.

git rebase -i is in the list because Claude Code once tried to run an interactive rebase in a headless subagent, hung, timed out, and we had to recover. After that we added the pattern. Now the rebase fails fast with a helpful message.

This is the shape I want to emphasize: safety rules are an incident log, not a guess list. Don’t try to write them all at setup time. Let them accumulate.

Sessions and worklog: the history layer

sessions/ has 89 handoff files. One per session. Each contains:

  • What the session was working on
  • What got finished
  • What’s still in progress
  • Any open questions
  • What the next session should start with

When a session ends, the audit worker generates a handoff. When the next session starts, it reads the latest handoff and surfaces the top 3 lines to me: “previous session: …”.

I do not read old handoffs. The latest one is almost always sufficient. Older handoffs are kept for archaeology, not for context.
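In that spirit, a handoff might look roughly like this (format and contents are invented for illustration, not the exact generated output):

```markdown
**Working on:** audit worker retry logic
**Finished:** backoff on transcript read failures; tests green
**In progress:** surfacing retry count in the worklog
**Open questions:** should retries be capped per session?
**Next session:** wire retry count into worklog.jsonl events
```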

worklog.jsonl is more interesting. It’s a chronological event log: session starts, session ends, memory writes, decision writes, safety rule adds. Append-only. After 6 months it’s 2300 lines. I can grep it for any event pattern.

The useful thing about the worklog is that it survives even when individual handoffs rot. If I want to know “when did we add the force push safety rule,” I grep worklog.jsonl for safety_added and force. Takes two seconds.

What I threw away

Not everything worked. Things I put in place and removed:

A “daily notes” feature: auto-generated daily summaries. Cute, useless. Handoffs are per-session, daily summaries would duplicate them. Dropped after a month.

An aggressive auto-expiry on memories: memories older than 30 days auto-marked as stale. Bad idea. Some patterns are stable for years. Others are stale after a week. Time-based expiry is the wrong signal. I removed it and now memories are only superseded when a new one explicitly replaces them.

Emoji in memory titles: I tried using 📝 / 🔒 / ⚠️ to visually differentiate memory types. Markdown editors rendered them inconsistently and grep across emoji became fragile. Dropped. Types are now indicated by directory, not by emoji prefix.

A “confidence score” on each memory: I thought memories should have confidence levels so the agent could weigh them. In practice, confidence was impossible to score reliably, and the agent weighed them however it wanted anyway. Dropped.

What surprised me

Three things, in order of biggest surprise:

1. Decisions grew faster than memories. I expected memories to be the main growth vector. Instead, decisions outpaced memories by about 2.5x. It turns out that when you have a place for decisions, you make more of them explicit. Before AXME Code, many decisions were implicit (“we just do it this way”). Explicit decisions in a file made my own reasoning more rigorous.

2. The audit worker caught things I didn’t realize I’d said. Sometimes I’d finish a session, run close, and the audit would extract a decision I didn’t remember committing to. I’d look at the extracted decision, think about it, and realize: yes, I had said that in passing, and yes, it was actually a real commitment, and yes, I’d almost immediately forgotten it. The audit was catching more than I thought it would.

3. The system became a form of rubber duck. Having a structured place to write decisions made me think more clearly about them. The act of phrasing a decision with a rationale in a file exposed fuzzy thinking. Some decisions, I’d start writing and realize “I don’t actually have a good reason for this.” Then I’d go back and not make the decision. The store was not just recording, it was slowing me down at the right moments.

What this means for you

If you set up a memory system tomorrow, don’t try to predict what will fill it. You’ll be wrong. Let it accumulate from real use and prune things that don’t earn their keep.

Also: look at the 6-month state, not the 1-week state, when deciding if a memory system works. At 1 week, anything looks useful. At 6 months, only some of it does, and the parts that survive are the ones worth copying.

The .axme-code/ directory in the AXME Code repo is the 6-month state. 27 patterns, 71 decisions, 23 safety rules, 89 session handoffs. Every file earns its presence by being useful more than once.

That’s my claim for persistent agent memory: it works, it takes time to show value, and the real proof is what the directory looks like after it’s been living inside real work for half a year. Not tutorials. Not demos. Real work.

If you want to see the actual files, AXME Code is open source. Clone it, browse .axme-code/ in the repo, see what real 6-month state looks like.

More on AXME Code