AXME Code · 8 min read

After 100 Sessions: What Persistent Memory Changed About How I Use Claude Code

Before: re-explain the project every session, watch the same mistakes, fix the same bugs twice. After 100 sessions with persistent memory: different patterns, measurable time savings, one problem I did not expect.

This is a report, not a pitch. I have been using Claude Code with a persistent memory plugin (AXME Code) for about four months. In that time I crossed 100 sessions on a single project. I want to tell you what actually changed and what I wasn’t expecting.

I’ll keep the numbers to what I can justify and mostly describe the behavioral changes, because the behavioral changes are the real story.

What I tracked

I did not run a controlled experiment. I’m a working engineer, not a researcher. What I can show you is:

  • A session log I kept roughly chronological, noting interesting events
  • Worklog entries from the plugin itself (automatic, every session)
  • A few specific cost and time comparisons for tasks I did both before and after installing the plugin

The “before” baseline is the first two weeks I used Claude Code on this project, with just a hand-written CLAUDE.md. The “after” state is sessions 40 through 100, where memory was loaded and active.

I am not going to claim precise percentage improvements, because I don’t trust my own measurements at that resolution. What I will claim is direction and approximate magnitude.

The first thing that changed: I stopped retyping

This is the most concrete and the least surprising.

The first 4 minutes of every Claude Code session, before memory, looked like this: I typed a block describing the stack, the conventions, and the current state. I retyped variations of this block every session. Over two weeks I estimated I spent about 40 minutes total just retyping the opening block.

After memory, the start of a session looks like this: the agent reads the memory store, prints a 3-line summary of where the last session stopped, and asks if I want to continue. I say yes or pivot. Elapsed: about 15 seconds.

Over 60 sessions, that’s the difference between 4 hours of retyping and roughly 15 minutes of confirmation. I got about 3.75 hours of my life back by having the tool remember what I already told it.

This is the baseline, not the interesting part. The interesting part is what happened to my behavior.

What I wasn’t expecting: I started trusting the agent on things I used to do myself

Here’s a specific example. Our project has a rule that every migration ships as a pair: a .postgresql.sql version and a .sqlite.sql version, because tests run on SQLite and prod runs on Postgres. For the first two weeks, whenever I asked the agent to add a migration, I would carefully check both files and fix the SQLite one, because the agent usually leaked Postgres-specific syntax into it (e.g., ALTER TABLE ... ADD COLUMN IF NOT EXISTS, which SQLite doesn’t support pre-3.35).

I caught this every time. Every time I caught it, I fixed it. Every time I fixed it, I muttered “it should know this by now.”

After session 20-ish, I added a memory entry: “SQLite version of migrations must avoid IF NOT EXISTS on ALTER TABLE. SQLite 3.35+ supports it, but our CI uses 3.31 to match prod.”

After that memory was in place, for the next 80-something sessions, the agent correctly wrote the SQLite version every time without me having to check. I stopped opening the SQLite migration file to verify. Two months in I realized I was reading the SQL once during code review, not every time during writing, which cut the cost of migrations from ~5 minutes to ~2.
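A rule like this is also mechanically checkable, which is what eventually let me stop opening the file at all. As a sketch (the function name, directory layout, and regex are mine, not the project's), a CI-style guard that flags the Postgres-only syntax in the SQLite variants might look like:

```python
import re
from pathlib import Path

# Hypothetical guard: flag Postgres-only syntax in the .sqlite.sql half
# of each migration pair. Per the memory rule quoted above, the CI's
# SQLite (3.31) rejects ALTER TABLE ... ADD COLUMN IF NOT EXISTS.
POSTGRES_ONLY = re.compile(r"ADD\s+COLUMN\s+IF\s+NOT\s+EXISTS", re.IGNORECASE)

def check_sqlite_migrations(migrations_dir: str) -> list[str]:
    """Return the .sqlite.sql files that use the unsupported syntax."""
    offenders = []
    for path in sorted(Path(migrations_dir).glob("*.sqlite.sql")):
        if POSTGRES_ONLY.search(path.read_text()):
            offenders.append(str(path))
    return offenders
```

A check like this belongs in CI precisely because the memory store can drift; the memory tells the agent the rule, the check proves the rule held.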

The behavioral change: I started trusting the agent to get a specific thing right because it had a specific memory for that specific thing. Not general trust, not “the agent is good.” Targeted trust, grounded in a concrete rule loaded into every relevant session.

I hadn’t expected this. I thought the gain would be “the agent makes fewer mistakes.” The actual gain was “I stop double-checking things I know the agent will get right.” The double-checking cost was bigger than the mistakes cost.

A surprise: I started writing decisions to think clearly

The tool has a feature where I can type “save a decision” and describe a commitment. The audit agent writes it to decisions/ with a rationale field and an enforce level.

The rationale field turned out to be quietly important. When I had to write the reasoning, I often discovered the reasoning was thinner than I thought.

Example: I decided at one point “we will not use a database for state, only plain files.” I typed this into a decision and started writing the rationale: “Because files are simpler and…” and got stuck. I couldn’t complete the sentence without actually thinking about it. Were files really simpler? For what access patterns? What was the expected size? What about concurrent writes?

In the minute it took to write the rationale, I realized I did not actually have a good reason. I had a preference. I had not thought it through. I deleted the decision, marked the question as open, and came back to it a week later after actually evaluating the alternatives.

This happened maybe a dozen times over 100 sessions. In each case, the act of writing the decision caught fuzzy thinking before it became a commitment. Not always. Sometimes I wrote a decision, looked at the rationale, and said “yep, that’s correct.” But the dozen times I caught myself were valuable in a way that has nothing to do with the tool’s direct purpose.

The tool was built to remember decisions. It turned out to also function as a rubber duck that forced me to articulate why.

What I didn’t expect: the negative surprise

Here is the thing I did not see coming. I got complacent about grounding.

Around session 70, I started trusting the memory store so much that I stopped verifying its contents as often. If a memory said “the billing module uses Decimal for all money,” I’d believe it without re-checking. Usually it was right.

Then one session I asked the agent to add a discount calculation. The agent read the memory (“money is Decimal”), generated code using Decimal, I approved it. Hours later in production we got an exception: the discount function was returning float because I had accidentally mixed types in a helper. The memory said “the module uses Decimal” but that was true in intent and mostly true in implementation, not verified in every line.

The memory was stale in a subtle way. It had been correct at some point. It was still broadly correct. But the specific function being modified was an exception that the memory didn’t capture.

The failure mode I was not expecting: the memory store creates an illusion of certainty. A memory, read at the start of a session, feels like a fact. It is actually just a summary of what was true at some earlier point, possibly with updates I am not aware of.

After that incident I added a habit: for load-bearing memories (money handling, auth, db migrations), I verify against current code before trusting. Roughly weekly, not every session. It’s a small cost but it caught two more cases of memory drift in the next 30 sessions.
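In practice the weekly verification is unglamorous, grep-level work. A hedged sketch of what I mean for the "money is Decimal" memory (the function and its crude heuristic are mine; a human still reads the hits):

```python
from pathlib import Path

def audit_float_usage(module_dir: str) -> list[str]:
    """Crude, deliberately noisy check for the 'money is Decimal' memory:
    list lines in a module that return or annotate float. The point is
    not automation; it's forcing a fresh look at current code."""
    hits = []
    for path in sorted(Path(module_dir).rglob("*.py")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if "float" in line and ("return" in line or "->" in line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

Anything this flags either confirms the memory or becomes an update to it. Either outcome is worth the minute it costs.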

The tool’s value is not that memories are always correct. The tool’s value is that memories are usually correct in ways I can exploit, as long as I remember they are not ground truth.

Numbers I can actually defend

Estimates, not precise claims:

  • Session opening time: dropped from ~4 minutes to ~15 seconds. (High confidence, direct measurement.)
  • Re-correction loops (same correction given in multiple sessions): dropped from “several per week” to “maybe one per month.” (Subjective but clear.)
  • Time spent debugging the same issue twice: dropped. I can’t quantify this precisely, but it went from weekly to roughly monthly. (Memories capture specific gotchas, so they don’t recur.)
  • Time spent on session endings: went up by about 1 minute per session, for the audit close ritual. This is a real cost. It pays for itself in session start time savings, but it’s not zero.

Net: maybe 20-30 minutes saved per day across 4-6 sessions. Which is meaningful but not revolutionary. The bigger wins are qualitative.

What I tell people who ask

“Is persistent memory worth it?” is a common question. My answer:

  • If you use Claude Code on a project for less than 2 weeks, probably no. The overhead of setup and writing decisions doesn’t pay back on short projects.
  • If you use it on a project for more than a month, probably yes. The memory store accumulates enough context that the re-explaining tax drops noticeably.
  • If you use it on a project for more than 3 months with multiple contributors, definitely yes. The supersede chains, the decision log, the safety rules all become load-bearing.

I am in the last category. My 100+ sessions of using AXME Code on AXME Code itself crossed the threshold where the tool is no longer a convenience but a system of record.

The one behavioral change I wish I’d made sooner

I wish I had started writing decisions with rationales from day 1. Not because of the agent. Because of me.

The rationale field catches fuzzy thinking. It’s a cheap habit that I was underusing because I thought “decisions are for the agent, not for me.” Wrong. Decisions are mostly for future-me, who is going to look at the codebase in six months and ask “why did we do it this way?” If the decision has a rationale, future-me has an answer. If it doesn’t, future-me is guessing or doing archaeology.

Start writing rationales on day 1, whether you use a tool or just a plain decisions/ directory. It’s the habit that compounds most.

The bottom line

100 sessions later: I save time, I think more clearly, I catch memory drift if I stay honest about it. The tool was worth installing. The bigger gains came from behavioral changes it encouraged more than from its direct feature set. And the one failure mode I didn’t predict (false sense of certainty from memories) required me to develop a counter-habit to stay safe.

I’m still using it. Every day. That’s the honest report.
