brianchitester.com Posts Polymath Profiles

The Session Is the Unit

Most of the advice online about working with coding agents is some version of the same thing: keep the context window small, trim tokens, don’t overload the prompt. It’s all focused on what goes into the window at any given moment.

I’ve been running a different experiment for a while now, and I think the more interesting lever isn’t what — it’s when.

Three Phases, Not One Blob

The way I’ve fallen into working with agents looks like this:

Context initialization. A deliberate ritual at the start of every session. Load the relevant prior notes, surface the current state of the work, orient the agent before asking it to do anything.

The working phase. The actual task. This is the part everyone writes about.

Wrap-up. At the end of the session, the agent writes a human-readable summary to disk — what got done, what got decided, what’s still open. Next session’s init pulls from it.

None of the individual pieces are novel. The shift for me was realizing that the agent’s reliability across a project has almost nothing to do with how well-tuned my prompt is in any single moment, and almost everything to do with whether I have a rhythm at the seams between sessions.

Rot Lives at the Boundaries

If you just keep feeding a session more and more work, the context degrades. The agent forgets things it knew an hour ago. It starts contradicting decisions it made earlier. It pulls in stale details and misses fresh ones. There’s real research on this — people are calling it context rot — and every frontier model exhibits it well under the window limit.

You can try to fight rot inside a single session by being more careful about tokens. That works a little. What works a lot more is accepting that a session has a natural lifespan, and investing in the handoff between sessions rather than trying to make any one session last forever.

That reframe changes what you optimize for. Instead of “how do I keep the context clean,” the question becomes “how do I make it cheap to throw this context away and start fresh tomorrow.”

The answer, it turns out, is writing things down.

Human-Readable, and Mine

The wrap-up artifacts are just markdown files. Dated, named, sitting in a folder I can grep, diff, and read myself. When I start a new session, the agent reads them too.

One thing I’m deliberate about: these files are gitignored. They’re mine, not the project’s. Everyone on a team has slightly different preferences, tools, and working styles, and I don’t want my personal workflow bleeding into the shared repo. The codebase stays clean. The notes stay local. I might revisit this later, but for now keeping the artifacts out of version control feels like the right default.

This is also a deliberate choice against opaque memory stores. There’s a whole category of tools trying to give agents “memory” via embeddings and vector databases — hidden state the agent pulls from silently. I’m sure that has a place. But for how I actually work, that approach is the worst of both worlds: I can’t audit it, I can’t edit it, and when it’s wrong I can’t tell why.

A markdown file is the opposite. If the agent misunderstood something last session, I can see it. If a summary is getting stale, I can rewrite it. The artifacts aren’t a side effect of the workflow. They’re the whole point.

Sessions as a Feedback Loop

There’s one more part of the ritual that took me a while to name. Each session is also an opportunity to correct the agent’s behavior in-flight — to push back when it goes off-course, to clarify a preference, to tell it “no, not that, this.”

Those corrections are the most valuable output of a session, and they’re easy to lose. If I just close the laptop, the lesson dies with the context window.

So the wrap-up has a second job beyond summarizing the work. It captures the corrections. And when the same correction shows up across multiple sessions, it gets promoted — pulled out of the session notes and written into CLAUDE.md as durable policy.

That gives me two tiers of memory with clear roles. Sessions are a decaying log of what happened and what had to be adjusted. CLAUDE.md is promoted policy — the patterns that have earned their way into “this is how we work here.” One is transient and specific. The other is lean and load-bearing. Keeping them separate means CLAUDE.md doesn’t bloat with every one-off correction, and I don’t lose the one-offs either.

The session isn’t just where work happens. It’s where the rules get drafted before they’re committed.

The Ritual Matters More Than the Content

Here’s the part that surprised me: the exact content of the init and wrap-up steps matters less than the fact that they happen every time.

If I init most sessions but skip it when I’m in a hurry, the agent drifts. If I wrap up most sessions but skip one, the next session starts colder than it should — and whatever corrections I made during that session are lost. The reliability comes from the consistency of the rhythm, not from the perfection of any individual pass.

This is why I encoded each phase as a named slash command rather than leaving it as a checklist I hold in my head. A checklist gets skipped when I’m tired. A command gets run because it’s the thing I type.

It’s a small thing, but it removes a whole class of “did I remember to do that” variance from the workflow. The harness enforces the rhythm so I don’t have to.

What This Connects To

Since I started doing this, I’ve noticed the industry is converging on something similar from a few different directions. Cline’s Memory Bank has the same three-phase shape. Spec-kit and BMAD encode deliberate context resets between phases. The AGENTS.md standard is trying to make the persistent-policy file portable across tools.

The umbrella term that’s emerged is “context engineering” — Karpathy seems to have coined it, Anthropic has written about it at length. It’s a real thing and the conversation is getting better.

What I’d add to that conversation is this: most of the writing on context engineering is still focused on the window. What token budget, what instructions, what retrieval strategy. Those matter. But the bigger move — at least for the way I actually work — is treating the session as the unit and being disciplined at the seams.

A well-tuned prompt inside a session that never gets written down is a local maximum. A mediocre prompt inside a session that wraps up cleanly, captures its corrections, and promotes the recurring ones to policy is compounding infrastructure.

Sessions as a Design Primitive

The thing I keep coming back to is that the session is a more natural unit of work than most people give it credit for. It has a start, a middle, and an end. It produces artifacts. It has a context that was relevant while it lasted and is mostly discardable afterward.

Most engineering work doesn’t have natural session boundaries — which is exactly why you have to impose them. If you don’t draw the line, the session becomes infinite, and infinite sessions are where rot lives.

Draw the line. Write things down at the line. Promote what recurs. Start fresh on the other side.

That’s most of it.