Loop Engineering: Your Agent Loop Needs a Back-Office (2026)

The short version:

Loop engineering is real. The best people directing coding agents stopped typing prompts and started writing the loops that do the typing.
The loop forgets. Every iteration starts from whatever state you hand it. The context window clears; the agent does not remember yesterday.
So the durable artifact isn't the prompt anymore. It's the state. The spec the loop checks against, the memory it resumes from, the record of what it shipped. That's the layer loopmaxxing skips, and the gap Octopad is built to fill.

One thing up front, because it's the whole basis for trusting the rest: Octopad does not write or run your loop. The loop lives in your AI client or code you run yourself. Octopad is the back-office it reads from and writes to. Start free in 60 seconds if that already sounds like the gap you're hitting.

From prompts to context to loops

In June 2026 a way of working that had been quietly spreading got a name. Boris Cherny, Head of Claude Code at Anthropic, said it on stage at Acquired Unplugged:

"I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do. My job is to write loops."

Boris Cherny, Head of Claude Code, Anthropic, Acquired Unplugged, June 2, 2026

Days later, Peter Steinberger, creator of OpenClaw, posted the line that named the practice: "you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents." (@steipete). Addy Osmani codified it the same week: "Loop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead."

It's the next rung on a ladder we've been climbing for two years. First prompt engineering: craft the perfect instruction. Then context engineering, which Andrej Karpathy described as "the delicate art and science of filling the context window with just the right information for the next step." Loop engineering goes one level up again: you no longer fill the context window by hand for one step, you design the system that fills it, every step, on its own.

What a loop actually is

Strip away the hype and an engineered loop is a small, old idea: reason, act, observe, repeat. Something triggers the loop. An agent reads the current state, decides on an action, and executes it through tools. Then an evaluator checks the result against the goal, and the loop takes one of four exits: it continues, retries, succeeds, or hands off to a human.

Two parts carry all the weight. The first is a trigger: a schedule, a CI failure, a slash command. The second, and the one most people get wrong, is a verifiable goal. As one good primer puts it: "LLMs have no built-in concept of 'done.' Without explicit stopping conditions, a loop runs until the money runs out."
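To make "explicit stopping conditions" concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not anyone's published implementation: `run_loop`, `act`, `is_done`, and `Outcome` are hypothetical names, and the "agent" is a toy function. The point is only that the loop has two ways out, a verified goal or a hard iteration cap.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    HANDED_OFF = "handed_off"


@dataclass
class LoopResult:
    outcome: Outcome
    iterations: int


def run_loop(
    act: Callable[[int], str],
    is_done: Callable[[str], bool],
    max_iterations: int = 5,
) -> LoopResult:
    """Reason/act/observe/repeat with an explicit stopping condition.

    `act` stands in for the agent step; `is_done` is the verifiable goal.
    Both are hypothetical callables supplied by the caller.
    """
    for attempt in range(1, max_iterations + 1):
        observation = act(attempt)  # the agent acts and reports what it saw
        if is_done(observation):    # the goal check decides, not the agent
            return LoopResult(Outcome.SUCCEEDED, attempt)
    # The cap was reached without meeting the goal: stop and escalate.
    return LoopResult(Outcome.HANDED_OFF, max_iterations)


# Toy usage: the "agent" only produces a passing result on its third try.
result = run_loop(
    act=lambda attempt: "pass" if attempt >= 3 else "fail",
    is_done=lambda observation: observation == "pass",
)
print(result.outcome.value, result.iterations)
```

With the toy inputs above the loop stops on the third attempt. Swap in an `act` that never passes and it returns `HANDED_OFF` at the cap instead of running until the money runs out.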

That's why the evaluator matters so much, and why it should be separate from the agent doing the work. Anthropic calls this the evaluator-optimizer pattern: "One LLM call generates a response while another provides evaluation and feedback in a loop." A model grading its own homework is a pathological optimist. The check has to come from outside it. Ideally from the environment itself: tests pass, the build compiles, the exit code is zero.
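One way to take the check out of the model's hands is to let a process exit code decide. The sketch below assumes the goal can be expressed as a command that exits 0 on success; `environment_says_done` is a hypothetical helper name, and the two checks are stand-ins for a real test command such as `["pytest", "-q"]`.

```python
import subprocess
import sys


def environment_says_done(command: list[str], timeout_s: float = 60.0) -> bool:
    """Return True only if the command exits with status 0.

    The verdict comes from the process exit code, not from the model's claim.
    """
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False  # a hung or unlaunchable check counts as "not done"
    return completed.returncode == 0


# Stand-ins for a real test command; each just exits with a fixed status.
passing_check = [sys.executable, "-c", "raise SystemExit(0)"]
failing_check = [sys.executable, "-c", "raise SystemExit(1)"]

print(environment_says_done(passing_check))  # True
print(environment_says_done(failing_check))  # False
```

Treating timeouts and launch failures as "not done" is a deliberate choice here: an evaluator that cannot run should never be read as a pass.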

Five things, and the sixth

Osmani gives the cleanest inventory of what a loop is made of: "A loop needs five things and then one place to remember stuff." The five are the machinery:

Automations that fire the loop on a schedule and do discovery and triage on their own.
Worktrees: isolated branches so two agents working in parallel don't step on each other.
Skills, the project knowledge written down once (a SKILL.md) so the agent stops re-explaining your project every session.
Connectors, built on MCP, that let the agent read your issue tracker, query a database, hit a staging API, or drop a message in Slack.
Sub-agents: specialists dividing the labor, one drafting, another checking.

Then the sixth thing, the one he leans on hardest: memory. Anything that lives outside the single conversation and holds what's done and what's next. His line is the whole argument in six words:

"The agent forgets, the repo doesn't."

Addy Osmani, Loop Engineering

Notice the split. The first five are about the runtime: how the loop is wired, where it runs, how agents stay out of each other's way. Those belong to your loop runner, and they should. Octopad has no opinion on your worktrees or your scheduler. The sixth is different. Memory, plus the spec the loop checks against and the record of what it did, is not runtime. It's state. And state is the part teams keep hand-rolling out of a progress file, a markdown checklist, or a Linear board, then watching it rot.

Loopmaxxing: when the loop becomes the product

Every powerful idea grows a degenerate twin. Prompt engineering had "tokenmaxxing," the belief that burning more tokens is the same as getting more done, the AI era's version of measuring productivity in lines of code. Loop engineering has loopmaxxing: replacing software architecture with an open-ended while(true) and trusting that enough iterations will eventually converge on something good.

They don't converge. They drift. The common failure modes:

Fuzzy goals cause drift. Tell a loop to "refactor this to be better" and you get an agent six hours deep, optimizing a metric nobody chose, producing a beautiful loop and garbage output.
Self-grading evaluators reward-hack. Let a loop judge its own work and it games the rubric. In work Osmani cites, the share of gamed evaluations rose from about 26% to 58% as optimization steps went from 10 to 100. Past a point, more looping makes things worse, not better.
The bill is the only circuit breaker. A loop with no exit condition runs until the budget does. Tom's Hardware reported one run that burned through roughly $1.3M in API tokens in a single month. The point isn't whoever ran it. It's that nothing in the loop itself ever said stop.
Comprehension debt piles up. Osmani's term for the gap between "how much code exists in your system and how much of it any human being genuinely understands." An autonomous loop ships faster than you can keep up, and one day production breaks and you're reading thousands of lines you've never seen.
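The "bill is the only circuit breaker" failure has a simple counterpart in code: a guard inside the runner that refuses to continue once a cap is crossed. This is a rough sketch with made-up numbers; `Budget`, `BudgetExceeded`, and the $0.30 per-iteration cost are all illustrative assumptions, and a real runner would meter actual token spend.

```python
class BudgetExceeded(Exception):
    """Raised once the loop has crossed a spend or iteration cap."""


class Budget:
    def __init__(self, max_usd: float, max_iterations: int) -> None:
        self.max_usd = max_usd
        self.max_iterations = max_iterations
        self.spent_usd = 0.0
        self.iterations = 0

    def charge(self, cost_usd: float) -> None:
        """Record one iteration's cost, then raise if either cap is crossed."""
        self.spent_usd += cost_usd
        self.iterations += 1
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(
                f"spent ${self.spent_usd:.2f}, cap is ${self.max_usd:.2f}"
            )
        if self.iterations > self.max_iterations:
            raise BudgetExceeded(
                f"{self.iterations} iterations, cap is {self.max_iterations}"
            )


budget = Budget(max_usd=1.00, max_iterations=100)
stopped_after = None
try:
    while True:  # the open-ended loop, now with something that says stop
        budget.charge(0.30)  # hypothetical per-iteration cost
except BudgetExceeded as exc:
    stopped_after = budget.iterations
    print(f"stopped after {stopped_after} iterations: {exc}")
```

Note that this version notices the overrun after the iteration that caused it. A stricter runner would estimate the next call's cost and refuse before spending.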

And the quiet one underneath all of them, from Dann Waneri's "The Loop Is Not the Product": "Cron jobs ran quietly and failed loudly. Agents run loudly and fail quietly." A failing agent's output is confident, well-formatted, and wrong, and it returns HTTP 200 the whole way down.

Every one of those failures is the same thing: a missing state layer. Drift is a goal that wasn't persisted and re-asserted. Reward-hacking is an evaluator with no external ground truth. Comprehension debt is no durable record of what shipped and why. The loop isn't broken. It just has no memory to run on.

What the loop reads from: the part loopmaxxing skips

So separate the two jobs. The code that runs the loop owns the runtime: the schedule, the control flow, the retries, the budget, the kill switch. Something else has to own the state: the spec, the memory, the shared truth, the record. That second job is Octopad's, and only that one.

What the loop needs	Where Octopad fits	What stays on your side
Memory that outlives the context window	Persistent tasks and typed knowledge the loop reads and writes over MCP. The loop's disk, not a file on one machine	Deciding what to load into the context window each turn
A goal the evaluator can check	A machine-readable "Done when" plus scope and out-of-scope on every task	Running the actual check (tests, types, judge) and deciding to stop
One source of truth for many agents	A cross-host workspace every client and teammate reads and writes. Drafter and checker see the same picture	Spawning sub-agents and passing messages between them
Standing project knowledge	Markdown pages plus a methodology loaded into every session, hosted and shared	Repo-local `SKILL.md` / config the runtime auto-discovers
A record of what shipped and why	Typed Decisions with rationale, session recaps, and a rolling progress narrative, kept current automatically	Per-iteration token, latency, and tool-call traces
A goal that doesn't drift	Why / What / out-of-scope / Done-when give the loop a structured target to re-anchor to, instead of a vague prompt	Budgets, iteration caps, and the circuit breaker

That right-hand column is the whole argument, not a hedge. Scoping to state, and not runtime, is what lets Octopad compose with the rest of your stack instead of competing with it.
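To show how the two columns divide in practice, here is a small sketch of a task spec and the check the runner performs against it. The field names mirror the Why / What / out-of-scope / Done-when structure described above, but the `TaskSpec` class and `unmet_conditions` function are hypothetical illustrations, not Octopad's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    why: str
    what: str
    done_when: list[str]  # named checks the runner knows how to evaluate
    out_of_scope: list[str] = field(default_factory=list)


def unmet_conditions(task: TaskSpec, check_results: dict[str, bool]) -> list[str]:
    """Return the Done-when conditions that have not passed yet.

    A condition missing from `check_results` counts as unmet.
    """
    return [name for name in task.done_when if not check_results.get(name, False)]


task = TaskSpec(
    why="Nightly CI is red on main",
    what="Fix the failing fixture teardown",
    done_when=["tests_pass", "types_pass"],
    out_of_scope=["refactoring unrelated tests"],
)

# The runner produces these results; the spec only names what must be true.
print(unmet_conditions(task, {"tests_pass": True}))
print(unmet_conditions(task, {"tests_pass": True, "types_pass": True}))
```

The spec is pure data: it names what has to be true. Running the checks and deciding to stop stays in the runner, which is the split the table draws.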

A loop with a back-office, concretely

Take a concrete case: a nightly loop that triages the day's CI failures. The loop itself lives in your own code: a cron job, a GitHub Action, a while loop in a script. Here's where Octopad shows up, and where it deliberately doesn't:

Your cron job fires at 2am and starts an agent session. (Octopad isn't the clock.)
The agent reads state: it calls Octopad to pull the open tasks, recent decisions, and the rolling work plan, so it knows what's already being handled and what the "Done when" for each is.
The agent acts in your repo. It reads the failing test, drafts a fix on a worktree, runs the suite. (That part is all yours; Octopad never touches your code.)
The agent writes state back: it files a task for each genuine failure with Why/What/Done-when, records a typed Decision ("rolled back the flaky retry, root cause is the shared fixture") with its rationale, and flags anything ambiguous as a Question.
Your script checks the exit condition. Tests green? Open a PR and mark the task. Stuck after N retries? Hand off: flip the task to blocked and ping a human on Slack. (Octopad is where the handoff lands; the decision to hand off is yours to make, not Octopad's.)
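The steps above can be sketched end to end. Everything here is an assumption made for illustration: `StateStore` is an in-memory stand-in for the shared workspace (a real loop would reach it through MCP tool calls), `try_fix` is a toy replacement for the agent working in your repo, and none of the names are Octopad's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    title: str
    done_when: str
    status: str = "open"  # open | done | blocked


@dataclass
class StateStore:
    """In-memory stand-in for the shared workspace (hypothetical interface)."""
    tasks: list[Task] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)


def nightly_triage(
    store: StateStore,
    failures: list[str],
    try_fix: Callable[[str, int], bool],
    max_retries: int = 3,
) -> None:
    # Read state: skip failures that already have a task.
    already_tracked = {task.title for task in store.tasks}
    for failure in failures:
        if failure in already_tracked:
            continue
        # Act: retry up to the cap, stopping at the first successful attempt.
        fixed_on = next(
            (n for n in range(1, max_retries + 1) if try_fix(failure, n)),
            None,
        )
        # Exit condition: done if a fix passed, otherwise hand off as blocked.
        if fixed_on is not None:
            status, note = "done", f"fixed on attempt {fixed_on}"
        else:
            status, note = "blocked", f"unresolved after {max_retries} attempts"
        # Write state back: one task and one decision per new failure.
        store.tasks.append(Task(failure, "test suite exits 0", status))
        store.decisions.append(f"{failure}: {note}")


store = StateStore(tasks=[Task("test_login", "test suite exits 0")])
nightly_triage(
    store,
    failures=["test_login", "test_cart", "test_search"],
    # Toy fixer: test_cart is fixable on the second attempt, test_search never is.
    try_fix=lambda failure, attempt: failure == "test_cart" and attempt == 2,
)
for task in store.tasks:
    print(task.title, task.status)
```

In this toy run `test_login` is left alone because it was already tracked, `test_cart` ends up done, and `test_search` ends up blocked for a human. The retry cap and the decision to hand off both live in `nightly_triage`, the runner's side of the line; the store only records what happened.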

The next morning, you (or a teammate, on a different AI client) open the workspace and the loop's whole night is legible: what it touched, what it decided, what it couldn't resolve. Not because you wrote a status report, but because the loop wrote its state as it went. That's the difference between an agent that runs loudly and fails quietly and one you can actually stay the engineer of.

From a solo loop to a team of loops

Everything above holds for one person and one loop. It gets more important, not less, the moment a second person shows up.

Cognition learned this building multi-agent systems and wrote it down: "Share context, and share full agent traces, not just individual messages." Sub-agents that don't share state produce incoherent work. Their own example: one agent builds a Flappy Bird background while another draws Super Mario in the foreground. The same is true of teammates' loops. If your loop's evaluator and your colleague's drafter are reading different copies of the truth, they will quietly diverge.

This is the part a progress file on one laptop can't do. Octopad is one workspace every AI on the team plugs into over MCP: Claude, ChatGPT, Cursor, whatever each person runs. A decision one teammate's loop captures in the morning is something another teammate's loop reads in the afternoon. The industry is already converging here; as one analysis put it, "the issue tracker becomes the source of truth, and the agent becomes the synchronization layer." Octopad is that source of truth, built for the agents to read and write directly rather than retrofitted from a human tool.

The pragmatic setup: stay the engineer

None of this is an argument against loops. Loop engineering is a real upgrade, and the people pushing it are right. It's an argument against building loops on amnesia. The division of labor that works:

Your runner owns control flow, budgets, retries, and the kill switch. Keep it deterministic where you can; reach for the LLM only at the decisions plain code can't make.
Your loop owns verification against ground truth: tests, types, a separate evaluator, not the model's own confidence.
Octopad owns the state the loop stands on: the spec it checks against, the memory it resumes from, the record everyone reads.

Osmani's advice is the right one to design by: "Build the loop like someone who intends to stay the engineer - not just the person who presses go." Staying the engineer means the loop's work is legible to you the next morning. That legibility is a place, and the loop has to write to it. Give your loop one.

Common questions

Does Octopad run my agent loop?

No. The loop runs in your AI client or your own code. The schedule, the retries, the budget caps, and the kill switch all live there. Octopad is the shared state, memory, spec, and progress record the loop reads from and writes to over MCP.

Is Octopad a replacement for LangSmith, Weave, or Langfuse?

No, it's complementary. Those tools capture run-level telemetry: tokens, latency, and tool-call traces for a single execution. Octopad captures work-level state and a human-readable progress narrative across sessions. Most serious loops want both.

How does my loop connect to Octopad?

Through the Model Context Protocol (MCP). You paste one workspace URL into any MCP-compatible client (Claude, ChatGPT, Cursor), and your loop reads and writes the same workspace through standard tool calls. Same workspace, any client.

Can two teammates' agents share one loop's state?

Yes. A decision one teammate's agent writes in the morning is something another teammate's agent reads in the afternoon, regardless of which client each person uses. Everyone's agents read and write the same workspace.

Can Octopad stop a runaway or fuzzy loop?

Not directly. Budgets, iteration caps, and kill switches are the runner's job, not Octopad's. What Octopad does is make the goal hard to fudge: a task with Why, What, out-of-scope, and a Done-when condition is a target the loop re-anchors to at each checkpoint, instead of "optimize this until it's better."

You're writing loops now. What are they reading from?