"The monkey runs because someone still has to decide what's worth running toward."
The Question Nobody Asks Out Loud
I run a startup studio with AI agents instead of employees. Not as a thought experiment — as a real business. I have a COO agent that coordinates across projects. PM agents that write specs and prioritize backlogs. Engineering agents that build features, write tests, and open pull requests. Marketing agents that create distribution plans. A security agent that audits code. A research agent that mines logs for patterns.
Fifteen-plus agents filling twenty-plus roles across a portfolio of products.
And every few weeks, usually late at night, the same question surfaces: what am I actually doing here?
It's not imposter syndrome. It's a genuine architectural question. If the agents can do the PM work, the engineering work, the marketing work, the security work, and the operational coordination — what's left for the human? If you can replace your team with agents, what's the irreducible contribution that can't be automated?
This is the question at the heart of MonkeyRun. Every other post on this site answers how — how we orchestrate agents, how we manage context, how we build with flat files and conventions. This one answers why.
The Anxiety Is the Point
Let me be specific about the anxiety, because I think every founder experimenting with AI feels some version of it.
I'll watch Atlas (the founder agent on Halo) work through a feature — reading the schema, planning the approach, building the component, writing tests, opening a PR — and realize that the entire sequence, start to finish, would have taken a junior engineer a full day. Atlas does it in 20 minutes. With fewer bugs.
Then I'll watch Jared (the COO agent) sweep across all projects, read status files, flag risks, propagate patterns from one project to another, and write briefings for the other agents. That's a full-time COO role. He does it continuously, without sleep.
Then Nova writes a product spec. Scout mines logs for content ideas. Marco plans a distribution strategy. Jenny runs a security audit.
At some point you have to ask: am I the founder, or am I the guy who turns on the lights in the morning?
Trust, But Verify — At Machine Speed
Reagan borrowed the phrase from the Russian proverb: Доверяй, но проверяй. Trust, but verify. It's become the default mental model for human-in-the-loop AI systems. You trust the agent to do the work. You verify the output.
The problem is that "verify" doesn't scale the obvious way.
When you're running two or three agents, you can review every PR, read every document, check every decision. But when you're running fifteen agents across six projects — some of them operating concurrently, some of them spawning subagents — you physically cannot review everything. The output volume exceeds human attention bandwidth.
So you build systems to verify.
That's what most of MonkeyRun's infrastructure actually is. Not tools for building — tools for checking:
- FEATURES.yaml contracts that enforce what "done" means, so an agent can't claim a feature is shipped when acceptance criteria aren't met
- WIP.md protocols that prevent agents from silently overwriting each other's work
- Marketing-reality audits that automatically compare what the site claims with what the product actually does, catching oversells before they go live
- COO status sweeps that aggregate project health across the portfolio, flagging risks that no single agent can see
- Convention files that evolve through failure — every agent mistake becomes a rule that prevents the next one
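To make the first item concrete: a FEATURES.yaml contract check can be as simple as refusing the "shipped" status while any acceptance criterion is unverified. This is a minimal sketch, not MonkeyRun's actual implementation — the schema, feature names, and `contract_violations` helper are all hypothetical, and in practice the contract would be loaded from FEATURES.yaml with a YAML parser rather than inlined as a dict.

```python
# Hypothetical FEATURES.yaml contract, inlined as a dict so the sketch is
# self-contained (in practice: yaml.safe_load(open("FEATURES.yaml"))).
FEATURES = {
    "export-csv": {
        "status": "shipped",
        "acceptance": [
            {"criterion": "user can download a CSV", "verified": True},
            {"criterion": "empty dataset is handled", "verified": False},
        ],
    },
}

def contract_violations(features: dict) -> list[str]:
    """List features that claim 'shipped' while acceptance criteria are unmet."""
    violations = []
    for name, spec in features.items():
        if spec.get("status") != "shipped":
            continue  # only shipped claims are audited
        unmet = [
            a["criterion"]
            for a in spec.get("acceptance", [])
            if not a.get("verified")
        ]
        if unmet:
            violations.append(f"{name}: unmet criteria: {', '.join(unmet)}")
    return violations

print(contract_violations(FEATURES))
# Flags "export-csv" — an agent can't claim it shipped until every
# criterion is verified.
```

A check like this runs in CI or in a pre-merge hook, so the enforcement is mechanical: the agent's claim is tested against the contract, not against anyone's memory.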
The human doesn't verify every line. The human designs the verification system. And then verifies that.
This is the subtle shift that took me months to internalize. My job isn't to check the agents' work. My job is to build systems that check the agents' work, and then to check whether those systems are working. It's verification all the way down — but each layer is more abstract than the last, and the human operates at the top of the stack.
The Irreducible Human Contribution
So what can't be automated? After running this experiment for months, here's where I've landed — for now.
Judgment about what to build. Agents can evaluate markets, analyze competitors, write specs, and build products. They cannot decide which problem is worth solving. They cannot feel the gap between what exists and what should exist. They cannot look at a portfolio of five projects and decide to kill three of them because the world changed. That judgment — the willingness to say "this isn't working, stop" — is the irreducible human input.
Judgment about when to stop. This is harder than it sounds. Agents are optimizers. Given a task, they will improve it. Given a codebase, they will refactor it. Given a product, they will add features. Knowing when something is good enough to ship — when the next marginal improvement isn't worth the time — is a human call. Our CHARTER.md traction gates are an attempt to systematize this ("if you don't have 10 users by week 4, kill it"), but the thresholds themselves are human judgment.
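A traction gate like the one above is trivial to encode once a human has set the threshold. The sketch below is hypothetical — the `TractionGate` structure and `gate_verdict` function are illustrations, not the real CHARTER.md format — but it shows the division of labor: the machine evaluates the rule, the human chose the numbers.

```python
from dataclasses import dataclass

@dataclass
class TractionGate:
    week: int       # portfolio week at which the gate is evaluated
    min_users: int  # threshold below which the project is flagged to kill

# Hypothetical encoding of the "10 users by week 4" gate from CHARTER.md.
GATES = [TractionGate(week=4, min_users=10)]

def gate_verdict(current_week: int, users: int) -> str:
    """Return 'kill' if any gate has been reached and missed, else 'continue'."""
    for gate in GATES:
        if current_week >= gate.week and users < gate.min_users:
            return "kill"  # flagged for the human kill decision
    return "continue"

print(gate_verdict(current_week=4, users=3))   # gate reached, threshold missed
print(gate_verdict(current_week=4, users=12))  # gate reached, threshold met
```

Note that the function only flags; the actual kill remains a human call, which is the point of the paragraph above.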
Judgment about what matters. Agents propagate patterns. Humans decide which patterns are worth propagating. When Jared flags that one project discovered a useful testing pattern, someone has to decide whether that pattern is worth spreading to every project or whether it's context-specific. That "is this generalizable?" question is something I haven't seen an agent answer well.
The narrative. Why does MonkeyRun exist? What story are we telling? Who are we talking to and what do we want them to feel? Agents can write copy, create distribution plans, and optimize for engagement. They cannot decide what the story is. This blog post — the thesis, the philosophy, the point of the whole enterprise — is something an agent could format but couldn't originate. The question "what am I doing here?" is a human question.
The Monkey Runs
The name "MonkeyRun" is a play on the idea that the machine runs — executes, builds, ships, coordinates — but the monkey (the human, the messy biological thing at the top of the stack) is the one deciding where to run.
It's not flattering. It's not meant to be. The whole point is that the human contribution isn't clean or elegant. It's messy judgment calls at midnight. It's killing a project you spent two months on because the traction isn't there. It's reading a blog post by a stranger and realizing that your agents have the same problem his do, and deciding to write about it honestly instead of pretending you had it figured out.
The monkey runs because someone still has to decide what's worth running toward. The agents execute. The systems verify. The conventions prevent known failures. But the direction — the why — that's still human.
I don't know how long that stays true. Maybe in a year, an agent will be able to evaluate a portfolio and make kill decisions with better judgment than mine. Maybe the narrative question gets solved by a model that can hold genuine conviction about what matters.
But right now, today, the experiment is this: one founder, twenty years of early-stage experience, a portfolio of AI agents, and an honest attempt to figure out exactly where the line is between what humans do and what machines do. MonkeyRun is the lab where I run that experiment. Backstage is where I share the results.
If you're a founder trying to figure out how to use AI agents in your own startup — not as a toy, not as a demo, but as your actual team — that's what this site is for. Not hype. Not "AI will change everything." Just: here's what I tried, here's what broke, here's what I learned, and here's what I still don't know.
The line between human and machine is moving. I'm trying to document exactly where it is today, so you don't have to find it the hard way.
This is the thesis post for MonkeyRun. If you're new here, start with How We Stopped Our AI Agents From Getting Dumber (the context engineering problem) or Why We Stopped Delegating to AI Agents (the builder-who-triages pattern). See The Model for the full operating system, or Start Here for a guided reading path.