Three agents. One feature. Zero hand-holding.
agent-bober is an autonomous coding pipeline that plans, builds, and evaluates your features using three specialized AI agents. Describe what you want. Walk away. Come back to working software.
This website was built by agent-bober in a single autonomous run.
The root cause
The solo agent problem
Ask a single AI to plan, build, and evaluate your code and you get a confident answer, not necessarily a correct one. Three failure modes undermine long autonomous sessions.
Confirmation bias
A single agent never questions its own plan. It writes code and evaluates it in the same context, confirming its own assumptions. The reviewer and the author are the same mind.
No real evaluation
Without a separate evaluator, "testing" means the same agent glancing at its own work. Bugs that seem obvious to fresh eyes slip through. Self-review is not review.
Broken output
Context degrades over long sessions. The agent loses track of decisions made 50 messages ago. The result: code that almost works but doesn't.
Solo agent loop
same context — same blind spots
Anthropic research
“Multi-agent systems consistently outperform solo agents on complex coding tasks.”
— Anthropic's research on agentic AI systems
The colony
Meet the colony
Three agents. Three distinct responsibilities. Each one laser-focused on its role — which is exactly what makes the whole system work.
Planner
The Architect
Sees the whole forest, touches nothing. Read-only tools: read_file, glob, grep. Plans everything, implements nothing.
Generator
The Builder
Shapes every log with precision. Full tools: bash, read_file, write_file, edit_file, glob, grep. Implements exactly what the contract says.
Evaluator
The Inspector
Tests every joint, never lies. Read + bash only — no write, no edit. Reports pass or fail. Cannot change the code it reviews.
Core principle
“The separation is the feature.”
When the agent who plans cannot build, and the agent who builds cannot evaluate, every assumption gets challenged. That's not overhead — that's quality.
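To make the separation concrete, here is a minimal sketch of a per-role tool allowlist. The role and tool names come from the descriptions above; the `ROLE_TOOLS` table and the `can_use` helper are hypothetical and only illustrate the idea, not how agent-bober actually enforces it. "Read + bash only" is read here as the three read tools plus bash, which is an assumption.

```python
# Hypothetical sketch: tool names are taken from the role descriptions above,
# but this allowlist structure is an assumption, not agent-bober's real code.
READ_TOOLS = frozenset({"read_file", "glob", "grep"})

ROLE_TOOLS = {
    # Planner: read-only, plans everything, implements nothing.
    "planner": READ_TOOLS,
    # Generator: full tool set.
    "generator": READ_TOOLS | {"bash", "write_file", "edit_file"},
    # Evaluator: assumed to mean the read tools plus bash, no write or edit.
    "evaluator": READ_TOOLS | {"bash"},
}


def can_use(role: str, tool: str) -> bool:
    """Return True if the given role is allowed to call the given tool."""
    return tool in ROLE_TOOLS.get(role, frozenset())


if __name__ == "__main__":
    for role in ROLE_TOOLS:
        print(role, "may write files:", can_use(role, "write_file"))
```

Under this sketch only the Generator can modify files, so the Evaluator can run the code it reviews but cannot quietly repair it.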
The pipeline
From idea to working software
Every feature run follows the same pipeline — plan it, build it, evaluate it, ship it. No shortcuts. No surprises.
- 1. You
Describe Your Idea
Tell agent-bober what you want to build. A sentence, a paragraph, or a full spec — it all works.
- 2. Claude Opus
Planner Creates the Blueprint
The Architect analyzes your request, asks clarifying questions, and decomposes it into sprint-sized contracts with explicit success criteria.
- 3. Claude Sonnet
Generator Builds Each Sprint
The Builder receives one sprint contract at a time. Fresh context, clear scope, no accumulated confusion. It implements, self-verifies, and commits.
- 4. Claude Sonnet
Evaluator Judges the Output
The Inspector runs typecheck, lint, build, and tests against the contract's success criteria. Pass or fail — no opinions, just evidence.
- 5. Loop (Evaluator → Generator retry)
Rework Until It Passes
Failed? The evaluator's feedback goes back to a fresh Generator. New context, specific fixes. Up to 3 iterations per sprint.
- 6. Done
Ship Working Software
When all sprints pass evaluation, you have a working feature. Tested, typed, linted — ready to merge.
End result
Every commit passes typecheck, lint, build, and tests.
The pipeline doesn't finish until all sprints pass evaluation. There's no “good enough.” Either it passes, or it loops.
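The build, evaluate, and rework steps above can be sketched as a small loop. This is an illustration under stated assumptions, not agent-bober's implementation: `run_sprint`, `run_pipeline`, `EvalResult`, and the `generate` / `evaluate` callables are hypothetical names, and raising an error after the third failed iteration is a guess, since the page does not say what happens when a sprint never passes.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_ITERATIONS = 3  # "Up to 3 iterations per sprint", per the text above


@dataclass
class EvalResult:
    passed: bool
    feedback: str = ""


def run_sprint(contract: str,
               generate: Callable[[str, str], str],
               evaluate: Callable[[str, str], EvalResult]) -> int:
    """Run one sprint and return the iteration number that passed."""
    feedback = ""
    for iteration in range(1, MAX_ITERATIONS + 1):
        # Each call stands in for a fresh Generator context that sees only
        # the contract and the Evaluator's latest feedback.
        output = generate(contract, feedback)
        result = evaluate(contract, output)
        if result.passed:
            return iteration
        feedback = result.feedback
    # Assumption: the real tool's behavior after the last retry is not documented here.
    raise RuntimeError(f"sprint failed after {MAX_ITERATIONS} iterations: {contract}")


def run_pipeline(contracts: List[str],
                 generate: Callable[[str, str], str],
                 evaluate: Callable[[str, str], EvalResult]) -> List[int]:
    """Run sprints in order; the pipeline finishes only if every sprint passes."""
    return [run_sprint(c, generate, evaluate) for c in contracts]
```

Passing the feedback string into a new `generate` call, rather than appending to a running conversation, mirrors the "fresh Generator" idea: each retry starts from the contract plus specific fixes.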
What you get
Everything the colony needs
agent-bober comes batteries-included.
Stack Agnostic
7 presets from Next.js to Solana. Or define your own stack. agent-bober adapts to your tools, not the other way around.
Sprint Contracts
Every sprint has an explicit scope and success criteria. The generator can't wander — the contract is the boundary.
Pluggable Evaluation
Typecheck, lint, build, unit tests, Playwright — mix the strategies you need. Add custom commands for anything else.
Context Resets
Fresh context window every sprint. No accumulated confusion, no degraded output. Each sprint starts clean.
Brownfield Ready
Add features to existing codebases. Deep analysis first, conservative sprints, regression-focused evaluation.
Fully Autonomous
Fully autonomous or human-in-the-loop — you set the permissions. Walk away and come back to working software, or stay involved and approve each step.
Multi-Provider
Anthropic, OpenAI, Google Gemini, or any OpenAI-compatible endpoint — mix and match per agent role. 4 providers, your choice.
MCP Tools
10 MCP tools exposed over stdio transport — read, write, edit, search, bash, and more. Drop agent-bober into any MCP client.
Slash Commands
10 slash commands for Claude Code: /bober-run, /bober-plan, /bober-sprint, and more — the full pipeline at your fingertips.
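The Sprint Contracts and Pluggable Evaluation cards above suggest a simple shape: a contract that carries its scope, its success criteria, and a set of check commands. The sketch below is one possible reading. `SprintContract`, its fields, and `evaluate` are hypothetical names, and treating a zero exit code as a pass is an assumption about how command-based strategies might be judged.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SprintContract:
    scope: str
    success_criteria: List[str]
    # Maps a strategy name to the command that checks it (hypothetical shape).
    checks: Dict[str, List[str]] = field(default_factory=dict)


def evaluate(contract: SprintContract) -> Dict[str, bool]:
    """Run every check command; a zero exit code counts as a pass."""
    results = {}
    for name, command in contract.checks.items():
        completed = subprocess.run(command, capture_output=True)
        results[name] = completed.returncode == 0
    return results


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; a real project would
    # plug in its own typecheck, lint, build, and test commands.
    demo = SprintContract(
        scope="Add contact form",
        success_criteria=["form renders", "tests pass"],
        checks={
            "always-passes": [sys.executable, "-c", "raise SystemExit(0)"],
            "always-fails": [sys.executable, "-c", "raise SystemExit(1)"],
        },
    )
    print(evaluate(demo))
```

Because each strategy is just a named command, adding a custom check means adding one more entry to `checks`.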
Your models, your choice
Pick your provider
Mix and match AI providers per agent role. Anthropic by default, everything else when you need it.
Anthropic
Default provider. Recommended for all roles.
OpenAI
Full OpenAI model lineup supported.
Google Gemini
Google's Gemini models via the Generative AI API.
OpenAI-Compatible
Any endpoint that speaks the OpenAI API — Ollama, LM Studio, Groq, DeepSeek, and more.
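Mixing providers per role could look something like the sketch below. Everything here is hypothetical: the provider identifiers, the `RoleModel` fields, and the `validate` helper are illustrative, not agent-bober's actual configuration format, and the model names are placeholders to be replaced with real ones.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical identifiers for the four providers listed above.
SUPPORTED_PROVIDERS = {"anthropic", "openai", "google", "openai-compatible"}
ROLES = ("planner", "generator", "evaluator")


@dataclass(frozen=True)
class RoleModel:
    provider: str
    model: str
    base_url: str = ""  # only needed for OpenAI-compatible endpoints


def validate(config: Dict[str, RoleModel]) -> None:
    """Raise ValueError if any role is missing or misconfigured."""
    for role in ROLES:
        if role not in config:
            raise ValueError(f"missing role: {role}")
        entry = config[role]
        if entry.provider not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unknown provider for {role}: {entry.provider}")
        if entry.provider == "openai-compatible" and not entry.base_url:
            raise ValueError(f"{role} needs a base_url for an OpenAI-compatible endpoint")


if __name__ == "__main__":
    config = {
        "planner": RoleModel("anthropic", "<planner-model>"),
        "generator": RoleModel("openai", "<generator-model>"),
        # Example local endpoint: Ollama's OpenAI-compatible API.
        "evaluator": RoleModel("openai-compatible", "<local-model>",
                               "http://localhost:11434/v1"),
    }
    validate(config)
    print("config ok")
```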
Live proof
You're looking at the proof
This website was built by agent-bober. Not as a demo — as a real production deployment. Every section you've scrolled through was planned, built, and evaluated by the three-agent pipeline.
103 tests passing
4 AI providers
10 MCP tools
10 slash commands
7 presets
6+ eval strategies
The Architect designed 7 sprint contracts. The Builder implemented each one in a fresh context. The Evaluator rejected flawed outputs and fed specific feedback back into the loop. Every commit you can see in the repository passed typecheck, lint, build, and test.
View the source code: every commit from the run
Works where you work
Every IDE. Every terminal.
Claude Code plugin, MCP server for Cursor and Windsurf, or plain CLI — agent-bober meets you where you are.
Claude Code
Plugin
10 slash commands built in. Run /bober-run to kick off the full pipeline without leaving your editor.
/bober-run /bober-plan /bober-sprint /bober-eval /bober-status
Cursor
MCP Server
Connect via MCP server transport. Run npx agent-bober mcp and point Cursor at the stdio endpoint.
npx agent-bober mcp
Windsurf
MCP Server
Same MCP setup as Cursor. One command, full pipeline access from inside Windsurf.
npx agent-bober mcp
Any Terminal
CLI
The original interface. Works in any terminal, no IDE required.
npx agent-bober run "feature"
Get started
Build something
Three commands. That's it.
Initialize your project
npx agent-bober init nextjs
Run the pipeline
/bober-run "Build a landing page with hero, features grid, and contact form"
Watch it ship
✓ Plan created: 5 sprints
✓ Sprint 1/5 passed (iteration 1)
✓ Sprint 2/5 passed (iteration 2)
✓ Sprint 3/5 passed (iteration 1)
✓ Sprint 4/5 passed (iteration 1)
✓ Sprint 5/5 passed (iteration 2)
✓ All sprints complete. Ready to merge.
Alternative
Or use agent-bober as a Claude Code skill — just run /bober-run in any Claude Code session.
MCP Server (Cursor / Windsurf)
npx agent-bober mcp # Start MCP server for Cursor/Windsurf