Autonomous AI Coding Pipeline

Three agents. One feature. Zero hand-holding.

agent-bober is an autonomous coding pipeline that plans, builds, and evaluates your features using three specialized AI agents. Describe what you want. Walk away. Come back to working software.

This website was built by agent-bober in a single autonomous run.

The root cause

The solo agent problem

Ask a single AI to plan, build, and evaluate your code and you get a confident answer, not necessarily a correct one. Three failure modes undermine long autonomous sessions.

Confirmation bias

A single agent never questions its own plan. It writes code and evaluates it in the same context, confirming its own assumptions. The reviewer and the author are the same mind.

No real evaluation

Without a separate evaluator, "testing" means the same agent glancing at its own work. Bugs that seem obvious to fresh eyes slip through. Self-review is not review.

Broken output

Context degrades over long sessions. The agent loses track of decisions made 50 messages ago. The result: code that almost works but doesn't.

Solo agent loop: Plan → Build → Self-evaluate, all in the same context, with the same blind spots.

Anthropic research

“Multi-agent systems consistently outperform solo agents on complex coding tasks.”

— Anthropic's research on agentic AI systems

The colony

Meet the colony

Three agents. Three distinct responsibilities. Each one laser-focused on its role — which is exactly what makes the whole system work.

Planner

The Architect

Claude Opus (configurable)

Sees the whole forest, touches nothing. Read-only tools: read_file, glob, grep. Plans everything, implements nothing.

Generator

The Builder

Claude Sonnet (configurable)

Shapes every log with precision. Full tools: bash, read_file, write_file, edit_file, glob, grep. Implements exactly what the contract says.

Evaluator

The Inspector

Claude Sonnet (configurable)

Tests every joint, never lies. Read + bash only — no write, no edit. Reports pass or fail. Cannot change the code it reviews.
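The tool split above can be enforced in code rather than by prompt instructions alone. Below is a minimal TypeScript sketch, assuming a dispatcher that checks every tool request against a per-role allowlist. The tool names come from the role descriptions above. The `Role` type, `TOOL_ALLOWLIST`, `canUse`, and `dispatch` are hypothetical names, not agent-bober's API. The sketch also assumes the Evaluator's "read" tools are read_file, glob, and grep.

```typescript
// Hypothetical sketch: tool names come from the role descriptions; the types,
// TOOL_ALLOWLIST, canUse, and dispatch are illustrative, not agent-bober's API.
type Tool = "read_file" | "glob" | "grep" | "bash" | "write_file" | "edit_file";
type Role = "planner" | "generator" | "evaluator";

const TOOL_ALLOWLIST: Record<Role, ReadonlySet<Tool>> = {
  // The Architect: read-only exploration, plans but never implements
  planner: new Set<Tool>(["read_file", "glob", "grep"]),
  // The Builder: full toolset
  generator: new Set<Tool>(["bash", "read_file", "write_file", "edit_file", "glob", "grep"]),
  // The Inspector: read tools plus bash to run checks; no write_file or edit_file
  // (assumes "read" covers read_file, glob, and grep)
  evaluator: new Set<Tool>(["read_file", "glob", "grep", "bash"]),
};

// Returns true only if the role's allowlist contains the requested tool.
function canUse(role: Role, tool: Tool): boolean {
  return TOOL_ALLOWLIST[role].has(tool);
}

// Wraps a tool call so that a disallowed request throws instead of running.
function dispatch(role: Role, tool: Tool, run: () => string): string {
  if (!canUse(role, tool)) {
    throw new Error(`${role} is not allowed to use ${tool}`);
  }
  return run();
}
```

Checking permissions at dispatch time means the role boundary holds even when a model asks for a tool outside its role: the request fails before anything executes.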

Core principle

“The separation is the feature.”

When the agent who plans cannot build, and the agent who builds cannot evaluate, every assumption gets challenged. That's not overhead — that's quality.

The pipeline

From idea to working software

Every feature run follows the same pipeline — plan it, build it, evaluate it, ship it. No shortcuts. No surprises.

  1. You: Describe Your Idea

    Tell agent-bober what you want to build. A sentence, a paragraph, or a full spec: it all works.

  2. Claude Opus: Planner Creates the Blueprint

    The Architect analyzes your request, asks clarifying questions, and decomposes it into sprint-sized contracts with explicit success criteria.

  3. Claude Sonnet: Generator Builds Each Sprint

    The Builder receives one sprint contract at a time. Fresh context, clear scope, no accumulated confusion. It implements, self-verifies, and commits.

  4. Claude Sonnet: Evaluator Judges the Output

    The Inspector runs typecheck, lint, build, and tests against the contract's success criteria. Pass or fail: no opinions, just evidence.

  5. Loop (Evaluator → Generator retry): Rework Until It Passes

    Failed? The evaluator's feedback goes back to a fresh Generator. New context, specific fixes. Up to 3 iterations per sprint.

  6. Done: Ship Working Software

    When all sprints pass evaluation, you have a working feature. Tested, typed, linted, and ready to merge.
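The build, evaluate, and rework loop in steps 3 to 5 can be sketched as follows. This is a hypothetical TypeScript outline, not agent-bober's implementation: `SprintContract`, `runSprint`, `runPipeline`, and the agent function types are illustrative names. It assumes each Generator call starts from a fresh context that receives only the sprint contract and the latest Evaluator feedback. The page does not say what happens when a sprint exhausts its 3 iterations, so this sketch simply stops the run there.

```typescript
// Hypothetical sketch of the per-sprint loop; all names are illustrative.
interface SprintContract {
  id: number;
  scope: string;
  successCriteria: string[];
}

interface Verdict {
  passed: boolean;
  feedback: string; // Evaluator's specific failure notes; empty on pass
}

// Each call is assumed to start a fresh agent context: it sees only the
// contract and the latest feedback, never a previous attempt's transcript.
type GeneratorAgent = (contract: SprintContract, feedback: string | null) => void;
type EvaluatorAgent = (contract: SprintContract) => Verdict;

const MAX_ITERATIONS = 3; // "Up to 3 iterations per sprint"

interface SprintResult {
  sprintId: number;
  passed: boolean;
  iterations: number;
}

function runSprint(contract: SprintContract, generate: GeneratorAgent, evaluate: EvaluatorAgent): SprintResult {
  let feedback: string | null = null;
  for (let iteration = 1; iteration <= MAX_ITERATIONS; iteration++) {
    generate(contract, feedback); // Builder implements in a fresh context
    const verdict = evaluate(contract); // Inspector checks the success criteria
    if (verdict.passed) {
      return { sprintId: contract.id, passed: true, iterations: iteration };
    }
    feedback = verdict.feedback; // route specific fixes to the next Builder
  }
  return { sprintId: contract.id, passed: false, iterations: MAX_ITERATIONS };
}

function runPipeline(contracts: SprintContract[], generate: GeneratorAgent, evaluate: EvaluatorAgent): SprintResult[] {
  const results: SprintResult[] = [];
  for (const contract of contracts) {
    const result = runSprint(contract, generate, evaluate);
    results.push(result);
    if (!result.passed) break; // assumption: stop at the first sprint that cannot pass
  }
  return results;
}
```

Passing only the contract and the latest feedback into each Generator call is what models the context reset: nothing from a failed attempt carries over except the Evaluator's specific findings.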

End result

Every commit passes typecheck, lint, build, and tests.

The pipeline doesn't finish until all sprints pass evaluation. There's no “good enough.” A sprint either passes or loops back for rework, up to 3 iterations.

What you get

Everything the colony needs

agent-bober comes batteries-included.

Stack Agnostic

7 presets from Next.js to Solana. Or define your own stack. agent-bober adapts to your tools, not the other way around.

Sprint Contracts

Every sprint has an explicit scope and success criteria. The generator can't wander — the contract is the boundary.

Pluggable Evaluation

Typecheck, lint, build, unit tests, Playwright — mix the strategies you need. Add custom commands for anything else.

Context Resets

Fresh context window every sprint. No accumulated confusion, no degraded output. Each sprint starts clean.

Brownfield Ready

Add features to existing codebases. Deep analysis first, conservative sprints, regression-focused evaluation.

Fully Autonomous

Fully autonomous or human-in-the-loop — you set the permissions. Walk away and come back to working software, or stay involved and approve each step.

Multi-Provider

Anthropic, OpenAI, Google Gemini, or any OpenAI-compatible endpoint — mix and match per agent role. 4 providers, your choice.

MCP Tools

10 MCP tools exposed over stdio transport — read, write, edit, search, bash, and more. Drop agent-bober into any MCP client.

Slash Commands

10 slash commands for Claude Code: /bober-run, /bober-plan, /bober-sprint, and more — the full pipeline at your fingertips.

Your models, your choice

Pick your provider

Mix and match AI providers per agent role. Anthropic by default, everything else when you need it.

Anthropic

Default provider. Recommended for all roles.

claude-opus-4, claude-sonnet-4, claude-haiku-4

OpenAI

Full OpenAI model lineup supported.

gpt-4.1, gpt-4.1-mini, o3, o4-mini

Google Gemini

Google's Gemini models via the Generative AI API.

gemini-2.5-pro, gemini-2.5-flash

OpenAI-Compatible

Any endpoint that speaks the OpenAI API: Ollama, LM Studio, Groq, DeepSeek, and more.

Any model
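Per-role provider selection might look like the configuration below. This is a hypothetical TypeScript shape: the `ModelConfig` type, its field names, and `validateModelConfig` are assumptions for illustration, not agent-bober's documented config format. The model names come from the provider list above; `llama3.1` and the local URL assume an Ollama server on its default port exposing its OpenAI-compatible API.

```typescript
// Hypothetical config shape: field names are illustrative assumptions,
// not agent-bober's documented format.
type Provider = "anthropic" | "openai" | "google" | "openai-compatible";
type Role = "planner" | "generator" | "evaluator";

interface RoleModel {
  provider: Provider;
  model: string;
  baseUrl?: string; // needed only for OpenAI-compatible endpoints
}

type ModelConfig = Record<Role, RoleModel>;

// Mix and match: a different provider for each agent role.
const modelConfig: ModelConfig = {
  planner: { provider: "anthropic", model: "claude-opus-4" },
  generator: { provider: "openai", model: "gpt-4.1" },
  evaluator: {
    provider: "openai-compatible",
    model: "llama3.1", // example local model (assumption)
    baseUrl: "http://localhost:11434/v1", // Ollama's default OpenAI-compatible endpoint
  },
};

const ROLES: Role[] = ["planner", "generator", "evaluator"];

// Returns human-readable problems; an empty array means the config is usable.
function validateModelConfig(cfg: ModelConfig): string[] {
  const problems: string[] = [];
  for (const role of ROLES) {
    const { provider, model, baseUrl } = cfg[role];
    if (model.trim() === "") {
      problems.push(`${role}: model name is empty`);
    }
    if (provider === "openai-compatible" && baseUrl === undefined) {
      problems.push(`${role}: openai-compatible provider needs a baseUrl`);
    }
  }
  return problems;
}
```

Keying the config by role, rather than by provider, keeps each agent's model choice independent, so swapping the Evaluator to a local model does not touch the Planner or Generator settings.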

Live proof

You're looking at the proof

This website was built by agent-bober. Not as a demo — as a real production deployment. Every section you've scrolled through was planned, built, and evaluated by the three-agent pipeline.

103 tests passing
4 AI providers
10 MCP tools
10 slash commands
7 presets
6+ eval strategies

The Architect designed 7 sprint contracts. The Builder implemented each one in a fresh context. The Evaluator rejected flawed outputs and fed specific feedback back into the loop. Every commit you can see in the repository passed typecheck, lint, build, and tests.

View the source code — every commit from the run

Works where you work

Every IDE. Every terminal.

Claude Code plugin, MCP server for Cursor and Windsurf, or plain CLI — agent-bober meets you where you are.

Claude Code (Plugin)

10 slash commands built in. Run /bober-run to kick off the full pipeline without leaving your editor.

/bober-run, /bober-plan, /bober-sprint, /bober-eval, /bober-status

Cursor (MCP Server)

Connect over MCP's stdio transport: register npx agent-bober mcp as a server command in Cursor's MCP settings, and Cursor launches it for you.

npx agent-bober mcp

Windsurf (MCP Server)

Same MCP setup as Cursor. One command, full pipeline access from inside Windsurf.

npx agent-bober mcp

Any Terminal (CLI)

The original interface. Works in any terminal, no IDE required.

npx agent-bober run "feature"

Get started

Build something

Three steps. That's it.

Initialize your project

npx agent-bober init nextjs

Run the pipeline

/bober-run "Build a landing page with hero, features grid, and contact form"

Watch it ship

✓ Plan created: 5 sprints
✓ Sprint 1/5 passed (iteration 1)
✓ Sprint 2/5 passed (iteration 2)
✓ Sprint 3/5 passed (iteration 1)
✓ Sprint 4/5 passed (iteration 1)
✓ Sprint 5/5 passed (iteration 2)
✓ All sprints complete. Ready to merge.

Alternative

Or use agent-bober as a Claude Code skill — just run /bober-run in any Claude Code session.

MCP Server (Cursor / Windsurf)

npx agent-bober mcp  # Start MCP server for Cursor/Windsurf