Spec-Driven Development

Spec-driven development with coding agents

Five control-system designs for turning specs into incremental code changes, each with a different answer to:

  • Where does truth live? (PRD/story files, a task list, a test list, plan files, a merge queue)
  • How do we keep agents “smart”? (fresh contexts, constrained tasks, checklists, backpressure)
  • How do we ship safely? (QA gates, tests, sandboxing, merge orchestration)

Below is a concise playbook for each approach + a decision framework.


A shared mental model: Spec Plane vs Execution Plane

Think of the system as two planes:

  • Spec Plane (slow, human-aligned): PRD/architecture/stories/specs/plan files become the “API” for the agent(s).
  • Execution Plane (fast, machine): loops that pick one unit of work, implement, validate, commit/merge, and update state.

flowchart LR
  subgraph SpecPlane[Spec Plane: source of truth]
    S1[PRD / Specs] --> S2[Architecture / Design]
    S2 --> S3[Plan: Stories / Tasks / Test List]
  end

  subgraph ExecPlane[Execution Plane: loops]
    L1[Select 1 unit of work] --> L2[Implement]
    L2 --> L3[Validate: tests/lint/build]
    L3 --> L4[Commit/MR + Update state]
    L4 --> L1
  end

  S3 --> L1
  L4 --> S3

The five approaches differ mainly in what S3 is (stories vs a task list vs a test list vs plan files vs a merge queue) and how heavily they instrument the Execution Plane.
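
To make this concrete, here is a minimal Python sketch of the shared Execution Plane loop, parameterized by whatever S3 happens to be. Everything here (the StateStore shape, the callables) is an illustrative assumption, not any framework's actual API:

from dataclasses import dataclass, field

@dataclass
class StateStore:
    # Stand-in for S3: story shards, a task list, a test list, or an MR queue.
    open_units: list = field(default_factory=list)
    done_units: list = field(default_factory=list)

    def select_next(self):
        return self.open_units[0] if self.open_units else None

    def mark_done(self, unit):
        self.open_units.remove(unit)
        self.done_units.append(unit)

def run_execution_plane(store, implement, validate, commit, max_attempts=3):
    attempts = {}
    while (unit := store.select_next()) is not None:
        change = implement(unit)               # agent produces a candidate diff
        if validate(change):                   # tests/lint/build as backpressure
            commit(change)
            store.mark_done(unit)              # truth lives in S3, not chat memory
        else:
            attempts[unit] = attempts.get(unit, 0) + 1
            if attempts[unit] >= max_attempts:
                store.open_units.remove(unit)  # park for human triage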


1) BMAD: “SDLC as a prompt framework” (roles + workflows + artifacts)

How it works

BMAD is a natural-language framework: agents are personas with explicit dependencies (tasks/templates/checklists/data), and workflows prescribe a planning phase followed by an execution cycle.

Key moves:

  1. Planning (often in Web UI):

    • Analyst/PM/UX/Architect/QA/PO collaborate to produce PRD + architecture (and optionally QA strategy).
    • The PO (Product Owner) runs a master checklist and ensures alignment.
  2. Shard the plan (critical):

    • PO shards PRD + architecture into epics/stories so each dev iteration is small and explicit (see the sharding sketch after this list).
  3. Core dev cycle in IDE (Scrum loop):

    • The SM (Scrum Master) drafts the next story with very specific context.
    • Dev implements sequential tasks + tests.
    • Optional QA mid-checks + final gate; then commit; mark story done; repeat.
  4. Architectural core concept: “Dependencies + Bundles”

    • The “brain” lives in bmad-core/ (agents, tasks, templates, checklists, KB).
    • Tooling can bundle teams into dist/ for web UIs vs lean agent files for IDEs.
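
Sharding (step 2 above) is mechanical enough to sketch. A hedged illustration, assuming a file layout that is not BMAD's actual tooling: split each epic into one small story file so a dev iteration consumes exactly one unit:

from pathlib import Path

def shard_prd(epics, out_dir="docs/stories"):
    # One file per story: a small, explicit unit for the dev agent to consume.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for epic, stories in epics.items():
        for i, title in enumerate(stories, start=1):
            slug = title.lower().replace(" ", "-")
            (Path(out_dir) / f"{epic}.{i}.{slug}.md").write_text(
                f"# Story: {title}\n\n## Context\n- Epic: {epic}\n"
            )

shard_prd({"pricing": ["Add coupon validation", "Apply promo at checkout"]})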

Why it’s spec-driven

  • Specs are first-class artifacts (PRD/architecture/stories) and the workflow enforces that execution pulls from sharded specs, not ad-hoc chat state.
  • Templates embed generation instructions while keeping output clean (separation of concerns).

BMAD’s core heuristics (the “rules that make it work”)

  • Keep dev agents lean (save context for code; planning agents can be heavier).
  • Prefer many small tasks over branching mega-prompts (less context entropy).
  • Use checklists/gates as the safety rail, not “trust the model.”

BMAD “control loop” diagram (condensed)

flowchart TD
  P[Planning: PRD + Architecture] --> SH[PO: Shard into Epics/Stories]
  SH --> S1[SM: Draft next Story w/ exact context]
  S1 --> D1[Dev: Implement + Tests]
  D1 --> Q{QA Gate?}
  Q -->|Optional| QA[QA: Risk/Trace/NFR checks]
  QA --> D1
  Q -->|Pass| C[Commit + Mark Story Done]
  C --> S1

Example: minimal story “spec unit” (what Dev consumes)

# Story: Add Checkout Coupon Validation

## Context
- Related Epic: Pricing & Promotions
- Relevant modules: src/pricing/, src/checkout/

## Acceptance Criteria
- Invalid coupons rejected with code COUPON_INVALID
- Expired coupons rejected with COUPON_EXPIRED
- Tests: unit + one e2e

## Tasks (ordered)
1) Add validator in src/pricing/coupons.ts
2) Wire into checkout apply flow
3) Add unit tests
4) Add e2e scenario

Best fit: teams that want SDLC structure + human-in-loop controls, especially for non-trivial systems and multi-stakeholder planning.


2) Claude Code Autonomous Agents: “Long-running harness + persistent progress”

How it works

This approach is less about SDLC roles and more about agent-runtime engineering.

The demo harness uses a two-agent pattern:

  1. Initializer Agent (Session 1):

    • Reads app_spec.txt
    • Generates a big feature_list.json (e.g., 200 test cases)
    • Sets up project structure + git
  2. Coding Agent (Sessions 2+):

    • Picks features one-by-one
    • Implements, marks passing in feature_list.json
    • Commits progress
    • Each session starts with a fresh context window; progress lives in files + git

The harness also implements an explicit defense-in-depth security model (sandboxing + filesystem restrictions + a bash allowlist).

The Python entry point is a thin wrapper around repeated agent sessions; you resume by simply rerunning the command.
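
A hedged sketch of what such an entry point can look like; the agent-cli command, prompt file, and exit-code convention are placeholders, not the demo's actual interface:

import subprocess

MAX_SESSIONS = 50  # safety budget; resume later by rerunning this script

for _ in range(MAX_SESSIONS):
    # Each session starts with a fresh context window; the agent re-orients
    # from feature_list.json and git history, not from prior conversation.
    code = subprocess.run(["agent-cli", "--prompt", "coding_prompt.md"]).returncode
    if code == 0:  # assumption: exit 0 signals "all features passing"
        break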

Why it’s spec-driven

The spec becomes a test inventory (feature_list.json) and that inventory is the truth:

  • the agent doesn’t “remember” what’s done; the file says what’s done
  • validation is built in: a feature isn’t done until it passes tests and is recorded (sketched below)
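
Concretely, the state-file operations might look like this (the schema is a guessed shape, not the demo's exact format):

import json

def pick_next(path="feature_list.json"):
    with open(path) as fh:
        features = json.load(fh)
    # The file, not the agent's memory, decides what is still open.
    return next((f for f in features if not f.get("passing")), None)

def mark_passing(feature_id, path="feature_list.json"):
    with open(path) as fh:
        features = json.load(fh)
    for feature in features:
        if feature["id"] == feature_id:
            feature["passing"] = True  # set only after tests actually pass
    with open(path, "w") as fh:
        json.dump(features, fh, indent=2)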

Harness mental model: “Agent is a function; harness is the product”

Anthropic’s long-running harness guidance emphasizes that most reliability comes from the wrapper:

  • checkpointing/progress files
  • resumability
  • tool safety boundaries
  • clear stop/resume semantics
  • guardrails around bash/file operations

Sequence diagram

sequenceDiagram
  participant H as Harness
  participant I as Initializer Agent
  participant C as Coding Agent
  participant FS as Repo Files (spec/state)
  participant T as Tests

  H->>I: Session 1 (fresh context) + app_spec.txt
  I->>FS: Write feature_list.json + scaffold + git init
  H->>C: Session 2+ (fresh context) + read state files
  loop each iteration
    C->>FS: Select next failing feature
    C->>FS: Implement code changes
    C->>T: Run tests/build
    alt pass
      C->>FS: Mark feature passing + commit
    else fail
      C->>FS: Fix + retry (bounded)
    end
  end

Mini harness loop (pseudo)

MAX_RETRIES = 3  # bound retries so one stuck feature cannot stall the whole run

while not done(feature_list):
    task = pick_next(feature_list)
    for attempt in range(MAX_RETRIES):
        apply_patch(task)        # agent implements the change
        if run_tests():          # backpressure: tests decide "done"
            mark_done(task)      # record completion in the state file
            git_commit()
            break
        triage_failure()         # inspect the failure before patching again

Best fit: when you want a mostly autonomous “app builder” that can run for hours, with strong emphasis on resumability + safety boundaries + file-based progress.


3) Ralph Loop: “Fresh context every iteration + file-based memory”

How it works (core loop)

Ralph is a deliberately simple continuous loop:

read plan → pick task → implement → test → commit → clear context → repeat

Key files (the “state store”):

  • specs/* (requirements)
  • IMPLEMENTATION_PLAN.md (prioritized task list)
  • AGENTS.md (operational learnings / how to run)
  • PROMPT.md (instructions)

Two modes use the same mechanism:

  • PLANNING: gap analysis (specs vs code) → update plan only (often 1–2 iterations)
  • BUILDING: implement from plan → validate (backpressure) → update plan → commit

The BUILDING loop lifecycle is explicitly enumerated (Orient → Read plan → Select → Investigate → Implement → Validate → Update plan/agents → Commit).

Why it works (the “Ralph physics”)

  • Fresh context keeps the model in its “smart zone” (less drift).
  • File-based memory persists learnings without bloating context.
  • Backpressure (tests/build) forces self-correction.
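
One iteration's contract, in the same pseudo style as the harness loop above (every helper is a hypothetical stand-in for work the agent does):

def ralph_iteration():
    plan = read("IMPLEMENTATION_PLAN.md")      # re-orient from files, not memory
    task = first_unchecked(plan)               # exactly one unit of work
    result = implement_and_test(task)
    if result.passed:
        check_off(plan, task)
        append("AGENTS.md", result.learnings)  # persist discoveries without context bloat
        git_commit(task)
    # the process exits here; relaunching it IS the "clear context" step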

Ralph loop diagram

flowchart TD
  O[Orient: read specs/*] --> RP[Read IMPLEMENTATION_PLAN.md]
  RP --> S[Select next task]
  S --> I[Investigate codebase]
  I --> IMPL[Implement]
  IMPL --> V[Validate: tests/build]
  V --> UP[Update plan + AGENTS.md if needed]
  UP --> C[Commit]
  C --> O

Minimal IMPLEMENTATION_PLAN.md pattern

# Implementation Plan

## Now (highest priority)
- [ ] Add coupon validator + unit tests
- [ ] Wire validator into checkout
- [ ] Add e2e scenario

## Later
- [ ] Add admin UI for coupons
- [ ] Add metrics + alerts

## Notes / Discoveries
- checkout flow uses applyPromotion() in src/checkout/apply.ts

Best fit: solo devs / small teams who want maximum throughput with minimal orchestration, and who are disciplined about tests + commits + keeping the plan accurate.


4) Gas Town: "Multi-agent swarm + merge queue + patrol automation"

How it works

Gas Town is an orchestrator for many Claude Code instances—meant for stage-7/8 "I can manage 10–30 agents" workflows.

Core concept: scale the Execution Plane by adding roles and merge mechanics:

  • Town (HQ) + Rigs (projects): rigs are repos under management.
  • Mayor: the main interface/concierge.
  • Polecats: ephemeral swarm workers that generate MRs.
  • Refinery: resolves the merge queue problem and merges changes safely to main, one at a time.
  • Witness: watches polecats and unblocks them.
  • Deacon + Dogs (+ Boot): a patrol loop that keeps the system running (maintenance, plugins, nudges).

Why it's spec-driven

In Gas Town, "spec" often becomes work packets that can be fanned out to a swarm:

  • the overseer (you) supplies features/constraints
  • the system compiles them into parallel MRs
  • the merge queue + refinery enforce convergence to a coherent main branch
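
The Refinery's role is easy to state in the same pseudo style (helper names are illustrative, not Gas Town's API):

from collections import deque

def refinery(merge_queue: deque):
    # The convergence point: one MR at a time, always against the latest main.
    while merge_queue:
        mr = merge_queue.popleft()
        rebase_onto_main(mr)          # hypothetical helper
        if run_tests(mr):
            merge_to_main(mr)
        else:
            return_to_polecat(mr)     # the Witness nudges the worker to fix it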

Gas Town architecture (condensed)

flowchart LR
  O["Overseer (Human)"] --> M["Mayor (front desk)"]
  M --> SW["Swarm: Polecats (MRs)"]
  SW --> MQ[Merge Queue]
  MQ --> R["Refinery (rebases/merges safely)"]
  R --> MAIN[(main)]
  W[Witness] --> SW
  D[Deacon patrol loop] --> W
  D --> DG[Dogs: maintenance/plugins]

Best fit: high-parallelism work (many independent tasks), where merge conflicts + coordination become the bottleneck and you're ready to invest in orchestrator complexity.


5) Oh My OpenCode Sisyphus: "Three-layer orchestration + category-based delegation + wisdom accumulation"

How it works

Oh My OpenCode Sisyphus is a plugin for OpenCode that transforms a single AI agent into a coordinated development team through intelligent orchestration. Named after the Greek figure condemned to eternally roll a boulder uphill, it embodies relentless, continuous work until completion.

Core architecture: three-layer separation of planning, orchestration, and execution:

  1. Planning Layer (Prometheus + Metis + Momus):

    • Prometheus: Strategic planner in interview mode, researches codebase while collecting requirements
    • Metis: Pre-planning consultant that catches hidden intentions, ambiguities, and over-engineering patterns
    • Momus: Plan reviewer for high-accuracy mode (rejects plans until 100% compliant)
    • Output: detailed work plan in .sisyphus/plans/{name}.md
  2. Orchestration Layer (Atlas):

    • Atlas: The conductor - delegates tasks, accumulates wisdom, verifies results (never trusts subagent claims)
    • CANNOT write code - MUST delegate to specialized agents
    • Manages parallel execution when tasks are independent
    • Passes accumulated learnings to all subsequent tasks
  3. Worker Layer (Specialized Agents):

    • Sisyphus-Junior: Focused task executor (Sonnet 4.5), cannot delegate, must pass diagnostics
    • Oracle: Architecture/debugging (GPT-5.2), read-only
    • Librarian: Documentation/OSS research (GLM-4.7), multi-repo analysis
    • Explore: Fast codebase grep (Grok Code / Gemini Flash / Haiku)
    • Frontend: UI/UX specialist (Gemini 3 Pro), designer-turned-developer

Two modes of operation:

  • Ultrawork mode (ulw): "Just do it" - agent figures everything out autonomously
  • Prometheus mode (Tab → /start-work): Interview → plan → orchestrated execution with full verification

Why it's spec-driven

Specs are plan files + accumulated wisdom:

  • Plans live in .sisyphus/plans/*.md with tasks, acceptance criteria, constraints
  • Wisdom accumulates in .sisyphus/notepads/{plan-name}/ (learnings, decisions, issues, verification)
  • boulder.json tracks progress across sessions (resume anytime)
  • Todo state externalized in files, enforced by system reminders ("boulder pushing")

The plan is the contract; Atlas orchestrates execution; wisdom ensures consistency.
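
Resumability hinges on boulder.json. A minimal sketch of the resume check, assuming a plausible (not documented) schema:

import json
from pathlib import Path

def resume_or_start(path=".sisyphus/boulder.json"):
    # If a boulder exists, keep pushing it; otherwise begin with planning.
    p = Path(path)
    if p.exists():
        state = json.loads(p.read_text())
        return state["plan"], state.get("session_id")  # assumed field names
    return None, None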

Sisyphus "control loop" diagram

flowchart TB
    U[User: Describe work] --> P{Mode?}
    P -->|ultrawork/ulw| A[Auto-explore + plan + execute]
    P -->|Tab key| PROM[Prometheus: Interview mode]

    PROM --> I[Research codebase + ask questions]
    I --> MET[Metis: Gap analysis]
    MET --> PF[Generate plan file]
    PF --> MOM{High accuracy?}
    MOM -->|Yes| REV[Momus: Validate plan]
    REV -->|REJECT| PF
    REV -->|OKAY| SW["/start-work"]
    MOM -->|No| SW

    SW --> ATL[Atlas: Read plan + build task map]
    ATL --> DEL[Delegate to specialized agents]
    DEL --> JUN[Junior: Execute task]
    JUN --> VER[Atlas: Verify independently]
    VER -->|Fail| DEL
    VER -->|Pass| WIS[Accumulate wisdom]
    WIS --> NT{More tasks?}
    NT -->|Yes| DEL
    NT -->|No| DONE[Final report + completion]

    A --> ATL

Key mechanisms (what makes it unique)

  1. Category-based delegation (not model names):

    • visual-engineering → Gemini 3 Pro (frontend/UI)
    • ultrabrain → GPT-5.2 (complex reasoning)
    • quick → Haiku (trivial tasks)
    • Eliminates distributional bias from model names (see the sketch after this list)
  2. Todo Continuation Enforcer:

    • System reminder forces agents to complete ALL todos before responding
    • The "boulder pushing" mechanism - prevents premature "done" claims
  3. Wisdom accumulation:

    • Conventions, successes, failures, gotchas, commands extracted after each task
    • Passed forward to ALL subsequent agents
    • Prevents repeated mistakes, ensures consistency
  4. Background agents:

    • Run multiple agents in parallel (e.g., GPT debugging while Claude implements)
    • Configurable concurrency limits per provider/model
  5. Context injection:

    • Hierarchical AGENTS.md (directory-specific context)
    • Conditional rules from .claude/rules/ (glob-based)
    • LSP + AST-grep tools for refactoring
  6. Built-in Ralph Loop:

    • /ralph-loop command for continuous development until <promise>DONE</promise>
    • Integrates Ralph's fresh-context philosophy with orchestration
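
A compact sketch of mechanisms 1 and 3 together: category-based delegation plus wisdom passing. The category-to-model mapping and both helpers are illustrative assumptions:

CATEGORIES = {
    "visual-engineering": "frontend-specialist-model",  # placeholder targets
    "ultrabrain": "deep-reasoning-model",
    "quick": "small-fast-model",
}

def run_task(task, category, wisdom):
    # Subagents are addressed by capability, never by model name, and every
    # task receives the learnings accumulated by all earlier tasks.
    model = CATEGORIES[category]
    result = run_subagent(model, prompt="\n".join(wisdom) + "\n\n" + task)  # hypothetical
    return wisdom + extract_learnings(result)                               # hypothetical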

Example: minimal plan file (what Atlas consumes)

# Plan: Add Checkout Coupon Validation

## Intent: New Feature

## Background
Users need coupon validation at checkout. Current flow has no validation.

## Tasks

- [ ] **Task 1**: Add validator in src/pricing/coupons.ts
  - **Expected Outcome**: CouponValidator class with validate() method
  - **Acceptance Criteria**: Rejects invalid/expired coupons with correct error codes
  - **Reference**: See src/pricing/discounts.ts:42 for similar pattern

- [ ] **Task 2**: Wire validator into checkout flow
  - **Dependencies**: Task 1
  - **Expected Outcome**: Checkout calls validator before applying coupon
  - **Files to Modify**: src/checkout/apply.ts

- [ ] **Task 3**: Add unit tests
  - **Expected Outcome**: 95%+ coverage for CouponValidator
  - **Test Cases**: Invalid coupon, expired coupon, valid coupon

- [ ] **Task 4**: Add e2e scenario
  - **Expected Outcome**: E2E test for full checkout flow with coupons
  - **Reference**: tests/e2e/checkout.spec.ts:100 for pattern

## Constraints
- MUST NOT modify existing discount logic
- MUST preserve current checkout UX
- MUST handle edge cases: null coupons, malformed codes

## Verification
- All unit tests pass
- E2E test passes
- lsp_diagnostics shows zero errors
- No regressions in existing checkout tests

Best fit: teams that want maximum autonomy + multi-model orchestration + quality-critical output, and that will spend a larger token budget for production-ready code indistinguishable from human work. Ideal for multi-session projects with interruptions.


Differences & trade-offs (the practical comparison)

What "spec" is in each approach

  • BMAD: PRD/architecture → sharded epics/stories (SDLC artifacts)
  • Claude Autonomous Harness: app_spec.txt → feature_list.json (test inventory)
  • Ralph: specs/* → IMPLEMENTATION_PLAN.md (task inventory)
  • Gas Town: features → parallel MRs + merge-queue convergence
  • Sisyphus: .sisyphus/plans/*.md (detailed work plans) + .sisyphus/notepads/ (accumulated wisdom)

Autonomy vs Control (rule of thumb)

  • Most controlled: BMAD (human checkpoints + roles + gates)
  • Balanced control: Ralph (simple loop, you control the plan and accept "smart zone" constraints)
  • Balanced autonomy: Sisyphus (interview mode for planning, orchestrated execution with verification)
  • Most autonomous single-thread: Claude harness (long-running agent with strong state + safety wrapper)
  • Most autonomous at scale: Gas Town (swarm + patrol + merge machinery)

Complexity budget

  • BMAD: medium (framework + workflow discipline)
  • Ralph: low (a loop + a few files)
  • Sisyphus: medium–high (multi-agent orchestration, category system, wisdom accumulation)
  • Claude harness: medium–high (proper sandboxing, tool allowlists, resumability engineering)
  • Gas Town: very high (orchestrator, many roles, merge queue, maintenance loops)

Unique strengths

  • BMAD: SDLC structure + stakeholder alignment
  • Ralph: Maximum simplicity + fresh context every iteration
  • Sisyphus: Multi-model orchestration + quality-critical output + wisdom accumulation
  • Claude harness: Long-running autonomy + strong safety boundaries
  • Gas Town: Massive parallelism + merge conflict resolution

Failure modes (and the built-in antidotes)

  • Context drift / hallucinated completion

    • Ralph/Claude: fresh context + file state + tests/backpressure
    • BMAD: sharded stories + QA/PO gates
    • Sisyphus: todo continuation enforcer + wisdom accumulation + independent verification by Atlas
  • Merge conflicts / mainline instability

    • Gas Town: Refinery + MQ
    • Others: sequential work, smaller diffs
    • Sisyphus: parallel execution when tasks are independent, dependency tracking in plans
  • Unsafe tool usage

    • Claude harness: sandbox + allowlist
    • BMAD/Ralph: usually rely on IDE tool permissions; you can copy the allowlist idea
    • Sisyphus: permission system per agent (edit/bash/webfetch allow/deny/ask)
  • Agent stops halfway / claims "done" prematurely

    • Ralph: fresh context forces re-evaluation
    • BMAD: QA/PO gates catch incomplete work
    • Sisyphus: todo continuation enforcer (system reminder blocks response until all todos complete)
    • Claude: feature_list.json tracks what's actually passing tests
  • Model selection bias / suboptimal performance

    • Most: hardcode model names (creates distributional bias)
    • Sisyphus: category-based delegation ("ultrabrain" not "GPT-5.2") eliminates model self-awareness

When to choose which (decision framework)

Use these questions to guide your choice:

Q1) Is your bottleneck "figuring out what to build" or "shipping lots of code changes"?

  • Figuring out what to build / aligning stakeholders → BMAD
  • Shipping code changes fast → go to Q2

Q2) What's your code quality requirement vs autonomy preference?

  • Production-critical + want indistinguishable-from-human quality → Sisyphus

    • Especially if you need multi-model orchestration (different expertise for different tasks)
    • Acceptable higher token usage for quality
    • Multi-session projects with interruptions
  • Good quality + maximum autonomy + willing to engineer harness → go to Q3

Q3) Is your work mostly sequential or massively parallel?

  • Sequential, one change at a time → Ralph or Claude harness

    • If you want simplicity + tight loop discipline + fresh context → Ralph
    • If you want autonomy + long runs + strong safety wrapper → Claude harness
  • Moderate parallelism (2-5 streams) + need model diversity → Sisyphus

    • Background agents for research while main agent implements
    • Category-based delegation to optimal models per task type
  • Massively parallel (10+ streams) + merge becomes the bottleneck → Gas Town

Quick Decision Matrix

| Criterion | BMAD | Ralph | Sisyphus | Claude Harness | Gas Town |
|---|---|---|---|---|---|
| Stakeholder alignment needs | ✅ Strong | ⚠️ Weak | ⚠️ Weak | ⚠️ Weak | ⚠️ Weak |
| Code quality requirements | ⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐ |
| Autonomy level | Low | Medium | High | Very High | Very High |
| Setup complexity | Medium | Low | Medium-High | Medium-High | Very High |
| Token efficiency | Medium | High | Medium-Low | Medium | Low |
| Multi-model orchestration | — | — | ✅ Strong | — | ⚠️ Possible |
| Resume after interruption | Manual | Manual | ✅ Automatic | ✅ Automatic | ✅ Automatic |
| Parallel execution | Manual | No | ✅ Smart | No | ✅✅ Massive |
| Learning curve | Medium | Low | Medium-High | High | Very High |
| Best for | Requirements phase | Solo dev speed | Production quality | Long automation | Scale |

Common patterns across all 5 (the "portable best practices")

  1. Externalize state (never trust chat memory)

    • plan files, feature/test lists, story shards, MR queues, wisdom/learnings
    • BMAD: epics/stories
    • Ralph: IMPLEMENTATION_PLAN.md
    • Sisyphus: .sisyphus/plans/*.md + notepads/
    • Claude harness: feature_list.json
    • Gas Town: MR queue
  2. Make the unit of work small

    • BMAD: sharded stories
    • Ralph: one plan item
    • Sisyphus: one task from plan with clear acceptance criteria
    • Claude harness: one feature/test
    • Gas Town: one MR per polecat
  3. Backpressure is non-negotiable

    • tests/build/lint are your “physics engine” that keeps agents honest (see the sketch after this list)
    • Sisyphus adds: lsp_diagnostics + independent verification by orchestrator
  4. Commit/merge is part of the loop

    • "Done" must mean: validated + recorded in state + committed
    • Sisyphus: todo continuation enforcer blocks "done" claims until all todos complete
  5. Separate planning from building

    • BMAD: planning workflow vs dev cycle
    • Ralph: PLANNING prompt vs BUILDING prompt
    • Sisyphus: strongest separation - Prometheus (planner, READ-ONLY) vs Atlas (orchestrator) vs Junior (executor)
    • Claude harness: Initializer vs Coding agent
    • Gas Town: implicit in work packet distribution
  6. Treat the harness/orchestrator as production software

    • security boundaries, retries, resumability, observability
    • Sisyphus adds: permission system per agent, background task concurrency limits, session recovery hooks
  7. Accumulate and propagate learnings (Sisyphus innovation)

    • After each task: extract conventions, successes, failures, gotchas
    • Pass wisdom forward to ALL subsequent tasks
    • Prevents repeated mistakes, ensures consistency
    • Portable to other frameworks via notepad/learnings files
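
As promised in pattern 3, a minimal backpressure gate; the commands are examples, so substitute your stack's test/lint/build invocations:

import subprocess

def backpressure_passes() -> bool:
    # "Done" is only a claim until these commands agree.
    checks = [
        ["npm", "test"],
        ["npm", "run", "lint"],
        ["npm", "run", "build"],
    ]
    return all(subprocess.run(cmd).returncode == 0 for cmd in checks)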

A practical "starter blueprint" you can adapt (works for any of the 5)

Universal Structure (works with all approaches)

/specs/                       # requirements (Ralph-style) or PRD shards (BMAD-style)
/docs/                        # PRD, architecture, decision logs
/plan/IMPLEMENTATION_PLAN.md  # task list (Ralph-style)
/plan/feature_list.json       # test inventory (Claude harness-style)
AGENTS.md                     # operating guide + learnings (all approaches)
PROMPT.md                     # system instructions (planning/building)
/scripts/loop.sh              # re-run harness

Sisyphus-Specific Structure

.sisyphus/
├── drafts/                  # Interview conversation drafts (Prometheus)
├── plans/                   # Generated work plans
│   └── {plan-name}.md       # Detailed task list with acceptance criteria
├── notepads/                # Accumulated wisdom per plan
│   └── {plan-name}/
│       ├── learnings.md     # Conventions, patterns, successful approaches
│       ├── decisions.md     # Architectural choices and rationales
│       ├── issues.md        # Problems, blockers, gotchas encountered
│       ├── verification.md  # Test results, validation outcomes
│       └── problems.md      # Unresolved issues, technical debt
└── boulder.json             # Current plan + session ID for resume

.opencode/
└── oh-my-opencode.json      # Configuration: agents, categories, hooks

.claude/
├── rules/                   # Conditional rules (glob-based injection)
├── skills/                  # Custom skills
└── settings.json            # Claude Code compatibility hooks

Minimal loop script examples

Ralph-style (fresh context):

#!/usr/bin/env bash
set -euo pipefail
# 1) Start agent with PROMPT.md + AGENTS.md + specs + plan
# 2) Agent implements ONE task, runs tests, updates plan, commits
# 3) Script exits; rerun for next iteration (fresh context)

Sisyphus-style (orchestrated):

#!/usr/bin/env bash
set -euo pipefail
# Option 1: Ultrawork mode (auto-plan + execute)
# opencode "ulw implement feature X"

# Option 2: Prometheus planning + orchestrated execution
# Step 1: opencode "@plan implement feature X" (interview mode)
# Step 2: opencode "/start-work" (Atlas orchestrates)
# Resumes automatically via boulder.json if interrupted