Oh My OpenCode Sisyphus: Multi-Agent Orchestration Framework
Overview
Oh My OpenCode Sisyphus is a sophisticated plugin for OpenCode that transforms a single AI agent into a coordinated development team through intelligent orchestration, specialized agents, and separation of planning from execution. Named after the Greek mythological figure condemned to roll a boulder uphill eternally, Sisyphus embodies the relentless, continuous work ethic that AI agents must maintain to complete tasks.
Core Philosophy: Human intervention during agentic work is fundamentally a failure signal. The system should complete work autonomously, producing code indistinguishable from that written by senior engineers.
How It Works
Architecture: Three-Layer Orchestration
Oh My OpenCode implements a three-layer architecture that separates planning, orchestration, and execution:
flowchart TB
subgraph Planning["Planning Layer"]
User[("User")]
Prometheus["Prometheus (Planner)<br/>Claude Opus 4.5"]
Metis["Metis (Consultant)<br/>Claude Opus 4.5"]
Momus["Momus (Reviewer)<br/>GPT-5.2"]
end
subgraph Execution["Orchestration Layer"]
Atlas["Atlas (Conductor)<br/>Claude Opus 4.5"]
end
subgraph Workers["Worker Layer"]
Junior["Sisyphus-Junior<br/>Claude Sonnet 4.5"]
Oracle["Oracle (Architecture)<br/>GPT-5.2"]
Explore["Explore (Codebase)<br/>Grok Code"]
Librarian["Librarian (Docs)<br/>GLM-4.7"]
Frontend["Frontend (UI/UX)<br/>Gemini 3 Pro"]
end
User --> Prometheus
Prometheus --> Metis
Prometheus --> Plan[".sisyphus/plans/*.md"]
Plan --> Momus
Plan --> Atlas
Atlas --> Junior
Atlas --> Oracle
Atlas --> Explore
Atlas --> Librarian
Atlas --> Frontend
1. Planning Layer: Prometheus + Metis + Momus
Prometheus (The Strategic Planner)
- Role: Acts as an intelligent interviewer and strategic consultant
- Model: Claude Opus 4.5
- Constraint: READ-ONLY mode - can only create/modify markdown files in
.sisyphus/directory - Process:
- Interview Mode: Instead of immediately planning, collects context through conversation
- Intent Classification: Identifies whether work is refactoring, new feature, debugging, etc.
- Context Collection: Investigates codebase via
exploreandlibrarianagents - Draft Creation: Continuously records discussion in
.sisyphus/drafts/ - Plan Generation: Creates detailed work plan in
.sisyphus/plans/{name}.md
Metis (The Plan Consultant)
- Role: Pre-planning analysis and gap detection
- Function: Catches what Prometheus missed - hidden intentions, ambiguities, over-engineering patterns
- Purpose: Externalizes implicit knowledge that Prometheus might assume but not document
Momus (The Plan Reviewer)
- Role: High-precision plan validation (activated in "high accuracy" mode)
- Validation Criteria:
- Clarity: Does each task specify WHERE to find implementation details?
- Verification: Are acceptance criteria concrete and measurable?
- Context: Sufficient context to proceed without >10% guesswork?
- Big Picture: Clear purpose, background, and workflow?
- Behavior: REJECTS plans until 100% compliant - no retry limit
2. Orchestration Layer: Atlas
Atlas (The Conductor)
- Role: Coordinates work execution without doing implementation itself
- Model: Claude Opus 4.5 with Extended Thinking (32k budget)
- Responsibilities:
- Read plans and build task dependency maps
- Delegate tasks to specialized agents
- Accumulate wisdom across tasks
- Verify results independently (never trust subagent claims)
- Manage parallel execution when tasks are independent
What Atlas CAN do:
- ✅ Read files to understand context
- ✅ Run commands to verify results
- ✅ Use
lsp_diagnosticsto check for errors - ✅ Search patterns with grep/glob/ast-grep
What Atlas MUST delegate:
- ❌ Writing/editing code files
- ❌ Fixing bugs
- ❌ Creating tests
- ❌ Git commits
Wisdom Accumulation System: After each task, Atlas extracts learnings and categorizes them into:
- Conventions: Code patterns, naming schemes, architectural styles
- Successes: What worked well, efficient approaches
- Failures: What failed, anti-patterns encountered
- Gotchas: Edge cases, tricky dependencies
- Commands: Useful bash commands, test procedures
These learnings are passed to ALL subsequent tasks, preventing repeated mistakes.
3. Worker Layer: Specialized Agents
Sisyphus-Junior (The Workhorse)
- Model: Claude Sonnet 4.5
- Role: Focused task execution
- Constraints:
- Cannot delegate (blocked from task/delegate_task tools)
- Cannot modify plan files (READ-ONLY)
- Must pass
lsp_diagnosticsbefore completion - Obsessive todo tracking enforced by system
Oracle (Architecture/Debugging)
- Model: GPT-5.2
- Role: Strategic reasoning, complex architecture decisions, debugging
- Permission: Read-only consultation
Librarian (Documentation/OSS)
- Model: GLM-4.7 Free
- Role: Multi-repo analysis, official documentation lookup, OSS implementation examples
- Tools: Context7 MCP (docs), grep.app MCP (GitHub search)
Explore (Fast Codebase Grep)
- Model: Grok Code / Gemini 3 Flash / Haiku (context-dependent)
- Role: Blazing fast codebase exploration and contextual grep
- Permission: Read-only
Frontend (UI/UX Engineer)
- Model: Gemini 3 Pro
- Role: Frontend development with strong design aesthetic
- Philosophy: Designer-turned-developer - crafts stunning UI/UX without mockups
Two Modes of Operation
Mode 1: Ultrawork ("Just Do It" Mode)
Usage: Include ultrawork or ulw in your prompt
ulw add authentication to my Next.js app
What Happens:
- Agent automatically explores codebase to understand patterns
- Researches best practices via specialized agents
- Implements following existing conventions
- Verifies with diagnostics and tests
- Keeps working until complete
When to Use: Quick work, complex tasks where explaining full context is tedious
Keywords that Activate Features:
ultrawork/ulw: Max performance mode - parallel agents, background tasks, aggressive explorationsearch/find: Parallel exploration modeanalyze/investigate: Deep analysis modeultrathink: Extended thinking mode (activates 32k reasoning budget)
Mode 2: Prometheus Planning ("Precise Orchestration")
Usage: Press Tab to enter Prometheus mode, then run /start-work
Workflow:
- Interview Phase: Prometheus asks clarifying questions while researching your codebase
- Plan Generation: Creates detailed work plan with tasks, acceptance criteria, guardrails
- Optional Review: Momus validates plan in high-accuracy mode
- Execution: Run
/start-work- Atlas orchestrates specialized sub-agents - Verification: Each task verified independently
- Learning Accumulation: Wisdom passed across tasks
- Progress Tracking: Survives session interruptions via
boulder.jsonstate file
When to Use:
- Multi-day or multi-session projects
- Critical production changes
- Complex refactoring spanning many files
- When you want documented decision trail
Core Mechanisms
1. Category-Based Delegation
Instead of hardcoding model names (which creates distributional bias), tasks are delegated by semantic category:
| Category | Model | Use Case |
|---|---|---|
visual-engineering |
Gemini 3 Pro | Frontend, UI/UX, design, styling |
ultrabrain |
GPT-5.2 Codex | Deep logical reasoning, complex architecture |
artistry |
Gemini 3 Pro | Highly creative tasks, novel ideas |
quick |
Claude Haiku 4.5 | Trivial tasks - typo fixes, single-file changes |
unspecified-low |
Claude Sonnet 4.5 | General tasks, low effort |
unspecified-high |
Claude Opus 4.5 | General tasks, high effort |
writing |
Gemini 3 Flash | Documentation, prose, technical writing |
Why Categories Matter: Model names create bias - "GPT-5.2" knows its limitations. "ultrabrain" means "think strategically" without implementation details leaking.
2. Skills System
Skills provide domain-specific instructions that prepend to agent prompts:
Built-in Skills:
- playwright: Browser automation (web scraping, testing, screenshots)
- git-master: Atomic commits, rebase/squash, history search
- frontend-ui-ux: Designer-turned-developer persona with strong aesthetic direction
Usage:
delegate_task(category="visual-engineering", skills=["frontend-ui-ux"], prompt="...")
delegate_task(category="general", skills=["playwright"], prompt="...")
3. Todo Continuation Enforcer
The "boulder pushing" mechanism that keeps agents working:
[SYSTEM REMINDER - TODO CONTINUATION]
You have incomplete todos! Complete ALL before responding:
- [x] Implement user service
- [ ] Add validation ← IN PROGRESS
- [ ] Write tests
DO NOT respond until all todos are marked completed.
This prevents the common failure mode where agents claim to be "done" prematurely.
4. Background Agent Execution
Run multiple agents in parallel like a real dev team:
# Launch in background
delegate_task(agent="explore", background=true, prompt="Find auth implementations")
# Continue working...
# System notifies on completion
# Retrieve results when needed
background_output(task_id="bg_abc123")
5. Context Injection & Memory
Directory AGENTS.md: Auto-injects hierarchical context when reading files:
project/
├── AGENTS.md # Project-wide context
├── src/
│ ├── AGENTS.md # src-specific context
│ └── components/
│ ├── AGENTS.md # Component-specific context
│ └── Button.tsx # Reading this injects all 3
Conditional Rules:
Rules from .claude/rules/ injected when glob patterns match:
---
globs: ["*.ts", "src/**/*.js"]
description: "TypeScript/JavaScript coding rules"
---
- Use PascalCase for interface names
- Use camelCase for function names
6. Ralph Loop Integration
Built-in /ralph-loop command for self-referential development:
/ralph-loop "Build a REST API with authentication"
Continuously works toward goal, detects <promise>DONE</promise> for completion.
Built-in Tools & MCPs
LSP Tools (IDE Features for Agents)
lsp_diagnostics: Get errors/warnings before buildlsp_rename: Rename symbol across workspacelsp_goto_definition: Jump to symbol definitionlsp_find_references: Find all usages
AST-Grep Tools
ast_grep_search: AST-aware code pattern search (25 languages)ast_grep_replace: AST-aware code replacement
Built-in MCPs
- websearch (Exa AI): Real-time web search
- context7: Official documentation lookup
- grep_app: Ultra-fast GitHub code search
Session Tools
session_list: List all OpenCode sessionssession_read: Read messages from a sessionsession_search: Full-text search across sessions
State Management & Persistence
Plan Files
.sisyphus/
├── drafts/ # Interview conversation drafts
├── plans/ # Generated work plans
│ └── {name}.md
└── notepads/ # Per-plan accumulated wisdom
└── {plan-name}/
├── learnings.md
├── decisions.md
├── issues.md
└── verification.md
Progress Tracking
- boulder.json: Tracks current plan and session ID for resume
- Todo State: Externalized in files, not chat memory
- Session Resumability: Work continues across interruptions
Key Heuristics & Best Practices
From the Ultrawork Manifesto
- Human Intervention = Failure Signal: If you need to babysit the agent, the system has failed
- Indistinguishable Code: Agent output should be indistinguishable from senior engineer code
- Token Cost vs Productivity: Higher token usage acceptable for 10x-100x productivity gains
- Minimize Human Cognitive Load: Human provides intent, agent handles everything else
- Predictable, Continuous, Delegatable: Work like a compiler - markdown in, working code out
Operational Guidelines
- Keep dev agents lean: Save context for code; planning agents can be heavier
- Many small tasks: Prefer many small tasks over branching mega-prompts (less context entropy)
- Checklists/gates as safety rails: Don't "trust the model" - verify everything
- Backpressure is non-negotiable: Tests/build/lint keep agents honest
- Separate planning from building: Never mix the two
- Single plan principle: No matter how large the task, one plan file prevents context fragmentation
Configuration
Config File Locations
.opencode/oh-my-opencode.json(project)~/.config/opencode/oh-my-opencode.json(user)
Supports JSONC (comments + trailing commas).
Example Configuration
{
"$schema": "https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/master/assets/oh-my-opencode.schema.json",
// Agent overrides
"agents": {
"oracle": {
"model": "openai/gpt-5.2"
},
"Sisyphus": {
"model": "anthropic/claude-opus-4-5",
"temperature": 0.3
}
},
// Custom categories
"categories": {
"data-science": {
"model": "anthropic/claude-sonnet-4-5",
"temperature": 0.2,
"prompt_append": "Focus on data analysis, ML pipelines, and statistical methods."
}
},
// Background task concurrency
"background_task": {
"defaultConcurrency": 5,
"providerConcurrency": {
"anthropic": 3,
"openai": 5,
"google": 10
}
},
// Disable specific features
"disabled_hooks": ["comment-checker"],
"disabled_agents": ["multimodal-looker"],
"disabled_skills": ["playwright"],
"disabled_mcps": ["websearch"]
}
Commands
| Command | Description |
|---|---|
/init-deep |
Initialize hierarchical AGENTS.md knowledge base |
/ralph-loop |
Start self-referential development loop until completion |
/ulw-loop |
Start ultrawork loop - max intensity mode |
/cancel-ralph |
Cancel active Ralph Loop |
/refactor |
Intelligent refactoring with LSP, AST-grep, TDD verification |
/start-work |
Execute Prometheus-generated plan |
Hooks System
Oh My OpenCode includes 25+ hooks that intercept lifecycle events:
Events: PreToolUse, PostToolUse, UserPromptSubmit, Stop
Notable Hooks:
- keyword-detector: Activates ultrawork/search/analyze modes from keywords
- todo-continuation-enforcer: Forces agents to complete todos (the "boulder pushing")
- comment-checker: Prevents excessive AI comments
- thinking-block-validator: Prevents API errors from malformed thinking blocks
- session-recovery: Recovers from session errors
- grep-output-truncator: Dynamically truncates output based on context window
- directory-agents-injector: Auto-injects AGENTS.md hierarchy
- rules-injector: Injects conditional rules from
.claude/rules/ - ralph-loop: Manages continuous loop execution
Installation
For Humans
Copy this prompt to your LLM agent:
Install and configure oh-my-opencode by following the instructions here:
https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/refs/heads/master/docs/guide/installation.md
For LLM Agents
curl -s https://raw.githubusercontent.com/code-yeongyu/oh-my-opencode/refs/heads/master/docs/guide/installation.md
When to Use Oh My OpenCode Sisyphus
Best Fit For
- High autonomy requirements: Want agents to work for hours without intervention
- Complex multi-session projects: Work that spans days/weeks with interruptions
- Multi-model orchestration: Need different models for different expertise (frontend, backend, architecture)
- Quality-critical work: Production code where "good enough" isn't acceptable
- Large codebases: Need intelligent exploration and context management
- Iterative refinement: Tasks requiring continuous improvement until perfect
Not Ideal For
- Simple one-off tasks: Overkill for single-file changes
- Exploratory coding: When you're still figuring out what to build
- Strict budget constraints: Uses more tokens for higher quality
- OpenCode beginners: Requires understanding of OpenCode concepts first
Comparison with Other Approaches
vs BMAD
- BMAD: Structured SDLC roles (PM, Architect, QA, Dev) with human checkpoints
- Sisyphus: Orchestrated agent swarm with minimal human intervention
- Trade-off: BMAD has more control gates; Sisyphus has more autonomy
vs Claude Autonomous Harness
- Harness: Two-agent pattern (Initializer + Coding) with test inventory as truth
- Sisyphus: Multi-agent orchestration with plan files as truth
- Trade-off: Harness emphasizes security/sandboxing; Sisyphus emphasizes specialization
vs Ralph Loop
- Ralph: Simple continuous loop with file-based memory (fresh context every iteration)
- Sisyphus: Built-in
/ralph-loopcommand PLUS orchestration layer - Trade-off: Ralph is simpler; Sisyphus adds planning and multi-agent delegation
vs Gas Town
- Gas Town: Massive parallelism (10-30 agents) with merge queue orchestration
- Sisyphus: Moderate parallelism with intelligent task delegation
- Trade-off: Gas Town for scale; Sisyphus for quality and precision
Decision Framework
Choose Oh My OpenCode Sisyphus when:
| Criterion | Evaluation |
|---|---|
| Task Complexity | Multi-step, requires planning |
| Code Quality Requirements | Production-ready, indistinguishable from human |
| Human Availability | Want to delegate and walk away |
| Project Duration | Multi-day or multi-session |
| Model Diversity Needs | Different expertise for different tasks |
| Budget Flexibility | Higher token usage acceptable for quality |
Decision Flow:
- Simple task? → Standard OpenCode
- Complex + lazy? →
ulwmode - Complex + precise? → Prometheus planning +
/start-work - Massive parallel work? → Consider Gas Town instead