Phases, modes, overrides — what's safe to delegate, what isn't.
Failure modes are easier to learn from than "when to use" rules. These are the failure patterns we keep seeing.
Six phases. Each writes a row to `task_steps`, so individual phases can be replayed on retry.
On retry, completed phases are skipped (they are checkpointed in Postgres), and the workspace and branch are reused.
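A minimal sketch of the checkpoint-and-skip behavior, assuming a phase counts as done once a `completed` row for it exists in `task_steps`. `StepStore` is a hypothetical in-memory stand-in for the Postgres table. The row fields, status values, and phase names are illustrative, not the real schema.

```typescript
// Hypothetical shape of a task_steps row; the real columns are not documented here.
interface TaskStep {
  taskId: string;
  phase: string;
  status: "completed" | "failed";
}

// In-memory stand-in for the Postgres task_steps table (illustration only).
class StepStore {
  private rows: TaskStep[] = [];

  record(step: TaskStep): void {
    this.rows.push(step);
  }

  isCompleted(taskId: string, phase: string): boolean {
    return this.rows.some(
      (r) => r.taskId === taskId && r.phase === phase && r.status === "completed"
    );
  }
}

type Phase = { name: string; run: () => void };

// Runs phases in order, skipping any phase already checkpointed as completed.
// Stops at the first failure, so the next retry resumes from that phase.
function runPhases(taskId: string, phases: Phase[], store: StepStore): string[] {
  const executed: string[] = [];
  for (const phase of phases) {
    if (store.isCompleted(taskId, phase.name)) continue; // checkpoint hit: skip
    try {
      phase.run();
      store.record({ taskId, phase: phase.name, status: "completed" });
      executed.push(phase.name);
    } catch {
      store.record({ taskId, phase: phase.name, status: "failed" });
      break;
    }
  }
  return executed;
}

// Usage: the second phase fails once, then succeeds on retry.
const store = new StepStore();
let attempts = 0;
const phases: Phase[] = [
  { name: "phase-a", run: () => {} },
  { name: "phase-b", run: () => { if (attempts++ === 0) throw new Error("flaky"); } },
  { name: "phase-c", run: () => {} },
];
const firstRun = runPhases("task-1", phases, store); // ["phase-a"]
const retry = runPhases("task-1", phases, store);    // ["phase-b", "phase-c"]
```

Stopping at the first failed phase is what keeps a retry cheap: it resumes from the failed phase instead of starting from the beginning.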
| Mode | Trigger | Best for |
|---|---|---|
| `stiv` (default) | No special label | Well-scoped, single-file, TDD-friendly |
| `claude-code` | Label `claude-code` | Multi-file, exploratory; recommended today |
| `claude-code-tmux` | Label `claude-code-tmux` | Same as `claude-code`, plus live tmux attach for debugging |
Mode resolution: task label → repo config (execution_mode) → task property → default (stiv).
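The precedence order above reads as a first-match-wins chain. The following sketch works under that reading. The `ModeSources` shape, skipping unknown mode names, and preferring `claude-code-tmux` when both labels are present are all assumptions, not documented behavior.

```typescript
type ExecutionMode = "stiv" | "claude-code" | "claude-code-tmux";

const ALL_MODES: ExecutionMode[] = ["stiv", "claude-code", "claude-code-tmux"];
// Modes selectable by a task label; stiv needs no label.
// Checking claude-code-tmux first when both labels are present is an assumption.
const LABEL_MODES: ExecutionMode[] = ["claude-code-tmux", "claude-code"];

// Hypothetical input shape; the real task and repo objects are not documented here.
interface ModeSources {
  labels: string[];           // labels on the task
  repoExecutionMode?: string; // execution_mode from config/repositories.json
  taskProperty?: string;      // mode set as a task property
}

// Type guard: accepts only known mode names, so a typo falls through to the next source.
function isMode(value: string | undefined): value is ExecutionMode {
  return value !== undefined && ALL_MODES.some((mode) => mode === value);
}

// First match wins: task label, then repo config, then task property, then stiv.
function resolveMode(src: ModeSources): ExecutionMode {
  for (const mode of LABEL_MODES) {
    if (src.labels.indexOf(mode) !== -1) return mode;
  }
  if (isMode(src.repoExecutionMode)) return src.repoExecutionMode;
  if (isMode(src.taskProperty)) return src.taskProperty;
  return "stiv";
}

// Usage
const labelWins = resolveMode({ labels: ["claude-code"], repoExecutionMode: "stiv" }); // "claude-code"
const fromProperty = resolveMode({ labels: [], taskProperty: "claude-code-tmux" });    // "claude-code-tmux"
```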
- `config/repositories.json`: per-repo `branch`, `testCmd`, `execution_mode`, `hipaa_mode`, `blockedPaths`
- `skip-tests`: bypass Phase 1 (test generation)
- `force-rereview`: one-shot bypass of the SHA-dedup gate on PR auto-review
- `WORKER_MODEL` / `OPENROUTER_TRIAGE_MODEL`: model overrides; never bypass `resolveModel()`
- `SLING_TICKET_BUDGET_USD`: per-ticket cost cap (default $25)

Failure categories come from `failure-classifier.js`. Each one is reported on the PR and on the dashboard.
Generated tests pass, but the existing suite fails. Most often this is a brittle existing test that references an old field name.
`tsc` errors. Usually a missing type import after a refactor.
Easy: a re-run usually fixes it. If it keeps failing, the lint config has drifted.
Per-ticket cost cap hit. The brief is too big: split it, or raise the cap for this one ticket.
Worker timeout hit, usually by a long compile or test suite. Bump `CLAUDE_CODE_TIMEOUT_MS` for the run.
Risk assessment refused: the change touches auth, payments, or PHI. Implement it by hand.
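One way to picture the classification is as an ordered rule chain. This sketch assumes a finished run can be summarized by a few boolean signals plus its cost. The `RunOutcome` fields, the category strings, and the priority order are hypothetical and not taken from `failure-classifier.js`. The budget helper assumes `SLING_TICKET_BUDGET_USD` is read from the environment with the documented $25 default.

```typescript
// Documented default per-ticket cap ($25). Reading it from env is an assumption.
const DEFAULT_TICKET_BUDGET_USD = 25;

function ticketBudgetUsd(env: { [key: string]: string | undefined }): number {
  const parsed = Number(env["SLING_TICKET_BUDGET_USD"]);
  // Fall back to the default when unset, empty, or not a positive number.
  return isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_TICKET_BUDGET_USD;
}

// Hypothetical summary of a finished run.
interface RunOutcome {
  costUsd: number;
  timedOut: boolean;
  riskRefused: boolean;
  typecheckFailed: boolean;
  lintFailed: boolean;
  existingTestsFailed: boolean;
}

// Hypothetical category names, one per failure mode described above.
type FailureCategory =
  | "risk_refused" | "budget_exceeded" | "timeout"
  | "typecheck" | "lint" | "existing_tests" | "none";

// Rules are checked in priority order: outcomes that stop a run outright come
// first, so a run that timed out mid-compile is reported as a timeout.
function classify(o: RunOutcome, budgetUsd: number): FailureCategory {
  if (o.riskRefused) return "risk_refused";
  if (o.costUsd >= budgetUsd) return "budget_exceeded";
  if (o.timedOut) return "timeout";
  if (o.typecheckFailed) return "typecheck";
  if (o.lintFailed) return "lint";
  if (o.existingTestsFailed) return "existing_tests";
  return "none";
}

// Usage
const clean: RunOutcome = {
  costUsd: 3, timedOut: false, riskRefused: false,
  typecheckFailed: false, lintFailed: false, existingTestsFailed: false,
};
const budget = ticketBudgetUsd({}); // 25 when SLING_TICKET_BUDGET_USD is unset
const timedOutRun = classify({ ...clean, timedOut: true, typecheckFailed: true }, budget); // "timeout"
const overBudget = classify({ ...clean, costUsd: 30 }, budget);                          // "budget_exceeded"
```

Ordering is the main design choice in a chain like this. A timeout usually leaves compile and lint results incomplete, so reporting the terminal condition first avoids blaming a half-finished build.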
- `/ticket/{owner}/{repo}/{issue}`: per-ticket lifecycle, cost, and phase timings
- `/dashboard`: live state, approvals queue, mode mix
- `/admin/onboarding-stats` (when enabled): adoption funnel
- `SELECT * FROM task_steps WHERE task_id = ?`: raw phase log
- `SELECT * FROM ai_call_log WHERE task_id = ?`: every model call and its cost
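When a ticket hits its cost cap, the `ai_call_log` rows show where the money went. Here is a sketch of summing the query results per phase. It assumes each row carries a phase and a cost column. `AiCallRow`, its field names, and the phase names in the usage are hypothetical.

```typescript
// Hypothetical row shape for ai_call_log; real column names may differ.
interface AiCallRow {
  task_id: string;
  phase: string;
  cost_usd: number;
}

interface CostSummary {
  total: number;
  byPhase: Array<[string, number]>; // [phase, cost], most expensive first
}

// Totals model spend per phase for one task.
function costByPhase(rows: AiCallRow[], taskId: string): CostSummary {
  const totals: { [phase: string]: number } = {};
  for (const row of rows) {
    if (row.task_id !== taskId) continue; // ignore other tasks' calls
    totals[row.phase] = (totals[row.phase] || 0) + row.cost_usd;
  }
  const byPhase: Array<[string, number]> = Object.keys(totals).map(
    (phase): [string, number] => [phase, totals[phase]]
  );
  byPhase.sort((a, b) => b[1] - a[1]);
  let total = 0;
  for (const [, cost] of byPhase) total += cost;
  return { total, byPhase };
}

// Usage with illustrative rows
const rows: AiCallRow[] = [
  { task_id: "t1", phase: "implement", cost_usd: 1 },
  { task_id: "t1", phase: "review", cost_usd: 0.25 },
  { task_id: "t1", phase: "implement", cost_usd: 0.5 },
  { task_id: "t2", phase: "implement", cost_usd: 9 },
];
const summary = costByPhase(rows, "t1");
```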