It starts with people. A user-led workshop frames the problem and gathers requirements — then two AI agents, Codex the implementer and Claude the validator, build, test, fix, and document it autonomously through stage gates and guardrails until it ships.
Great products are reasoned about from six altitudes — from why it matters down to how it ships. Put on each hat in turn, capture the thinking as diagrams, and only then drop into engineering. Each mindset feeds the workshop, the requirements, and the pipeline below.
The vibe-coding trap. Pick up the tool, say "build me app XYZ," and Claude — or any capable agent — will faithfully build it: fast, confident, and often in completely the wrong direction. Because the direction was never set, you usually don't notice until it's too late to change course cheaply. The six mindsets set the direction first — so the agent builds the right thing.
Each mindset produces artifacts — vision, strategy, landscape, solution & application designs — that flow straight into the user-led workshop, the requirements matrix, and the AutoCode pipeline below. You descend the altitudes once; the agents build from the result.
Long before AutoCode runs, a facilitated engagement frames the real problem and gathers requirements with the people who live it. Code Easy's documented method runs in four phases over roughly 6–11 days.
Discovery and architecture aren't box-ticking — they follow the five design-thinking stages, keeping the work centred on the people who'll actually use what gets built.
Interview sponsors, users & tech stakeholders. Understand the real pain, not the stated ask.
Frame the problem & success criteria. Capture the requirements matrix and what's out of scope.
Map architecture & solution options with stakeholders. Explore trade-offs before committing.
Build to learn — AI-powered rapid prototyping, visible in Code Easy, iterated daily.
Validate against criteria with real users; feed findings back. Then hand off to AutoCode.
“Discovery isn't about documenting what the client says they want — it's about understanding the problem deeply enough to build something that actually solves it.”
Every requirement is logged to the matrix (type · title · value · rationale · priority · source), prioritised Must → Won't, and categorised across five types so coverage is comprehensive.
Captured as a simple CSV (type,title,value,rationale,priority,source), so the whole matrix imports straight into Code Easy and feeds the spec the agents build from.
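To make the matrix format concrete, here is a minimal sketch of reading that CSV and ordering it by priority. The column order comes from the text above; the Should and Could levels (a MoSCoW scale), the sample rows and the type names are assumptions for illustration, not Code Easy's actual importer.

```javascript
// Column order taken from the text; everything else here is illustrative.
const COLUMNS = ["type", "title", "value", "rationale", "priority", "source"];

// Assumed MoSCoW scale: the text only names the endpoints "Must" and "Won't".
const PRIORITY_ORDER = ["Must", "Should", "Could", "Won't"];

// Naive parser: splits on commas, so fields must not contain commas or quotes.
function parseMatrix(csvText) {
  const [header, ...rows] = csvText.trim().split("\n").map((line) => line.trim());
  if (header !== COLUMNS.join(",")) {
    throw new Error(`Unexpected header: ${header}`);
  }
  return rows
    .filter((line) => line.length > 0)
    .map((line) => {
      const cells = line.split(",");
      return Object.fromEntries(COLUMNS.map((name, i) => [name, (cells[i] ?? "").trim()]));
    });
}

// Sort a copy so that Must comes first and Won't last; unknown priorities go to the end.
function sortByPriority(requirements) {
  const rank = (r) => {
    const i = PRIORITY_ORDER.indexOf(r.priority);
    return i === -1 ? PRIORITY_ORDER.length : i;
  };
  return [...requirements].sort((a, b) => rank(a) - rank(b));
}

// Hypothetical rows; the type names are placeholders, not the method's five types.
const sampleCsv = `type,title,value,rationale,priority,source
functional,Export report,PDF download,Finance needs monthly copies,Should,Sponsor interview
security,Single sign-on,SAML login,Company policy,Must,IT stakeholder
functional,Dark mode,Theme toggle,Nice to have,Won't,User survey`;

const sortedRequirements = sortByPriority(parseMatrix(sampleCsv));
console.log(sortedRequirements.map((r) => `${r.priority}: ${r.title}`));
```

A real matrix with commas inside fields would need a proper CSV parser; the split-on-comma approach is only there to keep the sketch short.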
Six live Code Easy visual modes validate structure with the right audience — business domains for sponsors, dependency graphs for engineers, journeys for UX.
CLAUDE.md is authored here as the architecture context file (layers, patterns, constraints), and significant choices become Architecture Decision Records in /docs/adr/, each with title · context · decision · consequences · alternatives.
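As a rough illustration of the ADR fields named above, the sketch below renders one record to Markdown. Only the five field names and the /docs/adr/ location come from the text; the numbering scheme, the file naming and the heading layout are assumptions.

```javascript
// Field names come from the text; the Markdown layout and file naming are assumptions.
function renderAdr({ number, title, context, decision, consequences, alternatives }) {
  const id = String(number).padStart(4, "0");
  const slug = title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-|-$/g, "");
  const body = [
    `# ${id}. ${title}`,
    "",
    "## Context", context, "",
    "## Decision", decision, "",
    "## Consequences", consequences, "",
    "## Alternatives",
    ...alternatives.map((a) => `- ${a}`),
    "",
  ].join("\n");
  // Path is relative to the repository root.
  return { path: `docs/adr/${id}-${slug}.md`, body };
}

// Hypothetical decision used purely as sample input.
const exampleAdr = renderAdr({
  number: 1,
  title: "Use PostgreSQL for persistence",
  context: "The app needs relational queries and transactions.",
  decision: "Store all application data in PostgreSQL.",
  consequences: "The team must run and back up a database server.",
  alternatives: ["SQLite", "MongoDB"],
});
console.log(exampleAdr.path);
```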
Each loop is 30–60 minutes, monitored live in Code Easy. Intervene the moment files land in the wrong layer or scope creeps.
Pick 2–3 features for the cycle.
Direct Claude with matrix context.
Watch activity & architecture live.
Validate against acceptance criteria.
Refine, or promote to AutoCode.
The wizard turns selectable, framework-grounded inputs into the spec & CLAUDE.md the agents build from — so choices align with your strategy, budget, standards and your team.
The method brings together four ways of building software, then adds a fifth. Move at vibe-coding speed with traditional-coding quality, with visible architecture instead of a black box.
“Write every line yourself. Understand every byte.”
“AI suggests, I decide.”
“Describe it, generate it, ship it.”
“Build to learn, not to keep.”
“Define the goal, let AI agents collaborate to build it.” Codex implements, Claude validates — inside guardrails, with everything visible. This is where the workshop hands off to AutoCode.
Black box · hope it works.
Visual architecture · verify it works.
The ads promise you'll think of an app and AI builds it. That's a compelling story and a systematically flawed one. Here's where the time actually goes — and why the missing work can't be prompted away.
Each task runs implement → validate → fix → re-validate. Many iterate 2–3 times before passing. Generation is cheap; the round-trips are the cost.
No API before the schema, no UI test before the API. Most of a build is sequential — more agents can't collapse the timeline.
Max ~20 files & ~1000 lines per task forces large features into many small, reviewable units. Reviewability costs wall-clock by design.
Wiring, smoke, e2e, startup & Chrome UI checks — gates with 60–300s timeouts that fail and restart the loop. You can't generate your way past testing.
No model holds a large codebase in memory. It reads a slice, reasons, writes, re-reads — thousands of times across the build.
LLMs emit plausible code, not verified code. Subtle wrongness is the normal mode, which is exactly why the drift detector flags a task after 3 failed attempts.
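The dependency point in the list above can be sketched as a critical-path calculation: even with unlimited agents, a chain of prerequisites sets a floor on the timeline. The task names and durations below are hypothetical, chosen only to show the arithmetic.

```javascript
// Hypothetical tasks with durations in minutes and their prerequisites.
const tasks = {
  schema: { minutes: 40, needs: [] },
  api: { minutes: 60, needs: ["schema"] },
  ui: { minutes: 50, needs: ["api"] },
  uiTest: { minutes: 30, needs: ["ui"] },
  docs: { minutes: 20, needs: ["schema"] },
};

// Earliest finish time of a task with unlimited agents: its own duration plus the
// slowest prerequisite. Memoised; assumes the graph has no cycles.
function earliestFinish(name, graph, memo = new Map()) {
  if (memo.has(name)) return memo.get(name);
  const { minutes, needs } = graph[name];
  const start = Math.max(0, ...needs.map((n) => earliestFinish(n, graph, memo)));
  const finish = start + minutes;
  memo.set(name, finish);
  return finish;
}

// The longest prerequisite chain is the shortest possible wall-clock time.
function criticalPathMinutes(graph) {
  const memo = new Map();
  return Math.max(...Object.keys(graph).map((n) => earliestFinish(n, graph, memo)));
}

const totalWork = Object.values(tasks).reduce((sum, t) => sum + t.minutes, 0);
const floorMinutes = criticalPathMinutes(tasks);
console.log(totalWork, floorMinutes); // prints: 200 180
```

In this made-up plan, parallel agents can only shave off the 20 minutes of docs work; the schema → API → UI → UI test chain still takes 180 of the 200 minutes.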
The pitch doesn't remove the hard work — it hides where it lives. Three structural reasons it breaks down beyond toy apps.
A one-line idea maps to millions of valid apps. The hard part was always deciding precisely what to build. That ambiguity doesn't vanish — the AI either asks you (a requirements process) or guesses (the wrong app). The magic prompt conceals the problem; it doesn't solve it.
AI crushes accidental complexity — boilerplate, syntax, glue. But the essential complexity — the domain, constraints, edge cases, trade-offs — is irreducible (Brooks' “No Silver Bullet”). You can pay it faster, never skip it.
Even with perfect generation, you must confirm real behaviour across every state — errors, auth, concurrency, persistence, deploy. That means running & observing, bounded by real time, not token speed.
The demos dodge all three. A to-do app or landing page is well-trodden, low-ambiguity, and sits squarely in the training data: the model recalls a memorised pattern rather than reasoning about your novel requirements. Scale up or go original and the illusion collapses into the loop above.
Days in autonomous mode isn't the system being slow. It's the system being honest about where software actually gets built.
The autonomous controller drives a state machine. Each card below is a real state in autonomous-controller.js. Watch the signal flow through the pipeline.
Codex claims work from the queue and writes code. On completion the system auto-queues a
validation task for Claude. Pass → next task. Needs changes → a fix task flows back to Codex.
codex · caps: code · implement · fix
claude · reviews correctness · style · security
agent_work_queue & agent_messages tables: the state machine drives the handoff.
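The handoff described above can be sketched with an in-memory queue. This is a toy model, not the code in autonomous-controller.js: the array stands in for the agent_work_queue table, and the row shape, function names and verdict strings are all assumptions.

```javascript
// In-memory stand-in for the agent_work_queue table; the row shape is an assumption.
const queue = [];

function enqueue(agent, kind, taskId) {
  queue.push({ agent, kind, taskId, status: "queued" });
}

// The next queued item for an agent is claimed in FIFO order.
function claim(agent) {
  const item = queue.find((q) => q.agent === agent && q.status === "queued");
  if (item) item.status = "claimed";
  return item;
}

// Codex finishing an implement or fix task always queues a validation for Claude.
function completeCodexWork(item) {
  item.status = "done";
  enqueue("claude", "validate", item.taskId);
}

// Claude's verdict either closes the task or sends a fix task back to Codex.
function completeValidation(item, verdict) {
  item.status = "done";
  if (verdict === "needs_changes") {
    enqueue("codex", "fix", item.taskId);
    return "fix_queued";
  }
  return "task_passed";
}

// Walk one task through implement -> validate -> fix -> re-validate.
enqueue("codex", "implement", "task-1");
completeCodexWork(claim("codex"));
const firstVerdict = completeValidation(claim("claude"), "needs_changes");
completeCodexWork(claim("codex"));
const secondVerdict = completeValidation(claim("claude"), "pass");
const history = queue.map((q) => `${q.agent}:${q.kind}`);
console.log(firstVerdict, secondVerdict, history);
```

Each round-trip adds two rows to the queue, which is one way to see why a task that iterates 2–3 times costs far more wall-clock than its first generation.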
Gates flagged AUTO run a command and pass on their own; gates flagged MANUAL pause in awaiting_approval for a human. Required gates must pass before pre_deploy.
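A minimal sketch of that gate logic follows. It assumes an AUTO gate passes when its command exits with code 0, which the text implies but does not state; the gate object shape, the gate names and the "npm run perf" script are hypothetical.

```javascript
// Gate shape is an assumption: { name, mode: "AUTO" | "MANUAL", required, command }.
// runCommand is injected so the sketch stays testable; a real runner would spawn a
// process and return its exit code.
function evaluateGate(gate, runCommand, approvals = new Set()) {
  if (gate.mode === "AUTO") {
    return runCommand(gate.command) === 0 ? "passed" : "failed";
  }
  // MANUAL gates wait until a human has approved them.
  return approvals.has(gate.name) ? "passed" : "awaiting_approval";
}

// pre_deploy is reachable only when every required gate has passed.
function canEnterPreDeploy(gates, runCommand, approvals = new Set()) {
  return gates
    .filter((g) => g.required)
    .every((g) => evaluateGate(g, runCommand, approvals) === "passed");
}

// Hypothetical gates and a fake command runner in which only the perf script fails.
const sampleGates = [
  { name: "lint", mode: "AUTO", required: true, command: "npm run lint" },
  { name: "design_review", mode: "MANUAL", required: true },
  { name: "perf_check", mode: "AUTO", required: false, command: "npm run perf" },
];
const fakeRun = (command) => (command === "npm run perf" ? 1 : 0);

console.log(canEnterPreDeploy(sampleGates, fakeRun));
console.log(canEnterPreDeploy(sampleGates, fakeRun, new Set(["design_review"])));
```

The failing perf_check does not block the second call because it is not marked as required.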
Enforced at the watcher (codex-watcher/config.js), per-spec in the database, and continuously by the drift detector.
A single task may touch at most maxFilesPerTask files before requiring approval.
maxLinesChanged caps the diff size of any one task to keep changes reviewable.
Drift detector flags a task after maxFailuresBeforeFlag = 3 failed attempts — no runaway fix loops.
Flags the run when staleWorkThreshold = 5 items pass with no real progress.
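The four limits above can be read as simple checks, sketched below. The option names and the values 3 and 5 come from the text; 20 files and 1000 lines are the approximate caps mentioned earlier. The task shape, the reason strings and the reading of "no real progress" as consecutive items are assumptions, and none of this is the actual content of codex-watcher/config.js.

```javascript
// Option names come from the text; 20 and 1000 are the approximate caps it mentions.
const guardrails = {
  maxFilesPerTask: 20,
  maxLinesChanged: 1000,
  maxFailuresBeforeFlag: 3,
  staleWorkThreshold: 5,
};

// Returns the reasons a task needs human attention (an empty list means it may proceed).
function checkTask(task, limits = guardrails) {
  const reasons = [];
  if (task.filesTouched > limits.maxFilesPerTask) reasons.push("too_many_files");
  if (task.linesChanged > limits.maxLinesChanged) reasons.push("diff_too_large");
  if (task.failedAttempts >= limits.maxFailuresBeforeFlag) reasons.push("drift_flagged");
  return reasons;
}

// A run is treated as stale when the most recent items all finished without progress.
function isRunStale(recentItems, limits = guardrails) {
  const recent = recentItems.slice(-limits.staleWorkThreshold);
  return recent.length === limits.staleWorkThreshold && recent.every((i) => !i.madeProgress);
}

// Hypothetical task: small diff, but already on its third failed attempt.
console.log(checkTask({ filesTouched: 3, linesChanged: 120, failedAttempts: 3 }));
```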
Agents can never write to secrets or keys:
.env · .env.* · *.key · *.pem · secrets/** · credentials/**
Sensitive actions pause for a human:
package.json
Optional human approval at:
pre_implement · post_implement · critical_change (>5 files / sensitive) · pre_deploy
The test runner queues and executes these as work flows through the pipeline. A failure with autoFixOnFailure spawns a fix task straight back to Codex.
Connectivity & integration sanity after each phase.
npm run test:wiring || npm run lint
Core functionality builds & runs.
npm run test:smoke
No critical errors: lint & types are clean.
npm run lint && npm run typecheck
Validates package.json, boots the app, watches for errors (30s).
Opt-in via guardrail: cross-module behaviour.
npm run test:integration || npm test
Full end-to-end run before the pre_deploy gate.
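To show how a failing test can turn into a fix task, here is a small sketch of a runner loop. The three commands are the ones listed above and autoFixOnFailure is named in the text; the test names, the object shape and which tests enable auto-fix are assumptions.

```javascript
// Commands are the ones listed above; the test names and object shape are assumptions.
const testPlan = [
  { name: "wiring", command: "npm run test:wiring || npm run lint", autoFixOnFailure: true },
  { name: "smoke", command: "npm run test:smoke", autoFixOnFailure: true },
  { name: "errors", command: "npm run lint && npm run typecheck", autoFixOnFailure: false },
];

// exec(command) must return an exit code. In Node it could wrap child_process.spawnSync
// with { shell: true } so that "||" and "&&" are interpreted by the shell.
function runTests(plan, exec) {
  const fixTasks = [];
  const results = plan.map((test) => {
    const passed = exec(test.command) === 0;
    if (!passed && test.autoFixOnFailure) {
      fixTasks.push({ agent: "codex", kind: "fix", reason: `${test.name} test failed` });
    }
    return { name: test.name, passed };
  });
  return { results, fixTasks };
}

// Fake executor for the example: only the smoke test fails.
const fakeExec = (command) => (command === "npm run test:smoke" ? 1 : 0);
const outcome = runTests(testPlan, fakeExec);
console.log(outcome.fixTasks);
```

Injecting exec keeps the sketch runnable without a project checkout; swapping in a real process runner would not change the control flow.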
When enabled per-repo, Code Easy launches a real Chrome (no manual debug port), drives it over the DevTools Protocol, and feeds console / JS / network errors and screenshots back into the autonomous fix loop — so the agents can verify the running UI, not just the source.
Per-repo switch in the Chrome tab gates everything (403 if off).
Spawns system Chrome with an isolated profile + debug port.
Opens the test URL & auto-connects the CDP debugger.
Console logs, JS exceptions, failed requests, screenshots.
Critical errors → a fix task back to Codex.
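The steps above can be sketched in two small pieces: building the Chrome launch arguments and deciding which captured events are critical. The --remote-debugging-port and --user-data-dir switches and the event names are standard Chrome and DevTools Protocol identifiers; the port, paths, URL, the choice of what counts as critical and the fix-task shape are assumptions, not Code Easy's implementation.

```javascript
// --remote-debugging-port and --user-data-dir are standard Chrome switches; the
// port, directory and URL values passed in are placeholders.
function chromeLaunchArgs({ port, profileDir, testUrl }) {
  return [
    `--remote-debugging-port=${port}`,
    `--user-data-dir=${profileDir}`, // isolated profile, separate from the user's own
    "--no-first-run",
    testUrl,
  ];
}

// Events use Chrome DevTools Protocol method names. Which ones count as "critical"
// is an assumption for this sketch.
function isCritical(event) {
  switch (event.method) {
    case "Runtime.exceptionThrown":
      return true;
    case "Runtime.consoleAPICalled":
      return event.params.type === "error";
    case "Network.loadingFailed":
      return !event.params.canceled;
    default:
      return false;
  }
}

// Critical events become one fix task for Codex; otherwise nothing is queued.
function fixTaskFor(events) {
  const critical = events.filter(isCritical);
  if (critical.length === 0) return null;
  return { agent: "codex", kind: "fix", evidence: critical.map((e) => e.method) };
}

// Hypothetical captured events.
const capturedEvents = [
  { method: "Runtime.consoleAPICalled", params: { type: "log" } },
  { method: "Runtime.consoleAPICalled", params: { type: "error" } },
  { method: "Network.loadingFailed", params: { canceled: true } },
  { method: "Runtime.exceptionThrown", params: {} },
];
console.log(fixTaskFor(capturedEvents));
```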
chrome_test_config · --user-data-dir
As specs are processed and code lands, the project's intent and history are recorded through the MCP knowledge tools, so the docs grow with the codebase.
Specs & acceptance criteria stored via store_requirement.
Architecture choices logged with store_decision.
Every generated plan and task breakdown persists in the DB.
Summaries via store_session_summary at session boundaries.
How it works today: documentation is MCP-assisted — requirements, decisions, plans and session summaries are captured through the knowledge tools and surfaced in the dashboard, rather than auto-written to files. Hooks nudge an agent to record a session summary at natural stopping points.
All 88 tools are exposed to Claude through the MCP server and on by default. Each can be switched on or off from one global setting — so you control exactly which capabilities the agents can use, grouped here by function.