AutoCode — Autonomous Coding Pipeline

Plan Top-Down · Before a Line of Code

The 6 Mindsets for a Successful Product

Great products are reasoned about from six altitudes — from why it matters down to how it ships. Put on each hat in turn, capture the thinking as diagrams, and only then drop into engineering. Each mindset feeds the workshop, the requirements, and the pipeline below.

The vibe-coding trap. Pick up the tool, say "build me app XYZ," and Claude — or any capable agent — will faithfully build it: fast, confident, and often in completely the wrong direction. Because the direction was never set, you usually don't notice until it's too late to change course cheaply. The six mindsets set the direction first — so the agent builds the right thing.

Founder / CEO · Why
CTO · Whether
Enterprise Architect · Where it fits
Solutions Architect · How it connects
Application Architect · How it's built
Engineering · Build it

Founder / CEO · Vision

"Why are we building this, and for whom?"

The problem worth solving & who feels it
Market opportunity & target customer
Business model — how it makes money
North-star metric & definition of success
Funding, runway & timing
Competitive moat & build-vs-buy

CTO · Strategy

"Can we build it, sustain it, and afford it?"

Technology strategy & platform bets
Build vs buy vs partner
Team, skills & org to deliver
Security, compliance & risk posture
Total cost of ownership & cloud strategy
Velocity vs quality & tech-debt strategy

Enterprise Architect · Landscape

"How does it fit the whole organization?"

Alignment with standards & technology radar
Integration with existing systems & data domains
Governance, security & compliance (GDPR/SOC2)
Reuse of shared platforms & capabilities
Data strategy & master-data ownership
Portfolio & roadmap fit, rationalization

Solutions Architect · Solution

"How do the systems fit together for this solution?"

End-to-end design across systems & teams
Integration patterns, APIs, events & contracts
Non-functional requirements (scale, latency, uptime)
Identity, auth & security boundaries
Deployment topology & environments
Migration / cutover & trade-off analysis

Application Architect · Application

"How is this application structured?"

Layers, modules & component boundaries
Design patterns & framework choices
Data model & internal API design
State, error handling & observability
Testability & maintainability
Coding standards (captured in CLAUDE.md)

Engineering · Build it

"Now build it — visibly, with guardrails." → hands off to the pipeline below.

Implement → validate → iterate (Codex + Claude)
Tasks, tests, stage gates & guardrails
Real-time visibility in Code Easy
Documentation captured as you go

Each mindset produces artifacts — vision, strategy, landscape, solution & application designs — that flow straight into the user-led workshop, the requirements matrix, and the AutoCode pipeline below. You descend the altitudes once; the agents build from the result.

Phase 0 · Before the Code

It starts with a user-led workshop

Long before AutoCode runs, a facilitated engagement frames the real problem and gathers requirements with the people who live it. Code Easy's documented method runs in four phases over roughly 6–11 days.

PHASE 1 · 1–2 days

Discovery

1–2 days to frame the problem deeply — interviews, requirements, constraints.

Requirements Matrix · Stakeholder Map · Success Criteria

PHASE 2 · 1–2 days

Architecture

1–2 days of system & component mapping, validated visually with stakeholders.

Architecture docs · CLAUDE.md · ADRs

PHASE 3 · 3–5 days

Prototyping

3–5 days of AI-powered rapid building with daily stakeholder iteration.

Working prototype · Updated matrix ✓ · Committed code

PHASE 4 · 1–2 days

Handoff

1–2 days to finalise documentation, ADRs & roadmap — bridges into AutoCode.

Production roadmap · Code Easy export · Handoff package

The Lens · Design Thinking

Every phase runs on a design-thinking loop

Discovery and architecture aren't box-ticking — they follow the five design-thinking stages, keeping the work centred on the people who'll actually use what gets built.

STAGE 1

Empathise

Interview sponsors, users & tech stakeholders. Understand the real pain, not the stated ask.

STAGE 2

Define

Frame the problem & success criteria. Capture the requirements matrix and what's out of scope.

STAGE 3

Ideate

Map architecture & solution options with stakeholders. Explore trade-offs before committing.

STAGE 4

Prototype

Build to learn — AI-powered rapid prototyping, visible in Code Easy, iterated daily.

STAGE 5

Test

Validate against criteria with real users; feed findings back. Then hand off to AutoCode.

Discovery · Design Thinking

Three conversations frame the problem

“Discovery isn't about documenting what the client says they want — it's about understanding the problem deeply enough to build something that actually solves it.”

Empathise

Business sponsors

What business problem are we solving?
How do you measure success today — and after?
What happens if we don't build this?
What's the budget & timeline expectation?

Understand

End users

Walk me through your current workflow, step by step.
What's the most frustrating part of the process?
What workarounds do you use today?
What would “delightful” look like?

Constrain

Technical stakeholders

What systems must this integrate with?
Authentication / authorization requirements?
Compliance or security requirements?
Who maintains this after handoff?

Requirements Gathering

Prioritised by MoSCoW, mapped to requirement types

Every requirement is logged to the matrix (type · title · value · rationale · priority · source), prioritised Must → Won't, and categorised across five types so coverage is comprehensive.

Priority

Requirement Types

Must have

Technical · Functional

Should have

Architectural · Design rationale

Could have

Enterprise alignment

Won't have

Out of scope

Captured as a simple CSV — type,title,value,rationale,priority,source — so the whole matrix imports straight into Code Easy and feeds the spec the agents build from.
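A minimal sketch of reading that CSV before import, assuming RFC 4180-style quoting (so a rationale can contain commas) and assuming the priority column holds the MoSCoW labels Must / Should / Could / Won't. `parseCsv`, `moscowRank` and `loadMatrix` are hypothetical helpers for illustration, not Code Easy's own importer.

```javascript
// Hypothetical importer for the requirements matrix CSV; not Code Easy's own code.
const COLUMNS = ["type", "title", "value", "rationale", "priority", "source"];
// Assumed MoSCoW labels, in priority order (Must first, Won't last).
const MOSCOW = ["must", "should", "could", "won't"];

// Minimal RFC 4180-style parser: quoted fields, "" escapes, commas/newlines inside quotes.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (c === '"') inQuotes = false;
      else field += c;
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field); field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++; // treat CRLF as one line break
      row.push(field); rows.push(row); row = []; field = "";
    } else {
      field += c;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Normalises "Must have", "Won’t", "wont" etc. to a rank (0 = Must, -1 = unknown).
function moscowRank(label) {
  const key = label.trim().toLowerCase()
    .replace(/\u2019/g, "'")  // curly apostrophe -> straight
    .replace(/\s+have$/, "")  // "must have" -> "must"
    .replace(/^wont$/, "won't");
  return MOSCOW.indexOf(key);
}

function loadMatrix(csvText) {
  const [header = [], ...body] = parseCsv(csvText.replace(/^\uFEFF/, "")); // drop a BOM if present
  if (header.map((h) => h.trim()).join(",") !== COLUMNS.join(",")) {
    throw new Error(`unexpected header: ${header.join(",")}`);
  }
  const reqs = [];
  body.forEach((fields, i) => {
    if (fields.every((f) => f.trim() === "")) return; // skip blank lines
    const rowNo = i + 2; // 1-based, counting the header row
    if (fields.length !== COLUMNS.length) {
      throw new Error(`row ${rowNo}: expected ${COLUMNS.length} fields, got ${fields.length}`);
    }
    const rec = Object.fromEntries(COLUMNS.map((col, j) => [col, fields[j].trim()]));
    const rank = moscowRank(rec.priority);
    if (rank < 0) throw new Error(`row ${rowNo}: unknown priority "${rec.priority}"`);
    reqs.push({ ...rec, rank });
  });
  // Array.prototype.sort is stable, so equal priorities keep their file order.
  return reqs.sort((a, b) => a.rank - b.rank);
}
```

Validating the header and every priority up front means a malformed export fails loudly at import time instead of silently feeding a partial matrix to the spec.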

Architecture Mapping

The system is drawn with stakeholders

Six live Code Easy visual modes validate structure with the right audience — business domains for sponsors, dependency graphs for engineers, journeys for UX.

1. Force Graph · dependencies
2. Treemap · size & complexity
3. Tree · hierarchy
4. Architecture · technical layers
5. Business Architecture · domains
6. User Flow · journeys

CLAUDE.md is authored here as the architecture context file (layers, patterns, constraints), and significant choices become Architecture Decision Records in /docs/adr/ — title · context · decision · consequences · alternatives.
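Those five ADR parts map onto a small template. The sketch below renders one decision to Markdown; the zero-padded `NNNN-slug.md` filename and the `renderAdr` helper are hypothetical conventions for illustration, not Code Easy's own format, and the path is returned relative to the repo root rather than written to disk.

```javascript
// Hypothetical ADR renderer; section names follow the ADR parts listed above.
const ADR_SECTIONS = ["context", "decision", "consequences", "alternatives"];

function slugify(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Returns { path, body } so the caller decides how (and whether) to write the file.
function renderAdr(number, adr) {
  if (!adr.title) throw new Error("ADR needs a title");
  const missing = ADR_SECTIONS.filter((s) => !adr[s]);
  if (missing.length) throw new Error(`ADR missing: ${missing.join(", ")}`);
  const id = String(number).padStart(4, "0");
  const heading = (s) => s[0].toUpperCase() + s.slice(1);
  const body = [
    `# ${id}. ${adr.title}`,
    ...ADR_SECTIONS.flatMap((s) => ["", `## ${heading(s)}`, "", adr[s].trim()]),
    "",
  ].join("\n");
  // Repo-relative path under docs/adr/.
  return { path: `docs/adr/${id}-${slugify(adr.title)}.md`, body };
}
```

Refusing to render an ADR with an empty section keeps "alternatives" and "consequences" from being skipped, which are the parts most often left out.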

Rapid Prototyping

A tight build-to-learn cycle

Each loop is 30–60 minutes, monitored live in Code Easy. Intervene the moment files land in the wrong layer or scope creeps.

Plan

Pick 2–3 features for the cycle.

Prompt

Direct Claude with matrix context.

Monitor

Watch activity & architecture live.

Review

Validate against acceptance criteria.

Iterate

Refine, or promote to AutoCode.

What The Workshop Captures

One pass captures the whole picture

The wizard turns selectable, framework-grounded inputs into the spec & CLAUDE.md the agents build from — so choices align with your strategy, budget, standards and your team.

Vision

Why & for whom

user types · success metrics · outcomes

CEO mindset · design thinking

Requirements

Prioritised scope

MoSCoW · 5 types · CSV import

Must · Should · Could · Won't

Architecture

Stack & targets

platforms / OS · cloud · SLA / OLA · API / CLI · auth

CLAUDE.md templates

Strategy

Budget & evolution

invest / optimise / outsource · build vs run/operate

Wardley mapping

Governance

Delivery model

Team Topologies · fitness functions · ADRs · evolutionary

TOGAF ADM → Emergent Stack

Team

Skills alignment

languages · frameworks · seniority · known gaps

Build to what the team can maintain

Integrations

External services

3rd-party providers · custom APIs · webhooks

Connect & consume

UI & Design

Look & standards

light / dark · colour schemes · design systems · WCAG AA · sketches

Suggestions; agents still apply best judgment

Output

Build-ready spec

requirements · CLAUDE.md · ADRs · UI refs · draft spec

Straight into AutoCode

Requirements Matrix + CLAUDE.md → Claude Plan Mode → Specification → AutoCode pipeline ↓


Reality Check

Why a real app takes days — not ten seconds

The ads promise you'll think of an app and AI builds it. That's a compelling story and a systematically flawed one. Here's where the time actually goes — and why the missing work can't be prompted away.

wall-clock ≈ tasks × round-trips per task × (latency + verification)

Typing speed isn't in this equation. The loop is. A large app is hundreds of tasks, each iterating until it passes.
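The same equation in symbols, with a worked example. Here $\bar{r}$ is the average number of round-trips per task; the inputs (300 tasks, 2.5 round-trips each, 60 s of model latency and 120 s of verification per round-trip) are purely hypothetical, chosen only to show the arithmetic, not measurements from AutoCode:

```latex
\begin{aligned}
T_{\text{wall}} &\approx N_{\text{tasks}} \times \bar{r} \times \left(t_{\text{latency}} + t_{\text{verify}}\right) \\
&\approx 300 \times 2.5 \times (60\,\mathrm{s} + 120\,\mathrm{s}) \\
&= 750 \times 180\,\mathrm{s} = 135\,000\,\mathrm{s} \approx 37.5\,\mathrm{h}
\end{aligned}
```

Faster token generation only shrinks $t_{\text{latency}}$; the round-trip multiplier $\bar{r}$ and the verification term are untouched.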

Where the time actually goes

It's the loop, not the keystrokes

Each task runs implement → validate → fix → re-validate. Many iterate 2–3 times before passing. Generation is cheap; the round-trips are the cost.

A dependency graph, not parallel

No API before the schema, no UI test before the API. Most of a build is sequential, so adding agents can't shorten it below its longest chain of dependent tasks.

Guardrails throttle on purpose

Capping each task at ~20 files and ~1000 changed lines forces large features into many small, reviewable units. Reviewability costs wall-clock by design.

Verification is slow, real work

Wiring, smoke, e2e, startup and Chrome UI checks are gates with 60–300s timeouts, and any failure restarts the loop. You can't generate your way past testing.

Context windows force chunking

No model holds a large codebase in memory. It reads a slice, reasons, writes, re-reads — thousands of times across the build.

Correction is structural, not rare

LLMs emit plausible code, not verified code. Subtle wrongness is the normal failure mode, which is exactly why the drift detector flags a task after 3 failed attempts.
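The dependency-graph point can be made concrete with a critical-path calculation: the longest chain of dependent tasks is a floor on wall-clock time no matter how many agents run in parallel. This is a sketch with hypothetical task names and durations, assuming unlimited agents and no scheduling overhead.

```javascript
// Longest path through a task DAG: the minimum wall-clock time with unlimited agents.
// Task names and durations are hypothetical.
function criticalPath(tasks) {
  const finish = new Map(); // task id -> earliest finish time (minutes)
  const visiting = new Set(); // ids on the current DFS path, for cycle detection
  const earliestFinish = (id) => {
    if (finish.has(id)) return finish.get(id);
    if (visiting.has(id)) throw new Error(`dependency cycle at ${id}`);
    const task = tasks[id];
    if (!task) throw new Error(`unknown task ${id}`);
    visiting.add(id);
    // A task can start only once its slowest prerequisite has finished.
    const start = Math.max(0, ...task.deps.map((dep) => earliestFinish(dep)));
    visiting.delete(id);
    finish.set(id, start + task.minutes);
    return start + task.minutes;
  };
  const ids = Object.keys(tasks);
  const criticalPathMinutes = Math.max(0, ...ids.map((id) => earliestFinish(id)));
  const totalWorkMinutes = ids.reduce((sum, id) => sum + tasks[id].minutes, 0);
  return { criticalPathMinutes, totalWorkMinutes };
}

const build = {
  schema: { minutes: 30, deps: [] },
  api: { minutes: 60, deps: ["schema"] }, // no API before the schema
  ui: { minutes: 45, deps: ["schema"] },
  uiTest: { minutes: 20, deps: ["api", "ui"] }, // no UI test before the API
};

console.log(criticalPath(build)); // prints { criticalPathMinutes: 110, totalWorkMinutes: 155 }
```

A single agent would need 155 minutes for this toy graph, yet even unlimited agents need 110, because schema → api → uiTest must run in order.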

Why “describe it and it's built” can't scale

The pitch doesn't remove the hard work — it hides where it lives. Three structural reasons it breaks down beyond toy apps.

PILLAR 1

Underspecification

A one-line idea maps to millions of valid apps. The hard part was always deciding precisely what to build. That ambiguity doesn't vanish — the AI either asks you (a requirements process) or guesses (the wrong app). The magic prompt conceals the problem; it doesn't solve it.

PILLAR 2

Essential vs accidental complexity

AI crushes accidental complexity — boilerplate, syntax, glue. But the essential complexity — the domain, constraints, edge cases, trade-offs — is irreducible (Brooks' “No Silver Bullet”). You can pay it faster, never skip it.

PILLAR 3

Verification doesn't compress

Even with perfect generation, you must confirm real behaviour across every state — errors, auth, concurrency, persistence, deploy. That means running & observing, bounded by real time, not token speed.

The demos dodge all three. A to-do app or landing page is well-trodden, low-ambiguity, and sits squarely in the training data — the model recalls a memorised pattern, it doesn't reason about your novel requirements. Scale up or go original and the illusion collapses into the loop above.

The advertised promise

Idea→✨→Done

Assumes typing was the bottleneck
No specification, no iteration
Correctness taken on faith
Works only for memorised toy apps

The engineering reality

Frame→Specify→Generate→Verify→Correct→Ship

Requirements & verification dominate the cost
Hundreds of validate-and-fix cycles
Correctness is proven, not assumed
Holds up for real, novel systems

Days in autonomous mode isn't the system being slow. It's the system being honest about where software actually gets built.

The Autonomous Loop

Two agents, one feedback cycle

Codex claims work from the queue and writes code. On completion the system auto-queues a validation task for Claude. Pass → next task. Needs changes → a fix task flows back to Codex.

Implementer

Codex

agent type codex · caps: code · implement · fix

implement: Write new features & modules
update: Modify existing code
refactor: Improve structure & quality
bugfix: Fix specific defects
fix: Apply Claude's review findings
phase_validate: Run build · typecheck · lint

Codex → code submitted → Claude · Claude → fix / needs_changes → Codex

Validator

Claude

agent type claude · reviews correctness · style · security

validate: Review each code change
passed ✓: Mark task done, advance spec
needs_changes: Emit findings → new fix work
error analysis: Diagnose failed tasks & suggest fixes

The agents communicate through the agent_work_queue and agent_messages tables, and a state machine drives the handoff.
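The handoff can be sketched as a small state machine over queue items. This is an illustrative in-memory model: the field names (`type`, `agent`, `status`, `parentId`) and the `enqueue` / `claim` / `complete` functions are assumptions standing in for rows in agent_work_queue, not its real schema.

```javascript
// In-memory stand-in for agent_work_queue; field names are hypothetical.
const queue = [];
let nextId = 1;

function enqueue(type, agent, payload, parentId = null) {
  const item = { id: nextId++, type, agent, status: "pending", payload, parentId };
  queue.push(item);
  return item;
}

// An agent claims the oldest pending item addressed to it.
function claim(agent) {
  const item = queue.find((w) => w.agent === agent && w.status === "pending");
  if (item) item.status = "in_progress";
  return item ?? null;
}

// Drives the Codex -> Claude -> (done | fix) cycle; returns the follow-up item, if any.
function complete(item, result = {}) {
  item.status = "done";
  const codeTypes = ["implement", "update", "refactor", "bugfix", "fix"];
  if (item.agent === "codex" && codeTypes.includes(item.type)) {
    // Finished code always gets an automatic Claude review.
    return enqueue("validate", "claude", { reviewOf: item.id }, item.id);
  }
  if (item.agent === "claude" && item.type === "validate") {
    if (result.verdict === "passed") return null; // task done, spec advances
    if (result.verdict === "needs_changes") {
      // Review findings flow back to Codex as a new fix task.
      return enqueue("fix", "codex", { findings: result.findings ?? [] }, item.id);
    }
    throw new Error(`unknown verdict: ${result.verdict}`);
  }
  return null;
}
```

Linking each follow-up item to its `parentId` keeps the whole implement → validate → fix chain traceable, which is what a drift detector needs to count repeated failures on the same task.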

Safety System

Guardrails keep the agents inside the lines

Enforced at the watcher (codex-watcher/config.js), per-spec in the database, and continuously by the drift detector.

Max files / task

A single task may touch at most maxFilesPerTask (~20) files before requiring approval.

Max lines changed

maxLinesChanged (~1000) caps the diff size of any one task to keep changes reviewable.
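A per-task size check could look like the sketch below. The limits mirror the approximate values quoted above; the counts are parsed from `git diff --numstat` output (one `added<TAB>deleted<TAB>path` line per file, with `-` for binary files), while `parseNumstat` and `checkTaskSize` are hypothetical names, not the watcher's code.

```javascript
// Hypothetical guardrail check; limits mirror the ~20 files / ~1000 lines quoted above.
const LIMITS = { maxFilesPerTask: 20, maxLinesChanged: 1000 };

// Parses `git diff --numstat` output: "<added>\t<deleted>\t<path>" per line.
// Binary files report "-" for both counts; they count as files but not as lines.
function parseNumstat(text) {
  return text.split("\n").filter((l) => l.trim() !== "").map((line) => {
    const [added, deleted, ...rest] = line.split("\t");
    const n = (v) => (v === "-" ? 0 : Number(v));
    return { path: rest.join("\t"), added: n(added), deleted: n(deleted) };
  });
}

function checkTaskSize(files, limits = LIMITS) {
  const linesChanged = files.reduce((sum, f) => sum + f.added + f.deleted, 0);
  const violations = [];
  if (files.length > limits.maxFilesPerTask) {
    violations.push(`touches ${files.length} files (max ${limits.maxFilesPerTask})`);
  }
  if (linesChanged > limits.maxLinesChanged) {
    violations.push(`changes ${linesChanged} lines (max ${limits.maxLinesChanged})`);
  }
  // Over either limit: pause for approval or split the task instead of applying it.
  return { ok: violations.length === 0, linesChanged, violations };
}
```

Whether a violation blocks the task outright or routes it to a human is a policy choice; the text above ties the file limit to approval.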

Failure flag

Drift detector flags a task after maxFailuresBeforeFlag = 3 failed attempts — no runaway fix loops.

Stall threshold

Flags the run when staleWorkThreshold = 5 items pass with no real progress.
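A minimal sketch of the two drift signals, using the thresholds stated above (maxFailuresBeforeFlag = 3, staleWorkThreshold = 5). What counts as "real progress" is not defined here, so the sketch assumes it means a work item passing; the `DriftDetector` class and its `record` method are hypothetical.

```javascript
// Hypothetical drift detector using the thresholds named above.
class DriftDetector {
  constructor({ maxFailuresBeforeFlag = 3, staleWorkThreshold = 5 } = {}) {
    this.maxFailuresBeforeFlag = maxFailuresBeforeFlag;
    this.staleWorkThreshold = staleWorkThreshold;
    this.failures = new Map(); // taskId -> failed attempts since its last pass
    this.itemsSinceProgress = 0;
    this.flags = [];
  }

  // Call once per processed work item. `passed` is the assumed progress signal.
  record(taskId, passed) {
    if (passed) {
      this.failures.delete(taskId);
      this.itemsSinceProgress = 0;
      return;
    }
    const count = (this.failures.get(taskId) ?? 0) + 1;
    this.failures.set(taskId, count);
    if (count === this.maxFailuresBeforeFlag) {
      // Flag once, at the threshold, instead of letting the fix loop run away.
      this.flags.push({ kind: "task_failures", taskId, count });
    }
    this.itemsSinceProgress += 1;
    if (this.itemsSinceProgress === this.staleWorkThreshold) {
      this.flags.push({ kind: "stalled_run", items: this.itemsSinceProgress });
    }
  }
}
```

The two counters are deliberately independent: one catches a single task looping, the other catches a run that keeps busy across many tasks without anything passing.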

Blocked paths

Agents can never write to secrets or keys:

.env · .env.*
*.key · *.pem
secrets/** · credentials/**
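One way to enforce those patterns is to compile each glob to a regular expression and test every path a task writes. The glob rules here are a simplified assumption (`*` stays within one path segment, `**` spans directories, and slash-free patterns match the file's basename at any depth, as in .gitignore); the watcher's actual matcher may differ, and `blockedBy` is a hypothetical name.

```javascript
// Blocked patterns from the list above.
const BLOCKED = [".env", ".env.*", "*.key", "*.pem", "secrets/**", "credentials/**"];

// Simplified glob -> RegExp: "**" spans directories, "*" stays inside one segment.
function globToRegExp(glob) {
  let re = "";
  for (let i = 0; i < glob.length; i++) {
    const c = glob[i];
    if (c === "*" && glob[i + 1] === "*") { re += ".*"; i++; }
    else if (c === "*") re += "[^/]*";
    else re += c.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
  }
  return new RegExp(`^${re}$`);
}

const BLOCKED_RES = BLOCKED.map((g) => ({ glob: g, re: globToRegExp(g), anyDir: !g.includes("/") }));

// Returns the first matching pattern, or null if the write is allowed.
function blockedBy(filePath) {
  const path = filePath.replace(/\\/g, "/").replace(/^\.\//, ""); // normalise separators
  const base = path.slice(path.lastIndexOf("/") + 1);
  for (const { glob, re, anyDir } of BLOCKED_RES) {
    // Slash-free patterns match the basename at any depth; others match from the repo root.
    if (re.test(anyDir ? base : path)) return glob;
  }
  return null;
}
```

Returning the matching pattern rather than a bare boolean lets the watcher report exactly which rule stopped a write.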

Requires approval

Sensitive actions pause for a human:

Deleting files
Editing package.json
Editing lock files

Semi-autonomous checkpoints

Optional human approval at:

pre_implement · post_implement
critical_change (>5 files / sensitive)
pre_deploy
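The approval rules and the critical_change threshold can be combined into one classifier. This sketch assumes a task reports its planned file operations as `{ op, path }` objects; the operation names, the lock-file list and `approvalReasons` are hypothetical, while the three sensitive actions and the >5-file threshold come from the lists above.

```javascript
// Lock files treated as sensitive (assumed list covering common package managers).
const LOCK_FILES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "npm-shrinkwrap.json"]);
const CRITICAL_FILE_COUNT = 5; // critical_change fires above this many files

// ops: [{ op: "create" | "modify" | "delete", path }] (hypothetical shape)
function approvalReasons(ops) {
  const reasons = [];
  for (const { op, path } of ops) {
    const base = path.slice(path.lastIndexOf("/") + 1);
    if (op === "delete") reasons.push(`deletes ${path}`);
    else if (base === "package.json") reasons.push(`edits ${path}`);
    else if (LOCK_FILES.has(base)) reasons.push(`edits lock file ${path}`);
  }
  const files = new Set(ops.map((o) => o.path)).size;
  // critical_change: more than 5 files, or any sensitive action.
  const critical = files > CRITICAL_FILE_COUNT || reasons.length > 0;
  return { requiresApproval: reasons.length > 0, critical, files, reasons };
}
```

In fully autonomous mode only `requiresApproval` would pause the run; the `critical` flag matters when the optional semi-autonomous checkpoints are switched on.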

Auto-Testing

Six layers of testing run automatically

The test runner queues and executes these as work flows through the pipeline. With autoFixOnFailure enabled, a failure spawns a fix task straight back to Codex.

Wiring test

Connectivity & integration sanity after each phase.

npm run test:wiring || npm run lint

Smoke test

Core functionality builds & runs.

npm run test:smoke

Debug check

No critical errors — lint & types are clean.

npm run lint && npm run typecheck

Startup test

Validates package.json, boots the app, watches for errors (30s).

spawn & observe · 30s timeout

Integration test

Opt-in via guardrail — cross-module behaviour.

npm run test:integration || npm test

E2E validation

Full end-to-end run before the pre_deploy gate.

npm run test:e2e || npm run test
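The `||` and `&&` chains above amount to a small fallback policy: a layer passes if any alternative passes, and an alternative passes only if all of its scripts pass. The sketch below encodes that policy plus the autoFixOnFailure hand-off, with the script runner injected so the logic can be tested without npm. `LAYERS`, `runLayers` and the fix-task shape are hypothetical; the startup check is left out because it observes a live process rather than running a script, and this version runs every layer and collects failures instead of stopping at the first.

```javascript
const { spawnSync } = require("child_process");

// Each layer: alternatives tried in order ("||"); scripts within one alternative must all pass ("&&").
// "npm test" and "npm run test" both run the "test" script.
const LAYERS = [
  { name: "wiring", anyOf: [["test:wiring"], ["lint"]] },
  { name: "smoke", anyOf: [["test:smoke"]] },
  { name: "debug", anyOf: [["lint", "typecheck"]] },
  { name: "integration", anyOf: [["test:integration"], ["test"]], optIn: true },
  { name: "e2e", anyOf: [["test:e2e"], ["test"]] },
];

// Default runner: `npm run <script>`; true when it exits 0.
function npmRun(script) {
  const res = spawnSync("npm", ["run", script], { stdio: "inherit", shell: process.platform === "win32" });
  return res.status === 0;
}

function runLayer(layer, run = npmRun) {
  for (const scripts of layer.anyOf) {
    // every() short-circuits like "&&"; returning on the first success mirrors "||".
    if (scripts.every((s) => run(s))) return { layer: layer.name, passed: true, via: scripts.join(" && ") };
  }
  return { layer: layer.name, passed: false };
}

function runLayers({ autoFixOnFailure = true, integration = false, run = npmRun } = {}) {
  const results = [];
  const fixTasks = [];
  for (const layer of LAYERS) {
    if (layer.optIn && !integration) continue; // integration is opt-in via guardrail
    const result = runLayer(layer, run);
    results.push(result);
    if (!result.passed && autoFixOnFailure) {
      fixTasks.push({ type: "fix", agent: "codex", reason: `${layer.name} tests failed` });
    }
  }
  return { results, fixTasks };
}
```

Recording `via` shows which alternative actually passed, so a layer that only "passed" through the lint fallback is visible rather than silently green.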

Browser-Level Verification · v1.3.2

Chrome UI auto-testing

When enabled per-repo, Code Easy launches a real Chrome (no manual debug port), drives it over the DevTools Protocol, and feeds console / JS / network errors and screenshots back into the autonomous fix loop — so the agents can verify the running UI, not just the source.

Toggle on

Per-repo switch in the Chrome tab gates everything (requests return 403 when it is off).

Launch Chrome

Spawns system Chrome with an isolated profile + debug port.

Navigate + connect

Opens the test URL & auto-connects the CDP debugger.

Capture signals

Console logs, JS exceptions, failed requests, screenshots.

Feed the loop

Critical errors → a fix task back to Codex.
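Step 5 implies a rule for which captured signals count as critical. A sketch under assumptions: the signal shape (`kind`, `level`, `text`, `url`, `status`) and the rule that uncaught exceptions, console errors, requests with no response and 5xx responses are critical, while warnings and other statuses are not, are illustrative choices rather than Code Easy's documented policy; `triageSignals` is a hypothetical name.

```javascript
// Hypothetical shape for captured browser signals:
//   { kind: "exception" | "console" | "network", level?, text, url?, status? }
function isCritical(sig) {
  if (sig.kind === "exception") return true; // uncaught JS error
  if (sig.kind === "console") return sig.level === "error";
  if (sig.kind === "network") return sig.status === undefined || sig.status >= 500; // no response, or server error
  return false;
}

// Collapses duplicate critical signals into one fix task for Codex, or null if none.
function triageSignals(signals, { testUrl, screenshot } = {}) {
  const seen = new Set();
  const critical = [];
  for (const sig of signals.filter(isCritical)) {
    const key = `${sig.kind}:${sig.text}`;
    if (!seen.has(key)) { seen.add(key); critical.push(sig); }
  }
  if (critical.length === 0) return null;
  return {
    type: "fix",
    agent: "codex",
    source: "chrome_ui_test",
    testUrl,
    screenshot, // e.g. a file path captured alongside the errors
    errors: critical.map((s) => (s.url ? `[${s.kind}] ${s.text} (${s.url})` : `[${s.kind}] ${s.text}`)),
  };
}
```

Deduplicating before creating the task matters because a single broken component often throws the same exception on every render.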

MCP tools

chrome_launch · chrome_close · chrome_test_status · chrome_get_errors · chrome_screenshot · chrome_evaluate

Per-repo config

enabled · debug_port (9222)
test_url · chrome_path
headless mode
stored in chrome_test_config

Safe by design

Isolated --user-data-dir
No puppeteer / playwright dep
Auto-closed on shutdown
Only runs if the repo opts in
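The per-repo config maps naturally onto a Chrome command line. A sketch, assuming the config row uses the field names listed above: the flags are standard Chromium switches (`--remote-debugging-port`, `--user-data-dir`, `--headless=new`, `--no-first-run`, `--no-default-browser-check`), while `buildChromeLaunch`, the fallback binary name and the error object are hypothetical. Once Chrome is running, its DevTools endpoint can be discovered over HTTP at `/json/version` on the debug port.

```javascript
const os = require("os");
const fs = require("fs");
const path = require("path");

// Builds a launch plan from a chrome_test_config-style row (field names as listed above).
function buildChromeLaunch(config) {
  if (!config.enabled) {
    // Mirrors the "403 if off" gate: nothing launches unless the repo opted in.
    const err = new Error("Chrome UI testing is disabled for this repo");
    err.status = 403;
    throw err;
  }
  const port = config.debug_port ?? 9222;
  // Fresh, throwaway profile so tests never touch the user's own Chrome data.
  const profileDir = fs.mkdtempSync(path.join(os.tmpdir(), "ui-test-profile-"));
  const args = [
    `--remote-debugging-port=${port}`,
    `--user-data-dir=${profileDir}`,
    "--no-first-run",
    "--no-default-browser-check",
  ];
  if (config.headless) args.push("--headless=new");
  if (config.test_url) args.push(config.test_url); // open the page under test on start
  return {
    binary: config.chrome_path || "google-chrome", // fallback binary name is an assumption
    args,
    profileDir,
    cdpVersionUrl: `http://127.0.0.1:${port}/json/version`,
  };
}
```

Launching is then a `child_process.spawn(plan.binary, plan.args)`; cleaning up on shutdown means killing that process and deleting `profileDir`, which is what keeps the isolated profile from accumulating.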

Living Documentation

Knowledge is captured as the project is built

As specs are processed and code lands, the project's intent and history are recorded through the MCP knowledge tools — so the docs grow with the codebase.

Requirements

Specs & acceptance criteria stored via store_requirement.

Decisions (ADRs)

Architecture choices logged with store_decision.

Plans & tasks

Every generated plan and task breakdown persists in the DB.

Session memory

Summaries via store_session_summary at session boundaries.

How it works today: documentation is MCP-assisted — requirements, decisions, plans and session summaries are captured through the knowledge tools and surfaced in the dashboard, rather than auto-written to files. Hooks nudge an agent to record a session summary at natural stopping points.
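For orientation, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request carrying the tool `name` and its `arguments`. The sketch below builds such requests for two of the knowledge tools named above; the argument fields are hypothetical, since their schemas are not given here, and `toolCall` is an illustrative helper rather than part of any MCP SDK.

```javascript
// Builds JSON-RPC 2.0 "tools/call" requests as used by the Model Context Protocol.
let rpcId = 0;

function toolCall(name, args) {
  return { jsonrpc: "2.0", id: ++rpcId, method: "tools/call", params: { name, arguments: args } };
}

// Argument shapes below are illustrative assumptions, not the tools' real schemas.
const decision = toolCall("store_decision", {
  title: "Use server-side sessions",
  context: "SSO provider issues short-lived tokens",
  decision: "Keep sessions in the API tier",
  consequences: "Needs a shared session store",
});

const summary = toolCall("store_session_summary", {
  summary: "Implemented login form; validation fix pending review",
});

console.log(JSON.stringify(decision));
```

Because every record goes through the same tool-call path, the dashboard can surface requirements, decisions and summaries from one store without scraping files.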
