Before the how, the why. AI isn't replacing developers — it's changing what developers do. The destination is a world where you define intent, and agents handle execution.
The core shift in AI-assisted development isn't about typing faster. It's about moving from maker to architect of intent. You stop writing every line and start defining what right looks like — then letting agents execute against that definition.
In the AI-SDLC, every stage of the development lifecycle has agent participation — guided by intent, constrained by harnesses, connected by dense handoff artefacts. Here's what a single ticket looks like at maturity:
Notice: you're involved at one step — reviewing the plan. Everything else is agents operating within the guardrails you've built. That's the destination.
The vision rests on three ideas. Every tab on this site teaches you how to build one of them.
Communicate goals, constraints, and success criteria — not exact words. Define the shape of the output, not the output itself. Move from crafting prompts to architecting agent interactions.
Don't tell agents what to do — make wrong things impossible. Tests, compilers, and linters are physics: structural constraints the agent can't ignore. Prompts are law: suggestions it might break.
Compress hours of exploration into structured documents that carry intent forward. research.md, plan.md, handoff.md — each phase produces an artefact that feeds the next, so no context is wasted.
There's a specific moment in the journey where everything comes together. It looks like this:
That's three layers of physics catching three different classes of error. The harness is the teacher, not you. That's what you're building towards.
This isn't about AI taking your job. It's about your job becoming more impactful. The progression looks like this:
The rest of this site is a progressive guide from where you are now to the vision above. Each tab builds on the last:
Start with the Roadmap to see the full journey at a glance with interactive checklists, or jump to any tab that matches where you are right now. Every stage delivers value on its own — you don't need to reach Stage 5 to benefit.
From prompting to intent engineering to physics thinking — the three layers that make AI agents reliable. This is the conceptual foundation everything else builds on.
Prompting is the execution layer. Context engineering is about what the agent knows. Intent engineering is about how the goal is structured — workflows, boundaries, and destinations.
How the goal is structured. Workflows, boundaries, destinations. Architecting the machine interaction.
What the agent knows. RAG, vector databases, memory. Supplying the raw material.
Execution layer. The words you write. Necessary but not sufficient.
Laws are instructions — they can be broken. Physics are structural constraints — they can't. Intent engineering encodes your goals as physics.
The rule: If a human has to manually check something, the harness is incomplete. Encode every compliance check as an automated constraint.
Six techniques that make your interactions with AI agents dramatically more effective. Try them, then ask yourself: why does this work? Understanding the mechanism matters more than memorising the trick.
These aren't magic incantations — they're structural patterns that exploit how language models reason. Each one changes the shape of the agent's thinking, not just the words you feed it. Once you see the pattern, you'll start combining them instinctively.
Notice the common thread across all six tactics. Each one changes the structure of the interaction, not just the words. They shape what the agent pays attention to, how it reasons, and what it produces.
This is the bridge to intent engineering. These tactics work because they change the structure of the interaction. Intent engineering takes the same idea further — encoding structure into workflows, harnesses, and artefacts rather than individual prompts. Start here, then level up on the next tabs.
Open Copilot Chat in your IDE and try this sequence. It chains four tactics in one interaction:
"I need to add a caching layer to our API. Please ask me 5 questions to understand what I need before suggesting an approach."
Visualisation: After answering the questions, ask: "Now create a Mermaid sequence diagram showing the cache hit and cache miss flows."
Visualisation"If I showed this design to a senior backend engineer, a security engineer, and a DBA — what would each of them say?"
Group Simulation"Rate the design out of 10 based on their feedback. If it isn't a 10, revise it."
The six tactics above are intuitive moves. These four frameworks give you repeatable scaffolding for prompts — each one builds on the last. Start with APE for quick tasks, graduate to COAST for complex scenarios. The 4S Framework (Single, Specific, Short, Surround) underpins all of them.
The simplest structured framework. Three components that turn a vague ask into a targeted prompt. Use this when you know what you want but need to be precise about it.
Action: State the specific task you need performed.
Purpose: Give context about the goal behind the action.
Expectation: Define format, length, style, or constraints.
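For instance, an APE prompt for a small coding task might read like this (the wording is illustrative):

```markdown
Action: Write a C# extension method that truncates a string to a maximum length.
Purpose: We log raw user input and need to cap log line size.
Expectation: One static class, XML doc comments, no external dependencies, plus a unit test.
```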
Adds a Role dimension — assigning an expertise persona activates domain-specific vocabulary and patterns. Use when the task benefits from a specialist’s perspective.
Role: Assign a specific expertise.
Action: Specify exactly what to produce.
Context: Tech stack, constraints, dependencies.
Expectation: Quality standards and deliverables.
Five components for complex, multi-step work. COAST forces you to think about edge cases and acceptance criteria up front — it maps directly to Agent Mode task decomposition.
The most intent-aligned framework. Start with the end state, provide the landscape, point to authoritative sources, and define quality standards. Maps directly to copilot-instructions.md and Copilot Workspace goals.
Goal: Start with the end state, not the process.
Context: Architecture, dependencies, constraints.
Sources: Reference authoritative material for grounding.
Expectations: Quality and format standards.
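A GCSE prompt for the cart extraction might look like this (a hypothetical sketch reusing the nopCommerce example from later tabs):

```markdown
Goal: All steel-thread cart logic lives in a focused CartService, fully covered by tests.
Context: .NET e-commerce app; ShoppingCartService is 1,976 lines with 31 dependencies.
Sources: docs/adc/ decision records; the existing Playwright E2E suite.
Expectations: No public API changes; dotnet test and npx playwright test stay green.
```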
Progression, not competition. APE → RACE → COAST → GCSE isn't about picking the "best" framework. It's about escalating precision as complexity increases. A quick utility function? APE. A cross-service architectural change? GCSE. Match the framework to the task.
Good frameworks get you close. These two operational patterns close the remaining gap — grounding the AI with curated inputs and systematically fixing broken outputs.
A short, explicit bundle of information given to an AI before asking it to summarise, plan, refactor, or make decisions. Reduce hallucinations by constraining what the AI works with.
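A sketch of such a bundle for a planning task (contents are hypothetical; the point is the explicit boundary):

```markdown
## Inputs (use ONLY these)
- research.md: the compressed map of the add-to-cart flow
- Constraint: no database schema changes
- Constraint: public API contracts are frozen

## Task
Produce an ordered implementation plan with a verification command per step.

## If information is missing
Say so explicitly. Do not fill gaps from general knowledge of similar apps.
```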
When AI output is poor, don't restart from scratch — diagnose the failure mode and apply the targeted fix.
As models get smarter, they prioritise helpfulness (guessing) over truthfulness (admitting ignorance). The output is fluent, so we stop checking. This is automation bias — and it's the single biggest risk in AI-assisted development. These structural rules make hallucinations visible.
AI accelerates each step, but humans control correctness. Same loop good engineers already follow — AI changes the speed, not the discipline.
For developers, hallucination manifests as code that looks idiomatic but calls non-existent APIs, uses deprecated patterns, or introduces subtle logic bugs. For product managers, it manifests as plausible-sounding but incorrect feature specs. It reads like expertise.
The risk: because the output is fluent, we stop checking. Trust is lost the moment a hallucination breaks a build or ships a wrong spec.
Don't rely on the agent's "morals" or instructions — that's Law. Build the interaction's architecture so inaccuracy is structurally difficult — that's Physics.
Remember the Reflection tactic — "rate your response out of 10"? It works brilliantly for creative drafts where the rating triggers an improvement loop. The model can genuinely improve a poem on its second pass.
But for extraction, logic, and code correctness, the confidence score comes from the same process that produced the error. If a model hallucinates a variable name, it will confidently rate itself "9/10" on that answer.
In humans, expertise produces expert language. In LLMs, expert language produces expertise. Forcing vocabulary like EXTRACTED and INFERRED isn't just metadata — it restricts predictions to expert regions of the training data, shifting the mean toward quality and reducing variance.
This is the fundamental mechanism behind all structural rules: by constraining the form of the output, you constrain the quality of the reasoning.
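A minimal sketch of what forcing that vocabulary can look like in a prompt (the EXTRACTED/INFERRED labels come from above; the rule wording is illustrative):

```markdown
For every claim in your answer, prefix a provenance tag:
- EXTRACTED: stated verbatim in the supplied context (cite file:line)
- INFERRED: your deduction; give the one-sentence reasoning behind it
- UNKNOWN: not determinable from the context. Never guess.
```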
Document what exists. Reverse-engineer the intent that was never written down. Engineer the context so agents get curated knowledge, not raw dumps.
Prompt engineering treats the prompt as the product. Intent engineering treats the entire system — prompts, structure, feedback loops, constraints — as the product.
Intent engineering applies at every level — from vision through to operations — and spans all your workstreams. Each layer encodes intent in a different way.
A prompt says "please don't break the cart." A test suite says "the cart works or you don't proceed." Both communicate intent, but only one enforces it.
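Here is a minimal sketch of intent encoded as physics: an xUnit test with a hypothetical in-memory CartService so the example compiles on its own. Your real service and assertions will differ.

```csharp
using System.Threading.Tasks;
using Xunit;

// Minimal stand-in so the sketch is self-contained; not the real service.
public class CartService
{
    private decimal _total;

    public Task AddToCartAsync(int productId, int quantity, decimal unitPrice)
    {
        _total += quantity * unitPrice;
        return Task.CompletedTask;
    }

    public Task<decimal> GetTotalAsync() => Task.FromResult(_total);
}

public class CartPhysicsTests
{
    [Fact]
    public async Task AddToCart_Total_Reflects_Item_Price()
    {
        var cart = new CartService();
        await cart.AddToCartAsync(productId: 42, quantity: 1, unitPrice: 42.99m);

        // If an agent breaks cart pricing, this fails in CI.
        // A structural constraint, not an instruction the agent might ignore.
        Assert.Equal(42.99m, await cart.GetTotalAsync());
    }
}
```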
Think of it this way: Prompt engineering is writing a good brief. Intent engineering is writing a good brief, hiring a QA team, setting up CI/CD, and defining acceptance criteria — all encoded into the system the AI operates within.
Past roughly 40% context usage, AI reasoning degrades sharply — the model becomes confident but wrong. Every token you waste on irrelevant context is a token the agent can't use for thinking. This is why curated context beats raw dumps.
Use LSP over grep for precise queries. Compress findings into dense artefacts. Start each RPI phase in a fresh context window. Every token should earn its place.
Here's what "use LSP over grep" actually looks like. These are real queries from the nopCommerce steel thread — the same ones that found critical gaps grep missed.
```
// LSP: textDocument/references on AddToCartAsync
// Returns: 14 callers across 8 files, with exact locations
ShoppingCartController.cs:617  AddProductToCart_Catalog()
ShoppingCartController.cs:680  AddProductToCart_Details()
CheckoutController.cs:243      MigrateCart()
OrderProcessingService.cs:891  ReOrder()
// ... 10 more — grep found only 6 of these
```
```
// LSP: textDocument/documentSymbol on ShoppingCartService.cs
// Returns: 62 symbols — 31 fields, 1 constructor, 27 methods
Fields (31):  _catalogSettings, _aclService, _customerService ...
Methods (27): AddToCartAsync, GetShoppingCartAsync,
              DeleteShoppingCartItemAsync,
              FindShoppingCartItemInTheCartAsync ...
// One query = complete class anatomy. No file-reading tokens spent.
```
```
// LSP: typeHierarchy/subtypes on IShoppingCartService
// Returns: implementation chain with exact locations
IShoppingCartService (interface, 22 methods)
└─ ShoppingCartService (src/Libraries/Nop.Services/Orders/)
// Confirms: single implementation, safe to extract subset
// Compare: grep for "IShoppingCartService" returns 47 matches,
// including imports, comments, and XML docs — all noise
```
How to enable LSP: In VS Code, the language server runs automatically. In agent workflows (Claude Code, Copilot agent mode), use MCP servers like @anthropic/lsp-mcp to give agents LSP access. The key queries are find-references, document-symbols, and type-hierarchy — these three cover 90% of research needs.
Pick one vertical slice. Apply RPI: Research with LSP, Plan with risk registers, Implement with feedback loops. Prove the methodology on your codebase.
AI agents have limited context windows. RPI splits work into phases, each operating in a fresh context with only the artefacts it needs.
Explore the codebase freely. No implementation, no planning — pure discovery. Trace flows, find files, document connections.
Read only research.md. Produce ordered implementation steps, interface definitions, risk areas, and test strategies.
Read only plan.md. Execute each step, running the full test suite after every change. Don't proceed until green.
Information diets: Each phase gets only what it needs. The Plan agent never sees raw code — only compressed research. The Implement agent never sees research — only the ordered plan. This prevents context waste and forces density.
The width of each bar shows how much information each agent receives. Less is more — constrained input forces focused output.
Break complex work into phases, route each to a fresh context with exactly the artefacts it needs, then recompose the results. This works with context window limitations instead of fighting them.
Don't tell the agent what to do — make wrong things impossible. A failing test is worth a thousand prompt instructions. Structural constraints are your most reliable form of intent communication.
The test suite, the compiler, the linter — these are your real safety net. A mediocre model with a great harness outperforms a brilliant model with no guardrails. Invest in the harness first.
nopCommerce is a 200k+ line .NET e-commerce platform. The steel thread: the shopping cart flow from "Add to cart" through controllers and services to the database. One vertical slice through the entire architecture.
Reverse-engineering documentation from an existing codebase is the most common starting point for teams adopting AI-SDLC. Here's the methodology, using the nopCommerce cart flow as a worked example.
1. Research with LSP. Use find-references on the controller action to discover every caller. Use document-symbols to map the service class (e.g., 62 symbols in ShoppingCartService.cs — 31 fields, 27 methods). Use type-hierarchy to find all implementations. Record exact file paths and line numbers.
2. Compress into research.md: Entry Point → Controller Layer → Service Layer → Data Access → Observations. Include caller counts, dependency lists, and blocking conditions. The nopCommerce research.md maps the full add-to-cart flow in ~2k lines — down from 200k.
3. Plan from research.md. Produces ordered steps: create interface → create implementation (copy, don't move) → register in DI → delegate from original → update controller. Each step ends with a verification command (dotnet build, dotnet test, npx playwright test).
4. Implement inside the harness: dotnet format --verify-no-changes && dotnet test && npx playwright test. The harness catches type errors (compiler), logic regressions (unit tests), and broken UI flows (Playwright). The agent self-corrects from error messages. No human code review needed during implementation.

Why this works for brownfields: You're not asking AI to understand your whole codebase. You're giving it a compressed, verified map of one narrow flow — and a harness that catches mistakes structurally. The steel thread proves the methodology; then you repeat it for the next flow.
Understand the codebase, the steel thread, and the key files. Context-setting for humans before agents enter the picture.
Harnesses > Models. Write Playwright E2E tests and unit tests before any refactoring. The harness defines "correct" structurally. This is the most important step.
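A sketch of what a harness-first E2E test can look like with Playwright for .NET (URL and selectors are hypothetical; nopCommerce's real markup will differ):

```csharp
using System.Threading.Tasks;
using Microsoft.Playwright;
using Xunit;

public class AddToCartE2ETests
{
    [Fact]
    public async Task AddToCart_Shows_Item_In_Cart()
    {
        using var playwright = await Playwright.CreateAsync();
        await using var browser = await playwright.Chromium.LaunchAsync();
        var page = await browser.NewPageAsync();

        // Hypothetical local deployment and selectors; adjust to your app.
        await page.GotoAsync("http://localhost:5000/");
        await page.ClickAsync("text=Add to cart");
        await page.GotoAsync("http://localhost:5000/cart");

        // The harness defines "correct": the item must appear in the cart.
        await Assertions.Expect(page.Locator(".cart-item")).ToHaveCountAsync(1);
    }
}
```

Write these before any refactoring. Green tests define the contract that every later agent change must preserve.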
RPI: Research. Agent explores the add-to-cart flow end-to-end. Produces a dense research.md with every file, method, and connection documented.
RPI: Plan. Fresh agent reads only research.md. Produces an ordered plan with interface definitions, DI registration, and a risk register.
RPI: Implement + Physics. Fresh agent follows the plan step-by-step, running the full test suite after each change. The harness catches mistakes; the agent learns from errors.
Three moments where you see the methodology working in real time:
Physics Thinking. The agent introduces a subtle type mismatch. The C# compiler refuses to build. The agent reads the error, fixes the type, moves on. No human intervention.
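The kind of mistake the compiler catches, as a minimal sketch (the types are hypothetical):

```csharp
using System.Threading.Tasks;

public interface ICartService
{
    Task<decimal> GetTotalAsync(int cartId);
}

public class CartService : ICartService
{
    // Deliberately wrong: the build fails because Task<double> does not
    // match the interface's Task<decimal>. The agent reads this error and
    // must fix the type before anything else can proceed.
    public Task<double> GetTotalAsync(int cartId) => Task.FromResult(0.0);

    // Fix: public Task<decimal> GetTotalAsync(int cartId) => Task.FromResult(0m);
}
```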
Harnesses > Models. The agent refactors the cart service and accidentally changes calculation order. "Expected 42.99, got 38.50." Diagnosed and fixed.
All Three Together. The service works, but a missing DI registration throws a 500 error. Playwright catches it by clicking "Add to cart." End-to-end verification.
✓ Research before code — agent explored before touching anything
✓ Plan before implementation — ordered steps with risk analysis
✓ Harnesses before refactoring — tests existed before changes
✓ Feedback loops — test suite ran after every change
✓ Dense handoffs — research.md and plan.md carried intent forward
✓ Physics enforcement — compiler + tests caught real errors
✓ No manual checking — if the harness is green, the refactoring is correct
The takeaway: The model doesn't matter as much as the methodology. A well-structured system — with phases, harnesses, and feedback loops — produces reliable results regardless of which AI you use.
Stop relying on prompts. Build structural enforcement — tests, linters, compilers — that make wrong agent behaviour impossible. These are the patterns that encode intent as physics.
Define the shape of the output: required sections, detail level, format, and location. The agent fills in the scaffold.
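A sketch of an output scaffold for the research phase (section names follow the research.md structure used elsewhere in this guide; the wording is illustrative):

```markdown
Write your findings to research.md with exactly these sections:
## Entry Point       route, controller action, file:line
## Controller Layer  methods touched, callers, guards
## Service Layer     dependencies, blocking conditions
## Data Access       tables, queries, side effects
## Observations      risks, surprises, open questions
One line per finding. Cite file:line for every claim. No prose paragraphs.
```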
Explicit negative instructions define the safe operating space. They prevent the most common failure modes.
```markdown
## Constraints
- Do NOT modify the database schema
- Do NOT change the plugin architecture
- Do NOT proceed to the next step until all tests pass
- Focus only on the shopping cart steel thread
```
Predict what could go wrong and map each risk to the harness that catches it. This transforms "be careful" into "here's which test will fail."
- Critical — App fails to start. All Playwright tests fail immediately.
- Critical — InvalidOperationException. Caught by any E2E test.
- High — Unit test asserts "Expected 42.99, got 38.50."
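In plan.md this becomes a small table. A hypothetical sketch (the risk names are invented to match the failures above):

```markdown
| Risk                            | Severity | Caught by                                     |
|---------------------------------|----------|-----------------------------------------------|
| DI registration missed          | Critical | App fails to start; all Playwright tests fail |
| Interface contract drift        | Critical | InvalidOperationException in any E2E test     |
| Price calculation order changed | High     | Unit test: "Expected 42.99, got 38.50"        |
```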
Build mandatory checkpoints into the workflow. The error message itself becomes a remediation instruction.
1. Execute one step from the plan.
2. Run the harness: dotnet test + Playwright.
3. Red? Read the error, diagnose, fix, re-run.
4. Green? Proceed to the next step.
Why this works: "Expected 42.99, got 0" is infinitely more useful than "please make sure the cart logic is correct." The harness teaches the agent what's wrong in machine-readable terms.
Give each phase only what it needs. Constrained input forces focused output. See the Methodology tab for the full visualisation.
Research: Full codebase access. Explore freely, compress findings into research.md.
Plan: Only research.md. No raw code. Forces planning from compressed summary.
Implement: Only plan.md + codebase. Follows the plan. Doesn't re-research or re-plan.
A progressive roadmap from first Copilot prompt to fully agentic workflows. Each stage builds on the last — check the boxes as you go.
The AI-SDLC is a software development lifecycle where AI agents participate at every stage — from discovery through deployment — guided by intent engineering, constrained by structural enforcement (physics), and connected by dense handoff artefacts.
It isn't a new process bolted onto what you already do. It's a progressive evolution of your existing SDLC: first AI assists you, then augments you, then orchestrates alongside you, and eventually runs autonomously with you setting intent and reviewing outcomes. The five stages below map this journey.
Get comfortable with Copilot's core features. Build muscle memory for the basics before adding complexity. This is where everyone starts.
You can use Copilot as a fast, reliable code companion. You know how to ask good questions and you always verify the answers.
Before AI can help you change a codebase, it needs to understand it. This stage is about reverse-engineering the intent that was never written down — architecture, conventions, business rules, decisions.
```markdown
# Project: [Your Project Name]

## Architecture
- Framework: .NET 10 / React / [yours]
- Pattern: N-tier with service layer
- Key entry points: Controllers → Services → Repositories

## Coding Standards
- Async/await throughout; suffix Async on async methods
- Interfaces for all services; register in DI container
- No magic strings — use constants or enums

## Constraints
- Do NOT modify the database schema
- Do NOT change public API contracts
- Do NOT proceed until all tests pass

## Verification Commands
dotnet format --verify-no-changes   # Style
dotnet test                         # Unit + integration
npx playwright test                 # E2E

## Steel Thread
The add-to-cart flow: Browse → Product → Add → View Cart
Files: ShoppingCartController.cs, ShoppingCartService.cs
```
Save it as .github/copilot-instructions.md for Copilot or CLAUDE.md at the repo root for Claude. This file is the single most impactful thing you can create — it turns every agent interaction from cold-start to context-aware. Then record architectural decisions alongside the RPI artefacts that produced them:

```
docs/adc/
├── YYYY-MM-DD--decision-name.md    ← Decision record
└── YYYY-MM-DD--decision-name/      ← RPI artefacts
    ├── research.md                 ← Phase 1 output
    ├── plan.md                     ← Phase 2 output
    └── handoff.md                  ← Agent context for Phase 3
```

```markdown
## Decision Record Template
Title:      Extract CartService from ShoppingCartService
Date:       2026-03-31
Status:     Proposed | Accepted | Implemented | Rejected
Motivation: ShoppingCartService is 1,976 lines with 31 deps
Approach:   Extract steel thread methods into focused service
Rejected:   Split by CRUD (too granular), rewrite (too risky)
Rollback:   Delete CartService, revert delegation in original
Harness:    dotnet build + dotnet test + npx playwright test
```
In the worked example, research.md maps the full add-to-cart flow (UI → Controller → Service → Data) with precise file paths and line numbers. The plan.md contains 7 ordered steps, each with a verification command. The handoff.md gives the implement agent just enough context — steel thread scope, key selectors, and harness commands — without re-explaining the research.

Your codebase has agent-readable context. Copilot understands your conventions, architecture, and constraints. You're no longer starting from zero every prompt.
This is where you stop relying on prompts and start building structural enforcement. Tests, linters, and CI pipelines become the physics that make wrong agent behaviour impossible. You're closing the gaps in code coverage that make AI unreliable.
Your codebase has physics. AI agents can make changes and get immediate, structural feedback. Wrong behaviour is caught automatically, not by code review.
Pick one narrow, vertical slice through your entire architecture — a "steel thread." Apply the full RPI methodology: Research with LSP, Plan with risk registers, Implement with feedback loops. This is your proof-of-concept that the methodology works on your codebase.
You've proven the methodology works on your actual codebase. You have a repeatable playbook, reusable skills, and concrete results to share with your team.
Graduate your proven patterns from individual to team infrastructure. Shared skills repos, pipeline agents, PR review bots, automated research phases. The methodology that worked on your machine now runs in your CI/CD pipeline.
AI agents are team infrastructure. Your pipeline researches, plans, implements, and verifies — with humans setting intent and reviewing outcomes. The AI-SDLC is operational.
Each stage unlocks the next. You can't build physics without capturing intent first. You can't run a steel thread without physics. And you can't scale what you haven't proven. The progression is deliberate.
Start where you are. Most teams are somewhere between Stage 1 and Stage 2. That's fine. The roadmap isn't a race — it's a progression. Each stage makes the next one possible, and each stage delivers value on its own.
By Stage 5, AI participates in every phase of the SDLC. But even at Stage 1, it's accelerating your work. The difference is scope and trust.
The deepest change isn't in tooling — it's in mindset. As you progress through the stages, your role evolves from writing code to defining what correct looks like.
Graduate your proven patterns from individual to team infrastructure. Shared skills, pipeline agents, and the AI-native SDLC.
AI adoption in software development follows a progression. Each wave changes the ratio of humans to agents — and the kind of intent engineering required.
AI as autocomplete. You drive, it suggests. Code completion, inline help, simple Q&A. The human does all the thinking.
You direct agents to do whole tasks. Local skills, RPI methodology, personal agent workflows. This is where intent engineering begins.
Shared skills repos, pipeline agents, PR review bots. Agents become team infrastructure. Humans supervise.
Agents coordinating agents at scale. Humans set intent and review outcomes. The SDLC runs itself.
You don't jump straight to pipeline agents. The adoption path starts with you — proving the patterns locally — then graduating them to shared infrastructure.
Build your own skills, refine your own agent workflows, prove the RPI methodology on real tasks. This is your lab — experiment, iterate, learn what works. You're building muscle memory for intent engineering.
Once your skills and workflows are proven, promote them to shared infrastructure. A team skills repo means everyone benefits from your hard-won patterns. Shared CLAUDE.md files encode team conventions as agent-readable intent.
Agents move from your terminal into the CI/CD pipeline. PR review agents check for style, test coverage, and architectural compliance. Research agents pre-analyse tickets. Plan agents draft implementation approaches before a human even starts.
The full AI-SDLC: agents that research, plan, implement, and verify — with humans setting intent and reviewing outcomes. The RPI methodology you proved locally is now an organisational capability. Custom Skills encode your org's patterns as portable, reusable knowledge.
A skill is a self-contained folder — a SKILL.md plus any bundled scripts, references, and examples. Skills are how individual knowledge becomes team capability.
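A sketch of what such a folder might look like (the layout is illustrative; SKILL.md is the conventional anchor):

```
skills/rpi-research/
├── SKILL.md            ← when to use it, inputs, outputs
├── scripts/
│   └── lsp-queries.md  ← find-references / document-symbols recipes
└── examples/
    └── research.md     ← a worked research artefact
```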
The industry is converging. GitHub's Custom Skills, Copilot's Modernization Agent, the Agentic Context Framework — they all use the same pattern: self-contained skill folders with documentation, scripts, and examples. Your RPI methodology can be packaged as a skill. Your ADC pattern can be a skill. This is how methodology becomes infrastructure.
In the AI-SDLC, every stage of the development lifecycle has agent participation — guided by intent, constrained by harnesses, connected by dense artefacts.
The key insight: The same intent engineering patterns that make agents reliable on your machine — phased workflows, dense artefacts, physics-based enforcement — are exactly what make them reliable in a pipeline. The AI-SDLC isn't a different methodology. It's the same methodology, graduated from individual to infrastructure.
The most common question: "Where does AI actually fit in our process?" This map shows every stage of the software development lifecycle with concrete AI capabilities you can use today.
AI isn't just for coding. It fits across the entire lifecycle — from planning through operations. The maturity varies by stage: coding is ready now, operations is still emerging. Here's the big picture.
Monday morning starter kit: You don't need all of this. Start with three things: (1) Create a .github/copilot-instructions.md with your coding standards. (2) Try agent mode on one real task. (3) Ask Copilot to generate tests for code you wrote this week. That's enough to change how your team works.
Focused, printable reference sheets distilled from the workshop material. Each answers a specific question you can hand to your team or your exec.
How AWS AI-DLC and GitHub Spec Kit handle structured planning. When to use each, the tools you have now, and why specs before code matters.
Set up copilot-instructions.md, custom skills, and MCP servers. Agent mode vs coding agent. Effective prompts for real coding work.
Test generation, test-first workflows, security scanning, automated PR review, custom review agents. Commands and example prompts.
Understanding legacy code with /explain, the steel thread pattern, agent mode refactoring, Dependabot + Copilot, and building the safety net.
Generate and fix workflow YAML, common pipeline patterns, infrastructure as code across providers, and custom CI review agents.
These are designed to be shared. Send a one-pager to your exec, print one for your squad, or use them as a reference during project time. Each stands alone — no prior context needed.