Harness Engineering · AI Agents · 2026
Everything that surrounds the AI model, except the model itself. The discipline that Anthropic, OpenAI, Martin Fowler and Karpathy converged on between November 2025 and April 2026. Agent = Model + Harness.
From prompt engineering to harness engineering: the 3 eras of AI engineering.
Prompt Engineering → tell the model WHAT to do (2022-2024)
Context Engineering → give the model WHAT to know (2025)
Harness Engineering → build WHERE it works (2026)
Agent = Model + Harness
Era 1
One prompt, one response. "Act as an expert in X." The art was in the instruction.
Era 2
Fill the context window with the right information: documents, history, tool definitions. Karpathy popularized the term "context engineering" in June 2025.
Era 3
Not what you tell it or what you give it. It's WHERE you make it work: restrictions, permissions, infrastructure, protocols, observability.
They all converged without coordinating. Different paths, same conclusion.
November 2025 · Anthropic
"Effective Harnesses for Long-Running Agents." Initializer agent + coding agent + claude-progress.txt. The model is the same — what changes is what surrounds it.
February 5, 2026 · Mitchell Hashimoto
Co-founder of HashiCorp, creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md — every line comes from a real agent error.
February 11, 2026 · OpenAI
7 engineers, 1 million lines of code, zero written by humans. The engineer's job was no longer to write code — it was to design the harness.
April 2026 · Martin Fowler
Formal taxonomy: Guides and Sensors, each computational or inferential. When Fowler organizes it, the industry adopts it. Happened with Refactoring, happened with Microservices.
April 2026 · Andrej Karpathy
Sequoia AI Ascent. Declares vibe coding dead. The replacement: agentic engineering, where 99% of the time you don't write code — you orchestrate agents.
There's no official, closed list of harness components. Each source organizes them differently, but the three views below complement each other.
Vision 1
| Fowler | Type | What we already had |
|---|---|---|
| Guide | Computational | Agent Skills: tool catalog defines what it can do BEFORE the agent reasons |
| Guide | Computational | CLAUDE.md: rules the agent reads before starting |
| Sensor | Computational | validate.sh: rejects malformed skills after building them |
| Sensor | Inferential | The Agent Teams Reviewer: one LLM reviewing another LLM's work |
Vision 2
| Hashimoto | What we already had |
|---|---|
| AGENTS.md with error rules | CLAUDE.md: every line comes from a real agent error |
| Programmed tools for verification | validate.sh, publish.sh, health checks with curl |
| Reactive philosophy: error → rule | Exit conditions: "the reviewer max 2 rounds" — born from agents that wouldn't stop |
Vision 3
| OpenAI | What we already had |
|---|---|
| Structured documentation | agent.json: A2A card with skills, description and tags |
| Architectural constraints | Tool restriction: if the function doesn't exist, it can't execute it |
| Custom linters | validate.sh: verifies frontmatter and character limits |
| Testing and validation | "Delete all DNS records" → rejected. Live security test |
| Observability | tmux with 7 real-time log panels |
| Feedback loops | AI Studio 503 → migration to Vertex AI with 3 variables |
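
The "custom linters" row can be sketched as a computational sensor. This is a minimal Python stand-in for the kind of check validate.sh performs, not the script itself: the required fields (`name`, `description`) and the 200-character limit are assumptions for illustration.

```python
from typing import Dict, List

MAX_DESCRIPTION_CHARS = 200  # hypothetical limit, not taken from the source


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse simple `key: value` frontmatter delimited by `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def validate_skill(text: str) -> List[str]:
    """Return a list of problems; an empty list means the file passes."""
    fields = parse_frontmatter(text)
    if not fields:
        return ["missing or unterminated frontmatter"]
    errors = [
        f"missing field: {key}"
        for key in ("name", "description")  # assumed required fields
        if key not in fields
    ]
    if len(fields.get("description", "")) > MAX_DESCRIPTION_CHARS:
        errors.append("description too long")
    return errors
```

The check runs after the agent builds a skill and needs no LLM, which is what makes it a computational sensor rather than an inferential one.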
78 moments of harness engineering identified in our videos — before the term even existed.
Limit what the agent can do BEFORE it acts. Tool restriction, IAM permissions, secrets in Secret Manager.
Visions: Guides (Fowler) + Constraints (OpenAI)
Observe AFTER the agent acts. Rejection of dangerous operations, prompt injection that fails, validate.sh.
Visions: Sensors (Fowler) + Tools (Hashimoto) + Testing (OpenAI)
Files that define how the system behaves. CLAUDE.md as contract, agent.json as discovery card.
Visions: AGENTS.md (Hashimoto) + Docs (OpenAI)
If you can't see what the agent does, you don't have a harness. 7 terminals with real-time logs.
Visions: Sensors (Fowler) + Observability (OpenAI)
When something fails, the harness adapts — not the model. AI Studio 503 → Vertex AI with 3 variables, no rebuild.
Visions: Steering Loop (Fowler) + Error→Rule (Hashimoto) + Feedback Loops (OpenAI)
Agents as independent services on Cloud Run. Each scales to zero. Context travels as JSON in HTTP messages — no shared filesystem.
Ours: a level that current literature doesn't yet cover
| Pattern | Moments |
|---|---|
| Restrictions | 27 |
| Verification | 18 |
| Documentation | 19 |
| Observability | 5 |
| Reactive iteration | 5 |
| Infra + distributed context | 4 |
| Total | 78 |
The data that changes everything
Vercel's D0 agent: the team had 16 specialized tools and replaced them with a filesystem of YAML files plus grep. Fewer tools, simpler harness, better results. This is no longer opinion; it's data.
100%
Success rate (was 80%)
3.5x
Faster
40%
Fewer tokens
What to do tomorrow with this, depending on where you are.
Level 1 — Minimum harness
If you're starting with agents.
Scoped tools
Define what your agent can do. If it doesn't need to delete, don't give it the delete function.
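A minimal sketch of scoped tools, assuming the agent's tool calls pass through a dispatcher you control. The names `list_records`, `ALLOWED_TOOLS` and `call_tool` are hypothetical, not part of any framework.

```python
from typing import Callable, Dict


def list_records(zone: str) -> str:
    """Read-only tool: pretend to list DNS records for a zone."""
    return f"records for {zone}"


# Only the functions registered here exist from the agent's point of view.
# There is deliberately no delete tool in the registry.
ALLOWED_TOOLS: Dict[str, Callable[..., str]] = {
    "list_records": list_records,
}


def call_tool(name: str, **kwargs: str) -> str:
    """Dispatch a tool call requested by the model, rejecting unknown tools."""
    tool = ALLOWED_TOOLS.get(name)
    if tool is None:
        # The harness refuses; it does not rely on the prompt to say no.
        raise PermissionError(f"tool not available: {name}")
    return tool(**kwargs)
```

A request for `delete_records` fails at the dispatcher, whatever the prompt says.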
CLAUDE.md or AGENTS.md
Every time the agent fails, add a rule. Hashimoto does exactly this.
Secrets outside code
Secret Manager, encrypted environment variables. Never in code or Dockerfiles.
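One way to keep secrets out of code, sketched with an environment variable that the platform injects at runtime. The variable name `AGENT_API_KEY` is an assumption; how the value gets there (Secret Manager or otherwise) is outside this sketch.

```python
import os


def load_api_key(var_name: str = "AGENT_API_KEY") -> str:
    """Return the secret from the environment, failing loudly if absent."""
    value = os.environ.get(var_name)
    if not value:
        # Fail at startup instead of falling back to a hard-coded default.
        raise RuntimeError(f"missing required secret: {var_name}")
    return value
```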
Level 2 — Productive harness
If you already have agents running.
Role-based permissions
Specific service accounts. Principle of least privilege. Your agent shouldn't have more access than it needs.
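In production this is enforced by the platform through service accounts and IAM. The sketch below is only an application-level stand-in that shows the idea of least privilege; the roles and tool names are hypothetical.

```python
from typing import Dict, FrozenSet

# Each role maps to the smallest set of tools it needs (assumed roles).
ROLE_TOOLS: Dict[str, FrozenSet[str]] = {
    "viewer": frozenset({"list_records"}),
    "operator": frozenset({"list_records", "create_record"}),
}


def tools_for(role: str) -> FrozenSet[str]:
    """Return the tools a caller's role may use; unknown roles get nothing."""
    return ROLE_TOOLS.get(role, frozenset())


def authorize(role: str, tool: str) -> bool:
    """True only if the role was explicitly granted the tool."""
    return tool in tools_for(role)
```

The same request produces a different outcome depending on who asks, because the decision lives in the mapping and not in the prompt.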
Observability
Logs, health checks. If you can't see what the agent does, you don't have a harness.
Feedback loops
What happens when it fails? Does it get stuck or have a plan B? Our AI Studio → Vertex AI switch with 3 variables is a feedback loop.
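A switch like that can work without a rebuild when the backend is read from configuration at startup. A minimal sketch, assuming three environment variables; the source does not name them, so `MODEL_BACKEND`, `MODEL_PROJECT` and `MODEL_LOCATION` are placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    project: str
    location: str


def load_backend(env: Mapping[str, str] = os.environ) -> BackendConfig:
    """Build the model backend settings from three (hypothetical) variables."""
    return BackendConfig(
        backend=env.get("MODEL_BACKEND", "ai_studio"),
        project=env.get("MODEL_PROJECT", ""),
        location=env.get("MODEL_LOCATION", ""),
    )
```

Changing the three values and restarting the service moves the agent to the other backend; the container image stays the same.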
Level 3 — Multi-agent harness
If you want to scale.
Discovery protocol
Agent cards, agent.json. Agents need to know who can do what without knowing internal details.
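A sketch of discovery by card, assuming each agent publishes a small JSON document with its skills and tags. The cards below are simplified and invented for illustration; they are not the full A2A agent card schema.

```python
import json
from typing import Dict, List

# Hypothetical, simplified cards. Real agent cards carry more fields.
CARDS_JSON = """
[
  {"name": "dns-agent", "description": "Manages DNS records",
   "skills": [{"id": "list_records", "tags": ["dns", "read"]}]},
  {"name": "deploy-agent", "description": "Deploys services",
   "skills": [{"id": "deploy_service", "tags": ["deploy"]}]}
]
"""


def find_agents(cards: List[Dict], tag: str) -> List[str]:
    """Return the names of agents that advertise at least one skill with `tag`."""
    return [
        card["name"]
        for card in cards
        if any(tag in skill.get("tags", []) for skill in card.get("skills", []))
    ]


cards = json.loads(CARDS_JSON)
```

The caller learns who can do what from the card alone, without seeing how the skill is implemented.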
Context via messages, not filesystem
In distributed production, state travels via HTTP. Not in shared files.
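A minimal sketch of context as a message, assuming a hypothetical `TaskMessage` envelope. The bytes returned by `encode` would be the body of the HTTP request; the transport itself is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class TaskMessage:
    task_id: str
    sender: str
    # Everything the receiving agent needs travels here, not on disk.
    context: Dict[str, Any] = field(default_factory=dict)


def encode(message: TaskMessage) -> bytes:
    """Serialize the message to a JSON request body."""
    return json.dumps(asdict(message)).encode("utf-8")


def decode(body: bytes) -> TaskMessage:
    """Rebuild the message on the receiving side."""
    return TaskMessage(**json.loads(body.decode("utf-8")))
```

Because the receiver rebuilds its context from the body alone, each agent can run in its own container and scale to zero.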
Exit conditions
Without them, agents spend tokens forever.
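An exit condition can be as small as a round counter. A minimal sketch, assuming a hypothetical `review` callable that returns whether the draft is approved plus the revised draft; the default of 2 rounds mirrors the "reviewer max 2 rounds" rule mentioned earlier.

```python
from typing import Callable, Tuple


def review_loop(
    draft: str,
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 2,
) -> Tuple[str, int]:
    """Run review rounds until approval or until max_rounds is reached."""
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        approved, draft = review(draft)
        if approved:
            break
    # The loop always ends, even if the reviewer never approves.
    return draft, rounds
```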
Each agent protects itself
Security by design, not by hope. Don't depend on a prompt — depend on the architecture.
Primary sources cited in the video.
Anthropic · November 2025
The original pattern: initializer agent + coding agent + claude-progress.txt. The harness that surrounds the model.
Mitchell Hashimoto · February 5, 2026
Co-founder of HashiCorp and creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md.
OpenAI · February 11, 2026
7 engineers, 1 million lines of code, zero written by humans. 6 production practices for agents.
Martin Fowler · April 2026
Formal taxonomy: Guides and Sensors, each computational or inferential. "Everything that surrounds the AI model, except the model itself."
Vercel · December 2025
Agent D0: from 16 tools to a filesystem with YAMLs. Success rate from 80% to 100%, 3.5x faster, 40% fewer tokens.
Andrej Karpathy · April 2026
Declares vibe coding dead. "You can outsource your thinking, but you can't outsource your understanding."
Every video on the channel is a piece of the harness. These are the ones featured in the clips.
The 40-year pattern: agents that execute scoped skills and discover each other.
7 agents that build a complete IDP communicating via A2A artifacts.
7 agents build a skill factory. CLAUDE.md as the system contract.
7 independent containers, scale to zero, distributed context via HTTP.
Role-based permissions: same prompt, different results depending on who asks.
Auto-memory and auto-dream: the harness that evolves documentation automatically.
The essentials about harness engineering.
It's the discipline of designing everything that surrounds the AI model, except the model itself: restrictions, permissions, infrastructure, protocols, observability. The formula is Agent = Model + Harness. The term was coined by Mitchell Hashimoto (creator of Terraform) in February 2026.
Prompt Engineering (2022-2024): you tell the model WHAT to do. Context Engineering (2025): you give the model WHAT to know — documents, history, tool definitions. Harness Engineering (2026): you build WHERE it works — restrictions, permissions, infrastructure, feedback loops.
Mitchell Hashimoto, co-founder of HashiCorp and creator of Terraform, published the term on February 5, 2026. Anthropic had already described the pattern in November 2025. OpenAI adopted it 6 days after Hashimoto. Martin Fowler formalized it into a taxonomy in April 2026.
The model matters, but it's no longer the differentiator. You can use Gemini, Claude or GPT: if the harness is well designed, it should work with any of them. Vercel's data shows how much the harness weighs: they removed 80% of their D0 agent's tools, and the success rate went from 80% to 100%.
Fowler: Guides (before acting) and Sensors (after acting), each computational or inferential. Hashimoto: reactive documentation (AGENTS.md) and programmed tools. OpenAI: 6 production practices — structured docs, architectural constraints, custom linters, testing, observability and feedback loops.
A concept by Martin Fowler: how controllable your system is for agents. Not every system is equally easy to control. That's why we use Cloud Run, APIs, agent cards, IAM, schemas and HTTP messages — to make the system more controllable.
Level 1 (minimum): scoped tools, a CLAUDE.md/AGENTS.md file with error rules, and secrets outside code. Level 2 (productive): role-based permissions, observability and feedback loops. Level 3 (multi-agent): discovery protocol, context via HTTP messages and exit conditions.
YouTube Channel
@NicolasNeiraGarcia
ADK · A2A · Claude Code · Automation · Infrastructure