Harness Engineering · AI Agents · 2026

Harness Engineering: why Claude, GPT and Gemini no longer matter

Harness engineering is everything that surrounds the AI model, except the model itself. Anthropic described the pattern in November 2025, and Mitchell Hashimoto, OpenAI, Martin Fowler and Karpathy took it up in 2026. Agent = Model + Harness.

The formula

From prompt engineering to harness engineering: the 3 eras of AI engineering.

Prompt Engineering  → tell the model WHAT to do    (2022-2024)
Context Engineering → give the model WHAT to know   (2025)
Harness Engineering → build WHERE it works          (2026)

Agent = Model + Harness

Era 1

Prompt Engineering

One prompt, one response. "Act as an expert in X." The art was in the instruction.

Era 2

Context Engineering

Fill the context window with the right information: documents, history, tool definitions. Karpathy popularized the term in June 2025.

Era 3

Harness Engineering

Not what you tell it or what you give it. It's WHERE you make it work: restrictions, permissions, infrastructure, protocols, observability.

The history

They all converged without coordinating. Different paths, same conclusion.

1

November 2025 · Anthropic

"Effective Harnesses for Long-Running Agents." Initializer agent + coding agent + claude-progress.txt. The model is the same — what changes is what surrounds it.

2

February 5, 2026 · Mitchell Hashimoto

Co-founder of HashiCorp, creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md — every line comes from a real agent error.

3

February 11, 2026 · OpenAI

7 engineers, 1 million lines of code, zero written by humans. The engineer's job was no longer to write code — it was to design the harness.

4

April 2026 · Martin Fowler

Formal taxonomy: Guides and Sensors, each computational or inferential. When Fowler organizes a practice, the industry tends to adopt it: it happened with Refactoring and with Microservices.

5

April 2026 · Andrej Karpathy

Sequoia AI Ascent. Declares vibe coding dead. The replacement: agentic engineering, where 99% of the time you don't write code — you orchestrate agents.

The 3 visions of the harness

There's no official closed list. Each source organizes it differently, but all three complement each other.

Vision 1

Martin Fowler — Guides and Sensors

Fowler	Type	What we already had
Guide	Computational	Agent Skills: tool catalog defines what it can do BEFORE the agent reasons
Guide	Computational	CLAUDE.md: rules the agent reads before starting
Sensor	Computational	validate.sh: rejects malformed skills after building them
Sensor	Inferential	The Agent Teams Reviewer: one LLM reviewing another LLM's work

Vision 2

Mitchell Hashimoto — Documentation and Tools

Hashimoto	What we already had
AGENTS.md with error rules	CLAUDE.md: every line comes from a real agent error
Programmed tools for verification	validate.sh, publish.sh, health checks with curl
Reactive philosophy: error → rule	Exit conditions: "the reviewer max 2 rounds" — born from agents that wouldn't stop

Vision 3

OpenAI — 6 production practices

OpenAI	What we already had
Structured documentation	agent.json: A2A card with skills, description and tags
Architectural constraints	Tool restriction: if the function doesn't exist, it can't execute it
Custom linters	validate.sh: verifies frontmatter and character limits
Testing and validation	"Delete all DNS records" → rejected. Live security test
Observability	tmux with 7 real-time log panels
Feedback loops	AI Studio 503 → migration to Vertex AI with 3 variables

The 5 harness patterns

We identified 78 moments of harness engineering in our videos, recorded before the term even existed. They fall into five patterns, plus a sixth of our own.

1

Restrictions

27 moments

Limit what the agent can do BEFORE it acts. Tool restriction, IAM permissions, secrets in Secret Manager.

Visions: Guides (Fowler) + Constraints (OpenAI)
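
A minimal sketch of tool restriction, assuming a hypothetical in-process registry. `ToolRegistry`, `ToolNotAllowed` and the `list_records` DNS example are illustrative names, not taken from any specific framework. It shows the idea stated above: if the function was never registered, the agent cannot execute it, whatever the prompt says.

```python
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool that was never registered."""


class ToolRegistry:
    """Holds the only functions the agent is able to call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, func: Callable[..., object]) -> None:
        self._tools[name] = func

    def call(self, name: str, **kwargs: object) -> object:
        # The restriction lives here, in code, not in the prompt.
        if name not in self._tools:
            raise ToolNotAllowed(f"tool '{name}' is not available to this agent")
        return self._tools[name](**kwargs)


def list_records(zone: str) -> list:
    # Hypothetical read-only tool; a real one would query a DNS API.
    return [f"www.{zone}", f"mail.{zone}"]


registry = ToolRegistry()
registry.register("list_records", list_records)  # no delete tool is registered
```

With this setup, `registry.call("delete_records", zone="example.com")` raises `ToolNotAllowed` before any model output can turn into an action.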

2

Verification

18 moments

Observe AFTER the agent acts. Rejection of dangerous operations, prompt injection that fails, validate.sh.

Visions: Sensors (Fowler) + Tools (Hashimoto) + Testing (OpenAI)
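
A computational sensor can be as small as a file check. The sketch below is a Python stand-in for the kind of check validate.sh performs (frontmatter and character limits); it is not the actual script. The required keys, the 200-character limit and the `---` delimiter format are assumptions for illustration.

```python
from typing import Dict, List

REQUIRED_KEYS = ("name", "description")  # assumed required fields
MAX_DESCRIPTION_CHARS = 200              # assumed limit, for illustration


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse simple 'key: value' lines between two '---' delimiter lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---'")
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---'")


def validate_skill(text: str) -> List[str]:
    """Return a list of problems; an empty list means the skill passes."""
    try:
        fields = parse_frontmatter(text)
    except ValueError as exc:
        return [str(exc)]
    problems = [f"missing field: {k}" for k in REQUIRED_KEYS if not fields.get(k)]
    if len(fields.get("description", "")) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds character limit")
    return problems
```

The check runs after the agent builds the file, so a malformed skill is rejected by code rather than by another round of prompting.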

3

Documentation

19 moments

Files that define how the system behaves. CLAUDE.md as contract, agent.json as discovery card.

Visions: AGENTS.md (Hashimoto) + Docs (OpenAI)
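
As a rough illustration of a discovery card, the sketch below builds the JSON for a hypothetical agent. The source only says that agent.json carries skills, description and tags; the remaining fields, the agent name and the URL are assumptions, so check the A2A specification for the authoritative schema.

```python
import json
from typing import Dict, List


def build_agent_card(name: str, description: str, url: str,
                     skills: List[Dict[str, object]]) -> str:
    """Serialize a discovery card as JSON text."""
    card = {
        "name": name,
        "description": description,
        "url": url,
        "skills": skills,
    }
    return json.dumps(card, indent=2)


card_json = build_agent_card(
    name="dns-agent",                      # hypothetical agent
    description="Reads DNS records for a zone.",
    url="https://dns-agent.example.com",   # placeholder URL
    skills=[{
        "id": "list_records",
        "description": "List the records in a DNS zone.",
        "tags": ["dns", "read-only"],
    }],
)
```

The card describes what the agent can do without exposing how it does it, which is what lets other agents discover it.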

4

Observability

5 moments

If you can't see what the agent does, you don't have a harness. 7 terminals with real-time logs.

Visions: Sensors (Fowler) + Observability (OpenAI)
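
One possible way to make agent actions visible, assuming tools are plain Python functions: wrap each one so every call emits a JSON log line. The decorator name `observed`, the logger name and the `list_records` tool are hypothetical.

```python
import functools
import json
import logging
import time
from typing import Callable

logger = logging.getLogger("agent.tools")


def observed(func: Callable) -> Callable:
    """Log one JSON line per tool call: name, outcome, duration."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return func(*args, **kwargs)
        except Exception:
            status = "error"
            raise
        finally:
            # Runs on both success and failure, so no call goes unlogged.
            logger.info(json.dumps({
                "tool": func.__name__,
                "status": status,
                "duration_ms": round((time.perf_counter() - start) * 1000, 2),
            }))

    return wrapper


@observed
def list_records(zone: str) -> list:
    # Hypothetical tool used only to demonstrate the wrapper.
    return [f"www.{zone}"]
```

Structured lines like these can be tailed in a terminal pane or shipped to a log service; either way the agent's behavior is readable without asking the model what it did.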

5

Reactive iteration

5 moments

When something fails, the harness adapts — not the model. AI Studio 503 → Vertex AI with 3 variables, no rebuild.

Visions: Steering Loop (Fowler) + Error→Rule (Hashimoto) + Feedback Loops (OpenAI)
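
The "3 variables, no rebuild" idea can be sketched as configuration that selects the backend at startup. The source does not list the three variables, so `MODEL_BACKEND`, `MODEL_PROJECT` and `MODEL_LOCATION` are placeholder names, and the endpoint strings are made up for the example.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BackendConfig:
    backend: str
    project: str
    location: str


def load_backend(env: Mapping[str, str]) -> BackendConfig:
    """Read the backend choice from configuration, not from code."""
    return BackendConfig(
        backend=env.get("MODEL_BACKEND", "ai_studio"),  # placeholder names
        project=env.get("MODEL_PROJECT", ""),
        location=env.get("MODEL_LOCATION", ""),
    )


def endpoint_for(config: BackendConfig) -> str:
    """Map the config to a made-up endpoint label the agent would call."""
    if config.backend == "vertex":
        return f"vertex://{config.project}/{config.location}"
    return "ai-studio://default"
```

In a deployment you would pass `os.environ` to `load_backend`, so switching providers is a change to the service's variables rather than to the image.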

+

Infra + distributed context

4 moments

Agents as independent services on Cloud Run. Each scales to zero. Context travels as JSON in HTTP messages — no shared filesystem.

Ours: a level that current literature doesn't yet cover
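
A sketch of context travelling as JSON, assuming a hypothetical `ContextEnvelope` with four fields. The field names are ours for illustration; the point is only that everything the receiving agent needs is in the request body, so no shared disk is required.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ContextEnvelope:
    """Everything the receiving agent needs, carried in the request body."""
    task_id: str
    instruction: str
    history: List[str] = field(default_factory=list)
    artifacts: Dict[str, str] = field(default_factory=dict)


def to_http_body(envelope: ContextEnvelope) -> bytes:
    """Encode the envelope as a UTF-8 JSON body for an HTTP request."""
    return json.dumps(asdict(envelope)).encode("utf-8")


def from_http_body(body: bytes) -> ContextEnvelope:
    """Rebuild the envelope on the receiving service."""
    return ContextEnvelope(**json.loads(body.decode("utf-8")))
```

Because each service can scale to zero, anything not in the message is gone on the next cold start; the envelope makes that constraint explicit.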

Pattern	Moments
Restrictions	27
Verification	18
Documentation	19
Observability	5
Reactive iteration	5
Infra + distributed context	4
Total	78

The data that changes everything

Vercel removed 80% of their agent's tools

Vercel's agent D0 had 16 specialized tools. The team replaced them with a filesystem of YAML files and grep. Fewer tools, simpler harness, better results. This is no longer just opinion: Vercel published the numbers.

100% success rate (was 80%)

3.5x faster

40% fewer tokens
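
To make the "filesystem with YAMLs and grep" idea concrete, here is a sketch of a single search tool over a directory of files. This is not Vercel's implementation; `grep_files` and its parameters are hypothetical, and it does plain substring matching rather than regular expressions.

```python
from pathlib import Path
from typing import List, Tuple


def grep_files(root: str, needle: str,
               pattern: str = "*.yaml") -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, line) for every line containing needle."""
    base = Path(root)
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(base.rglob(pattern)):
        text = path.read_text(encoding="utf-8")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path.relative_to(base).as_posix(), number, line.strip()))
    return hits
```

One general tool like this replaces many narrow lookups: the agent decides what to search for, and the files stay the single source of truth.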

Practical guide: 3 harness levels

What to do tomorrow with this, depending on where you are.

Level 1 — Minimum harness

If you're starting with agents.

1

Scoped tools

Define what your agent can do. If it doesn't need to delete, don't give it the delete function.

2

CLAUDE.md or AGENTS.md

Every time the agent fails, add a rule. Hashimoto does exactly this.

3

Secrets outside code

Secret Manager, encrypted environment variables. Never in code or Dockerfiles.
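
Step 2 above (every failure becomes a rule) can be partly automated. The helper below is a sketch under our own assumptions: `add_rule` is a hypothetical function, and the one-line "rule (from: failure)" format is just one way to keep the reason next to the rule.

```python
from pathlib import Path


def add_rule(rules_file: str, failure: str, rule: str) -> bool:
    """Append a rule with the failure that motivated it; skip exact duplicates."""
    path = Path(rules_file)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    entry = f"- {rule} (from: {failure})"
    if entry in existing.splitlines():
        return False
    with path.open("a", encoding="utf-8") as handle:
        # Keep one rule per line even if the file lacks a trailing newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(entry + "\n")
    return True
```

For example, `add_rule("AGENTS.md", "agent looped in review", "Reviewer runs at most 2 rounds")` records both the rule and where it came from.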

Level 2 — Productive harness

If you already have agents running.

4

Role-based permissions

Specific service accounts. Principle of least privilege. Your agent shouldn't have more access than it needs.

5

Observability

Logs, health checks. If you can't see what the agent does, you don't have a harness.

6

Feedback loops

What happens when it fails? Does it get stuck or have a plan B? Our AI Studio → Vertex AI switch with 3 variables is a feedback loop.
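
A plan B for step 6 can also live in code. The sketch below tries backends in order and moves on when one is unavailable. It is an assumption-laden illustration, not the switch described above: `BackendUnavailable` stands in for an HTTP 503, and the backends are plain callables.

```python
from typing import Callable, Optional, Sequence


class BackendUnavailable(Exception):
    """Stands in for an HTTP 503 from a model backend."""


def call_with_fallback(prompt: str,
                       backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order; move on only when one reports unavailability."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend(prompt)
        except BackendUnavailable as exc:
            last_error = exc
    # Every backend failed: surface the problem instead of hanging.
    raise RuntimeError("all backends unavailable") from last_error
```

Only the unavailability error triggers the fallback; any other exception still propagates, so real bugs are not hidden behind a retry.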

Level 3 — Multi-agent harness

If you want to scale.

7

Discovery protocol

Agent cards, agent.json. Agents need to know who can do what without knowing internal details.

8

Context via messages, not filesystem

In distributed production, state travels via HTTP. Not in shared files.

9

Exit conditions

Without them, agents spend tokens forever.

10

Each agent protects itself

Security by design, not by hope. Don't depend on a prompt — depend on the architecture.
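
Exit conditions (step 9) are easy to state in code. This sketch caps a review loop at a fixed number of rounds, mirroring the "reviewer max 2 rounds" rule mentioned earlier; `review_loop` and the shape of the `review` callback are assumptions for illustration.

```python
from typing import Callable, Tuple


def review_loop(draft: str,
                review: Callable[[str], Tuple[bool, str]],
                max_rounds: int = 2) -> Tuple[str, int, bool]:
    """Run review rounds until approval or until max_rounds is reached.

    review(draft) returns (approved, revised_draft).
    Returns (final_draft, rounds_used, approved).
    """
    approved = False
    rounds = 0
    while rounds < max_rounds and not approved:
        approved, draft = review(draft)
        rounds += 1
    return draft, rounds, approved
```

The loop stops even if the reviewer is never satisfied, and the returned flag tells the caller whether the result was approved or merely cut off.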

Sources

Primary sources cited in the video.

Anthropic · November 2025

Effective Harnesses for Long-Running Agents

The original pattern: initializer agent + coding agent + claude-progress.txt. The harness that surrounds the model.

Mitchell Hashimoto · February 5, 2026

My AI Adoption Journey

Co-founder of HashiCorp and creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md.

OpenAI · February 11, 2026

Harness Engineering

7 engineers, 1 million lines of code, zero written by humans. 6 production practices for agents.

Martin Fowler · April 2026

Harness Engineering

Formal taxonomy: Guides and Sensors, each computational or inferential. "Everything that surrounds the AI model, except the model itself."

Vercel · December 2025

We removed 80% of our agent's tools

Agent D0: from 16 tools to a filesystem with YAMLs. Success rate from 80% to 100%, 3.5x faster, 40% fewer tokens.

Andrej Karpathy · April 2026

Sequoia AI Ascent 2026

Declares vibe coding dead. "You can outsource your thinking, but you can't outsource your understanding."

Frequently asked questions

The essentials about harness engineering.

What is Harness Engineering?

It's the discipline of designing everything that surrounds the AI model, except the model itself: restrictions, permissions, infrastructure, protocols, observability. The formula is Agent = Model + Harness. The term was coined by Mitchell Hashimoto (creator of Terraform) in February 2026.

What's the difference between Prompt Engineering, Context Engineering and Harness Engineering?

Prompt Engineering (2022-2024): you tell the model WHAT to do. Context Engineering (2025): you give the model WHAT to know — documents, history, tool definitions. Harness Engineering (2026): you build WHERE it works — restrictions, permissions, infrastructure, feedback loops.

Who invented the term Harness Engineering?

Mitchell Hashimoto, co-founder of HashiCorp and creator of Terraform, published it on February 5, 2026. Anthropic had already described the pattern in November 2025. OpenAI adopted it 6 days after Hashimoto. Martin Fowler formalized it into a taxonomy in April 2026.

Does the AI model no longer matter?

The model matters, but it's no longer the differentiator. You can use Gemini, Claude or GPT: if the harness is well designed, it works with any of them. Vercel's D0 agent shows how much the harness weighs: after the team removed 80% of its tools, the success rate went from 80% to 100%.

What are the 3 visions of the harness?

Fowler: Guides (before acting) and Sensors (after acting), each computational or inferential. Hashimoto: reactive documentation (AGENTS.md) and programmed tools. OpenAI: 6 production practices — structured docs, architectural constraints, custom linters, testing, observability and feedback loops.

What is harnessability?

A concept by Martin Fowler: how controllable your system is for agents. Not every system is equally easy to control. That's why we use Cloud Run, APIs, agent cards, IAM, schemas and HTTP messages — to make the system more controllable.

Where do I start implementing harness engineering?

Level 1 (minimum): scoped tools, a CLAUDE.md/AGENTS.md file with error rules, and secrets outside code. Level 2 (productive): role-based permissions, observability and feedback loops. Level 3 (multi-agent): discovery protocol, context via HTTP messages, exit conditions, and agents that each protect themselves.

Community

Design your own harness in Agentic Engineers

Restrictions, permissions, observability and feedback loops applied to production agents — the 5 patterns with real repos to practice. Free community access; the full courses are in the Premium tier.

Join Agentic Engineers →

YouTube Channel

@NicolasNeiraGarcia

ADK · A2A · Claude Code · Automation · Infrastructure

Subscribe ›

Related videos

Agent Skills

Google ADK + A2A

Claude Agent Teams

ADK on Cloud Run

Claude Fable 5

Claude Code Memory 2.0
