Harness Engineering · AI Agents · 2026

Harness Engineering: why Claude, GPT and Gemini no longer matter

Everything that surrounds the AI model, except the model itself. The discipline that Anthropic, OpenAI, Martin Fowler and Karpathy converged on between late 2025 and 2026. Agent = Model + Harness.

The formula

From prompt engineering to harness engineering: the 3 eras of AI engineering.

Prompt Engineering  → tell the model WHAT to do    (2022-2024)
Context Engineering → give the model WHAT to know   (2025)
Harness Engineering → build WHERE it works          (2026)

Agent = Model + Harness

Era 1

Prompt Engineering

One prompt, one response. "Act as an expert in X." The art was in the instruction.

Era 2

Context Engineering

Fill the context window with the right information: documents, history, tool definitions. Karpathy popularized the term in June 2025.

Era 3

Harness Engineering

Not what you tell it or what you give it. It's WHERE you make it work: restrictions, permissions, infrastructure, protocols, observability.

The history

They all converged without coordinating. Different paths, same conclusion.

1

November 2025 · Anthropic

"Effective Harnesses for Long-Running Agents." Initializer agent + coding agent + claude-progress.txt. The model is the same — what changes is what surrounds it.

2

February 5, 2026 · Mitchell Hashimoto

Co-founder of HashiCorp, creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md — every line comes from a real agent error.

3

February 11, 2026 · OpenAI

7 engineers, 1 million lines of code, zero written by humans. The engineer's job was no longer to write code — it was to design the harness.

4

April 2026 · Martin Fowler

Formal taxonomy: Guides and Sensors, each computational or inferential. When Fowler organizes it, the industry adopts it. Happened with Refactoring, happened with Microservices.

5

April 2026 · Andrej Karpathy

Sequoia AI Ascent. Declares vibe coding dead. The replacement: agentic engineering, where 99% of the time you don't write code — you orchestrate agents.

The 3 visions of the harness

There's no official closed list. Each source organizes it differently, but all three complement each other.

Vision 1

Martin Fowler — Guides and Sensors

Fowler | Type | What we already had
Guide | Computational | Agent Skills: tool catalog defines what it can do BEFORE the agent reasons
Guide | Computational | CLAUDE.md: rules the agent reads before starting
Sensor | Computational | validate.sh: rejects malformed skills after building them
Sensor | Inferential | The Agent Teams Reviewer: one LLM reviewing another LLM's work

Vision 2

Mitchell Hashimoto — Documentation and Tools

Hashimoto | What we already had
AGENTS.md with error rules | CLAUDE.md: every line comes from a real agent error
Programmed tools for verification | validate.sh, publish.sh, health checks with curl
Reactive philosophy: error → rule | Exit conditions: "the reviewer max 2 rounds", born from agents that wouldn't stop

Vision 3

OpenAI — 6 production practices

OpenAI | What we already had
Structured documentation | agent.json: A2A card with skills, description and tags
Architectural constraints | Tool restriction: if the function doesn't exist, it can't execute it
Custom linters | validate.sh: verifies frontmatter and character limits
Testing and validation | "Delete all DNS records" → rejected. Live security test
Observability | tmux with 7 real-time log panels
Feedback loops | AI Studio 503 → migration to Vertex AI with 3 variables

The 5 harness patterns

78 moments of harness engineering identified in our videos — before the term even existed.

1

Restrictions

27 moments

Limit what the agent can do BEFORE it acts. Tool restriction, IAM permissions, secrets in Secret Manager.

Visions: Guides (Fowler) + Constraints (OpenAI)
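
Tool restriction can be sketched as a registry: the agent can only call what was registered, so a missing function is a hard limit rather than a request in a prompt. This is a minimal illustration; `ToolRegistry` and `list_dns_records` are hypothetical names, not taken from the videos.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Holds the only functions the agent is allowed to call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, func: Callable[..., str]) -> None:
        self._tools[name] = func

    def call(self, name: str, **kwargs) -> str:
        # A tool that was never registered cannot be executed,
        # no matter what the model asks for.
        if name not in self._tools:
            raise PermissionError(f"tool not available: {name}")
        return self._tools[name](**kwargs)


def list_dns_records(zone: str) -> str:
    # Hypothetical read-only tool; a real one would query a DNS API.
    return f"records for {zone}"


registry = ToolRegistry()
registry.register("list_dns_records", list_dns_records)
# No delete tool is registered, so deletion is impossible by construction.
```

With this setup, `registry.call("delete_dns_records", zone="example.com")` raises `PermissionError` before any side effect can happen.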

2

Verification

18 moments

Observe AFTER the agent acts. Rejection of dangerous operations, prompt injection that fails, validate.sh.

Visions: Sensors (Fowler) + Tools (Hashimoto) + Testing (OpenAI)
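
A computational sensor like validate.sh can be approximated in a few lines. The sketch below is a Python stand-in, not the actual script: it assumes a skill file starts with a `---` delimited frontmatter of `key: value` lines, and the field names and character limits are illustrative assumptions.

```python
from typing import Dict, List

# Assumed limits for illustration; the real validate.sh may use other values.
MAX_NAME_CHARS = 64
MAX_DESCRIPTION_CHARS = 1024


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse simple 'key: value' lines between two '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def validate_skill(text: str) -> List[str]:
    """Return a list of problems; an empty list means the skill passes."""
    fields = parse_frontmatter(text)
    if not fields:
        return ["missing or malformed frontmatter"]
    errors = []
    for key, limit in (("name", MAX_NAME_CHARS), ("description", MAX_DESCRIPTION_CHARS)):
        value = fields.get(key, "")
        if not value:
            errors.append(f"missing field: {key}")
        elif len(value) > limit:
            errors.append(f"{key} exceeds {limit} characters")
    return errors
```

Because the check is deterministic, it can run after every build and reject a malformed skill without spending a single token.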

3

Documentation

19 moments

Files that define how the system behaves. CLAUDE.md as contract, agent.json as discovery card.

Visions: AGENTS.md (Hashimoto) + Docs (OpenAI)

4

Observability

5 moments

If you can't see what the agent does, you don't have a harness. 7 terminals with real-time logs.

Visions: Sensors (Fowler) + Observability (OpenAI)

5

Reactive iteration

5 moments

When something fails, the harness adapts — not the model. AI Studio 503 → Vertex AI with 3 variables, no rebuild.

Visions: Steering Loop (Fowler) + Error→Rule (Hashimoto) + Feedback Loops (OpenAI)
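
A switch like AI Studio → Vertex AI needs no rebuild when the backend is chosen from the environment. The sketch below assumes the three variables are `GOOGLE_GENAI_USE_VERTEXAI`, `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION`, the names read by the Google Gen AI SDK; the text above does not name them, so treat them and `GOOGLE_API_KEY` as assumptions.

```python
import os
from typing import Dict, Mapping


def backend_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Pick the model backend from the environment, not from code."""
    if env.get("GOOGLE_GENAI_USE_VERTEXAI", "").lower() in ("1", "true"):
        # Vertex AI path: selected by three variables (names assumed).
        return {
            "backend": "vertex_ai",
            "project": env["GOOGLE_CLOUD_PROJECT"],
            "location": env["GOOGLE_CLOUD_LOCATION"],
        }
    # Default path: AI Studio with an API key.
    return {"backend": "ai_studio", "api_key": env["GOOGLE_API_KEY"]}
```

Changing the deployment's environment and restarting the service is then enough to move traffic; the container image stays the same.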

+

Infra + distributed context

4 moments

Agents as independent services on Cloud Run. Each scales to zero. Context travels as JSON in HTTP messages — no shared filesystem.

Ours: a level that current literature doesn't yet cover

Pattern | Moments
Restrictions | 27
Verification | 18
Documentation | 19
Observability | 5
Reactive iteration | 5
Infra + distributed context | 4
Total | 78

The data that changes everything

Vercel removed 80% of their agent's tools

Agent D0: they had 16 specialized tools and replaced them with a filesystem with YAMLs and grep. Fewer tools, simpler harness, better results. This is no longer opinion — it's data.

100% success rate (was 80%)

3.5x faster

40% fewer tokens

Practical guide: 3 harness levels

What to do tomorrow with this, depending on where you are.

Level 1 — Minimum harness

If you're starting with agents.

1

Scoped tools

Define what your agent can do. If it doesn't need to delete, don't give it the delete function.

2

CLAUDE.md or AGENTS.md

Every time the agent fails, add a rule. Hashimoto does exactly this.

3

Secrets outside code

Secret Manager, encrypted environment variables. Never in code or Dockerfiles.
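
One way to keep secrets out of code is to read them from the environment at startup and fail fast when they are missing. This is a minimal sketch; `require_secret` is a hypothetical helper, and it assumes the platform injects the secret as an environment variable (Cloud Run can expose Secret Manager secrets this way).

```python
import os
from typing import Mapping


def require_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Read a secret injected at runtime; fail fast if it is absent."""
    value = env.get(name)
    if not value:
        # Never fall back to a hard-coded default for a secret.
        raise RuntimeError(f"secret {name} is not set; refusing to start")
    return value
```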

Level 2 — Productive harness

If you already have agents running.

4

Role-based permissions

Specific service accounts. Principle of least privilege. Your agent shouldn't have more access than it needs.
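
At the application level, least privilege can be sketched as a role-to-tools map checked in code before any tool runs. The roles and tool names below are hypothetical; in production the same idea is usually enforced by IAM and service accounts rather than by a dictionary.

```python
from typing import Dict, FrozenSet

# Hypothetical roles and tool names, for illustration only.
ROLE_TOOLS: Dict[str, FrozenSet[str]] = {
    "viewer": frozenset({"list_records"}),
    "operator": frozenset({"list_records", "update_record"}),
}


def allowed_tools(role: str) -> FrozenSet[str]:
    """Unknown roles get no tools at all (least privilege by default)."""
    return ROLE_TOOLS.get(role, frozenset())


def authorize(role: str, tool: str) -> bool:
    """True only if the role was explicitly granted the tool."""
    return tool in allowed_tools(role)
```

The same prompt then produces different outcomes depending on who asks: `authorize("viewer", "update_record")` is false, while the operator role passes.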

5

Observability

Logs, health checks. If you can't see what the agent does, you don't have a harness.
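
A small starting point for observability is one structured log line per tool call. The decorator below is a sketch using only the standard library; the logger name `agent.harness` and the `lookup` tool are illustrative assumptions.

```python
import functools
import json
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("agent.harness")


def observed(tool: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so every call emits one structured log line."""

    @functools.wraps(tool)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.monotonic()
        status = "error"  # overwritten only if the tool returns normally
        try:
            result = tool(*args, **kwargs)
            status = "ok"
            return result
        finally:
            logger.info(json.dumps({
                "tool": tool.__name__,
                "status": status,
                "duration_ms": round((time.monotonic() - start) * 1000, 2),
            }))

    return wrapper


@observed
def lookup(zone: str) -> str:
    # Hypothetical tool body.
    return f"records for {zone}"
```

Calling `logging.basicConfig(level=logging.INFO)` at startup makes these lines visible in the container logs, where they can be tailed or filtered by tool and status.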

6

Feedback loops

What happens when it fails? Does it get stuck or have a plan B? Our AI Studio → Vertex AI switch with 3 variables is a feedback loop.
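
A plan B can also live in code as an ordered list of backends. This is a sketch under assumptions: `BackendUnavailable` and `call_with_fallback` are hypothetical names, and each backend is modeled as a plain function that takes a prompt and returns text.

```python
from typing import Callable, Optional, Sequence


class BackendUnavailable(Exception):
    """Raised by a backend that cannot serve the request (for example a 503)."""


def call_with_fallback(prompt: str, backends: Sequence[Callable[[str], str]]) -> str:
    """Try each backend in order and return the first successful answer."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return backend(prompt)
        except BackendUnavailable as exc:
            last_error = exc  # plan B: move on to the next backend
    # Every backend failed: surface the error instead of hanging.
    raise RuntimeError("all backends failed") from last_error
```

Only `BackendUnavailable` triggers the fallback; any other exception propagates immediately, so real bugs are not hidden behind retries.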

Level 3 — Multi-agent harness

If you want to scale.

7

Discovery protocol

Agent cards, agent.json. Agents need to know who can do what without knowing internal details.
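
Discovery can be illustrated with a lookup over agent cards. The cards below are simplified and hypothetical: they keep only a name, description, URL and tagged skills, while a real A2A agent card carries more fields.

```python
from typing import Any, Dict, List

# Simplified, hypothetical cards; the real A2A agent card has more fields.
CARDS: List[Dict[str, Any]] = [
    {
        "name": "dns-agent",
        "description": "Reads DNS records",
        "url": "https://dns-agent.example.com",
        "skills": [{"id": "list_records", "tags": ["dns", "read"]}],
    },
    {
        "name": "report-agent",
        "description": "Writes summary reports",
        "url": "https://report-agent.example.com",
        "skills": [{"id": "summarize", "tags": ["report", "write"]}],
    },
]


def find_agents(tag: str, cards: List[Dict[str, Any]] = CARDS) -> List[str]:
    """Return the names of agents that advertise a skill with this tag."""
    return [
        card["name"]
        for card in cards
        if any(tag in skill["tags"] for skill in card["skills"])
    ]
```

The caller learns who can do what from the card alone; nothing about the other agent's prompts, model or internals is exposed.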

8

Context via messages, not filesystem

In distributed production, state travels via HTTP. Not in shared files.
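
Passing context as a message can be sketched with the standard library alone. The payload shape (`task` plus `context`) and the URL in the usage note are assumptions made for illustration, not the A2A wire format.

```python
import json
import urllib.request
from typing import Any, Dict


def build_handoff(url: str, task: str, context: Dict[str, Any]) -> urllib.request.Request:
    """Package task and context as a JSON POST; nothing is written to disk."""
    body = json.dumps({"task": task, "context": context}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A request built with `build_handoff("https://agent-b.example.com/tasks", "summarize", {"doc_id": "42"})` can be sent with `urllib.request.urlopen`. Everything the receiving agent needs travels in the body, so the two services never have to share a disk.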

9

Exit conditions

Without them, agents spend tokens forever.
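
An exit condition can be as simple as a round budget on the review loop. The sketch below assumes a `review` callable that returns whether the draft is approved plus a revised draft; the limit of 2 mirrors the "reviewer max 2 rounds" rule mentioned earlier.

```python
from typing import Callable, Tuple

MAX_REVIEW_ROUNDS = 2  # mirrors the "reviewer max 2 rounds" rule


def review_loop(
    draft: str,
    review: Callable[[str], Tuple[bool, str]],
    max_rounds: int = MAX_REVIEW_ROUNDS,
) -> Tuple[str, int, bool]:
    """Run review rounds until approval or until the round budget is spent.

    `review` returns (approved, revised_draft).
    The result is (final_draft, rounds_used, approved).
    """
    for round_number in range(1, max_rounds + 1):
        approved, draft = review(draft)
        if approved:
            return draft, round_number, True
    # Budget exhausted: stop and report, instead of looping forever.
    return draft, max_rounds, False
```

A reviewer that never approves still terminates after two rounds, and the caller can see from the final flag that the draft was not accepted.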

10

Each agent protects itself

Security by design, not by hope. Don't depend on a prompt — depend on the architecture.

Sources

Primary sources cited in the video.

Anthropic · November 2025

Effective Harnesses for Long-Running Agents

The original pattern: initializer agent + coding agent + claude-progress.txt. The harness that surrounds the model.

Mitchell Hashimoto · February 5, 2026

My AI Adoption Journey

Co-founder of HashiCorp and creator of Terraform. Coins the term "harness engineering" and implements it with AGENTS.md.

OpenAI · February 11, 2026

Harness Engineering

7 engineers, 1 million lines of code, zero written by humans. 6 production practices for agents.

Martin Fowler · April 2026

Harness Engineering

Formal taxonomy: Guides and Sensors, each computational or inferential. "Everything that surrounds the AI model, except the model itself."

Vercel · December 2025

We removed 80% of our agent's tools

Agent D0: from 16 tools to a filesystem with YAMLs. Success rate from 80% to 100%, 3.5x faster, 40% fewer tokens.

Andrej Karpathy · April 2026

Sequoia AI Ascent 2026

Declares vibe coding dead. "You can outsource your thinking, but you can't outsource your understanding."

Related videos

Every video on the channel is a piece of the harness. These are the ones featured in the clips.

Agent Skills

The 40-year pattern: agents that execute scoped skills and discover each other.

Google ADK + A2A

7 agents that build a complete IDP communicating via A2A artifacts.

Claude Agent Teams

7 agents build a skill factory. CLAUDE.md as the system contract.

ADK on Cloud Run

7 independent containers, scale to zero, distributed context via HTTP.

Workspace Profiling

Role-based permissions: same prompt, different results depending on who asks.

Claude Code Memory 2.0

Auto-memory and auto-dream: the harness that evolves documentation automatically.

Frequently asked questions

The essentials about harness engineering.

What is Harness Engineering?

It's the discipline of designing everything that surrounds the AI model, except the model itself: restrictions, permissions, infrastructure, protocols, observability. The formula is Agent = Model + Harness. The term was coined by Mitchell Hashimoto (creator of Terraform) in February 2026.

What's the difference between Prompt Engineering, Context Engineering and Harness Engineering?

Prompt Engineering (2022-2024): you tell the model WHAT to do. Context Engineering (2025): you give the model WHAT to know — documents, history, tool definitions. Harness Engineering (2026): you build WHERE it works — restrictions, permissions, infrastructure, feedback loops.

Who invented the term Harness Engineering?

Mitchell Hashimoto, co-founder of HashiCorp and creator of Terraform, published it on February 5, 2026. Anthropic had already described the pattern in November 2025. OpenAI adopted it 6 days after Hashimoto. Martin Fowler formalized it into a taxonomy in April 2026.

Does the AI model no longer matter?

The model matters, but it's no longer the differentiator. You can use Gemini, Claude or GPT: if the harness is well designed, it works with any of them. Vercel's data shows how much the harness alone can move results: they removed 80% of their D0 agent's tools, and the success rate went from 80% to 100%.

What are the 3 visions of the harness?

Fowler: Guides (before acting) and Sensors (after acting), each computational or inferential. Hashimoto: reactive documentation (AGENTS.md) and programmed tools. OpenAI: 6 production practices — structured docs, architectural constraints, custom linters, testing, observability and feedback loops.

What is harnessability?

A concept by Martin Fowler: how controllable your system is for agents. Not every system is equally easy to control. That's why we use Cloud Run, APIs, agent cards, IAM, schemas and HTTP messages — to make the system more controllable.

Where do I start implementing harness engineering?

Level 1 (minimum): scoped tools, a CLAUDE.md/AGENTS.md file with error rules, and secrets outside code. Level 2 (productive): role-based permissions, observability and feedback loops. Level 3 (multi-agent): discovery protocol, context via HTTP messages, exit conditions and agents that each protect themselves.

YouTube Channel

@NicolasNeiraGarcia

ADK · A2A · Claude Code · Automation · Infrastructure
