Source: YouTube — 7zZy1QTvokM (≈13 min). Based on Andrej Karpathy's talk at AISN 2026. Presented by a third-party YouTuber who attended and followed up with research.
Karpathy argues that almost everyone is prompting AI wrong. His method for building 10× faster breaks down into three layers: the Spec, the Verifier, and the Environment. The unifying theme is that you can outsource thinking but not understanding — the human must grasp the bigger picture to direct AI effectively.
AI is brilliant at measurable things but blind to context. Karpathy illustrates this with a simple example: asking AI whether to drive or walk 50 m to a car wash — every major model says walk, missing the obvious point that you need the car there. The gap between your contextual understanding and AI's computational power is bridged by a well-crafted spec.
Karpathy explicitly prefers working directly on a detailed spec over using Claude's built-in plan mode, which he considers too high-level.
Three steps to a good spec:
1. Uncover the actual goal — not just the task ("create a report") but the decision or conclusion the task serves. Ask Claude to interview you to draw this out.
2. Work agile, not waterfall — break work into small, scoped checkpoints; review output and adjust throughout rather than handing everything over at once. Tell Claude to bias towards smaller, compartmentalised specs.
3. Be precise and use your brain — every assumption AI makes is a chance to drift. The more precise the spec, the less room for error. Prompt Claude to make you verify key decisions explicitly so nothing is missed.
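The three steps above can be sketched as a small data structure that is rendered into the text handed to Claude. This is an illustration only: the `Spec` and `Checkpoint` names, their fields, and the rendered wording are assumptions made for this note, not anything shown in the talk or the video.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """One small, reviewable unit of work (step 2: agile, not waterfall)."""
    title: str
    done_when: str  # concrete condition the human checks before moving on


@dataclass
class Spec:
    """Hypothetical spec layout: goal first, then checkpoints, then decisions."""
    goal: str  # step 1: the decision or conclusion the task serves
    checkpoints: list[Checkpoint] = field(default_factory=list)
    decisions_to_confirm: list[str] = field(default_factory=list)  # step 3

    def render(self) -> str:
        """Turn the spec into plain text suitable for pasting into a prompt."""
        lines = [
            f"Goal: {self.goal}",
            "",
            "Checkpoints (stop for review after each):",
        ]
        for number, checkpoint in enumerate(self.checkpoints, start=1):
            lines.append(
                f"{number}. {checkpoint.title} (done when: {checkpoint.done_when})"
            )
        if self.decisions_to_confirm:
            lines.append("")
            lines.append("Ask me to confirm before proceeding:")
            lines.extend(f"- {decision}" for decision in self.decisions_to_confirm)
        return "\n".join(lines)
```

Calling `Spec(goal=..., checkpoints=[...]).render()` yields a short block of text. Listing the decisions to confirm explicitly is one way to make the "verify key decisions" step hard to skip.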
Karpathy frames AI as a "ghost" (not an animal/human), meaning emotional levers — yelling, pleading, vague feedback like "make this better" — don't work. The video simplifies this as a robot librarian: it answers from what's in its library, doesn't know when a book is missing, and may confidently fabricate.
The only effective lever is the verification layer. Three tactics:
1. Set evaluation criteria up front — before Claude touches anything, define what good looks like with precision (e.g. "the report must have three sections, each ending with a recommendation").
2. Use a second AI as critic — a second model (e.g. Codex via the Claude Code plugin) from a different training base can catch errors the first model missed. Prompt: "If this turns into a complex build, run the final output by Codex to ensure both systems agree."
3. Pull external signal — connect Claude to the actual system (e.g. deployment environment, historical reports) to verify outputs against ground truth rather than relying on AI self-assessment.
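The first tactic can be made mechanical. Below is a minimal sketch of the example criterion ("three sections, each ending with a recommendation") as a check that runs on the output instead of relying on the model's self-assessment. The report format is an assumption: section headings are taken to start with `## ` and a recommendation is taken to be a final line starting with `Recommendation:`. Neither convention comes from the video.

```python
def split_sections(report: str) -> list[list[str]]:
    """Group non-empty lines under each heading line that starts with '## '."""
    sections: list[list[str]] = []
    for line in report.splitlines():
        if line.startswith("## "):
            sections.append([])  # a new section begins at each heading
        elif sections and line.strip():
            sections[-1].append(line.strip())
    return sections


def check_report(report: str) -> list[str]:
    """Return a list of failed criteria; an empty list means the report passes."""
    failures: list[str] = []
    sections = split_sections(report)
    if len(sections) != 3:
        failures.append(f"expected 3 sections, found {len(sections)}")
    for number, body in enumerate(sections, start=1):
        # The last non-empty line of each section must be the recommendation.
        if not body or not body[-1].startswith("Recommendation:"):
            failures.append(f"section {number} does not end with a recommendation")
    return failures
```

The list of failures can be pasted back to Claude as precise feedback, which is more actionable than "make this better".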
Boris Cherny (creator of Claude Code) is quoted as saying that a feedback loop improves the quality of the final result by a factor of two to three.
The spec and verifier need a stable workspace to live in. Most people rebuild from scratch each session. Four steps to a compounding environment:
1. CLAUDE.md file: injected automatically at the start of every session. Use it to encode rules (repo structure, custom skills, architecture, key working rules). It encourages good habits without relying on memory.
2. LLM knowledge base: a structured folder system (Karpathy's concept, which went viral on Twitter) that collects your own data as reference material for the model. Your data is your moat; the knowledge base becomes a proprietary asset that grows over time.
3. Custom skills: reusable handbook entries for repeated tasks. The more you use them, the better they become ("run water through the hose to find the leaks").
4. Guardrails (always/ask/never): bucket tasks into three groups: always do (autopilot), ask first (double-check), never do (hard lines). CLAUDE.md rules are guidance, not enforcement; for truly critical constraints, use PreToolUse hooks at the tool level so Claude cannot bypass them (e.g. block writes to a protected folder).
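A "never do" rule enforced at the tool level might look like the sketch below. It assumes the hook receives a JSON event with `tool_name` and `tool_input.file_path` fields, and that exit code 2 blocks the call. Both assumptions should be checked against the current Claude Code hooks documentation. The protected path and the set of write tools are hypothetical choices made for this example.

```python
import json
import posixpath
import sys
from pathlib import PurePosixPath

# Hypothetical protected folder; replace with the real absolute path.
PROTECTED_DIR = PurePosixPath("/repo/protected")
# Assumed names of the tools that modify files.
WRITE_TOOLS = {"Write", "Edit", "MultiEdit"}


def is_blocked(event: dict) -> bool:
    """Return True when the event is a write into the protected folder."""
    if event.get("tool_name") not in WRITE_TOOLS:
        return False
    raw_path = event.get("tool_input", {}).get("file_path", "")
    if not raw_path:
        return False
    # Collapse '..' segments so '/repo/src/../protected/x' is still caught.
    # Assumes absolute POSIX paths; symlinks are not resolved.
    path = PurePosixPath(posixpath.normpath(raw_path))
    return path == PROTECTED_DIR or PROTECTED_DIR in path.parents


def main(stdin_text: str) -> int:
    """Return the exit code for one hook invocation (2 is assumed to block)."""
    event = json.loads(stdin_text)
    if is_blocked(event):
        print(f"Blocked: writes under {PROTECTED_DIR} are not allowed.", file=sys.stderr)
        return 2
    return 0
```

A real hook script would end with `sys.exit(main(sys.stdin.read()))` and be registered for the pre-tool-use event in the Claude Code settings. Keeping the decision in a pure function such as `is_blocked` makes the rule easy to test without running Claude.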
Asked what is still worth learning deeply when intelligence gets cheap, Karpathy answers: "You can outsource your thinking, but you can't outsource your understanding." All three layers depend on the human understanding the goals, the context, and the bigger picture, which is what AI cannot supply.
The spec/verifier/environment framing maps closely to Envoy's phase-aware FSM and the Structured Context Protocol proposal. The MEMORY/CONTENTS/PROGRAMMING_RULES notes cluster already functions as an LLM knowledge base in Karpathy's sense. The CLAUDE.md concept is analogous to the userPreferences + MEMORY read convention used in these sessions.