
Karpathy's Four AI Problems Are Your Firm's Too
Key Takeaways
- Karpathy's four-line diagnosis (no thinking, over-production, breaking unrelated things, work-driven not goal-driven) is the default state of any AI deployed without a decision layer above it.
- A 131,000-star markdown file works for one developer using one tool. Your firm is running AI across fifteen surfaces with twenty-odd users on regulated data. You can't fix that with a file.
- The same four failures are already running inside finance, marketing, operations, and customer success. MIT puts the enterprise AI pilot failure rate at 95 percent. Not because the models are bad. Because nobody installed the constraints.
- The head-fake is to roll out CLAUDE.md to the dev team, write prompting guidelines, stand up a centre of excellence. More tools, more pilots, more documentation. That's the standard consulting answer and it's wrong.
- If AI is running across five or more surfaces and nobody can tell you on one page what each one is doing and what it's costing, you don't need another tool. You need the decision layer.
The four-line diagnosis of why your AI output is mediocre
A tweet from Andrej Karpathy hit 7.74 million views last month because it named something every operator using AI tools has felt but couldn't articulate. Four problems. Plain language. No hedging.
Current LLMs don't think before working. They over-complicate the output. They can't make thin-slice edits without breaking unrelated things. And they're work-driven, not goal-driven. They produce work to produce work, instead of working toward a defined outcome.
Boris Cherny, the creator of Claude Code, replied that the points landed and he'd look into them. Elon Musk added that AI trained on the entire internet regresses toward average because most of the internet is average. A community member then took Karpathy's diagnosis, turned it into a CLAUDE.md file, posted it on GitHub, and watched it pick up 131,000 stars. One of the most-starred files on the platform.
Here's why it matters to anyone running a mid-market firm where AI is already in the building, whether you formally approved it or not.
What does a markdown file actually fix?
The CLAUDE.md file is a ruleset. You point Claude Code at the GitHub URL, it pulls the file down, and from that point forward it operates against an upgraded set of constraints.
The four upgrades, in plain terms:
Think first. Don't assume. Don't hide confusion. Surface trade-offs. Push back when warranted. Stop and ask when something is unclear. In a side-by-side test, the prompt "make me a lead magnet" produced two completely different responses. Without the file, Claude started building immediately, guessing at every variable. With the file, Claude cited the rulebook back at the user and asked four clarifying questions before touching anything.
Keep it minimal. Same output, fewer lines of code. The test ran the same lead magnet request both ways. Without the file: 212 lines. With the file: roughly half. Nearly pixel-identical visual result. Faster load. Cleaner foundation for the next iteration.
Touch only what you must touch. Don't improve adjacent code. Don't refactor what isn't broken. Match existing style even if you'd do it differently. Mention dead code, don't delete it. The test: change one button's colour. With the file, the button changed and nothing else moved. Without the file, the whole site changed colour.
Tie every action to a success criterion. A vague prompt like "fix the bug" gets rewritten as "write a test that reproduces the bug, then make it pass." Claude keeps working until the test passes.
Four constraints. One file. Measurably better output.
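The fourth rule is the easiest to show in code. What follows is a minimal sketch of "write a test that reproduces the bug, then make it pass", not anything taken from the file itself. The function apply_discount, its bug, and the numbers are all invented for illustration.

```python
# Hypothetical example: apply_discount and its bug are invented for illustration.

def apply_discount(price_cents: int, percent: int) -> int:
    """Return the price in cents reduced by a percentage, never below zero."""
    # The imagined buggy version returned price_cents - percent, treating the
    # percentage as an amount. This is the fix, written after the test below
    # existed and failed against the buggy version.
    discounted = price_cents * (100 - percent) // 100
    return max(discounted, 0)


def test_discount_is_a_percentage() -> None:
    # Step 1: reproduce the bug. The buggy version would return 19990 here.
    assert apply_discount(20000, 10) == 18000
    # Step 2: pin an edge case so the fix cannot quietly regress.
    assert apply_discount(5000, 150) == 0


if __name__ == "__main__":
    test_discount_is_a_percentage()
    print("goal verified: test passes")
```

The point of the ordering is that "done" is defined before any work starts. The test is the success criterion, and the work stops when it passes rather than when the output looks plausible.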
Why is this relevant to a firm that doesn't write code?
Because the same four failures show up everywhere AI is being used inside your business right now, and nobody has installed a CLAUDE.md equivalent for any of it.
Your finance team is using Copilot to draft variance commentary. It doesn't think first; it generates plausible-sounding narrative based on whatever it can see. Marketing is using ChatGPT to draft proposals. It over-produces. Three pages where one would do. Your operations lead asks AI to redraft a process document and it quietly changes definitions in three other places nobody noticed. Customer success uses AI to respond to escalations and the responses are work-driven, not outcome-driven. The reply gets sent. The customer issue doesn't get resolved.
That's what AI Fragmentation looks like inside an operating firm. Not "we have too many AI tools." That's the surface. The structural cause is that every team is running AI without constraints, without a defined success criterion, and without a governance line that says what the AI is allowed to touch and what it isn't. Karpathy's four problems aren't a Claude Code problem. They're the default state of any AI deployed without a decision layer above it.
The cost is real. MIT's research on enterprise AI pilots puts the failure rate at 95 percent. Not because the models are bad. Because nobody installed the constraints. The work gets produced. The goal doesn't get met. Margin doesn't move. Throughput doesn't compound. And the data surface keeps widening because every team is feeding data nobody mapped into tools nobody approved.
So is the answer just "give every team a markdown file"?
That's the question every operator should be sitting with right now. If a single file can visibly improve the output of one developer using one tool, what's the equivalent for a firm running AI across finance, marketing, operations, compliance, and customer success?
Honest answer: it's not a file. A file works for Claude Code because Claude Code is one tool, used by one user, against one defined workspace. Your firm is running AI across fifteen surfaces, with twenty-odd users, against data that spans regulated material, client matters, financial reporting, and operational records. You can't install a markdown file across that. The constraints have to live above the tools, not inside them.
What you actually need is the same logic Karpathy named, applied to the firm:
A think-first layer that asks whether an AI workflow should run at all before it runs. Most firms skip this entirely.
A minimum-output discipline that kills the AI tools producing work nobody uses. The Copilot licences sitting idle. The ChatGPT Teams seats that duplicate something the firm already has. The pilot from Q1 that never shipped.
A surgical-change rule that defines which data each AI workflow is allowed to touch, and which it isn't. This is the governance line. Without it, the AI redrafting a marketing brief has the same data access as the AI summarising a client matter, and that is how regulated firms end up in front of a regulator they didn't expect to meet.
A goal-driven verification step that ties every AI workflow to a measurable success criterion. Not "we deployed it." Throughput per employee. Margin contribution. Cycle-time reduction. If the workflow can't be tied to a number, it shouldn't be in production.
That's what an AI Operating Audit produces. A workflow-by-workflow verdict (Kill, Fix, Build) against every AI surface in the firm, paired with the governance line that boxes in what the firm keeps. Throughput and data sovereignty on the same page. Margin expansion and data-exposure remediation, on one operating path.
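To make the surgical-change rule and the verification step concrete, here is a rough sketch of what a check sitting above the tools could look like. It is an illustration under stated assumptions, not how any audit is actually implemented. The Workflow fields, the data-class labels, and the sample workflow are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# All names, data classes, and sample workflows below are hypothetical.


@dataclass(frozen=True)
class Workflow:
    name: str
    allowed_data: FrozenSet[str]   # governance line: what it may touch
    touched_data: FrozenSet[str]   # what it actually touches today
    success_metric: Optional[str]  # e.g. "cycle-time reduction"; None if untied


def review(wf: Workflow) -> List[str]:
    """Return the reasons a workflow should not be in production as-is."""
    problems: List[str] = []
    over_reach = wf.touched_data - wf.allowed_data
    if over_reach:
        problems.append(f"crosses governance line: {sorted(over_reach)}")
    if wf.success_metric is None:
        problems.append("no measurable success criterion")
    return problems


if __name__ == "__main__":
    brief = Workflow(
        name="marketing brief redraft",
        allowed_data=frozenset({"public", "marketing"}),
        touched_data=frozenset({"marketing", "client_matter"}),
        success_metric=None,
    )
    for problem in review(brief):
        print(f"{brief.name}: {problem}")
```

The design choice worth noticing is that the rules live in one place, outside any individual tool. Every workflow is held to the same two questions: what is it allowed to touch, and what number is it meant to move.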
What does the head-fake look like?
Most firms, when they read Karpathy's tweet or watch the demo, reach for the obvious answer: roll out CLAUDE.md to the dev team, write internal prompting guidelines for everyone else, maybe stand up a centre of excellence. More documentation. More tooling. More pilots.
That's the standard consulting answer, and it's wrong for the same reason every AI transformation deck is wrong. You don't fix AI Fragmentation by adding another layer. You fix it by subtracting. Fewer tools. Fewer pilots. A defined decision layer above the tools that stay. A prioritised remediation roadmap that compounds throughput inside an enforced governance line.
The Karpathy file works because it's a constraint, not an expansion. It tells Claude what not to do. Don't assume, don't over-produce, don't touch what you don't have to, don't stop until the goal is verified. The file is valuable because it removes degrees of freedom, not because it adds capability.
Same principle applies at the firm level. The firms getting compounding throughput from AI right now aren't the ones with the most tools. They're the ones who killed the tools that didn't earn their keep, fixed the workflows that almost worked, built the two or three that actually move the P&L, and installed a governance line that holds the surface stable while the throughput compounds.
That's what HIP installs. The AI Operating Partner retainer is the firm equivalent of what CLAUDE.md is to a developer. A decision layer above the tools. Constraints that hold. Success criteria that get verified. Work that ties to a goal instead of producing work to produce work.
The next step
If your firm has AI running across five or more surfaces and nobody can tell you, on one page, what each one is doing and what it's costing in margin or exposure, you don't need another tool. You need the decision layer.
Frequently Asked Questions
- What is the Karpathy CLAUDE.md file?
- It's a four-rule constraint file for Claude Code. Think first, keep output minimal, touch only what you must touch, and tie every action to a success criterion. A community member turned Karpathy's tweet into the file and it picked up 131,000 GitHub stars.
- Why does a markdown file produce better AI output?
- Because it removes degrees of freedom. It tells the model what not to do. Don't assume, don't over-produce, don't touch adjacent work, don't stop until the goal is verified. Constraints beat capability every time.
- Does this apply to firms that don't write code?
- Yes. The same four failures show up in finance, marketing, operations and customer success right now. AI is producing work instead of meeting goals, touching data nobody mapped, and over-producing output nobody reads. That's AI Fragmentation.
- Can I just give every team a markdown file?
- No. A file works for one tool and one user. Your firm is running AI across fifteen surfaces and twenty-odd users on regulated and client data. The constraints have to live above the tools, not inside them. That's the decision layer.
- What does HIP actually install?
- An AI Operating Audit produces a kill, fix, build verdict on every AI workflow in the firm, paired with the governance line that defines what the AI is allowed to touch. The AI Operating Partner retainer holds that decision layer in place while throughput compounds.
- When do I need this?
- If AI is running across five or more surfaces in your firm and nobody can tell you on one page what each one is doing and what it's costing in margin or exposure, you don't need another tool. You need the decision layer.