
Your AI Is Fast. Your Infrastructure Is Not.
Key Takeaways
- Infinite model speed would yield only 2-3x productivity gains, because the real bottleneck is every human-built tool the agent has to work through: paginated APIs, auth flows, batch jobs, and CRM dashboards.
- Most agentic wall clock time is spent handling tools, not running inference. The trillion-dollar model capability is not the constraint. Your infrastructure is.
- The infrastructure rebuild is happening in three layers: speeding up existing tools, replacing human-facing interfaces with agent-native primitives, and rebuilding entire scaffolding stacks so tool overhead stays negligible as models get faster.
- Wrapping a slow human-friendly API in an MCP layer does not make it agent-native. It makes it politely slow. Speed requires redesigning the primitive, not decorating the interface.
- Four to five human roles are emerging clearly: the tool-using generalist, the pipeline engineer, the relationship builder, the mature decision-maker, and the creative visionary. These are not new roles. They are newly essential ones.
The Trillion-Dollar Bottleneck Nobody's Talking About
Your AI can reason 50x faster than your team. Your actual productivity gain is maybe 2x. The gap isn't a model problem. It's an infrastructure problem. And most companies are pouring money into the wrong side of the equation.
Here's the uncomfortable truth: every piece of software your business runs on was built for humans. The login screens, the paginated API responses, the CRM dashboards, the batch processing jobs. All of it was engineered around the speed of human eyes, human hands, and human attention spans. That was brilliant engineering for fifty years. It's now the single biggest drag on everything AI is supposed to do for you.
Why Is Your AI Spending Most of Its Time Waiting?
Jeff Dean, Google's chief scientist and co-creator of TensorFlow and the TPU chip line, said something at GTC that should have gotten more attention. He noted that making a model not 50x faster, not 100x faster, but infinitely fast would likely yield only a 2 to 3x improvement in actual productivity.
Read that again. Infinite speed. Two to three x gain.
Why? Because the model would still be bottlenecked by everything around it: compilers, file systems, APIs, databases, test suites, CRMs, ERPs. All designed for humans. None designed for agents.
In most agentic loops today, the majority of wall clock time is the agent handling tools, not performing inference. The trillion-dollar inference capability isn't the bottleneck. Your human-designed tooling is.
Think about how your systems actually work. Your financial APIs paginate at roughly 100 rows because they assume a human needs to page through and read them. Your CRM requires a login and a visual display because humans process information with their eyes. Your spreadsheet application opens files because humans need to scan rows at the speed their brains handle visual information.
Every timeout, rate limit, authentication flow, startup sequence, and pagination scheme was calibrated to human pace. Not by accident. On purpose. Because that's who was using the tools. Until now.
What Does "10 to 50x Faster" Actually Mean for My Business?
The speed mismatch is real and accelerating. Every major tech company has published statements saying 20 to 40% of their code is now written by AI. Anthropic claims Claude Code writes 80% of its own code. NVIDIA's Billy Deli stated at GTC that inference now accounts for 90% of data center power consumption, not training, and is heading toward 10,000 to 20,000 tokens per second per user.
Dean expects AI to perform like a solid junior developer working 24/7 within about a year.
So the raw capability is there. The raw speed is there. But your organization isn't getting 50x out of it. You're getting 2x, maybe 3x, because the agent has to work through systems built for someone who blinks, takes coffee breaks, and reads at 250 words per minute.
This is where most companies are getting it wrong. They're evaluating models, comparing benchmarks, debating which LLM to adopt. Those are fine questions. But they're not the binding constraint. The binding constraint is everything between the model and the work.
I've seen this pattern before, across multiple technology cycles. The raw capability arrives years before the infrastructure catches up. And the companies that win aren't the ones with the best engine. They're the ones who rebuild the road.
How Is the Infrastructure Actually Being Rebuilt?
The rebuild is already underway, and it's happening in three layers. Understanding these layers matters because they tell you where to invest attention and where to stop wasting it.
Layer 1: Making Existing Tools Faster
The JavaScript system has spent the last half-decade migrating toward Rust, Go, and Zig for speed. TypeScript 7 is being rewritten in Go for a 10x-plus speed improvement. These faster languages also happen to be better for AI to write in. Rust's strict compiler acts as natural verification: if it compiles, it's more likely correct. Lee Robinson built a 38,000-line Rust image compressor using only coding agents with zero runtime dependencies.
But this is just the surface. Python, Java, Go, and .NET networks haven't undergone the same shift. Enterprise middleware, think Salesforce APIs, SAP batch processing, SharePoint authentication, has barely begun to change.
Here's a mistake I see constantly: assuming that placing an MCP layer over a human-friendly API makes it truly agent-native. Agents are flexible. They'll work around pagination. But they still burn wall clock time doing it. Wrapping a slow interface in a protocol layer doesn't make it fast. It makes it politely slow.
Layer 2: Replacing Human Tool Abstractions with Agent-Native Primitives
This is where things get interesting. The second layer replaces human-facing interfaces with primitives that assume the consumer has no eyes, no hands, and takes no breaks.
OpenAI shipped agentic primitives in February, including persistent containers and hosted shells where agents can install dependencies once and never restart between turns. Researchers have published branch file systems with sub-second branch creation, enabling fast iterative agent workflows. A recent agent primitives paper showed multi-agent coordination via a shared key-value cache achieves 3 to 4x lower latency compared to text-based message passing.
Server-side compaction can keep agents alive for hours or days. The concept of "starting up the compiler" disappears.
Layer 3: Agent-Native Scaffolding Replaces Human Scaffolding Across the Entire Stack
This is the most radical layer, and it follows what AI researchers call "the bitter lesson": general methods that use more computation almost always beat human-engineered solutions over time.
Aaron Levy connected this directly to Dean's talk by noting that every new generation of model "pinches off" human scaffolding in new ways. Tools built for today's model become drag on tomorrow's model.
Here's the structural problem with refinement. If you spend a year making an agent framework 3x faster, a new model that ships 5x faster at inference suddenly shifts your framework overhead from 30% to 60% of total time. Standing still means losing ground. Every model improvement shifts the ratio of model capability against human scaffolding overhead.
The durable response is building agent-native scaffolding so fast that no matter how capable agents become, tool calls, memory calls, and system commands remain negligible overhead.
Most enterprises aren't even thinking about this layer yet. That's the one that will matter most.
So What Do Humans Actually Do in This World?
This is the question I get most often, and it's usually asked with some anxiety. Let me reframe it: the shift isn't a demotion. It's a promotion. And it clarifies something that was always true but easy to ignore. The most valuable human work was never the execution. It was the judgment, the relationships, and the direction.
I see four to five roles emerging clearly, and they cut across traditional job titles.
The tool-using generalist. Someone who can activate processes, get things started, and drive them toward completion. The spark in a team. They know which AI tool to use and can direct long-running agentic processes. Think of the best "vibe coder" you know, but expanded to orchestrating autonomous agent workflows across a business.
The pipeline engineer. Someone skilled in infrastructure, data pipelines, security, and systems reliability. While the generalist deploys tools, this person makes sure the tools are running correctly, pipelines are sound, and everything is measurable and stable. This is where many conventional engineering roles are heading.
The business relationship builder. People prefer doing business with people. That's not changing. Within the next year or two, agent-run companies will specifically hire high-quality human salespeople to improve close rates, because human relationship-building has direct commercial value agents can't replicate.
The mature decision-maker. The adult in the room. Someone with the judgment to know when to slow down, or stop the system entirely. When not to aim an agent in a particular direction. When an inefficiency is worth preserving. How to lead and grow a team. This often maps to CEO-level thinking: assembling teams, setting direction, providing oversight.
The creative visionary (possibly). Someone in a "Steve Jobs chair" who can imagine and articulate how a product should feel, how it should be polished, what the experience should be. This role may not change much from its current form. But genuinely excellent creatives are rare today, particularly in product and business leadership.
These roles already exist among people I know, even if nobody has those titles yet. I expect them to become increasingly defined over the next 12 to 24 months.
What Should I Actually Do About This?
If you're running a company or leading a team, here's what this means in practice.
Stop over-indexing on model selection. The model is not your bottleneck. Your infrastructure is. Audit how much time your agents spend waiting on tools, authenticating, paginating, and restarting. That's where your 50x is leaking down to 2x.
Start thinking about your tooling stack as a speed problem, not a features problem. The question isn't "does this tool work with AI?" It's "does this tool work at AI speed?"
Invest in people who fit the roles above. The generalist who can orchestrate. The pipeline engineer who keeps things stable. The relationship builder who closes deals. The decision-maker who knows when to say no. These aren't new roles. They're newly essential ones.
And stop treating this as a future problem. The rebuild is happening now. Companies that recognize the bottleneck is in the scaffolding, not the model, are the ones capturing the actual productivity gains everyone else is only reading about.
At Holm Intelligence Partners, this is exactly the kind of operational shift we help leadership teams see clearly and act on. Not the hype. Not the benchmarks. The real structural changes that determine whether AI investments produce returns or just produce demos. If that's the conversation you need to have, we should talk.
The AI is fast enough. Your infrastructure isn't. That's the fix.
Infographic

Frequently Asked Questions
- Why is my AI only giving me a 2x productivity gain when the models are so much faster?
- Because the model is not your bottleneck. Your tooling is. Every API your agent calls, every authentication flow it navigates, every paginated response it parses was built for humans. The agent burns most of its wall clock time waiting on that infrastructure, not thinking. Fix the scaffolding around the model, not the model itself.
- What is an agent-native infrastructure and how is it different from what I have now?
- Agent-native infrastructure assumes the consumer has no eyes, no hands, and takes no breaks. That means persistent containers that do not restart between tasks, key-value caches instead of text-based message passing, branch file systems for fast iteration, and APIs that return full datasets instead of paginating for a human reader. Most enterprise stacks are nowhere near this yet.
- Does putting an MCP layer over my existing APIs make them agent-ready?
- No. An MCP layer makes a slow interface accessible to agents. It does not make it fast. Agents will work around pagination and authentication flows, but they still burn time doing it. If the underlying system was calibrated for human pace, wrapping it in a protocol layer just makes it politely slow.
- What human roles still matter as AI agents take over more execution work?
- Four roles stand out clearly: the generalist who can orchestrate agent workflows, the pipeline engineer who keeps systems stable and measurable, the relationship builder who closes deals because people still prefer buying from people, and the decision-maker who knows when to slow or stop the system. A fifth, the creative visionary, may stay close to its current form. These roles exist now. They will just become much more explicitly valuable.
- Should I be spending more time choosing the right AI model?
- Probably less than you think. Model selection is a fine question, but it is not the binding constraint for most companies. Audit how much time your agents spend waiting on tools, paginating, authenticating, and restarting. That is where your productivity is leaking. Fix the infrastructure first, then worry about which model sits on top of it.
- How fast is the infrastructure rebuild actually happening?
- Faster than most enterprise teams realize. TypeScript 7 is being rewritten in Go for a 10x speed improvement. OpenAI shipped persistent agent containers in early 2025. Research is showing 3 to 4x latency reductions from agent-native memory primitives. The early movers are already capturing these gains. Most enterprises have not even mapped the bottleneck yet, let alone started fixing it.