The AI Token Bill Nobody Wants on the Page

Josef Holm · 7 min read · May 20, 2026

Key Takeaways

A founder just published a $250,000 weekly AI token bill. What the industry keeps avoiding saying out loud: frontier tokens are getting more expensive, not cheaper, and no incentive exists for that to reverse.
The cheap end stayed cheap. The models doing real operator work went the other way. GPT 5.5 runs 49 to 92 percent more than 5.4 in practice. Anthropic kept Opus's sticker price flat, then shipped a tokeniser that produces responses 35 percent longer.
Most mid-market firms are paying for AI in eight places, getting output from two, and exposing data through all eight. That's AI Fragmentation, and the upward token curve is inflating it every quarter.
The move is not more tools or a Chief AI Officer. It's Kill, Fix, Build on every workflow already running, an Opportunity Map that compounds throughput inside an enforced governance line, and a hard rule against new pilots until the existing ones have a verdict.
The firms that win the next three years are not the ones spending the most on tokens. They're the ones running the leanest AI stack with the cleanest governance line, extracting throughput from workflows they actually understand.

The token bill nobody wants to put on the page

A founder I respect just published his weekly AI token spend. Two hundred and fifty thousand dollars. In seven days. One-point-three million across the last thirty.

Peter Steinberger framed it as a thought experiment. How would we build software if tokens didn't matter? Fine. It's an interesting question if you're at OpenAI. It's a different question entirely if you're a Managing Director at a $40M services firm staring at next year's operating budget, trying to figure out which AI workflows actually pay for themselves.

Here's the thing the industry keeps not saying out loud: tokens are not getting cheaper. The frontier is getting more expensive, the models doing the real work are getting more expensive, and the incentive structure for any of that to reverse does not exist.

I want to walk through what that means for an operator running a private mid-market firm right now. Not for the people writing the cheques. For the people writing the cheques to the people writing the cheques.

Is AI actually getting cheaper, or are we being sold a story?

Most operators I talk to have absorbed a comfortable assumption: AI costs follow a Moore's Law curve. Wait twelve months and the same capability costs a fraction. Plan around that and you're fine.

That assumption is wrong at the level that matters.

The cheap end has stayed cheap. Small models, commodity tasks, the stuff that runs your email summariser. Costs are flat or down there. But the models that do work an operator would actually pay for? Those have moved the other way.

The list price doubled from GPT 5.4 to 5.5. Anthropic kept Opus's nominal price flat between 4.6 and 4.7, then quietly shipped a new tokeniser that produces responses 35 percent longer. Same sticker, bigger bill. An OpenRouter study found GPT 5.5 costs between 49 and 92 percent more than 5.4 in practice once you account for how the model actually behaves.
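To see why a flat sticker can still mean a bigger bill, here is a minimal sketch of the arithmetic. It assumes a workload billed only on output tokens at a fixed per-token price; the price and token volume are hypothetical placeholders, and only the 35 percent uplift comes from the figure above.

```python
def output_cost(tokens: int, price_per_million: float) -> float:
    """Dollar cost of a number of output tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million

# Hypothetical inputs: a monthly workload and a sticker price that does not change.
PRICE_PER_MILLION = 25.0        # assumed $ per 1M output tokens, same on both versions
BASELINE_TOKENS = 40_000_000    # assumed monthly output tokens on the older model
UPLIFT = 0.35                   # responses 35 percent longer, as reported above

before = output_cost(BASELINE_TOKENS, PRICE_PER_MILLION)
after = output_cost(round(BASELINE_TOKENS * (1 + UPLIFT)), PRICE_PER_MILLION)

print(f"before: ${before:,.0f}  after: ${after:,.0f}  increase: {after / before - 1:.0%}")
```

Same price per token, same task, 35 percent more spend. Input tokens are left out to keep the arithmetic visible; if the new tokeniser also inflates input counts, the increase applies there as well.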

So when a vendor tells your COO "the cost curve is on your side, just deploy and the unit economics will improve", they are describing the bottom of the stack. Not the part doing the work.

This matters for one reason. If you're building an AI workflow that depends on frontier model output, your cost basis is not on a downward curve. It's on an upward one. And the procurement assumption sitting inside most mid-market AI budgets right now is the opposite. Which means the budget is wrong.

What does this actually do to a mid-market P&L?

Take a private firm doing $30M in revenue, running maybe twelve to fifteen percent net margin. The exec team has approved a handful of AI initiatives. Some sit inside Microsoft 365. Some are bolt-on tools the marketing team signed up for. A few are pilots IT is running. Nobody has a unified view of what the firm is actually spending on AI tokens, model access, and the integration scaffolding around them.

This is the pattern Josef Holm sees inside almost every operating firm we touch. It has a name: AI Fragmentation. The firm is paying for AI in eight places, getting visible output from two, and exposing data through all eight. Nobody at the exec table can say which workflows pay back and which are silently compounding cost.

Now add the upward token curve to that picture. The fragmented spend isn't just inefficient. It's inflating. Every quarter the same workflows cost more to run, and because nobody owns the consolidated view, nobody notices until the annual reconciliation. By then the margin has already moved.
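A rough way to see how an unowned, inflating AI line moves margin: the sketch below takes the $30M revenue and the low end of the 12 to 15 percent net margin from the example above, then applies a hypothetical starting AI spend and a hypothetical quarterly cost increase. Both of those inputs are assumptions for illustration, not figures from any firm.

```python
REVENUE = 30_000_000          # annual revenue from the example above
NET_MARGIN = 0.12             # low end of the 12-15 percent range above
START_QUARTERLY_AI = 150_000  # hypothetical AI spend in the first quarter
QUARTERLY_GROWTH = 0.10       # hypothetical quarter-on-quarter increase for the same workflows

def annual_ai_spend(start: float, growth: float, quarters: int = 4) -> float:
    """Sum of quarterly spend when each quarter costs `growth` more than the one before."""
    return sum(start * (1 + growth) ** q for q in range(quarters))

planned = START_QUARTERLY_AI * 4   # the budget assumed a flat run rate
actual = annual_ai_spend(START_QUARTERLY_AI, QUARTERLY_GROWTH)
overrun = actual - planned
profit = REVENUE * NET_MARGIN

print(f"planned: ${planned:,.0f}  actual: ${actual:,.0f}  overrun: ${overrun:,.0f}")
print(f"overrun as share of net profit: {overrun / profit:.1%}")
print(f"net margin after overrun: {(profit - overrun) / REVENUE:.2%}")
```

With these placeholder inputs the overrun is under $100,000, a little under three percent of net profit. The point is the shape rather than the number: the overrun grows every quarter nobody owns the consolidated line.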

The cost is real and it shows up in a few places. Throughput that didn't materialise because the pilots stalled. Margin compression from tool spend nobody approved. Data sitting in vendor environments under terms nobody at the exec level has read. Headcount plans built on productivity assumptions that never landed.

So what's the right move when the curve isn't on your side?

This is the part where most articles tell you to "build an AI strategy" and link to a fifty-page framework. That's not the move.

The move is to stop assuming the curve will save you and start treating AI spend the way you treat every other operating cost in the business. Which means knowing what you have, knowing what it produces, and killing what doesn't earn its line.

Most operators expect the answer to look like a change programme. More tools. A pilot budget. Maybe hiring a Chief AI Officer. The market has trained them to expect that, and every consulting deck reinforces it.

The actual answer is the opposite. Fewer tools. A decision layer that sits above them. A workflow-by-workflow verdict on each piece of AI the firm is already running. That verdict is what Josef Holm calls Kill, Fix, Build, and it's what the AI readiness note produces. Each existing workflow gets one of three calls. Kill the ones that consume budget and produce no measurable output. Fix the ones that work but leak data or run outside any governance line. Build only where the prior two steps have made room.
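The three-way call can be written down as a simple rule. This is a minimal sketch, not Josef Holm's actual method: it assumes each workflow has already been scored on two yes/no questions (does it produce measurable output, and does it run inside the governance line), and the `Workflow` fields, `verdict` and `room_to_build` functions, and inventory entries are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    KILL = "kill"
    FIX = "fix"
    BUILD = "build"

@dataclass
class Workflow:
    name: str
    monthly_cost: float       # all-in spend: licences, tokens, integration
    measurable_output: bool   # does it produce output someone can point to?
    inside_governance: bool   # does it run inside the agreed data line?

def verdict(w: Workflow) -> Verdict:
    """Assumed rule: no output means kill; output that leaks means fix;
    output inside the governance line is a candidate to build on."""
    if not w.measurable_output:
        return Verdict.KILL
    if not w.inside_governance:
        return Verdict.FIX
    return Verdict.BUILD

def room_to_build(workflows: list[Workflow]) -> float:
    """Monthly budget freed by killed workflows, treated here as the ceiling for new builds."""
    return sum(w.monthly_cost for w in workflows if verdict(w) is Verdict.KILL)

stack = [  # hypothetical inventory
    Workflow("marketing copy tool", 2_000, measurable_output=False, inside_governance=True),
    Workflow("contract summariser on personal ChatGPT", 0, measurable_output=True, inside_governance=False),
    Workflow("Copilot in Microsoft 365", 3_000, measurable_output=True, inside_governance=True),
]
for w in stack:
    print(f"{w.name}: {verdict(w).value}")
print(f"room to build: ${room_to_build(stack):,.0f}/month")
```

Treating freed budget as the ceiling for new builds is one reading of "made room"; in some firms the binding constraint is attention or headcount rather than spend.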

The deliverable is an Opportunity Map. A prioritised remediation roadmap that compounds throughput inside an enforced governance line. Throughput and data sovereignty on the same page, because at the mid-market scale they are the same problem. The token bill and the exposure surface are produced by the same fragmented stack.

What about the firms that don't have a million dollars to spend on tokens?

This is where the Steinberger screenshot actually matters to an operator.

The argument from the AI-is-for-the-rich camp goes like this: companies with capital reach product-market fit faster because they can iterate at frontier model prices. Companies without that capital fall further behind every quarter. The gap widens.

For a venture-backed startup, that's probably true. For a $30M private services firm, it's the wrong frame entirely. You're not racing to PMF. You found product-market fit a decade ago. Your job is not to outspend competitors on tokens. Your job is to extract more throughput per employee from the AI you already pay for, without widening the data surface your clients or regulator care about.

This is the part Mitchell Hashimoto put his finger on when he described what he calls AI psychosis at companies. Firms are optimising for fixing issues quickly instead of having fewer issues. Ship buggier work in a tenth of the time, fix it afterwards. That logic works at a SaaS company shipping consumer features. It does not work at a wealth manager, a corporate-services firm, a manufacturer with regulated supply chains, or a B2B services business where one privileged document landing in the wrong vendor's training set ends the client relationship.

The mid-market operator's edge is not raw token spend. It's judgement. Which workflows to run, which to kill, where the governance line goes, and which exposures cannot be tolerated regardless of throughput upside.

What does "forward deploy yourself" actually mean for an operator?

The closing line in the source piece is to put on the turtleneck and forward deploy yourself. Embed directly. Use AI hands-on inside real work.

I'd translate that for the C-suite reader.

You do not need to become a prompt engineer. You need to stop treating AI as an IT line item and start treating it as an operating discipline that sits across functions and reports up to you. In practice, that means a few things.

First, get a real inventory. Not what IT thinks is being used. What is actually being used, including the Shadow AI nobody approved. The ChatGPT accounts on personal emails. The Copilot licences sitting in Microsoft 365 that the firm is paying for whether or not anyone uses them. The embedded AI features inside tools the firm bought for other reasons.
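In practice a real inventory is one table holding every AI line, whoever bought it. The sketch below assumes a hypothetical record per tool (cost centre, monthly cost, whether anyone approved it) and rolls it up into a consolidated view by cost centre, plus a list of unapproved tools; the field names and entries are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class AILine:
    tool: str
    cost_centre: str
    monthly_cost: float
    approved: bool   # False for Shadow AI nobody signed off

def consolidate(lines: list[AILine]) -> dict[str, float]:
    """Total monthly AI spend per cost centre."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        totals[line.cost_centre] += line.monthly_cost
    return dict(totals)

def shadow_ai(lines: list[AILine]) -> list[str]:
    """Tools in use that nobody approved."""
    return [line.tool for line in lines if not line.approved]

inventory = [  # hypothetical entries
    AILine("Copilot licences (Microsoft 365)", "IT", 3_000, True),
    AILine("Copywriting add-on", "Marketing", 400, True),
    AILine("ChatGPT on personal email", "Finance", 20, False),
    AILine("AI feature in CRM", "Sales", 0, False),
]
print(consolidate(inventory))
print(shadow_ai(inventory))
```

Zero-cost lines stay in the table on purpose: an embedded feature may not show up on an invoice but still moves data.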

Second, draw the governance line. Decide which data classes are allowed in which environments, under which terms. Make that call once, at the exec level, so it does not get made forty times by individuals on a Tuesday afternoon.
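Making the call once means writing it down somewhere every team and tool can be checked against. A minimal sketch, assuming four hypothetical data classes and placeholder environment names; the real classes and permitted environments are the exec team's decision, not these. Anything not explicitly allowed is denied, including environments nobody has listed yet.

```python
from enum import Enum

class DataClass(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CLIENT_CONFIDENTIAL = "client_confidential"
    REGULATED = "regulated"

# Hypothetical policy, decided once at exec level: which environments
# may receive each data class. Environment names are placeholders.
POLICY: dict[DataClass, frozenset[str]] = {
    DataClass.PUBLIC: frozenset({"consumer_chat", "tenant_copilot", "private_deployment"}),
    DataClass.INTERNAL: frozenset({"tenant_copilot", "private_deployment"}),
    DataClass.CLIENT_CONFIDENTIAL: frozenset({"private_deployment"}),
    DataClass.REGULATED: frozenset(),  # no AI environment approved
}

def allowed(data_class: DataClass, environment: str) -> bool:
    """True only if the exec-level policy permits this data class in this environment."""
    return environment in POLICY[data_class]

print(allowed(DataClass.INTERNAL, "tenant_copilot"))
print(allowed(DataClass.CLIENT_CONFIDENTIAL, "consumer_chat"))
```

The sketch leaves out the vendor terms themselves; it only records the yes or no the exec team reaches after reading them.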

Third, stop greenlighting new AI pilots until the existing ones have a verdict. Every new tool added to a fragmented stack compounds the problem. The discipline is to consolidate before you expand.

This is the work the operator note was built to do on an ongoing basis. It's the firm's equivalent of an AI leadership hire, except the work is not a single person filing memos. It's a Principal-led mandate sitting beside the exec team, making the calls that keep the stack tight as the cost curve keeps moving the wrong way.

The bottom line

The frontier is not getting cheaper. The vendors selling you AI know it. Most operators have not absorbed it yet.

The firms that win the next three years are not the ones spending the most on tokens. They're the ones running the leanest AI stack with the cleanest governance line, extracting throughput from a handful of workflows they actually understand, and refusing to add tools until the existing ones have earned their place.

If you don't know what your firm is currently spending on AI across every cost centre, who approved it, what it produces, or where the data sits, that's the gap. It widens every quarter you leave it open.


Frequently Asked Questions

Is AI actually getting cheaper over time?

At the cheap end, yes. Small models and commodity tasks are flat or down. But the frontier models doing the real work are getting more expensive. GPT 5.5 runs 49 to 92 percent more than 5.4 in practice. Anthropic kept Opus's sticker price flat, then shipped a tokeniser that produces responses 35 percent longer. The part of the stack actually doing operator work is on an upward curve, not a downward one.

What is AI Fragmentation in a mid-market firm?

It's the pattern where a firm is paying for AI in eight places, getting visible output from two, and exposing data through all eight: Microsoft 365 features, bolt-on marketing tools, IT pilots, Shadow AI on personal accounts. Nobody at the exec table can say which workflows pay back and which are silently compounding cost.

How should an operator budget for AI tokens in 2025?

Stop assuming the cost curve will save you. Treat AI spend the way you treat every other operating cost. Know what you have, know what it produces, and kill what doesn't earn its line. The fragmented stack is inflating quarter over quarter, and the margin moves before the annual reconciliation catches it.

What is Kill, Fix, Build?

It's the verdict Josef Holm applies to every AI workflow already running inside a firm. Kill the ones that consume budget and produce no measurable output. Fix the ones that work but leak data or run outside any governance line. Build only where the prior two steps have made room. The output is an Opportunity Map: a prioritised remediation roadmap.

Does the mid-market need to compete on token spend?

No. You're not racing to product-market fit. You found that a decade ago. The edge is judgement: which workflows to run, which to kill, where the governance line goes, and which data exposures cannot be tolerated regardless of throughput upside.

What does forward deploying yourself mean for a CEO?

Stop treating AI as an IT line item. Treat it as an operating discipline that sits across functions and reports up to you. Get a real inventory including Shadow AI. Draw the governance line once at the exec level. Stop greenlighting new pilots until the existing ones have a verdict.