Klarna Saved $60M and Lost the Thing That Mattered

Josef Holm · 13 min read · June 8, 2026

Key Takeaways

Klarna's AI agent did the work of 853 employees and saved $60M, then the CEO admitted publicly the strategy had cost something far more valuable and started rehiring humans. The AI worked too well at the wrong objective.
Prompt engineering was era one. Context engineering, framed by Anthropic in September 2025, was era two. Intent engineering is era three: encoding organisational purpose into infrastructure so agents make decisions that are strategically coherent, not just technically correct.
Microsoft Copilot was adopted by 85% of the Fortune 500. Gartner found only 5% scaled past pilot. Deloitte's 2026 survey found 84% of companies have not redesigned jobs around AI and only 21% have mature agent governance. These are intent failures, not technology failures.
Three layers are missing in most firms: unified context infrastructure, a coherent AI worker toolkit, and goal-translation infrastructure that converts human-readable objectives into agent-useful parameters with encoded trade-offs, decision boundaries, and feedback loops.
Context without intent is a loaded weapon with no target. The agents are already running. The question is what they have been told to want.

Klarna Saved $60M and Lost Something It Can't Buy Back

In January, Klarna reported its AI agent now does the work of 853 full-time employees and has saved the company $60 million. In the same earnings cycle, the CEO admitted publicly that the AI strategy had cost something far more valuable than $60 million, and he is still trying to buy it back.

This is not another AI-is-overhyped story. It is the opposite. The AI worked too well.

The distinction between AI that fails and AI that succeeds at the wrong thing is the most important unsolved problem in enterprise AI right now. Bigger than context engineering, though that is part of it. Much bigger than prompt engineering, which now looks like a warm-up act.

What we are actually talking about is intent engineering: the discipline of making organisational purpose, goals, values, trade-offs, and decision boundaries machine-readable and machine-practical, so that when you deploy an autonomous system, it works toward what your business actually needs, not just what it can measure.

What Actually Happened at Klarna

Early 2024, Klarna rolled out an AI-powered customer service agent. It handled 2.3 million conversations in the first month across 23 markets in 35 languages. Resolution times dropped from 11 minutes to two. The CEO projected $40 million in savings.

Then customers started complaining. Generic answers. Robotic tone. No ability to handle anything requiring judgment.

By mid-2025, CEO Sebastian Siemiatkowski told Bloomberg that cost had been the predominant evaluation factor and the result was lower quality. Klarna began rehiring the human agents it had cut.

Most people read this as proof that AI cannot handle subtlety. That was a comforting interpretation in early 2025. The more interesting reading in 2026? The AI agent was extraordinarily good at resolving tickets fast, and resolving tickets fast was the wrong goal.

Klarna's organisational intent was never "resolve tickets fast." It was "build lasting customer relationships that drive lifetime value in a competitive fintech market." Those are profoundly different goals. They require profoundly different decisions at the point of interaction.

A human agent with five years at the company knows the difference intuitively. She knows when to bend a policy. She knows when to spend three extra minutes because the customer's tone says they are about to churn. She knows when efficiency is the right move and when generosity is. She absorbed those judgments from the decisions managers made every day, the stories veterans told new hires, the unwritten rules about which metrics leadership actually cared about when push came to shove.

The AI agent had none of it. It had a prompt. It had context. It did not have intent.

Three Eras of AI Discipline

Naming matters. We are bad at naming things correctly when it comes to AI and organisational alignment, so let me try.

Prompt engineering was the first discipline. Individual, synchronous, session-based. You sit in front of a chat window, craft an instruction, iterate on the output. Personal skill, personal value. The era that produced a thousand "how to write the perfect prompt" blog posts, most of them forgettable.

Context engineering followed. Anthropic published a foundational piece in September 2025 framing context engineering as the shift from crafting isolated instructions to crafting the entire information state an AI system operates within. Harrison Chase at LangChain put it more bluntly: "Everything's context engineering." Building RAG pipelines, wiring up MCP servers, structuring organisational knowledge so agents can access it. Necessary. Not sufficient.

Intent engineering is the third discipline. Almost nobody is building for it yet.

Context engineering tells the agent what to know. Intent engineering tells the agent what to want.

It is the practice of encoding organisational purpose into infrastructure. Not as prose in a system prompt, but as structured, useful parameters that shape how agents make decisions autonomously. The layer that would have told Klarna's AI: yes, you can resolve this ticket in 90 seconds, but the customer has been with us four years, their tone indicates frustration, spend the extra time, offer them a specialist. The goal is retention, not throughput.

Without it, you get what Klarna got. A technically brilliant agent working toward exactly the wrong objective.
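To make the retention-over-throughput example concrete, here is a minimal sketch of what such a decision parameter could look like. Everything in it is an assumption for illustration: the `Ticket` fields, the thresholds, and the idea that a tone classifier supplies a frustration score. It is not a description of Klarna's system.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    tenure_years: float         # how long the customer has been with the company
    frustration: float          # 0.0 (calm) to 1.0 (angry), from a hypothetical tone classifier
    est_fast_resolution_s: int  # how quickly the agent could close the ticket


def choose_handling(ticket, tenure_threshold=3.0, frustration_threshold=0.6):
    """Pick a handling mode with retention, not throughput, as the goal.

    The thresholds are illustrative assumptions; in practice they would be
    set and reviewed by the people who own the retention objective.
    """
    at_risk = (
        ticket.tenure_years >= tenure_threshold
        and ticket.frustration >= frustration_threshold
    )
    if at_risk:
        # The fast path exists, but the encoded intent says not to take it.
        return "extended_care_with_specialist_offer"
    return "fast_resolution"


# A four-year customer who sounds frustrated, resolvable in 90 seconds
print(choose_handling(Ticket(4.0, 0.8, 90)))  # extended_care_with_specialist_offer
print(choose_handling(Ticket(0.5, 0.1, 90)))  # fast_resolution
```

The point of the sketch is where the judgment lives: in a reviewable parameter, not in a sentence buried in a system prompt.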

The Investment Paradox Nobody Wants to Name

The Deloitte 2026 State of AI in the Enterprise report surveyed 3,000+ leaders across 24 countries. 84% of companies have not redesigned jobs around AI capabilities. Only 21% have a mature model for agent governance.

Those are not technology numbers. They are intent failures.

What makes this disorienting is the juxtaposition. Investment is massive and accelerating. Deloitte's tech value survey found 57% of respondents are putting between 21% and 50% of their digital transformation budgets into AI automation. 20% of companies are investing more than half. KPMG's Q4 AI pulse showed capital flowing, ROI confidence rising, agents moving from pilots to professional platforms. Gartner predicts that by 2028, 15% of day-to-day work decisions will be made autonomously by agents. That number is probably low.

And yet. 74% of companies globally report they have yet to see tangible value from AI. McKinsey found 30% of AI pilots failed to achieve scaled impact.

Once you peel the onion, there is no contradiction. Organisations have solved "can AI do this task." They have not solved "can AI do this task in a way that serves our goals, at scale, with appropriate judgment." That second question is an intent question.

Look at Microsoft Copilot. One of the most heavily invested enterprise AI products in history. Embedded into every Office application. 85% of Fortune 500 companies adopted it. Then adoption stalled hard. Gartner found only 5% of organisations moved from a Copilot pilot to larger-scale deployment. Only about 3% of the total Microsoft 365 user base actually adopted Copilot as paid users. Bloomberg reported Microsoft slashing internal sales targets after the majority of salespeople missed their goals.

Standard explanations centre on UX and model quality. Real issues, but not the fundamental one. The fundamental issue is that deploying an AI tool across an organisation without intent alignment is like hiring 40,000 new employees and never telling them what the company does, what it values, or how to make decisions. You get activity. You do not get productivity. You get AI usage in a dashboard and almost no measurable impact on what the business is trying to accomplish.

That is not a tools problem. That is an intent gap.

The Three Layers Most Companies Have Not Built

The intent gap operates at three altitudes. Getting one right is helpful. Getting all three right is the difference between having AI tools and having an AI-native operating model.

Layer One: Unified Context Infrastructure

The industry is most aware of this layer and still has not built it. Every team rolls its own context stack. One team pipes Slack data through a custom RAG pipeline. Another exports Google Docs into a vector store. A third built an MCP server that connects to Salesforce but not to Jira. A fourth does not know the other three exist.

Analysts call this the shadow agents problem. It mirrors the shadow IT crisis of the early cloud era except the stakes are higher, because agents do not just access data, they act on it. Security and compliance teams cannot allow arbitrary, unvetted agents running on developer laptops to touch customer PII, financial data, or healthcare records. Without sanctioned infrastructure, that is exactly what is happening.

Anthropic's Model Context Protocol, introduced late in 2024 and donated to the Linux Foundation in December 2025, is the most promising standardisation attempt. OpenAI, Google, Microsoft, and more than 50 enterprise partners have committed. Monthly SDK downloads are approaching 100 million. But protocol adoption and organisational build are very different things. Having a USB-C standard does not help if your company has not decided which ports to install, who maintains them, or what gets plugged in.

The context infrastructure question is not technical. It is architectural and political. Which systems become agent-accessible? Who decides what context an agent can see across departments? How do you version organisational knowledge so agents are not operating on stale information? How do you handle the fact that sales-team Slack and engineering-team Slack encode completely different institutional assumptions?

Layer Two: A Coherent AI Worker Toolkit

Everyone is rolling their own workflow. One person uses Claude for research and ChatGPT for drafting. Another uses Cursor for code and Perplexity for fact-checking. A third has built a custom agent chain. A fourth is copy-pasting into a chat window. None of those workflows is transferable, measurable, or improvable by anyone else.

The gap between individual AI use and organisational AI value is enormous. It is the difference between AI activity and AI fluency. Bolting AI onto existing workflows gets 30% gains. Rethinking the workflow itself around AI capabilities gets 300%.

Fluency does not scale through training alone. It scales through shared infrastructure. Whether any one person has Slack does not matter. Whether an agent can search 50 people's Slack, their docs, their project plans, and the customer data determines whether the agent can do organisational-scale work or only individual-scale tasks.

Deloitte's 2026 report found workforce access to sanctioned AI tools expanded by 50% in a year. Access is not the same as value. Organisations are giving people tools without giving the tools the organisational context to deliver real value. That is where Klarna's story intersects with Copilot's. Tools deployed without organisational infrastructure become expensive toys.

Layer Three: Intent Engineering Proper

This is the layer that almost certainly does not exist in your business. It matters the most and it requires something genuinely new.

OKRs were designed for people. They encode human-readable goals. They assume human judgment about prioritisation, trade-offs, values, and exceptions. They assume a manager can look a direct report in the eye, say "here is what matters this quarter," and trust that the report will interpret that guidance through institutional context, professional norms, and personal judgment built over years.

Agents have none of that. An agent does not know your company's OKRs unless you put them in the context window. It does not know which trade-offs your leadership team would prefer unless you encode those preferences in a way it can act on. It does not know the difference between a decision that should be escalated and one it should make autonomously unless you define the boundary. And unlike a human employee, the agent will not absorb your company culture through six months of all-hands meetings, hallway conversations, and watching senior people handle ambiguous situations. None of that works for an agent.

Agents need explicit alignment, and they need it before they start working, not six months after.

This means organisations need to build something that mostly does not exist: machine-readable expressions of organisational intent. A cascade of specificity that most companies have never had to produce because humans filled in the gaps.

At the top, goal structures the agent can act on. Not "increase customer satisfaction," which is a human-readable aspiration, but an agent-practical objective: what signals indicate customer satisfaction in our context, what data sources contain those signals, what actions am I authorised to take, what trade-offs am I allowed to make on speed versus thoroughness or cost versus quality, where are the hard boundaries.
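As a rough illustration of the difference between an aspiration and an agent-practical objective, the questions above can be written down as a small data structure. This is a sketch under stated assumptions: the class name `AgentObjective`, its field names, and the example values are all hypothetical, not an existing standard or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentObjective:
    goal: str                      # the human-readable aspiration
    signals: dict                  # signal name -> data source that contains it
    authorised_actions: frozenset  # actions the agent may take on its own
    tradeoff_weights: dict         # e.g. thoroughness versus speed
    hard_limits: dict              # action -> maximum amount the agent may commit

    def is_authorised(self, action, amount=0.0):
        """True only if the action is allowed and stays within its hard limit."""
        if action not in self.authorised_actions:
            return False
        limit = self.hard_limits.get(action)
        return limit is None or amount <= limit


# Illustrative values only
retention = AgentObjective(
    goal="increase customer satisfaction",
    signals={"repeat_contact_rate": "support_tickets", "churn_within_90d": "billing"},
    authorised_actions=frozenset({"refund", "offer_specialist"}),
    tradeoff_weights={"thoroughness": 0.7, "speed": 0.3},
    hard_limits={"refund": 100.0},
)

print(retention.is_authorised("refund", 50.0))   # True
print(retention.is_authorised("refund", 500.0))  # False: over the hard boundary
print(retention.is_authorised("close_account"))  # False: never authorised
```

Nothing here is clever. The work is in getting leadership to agree on what goes in each field.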

Below that, delegation frameworks. Tenets translated into decision boundaries. Amazon's leadership principles work for humans because humans can interpret "customer obsession" through contextual judgment. An agent needs that principle decomposed: when customer request X conflicts with policy Y, here is the resolution hierarchy. When data suggests action A but the customer expressed preference B, here is the decision logic. These are not rules in the traditional sense. They are encoded judgment. The kind a senior employee carries in her head after five years.
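A resolution hierarchy of this kind can be sketched in a few lines. The ranking in `VALUE_HIERARCHY` and the action names are assumptions made up for the example; the real ordering is exactly the thing a leadership team would have to decide.

```python
# Highest-priority value first. This ordering is an illustrative assumption.
VALUE_HIERARCHY = [
    "legal_compliance",
    "customer_retention",
    "policy_adherence",
    "handling_speed",
]


def resolve(options, hierarchy=VALUE_HIERARCHY):
    """Choose between conflicting actions using a ranked list of values.

    options maps each candidate action to the set of values it serves.
    Walk the hierarchy from the top: a value that exactly one remaining
    action serves decides the matter; a value several actions serve
    narrows the field. If nothing settles it, hand off to a person.
    """
    for value in hierarchy:
        winners = [action for action, served in options.items() if value in served]
        if len(winners) == 1:
            return winners[0]
        if len(winners) > 1:
            options = {action: options[action] for action in winners}
    return "escalate_to_human"


# Customer request conflicts with policy: retention outranks policy adherence
print(resolve({
    "waive_late_fee": {"customer_retention"},
    "enforce_late_fee": {"policy_adherence"},
}))  # waive_late_fee

# Two actions serve the same value and nothing breaks the tie
print(resolve({
    "offer_discount": {"customer_retention"},
    "offer_specialist": {"customer_retention"},
}))  # escalate_to_human
```

The escalation fallback matters as much as the ranking: it is the encoded version of a senior employee knowing when a call is above her pay grade.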

At the base, feedback mechanisms that close the loop. When an agent makes a decision, was it aligned with intent? How do you know? That is exactly what failed at Klarna. The agent worked toward resolution speed because that was what it could measure. Nobody had encoded the objectives that mattered most: relationship quality, brand trust, lifetime value. Those objectives lived in the heads of the human agents who had been laid off.
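One simple way to close that loop, sketched here under assumptions, is to have people review a sample of agent decisions and compare how often recent decisions were judged aligned against an earlier baseline. The function name, the window sizes, and the 1/0 review encoding are all hypothetical choices for the example.

```python
def alignment_drift(reviews, baseline_window=50, recent_window=50):
    """Return the baseline alignment rate minus the recent alignment rate.

    reviews is a chronological list of 1 (decision judged aligned with
    intent) or 0 (not aligned), for example from human spot checks.
    A positive result means the agent is drifting away from intent.
    """
    if len(reviews) < baseline_window + recent_window:
        raise ValueError("not enough reviewed decisions to compare windows")
    baseline = sum(reviews[:baseline_window]) / baseline_window
    recent = sum(reviews[-recent_window:]) / recent_window
    return baseline - recent


# 50 aligned decisions at launch, then a later window where half are off-intent
history = [1] * 50 + [1] * 25 + [0] * 25
print(alignment_drift(history))  # 0.5
```

The measurement is only as good as the review behind it. If reviewers score decisions on resolution speed, the loop will faithfully report that everything is fine.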

The age of "humans just know" is ending. Intent engineering is the discipline of making what humans know explicit, structured, and machine-useful. Not because humans are leaving, but because the agents arriving to work alongside them cannot function without it.

Why This Has Not Been Built

Three reasons.

It is genuinely new. Before agents could run autonomously over long time horizons, we did not need it. The human was the intent layer. Long-running agents break that model.

People who understand organisational strategy are not the people who build agents, and people who build agents do not always understand organisational strategy. Classic two-cultures problem. MIT found that AI investment is still viewed primarily as a tech challenge for the CIO rather than a business issue that requires leadership across the organisation. That framing guarantees an intent gap. CIOs can build infrastructure. Intent comes from the whole leadership team.

It is hard. Making organisational intent explicit and structured is extremely difficult. Most goals live in slide decks, half-read OKR documents, leadership principles cited in performance reviews but never operationalised, and the tacit knowledge of experienced employees who know what to do in ambiguous situations even though no one ever told them. Nobody has strong muscles here because most organisations have never exercised them.

What the Solution Looks Like

I do not want to leave you with a gap.

At the infrastructure level, you need a composable, vendor-agnostic architecture that lets agents operate across systems, tools, and models securely and at scale. MCP is the protocol layer. The organisational setup requires decisions about data governance, access controls, freshness guarantees, and semantic consistency that no protocol will make for you. Treat it the way a serious company treated its data warehouse strategy in the 2000s. Core strategic investment. Not an IT project.

At the workflow level, you need an organisational capability map for AI. A shared, living understanding of which workflows are agent-ready, which are agent-augmented with a human in the loop, and which remain human-only. Not a static document filed in Confluence. An operating system that evolves as agent capabilities improve. The companies that do this well will create a new role, something like an AI workflow architect, sitting between engineering, operations, and strategy.
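In its smallest form, a capability map is just a lookup that every agent and every team consults before work is routed. The sketch below assumes three modes matching the categories above; the workflow names and the default are illustrative, not a recommendation for any particular business.

```python
from enum import Enum


class Mode(Enum):
    AGENT_READY = "agent_ready"      # agent may act on its own
    HUMAN_IN_LOOP = "human_in_loop"  # agent works, a person approves
    HUMAN_ONLY = "human_only"        # agent stays out


# Hypothetical entries; a real map would be owned and revised over time
CAPABILITY_MAP = {
    "password_reset": Mode.AGENT_READY,
    "refund_dispute": Mode.HUMAN_IN_LOOP,
    "contract_negotiation": Mode.HUMAN_ONLY,
}


def mode_for(workflow):
    """Look up how a workflow should be handled.

    Unmapped workflows default to human-only until someone reviews them,
    which is an assumption chosen here to fail safe.
    """
    return CAPABILITY_MAP.get(workflow, Mode.HUMAN_ONLY)


print(mode_for("refund_dispute").value)    # human_in_loop
print(mode_for("brand_new_process").value) # human_only
```

Keeping the map in code or config rather than in a wiki page is what lets it evolve: changing a workflow's mode becomes a reviewed change, not a memo.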

At the alignment level, the genuinely new thing. Goal translation infrastructure that converts human-readable objectives into agent-useful parameters. Decision boundaries. Value hierarchies for resolving trade-offs. Feedback loops for measuring alignment drift over time. Google's Agent Development Kit is one of the early formal attempts, separating agent context into working context, session memory, long-term memory, and artifacts. Researchers at Google DeepMind have proposed five levels of agent autonomy (operator, collaborator, consultant, approver, observer), each with different intent-alignment requirements. Early sketches. The integrated system is still whitespace.

If OKRs were the management innovation that let Intel align thousands of humans in the 1970s, intent engineering is the management innovation that lets organisations align hundreds or thousands of agents to the same objectives in 2026, while those agents operate at speeds and scales no human manager can supervise.

OKRs took decades to become standard. We do not have 20 years.

The Race Is Not an Intelligence Race

For three years, the AI race has been framed as an intelligence race. Best model. Best benchmarks. Biggest context window. That made sense when models were the bottleneck. Models are not the bottleneck for most organisational use cases anymore. The frontier models are extraordinarily capable. The differences between them matter far less than the differences between the organisations that give them clear, structured, goal-aligned intent and the ones that do not.

A company with a mediocre model and extraordinary intent infrastructure will outperform a company with a frontier model and fragmented, inaccessible, unaligned organisational knowledge every single time.

The most important AI investment in 2026 is not a model subscription or another Copilot licence. It is organisational intent architecture. Making your company's goals, values, decision frameworks, and trade-off hierarchies discoverable, structured, and agent-specific. The alignment infrastructure that lets agents make decisions that are not just technically correct, but strategically coherent.

Klarna's story was never "AI does not work." The AI worked brilliantly. That was the problem. It was so good at hitting the measurable objective that nobody noticed it was destroying the ones that mattered. The 700 human agents who walked out the door took with them the institutional knowledge that had never been documented. Humans just knew.

Prompt engineering asked: how do I talk to AI? Context engineering asks: what does AI need to know? Intent engineering is asking the question that actually matters: what does the organisation need AI to want?

Context without intent is a loaded weapon with no target. We have spent years building AI systems. 2026 is the year to learn to aim them.

If your firm is sitting on a fragmented AI estate (multiple tools, no shared infrastructure, no encoded intent), the gap will not close on its own. That is the ongoing work behind the operator note: the decision layer, the intent architecture, the governance line that lets autonomous systems compound throughput without compounding the wrong objective. The agents are running. The question is what they have been told to want.

Frequently Asked Questions

What is intent engineering?

Intent engineering is the discipline of making organisational purpose, goals, values, trade-offs, and decision boundaries machine-readable and machine-practical, so autonomous agents work toward what the business actually needs, not just what they can measure.

Why did Klarna's AI customer service strategy backfire?

Klarna's AI agent was optimised for resolution speed and cost, not retention and relationship quality. It hit the measurable objective and destroyed the ones that mattered. Klarna began rehiring the human agents it had cut.

How is intent engineering different from prompt and context engineering?

Prompt engineering asks how to talk to AI. Context engineering asks what the AI needs to know. Intent engineering asks what the organisation needs the AI to want. The first two are necessary. Neither is sufficient on its own.

Why have Microsoft Copilot rollouts stalled at most companies?

Gartner found only 5% of organisations moved from pilot to scaled deployment. The fundamental issue is not UX or model quality. It is deploying AI tools without intent alignment, so you get activity in a dashboard and no measurable impact on what the business is trying to accomplish.

What does an intent architecture actually look like?

Three layers. Unified context infrastructure agents can access securely. A coherent AI worker toolkit shared across the organisation. And goal-translation infrastructure: agent-practical objectives, encoded decision boundaries, value hierarchies for trade-offs, and feedback loops to measure alignment drift.

What should a CEO do first if their AI estate is fragmented?

Stop treating AI as a CIO project. Make intent architecture a leadership-team decision. Map workflows by agent-readiness, encode the trade-offs your senior people make intuitively, and build the governance line that tells agents what to want before they start working at scale.