The Agent Infrastructure Stack Is Forming Now

Josef Holm | 10 min read | April 7, 2026

Key Takeaways

A six-layer infrastructure stack is forming right now, built for AI agents, not humans: compute sandboxes, identity, memory, integrations, payments, and orchestration.
Each layer has real companies competing with genuinely different architectural bets; some are mature, some are rough prototypes marketed as production-ready.
System reliability compounds fast in the wrong direction: five components at 97% uptime each yield roughly 86% end-to-end reliability, which breaks trust quickly.
The orchestration layer (how agents work with other agents at scale) is the biggest gap in the stack and likely where the next defining infrastructure company comes from.
Leaders do not need to build these layers, but they must understand them; ignorance of the stack is a direct strategic liability.

The Agent Infrastructure Stack Is Forming Right Now. Most Leaders Are Ignoring It.

A new infrastructure stack is being built, and it's not designed for humans. It's designed for AI agents. Billions of dollars are flowing into it. If you're a business leader who doesn't understand what's happening at the layer level, you're making strategic bets you can't actually see.

I've watched two previous infrastructure shifts unfold in real time. The move from on-premise servers to cloud computing between 2006 and 2010. Then the decomposition of monolithic applications into microservices between 2012 and 2016. Both times, the builders who understood the new stack early built companies that are now dominant. The ones who didn't spent years catching up, or never did.

This is the third shift. And it's moving faster than either of the first two.

What does "agent-first infrastructure" actually mean?

Here's the simplest way to explain it. For the last 30 years, every piece of software infrastructure was built assuming a human would use it. Dashboards, login screens, settings pages, billing portals. All designed for people clicking buttons and reading screens.

That assumption is breaking.

The new customer for infrastructure is the agent itself. Not a person using a tool. An autonomous system that needs to authenticate, execute, remember, communicate, pay for services, and coordinate with other agents. The companies building interfaces for those capabilities are effectively building the operating system for the agentic economy. That's not hype. That's what's happening.

One useful analogy: think of system calls. When software needs to interact with an operating system, it uses defined, reliable interfaces for memory, compute, file access, and networking. Agents need the same thing. Defined, reliable interfaces for identity, compute, memory, communication, and payments.

The problem? Right now, the components in this stack are a mix of mature building blocks and rough prototypes, all marketing themselves as production-ready. Knowing the difference matters enormously.

Let me walk through the six layers.

Layer 1: Where do agents actually run?

This is the most mature layer in the stack, and the easiest to understand. Agents need isolated, sandboxed, auditable environments to run code. Not on your laptop. Not in production. Not unsupervised.

Several companies are competing here with genuinely different architectural bets.

E2B, with roughly $32 million in funding, uses Firecracker microVMs, the same technology behind AWS Lambda. Each agent gets a session with its own dedicated kernel. The sandbox is ephemeral: spin up, run code, spin down. Daytona, which raised a $24 million Series A, uses Docker containers with a shared kernel, optimized for speed with claimed 90-millisecond cold starts and support for persistent state. Modal targets GPU-heavy workloads. Browserbase, valued at around $300 million after its Series B, focuses on headless browser automation so agents can interact with web pages the way a human would. Alibaba's OpenSandbox is a newer entrant.

The core architectural split here is worth understanding: ephemeral versus persistent sandboxes. E2B treats sandboxes as disposable. Others assume agents will return, install dependencies, and maintain state across sessions.

That's not a stylistic preference. It's a fundamental bet on how long agent sessions will run and whether state matters. Both approaches will likely survive. The agentic economy is big enough for more than one answer.
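
To make the split concrete, here's a minimal sketch. It uses plain directories as a stand-in for a sandbox, so it provides no isolation at all, and the names (`ephemeral_workspace`, `persistent_workspace`, `run_session`, `agent-42`) are hypothetical. This is not the E2B or Daytona API. It only shows what "state survives" versus "state is thrown away" means for a returning agent.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def ephemeral_workspace():
    """Fresh directory per session; everything is deleted on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        yield Path(tmp)


@contextmanager
def persistent_workspace(root: Path):
    """Same directory every session; files survive between sessions."""
    root.mkdir(parents=True, exist_ok=True)
    yield root


def run_session(workspace: Path) -> bool:
    """Pretend to install a dependency; report whether it was already there."""
    marker = workspace / "deps_installed"
    already_there = marker.exists()
    marker.touch()
    return already_there


# Ephemeral: the second session starts from nothing.
with ephemeral_workspace() as ws:
    first = run_session(ws)
with ephemeral_workspace() as ws:
    second = run_session(ws)

# Persistent: the second session finds what the first one left behind.
# (The outer temporary directory only keeps this example self-cleaning.)
with tempfile.TemporaryDirectory() as home:
    root = Path(home) / "agent-42"
    with persistent_workspace(root) as ws:
        run_session(ws)
    with persistent_workspace(root) as ws:
        warm = run_session(ws)

print(first, second, warm)
```

The ephemeral path pays the setup cost every time but never inherits stale state. The persistent path is the reverse. That's the bet in miniature.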

But if you're building on one of these, you should know which bet you're making.

Layer 2: How does an agent prove who it is?

This is where things get interesting and messy. Agents need to exist on the internet as entities. They need to send and receive messages, authenticate with services, and hold identities that other systems can verify.

AgentMail, which raised a $6 million seed from General Spark with Paul Graham and HubSpot CTO Dharmesh Shah as angel investors, provides an API to programmatically create email inboxes for agents. Real addresses with full threading, attachments, labels, search, and self-signup onboarding.

The thesis is clever: email functions not just as communication but as a fundamental identity layer. Every SaaS service requires an email at signup. Every verification flow sends codes to one. Give an agent an email address and you've effectively given it an internet identity.

But is this the right long-term architecture? Or is it a shim?

I think it's both. Email works because it's everywhere, not because it's the right protocol for agents. The problems are real: brittle threading, rate limits designed to block automated behavior, poor signal-to-noise ratios for agent context windows. What agents actually need are native identity and communication protocols that don't require pretending to be human. Other approaches are being explored, including on-chain agent identity, dedicated agent-to-agent communication standards, and MCP-based service discovery. No clear winner has emerged.

Betting on agent email is a pragmatic bet, not necessarily an architectural one. Though I'll note that email has proven notably hard to kill over the last 50 years.

Layer 3: Can agents actually remember anything useful?

Agents need to remember information not just within a session but across many sessions, tasks, and days. This sounds simple. It's not.

Mem0, which has raised $24 million and accumulated over 41,000 GitHub stars and 14 million downloads, was selected by AWS as the exclusive memory provider for its agent SDK. Their API call volume has grown fivefold.

What Mem0 gets right is the framing. Memory for agents is not about saving conversation history. That's the old chatbot approach. Real agent memory is active curation: the system stores what matters, deliberately forgets outdated or conflicting details, and recalls only relevant context at inference time. Their architecture uses a hybrid data store combining a graph database, a vector database, and a key-value store.
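
Here's a toy sketch of that store, forget, recall loop. To be clear about the assumptions: this is not Mem0's API or architecture. The class `CuratedMemory` and its methods are hypothetical, it keeps one fact per subject in a plain dictionary, and it uses naive keyword overlap where a real system would use embeddings and a graph.

```python
class CuratedMemory:
    """Toy store: one live fact per subject, keyword-based recall."""

    def __init__(self):
        self._facts = {}   # subject -> (turn stored, fact text)
        self._turn = 0     # monotonic counter standing in for a timestamp

    def remember(self, subject: str, text: str) -> None:
        # A newer fact about the same subject overwrites the old one.
        # That overwrite is the "forget conflicting details" step.
        self._turn += 1
        self._facts[subject] = (self._turn, text)

    def recall(self, query: str, limit: int = 2) -> list[str]:
        # Score each fact by words shared with the query and skip
        # anything with no overlap, so irrelevant facts stay out of context.
        words = set(query.lower().split())
        scored = []
        for turn, text in self._facts.values():
            overlap = len(words & set(text.lower().split()))
            if overlap:
                scored.append((overlap, turn, text))
        scored.sort(reverse=True)  # best overlap first, newest as tie-break
        return [text for _, _, text in scored[:limit]]


store = CuratedMemory()
store.remember("billing_contact", "billing contact is dana@example.com")
store.remember("deploy_day", "deploys happen on tuesday")
store.remember("billing_contact", "billing contact is lee@example.com")

recalled = store.recall("who is the billing contact")
print(recalled)
```

The outdated contact is gone and the unrelated deploy fact never reaches the context window. Everything hard about real agent memory lives in doing those two things well at scale.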

On the LoCoMo benchmark, Mem0 outperforms OpenAI's built-in memory by 26% on accuracy, with 91% faster latency and 90% reduced token usage. Those numbers are notable.

Here's the platform risk every leader should understand. Every frontier lab is investing in building memory directly into their models. OpenAI has long-term memory investments. Anthropic is building memory into Claude. If memory becomes a model-level feature, the way search became integrated into ChatGPT, standalone memory companies face existential risk.

Mem0's counter-thesis is portability. No single company should own an agent's memory. Whether the market values portability over hyperscaler convenience is, in my honest assessment, genuinely uncertain. I'd call it a coin flip shaped by what builders and enterprises actually demand.

This is exactly the kind of layer where stack literacy stops being optional and starts being a strategic requirement.

Layer 4: How do agents interact with the tools businesses already use?

Agents need to interact with enterprise tools. Slack, Jira, Salesforce, GitHub, Google Workspace. And they need to interact with basic computing primitives. Managing those integrations at scale is an enormous combinatorial problem.

Composio, which raised $29 million from Lightspeed, provides a managed integration layer for agents. Authentication handling without complex OAuth flows. Pre-built connectors to hundreds of solutions. Observability on every tool call.

Why does this matter? Without middleware, every agent builder independently manages credentials, OAuth flows, rate limits, error handling, and API schema changes for every tool the agent touches. At enterprise scale, where an agent might touch a CRM, ticketing system, email, and calendar in a single workflow, this is unsustainable. The problem is N times M: the number of agents times the number of tools.
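
A quick sketch of that math, using made-up counts. The function names and the numbers are mine, purely for illustration. The assumption is that without a shared layer every agent maintains its own connector to every tool, and with one, each agent integrates once and the layer maintains one connector per tool.

```python
def point_to_point(agents: int, tools: int) -> int:
    """Every agent builder maintains a connector for every tool it touches."""
    return agents * tools


def shared_layer(agents: int, tools: int) -> int:
    """Each agent integrates once with the layer; the layer owns one connector per tool."""
    return agents + tools


# Hypothetical team sizes, not figures from any vendor.
for agents, tools in [(3, 4), (20, 50)]:
    print(
        f"{agents} agents x {tools} tools: "
        f"{point_to_point(agents, tools)} integrations point-to-point, "
        f"{shared_layer(agents, tools)} with a shared layer"
    )
```

At small scale the difference is tolerable. At 20 agents and 50 tools it's 1,000 integrations against 70.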

I've seen this pattern before. It's the same reason API gateways and integration platforms became essential infrastructure in the microservices era. The math just doesn't work without a shared layer.

The long-term risk is standardization. If MCP becomes a true universal standard, the value of managed integration layers diminishes. Composio's thesis depends on enterprises being slow to adopt MCP. Having spent 25 years watching how slowly large organizations adopt new standards, I'd say that's a reasonable bet. But it's still a bet.

Layer 5: How do agents pay for things?

This layer barely existed six months ago. Now it's arriving fast.

Agents need to acquire services and pay for them securely. Provisioning databases. Upgrading hosting tiers. Transacting without requiring human authentication at each step.

Stripe recently launched its Agent Toolkit and Stripe Projects as the first credible trust layer for agent-to-service transactions. Agents use CLI commands to provision their own databases, upgrade hosting tiers, and transact. Stripe tokenizes payment credentials so raw card details never leave Stripe's vault. Databases are ready in approximately 350 milliseconds, free to start, and scale to zero when inactive.

Since the start of 2025, agents have been able to handle nearly every aspect of spinning up a project except creating accounts and provisioning infrastructure, which always required a human in the loop. Stripe is closing that gap.

What's still missing is big:

Agent-to-agent payment protocols
Metered billing mapped to agent compute patterns
Dynamic budget allocation, where Agent A can spend up to a set amount without human approval while Agent B requires it
Financial observability across agent workflows
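
To show what dynamic budget allocation could look like, here's a minimal sketch. Everything in it is hypothetical: `BudgetPolicy`, `SpendLedger`, the agent names, and the dollar amounts are illustrations, not part of Stripe's toolkit or any existing product.

```python
from dataclasses import dataclass


@dataclass
class BudgetPolicy:
    auto_approve_limit: float  # largest single spend allowed without a human
    total_budget: float        # hard cap across all of the agent's spending


class SpendLedger:
    """Tracks spend per agent and applies that agent's policy to each request."""

    def __init__(self, policies: dict):
        self.policies = policies
        self.spent = {agent: 0.0 for agent in policies}

    def request(self, agent: str, amount: float) -> str:
        policy = self.policies[agent]
        if self.spent[agent] + amount > policy.total_budget:
            return "denied"
        if amount > policy.auto_approve_limit:
            return "needs_human_approval"
        self.spent[agent] += amount
        return "approved"


# Agent A may spend up to $50 at a time on its own; Agent B always needs sign-off.
ledger = SpendLedger({
    "agent_a": BudgetPolicy(auto_approve_limit=50.0, total_budget=200.0),
    "agent_b": BudgetPolicy(auto_approve_limit=0.0, total_budget=200.0),
})

print(ledger.request("agent_a", 20.0))   # within A's limit
print(ledger.request("agent_b", 20.0))   # B has no autonomous spend
print(ledger.request("agent_a", 500.0))  # would exceed A's total budget
print(ledger.spent)
```

The `spent` dictionary is the crudest possible form of financial observability. A real control layer would also need attribution per task, per tool, and per workflow.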

This layer already looks like fundamental infrastructure for how agents build on the web. The design choices are telling. They're optimized for agent legibility and buildability, not human dashboard interaction.

If you're running a business that will depend on agent-driven operations, and most businesses will within 18 months, the financial control layer is something you need to understand now. Not after your agents have already racked up unmonitored costs. This is one of the areas we evaluate in our AI Operating Review.

Layer 6: How do agents work with other agents?

This is the biggest opportunity and the biggest gap in the entire stack.

Agents need to work with other agents reliably at scale. With fallback handling, audit trails, cost controls, and human escalation paths. How big is the demand signal? Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025. That's not a typo.

The current problem is structural. Existing tooling like LangChain operates at the framework level, not the infrastructure level. The gap between spinning up three agents in a notebook and reliably running 50 agents across enterprise systems, with failure recovery, cost controls, audit logging, and human escalation, is enormous. Right now, every team is hand-rolling that gap.

What doesn't exist yet but needs to:

Scheduling and lifecycle management. Handling agent creation, assignment, health checking, scaling, and termination as a managed service. Not in the Kubernetes container sense, but purpose-built for agents.

Merge and coordination infrastructure. Built from the ground up for parallel agent work: merge queues, conflict detection, and resolution protocols when multiple agents work on related tasks simultaneously.

Supervision hierarchies. Meta-agents that monitor, evaluate, and course-correct other agents, configured as infrastructure rather than coded as a framework pattern.

Financial observability. FinOps for agents: tracking what each agent spent, outcome quality, and cost per successful task across multi-agent workflows.

Standard failure and recovery patterns. Defined protocols for what happens when an agent's tool call fails, rather than ad hoc decisions made per team.
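
As an illustration of that last item, here's one way a shared failure policy could be written down once instead of per team. It's a sketch under my own assumptions: retry with exponential backoff, then try a fallback, then escalate to a human. The names `call_with_recovery` and `Escalation` are hypothetical, and a production policy would catch narrower error types than a bare `Exception`.

```python
import time


class Escalation(Exception):
    """Raised when retries and fallback are exhausted; a human should look."""


def call_with_recovery(tool, fallback=None, retries=3, base_delay=0.1, sleep=time.sleep):
    """Run a tool call under one fixed policy: retry, then fallback, then escalate."""
    last_error = None
    for attempt in range(retries):
        try:
            return tool()
        except Exception as err:  # assumption: every failure is worth retrying
            last_error = err
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    if fallback is not None:
        try:
            return fallback()
        except Exception as err:
            last_error = err
    raise Escalation(f"tool failed after {retries} attempts: {last_error}")


# A tool that times out twice before succeeding.
attempts = {"n": 0}


def flaky_tool():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("upstream timed out")
    return "ok"


result = call_with_recovery(flaky_tool, sleep=lambda seconds: None)  # skip real waiting
print(result, attempts["n"])
```

The point isn't these particular numbers. It's that the policy lives in one place, so every agent fails the same way and every escalation looks the same to the human who receives it.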

The structural analogy is Kubernetes. Not the compute itself, but the scheduling, scaling, health checking, and lifecycle management that made compute usable at enterprise scale. Whoever solves orchestration at infrastructure grade will own the most valuable position in the agent stack.

No winner has emerged yet. If I were placing bets on where the next defining company comes from, this is the layer I'd watch.

What does this mean if you're running a business, not building infrastructure?

You don't need to build any of these layers yourself. But you absolutely need to understand them.

Reliability compounds in the wrong direction. When an agent depends on five different primitives and needs all of them to work, end-to-end reliability is the product of each component's reliability, assuming they fail independently. Five components at 99% uptime each yield only about 95% system uptime. At 97% each, you're down to roughly 86%. That's not a minor detail. That's the difference between a system your team trusts and one they route around.
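
If you want to check the arithmetic against your own stack, here's a short sketch. It assumes every layer must be up for a workflow to succeed and that layers fail independently, which is a simplification. The function name is mine.

```python
from math import prod


def end_to_end_uptime(component_uptimes):
    """Uptime of a serial chain: every component has to be up at the same time."""
    return prod(component_uptimes)


for per_layer in (0.99, 0.97):
    system = end_to_end_uptime([per_layer] * 5)
    print(f"5 layers at {per_layer:.0%} each -> {system:.1%} end to end")
```

Swap in each layer's actual record instead of a uniform figure and the weakest component's drag on the whole chain becomes obvious.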

Transitional lock-in is a real risk. Building on shims, like email as identity, creates migration costs when native protocols arrive. Every shim you adopt is a bet that it either becomes the standard or can be swapped out cheaply. Most leaders aren't even aware they're making this bet. Your team should be able to distinguish between what is truly agent-native and what is a pragmatic stopgap.

Agent sprawl is coming. The same dynamic that plagued microservices in 2018, where everything was forced into a microservices architecture regardless of fit, is now happening with agents. Enterprises are deploying agents without observability or orchestration layers. The result is unexpected actions and unmanageable complexity. This problem will grow through 2026 unless organizations invest now in orchestration infrastructure.

I've seen this exact pattern play out twice before. The companies that got ahead of it didn't necessarily move fastest. They moved most deliberately.

So what should you actually do?

Build context engineering as a core competency. What you feed to an agent directly determines its outcomes. This is not a technical detail to delegate. It's a strategic capability.

Adopt eval-driven development. Agents must be able to drive autonomously toward results. If every output requires human review, you haven't built an agent. You've built a suggestion engine with extra steps.

Develop stack literacy across your leadership team. Understanding which layer of the stack represents a competitive advantage, which components are being hand-rolled, and which are shims is no longer optional. Agent-driven business outcomes are so dependent on these infrastructure layers that ignorance of the stack is a strategic liability.

This is the core reason we built the HIP OS platform and why our work with clients always starts with understanding where they actually sit in relation to these shifts. Not where they think they sit. Where they actually sit.

The agent infrastructure stack is forming now. The layers I've described here will look different in 18 months. Some of these companies will win. Some will be absorbed by platform providers. Some will be replaced by standards that don't exist yet.

The structural shift is real. The new customer for infrastructure is the agent. The leaders who understand that, who can read the stack and make deliberate bets, will build the next generation of durable companies.

The ones who don't will wonder why their AI investments stopped compounding.

If you want to pressure-test where your organization stands against this shift, that's what we do.

Frequently Asked Questions

What is agent-first infrastructure?

It is software infrastructure built for AI agents as the primary user, not humans. That means defined interfaces for identity, compute, memory, communication, and payments, designed so an autonomous system can authenticate, execute, remember, and transact without a person clicking buttons.

What are the six layers of the agent infrastructure stack?

Compute sandboxes (where agents run code safely), identity and communication (how agents prove who they are), memory (what agents retain across sessions), integrations (how agents connect to enterprise tools), payments (how agents transact without human sign-off), and orchestration (how multiple agents coordinate at scale).

Which layer of the agent stack is most underdeveloped?

Orchestration. There is no production-grade system yet for scheduling, lifecycle management, failure recovery, cost controls, and supervision hierarchies across multi-agent workflows. Every team is hand-rolling their own solution. Gartner reported a 1,445% surge in multi-agent system inquiries between Q1 2024 and Q2 2025, but the tooling has not caught up.

What is the platform risk for standalone agent memory providers?

Every major frontier lab is building memory directly into their models. If memory becomes a model-level feature the way search became part of ChatGPT, standalone memory companies face serious existential pressure. The counter-argument is portability: businesses may not want a single hyperscaler owning their agent's memory. That trade-off is genuinely uncertain.

How does agent infrastructure affect system reliability for businesses?

Reliability multiplies across components. If five infrastructure layers each run at 97% uptime, your end-to-end system uptime is roughly 86%. That is not a rounding error. That is the difference between a system your team trusts and one they work around. Leaders need to know how many layers their agent workflows depend on and what each layer's actual reliability record is.

What should business leaders do now to prepare for the agent infrastructure shift?

Build context engineering as an internal competency, adopt eval-driven development so agents can produce outcomes without constant human review, and develop real stack literacy across your leadership team. You need to know which components are agent-native, which are shims, and which layers represent genuine competitive risk if they fail or get replaced by a standard.