
A few weeks ago, I watched a demo where one company's AI agent negotiated a procurement contract with another company's AI agent, in seconds, on a protocol neither of them shipped a year earlier. No human signed off on either side until the deal was done. The room loved it. I left thinking about a different question: at what point in this pipeline did a human being's judgment, taste, or accountability actually matter? The honest answer was: not really anywhere. And that demo isn't a thought experiment anymore. Anthropic released the Model Context Protocol in November 2024 as "a universal, open standard for connecting AI systems with data sources" [1]. Five months later, in April 2025, Google announced Agent2Agent as "an open protocol that complements Anthropic's Model Context Protocol" — explicitly designed for cross-vendor multi-agent collaboration [2]. By June 2025 the Linux Foundation had taken over the A2A project with "more than 100 leading technology companies" backing it, including AWS, Cisco, Salesforce, SAP, Microsoft, and ServiceNow [3]. At Build 2025, Microsoft made MCP first-party across GitHub, Copilot Studio, Dynamics 365, Azure AI Foundry, and Windows 11 [4]. The infrastructure for bots talking to bots is no longer a roadmap; it's standard plumbing. So the question I want to ask is the one nobody at the demo seemed to be asking: what for?
Let me describe the landscape as it actually is in mid-2026, without hype and without doom. The agent stack has consolidated faster than almost any enterprise technology I've seen in twenty years. MCP gives an agent access to data and tools. A2A gives one agent a way to discover and call another agent across vendor boundaries. Identity layers — Microsoft is now provisioning every Copilot Studio and Foundry agent with its own Entra directory entry to "avoid agent sprawl" [4] — turn agents into first-class organizational citizens with credentials and audit trails. On top of that sit the orchestration platforms: Salesforce's Agentforce, ServiceNow's AI Agent Control Tower, Microsoft Copilot Studio, AWS Bedrock Agents. And these aren't slideware. Walmart, in a Harvard Business Review case study, has had Pactum's autonomous-negotiation agent close contracts with the majority of its tail-spend suppliers [5]. The MCP reference repository has roughly 85,000 GitHub stars [6] — a signal that developers, not just CIOs, have voted. We are not debating whether agents will mediate substantial parts of B2B commerce; we are watching them do it. The shift I want to flag is subtler than "agents are powerful." It is that the interface between organizations — historically the place where a salesperson, a procurement officer, a customer success manager, a regulator made a judgment call — is being silently re-platformed onto a software protocol. And once an interaction crosses that threshold, the cost of inserting a human back into it goes up by an order of magnitude.
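To make "standard plumbing" concrete, here is a minimal sketch of those two layers on the wire: an MCP tool call and an A2A agent-card lookup. The JSON-RPC shape and method name follow the public MCP spec, and the /.well-known/agent.json path follows the A2A spec as I read them; the tool name, arguments, and host are hypothetical.

```python
import json
import urllib.request

# MCP layer: an agent invoking a tool exposed by an MCP server. MCP speaks
# JSON-RPC 2.0; "tools/call" is the spec's method for tool invocation.
# The tool name and arguments below are hypothetical placeholders.
mcp_tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_supplier_contract",         # hypothetical tool
        "arguments": {"supplier_id": "ACME-0042"},  # hypothetical argument
    },
}

# A2A layer: discovering a peer agent by fetching its published Agent Card,
# the self-description (name, skills, endpoint) that enables cross-vendor
# discovery.
def fetch_agent_card(base_url: str) -> dict:
    with urllib.request.urlopen(f"{base_url}/.well-known/agent.json") as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(json.dumps(mcp_tool_call, indent=2))
    # fetch_agent_card("https://agents.example.com")  # hypothetical host
```

Note how little ceremony there is: once both ends speak this, a human checkpoint is no longer a form in a workflow but an interruption to a machine-speed request cycle, which is exactly why it gets engineered out.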
This is where I have to be honest about my own industry. The reason humans get removed from these loops isn't because someone in a strategy offsite said "let's get the humans out." It's because the unit economics of doing so are extraordinary, and in a publicly traded firm answering to shareholders quarterly, extraordinary unit economics are gravity. Klarna's CEO publicly claimed in 2024 that its OpenAI assistant was doing "the work of 700 full-time agents." Then in 2025 he reversed course and said Klarna had "gone too far" and would rehire humans, because customer service quality had collapsed. Both halves of that story are the point. The first half shows you why every CFO is running the math. The second half shows you that the math, when the only variable is cost, is wrong — and that without a doctrine telling you what else to optimize for, you only learn it after the damage. Daron Acemoglu has spent a decade documenting what he calls "so-so automation" — automation that displaces labor without producing meaningful productivity gains — and his 2024 NBER paper estimates AI will raise total factor productivity by no more than 0.66% over ten years, while widening the gap between capital and labor income [7]. Erik Brynjolfsson's Stanford work tells the complementary story: in his customer-support field study, generative AI raised novice productivity by 34% and top-performer productivity by roughly 0% — the gains came when the AI was put in workers' hands as a tool, not deployed as their replacement [8]. The same technology, deployed two ways, produces two civilizations. Which one we get is a choice. The IMF's January 2024 staff note estimates 40% of global jobs are exposed to AI, 60% in advanced economies, with about half of that exposure in the substitution zone [9]. A capitalist firm, optimizing rationally for quarterly margin, will lean into substitution every time the contract allows it. That's not a moral failure; it's the system working as designed.
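Since the argument turns on arithmetic, it is worth running the toy version of the CFO's spreadsheet. A rough sketch follows; every number in it is hypothetical, and the point is the shape of the calculation, not the magnitudes.

```python
# Toy model of the two deployment paths described above.
# All inputs are hypothetical placeholders, not measured figures.

interactions_per_year = 1_000_000
human_cost_per_interaction = 5.00  # hypothetical fully loaded labor cost ($)
agent_cost_per_interaction = 0.05  # hypothetical inference + platform cost ($)

# Path 1: substitution. Replace the human and book the spread as savings.
substitution_savings = interactions_per_year * (
    human_cost_per_interaction - agent_cost_per_interaction
)

# Path 2: augmentation. Keep the human and apply the novice uplift that
# Brynjolfsson et al. measured (~34% for the least experienced workers).
novice_uplift = 0.34
augmented_capacity = interactions_per_year * (1 + novice_uplift)

print(f"Substitution books ${substitution_savings:,.0f} in annual savings.")
print(f"Augmentation handles {augmented_capacity:,.0f} interactions at the "
      f"same headcount, with quality as the unpriced variable.")
```

Substitution produces a number the CFO can book this quarter; augmentation produces capacity and quality, which never appear on the same line of the spreadsheet. That asymmetry, not malice, is why the Klarna story keeps repeating.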
In August 2019, the Business Roundtable — 181 American CEOs — formally retired the Friedman doctrine. Their statement explicitly "supersedes" prior shareholder-primacy language and commits the firms to "all stakeholders — customers, employees, suppliers, communities and shareholders" [10]. It was a remarkable document. It also came with no enforcement mechanism, no governance change, no compensation realignment, and — by most empirical assessments since — no measurable change in behavior. I sat in management meetings during the years that followed where stakeholder language was used in slide decks while the same firms ran the same workforce-reduction calculations. This is not cynicism; it's the structural reality that a public corporation's directors have fiduciary duties calibrated to shareholders, and "stakeholder capitalism" without enforceable governance is a brochure. Now overlay the agent transformation on top of this purpose vacuum. We are handing the most leveraged labor-substituting technology in modern history to organizations whose declared values are stakeholder-centric and whose actual incentive gradient is shareholder-centric. Of course they will deploy it to remove humans from the loop. They are doing precisely what their structure tells them to do. The mistake is to expect anything else without changing the structure. And the most common organizational response I see — appointing an AI ethics board, signing a principles document, attending an AI safety summit — does not change the structure. It changes the slide deck about the structure.
The most telling pattern I see across enterprise AI deployments right now is what's happening to "human in the loop" as a phrase. Article 14 of the EU AI Act requires high-risk AI systems to be "designed and developed in such a way, including with appropriate human-machine interface tools, that they can be effectively overseen by natural persons" [11]. The Act even names the failure mode it's trying to prevent: it requires overseers to "remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system (automation bias)" [11]. This is one of the most thoughtful regulatory texts I've read on this subject. And in deployment after deployment, what I see in practice is the legal department pasting Article 14 language into a procurement template and the operations team reducing "human oversight" to a click-through screen on top of an output the human cannot meaningfully evaluate. Madeleine Clare Elish coined the term "moral crumple zones" for exactly this: humans nominally in the loop who absorb the moral and legal liability for failures of largely autonomous systems they had no real capacity to override [12]. Ben Green's peer-reviewed work on algorithmic decision-making finds, empirically, that humans paired with risk-assessment tools fail to improve fairness or accuracy and exhibit disparate deference patterns [13]. The clinical-AI literature — the systematic review by Goddard, Roudsari, and Wyatt in JAMIA — confirms automation bias is widespread and that human oversight alone does not mitigate it [14]. We have a generation of empirical evidence that "put a human in the loop" is not a safety mechanism unless the human has time, expertise, accountability, and the genuine capacity to dissent. None of those conditions are produced by a checkbox in a vendor contract. And yet checkbox HITL is what most enterprise AI deployments are converging on, because it satisfies the regulator at the lowest possible cost. That's not human oversight. That's a moral crumple zone with a help-desk ticket.
I want to be specific here, because the abstract version of this argument always loses to the specific deployment economics. So let me name some of the cases where humans were removed from the decision loop, the system was deployed, and the harm was documented. Australia's Robodebt scheme issued roughly 470,000 wrongful welfare debts via an automated income-averaging algorithm; the 2023 Royal Commission called it "a costly failure of public administration, in both human and economic terms," and the Federal Court approved an A$1.8 billion settlement [15]. The Dutch childcare benefits scandal — the toeslagenaffaire — saw an algorithmic risk-classification system wrongly accuse roughly 26,000 parents of fraud between 2005 and 2019, often demanding tens of thousands of euros back, and ethnically profile families through dual-nationality flags; the third Rutte cabinet resigned over it in January 2021 [16]. The Electronic Privacy Information Center filed an FTC complaint against HireVue in 2019 alleging its AI video-interview tool harmed candidates via biometric collection and bias on gender, race, sexual orientation, and "neurological differences"; HireVue subsequently dropped facial analysis [17]. The content moderators at Sama in Nairobi — workers labeling toxic content for clients that included Meta and, later, OpenAI's training-data pipeline — were paid under $2 an hour, recruited under false pretenses (told they would be call-center staff), and exposed to graphic material; one of them, Facebook moderator Daniel Motaung, called the work "torture" and was fired for organizing [18]. Each of these is a system where the answer to "does a human review this?" was, in practice, "not in any way that mattered." Each produced damage that ran into hundreds of millions of euros, broken governments, ruined careers, traumatized workers. None of these were edge cases. They were the predictable output of optimizing a deployment for cost-per-decision while treating "human oversight" as overhead. The agent transformation, without a doctrine, scales this pattern from public-sector welfare and HR into every B2B interface in the economy.
Here is where I depart from the dominant register of this conversation, which oscillates between regulator-will-save-us optimism and capitalism-will-eat-us-all fatalism. Neither is right. Regulation matters — the EU AI Act is a serious instrument and the OECD's updated 2024 AI Principles, adopted by member countries, give us a shared vocabulary [19]. But regulation, even good regulation, is fundamentally reactive and slow. The agent stack is moving in months; legislative cycles move in years. If we wait for the regulator to draw the lines, the lines will be drawn around damage that already happened. What we need — what I want to argue for — is an industry doctrine: a set of standards that AI suppliers and the corporations deploying them adopt and hold each other to, not because the law forces them, but because they have collectively decided this is what professional practice in this technology looks like. Some of the architecture for this already exists. Anthropic operates as a Delaware Public Benefit Corporation with a Long-Term Benefit Trust empowered to elect a majority of its board [20], and its updated Responsible Scaling Policy commits the firm to not deploying models "unless we have implemented safety and security measures that keep risks below acceptable levels" [21]. Google DeepMind's Frontier Safety Framework defines Critical Capability Levels and gating mitigations [22]. Stanford HAI explicitly couples AI development with humanities, social-science, and policy work [23]. The OECD AI Policy Observatory now hosts the merged Global Partnership on AI [19]. These are not nothing. They are also not enough — they apply at the frontier-lab layer, while the substitution decisions happen at the customer-deployment layer, and there is currently no equivalent doctrine for the firm that buys the agent and decides what it gets to do without a human present. A real doctrine would, at minimum, do four things. First, it would treat human-in-the-loop as a design constraint, not a compliance step — meaning the human has to have the time, the expertise, the information, and the institutional power to dissent, or you can't claim HITL. Second, it would distinguish the deployments where automation augments human capacity (Brynjolfsson's novice-uplift case) from those where it substitutes for human judgment, and require a higher governance bar for the second. Third, it would require "moral crumple zone" audits — explicitly identifying who carries the liability for an automated decision and whether they have the capacity to actually catch errors. Fourth, it would require firms to publish their substitution decisions: how many roles, in which functions, were eliminated by which agent system — the same way they publish emissions. We need a Greenhouse Gas Protocol for human displacement. None of this is utopian. All of it is achievable by the firms in this room agreeing it is what professional practice looks like, before the regulator decides for them, and before the damage compounds.
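To show that these four requirements are checkable rather than aspirational, here is a sketch of the doctrine as a deployment gate. Every field and function name below is my own invention for illustration; no existing standard defines this structure.

```python
from dataclasses import dataclass

@dataclass
class AgentDeployment:
    # Requirement 1: HITL as a design constraint, not a compliance step.
    overseer_has_time: bool           # review without queue pressure?
    overseer_has_expertise: bool      # can they evaluate the output?
    overseer_can_dissent: bool        # can they block it without penalty?
    # Requirement 2: augmentation vs. substitution classification.
    substitutes_human_judgment: bool
    elevated_governance_review: bool  # mandatory when substituting
    # Requirement 3: moral crumple zone audit.
    liability_holder: str             # who is blamed when it fails?
    liability_holder_can_catch_errors: bool
    # Requirement 4: published substitution disclosure.
    roles_eliminated_published: bool

def may_deploy(d: AgentDeployment) -> bool:
    """Gate a deployment on all four doctrine requirements."""
    hitl_is_real = (d.overseer_has_time and d.overseer_has_expertise
                    and d.overseer_can_dissent)
    governance_ok = (not d.substitutes_human_judgment
                     or d.elevated_governance_review)
    no_crumple_zone = d.liability_holder_can_catch_errors
    return (hitl_is_real and governance_ok and no_crumple_zone
            and d.roles_eliminated_published)
```

The details are debatable, and the booleans hide hard judgment calls. What matters is that each requirement reduces to a question with a verifiable answer, which is precisely what a signed principles letter never does.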
I want to close with the question the demo I opened with should have raised but didn't. What is the agent transformation actually for? Not what could it do — the answer to that is enormous and exciting. What is it for? Whose flourishing is the system optimizing toward? In the formulation I keep coming back to, technology is a multiplier — it doesn't have a destiny, it has a direction, and the direction is set by who is permitted to extract the rents and what they have agreed to optimize for. If the answer is "the agent transformation is for shareholders, and the human in the loop is overhead," we will get exactly that economy: a B2B layer where companies talk to companies through bots, where the friction-reducing benefit is captured by capital, where labor share continues its forty-year decline, and where the externalities — eroded social systems, diminished trust, the moral crumple zones absorbing every algorithmic failure — are paid by everyone who isn't a shareholder. That is the default trajectory. It is what happens if we don't decide otherwise. The alternative isn't to slow the technology down; the technology is genuinely good and I want it to keep moving. The alternative is to be honest that "good for shareholders" and "good for humanity" are not the same target, and to commit, as an industry and as professionals, to building the second target into how we deploy. That commitment is not a feeling. It is a doctrine, an audit, a published number, a refused deployment. It is harder than signing a principles letter. It is the actual work. The agents are coming whether we are ready or not. The question is whether we let them be a tool that augments the people who have made every prior generation of technology worth having — or whether we let them be the instrument by which we quietly remove ourselves from our own economy. I know which answer I want, and I know it doesn't write itself.
Sources
- Anthropic. "Introducing the Model Context Protocol." 2024-11-25. anthropic.com/news/model-context-protocol [T1]
- Google Developers Blog. "A2A: A new era of agent interoperability." 2025-04-09. developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability [T1]
- The Linux Foundation. "Linux Foundation launches the Agent2Agent Protocol Project." 2025-06-23. linuxfoundation.org — A2A press release [T1]
- Microsoft. "Microsoft Build 2025: The age of AI agents and building the open agentic web." 2025-05-19. blogs.microsoft.com — Build 2025 [T1]
- Bouquin, Niemtzow & Stack. "How Walmart Automated Supplier Negotiations." Harvard Business Review, November 2022. hbr.org/2022/11/how-walmart-automated-supplier-negotiations [T2]
- modelcontextprotocol/servers GitHub repository (~85,000 stars as of 2026-05-06). github.com/modelcontextprotocol/servers [T1]
- Acemoglu, Daron. "The Simple Macroeconomics of AI." NBER Working Paper 32487, May 2024. nber.org/papers/w32487 [T1]
- Brynjolfsson, Li & Raymond. "Generative AI at Work." NBER Working Paper 31161, 2023. nber.org/papers/w31161 [T1]
- International Monetary Fund. "Gen-AI: Artificial Intelligence and the Future of Work." Staff Discussion Note SDN/2024/001, January 2024. imf.org — SDN/2024/001 [T1]
- Business Roundtable. "Statement on the Purpose of a Corporation." 2019-08-19. businessroundtable.org — Statement [T1]
- EU AI Act, Article 14 — Human Oversight. artificialintelligenceact.eu/article/14 [T1]
- Elish, Madeleine Clare. "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction." Data & Society. datasociety.net — Moral Crumple Zones [T1]
- Green, Ben. Publications page (incl. "The Flaws of Policies Requiring Human Oversight of Government Algorithms" and "The Principles and Limits of Algorithm-in-the-Loop Decision Making"). benzevgreen.com/publications [T1]
- "Automation bias" — Wikipedia, citing Goddard, Roudsari & Wyatt (JAMIA 2012) and Mosier et al. (1997). en.wikipedia.org/wiki/Automation_bias [T2]
- "Robodebt scheme." Wikipedia (sourcing the 2023 Royal Commission and the A$1.8B Federal Court settlement). en.wikipedia.org/wiki/Robodebt_scheme [T2]
- "Dutch childcare benefits scandal." Wikipedia (sourcing the parliamentary inquiry and 2021 Rutte cabinet resignation). en.wikipedia.org/wiki/Dutch_childcare_benefits_scandal [T2]
- "HireVue." Wikipedia (sourcing the 2019 EPIC FTC complaint). en.wikipedia.org/wiki/HireVue [T2]
- "Sama (company)." Wikipedia (sourcing the Motaung case and the Time reporting on Kenyan moderators). en.wikipedia.org/wiki/Sama_(company) [T2]
- OECD AI Principles (adopted 2019, updated May 2024). oecd.ai/en/ai-principles [T1]
- Anthropic. "The Long-Term Benefit Trust." 2023-09-19. anthropic.com/news/the-long-term-benefit-trust [T1]
- Anthropic. "Announcing our updated Responsible Scaling Policy." 2024-10-15. anthropic.com — RSP update [T1]
- Google DeepMind. "Introducing the Frontier Safety Framework." 2024-05-17. deepmind.google — Frontier Safety Framework [T1]
- Stanford HAI — About. hai.stanford.edu/about [T1]
All URLs verified HTTP 200 on 2026-05-06.