Industry · June 2026 · 8 min read

The Production Gap: AI Won Dealmaking, but Agents Keep Dying Before They Ship

By Arvya Team

For two years, the argument about AI in dealmaking was about whether it worked. That argument is over. The interesting question now is a harder one: why, when the technology so clearly works, do so few teams actually trust it to run anything?

Both things are true at the same time, and the gap between them is the most important story in enterprise AI right now.

The adoption number is no longer in dispute

Deloitte's most recent M&A study found that 86% of firms have integrated generative AI into their deal workflows, and roughly two-thirds of them did it in the last year alone. The share of teams investing in AI for pre-deal work — research, target screening, early diligence — climbed from 31% to 46% in two years. Bain reports that more than 60% of private equity firms are already using at least one AI tool for sourcing or diligence, and among them, 83% have put a million dollars or more behind it.

The returns are real enough to explain the spending. McKinsey's work on gen AI in M&A points to roughly 20% cost reductions and deal cycles that run 30 to 50% faster for teams that have actually operationalized it. The capability case is settled. The model is good enough to draft the update, screen the target, and summarize the data room.

And yet the agents keep dying before they ship

Now hold that 86% next to a different set of numbers. Gartner expects more than 40% of agentic AI projects to be cancelled before the end of 2027. Industry teardowns of enterprise deployments suggest only a small fraction of agents ever reach production at all — the rest stall out in pilot. And of the organizations that did push customer-facing agents live, a striking share were forced to pull them back, most often after the system touched data it should not have.

This is the pattern everyone in the market now recognizes: a demo that lands, a pilot that impresses, and then a quiet death somewhere between the prototype and the live mandate. The capability was there. The trust was not.

It was never a model problem

The instinct, when an agent fails, is to blame the model: it is not smart enough, it hallucinated, wait for the next release. That is almost always the wrong diagnosis. As one widely shared analysis of the "agentic reckoning" put it bluntly: enterprises have a runtime problem, not a model problem.

The failure mode is structural. A small error early in a chain of reasoning does not stay small. A confident mistake in step three becomes a corrupted conclusion by step twelve, and nobody caught it because nothing in the system was built to catch it. Logging the conversation is not the same as being able to explain the decision. When someone asks why the agent did what it did, "the model said so" is not an answer a deal team — or a regulator — can accept.
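A back-of-the-envelope sketch of that compounding, under a deliberately simple assumption: every step in the chain is independently correct with the same probability. The 95% per-step figure and the function name `chain_reliability` are illustrative assumptions, not numbers from any study cited here, and real agent errors are rarely this tidy.

```python
def chain_reliability(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes each step succeeds independently with the same probability,
    which is a simplification made only for illustration.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_accuracy ** steps


# Illustrative numbers only: a step that is right 95% of the time.
for steps in (1, 3, 12):
    share = chain_reliability(0.95, steps)
    print(f"{steps:>2} steps: {share:.1%} chance the whole chain is right")
```

Under those assumptions, a step that is right 95% of the time gives a twelve-step chain that is right only about 54% of the time. That is the arithmetic behind "a small error does not stay small", and it is why checks between steps can matter more than a marginally better model.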

The spending tells the same story. By one estimate, for every $735 enterprises spend on AI capability, they spend about a dollar on AI trust, risk, and security. Firms bought the horsepower and forgot the brakes, then acted surprised when they could not take the car onto a real road.

Dealmaking is the hardest place to earn that trust

Every industry feels this gap. Deal teams feel it most, because the work is among the most confidential, regulated, and relationship-driven anywhere. The cost of a wrong autonomous action here is not an awkward support reply. It is a misstated buyer position sent to a client, a CRM write that quietly corrupts the system of record, a term that should never have left the room.

Add ethical walls to that. A general-purpose agent that can see across every deal in the firm is not a productivity feature — it is a conflict waiting to surface in the worst possible meeting. This is the uncomfortable irony of AI in finance: the demo-to-production gap is widest exactly where the return on getting it right is largest.

What actually closes the gap

The teams crossing from pilot to production are not the ones with access to a smarter model. Everyone has the same models. They are the ones who wrapped the model in the things that make its output safe to act on. In practice, that comes down to four unglamorous requirements.

  • Every output cites its source. A claim with no provenance is a liability. The answer should point to the exact email, document, or call it came from, so a human can verify in seconds instead of trusting on faith.
  • Nothing irreversible happens without a person. The system drafts, proposes, and prepares. Sending the email, writing to the CRM, booking the meeting — those wait for a human to say yes. Draft-first is not a limitation; it is the only thing that makes an agent usable in a regulated business.
  • Every action is logged and explainable. Not the conversation — the decision. Inputs, sources, reasoning, output, and who approved it. This is the difference between an answer to a regulator and an embarrassment in front of one.
  • It runs where the data already lives. Deal data is too sensitive to leave the firm's environment for the vendor's convenience. In-tenant deployment and enforced walls are not nice-to-haves; they are the precondition for the security team ever saying yes.
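The first three requirements, plus the wall check from the fourth, can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not a description of how Arvya or any other product is built: the names `Source`, `DraftAction`, `AuditEntry`, and `ApprovalGate`, and the identifiers in the usage lines, are all hypothetical. In-tenant deployment is an infrastructure decision and is not something a snippet like this can show.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Source:
    """Pointer to the exact email, document, or call behind a claim."""
    deal_id: str
    kind: str      # e.g. "email", "document", "call"
    locator: str   # hypothetical identifier such as a message id


@dataclass
class DraftAction:
    """Work the agent has prepared but not executed."""
    deal_id: str
    description: str
    reasoning: str
    sources: List[Source]
    approved_by: Optional[str] = None


@dataclass(frozen=True)
class AuditEntry:
    """One logged decision: what, why, from which sources, and by whom."""
    timestamp: str
    event: str
    actor: str
    deal_id: str
    description: str
    reasoning: str
    sources: tuple


class ApprovalGate:
    def __init__(self) -> None:
        self.audit_log: List[AuditEntry] = []

    def _record(self, event: str, actor: str, draft: DraftAction) -> None:
        self.audit_log.append(AuditEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            event=event, actor=actor, deal_id=draft.deal_id,
            description=draft.description, reasoning=draft.reasoning,
            sources=tuple(draft.sources),
        ))

    def propose(self, draft: DraftAction) -> DraftAction:
        # Requirement 1: no provenance, no draft.
        if not draft.sources:
            raise ValueError("draft has no cited source")
        # Wall check: every source must belong to the draft's own deal.
        if any(s.deal_id != draft.deal_id for s in draft.sources):
            raise PermissionError("source belongs to a different deal")
        self._record("proposed", "agent", draft)
        return draft

    def approve(self, draft: DraftAction, approver: str) -> None:
        if not approver:
            raise ValueError("approver must be a named person")
        draft.approved_by = approver
        self._record("approved", approver, draft)

    def execute(self, draft: DraftAction, action: Callable[[], None]) -> None:
        # Requirement 2: nothing irreversible runs without a human yes.
        if draft.approved_by is None:
            raise PermissionError("action requires human approval")
        action()
        self._record("executed", draft.approved_by, draft)


# Usage: the agent drafts, a person approves, only then does the send run.
gate = ApprovalGate()
sent: List[str] = []
draft = gate.propose(DraftAction(
    deal_id="deal-A",
    description="Send buyer update email",
    reasoning="Buyer confirmed the revised timeline in the cited email.",
    sources=[Source("deal-A", "email", "msg-123")],
))
try:
    gate.execute(draft, lambda: sent.append("buyer update"))
except PermissionError as err:
    print(f"blocked: {err}")
gate.approve(draft, approver="associate@example.com")
gate.execute(draft, lambda: sent.append("buyer update"))
print([entry.event for entry in gate.audit_log])
```

The design choice worth noting is that the gate, not the model, owns the side effect: the agent can only hand over a callable, and the log captures the sources, reasoning, and approver for each decision rather than a transcript of the conversation.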

None of this is about the model. All of it is about trust. It is the least glamorous part of building AI for finance, and it is the entire game.

The bottleneck moved. Most firms are still aiming at the old one.

For years the constraint in enterprise AI was capability: could the model actually do the work? That constraint is gone. The work is doable. The constraint now is whether you can trust what the model produces enough to let it near a live deal, and that is a problem you solve with architecture, not with a bigger model.

The firms that win the next phase will not be the ones with the smartest model. Everyone will have that. They will be the ones who built the layer that makes a model's output safe to act on — sourced, approved, auditable, and inside their own walls. Capability crossed the line two years ago. Trust is the line now.

That layer — a cited, approval-first record of the deal that an agent can prepare work against and a human signs off on — is exactly what we are building at Arvya. Not a smarter guesser. A system a deal team can actually trust to touch the deal.
