Enterprises are buying AI copilots faster than almost any software in history, and they are quietly struggling to find the return. MIT's NANDA initiative reviewed some 300 publicly disclosed deployments and found that 95% of enterprise generative-AI pilots produced no measurable impact on the bottom line. Only about one in twenty broke through.

That number should be confusing. The models are extraordinary. So why does so little of that capability show up in the P&L? The answer matters a great deal for anyone in investment banking or private equity deciding what to buy.

The copilot paradox: record adoption, missing ROI

Look at the most widely deployed copilot of all. In Gartner's 2025 survey of Microsoft 365 Copilot, only a small fraction of organizations that finished a pilot moved on to a broad rollout, and nearly half of respondents rated it somewhere around "some value, shows promise." A meaningful share of paid seats go unused week to week. McKinsey's 2025 State of AI found that only 39% of companies attribute any EBIT impact at all to AI, and among those that do, most put it at less than 5%.

None of this means the tools are bad. People get real value from them on individual tasks. It means something more specific: horizontal copilots are hard to turn into durable, measurable results on the work that actually matters to a business.

It was never the model. It is context and trust.

A general copilot walks into your deal knowing nothing about your deal. It has the public internet and whatever it can scrape from a document you paste, but it does not know which buyer went quiet, what the partner promised on the last call, or which version of the financial model is current. Enterprise hallucinations come overwhelmingly from this weak grounding in the company's own context, not from a lack of raw intelligence.

There is a quieter trap, too: people consistently feel faster with AI than they are. In METR's 2025 randomized controlled trial, experienced open-source developers using 2025-era AI tools took about 19% longer to finish their tasks while believing they had been sped up by 20%. Speed that feels real and is not is exactly how a tool earns adoption without earning results.

Fluency was never the bottleneck. Defensibility was.

This is the part general-purpose AI keeps missing about deal work. A banker does not need a more eloquent paragraph. They need an answer they can defend — to a client, a partner, an investment committee, eventually a regulator. An unsourced claim you cannot trace back to the email or document it came from has a value of roughly zero on a live mandate, no matter how fluent it is or how fast it arrived. The horizontal copilot optimizes for a plausible answer. Specialized deal work demands a defensible one.
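To make "traceable to the email or document it came from" concrete, here is a minimal Python sketch of a claim that carries its own provenance. The `SourceRef` and `Claim` types, the `split_by_provenance` helper, and the sample document ID are hypothetical names invented for illustration; they do not describe any particular product's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceRef:
    """Pointer to the email or document a claim came from (hypothetical schema)."""
    document_id: str
    excerpt: str


@dataclass(frozen=True)
class Claim:
    """A statement in a draft, optionally backed by a source."""
    text: str
    source: Optional[SourceRef] = None


def split_by_provenance(claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Separate claims that quote a source from claims that do not."""
    sourced: list[Claim] = []
    unsourced: list[Claim] = []
    for claim in claims:
        # A claim counts as sourced only if it points at a non-empty excerpt.
        if claim.source is not None and claim.source.excerpt.strip():
            sourced.append(claim)
        else:
            unsourced.append(claim)
    return sourced, unsourced


# Illustrative data only; the document ID and wording are made up.
claims = [
    Claim(
        "Buyer A asked for a two-week extension.",
        SourceRef("email-0412", "we would need two more weeks"),
    ),
    Claim("Buyer B is likely to raise its bid."),
]
sourced, unsourced = split_by_provenance(claims)
```

Under this assumption, the second claim lands in `unsourced` however fluent it sounds, which is the point: it would need a source attached before it could go into anything client-facing.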

Specialized AI is not magic either — and that is the point

It would be easy to overclaim here, so let us not. Stanford's RegLab tested purpose-built legal AI and found that even these specialized, retrieval-grounded tools produced incorrect information on roughly one in six to one in three queries. That is far better than general models, which hallucinated on the majority of legal questions, but it is not zero, and anyone promising "hallucination-free" AI is selling something.

The lesson is not that one kind of model is safe and another is not. It is that grounding, evidence, and human approval are what reduce and contain the risk to a level a serious firm can live with. That is an architecture choice, not a model choice.

What specialized deal AI does differently

The systems that work in regulated, high-stakes environments share a shape. They are grounded in the firm's own deal context rather than the open web. Every output cites its source, so verification takes a glance instead of an act of faith. And nothing irreversible happens without a human signing off. The market is voting on this: analysts now project the vertical-AI market growing several times faster than the broad software market over the next few years, precisely because generic tools keep hitting this wall.
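The "nothing irreversible without a human signing off" rule can be sketched as a simple gate in Python. This is an illustrative assumption about how such a check might look, not a description of any vendor's implementation; `Draft`, `SendBlocked`, `approve`, and `send` are hypothetical names, as are the sample deal ID and reviewer address.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Draft:
    """An AI-produced draft scoped to one deal (hypothetical structure)."""
    deal_id: str
    body: str
    sources: list[str] = field(default_factory=list)
    approved_by: Optional[str] = None


class SendBlocked(Exception):
    """Raised when a draft lacks sources or a human sign-off."""


def approve(draft: Draft, reviewer: str) -> Draft:
    """Record which human signed off on the draft."""
    draft.approved_by = reviewer
    return draft


def send(draft: Draft, deliver: Callable[[str], None]) -> None:
    """Deliver the draft only if it cites sources and a human approved it."""
    if not draft.sources:
        raise SendBlocked("draft cites no sources")
    if draft.approved_by is None:
        raise SendBlocked("no human sign-off recorded")
    deliver(draft.body)


# Illustrative usage: the first send is blocked, the second goes through.
outbox: list[str] = []
draft = Draft("deal-001", "Revised timeline attached.", ["email-0412"])

try:
    send(draft, outbox.append)
    blocked = False
except SendBlocked:
    blocked = True

approve(draft, "associate@example.com")
send(draft, outbox.append)
```

The design choice worth noting is that the check lives in the one function that performs the irreversible step, so the model can generate as many drafts as it likes without any of them leaving the building unreviewed.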

For an IB or PE team buying AI in 2026, the useful question is not which copilot is smartest. They are all smart enough. The question is which system you can actually trust to touch a live deal. That is the system we are building at Arvya — deal-scoped, sourced, and draft-first, so the answer is one you can put your name on.

Copilot Can't Run Your Deal: Why Generic AI Stalls on Live Mandates
