From PoC to Production: Why Enterprise AI Struggles for Trust
As AI pilots multiply across large organizations, governance and ownership—not technology—decide what reaches production

Large enterprises are no longer short of experiments involving artificial intelligence. Across banking, logistics, energy, and industrial sectors, proof‑of‑concept (PoC) projects have proliferated as executives and business units test generative models, agents, and automation tools. What remains scarce is production‑grade AI that organizations are willing to trust with real operations at scale.
The gap between experimentation and deployment has become one of the defining challenges of enterprise AI adoption. The issue is no longer whether the technology works in isolation, but whether it can be governed once real data, compliance obligations, and operational risk come into play.
“If a business process depends on an AI system, and people are using it with real data, then you need governance, reliability, and compliance,” said Natalia Konstantinova, global architecture lead in AI at NatWest Group, a UK banking group.
Her definition draws a clear line between experimentation and production. Many AI initiatives appear successful in controlled pilots but fail when exposed to operational reality.
“Sometimes we do PoC with a subset of data, and say, ‘Oh, it looks really great,’ but then you find that the production data is not clean or people are not ready,” she said.
That pattern has repeated across industries. Early pilots often demonstrate technical promise, only to stall when confronted with fragmented data, unclear ownership, or the practical demands of scale.
In many cases, the failure is organizational rather than technical. Models may perform well in isolation, but responsibilities for data stewardship, risk oversight, and operational support remain undefined.
As pilots move beyond experimentation, questions emerge around who owns the system, who is accountable for errors, and who has the authority to intervene when outputs are wrong. Without clear answers, projects struggle to progress beyond demonstration.
Accountability before scale
These issues were explored at the AI Summit in London during a panel discussion titled "From PoC to Production: Accelerating AI to Market." The session was moderated by Giovanni Ughi, co‑founder and chief executive of Bryo, a company that builds AI tools to accelerate technical sales and quotation workflows for industrial products.
The panel brought together enterprise and startup perspectives on why AI progress slows after early experimentation. Speakers included Konstantinova; Jakob Kær Bille Krogh Nielsen, head of capital markets at Maersk, the Copenhagen-based shipping and logistics group; and Kristian Portz, co‑founder of Masumi, a blockchain‑based network protocol for coordinating and monetizing AI agents.
For Portz, the central barrier to production adoption is accountability. He said organizations need to understand not only what an AI system produces, but how and why it produced that output, particularly when disputes or failures arise.
Without clear accountability, AI systems tend to remain confined to experimentation. Once real decisions, payments, or customer interactions depend on automated outputs, enterprises demand traceability and mechanisms for resolving errors.
That requirement marks a fundamental shift in how AI is evaluated. In production environments, performance metrics alone are insufficient.
Organizations also look for explainability, audit trails, and escalation paths that allow human operators to understand how a system reached a conclusion and to override it when necessary. These expectations are familiar in regulated industries but are now spreading more broadly as AI becomes embedded in everyday operations.
Why pilots collapse
While governance becomes critical at scale, many AI projects never reach that stage. Konstantinova said a common failure mode is experimentation driven by novelty rather than necessity.
In those cases, teams adopt new tools first and only later attempt to identify business problems they might solve. Without a clear owner or operational dependency, pilots lose momentum and quietly expire.
“There are some use cases that are really boring,” Konstantinova said. “No one wants to talk about them, but they have a really good return on investment.”
Those unglamorous applications—such as document processing and information extraction—are often the ones that survive, precisely because they are tied to existing workflows and deliver measurable value.
Unlike more speculative deployments, these systems are introduced to solve well‑defined problems that already consume time and resources. They reduce manual effort, lower error rates, and can be evaluated using clear benchmarks. As a result, they are easier to justify to senior management and less likely to be abandoned once initial enthusiasm fades.
That dynamic is familiar at Maersk, where AI has long been applied to shipping and logistics optimization. Nielsen said generative AI is now being explored more broadly across support functions, where teams often lack the budget or IT resources required for traditional software development.
“Low‑code platforms provide this PoC very aggressively, very fast,” Nielsen said. “Then you can decide whether you want to push it to production or use it internally.”
Low‑code tools, he said, are effective at accelerating experimentation and surfacing ideas that would otherwise struggle to secure attention. The challenge comes later, when organizations must decide whether a workflow is important enough to justify stronger governance, engineering support, and long‑term maintenance.
Data before deployment
Across the panel, data readiness emerged as the most persistent obstacle to production AI. Large enterprises rarely start as data‑first organizations, and historical information is often fragmented across systems, formats, and geographies.
Nielsen said this is particularly evident in Maersk’s global real‑estate portfolio, which spans roughly 1,000 sites worldwide. Lease agreements are written in multiple languages and formats, making manual review slow and error‑prone.
“That was really the key for us to understand,” Nielsen said. “How can we really use the generative tool to help us fully understand extracting key pieces of data from a lease agreement?”
As AI capabilities improve, expectations for data quality rise with them. Information that was once acceptable for manual processes becomes a liability when automated systems rely on it.
This shift often exposes long‑standing weaknesses in enterprise data estates. Contracts may be incomplete, scanned documents may be poorly digitized, and records may be spread across incompatible systems. Cleaning and standardizing this information can be time‑consuming, but panelists suggested it is a prerequisite for any serious attempt to scale AI beyond isolated use cases.
Governing production
As AI systems move closer to production, governance shifts from an abstract concern to an operational requirement. Once automated systems become embedded in core processes, failures can have real consequences if ownership and support structures are unclear.
Konstantinova warned that production deployments expose gaps that pilots can hide. “If a solution becomes part of a critical business process, you don’t want someone calling you at night saying the system failed and nobody knows how to fix it,” she said.
In practice, this means enterprises must treat AI systems more like critical infrastructure than experimental tools. Monitoring, incident response, and lifecycle management become essential capabilities, particularly as models are updated or retrained over time. Without these safeguards, organizations risk deploying systems they cannot fully control or maintain.
From a startup perspective, Portz said governance is inseparable from adoption. He argued that enterprises are unlikely to rely on AI systems unless they can trace what happened when something goes wrong. “If there is a dispute, we can go into the data and see what happened,” he said, describing the importance of auditability once AI outputs carry commercial or operational consequences.
For enterprises and their technology partners, the message is increasingly pragmatic. Solutions must address concrete problems, integrate with existing processes, and make accountability explicit rather than implicit.
Looking ahead, panelists agreed that the next phase of enterprise AI adoption will be defined less by breakthroughs in model capability than by progress in governance, data discipline, and organizational alignment. In that environment, trust is not a byproduct of success, but a prerequisite for scale.


