True AI returns come from revenue and platforms, not cost cuts
Executives at major corporations say headcount savings are not enough to justify AI spending, and call for clearer links to revenue and process outcomes

Proving that artificial intelligence (AI) delivers tangible business value, rather than merely cutting costs, has become the defining challenge for executives scaling autonomous systems in 2026.
Organizations that frame AI returns in terms of output metrics, platform longevity and data infrastructure gains are more likely to secure sustained investment than those that rely on headcount savings alone.
“Return on investment (ROI) is not a cost-succession feature,” said Riccardo Calliano, Vice President of Finance for GenAI Commercial Investments at GSK Pharmaceuticals. “ROI is a business metric. If I have an AI program that helps me better target my customers, my ultimate impact is on sales. How much have I increased my sales versus my target because of my tool?”
Calliano outlined three ways to evaluate ROI:
Tying it directly to a business outcome such as revenue or customer conversion.
Treating AI as a platform asset, similar to how enterprise resource planning (ERP) systems are assessed in corporate finance, where upfront investment enables future use cases at marginal cost.
Recognizing the data engineering work underpinning an AI rollout as an asset that reduces the cost of subsequent applications.
“If you developed a new generative AI (GenAI) assistant with deterministic and non-deterministic workflows, you prove it works, and then plug in additional use cases at just a marginal cost,” he said.
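As a rough illustration of the first two approaches, the sketch below computes outcome-based ROI and shows how a shared platform's return changes as use cases are added at marginal cost. It is a minimal sketch under assumed inputs. The names `UseCase`, `outcome_roi`, `platform_roi` and `average_cost_per_use_case` are hypothetical, and all figures are invented. It also assumes the revenue attributable to each tool (sales above what would have been expected without it) has already been estimated, which in practice is the hard part.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    marginal_cost: float        # cost of plugging this use case into the shared platform
    attributable_revenue: float  # sales uplift credited to the tool (assumed already estimated)


def outcome_roi(attributable_revenue: float, cost: float) -> float:
    """Outcome-based ROI: net attributable gain divided by total cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (attributable_revenue - cost) / cost


def platform_roi(platform_cost: float, use_cases: list[UseCase]) -> float:
    """Treat the platform (including its data engineering) as one shared asset:
    a single upfront cost plus the marginal cost of each use case built on it."""
    total_cost = platform_cost + sum(u.marginal_cost for u in use_cases)
    total_gain = sum(u.attributable_revenue for u in use_cases)
    return outcome_roi(total_gain, total_cost)


def average_cost_per_use_case(platform_cost: float, use_cases: list[UseCase]) -> float:
    """Upfront cost is spread across every use case that reuses the platform."""
    total_cost = platform_cost + sum(u.marginal_cost for u in use_cases)
    return total_cost / len(use_cases)


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    cases = [
        UseCase("use_case_a", 0, 800_000),
        UseCase("use_case_b", 100_000, 500_000),
        UseCase("use_case_c", 100_000, 500_000),
    ]
    print(platform_roi(1_000_000, cases[:1]), platform_roi(1_000_000, cases))
    print(average_cost_per_use_case(1_000_000, cases))
```

With these made-up numbers, the first use case alone returns an ROI of -0.2 against a 1,000,000 platform cost. Adding two more use cases at 100,000 marginal cost each lifts platform ROI to 0.5 and cuts the average cost per use case from 1,000,000 to 400,000.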
Divyansh Saxena, Vice President of AI & Data at HomeServe EMEA, noted that the cost and time of building a proof of concept (POC) have fallen sharply. Two years ago, building one could take an entire quarter; today, it can be done in one to two weeks.
“If you want to prove something quickly to your business, it shouldn’t take more than a week or two to build a quick POC based on the current frameworks,” Saxena said. “ROI comes later once you have proven things, but the amount of investment required to prove or disprove things is much less in 2025 to 2026.”
On productivity, Calliano cautioned against inflating the numbers. “When we’re talking about productivity, we should talk about extracted value, not saving 10% of my time and adding it all up,” he said. “If you’re extracting time, are you really eliminating activities? Are you reallocating people to other activities so that the process is more efficient?”
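Calliano's distinction between adding up time savings and extracting value can be made concrete with a small calculation. In this hypothetical sketch, `TimeSaving` and its fields are illustrative names, and the figures are invented. Each role reports hours saved per person, but only hours that are actually eliminated or reallocated to other work count as extracted value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimeSaving:
    role: str
    headcount: int
    hours_saved: float       # weekly hours per person the tool is reported to save
    hours_redeployed: float  # of those, hours actually eliminated or moved to other work


def nominal_hours(savings: list[TimeSaving]) -> float:
    """The inflated view: add up every reported saving across all people."""
    return sum(s.headcount * s.hours_saved for s in savings)


def extracted_hours(savings: list[TimeSaving]) -> float:
    """Count only time that changes the process; redeployed hours are
    capped at the hours actually saved."""
    return sum(s.headcount * min(s.hours_redeployed, s.hours_saved) for s in savings)


def extraction_rate(savings: list[TimeSaving]) -> float:
    """Share of the nominal saving that turns into extracted value."""
    nominal = nominal_hours(savings)
    return extracted_hours(savings) / nominal if nominal else 0.0


if __name__ == "__main__":
    # Hypothetical: analysts save 10% of a 40-hour week, but only 1 hour is redeployed.
    team = [
        TimeSaving("analyst", 200, 4.0, 1.0),
        TimeSaving("manager", 50, 2.0, 0.0),
    ]
    print(nominal_hours(team), extracted_hours(team), round(extraction_rate(team), 3))
```

Here the nominal view claims 900 hours a week, while only 200 hours are extracted, because most of the saved time is not eliminated or reassigned.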
Franny Hsiao, EMEA Leader for AI Architects at Salesforce, warned that organizations failing to anchor AI programs to a clear north star risk accumulating the wrong kind of complexity. “We’re creating even more agentic silos and technical debt,” she said.
GSK is one of the world’s largest pharmaceutical companies, operating a global AI program that combines machine learning and GenAI into composite solutions designed to optimize commercial investment. HomeServe EMEA is the European arm of the property repair and insurance membership group, currently rolling out AI agents for internal customer operations.
Salesforce provides cloud-based customer relationship management (CRM) and AI platform software to enterprises globally.
Messy data, failed pilots
The panel, “From Prototype to Production: The Journey of AI in Autonomous Systems,” was moderated by Ravi Jay, Vice President and Global Head of Agile Center of Excellence (CoE) at Sanofi, at the AI & Big Data Expo, part of TechEx, in London.
Panelists included Calliano, Saxena and Hsiao alongside Christine Foster, General Manager for AI and Automation at Experian UK & Ireland.
On moving AI from pilot to production, Calliano described three tests a pilot must pass: desirability, feasibility and value. Piloting, he said, works well under an agile methodology built on a continuous cycle of minimum viable products (MVPs), but the real challenge begins at scale.
“The real value gets delivered when you move to scale up, and scaling a tool is very challenging,” he said. “The value to the customer gets impacted by how many customers you have, how they access your tool, your infrastructure.”
Large organizations operating across multiple markets must also balance agile and waterfall development methodologies simultaneously.
“The biggest barriers are really related to organizational barriers,” Calliano said. “It’s not really technological barriers.”
Hsiao identified a structural flaw that undermines many pilots. Most customers, she said, build their pilots on simplified data and workflows, without accounting for the messy, complicated reality of enterprise data.
She added that governance and observability must be built in from the start rather than retrofitted after deployment, and that fear of job displacement is one of the most common reasons pilots stall.
“A lot of pilots fail when people don’t adopt your technology because they are afraid that their jobs are being taken away,” Hsiao said. “Providing that psychological safety first is really important.”
Foster pushed back on the casual use of “human in the loop” as a reassurance mechanism.
“People simplify human in the loop probably too much,” she said. “Sometimes you might have somebody explain a use case and say, ‘but don’t worry, it’s human in the loop.’ It’s quite frankly a lazy way of telling you it’s going to be okay.”
She proposed a three-part framework for evaluating human involvement: at design time, at runtime inference and at the observability layer.
“At runtime, are you expecting a person to keep up with the speed of 100,000x processing?” Foster said. “Think about that one.”
Responsibility by design
The appropriate level of human oversight also depends on industry context. HomeServe’s lending operations are regulated by the Financial Conduct Authority (FCA), which places strict limits on automated financial advice. Consumer apps such as fitness or gaming platforms face no such constraints.
Saxena pointed to LangGraph, an open-source agentic framework built to incorporate human-in-the-loop functionality, as a tool available for regulated deployments.
On responsible AI, Foster drew on a Canadian engineering tradition in which graduates receive an iron ring, traditionally said to be made from metal from a collapsed bridge, as a physical symbol of professional accountability.
“That responsibility is not just some layer in the tech stack; it’s not something that you get to buy from a vendor,” she said. “It has to be designed right throughout.”
Hsiao agreed, stressing that the standard for responsible behavior is not fixed.
“It’s not something you check off on a checklist, and it’s also not something you do at once and done,” she said. “What responsible means for different people, for different groups, different societies — that concept is ever evolving. It has to be part of your DNA, basically ingrained into the culture.”
Foster added that the UK’s principles-based regulatory approach offers a competitive advantage. “The regulators ask for proof that you’re adhering to the principles, not promises,” she said.
During audience questions, Saxena explained how large language model (LLM)-as-a-judge techniques can address compliance at scale. A compliance officer required to review, say, 1,000 lending-related phone calls cannot realistically listen to more than a fraction of them, whereas an LLM deployed as a judge can process all 1,000 and surface potential violations.
“Where LLM-as-a-judge comes in is the scalability aspect,” Saxena said. “You can have an LLM go through 1,000 of them and give its opinion.”
Calliano added that the human reviewer’s role then shifts to high-risk edge cases.
“The person will not be drowned in tons of data but will focus on specific high-risk activities already highlighted,” he said.
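The triage pattern Saxena and Calliano describe can be sketched as: score every call, then pass only the high-risk ones to a human reviewer. This is a minimal sketch, not HomeServe's system. `Verdict`, `triage`, `keyword_judge` and the 0.7 threshold are hypothetical, and `keyword_judge` is a simple phrase-matching placeholder standing in for an LLM prompted with compliance rules, so the sketch runs offline.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    call_id: str
    risk_score: float  # 0.0 = looks compliant, 1.0 = likely violation
    rationale: str


# A judge maps (call_id, transcript) to a Verdict. In a real deployment this would
# wrap an LLM call with a compliance rubric (hypothetical here).
Judge = Callable[[str, str], Verdict]


def triage(transcripts: dict[str, str], judge: Judge, threshold: float = 0.7) -> list[Verdict]:
    """Score every call, keep those at or above the threshold, highest risk first."""
    verdicts = [judge(call_id, text) for call_id, text in transcripts.items()]
    flagged = [v for v in verdicts if v.risk_score >= threshold]
    return sorted(flagged, key=lambda v: v.risk_score, reverse=True)


def keyword_judge(call_id: str, transcript: str) -> Verdict:
    """Placeholder judge, NOT an LLM: each matched phrase adds 0.5, capped at 1.0."""
    phrases = ("guaranteed approval", "you should borrow", "no need to read the terms")
    hits = [p for p in phrases if p in transcript.lower()]
    score = min(1.0, 0.5 * len(hits))
    rationale = ("matched: " + ", ".join(hits)) if hits else "no issues found"
    return Verdict(call_id, score, rationale)


if __name__ == "__main__":
    calls = {
        "c1": "Here are the full terms and the APR.",
        "c2": "Guaranteed approval on this loan.",
        "c3": "You should borrow the maximum; it's guaranteed approval.",
    }
    for v in triage(calls, keyword_judge):
        print(v.call_id, v.risk_score, v.rationale)
```

The reviewer sees only the flagged list, with a rationale for each call; lowering `threshold` widens the net at the cost of more human review.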
Looking ahead, Hsiao called for a shift toward “frugal AI,” favoring batched workloads and smaller specialized models over continuous real-time inferencing at scale. Calliano envisioned composite AI architectures that combine machine learning, GenAI and agents tuned to specific business problems, especially in accuracy-critical sectors such as pharmaceutical R&D.
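One part of the frugal-AI argument, batching, can be illustrated with a toy cost model in arbitrary units. The sketch assumes each model call carries a fixed overhead on top of a per-item cost, which will not hold for every deployment. `batches` and `workload_cost` are hypothetical helpers, and a smaller specialized model would show up here simply as a lower `cost_per_item`.

```python
from __future__ import annotations

import math
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive groups of up to `size` items for one model call each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def workload_cost(n_items: int, batch_size: int, overhead_per_call: int, cost_per_item: int) -> int:
    """Toy model: every call pays a fixed overhead, every item a per-item cost.
    batch_size=1 stands in for real-time, one-call-per-event inference."""
    calls = math.ceil(n_items / batch_size)
    return calls * overhead_per_call + n_items * cost_per_item


if __name__ == "__main__":
    print(list(batches(["a", "b", "c", "d", "e"], 2)))
    # Hypothetical units: 10,000 events, overhead 5 per call, cost 1 per item.
    print(workload_cost(10_000, 1, 5, 1), workload_cost(10_000, 100, 5, 1))
```

Under these invented units, processing 10,000 events one at a time costs 60,000, while batches of 100 cost 10,500, because the per-call overhead is paid 100 times instead of 10,000.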
Saxena cautioned that as AI systems grow more capable, understanding the “why” behind model outputs is becoming harder. That makes experienced AI practitioners in the boardroom increasingly valuable as guides who can bridge technical possibility and business need.


