Qualcomm says inference economics decide where AI runs
A senior semiconductor executive explains how cost, latency, and privacy pressures are accelerating the shift from cloud AI to devices

Rising inference costs, power constraints, and privacy concerns are increasingly shaping where artificial intelligence is deployed, pushing workloads away from centralized clouds and closer to devices at the edge.
As generative AI moves from experimentation to everyday use, the economics of running models, rather than their raw capabilities, are emerging as the defining constraint, alongside hard latency limits that cloud-based systems cannot always meet. Training large AI models may remain the domain of hyperscale data centers, but inference, which must be executed repeatedly, at scale, and often in real time, is forcing companies to rethink where intelligence should live.
For technology providers, that shift has turned inference from a technical detail into a balance-sheet question. As usage scales, the cumulative cost of serving AI responses in the cloud can quickly outweigh the benefits of centralization.
That calculation looks very different when AI runs on end devices such as smartphones, vehicles, PCs, or other edge hardware powered by dedicated chips.
“Once you purchase the device, running an extra inference event is effectively free,” said Akash Palkhiwala, chief financial officer and chief operating officer of Qualcomm Inc. “In the cloud, the resources are extremely expensive.”
Palkhiwala shared these views during an interview conducted by Richard Waters, tech writer at large at the Financial Times, at The Global Boardroom, an online event organized by FT Live on December 9.
Palkhiwala said inference costs are becoming a critical issue as AI applications scale beyond pilot projects. Each additional query, recommendation, or autonomous action incurs compute and energy costs when processed in the cloud, creating a business model challenge for application providers.
He said running AI locally on devices fundamentally shifts that cost structure. Once the silicon is deployed, inference does not incur incremental usage fees, making edge execution particularly attractive for high-frequency or always-on workloads.
“There are multiple reasons to run AI on the device,” Palkhiwala said. “Latency is one. Privacy and security are massive reasons. And the final one is cost.”
He said these pressures become more pronounced in emerging markets, where affordability constraints limit how much cloud-based inference can be absorbed into pricing models.
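To make that arithmetic concrete, the sketch below compares cumulative cloud serving costs against a one-time on-device hardware premium. Every figure in it (per-query cost, silicon premium, daily usage) is an illustrative assumption, not a number cited by Palkhiwala or Qualcomm.

```python
# Illustrative sketch only: hypothetical numbers, not Qualcomm figures.
# Compares cumulative cloud inference spend against a one-time on-device
# hardware premium to find the break-even query volume.

def breakeven_queries(cloud_cost_per_query: float, device_premium: float) -> float:
    """Number of queries at which cumulative cloud spend equals the device premium."""
    return device_premium / cloud_cost_per_query

if __name__ == "__main__":
    cloud_cost_per_query = 0.002   # assumed $ per cloud-served response
    device_premium = 30.0          # assumed extra cost of AI-capable silicon per device
    queries_per_day = 50           # assumed usage for an always-on assistant

    n = breakeven_queries(cloud_cost_per_query, device_premium)
    print(f"Break-even after {n:,.0f} queries "
          f"(~{n / queries_per_day / 365:.1f} years at {queries_per_day}/day)")
```

Under those assumed figures, the hardware premium pays for itself within about a year of moderate use; heavier or always-on workloads shorten the payback further, which is the dynamic Palkhiwala describes.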
Hybrid AI model
Rather than replacing the cloud, Qualcomm sees a hybrid architecture emerging as the industry standard. Training and large-scale reasoning will continue to rely on data centers, while smaller or distilled models are deployed on devices for real-time interaction.
“We believe the future is hybrid AI,” Palkhiwala said. “Some processing happens on the device, some in the cloud, and the use case determines the split.”
He compared the shift to the evolution of computing itself, in which workloads are routinely split between local hardware and centralized servers. He said AI will follow the same pattern as applications mature, with optimization replacing novelty.
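One way to picture that split is a simple dispatch policy that sends each request either to a local model or to the cloud based on privacy, latency, and model-size constraints. The rules and thresholds below are a hypothetical illustration of the hybrid pattern, not Qualcomm's implementation.

```python
# Minimal sketch of a hybrid AI dispatch policy, assuming three hypothetical
# signals per request: privacy sensitivity, a latency budget, and whether the
# task fits a distilled on-device model. Thresholds are illustrative.

from dataclasses import dataclass

@dataclass
class Request:
    privacy_sensitive: bool   # e.g. personal messages, location history
    latency_budget_ms: int    # how long the caller can wait for a response
    fits_on_device: bool      # task small enough for the local, distilled model

def route(req: Request) -> str:
    """Return 'device' or 'cloud' for a single inference request."""
    if req.privacy_sensitive:
        return "device"       # keep sensitive data local
    if req.latency_budget_ms < 200:
        return "device"       # a round trip to the cloud is too slow
    if req.fits_on_device:
        return "device"       # marginal cost is effectively zero once silicon is paid for
    return "cloud"            # fall back to larger hosted models

# Example: a tight real-time query stays local; a long, heavyweight task may not.
print(route(Request(privacy_sensitive=False, latency_budget_ms=50, fits_on_device=True)))    # device
print(route(Request(privacy_sensitive=False, latency_budget_ms=5000, fits_on_device=False))) # cloud
```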
Automotive systems illustrate the logic clearly. Autonomous driving requires immediate decision-making and cannot tolerate network delays or outages.
“You cannot tolerate the latency of going to the cloud and coming back,” Palkhiwala said, describing vehicles as a natural environment for edge inference.
Beyond cost, Palkhiwala emphasized that devices inherently possess richer contextual awareness than remote servers. Location, sensor data, user behavior, and environmental signals are often available locally, enabling more responsive and personalized AI behavior.
“There is certain data that you don’t want to send to the cloud,” he said. “You can run AI on the device.”
This approach also aligns with growing regulatory and consumer scrutiny around data protection. By keeping sensitive information on-device, companies can reduce exposure while still delivering intelligent features.
He said these advantages explain why edge AI is already deployed at scale in areas such as automotive systems. It is also beginning to appear in consumer products such as smart glasses and other personal AI devices, where on-device inference enables real-time interaction without constant cloud connectivity.
Inference reshapes data centers
The shift toward inference-heavy workloads is also changing the role of data centers themselves. Palkhiwala said much of the infrastructure built to date has been optimized for training, but inference and reasoning place different demands on hardware.
“Performance per watt becomes very important,” he said. “That is something Qualcomm does very well.”
He said memory bandwidth and energy efficiency are emerging as key constraints, opening opportunities for specialized architectures rather than general-purpose accelerators.
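Performance per watt matters because, at inference scale, electricity becomes a recurring line item. The back-of-the-envelope sketch below shows how a more efficient accelerator lowers the energy cost of serving a fixed token volume; the throughput, power, and price figures are invented placeholders, not benchmarks of any real chip.

```python
# Back-of-the-envelope sketch: how performance per watt translates into
# serving cost. All accelerator figures below are invented placeholders,
# not measurements of any real product.

def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   dollars_per_kwh: float = 0.10) -> float:
    """Electricity cost to generate one million tokens at a given throughput and power draw."""
    seconds = 1_000_000 / tokens_per_second
    kwh = watts * seconds / 3_600_000   # watt-seconds to kilowatt-hours
    return kwh * dollars_per_kwh

# Two hypothetical accelerators with the same throughput but different efficiency.
general_purpose = energy_cost_per_million_tokens(tokens_per_second=5_000, watts=700)
inference_tuned = energy_cost_per_million_tokens(tokens_per_second=5_000, watts=300)
print(f"general-purpose: ${general_purpose:.4f} per million tokens")
print(f"inference-tuned: ${inference_tuned:.4f} per million tokens")
```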
Qualcomm’s strategy is to adapt technologies initially developed for battery-powered devices and scale them for data center environments, where power availability and operating costs are increasingly scrutinized.
Clear inflection
While the economic case for edge inference is strengthening, Palkhiwala cautioned that organizational change remains a significant hurdle.
“Changing people is a tough business,” he said. “Sometimes that takes longer than the technology itself.”
Deploying AI at scale often requires redesigning workflows, retraining staff, and adjusting incentives—work that cannot be automated away. As a result, he said, timelines should be measured in years rather than months.
Looking ahead, Palkhiwala said inference economics will also shape the rise of AI agents—software systems that act autonomously on behalf of users.
Rather than interacting with individual applications, users may delegate tasks to personal agents that negotiate with other agents and services.
“It becomes machine-to-machine,” he said, pointing to emerging agent-to-agent and data-access protocols.
Despite concerns about overbuilding infrastructure, he said current investment levels are rational given AI’s long-term productivity potential.
“The worst-case scenario is we invested a little bit too early,” he said. “Eventually, the use cases are going to come at scale, and the capacity that’s being built now will be used.”
For Qualcomm, inference economics provide a unifying lens across edge devices, data centers, and emerging categories such as robotics.
“We’re very confident about where this is going,” Palkhiwala said. “I prefer to talk about what the opportunity is two to five years out, because that seems very clear.”
As AI transitions from experimentation to infrastructure, the decisive question is no longer what models can do, but where they should run—and at what cost.


