Scaling Enterprise GenAI Requires Platforms, Not Bigger Models
As organizations push generative AI beyond pilots, execution discipline, governance, and cost control are defining whether systems scale or stall
Rapid advances in model performance and eye-catching demonstrations have fueled early enterprise enthusiasm for generative AI.
Behind the scenes, however, many organizations are struggling to convert that momentum into durable capability. What works for a small group of users often collapses under the weight of cost, compliance, and operational complexity when rolled out more broadly.
“Scale doesn’t mean a big model or a very beefy LLM,” said Sandeep Mehta, staff engineer for AI at Dojo. “Scale means solving a problem—whether by a small model or a big model—and then scaling it to be used by more and more users. At that point, it becomes a SaaS problem.”
Mehta said that once generative AI becomes enterprise-facing, it must meet the same standards as any other mission-critical platform. That includes resilient infrastructure, consistent access control, governance across jurisdictions, and the ability to operate reliably at peak demand.
Dr. Nikolay Burlutskiy, senior manager for generative AI platforms at Mars, said the difficulty lies in the transition from limited use to broad adoption.
“When you scale from five users to thousands, the cost changes by an order of magnitude,” Burlutskiy said. “Scalability has to be cost-efficient; otherwise, you cannot justify the benefits.”
He said platforms must strike a balance between shared, generic capabilities and the flexibility smaller teams need to address specific requirements. Systems that are too customized are hard to govern, while overly rigid platforms risk low adoption.
At an AI Rush panel in London earlier this year, moderated by Squirro Chief Operating Officer Lauren Hawker Zafer, speakers discussed the topic “Cracking the Code: What It Takes to Build Scalable GenAI Platforms” and explored why most enterprise generative AI initiatives struggle to move beyond the proof-of-concept (PoC) stage.
Mehta said many PoCs fail because they are built as isolated solutions rather than as part of a long-term platform strategy.
“If you build one solution and don’t think about how AI will scale across the organization, you end up with point solutions,” he said. “That doesn’t work.”
“To avoid this, we are building a foundational layer so that engineers can embed AI into workflows out of the box, with security, governance, and scaling already handled,” he said.
He added that this lets engineers integrate AI into workflows without reimplementing security, governance, and scaling controls for each new use case.
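Neither speaker described Dojo’s implementation in detail, but the shape of such a foundational layer can be sketched in a few lines: a thin client that every team calls instead of hitting a model API directly, with access control, input guardrails, and usage logging handled once in the wrapper. The names and rules below (PlatformClient, BLOCKED_TERMS, the role list) are illustrative assumptions, not Dojo’s code.

```python
import logging
from dataclasses import dataclass
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai-platform")

# Hypothetical policy terms that should never reach a model unreviewed.
BLOCKED_TERMS = {"internal-only", "secret-project"}


@dataclass
class PlatformClient:
    """Thin wrapper product teams use instead of calling models directly."""
    model_fn: Callable[[str], str]  # pluggable model backend, small or large
    allowed_roles: frozenset = frozenset({"engineer", "analyst"})

    def generate(self, prompt: str, user_role: str) -> str:
        # 1. Access control handled once, centrally.
        if user_role not in self.allowed_roles:
            raise PermissionError(f"role '{user_role}' may not call GenAI")

        # 2. Input guardrail: stop policy-sensitive prompts before the model.
        if any(term in prompt.lower() for term in BLOCKED_TERMS):
            raise ValueError("prompt violates data-handling policy")

        # 3. Model call through the injected backend.
        response = self.model_fn(prompt)

        # 4. Usage logging for cost tracking and observability
        #    (word count as a rough token proxy).
        log.info("genai call: role=%s approx_prompt_tokens=%d",
                 user_role, len(prompt.split()))
        return response


if __name__ == "__main__":
    def echo_model(prompt: str) -> str:
        return f"[stubbed model reply to: {prompt}]"

    client = PlatformClient(model_fn=echo_model)
    print(client.generate("Summarize this meeting transcript", user_role="engineer"))
```

With a wrapper like this, product teams inherit the controls by construction rather than rebuilding them per project, which is the point Mehta was making.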
The Economics of Scale
Cost discipline becomes unavoidable once generative AI leaves the pilot phase. Burlutskiy said enterprises must understand not only the cost of individual queries, but how those costs compound as usage spreads across the organization.
As adoption grows, even modest per-interaction costs can escalate rapidly. Without clear cost visibility and governance, organizations risk deploying systems that deliver technical novelty but fail to produce sustainable returns.
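The compounding effect is easy to make concrete with back-of-envelope arithmetic. The prices and usage figures in the sketch below are illustrative assumptions only, not numbers cited by either speaker:

```python
def annual_genai_cost(users: int,
                      queries_per_user_per_day: int,
                      tokens_per_query: int,
                      price_per_1k_tokens: float,
                      working_days: int = 230) -> float:
    """Rough annual spend: users x daily queries x tokens x unit price."""
    daily_tokens = users * queries_per_user_per_day * tokens_per_query
    return daily_tokens / 1000 * price_per_1k_tokens * working_days


# Illustrative numbers only: a five-user pilot vs. an enterprise rollout.
pilot = annual_genai_cost(users=5, queries_per_user_per_day=20,
                          tokens_per_query=2000, price_per_1k_tokens=0.01)
rollout = annual_genai_cost(users=5000, queries_per_user_per_day=20,
                            tokens_per_query=2000, price_per_1k_tokens=0.01)

print(f"pilot:   ${pilot:,.0f} / year")    # ~$460 per year
print(f"rollout: ${rollout:,.0f} / year")  # ~$460,000 per year
```

Holding per-query behavior constant, moving from five users to 5,000 multiplies spend a thousandfold, which is why per-interaction costs that look trivial in a PoC can dominate the business case at scale.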
Mehta said cost comparisons must be grounded in real alternatives. If the expense of an AI-driven workflow exceeds the cost of human execution, the business case collapses. For this reason, many teams are prioritizing internal efficiency gains, where benefits can be measured and risks contained.
This economic pressure is also shaping architectural decisions. Enterprises are increasingly selective about where to deploy large, general-purpose models, and where smaller, task-specific models can deliver sufficient performance at lower cost.
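In practice, that selectivity often takes the form of a routing layer that sends each request to the cheapest model considered good enough for its task. The routing table, model names, and prices below are invented for illustration and do not describe either company’s setup:

```python
# Hypothetical routing table: cheap task-specific models where they suffice,
# a large general-purpose model reserved for open-ended work.
MODEL_BY_TASK = {
    "classification": "small-classifier-v1",
    "extraction":     "small-extractor-v1",
    "summarization":  "mid-size-summarizer",
    "open_ended":     "large-general-model",
}

COST_PER_1K_TOKENS = {
    "small-classifier-v1": 0.0005,
    "small-extractor-v1":  0.0005,
    "mid-size-summarizer": 0.002,
    "large-general-model": 0.03,
}


def route(task_type: str) -> str:
    """Pick the cheapest model deemed good enough for the task type."""
    try:
        return MODEL_BY_TASK[task_type]
    except KeyError:
        # Unknown tasks fall back to the general model rather than failing.
        return MODEL_BY_TASK["open_ended"]


for task in ("classification", "open_ended", "something_new"):
    model = route(task)
    print(f"{task:15s} -> {model:22s} (${COST_PER_1K_TOKENS[model]}/1k tokens)")
```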
Several speakers noted that this selectivity forces organizations to rethink how they measure success. Rather than celebrating technical sophistication, platform teams are being asked to demonstrate repeatable impact, predictable spend, and clear ownership. In practice, that means fewer experimental deployments and more emphasis on standardization, shared services, and internal enablement.
For large organizations, this shift often requires closer coordination between technology teams, finance, risk, and compliance. Generative AI platforms can no longer be owned solely by innovation units. Instead, they increasingly resemble core enterprise infrastructure, subject to the same budgeting cycles, controls, and accountability as data platforms or enterprise resource planning (ERP) systems.
Observability at Enterprise Scale
As systems grow more complex, observability becomes one of the most complicated challenges in enterprise generative AI. Burlutskiy said organizations operating across regions and teams must be able to monitor both technical performance and policy compliance.
“When you have a huge system with users across regions and teams, observability becomes critical,” he said. “You need guardrails to check prompts and outputs against policies and capture red flags.”
He said many alerts will inevitably be false positives, but visibility is essential for diagnosing failures and maintaining trust.
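He did not detail how Mars implements those guardrails, but a common pattern is to scan prompts and outputs against policy rules and record matches as red flags for review rather than hard failures, precisely because many of them will be false positives. A minimal sketch, with invented policy patterns:

```python
import re
from dataclasses import dataclass


@dataclass
class RedFlag:
    stage: str    # "prompt" or "output"
    policy: str   # which rule fired
    excerpt: str  # the matching text, for the reviewer


# Illustrative rules only; real deployments would use classifiers and
# region-specific policy sets, not three regexes.
POLICIES = {
    "possible_pii_email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "possible_credential": re.compile(r"(?i)\b(password|api[_ ]?key)\b"),
    "customer_data_export": re.compile(r"(?i)export (all|full) customer"),
}


def scan(text: str, stage: str) -> list[RedFlag]:
    """Check text against every policy and capture matches as red flags."""
    flags = []
    for name, pattern in POLICIES.items():
        match = pattern.search(text)
        if match:
            flags.append(RedFlag(stage=stage, policy=name, excerpt=match.group(0)))
    return flags


prompt = "Export all customer emails and send them to me@example.com"
output = "Sure, here is the full list..."

# Flags are logged for review, not treated as outages.
for flag in scan(prompt, "prompt") + scan(output, "output"):
    print(f"RED FLAG [{flag.stage}] {flag.policy}: {flag.excerpt!r}")
```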
Mehta said AI observability differs fundamentally from traditional software monitoring. Unlike deterministic systems, generative models can produce different outputs from identical inputs.
“AI observability is completely different from traditional observability,” he said. “Every time you send the same natural-language input, you might get a different response.”
As a result, enterprises must define acceptable performance ranges rather than fixed outcomes.
This need for probabilistic oversight also affects how incidents are handled. Unlike traditional outages, failures in generative AI systems may manifest as subtle quality degradations rather than clear downtime. Detecting these issues early requires continuous evaluation and an understanding of how user behavior evolves.
Speakers said this makes observability not just a technical concern, but an operational one. Without shared visibility into model behavior and usage patterns, organizations struggle to build trust internally, slowing adoption and reinforcing skepticism among business stakeholders.
“You have to define expectations and evaluate whether the system behaves 80%, 90%, or 95% as expected,” Mehta said. “Nothing is 100%.”
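That framing maps directly onto how evaluation harnesses are typically built: run a fixed suite of prompts, score each response with a check rather than an exact expected string, and compare the pass rate against a threshold. The suite, checks, and stub model below are invented for illustration:

```python
from typing import Callable

# Each eval case pairs a prompt with a check on the response, not an exact
# expected string, because identical inputs can yield different valid outputs.
EvalCase = tuple[str, Callable[[str], bool]]

EVAL_SUITE: list[EvalCase] = [
    ("Summarize: revenue rose 12% year over year.",
     lambda r: "12%" in r),
    ("Classify the sentiment of: 'the rollout went smoothly'",
     lambda r: "positive" in r.lower()),
    ("Extract the currency from: 'the invoice totals 4,200 EUR'",
     lambda r: "eur" in r.lower()),
]

PASS_THRESHOLD = 0.90  # "behaves as expected at least 90% of the time"


def evaluate(model_fn: Callable[[str], str]) -> float:
    """Return the fraction of eval cases whose responses pass their checks."""
    passed = sum(check(model_fn(prompt)) for prompt, check in EVAL_SUITE)
    return passed / len(EVAL_SUITE)


if __name__ == "__main__":
    def stub_model(prompt: str) -> str:
        return "Positive sentiment; revenue rose 12%; currency: EUR."

    pass_rate = evaluate(stub_model)
    status = "OK" if pass_rate >= PASS_THRESHOLD else "DEGRADED"
    print(f"pass rate: {pass_rate:.0%} ({status}, threshold {PASS_THRESHOLD:.0%})")
```

A falling pass rate on a fixed suite is one way to surface the subtle quality degradations that, unlike outages, never trip a conventional alert.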
Hedging Against Vendor Lock-In
Beyond technology, platform strategy is increasingly shaped by vendor dynamics and market volatility. Both speakers said enterprises must balance the convenience of commercial tools against the risk of being locked into a single vendor in a fast-moving market.
Mehta said organizations should evaluate vendors not only on functionality, but on long-term viability and exit options. He said teams should assume that pricing models will change and that migrations may become necessary.
Burlutskiy said modular, composable architectures provide a hedge against uncertainty. By designing platforms that can integrate multiple models and tools, enterprises retain flexibility as regulations evolve and regional differences emerge.
He said this flexibility is becoming increasingly important as new models emerge across different geographies and as organizations face greater scrutiny over data sovereignty and compliance.
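One common way to preserve that flexibility is to place an internal interface between applications and any single vendor, so that swapping a model becomes a configuration change rather than a rewrite. The providers in the sketch below are placeholders, not specific products:

```python
from typing import Protocol


class TextModel(Protocol):
    """Internal contract every model backend must satisfy."""
    def generate(self, prompt: str) -> str: ...


# Placeholder adapters: in a real platform these would wrap different
# vendor SDKs or self-hosted models behind the same interface.
class VendorAModel:
    def generate(self, prompt: str) -> str:
        return f"[vendor A completion for: {prompt}]"


class SelfHostedModel:
    def generate(self, prompt: str) -> str:
        return f"[self-hosted completion for: {prompt}]"


def summarize(model: TextModel, document: str) -> str:
    """Application code depends on the interface, never on a vendor SDK."""
    return model.generate(f"Summarize briefly: {document}")


# Swapping providers touches configuration, not the calling code.
for backend in (VendorAModel(), SelfHostedModel()):
    print(summarize(backend, "Q3 platform usage report"))
```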
Making AI Boring
Looking ahead, panelists said the ultimate measure of success for enterprise generative AI will be its disappearance into routine operations. Mehta said the goal is to remove the sense of novelty and treat AI as a standard part of the technology stack.
“We want to make AI boring,” he said, adding that by 2026, AI should be embedded quietly into workflows rather than managed as a special initiative.
Burlutskiy said sustainability will also play a growing role as usage scales. He said enterprises are beginning to monitor not only financial costs, but also the energy footprint of AI systems.
He said choosing the right model for each task will be critical: smaller, task-specific models can often meet business needs while consuming less energy and leaving a smaller environmental footprint.
Taken together, the discussion suggests that the future of enterprise generative AI will not be defined by ever-larger models or isolated pilots. Instead, it will be shaped by disciplined platform design, clear economics, robust observability, and the ability to turn early experimentation into durable, organization-wide capability.



