Enterprise AI to scale with kill switches, orchestration, Citi says
Most organizations are still experimenting with AI agents while the gap between demos and production remains wide
Agentic artificial intelligence (AI) systems may perform flawlessly in controlled demonstrations, but deploying them across large organizations exposes a set of challenges that most vendors and developers have yet to fully address.
From ensuring systems can be shut down instantly to managing workflows that run for months, the gap between a proof of concept and enterprise-grade deployment is wider than it appears.
“They all work well in demos, but when you talk about enterprise adoption, it’s a totally different view and lens that you need to put on,” said Amal Makwana, Senior Vice President and Engineering Manager at Citi.
“If it fails at 2 a.m., something’s wrong. The first set of questions to be asked is: Can I shut it down? Does it have a kill switch? Can I throttle it? Those are the things that enterprises look for,” he said.
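The two controls Makwana names, a kill switch and a throttle, can be sketched in a few lines. This is an illustrative sketch only, not Citi's implementation: the class `AgentControl`, its method names and the one-action-per-interval throttling rule are all assumptions made for the example.

```python
import threading
import time
from typing import Optional


class AgentControl:
    """Hypothetical runtime controls for an agent: kill switch plus rate throttle."""

    def __init__(self, max_actions_per_second: float) -> None:
        self._killed = threading.Event()
        self._min_interval = 1.0 / max_actions_per_second
        self._last_action = float("-inf")
        self._lock = threading.Lock()

    def kill(self) -> None:
        """Kill switch: once set, no further actions are allowed."""
        self._killed.set()

    def throttle(self, max_actions_per_second: float) -> None:
        """Change the permitted action rate while the agent is running."""
        with self._lock:
            self._min_interval = 1.0 / max_actions_per_second

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if the agent may take one action at time `now`."""
        if self._killed.is_set():
            return False
        now = time.monotonic() if now is None else now
        with self._lock:
            if now - self._last_action < self._min_interval:
                return False  # too soon after the previous action
            self._last_action = now
            return True
```

An agent loop would call `allow()` before every action, so an operator can slow the agent with `throttle()` or stop it outright with `kill()` without redeploying it.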
He said the eventual architecture of enterprise AI will likely see every business department running its own dedicated agent.
“Every department has an agent, and then there is a super agent who’s coordinating between everybody,” Makwana said.
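The department-agent and "super agent" arrangement resembles a simple routing pattern. The sketch below is a hypothetical illustration: `SuperAgent`, `register` and `dispatch` are invented names, and each department agent is reduced to a plain function that takes a request and returns a reply.

```python
from typing import Callable, Dict

# A department agent is modeled here as a function from request text to reply text.
Agent = Callable[[str], str]


class SuperAgent:
    """Hypothetical coordinator that routes requests to department agents."""

    def __init__(self) -> None:
        self._agents: Dict[str, Agent] = {}

    def register(self, department: str, agent: Agent) -> None:
        """Make a department agent available to the coordinator."""
        self._agents[department] = agent

    def dispatch(self, department: str, request: str) -> str:
        """Forward a request to the agent that owns the department."""
        agent = self._agents.get(department)
        if agent is None:
            raise KeyError(f"no agent registered for department {department!r}")
        return agent(request)


# Example wiring with two made-up departments.
coordinator = SuperAgent()
coordinator.register("payments", lambda req: f"payments handled: {req}")
coordinator.register("compliance", lambda req: f"compliance reviewed: {req}")
```

A production orchestrator would also need to decide which department a request belongs to; here the caller supplies it to keep the example small.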
In his presentation, Makwana outlined the production requirements that enterprises should demand before deploying agentic AI at scale. The list includes observability tools for root cause analysis, robust integration with legacy applications, customization and extensibility, and state management for regulatory compliance.
Developer experience also matters. The quality of documentation, the availability of debugging tools and the learning curve of any given framework all factor into enterprise readiness.
No single agent can be an expert across all business domains, which makes a coordinating orchestration layer essential. In financial services, long-running workflows can span multiple months, making state management especially critical.
“In the financial world, we love building wrappers around everything,” he added.
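State management for a workflow that runs for months comes down to saving progress so that work can resume after a restart and be reviewed later. The following is a minimal sketch under stated assumptions: the `WorkflowState` class, its fields and the JSON encoding are illustrative choices, not something described in the talk.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class WorkflowState:
    """Hypothetical record of progress through an ordered list of steps."""

    workflow_id: str
    steps: List[str]
    completed: List[str] = field(default_factory=list)

    def next_step(self) -> Optional[str]:
        """Return the first step not yet completed, or None when finished."""
        for step in self.steps:
            if step not in self.completed:
                return step
        return None

    def complete(self, step: str) -> None:
        """Mark a step done; steps must be completed in order."""
        if step != self.next_step():
            raise ValueError(f"step {step!r} is not the next pending step")
        self.completed.append(step)


def save(state: WorkflowState) -> str:
    """Serialize the state so it can be written to durable storage."""
    return json.dumps(asdict(state))


def load(payload: str) -> WorkflowState:
    """Rebuild the state from its serialized form."""
    return WorkflowState(**json.loads(payload))
```

Because the saved payload records exactly which steps have run, a process that restarts weeks later can call `load()` and continue from `next_step()` rather than starting over.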
Citi is one of the world’s largest financial institutions, operating across more than 160 countries and serving corporate, institutional and retail clients. Makwana leads engineering delivery teams focused on the bank’s technology transformation programs.
AI Ops goes autonomous
Makwana's presentation, titled "Deep Dive: Agentic Autonomous Systems," was delivered at the AI & Big Data Expo, part of the TechEx series of technology events held in London. The session drew a mixed audience of engineers, enterprise architects and technology managers.
He described two categories of emerging use cases: AI operations (AI Ops), which automates and optimizes IT service management, and four autonomous professional roles he expects to become mainstream in the near term.
On AI Ops, Makwana described capabilities including historical data analysis, anomaly detection, performance monitoring, cross-application data correlation, optimization suggestions and automated task execution.
“Next time there’s an increase in transactions because there’s a sale on Christmas, the system will automatically increase your memory and compute power to make sure that everything is okay,” he said.
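The scaling behavior in that example can be expressed as a simple capacity rule. This sketch is an assumption-laden illustration: the function `desired_replicas`, the 20% headroom and the replica limits are invented for the example and do not describe any particular AI Ops product.

```python
import math


def desired_replicas(
    transactions_per_second: float,
    capacity_per_replica: float,
    headroom: float = 0.2,
    min_replicas: int = 2,
    max_replicas: int = 50,
) -> int:
    """Return how many replicas are needed for the observed load.

    All parameters are hypothetical: `headroom` adds spare capacity on top of
    the observed load, and the result is clamped to the configured limits.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(transactions_per_second * (1 + headroom) / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))
```

A real system would base the load figure on forecasts from historical data, such as a seasonal sales peak, rather than on a single reading.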
AI Ops is distinct from two related disciplines. DevOps focuses on faster software delivery through automated continuous integration and continuous delivery (CI/CD) pipelines, while machine learning operations (MLOps) governs how ML models are selected, trained, tested and deployed into production applications.
Beyond IT operations, he outlined four roles in which autonomous agents are set to handle tasks currently performed by humans:
A travel agent that plans and books trips based on known preferences.
A financial advisor that monitors portfolios and rebalances them automatically.
A developer agent that writes code to organizational standards with minimal prompting.
An expert route planner that factors in constraints and transport modes.
He said choosing the right framework requires careful analysis. The key criteria include the complexity of the use case, the desired level of human oversight, latency and performance requirements, and deployment infrastructure.
Security and regulatory compliance are particularly important in financial services, where firms must adhere to rules such as the General Data Protection Regulation (GDPR). Community support and cost round out the checklist.
Agents will rarely operate in a greenfield environment, meaning integration with existing legacy applications is a non-negotiable consideration. For organizations in the early stages of experimentation, Makwana suggested open-source frameworks as a practical starting point before committing to commercial platforms.
Decade, not a year
The distinction between a truly agentic AI system and a simple AI agent is frequently misunderstood, Makwana said.
“Agents are not agentic AI. They are the building blocks of agentic AI,” he said.
He described four observable traits that define a genuinely agentic system: adaptability, agency, autonomy and persistence. He said that any system exhibiting those four traits can be classified as agentic AI. The determination requires no technical knowledge; observing how the system behaves is enough.
He traced the evolution of such systems through four stages, using a medical analogy. The first is a basic reactive large language model (LLM), capable of answering general questions. The second adds retrieval-augmented generation (RAG), giving the model access to verified, trusted data sources.
The third stage embeds predefined workflows, enabling the system to coordinate tasks rather than merely advise. The fourth and final stage is a fully agentic system capable of dynamic planning, goal persistence and coordinated action across multiple agents.
Five core components are required to reach that final stage:
Intent and awareness: allowing the system to pursue goals rather than wait for specific instructions.
Intelligence and decision-making: to reason through options within policy constraints.
Execution and learning: enabling step-by-step action refined by feedback.
Coordination and control: to orchestrate multi-step plans across tools.
Governance and reliability: ensuring the system operates within defined boundaries and can be explained.
“It has to operate within a set of guardrails. It needs to know its boundaries, and it has to work reliably. We need to be able to explain what it does and how it does it,” Makwana said.
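Guardrails and explainability can be illustrated with a policy check that records why each decision was made. The sketch below is hypothetical: the `Guardrail` class, the allowed-action set and the amount limit are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Guardrail:
    """Hypothetical policy boundary with an audit trail for every decision."""

    allowed_actions: Set[str]
    max_amount: float
    audit_log: List[str] = field(default_factory=list)

    def check(self, action: str, amount: float) -> bool:
        """Return True if the action is within bounds; always log the reason."""
        if action not in self.allowed_actions:
            reason = f"denied: action {action!r} is outside the allowed set"
        elif amount > self.max_amount:
            reason = f"denied: amount {amount} exceeds limit {self.max_amount}"
        else:
            reason = "allowed"
        self.audit_log.append(f"{action} amount={amount}: {reason}")
        return reason == "allowed"
```

Logging denials as well as approvals means the audit trail can answer both what the system did and what it declined to do.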
On the broader state of the industry, a late-2025 McKinsey report showed that nearly two-thirds of organizations remain in the experimenting or piloting phase. While 64% said AI helps their operations, only 39% reported a measurable impact on earnings.
Half identified workflow redesign as a critical success factor, and 80% said AI should drive growth and innovation, not only cut costs.
“More EBITDA [earnings before interest, taxes, depreciation and amortization] impact will be seen if you look at AI in a more holistic way, and not just as a tool,” Makwana said. “You need to redefine the workflows.”
Three interoperability protocols are gaining traction in the agentic AI space. Model Context Protocol (MCP) standardizes how LLMs access external tools and APIs. The Agent-to-Agent (A2A) protocol, initiated by Google, allows agents to advertise their capabilities and communicate directly. The Agent Communication Protocol (ACP), proposed by IBM, achieves similar coordination via HTTP rather than direct real-time connections.
He stressed that the winner in the coming decade of AI adoption will not be whoever moves fastest.
“This will not be the year, but the decade of AI and agents, and the winner would be the one who doesn’t adopt fastest, but adopts wisely,” he said.




