Agentic AI is moving fast from boardroom curiosity to real enterprise ambition. The promise is compelling: AI systems that do not simply answer questions, but plan, act, use tools, and complete tasks with a degree of autonomy.
But the moment AI can act on behalf of a person or organisation, the question stops being ‘what can the model do?’ and becomes ‘what should it be allowed to do, where should it be stopped, and how do we recover when it gets something wrong?’.
In episode nine of the Tech Tomorrow podcast, Zühlke’s David Elliman, Global Chief of Software Engineering, speaks with Sam Newman, independent consultant, author and one of the most influential voices in microservices, cloud and continuous delivery. Together, they explore what it takes to architect agentic AI safely in large-scale systems.
Meet the guest: Sam Newman
Sam Newman is an independent consultant based in London, working with organisations around the world on cloud, continuous delivery, microservices and system architecture. He is the author of Building Microservices and Monolith to Microservices, and his latest work focuses on resilient distributed systems.
With more than 30 years’ experience in software development, Sam Newman brings a deeply pragmatic view to the AI debate. He is optimistic about the potential of generative AI, but wary of certainty in a field that is changing by the month.
As he puts it in the episode: ‘Anyone that tells you with certainty that this is how things should be done, they’re either lying to you, or they misunderstand the state of the world’.
Key takeaways from the episode
Agentic AI is not magic — it is software with new failure modes
For Sam, the starting point is clarity. Agentic AI systems are often described as autonomous, but in today’s enterprise context they are usually LLM-powered systems that can use tools, respond to inputs, and take actions on behalf of a user.
That distinction matters: large language models are powerful, but they are also non-deterministic. In traditional software, the same input should normally produce the same output. With LLMs, the output may vary and may only look right.
That is not necessarily a problem in every context. If an image-generation system produces several different creative options, variation can be a feature (and an advantage). But if a system is calculating, deploying, approving, routing, or changing enterprise data, ‘plausibly right’ is nowhere near good enough.
The lesson is simple: Agentic AI should not be treated as a universal automation layer. It should be designed into the system where its probabilistic nature creates value, and kept away from tasks where determinism, auditability, and repeatability are non-negotiable.
Start with boundaries instead of capabilities
Much of the hype around agentic AI begins with capability: what can the agent do, which tools can it access, how far can it go?
Sam argues that enterprise architecture needs to start somewhere else: with boundaries.
Rather than placing AI agents deep inside core systems, organisations should ring-fence them behind clear abstractions. The agent should not have unrestricted access to data, services, or workflows. It should interact through defined interfaces, with explicit contracts around what comes in, what goes out, and what happens when confidence is low.
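To make that concrete, here is a minimal sketch of what such a seam might look like, assuming a Python codebase. Every name, the confidence threshold, and the stubbed model call are illustrative; none of this is a real vendor API.

```python
# Minimal sketch: ring-fencing an agent behind a narrow, typed interface.
# All names, thresholds, and the stubbed model call are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RefundRequest:
    """Explicit input contract: what the agent is allowed to see."""
    claim_text: str
    amount: float

@dataclass(frozen=True)
class AgentVerdict:
    """Explicit output contract, including how confident the agent is."""
    approve: bool
    confidence: float  # 0.0 to 1.0
    rationale: str

def _call_model(request: RefundRequest) -> AgentVerdict:
    # Stub standing in for whichever model or vendor sits behind the seam.
    # Swapping models later only changes this function, not its callers.
    return AgentVerdict(approve=True, confidence=0.62, rationale="stubbed")

class RefundAgentGateway:
    """The rest of the system talks to this class, never to the model directly."""

    CONFIDENCE_FLOOR = 0.85  # below this, the agent abstains rather than acts

    def assess(self, request: RefundRequest) -> Optional[AgentVerdict]:
        verdict = _call_model(request)
        if verdict.confidence < self.CONFIDENCE_FLOOR:
            return None  # contract for low confidence: escalate to a human
        return verdict

gateway = RefundAgentGateway()
print(gateway.assess(RefundRequest("Parcel arrived damaged", 49.90)))  # None -> escalate
```

Callers depend only on the contracts, never on whatever sits behind `_call_model`.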
That architectural discipline makes systems easier to secure, allows organisations to swap models, vendors or workflows as the market changes, and creates the option to replace an AI component with deterministic code later if the use case becomes stable enough — or if token costs become too high.
This is where familiar software architecture principles become newly relevant. Modularity, information hiding, bounded contexts, and microservices are not legacy concepts in the AI era. They may be precisely what makes agentic AI safe enough to scale.

Not everything needs an LLM
One of the strongest messages from the conversation is also one of the most easily forgotten: many tasks being handed to LLMs can be solved better with conventional software.
Sam describes a common pattern. Teams start with an AI-driven workflow because it is quick to prototype. Over time, they discover that parts of the workflow are stable, predictable, and repeatable. At that point, a deterministic service may be cheaper, faster, easier to test, and easier to trust.
This is not an argument against using AI-assisted coding tools. A deterministic component can be written with help from generative AI and still behave like conventional software. The important distinction is not how the code was authored, but how the live system behaves.
For enterprise leaders, this creates a useful design principle: use AI where ambiguity, language, planning or adaptation matter; use software where the task is known, testable and repeatable.
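As a sketch of that principle in practice, consider a hypothetical support-routing function: the predictable case is handled by plain, testable code, and only genuinely ambiguous input falls through to the model. The message patterns and handler names are invented for illustration.

```python
# Illustrative routing: deterministic code for the known case,
# the model only for ambiguity. All names and patterns are made up.
import re

def handle_with_agent(message: str) -> str:
    # Placeholder for the LLM-backed path; only ambiguous requests land here.
    return "routed to agent"

def handle_message(message: str) -> str:
    # Deterministic path first: order IDs like "ORD-12345" are a known,
    # testable, repeatable case, so no model call is needed.
    match = re.search(r"ORD-\d+", message)
    if match:
        return f"looking up {match.group()}"
    return handle_with_agent(message)

print(handle_message("Where is ORD-12345?"))    # deterministic: no tokens spent
print(handle_message("I'm unhappy, help me"))   # ambiguous: falls through to the agent
```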
That mindset also helps avoid a common scaling trap: a working model is not the same as a working AI product. Production readiness depends on the surrounding system — data, integration, platforms, security, ownership, and change control.
Validation gates are essential in AI workflows
Agentic systems often work as chains of steps: one model extracts information, another interprets it, a third decides what to do next, and a tool executes an action.
That modularity is useful, but it introduces a serious risk. If an early step produces a flawed output, every later step may inherit and amplify the problem, even if those later steps behave exactly as designed.
David describes this as a need for validation gates: checkpoints between AI-driven steps that verify whether the previous output is safe, structured, and fit for purpose before it becomes the next input.
Those gates might include deterministic checks, schema validation, confidence thresholds, human review, audit trails, or rollback mechanisms. If a step is supposed to extract a number, check that it really is a number. If a tool is about to trigger a business-critical action, make sure the action is reversible, authorised, and observable.
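A validation gate can be as plain as a few lines of deterministic code. The sketch below, with illustrative names and bounds, follows the number-extraction example: the model's raw output is checked before it is allowed to become the next step's input.

```python
# Minimal sketch of a validation gate between two AI-driven steps.
# Names, bounds, and the sample output are illustrative.
class GateFailure(Exception):
    """Raised when an upstream output fails validation; triggers review or rollback."""

def invoice_amount_gate(raw_output: str, max_amount: float = 100_000.0) -> float:
    """Deterministic checkpoint: model output in, verified number out."""
    try:
        amount = float(raw_output.strip().replace(",", ""))
    except ValueError:
        raise GateFailure(f"expected a number, got {raw_output!r}") from None
    if not 0 < amount <= max_amount:
        raise GateFailure(f"amount {amount} outside the plausible range")
    return amount  # only now does the value become the next step's input

print(invoice_amount_gate("12,409.50"))      # passes: 12409.5
# invoice_amount_gate("about twelve grand")  # would raise GateFailure -> human review
```

Anything that fails the gate stops the chain there, where the error is cheap, instead of being amplified downstream.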
This is where AI governance stops being a policy document and becomes system design.
External security guidance is also maturing quickly. The OWASP Top 10 for Large Language Model Applications highlights risks such as prompt injection, excessive agency and insecure output handling, while NIST’s Generative AI Profile offers a cross-sector framework for identifying and managing generative AI risks.
Token economics will become an architectural concern
AI cost is often treated as a procurement issue. Sam argues it is also an architectural one.
Today, many organisations are experimenting with AI through subscriptions, credits or early-stage pricing models. But as usage scales, token consumption can become a material cost driver — especially when workflows involve multiple models, repeated calls, long context windows or agent-to-agent orchestration.
The comparison with cloud adoption is useful. Many organisations initially expected cloud to reduce costs automatically. Instead, those that lifted old architectures into new pricing models often saw costs rise until they introduced proper governance, observability and design discipline.
AI may follow a similar pattern. Leaders need to understand the unit economics of their AI workflows before they become business-critical. Sometimes the right answer will be a different model. Sometimes it will be local deployment. And sometimes it will be replacing part of the AI workflow with ordinary software.
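A back-of-envelope model is often enough to surface the problem early. The sketch below uses invented prices and token counts; the point is the shape of the calculation, not the numbers.

```python
# Back-of-envelope token economics for a multi-step agent workflow.
# All prices and token counts are assumed placeholders; substitute real rates.
PRICE_PER_1K_INPUT = 0.0025   # assumed $ per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.0100  # assumed $ per 1,000 output tokens

steps = [
    # (step name, input tokens, output tokens) per model call in one run
    ("extract",   4_000,   500),
    ("interpret", 6_000,   800),
    ("plan",      8_000, 1_200),
]

cost_per_run = sum(
    tin / 1000 * PRICE_PER_1K_INPUT + tout / 1000 * PRICE_PER_1K_OUTPUT
    for _, tin, tout in steps
)
print(f"cost per run:       ${cost_per_run:.4f}")                # $0.0700 at these rates
print(f"cost at 1M runs/mo: ${cost_per_run * 1_000_000:,.0f}")   # $70,000
```

At a million runs a month, even cents per run become a line item worth an architectural conversation.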
The future belongs to organisations that can experiment safely
Despite his concerns about hype, centralisation and market volatility, Sam is not pessimistic about the technology itself. He believes valuable generative AI use cases are still ahead of us, and that many of today’s most visible use cases are not necessarily the most interesting.
The organisations that succeed will not be those that try to predict the entire future of AI. They will be those that build the capacity to learn quickly without putting critical systems, customers, or data at unnecessary risk.
That means small experiments. Clear interfaces. Strong feedback loops. Human accountability. Modular architecture. And the humility to admit that some of today’s AI components may need to be changed, replaced or removed tomorrow.
A practical roadmap for leaders
Agentic AI is not something to block. But it is something to design carefully. For executives and technology leaders exploring AI agents in large-scale systems, the episode points to six practical steps:
1. Treat agentic AI as software with new failure modes, not magic.
2. Start with boundaries and interfaces, not capabilities.
3. Reserve LLMs for ambiguity; use deterministic software where the task is known, testable and repeatable.
4. Put validation gates between AI-driven steps, with audit trails and rollback.
5. Understand the token economics of each workflow before it becomes business-critical.
6. Build the capacity to experiment safely, with human accountability and fast feedback loops.
The boundary is a well-designed interface
So, what boundaries should define our relationship with agentic AI in large-scale systems?
The answer is not to keep AI outside the enterprise. Nor is it to give autonomous agents unrestricted access in the name of speed.
The right boundary is a well-designed interface: one that lets AI contribute where it is useful, while protecting the systems, data and decisions that matter most.
Agentic AI will create real opportunities for organisations that can combine experimentation with engineering discipline. But the winners will not be those with the most autonomous agents. They will be those with the clearest boundaries, the strongest controls, and the fastest learning loops.
Because in a world where nobody knows exactly what comes next, resilience is not about certainty, but about staying ready to adapt.