Computer vision in retail: How AI-enabled video is giving retail stores a new set of eyes

With insights from

Kevin Denver

Principal Software Engineering Consultant

Kevin.Denver@zuhlke.com

Walk into any high street store, look up, and you’ll probably see a video camera staring right back at you. But despite CCTV serving as the bedrock of retail security since the 1980s, video footage is a fairly blunt instrument – a reactive resource, rather than a proactive tool.

Unless, that is, that age-old video capture hardware is paired with AI-enabled computer vision (CV) software.

With the help of machine learning and modular, interconnected data ecosystems, in-store cameras look set to unlock a seismic shift in the way retailers think about security, optimisation, and personalisation.

But that’s only if businesses can navigate a complex web of ethical concerns, outdated systems, and siloed data – all of which stand between today’s cameras and tomorrow’s smart stores.

The possibilities of computer vision in retail

Let’s fast-forward. Imagine a retail environment where computer vision is used to analyse camera footage across a range of verticals, and against a wide array of data sets.

In this store, footfall heatmaps highlight under-used areas alongside common customer journeys, and that data is used to recommend more efficient layouts. Popular aisles are flagged and cross-referenced with weather and public holiday data to suggest hyper-timely promotions. Self-checkouts avoid weigh-in and age verification disputes with AI backup from overhead cameras. Would-be thieves are flagged in the act, rather than at the door. And queues are monitored in real time to optimise both in-the-moment and predictive staffing rotas.
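To make the footfall heatmap idea concrete, here is a minimal sketch in Python. It assumes an upstream person detector and camera calibration have already turned video frames into floor coordinates in metres; that step, the function names, the one-metre grid, and the sample positions are all illustrative assumptions rather than a description of any particular product.

```python
from collections import Counter

def footfall_heatmap(positions, cell_size_m=1.0):
    """Count detections per floor-grid cell.

    positions: iterable of (x, y) floor coordinates in metres. These are
    assumed to come from a person detector plus camera calibration, which
    is not shown here.
    """
    counts = Counter()
    for x, y in positions:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        counts[cell] += 1
    return counts

def under_used_cells(counts, all_cells, threshold):
    """Return the grid cells whose footfall is below the threshold."""
    return sorted(cell for cell in all_cells if counts.get(cell, 0) < threshold)

# Made-up sample data for a 3 m x 2 m shop floor.
positions = [(0.2, 0.5), (0.7, 0.1), (1.5, 0.4), (1.2, 1.8), (0.9, 0.9), (2.5, 1.5)]
counts = footfall_heatmap(positions)
all_cells = [(cx, cy) for cx in range(3) for cy in range(2)]
quiet = under_used_cells(counts, all_cells, threshold=1)
print(quiet)  # cells that saw no footfall in this sample
```

Note that the sketch stores only anonymous cell counts, not images or identities, which is one way to keep a heatmap use case narrow.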

This is the promise of AI-enabled computer vision in retail: taking the cameras that already exist in millions of stores and turning them into inputs for AI-powered insight.

These use cases are all based on this principle, where a ‘digital twin’ of each retail environment can analyse different data points – of which video is just one – to surface actionable suggestions. The potential outputs cover a few primary areas:

Theft detection
AI can watch for theft in real time, using pattern analysis to spot when someone is near a high-value product and makes movements typical of theft – like item concealment.
Safety and security enforcement
This might include checking that fire exits are clear, gating age-restricted areas, detecting spills or hazards, and even using pattern analysis to predict violent behaviour before it occurs.
Store optimisation
Optimisation can include everything from AI-powered self-checkout technology to smart inventory management, automatic age checks, and intelligent store layout insights based on customer behaviour.

What’s key here is that computer vision really comes into its own when partnered with other data types. That means combining AI-enabled video with things like inventory databases, staff schedules, or customer purchase histories to find new insights. However, much of the necessary data remains hidden in legacy systems, fragmented across departments, or confined within outdated infrastructure. Extracting its value demands robust data analysis, thoughtful modelling, and seamless integration to truly prepare for AI at scale.
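As a rough illustration of pairing video-derived data with another data set, the Python sketch below compares hourly queue lengths (assumed to come from a computer vision system) with a staff schedule to flag hours where queues outstrip till capacity. The function name, the data shapes, the sample figures, and the "four customers per open till" rule are hypothetical.

```python
def flag_understaffed_hours(queue_by_hour, tills_by_hour, max_per_till=4):
    """Return the hours where the observed queue exceeds till capacity.

    queue_by_hour: peak queue length per hour, assumed to be derived
        from overhead cameras.
    tills_by_hour: number of staffed tills per hour, assumed to come
        from the staff schedule.
    max_per_till: illustrative threshold, not an industry standard.
    """
    flagged = []
    for hour in sorted(queue_by_hour):
        capacity = tills_by_hour.get(hour, 0) * max_per_till
        if queue_by_hour[hour] > capacity:
            flagged.append(hour)
    return flagged

# Made-up sample data: hour of day -> value.
queue_by_hour = {9: 3, 12: 11, 17: 9}
tills_by_hour = {9: 1, 12: 2, 17: 3}
busy_hours = flag_understaffed_hours(queue_by_hour, tills_by_hour)
print(busy_hours)
```

Neither data set produces the insight on its own: the camera sees the queue, the rota knows the staffing, and the join between them surfaces the gap.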

But even with the right data foundations, there are still a few more hurdles to overcome.


The fine print of progress: what’s holding AI-powered computer vision back

While the promise of AI-enabled computer vision in retail is compelling, the path to implementation is far from frictionless. Before the benefits of smarter stores can be realised, some uncomfortable questions and fundamental upgrades must be addressed.

Orwellian obstacles

Today’s retail businesses may have the hardware in place to bring our computer vision use cases to life, but how many of them also have the infrastructural and moral boxes ticked?

First up: ethics. Using video footage to generate AI insights naturally raises a handful of very important concerns. For decades we’ve all acknowledged (and largely ignored) signs advising that CCTV is in operation, but what happens when those same systems are actively feeding generative AI models? Is a sign on the wall enough of a mutual agreement in that instance?

The European Data Protection Board has highlighted the sensitivity around surveillance in public spaces, noting the difficulty of obtaining meaningful consent in environments where individuals can’t easily opt out. Similarly, the UK’s Information Commissioner’s Office concedes that “in practice, it is often difficult to obtain genuine consent from individuals for processing their personal data in public spaces.” Meanwhile, principle 6 of the Surveillance Camera Code of Practice states that “no more images and information should be stored than that which is strictly required for the stated purpose of a surveillance camera system.”

That presents obvious regulatory challenges in ‘test and learn’ scenarios, wherein any final use case for this technology isn’t yet defined, and therefore can’t be clearly communicated or consented to. But even beyond the raw legalities, there are deeper ethical factors at play.

Staff and customers alike need to feel safe in the knowledge that any active surveillance system doesn’t exploit their privacy, and that there’s transparency around the kind of data being collected. Compliance with data collection regulations like GDPR can be tricky enough even before AI enters the mix – a space where the EU requires numerous ethical checks and balances.
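In engineering terms, the storage principle quoted above tends to translate into two habits: keep only the fields the stated purpose needs, and delete records once a retention window has passed. The Python sketch below shows one way that might look. The field names, the 30-day window, and the sample records are assumptions for illustration only; real retention periods are a legal and policy decision, not a code default.

```python
from datetime import datetime, timedelta

# Hypothetical allow-list: the only fields a queue-monitoring use case needs.
ALLOWED_FIELDS = {"captured_at", "zone", "queue_length"}

def minimise(record, allowed_fields=ALLOWED_FIELDS):
    """Drop every field that the stated purpose does not require."""
    return {key: value for key, value in record.items() if key in allowed_fields}

def purge_expired(records, now, retention=timedelta(days=30)):
    """Keep only records that are still inside the retention window.

    The 30-day default is an illustrative assumption.
    """
    cutoff = now - retention
    return [r for r in records if r["captured_at"] >= cutoff]

# Made-up sample records.
records = [
    {"captured_at": datetime(2025, 3, 30), "zone": "tills", "queue_length": 5, "face_crop": b"..."},
    {"captured_at": datetime(2025, 2, 1), "zone": "tills", "queue_length": 2, "face_crop": b"..."},
]
now = datetime(2025, 3, 31)
kept = [minimise(r) for r in purge_expired(records, now)]
print(kept)
```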

Tech maturity gap

Wrapped up in all this are requirements for fairness, explainability, and data security – the latter of which goes hand-in-hand with our other major challenge: technological maturity.

Despite the vast majority of retail environments having cameras installed onsite, the data architecture and back-end systems often lag behind what’s needed to unlock true computer vision capabilities. Orchestrating data pipelines and applying effective governance are further, intertwined necessities that, for many stores, might seem a bridge too far.

So, whilst this technology has the potential to transform how retail organisations operate, it comes at the cost of significant investment in a range of key areas:

Analysis of ethical, legal, and security concerns
An end-to-end service design approach
User research and experience design
Trials of new technologies
Design and build of products
Customisation and licensing of the best third-party products

Wind back, scale up

Time to rewind to the here and now. How can retailers actually overcome all the technological and ethical challenges outlined above today, in order to unlock tomorrow’s AI-powered efficiency gains?

The answer lies in finding focus. Computer vision in retail can build a new generation of smart stores, but that overwhelming wealth of possibilities can get in the way of practical deployment.

Retailers therefore need to row things back to a ‘minimum viable product’ approach – a clear use case with a specific business value. Focused outputs (with equally focused inputs) can help crystallise both the necessary tech stack upgrades and an ironclad data governance approach.

In simple terms, that means: