AI data quality: lay the data foundations for AI success

With insights from

Raffaele De Piano

Principal Data Architect at Zühlke

How data quality impacts your AI initiatives

Data quality refers to the accuracy, completeness, consistency, reliability, and relevance of your data, and it has a direct bearing on AI decision-making and outputs.

Low-quality data compromises decision-making, affecting everything from strategic planning to customer interactions. It’s also one of the main causes of GenAI project failure.

The classic adage ‘garbage in, garbage out’ rings very true when it comes to building AI systems. Even where businesses can deliver early wins from the likes of GenAI RAG applications, they’ll soon realise that data access and quality are critical for turning functional prototypes into scalable, product-grade solutions.

The 2022 case involving Unity Technologies is a great example of how the quality of data used to train models directly impacts AI performance. Poor quality data was compromising the accuracy of the company's machine learning models – used for targeted advertising – and so many advertisers jumped ship. As a result, Unity Technologies suffered a 40% drop in stock price, $5 billion loss in market cap, and $110 million in lost revenue.

It’s a pertinent reminder of the very real impact of using flawed or incomplete data in AI systems, the cost of which can run up to trillions of dollars every year.

Poor data quality squanders resources and can deliver inaccurate, biased, and even unsafe outputs. And it can prevent you from translating AI prototypes into scalable, value-driving solutions.

On the flipside, using high-quality data in AI systems enables:

Effective learning
Accurate and valuable decision making
Greater efficiency
Improved reliability
Reduced risk of bias
More effective personalisation

So what does ‘good’ versus ‘bad’ data look like?

What is ‘good’ AI data quality?

If you train a model on poor or irrelevant data, the results will likely be disappointing. But labelling data as either 'good' or 'bad' in this way might be too simplistic. To truly understand data quality, more nuance is required.

It comes down to the specific problem you’re trying to solve. A data source might be an excellent fit for one AI model and worthless for another. That’s why choosing the best data for each task is so important.

To mitigate risks and truly become an AI-empowered business, you must focus on collecting data that’s not only accurate and timely but also directly relevant to the challenge being addressed. That holds whether you’re an insurance firm looking to streamline underwriting and claims processing, or a retailer looking to enhance real-time strategic decision-making.

In a nutshell then, ‘good’ AI data quality is data that’s:

Relevant
Accurate & reliable
Complete & consistent
Secure
Accessible
Compliant
Timely

How to harness AI to improve data quality

The journey toward AI success doesn’t start with cutting-edge models – it starts with data. Ensuring high data quality is fundamental – and AI itself can be a powerful ally in this process.

By automating checks, detecting anomalies, and cleansing data, AI can help ensure the data you use for analytics is accurate and reliable.

For example, we’re working with a big customer in the manufacturing industry to automate intelligent quality checks across their vast data estate, enabling automated detection and correction of issues.

AI tools can learn to identify data anomalies and predict potential future quality issues, minimising errors before they escalate.

AI’s predictive capabilities can help your business take proactive measures, ensuring data remains consistent and trustworthy.
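To make the idea of automated anomaly detection concrete, here is a minimal sketch. It does not use a learned model: it swaps in a simple statistical rule (a modified z-score based on the median absolute deviation) as a stand-in for the kind of check an AI tool would learn. The function name, the threshold of 3.5, and the sample sensor readings are all assumptions made for illustration.

```python
from statistics import median

def flag_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds `threshold`.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by the outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so nothing can be flagged
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical sensor readings with one implausible value.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 55.0, 20.2]
print(flag_anomalies(readings))  # index 5 (the 55.0 reading) is flagged
```

In practice, flagged records would typically be routed to a person or a correction step rather than silently dropped.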

But while automation significantly enhances data quality, it cannot stand alone: data quality needs to be embedded within the organisation’s culture. A strong data governance framework, paired with automated tools, ensures quality standards are maintained consistently across your organisation.

A framework to ensure data quality for AI initiatives

We’re seeing an increased focus on data quality among our clients and across the many industries we work in. More organisations are recognising that data quality is critical for the success of AI initiatives. It's not just about using the right tools; it's about creating a culture where data is valued and managed carefully, from collection to analysis.

To make data truly AI-ready, organisations should approach it methodically, focusing on alignment, validation, and governance.

Figure: The three pillars of AI data quality. Alignment, validation, and governance are critical components for getting your data ‘AI ready’.

1. Align: establish clear data foundations

At this stage, the focus is on understanding the needs and characteristics of the data in relation to AI applications:

Data scope and size. Assess whether the amount and variety of data is suitable to support your AI models.
Semantics. Ensure consistent data definitions and a shared understanding across the organisation to eliminate ambiguity and improve integration.
Fairness & ethics. Pay attention to eliminating bias, ensuring the data supports ethical and fair AI model behaviour.
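One narrow but measurable signal for the scope and fairness points above is how well each group is represented in the data. The sketch below is illustrative only: the `region` field, the 10% minimum share, and the sample records are assumptions, and a balanced dataset does not by itself guarantee fair model behaviour.

```python
from collections import Counter

def representation(rows, field):
    """Share of records per value of `field` (for example, a region or age band)."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}

def underrepresented(rows, field, min_share=0.10):
    """Groups whose share of the data falls below `min_share`."""
    shares = representation(rows, field)
    return sorted(group for group, share in shares.items() if share < min_share)

# Hypothetical training records: 20 rows across three regions.
records = (
    [{"region": "north"}] * 9
    + [{"region": "south"}] * 10
    + [{"region": "east"}] * 1
)
print(underrepresented(records, "region"))  # "east" holds only 5% of the rows
```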

2. Validate: ensure data reliability

Once the data is aligned with AI needs, it must be validated to ensure integrity:

Consistency & accuracy. Perform checks to ensure data is consistent across systems and adheres to established standards.
Validation methods. Use automated tools to cross-check data accuracy and consistency.
Quality SLAs. Define clear data quality benchmarks, ensuring the data meets operational and business requirements.
Ongoing monitoring. Continuously monitor and test for anomalies and inconsistencies in data quality over time.
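The validation steps above can be sketched as a small set of automated checks compared against agreed thresholds. This is a simplified illustration rather than a definitive implementation: the two metrics, the SLA values, and the `customers` records are hypothetical, and a real data estate would usually rely on a dedicated data quality tool.

```python
def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)

def uniqueness(rows, field):
    """Share of non-empty values of `field` that are distinct."""
    values = [row[field] for row in rows if row.get(field) not in (None, "")]
    return len(set(values)) / len(values) if values else 0.0

def check_slas(rows, slas):
    """Measure each metric and return those that fall below their minimum."""
    metrics = {"completeness": completeness, "uniqueness": uniqueness}
    failures = {}
    for (metric, field), minimum in slas.items():
        score = metrics[metric](rows, field)
        if score < minimum:
            failures[(metric, field)] = round(score, 3)
    return failures

# Hypothetical customer records with a missing email and a duplicated id.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]
# Assumed SLAs: at least 95% of emails filled in, and every id unique.
slas = {("completeness", "email"): 0.95, ("uniqueness", "id"): 1.0}
print(check_slas(customers, slas))
```

Running the same checks on a schedule, and alerting when a metric drops below its threshold, is one simple way to approach ongoing monitoring.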

3. Govern: maintain and secure data over time

The governance phase ensures long-term data quality by focusing on security, accountability, and compliance:

Data ownership & stewardship. Assign data stewards to manage data quality, ensuring accountability.
Data lineage. Track the flow of data to understand how it evolves and is used across AI systems.
Version control. Implement systems to track data changes and maintain historical records.
Compliance & regulation. Keep data compliant with relevant regulations, ensuring responsible data use.
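As a rough illustration of how lineage, version control, and stewardship can fit together, the sketch below fingerprints each dataset with a content hash and records where it came from. The dataset names, the steward address, and the shape of the lineage record are assumptions for the example, not a prescribed standard.

```python
import hashlib
import json

def fingerprint(rows):
    """Deterministic SHA-256 digest of a list of JSON-serialisable records.

    Keys are sorted so field order does not matter; row order still does.
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def lineage_entry(name, rows, source, transformation, steward):
    """Build a record linking a dataset version to its origin and owner."""
    return {
        "dataset": name,
        "version": fingerprint(rows)[:12],  # short content-based version id
        "row_count": len(rows),
        "source": source,
        "transformation": transformation,
        "steward": steward,
    }

# Hypothetical raw extract and its cleaned derivative.
raw = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "7.00"}]
cleaned = [{"id": r["id"], "amount": float(r["amount"])} for r in raw]

log = [
    lineage_entry("claims_raw", raw, "crm_export", "none",
                  "data-steward@example.com"),
    lineage_entry("claims_clean", cleaned, "claims_raw",
                  "cast amount to float", "data-steward@example.com"),
]
for entry in log:
    print(entry["dataset"], entry["version"], "from", entry["source"])
```

Because the version is derived from the content, any change to the data produces a new identifier, which makes it easier to tell which version a given model was trained on.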

Lay the data foundations for value-driving AI solutions

High-quality data is the foundation of successful AI initiatives and ensures that prototypes evolve into scalable, reliable, and impactful solutions. But many businesses rush this step, either underestimating its importance or only recognising its value when problems arise.

To truly unlock the potential of AI:

Focus on quality from the start. Accurate, complete, and consistent data sets are essential.
Invest in governance and culture. A robust framework ensures that quality becomes second nature across your organisation.
Leverage the right tools and approaches. Automating processes with AI tools can enhance efficiency and reliability, but it needs to be paired with accountability.

Underestimating the role of data quality is a costly mistake. But when businesses get it right, they unlock more effective AI learning, reliable decision-making, and scalable growth.

Your next step? Prioritise data quality today. Whether it’s building governance frameworks, adopting AI tools, or aligning your team around the importance of quality, starting now will set the stage for long-term AI success.
