
The importance of data quality for generative AI in insurance

Data quality is crucial for successful AI implementation in insurance. Models developed on unsuitable data not only squander resources but can also deliver highly inaccurate results and harm the business. Keep reading to discover what good-quality data entails and the key considerations for building high-quality AI solutions in insurance.

Thanks to the massive impact of artificial intelligence (AI) technologies like ChatGPT, generative AI (Gen AI) use cases in business are no longer theoretical. We’re now seeing new generative AI models across many industries, streamlining workflows and improving the customer experience.

But building and deploying these systems isn’t always easy. The success of any AI implementation relies on having high-quality underlying data that addresses a specific use case.

The insurance industry is an excellent example. The sector brings unique challenges, with its involved processes in areas like underwriting and claims handling. Using generative AI in insurance isn't simply a matter of rolling out the same technology from one context to another — it requires a more bespoke approach. It requires well-structured, relevant, high-quality data.

If the insurance sector can provide high-quality, consistent data, the potential benefits are enormous. It would lay the groundwork for tools that assist both professionals and clients, saving time, enhancing operational efficiency and improving customer satisfaction. Let’s see how that can be possible.

The role of large language models in generative AI

Before diving deeper into the subject of data quality, it’s important to mention that what lies at the heart of Gen AI technology are large language models (LLMs) that generate outputs mimicking natural human language. At their most basic, LLMs are algorithms that learn from vast datasets, processing and generating text based on the training data input.

The early development stages of LLMs typically involve using large datasets, which can vary in quality. Yet, this broad training phase is necessary as it helps the model establish an understanding of language and patterns.

However, the true potential of Gen AI solutions, and the importance of data quality, comes to light during the later fine-tuning process. To refine the model, the focus shifts to smaller but highly specific datasets and to selecting the right data sources, so the system can do its job accurately and reliably.
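
To make this concrete, here is a minimal sketch of what domain-specific fine-tuning can look like, using the open-source Hugging Face libraries. The base model, the 'claims_notes.txt' file and the training settings are illustrative assumptions, not a recommended setup.

```python
# Minimal fine-tuning sketch: adapt a small base model to a curated,
# domain-specific text file (hypothetical 'claims_notes.txt').
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "distilgpt2"                  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 models have no pad token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Small, quality-checked, insurance-specific dataset (assumed to exist).
dataset = load_dataset("text", data_files={"train": "claims_notes.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="insurance-llm",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Whatever the stack, the quality and relevance of that small fine-tuning dataset largely determine how useful the resulting model will be.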

Using LLMs, Gen AI has the potential to streamline operations across the insurance value chain by automating routine tasks and enhancing decision-making. More specifically:

  • In the underwriting process, it could act as a virtual assistant, retrieving information from documents and reducing time-consuming manual workloads (submission triage, for example; see the retrieval sketch after this list).
  • In claims processing, it could assist in fraud detection and streamlining claims management, helping agents handle complex cases more effectively.
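
To give a flavour of the document-retrieval step behind such an underwriting assistant, here is a small illustrative sketch. It uses simple TF-IDF similarity rather than a production-grade search or embedding model, and the submission passages and question are invented for the example.

```python
# Illustrative retrieval step: given a broker's question, find the most
# relevant passage from submission documents to hand to an LLM as context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Property located in a flood zone; building sum insured 2.5m GBP.",
    "Claims history: two water-damage claims in the last five years.",
    "Requested cover: commercial property, business interruption excluded.",
]
question = "What is the applicant's claims history?"

# Vectorise passages and question together, then score by cosine similarity.
vectorizer = TfidfVectorizer()
vectors = vectorizer.fit_transform(passages + [question])
scores = cosine_similarity(vectors[-1], vectors[:-1]).ravel()

# The top-scoring passage would be supplied to the model as context.
print("Most relevant passage:", passages[scores.argmax()])
```

In a real system, the retrieved passage would be passed to the LLM as context, so the quality of the underlying documents directly shapes the answer.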

Of course, the use of Gen AI in insurance is not without its challenges. Cultural and technical hurdles — mainly concerns over transparency and trust — must be considered carefully. Plus, it’s best to take baby steps before starting a full-scale LLM implementation, as you first need to get the machine learning right. Finally, the quality of data used in these systems directly influences the value they offer, emphasising the need for insurers to prioritise data excellence now.

Why is data quality important for Gen AI systems in insurance?

The classic adage ‘garbage in, garbage out’ describes the need for high-quality data when building AI systems. If you train a model on poor or irrelevant data, the results will likely be disappointing. But labelling data as either 'good' or 'bad' in this way might be too simplistic. To truly understand data quality, more nuance is required.

The truth is that a dataset isn’t always good or always bad — instead, it comes down to the specific problem you’re trying to solve. A data source might be an excellent fit for one AI model and worthless for another. That’s why choosing the best data for each task is so important.

5 considerations for building high-quality generative AI solutions

For insurance innovation to work, AI models must be trained on highly relevant sources. Using unsuitable data can bring several potential risks. So, when beginning your Gen AI journey, keep the following key data quality considerations in mind.

1. Data integrity

Firstly, ensuring data integrity is absolutely vital. You need to use the correct data for the task and implement rigorous technical frameworks to connect that data and assure its quality. Otherwise, if you use unsuitable, biased, or unrepresentative information, hallucinations can occur: a phenomenon where AI models generate plausible-sounding but false outputs, potentially doing more harm than good.
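
As an illustration of what such technical safeguards can look like, the sketch below runs a few basic integrity rules over a hypothetical claims table. The column names and rules are assumptions; a real data-quality framework would go much further.

```python
# Basic integrity checks over a hypothetical claims table with
# 'claim_id', 'claim_amount' and 'loss_date' columns.
import pandas as pd

def check_claims_integrity(df: pd.DataFrame) -> list[str]:
    issues = []
    # Completeness: key fields must not be missing.
    for col in ["claim_id", "claim_amount", "loss_date"]:
        missing = int(df[col].isna().sum())
        if missing:
            issues.append(f"{missing} rows missing '{col}'")
    # Uniqueness: claim identifiers must not repeat.
    dupes = int(df["claim_id"].duplicated().sum())
    if dupes:
        issues.append(f"{dupes} duplicate claim IDs")
    # Validity: claim amounts must be positive.
    invalid = int((df["claim_amount"] <= 0).sum())
    if invalid:
        issues.append(f"{invalid} non-positive claim amounts")
    return issues

claims = pd.read_csv("claims.csv", parse_dates=["loss_date"])
for issue in check_claims_integrity(claims):
    print("Integrity issue:", issue)
```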

2. Time relevance

Secondly, the data used to train and fine-tune systems must still be relevant. After all, the time relevance of data can diminish as societal norms, regulations, and circumstances change.

For instance, the COVID-19 pandemic altered risk assessment profiles as people’s behaviour was drastically different during that period. Everyone was home, people were driving less, etc. So, basing your AI system on that data can lead to inaccurate results.
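
One simple, illustrative way to address this is to filter training data by date before fine-tuning. The column name and date ranges below are assumptions chosen purely for the example.

```python
# Time-relevance filter over the same hypothetical claims table:
# keep recent records and exclude the atypical pandemic period.
import pandas as pd

claims = pd.read_csv("claims.csv", parse_dates=["loss_date"])

pandemic = (claims["loss_date"] >= "2020-03-01") & \
           (claims["loss_date"] <= "2021-06-30")
recent = claims["loss_date"] >= pd.Timestamp.now() - pd.DateOffset(years=5)

training_data = claims[recent & ~pandemic]
```

More sophisticated approaches weight records by recency rather than dropping them, but the principle is the same: the data feeding the model should reflect current conditions.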

3. Response accuracy

Traditional data management practices in insurance have often limited the scope for comprehensive machine learning and analytics. By embracing end-to-end lifecycle data management, insurers can improve the accuracy and applicability of their Gen AI systems.

4. Personal data permissions

It goes without saying that insurers must conform to data regulations at all times. However, ensuring compliance while maintaining data quality is a balancing act.

Responsible AI applications must be designed to adapt to continuous regulatory shifts, safeguarding both data privacy and integrity.

5. Availability of data

When training Gen AI systems, data extends beyond internal databases to external sources. The challenge here is not just accessing these outside datasets. It’s also vital to ensure their reliability and relevance.

Public data is invaluable, especially in the early stages of training. But it can carry inherent biases. Likewise, internal data might suffer from incompleteness or outdatedness. Acknowledging and managing these potential risks is a significant step towards obtaining high-quality data.

Insurance data quality and the importance of human oversight

So, what does 'good' look like? To mitigate risks and truly become an AI-empowered business, insurance companies must focus on collecting data that is not only accurate and timely but also directly relevant to the challenge being addressed.

In that way, good data should be:

  • Secure
  • Accessible when needed
  • Meticulously verified to ensure its correctness and relevance
  • Selected with a clear objective in mind

It’s also important to remember that the role of technology is not to replace human expertise. It’s there to support and enhance it. Our Zühlke experts have even recently discussed the importance of keeping humans at the centre of all data-driven insurance developments.

So, in any AI project, human oversight is indispensable — from the initial stages of data collection and training to the continuous refinement of models. It helps ensure that the results are accurate, trustworthy, and aligned with the nuanced needs of the insurance industry.

Embrace Gen AI potential

The success of Gen AI in insurance hinges on its underlying information. Good quality data is secure, relevant, and highly suited to the task in question. Building systems using the correct sources helps ensure reliability and effectiveness, from streamlining underwriting and claims processing to enhancing real-time business decision-making.

To begin your AI journey, it’s a good idea to start small. Look at existing data, create specific use cases, and learn from pilot projects. Build complementary partnerships with technology companies to increase expertise and open doors to new opportunities for innovation. Then, when ready, take those lessons and scale up by integrating successful initiatives with systems built on high-quality data.
