Building on the popularity of our recent GDPR posts, here’s more advice on data quality reporting.

We’ve seen the importance of data quality to GDPR compliance and your marketing performance. But, as business consultants are fond of saying, you get what you measure. How should that apply to data quality?

In my own tenure as a senior leader in large corporations, I’ve seen my fair share of poor, boring & non-existent data quality reporting. So, to set you up for success, I’ve sought out an expert.

I’m delighted to welcome Paul Weston as our newest guest blogger. Paul has made a career out of improving data quality, to deliver more effective analytics. But, before I hand over to Paul, let me just say he had so much advice to share, that it’s going to take 3 posts.

Over to Paul to share the first of his trilogy on customer data quality reporting…

Curb your enthusiasm – Data Quality Reporting matters

Data Quality Reporting is one of those practices that can be more harmful, if done badly, than not done at all. At this very moment there will be many thousands of operational people and executives who incorrectly believe that their data is in good shape, because of unintentionally bad quality reporting. They are not even thinking about fixing it, as they have no idea that there is a problem.

Data Quality is not the most exciting topic on the management agenda. This is particularly true of Customer Data, where the stakeholders are more interested in sales targets, marketing campaigns & the latest creative, digital techniques. But Customer Data Quality has a huge impact on selling and marketing success, as well as a direct impact on the Customer Experience. In fact, the criticality of high quality customer data is increasing. Organisations are moving towards more programmatic media selection, personalised content & more digital channels.

In my experience, most organisations do have a measure that they can point to and claim to be monitoring data quality on an ongoing basis. In reality, what most have is a set of simple counts, e.g. the percentage of key fields that are populated with something. This is mainly used to beat up the people responsible for capturing the data. It only applies to data on its way into systems, not to the overall base of customer data.

This article brings together a number of the measures that I have designed for clients to use or at least recommended that they monitor. I couldn’t point to a single client who has implemented the whole set but there are a number that are getting close and that appear to be reaping the rewards.

A basic principle – Measures are needed for Records as well as Fields. Measuring the quality of population of individual data fields is a critical part of any data quality framework. But it is, at best, half of the task. To take an extreme set of circumstances, an organisation could have 50% of records populated with valid email addresses & 50% with first names. If they are different 50%s, then they have zero records that can be used for a personalised email. An extreme illustration, but it does make the point.

The other critical factor is that measuring by record makes the exercise customer-centred, rather than purely data-centred. I suggest that, as a minimum, any data quality framework needs measures by field and by record, so that is the way that my proposed ‘great set of measures’ works.
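To make the field-versus-record point concrete, here is a minimal Python sketch of the 50%/50% illustration. The record layout, the field names (`email`, `first_name`) and the helper `field_and_record_rates` are assumptions made for the example, not part of any particular client framework.

```python
def field_and_record_rates(records, fields):
    """Return (per-field population rates, share of records with ALL fields populated)."""
    def populated(value):
        # Hypothetical rule: None, empty and whitespace-only values count as missing.
        return value is not None and str(value).strip() != ""

    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}, 0.0

    field_rates = {
        f: sum(1 for r in records if populated(r.get(f))) / total for f in fields
    }
    record_rate = (
        sum(1 for r in records if all(populated(r.get(f)) for f in fields)) / total
    )
    return field_rates, record_rate


# Illustrative data: half the records have an email, the other half a first name.
records = [
    {"email": "a@example.com", "first_name": ""},
    {"email": "b@example.com", "first_name": None},
    {"email": "", "first_name": "Cara"},
    {"email": None, "first_name": "Dev"},
]

field_rates, record_rate = field_and_record_rates(records, ["email", "first_name"])
print(field_rates)   # each field is 50% populated
print(record_rate)   # but no record is usable for a personalised email
```

Reported field by field, this data looks half healthy. Reported by record, it shows that nothing is usable for the campaign.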

Field-by-Field Quality Measures (The first 5)

Data Presence

This is the start point for all data quality measurement. But, all too often, it is also the end-point. Even this simplistic measure – How many records have got something in this field? – is not as simple as it would seem. Even after 20 years’ experience, I am sometimes caught out by the fact that a field can be populated with blank characters. As a result, simple data counting queries treat the field as populated, when a human can easily tell that it is empty.
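The blank-character trap can be shown in a few lines of Python. This is only a sketch: the sample values and the helper `is_present` are made up for illustration, and a real check would run against the database rather than a list.

```python
def is_present(value):
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


# Illustrative surname values: one real entry, the rest empty or blank characters.
surnames = ["Smith", "", "   ", None, "\t", "\u00a0"]

# A naive count only excludes NULLs and empty strings.
naive_count = sum(1 for v in surnames if v is not None and v != "")

# A stricter count also excludes entries made up solely of whitespace.
strict_count = sum(1 for v in surnames if is_present(v))

print(naive_count, strict_count)
```

The naive count reports four populated values, while the stricter one reports a single genuinely populated surname.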

Entry Length

 

Many data types can have a minimum character length attributed to them and this can be easily checked by simple queries. Phone Numbers, Postcodes and Dates of Birth all have a clearly defined minimum length. But these standards can vary across international boundaries. Name fields also often have minimum lengths, although extreme care is required with names in today’s multi-cultural customer bases. The challenge with entry length standards is in identifying when an entry is too long, i.e. when it has been truncated while being input or fed into the database. Ideally, this will have been trapped (by validation checks) at the capture point. But experience shows that, for certain data types, at least 80% of entries at the maximum length for the field are actually truncated.
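As a rough sketch of how both ends of the length check might be counted, the Python below flags entries that are shorter than a minimum and entries that sit exactly at the field’s maximum length (and so are candidates for truncation). The function name `length_profile`, the limits and the sample company names are all assumptions for the example.

```python
def length_profile(values, min_len, max_len):
    """Count entries that are too short, at the field's maximum length, or in between."""
    profile = {"too_short": 0, "at_max_length": 0, "ok": 0}
    for value in values:
        # Strip padding first, so fixed-width fields do not all appear to be at max length.
        length = len((value or "").strip())
        if length < min_len:
            profile["too_short"] += 1
        elif length >= max_len:
            profile["at_max_length"] += 1  # candidate for truncation, worth a manual review
        else:
            profile["ok"] += 1
    return profile


# Hypothetical field limited to 10 characters with a minimum of 3.
company_names = ["Acme Ltd", "Internatio", "AB", "Northwind"]
profile = length_profile(company_names, min_len=3, max_len=10)
print(profile)
```

An entry at maximum length is not proof of truncation, so the count is best treated as a list of suspects rather than a list of errors.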

Entry Format

Data needs to be of the right format, for the field in which it is entered. Many customer databases will have new records checked, for format, when they are loaded. For instance, a phone number can only be numbers (and possibly a “+” symbol). An email address has a clearly understood format requirement. Even international VAT Numbers follow a known pattern. Formats need to be individually defined for many fields. Any checks must be carried out against such field-specific definitions.
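Field-specific format definitions can be expressed as one pattern per field. The Python sketch below uses deliberately simplified regular expressions: the phone pattern follows the “numbers and possibly a +” rule described above, while the email pattern is only a rough approximation and is an assumption for illustration, not a full email validator. The field names and the helper `format_valid` are also hypothetical.

```python
import re

# Simplified, illustrative patterns; real definitions need agreeing field by field.
FORMAT_PATTERNS = {
    "phone": re.compile(r"\+?\d+"),                   # digits, optionally led by "+"
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),  # something@something.something
}


def format_valid(field, value):
    """Return True if the whole value matches the pattern defined for that field."""
    pattern = FORMAT_PATTERNS[field]
    return value is not None and pattern.fullmatch(value.strip()) is not None


checks = [
    ("phone", "+447700900123"),
    ("phone", "07700 900123"),   # contains a space, so fails this strict pattern
    ("email", "jo@example.com"),
    ("email", "jo@example"),     # no domain suffix
]
results = [format_valid(field, value) for field, value in checks]
print(results)
```

Whether a phone number containing spaces should pass or be normalised first is a design decision for each organisation. The strict pattern here simply makes that decision visible.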

Match to a List of Values (LOV)

Some data needs to match a list of acceptable values to be considered properly populated. The values will at least replicate those offered at the point of data capture. For quality measurement, they may also include old values which are still meaningful, but no longer offered at the point of input. It is usually valuable for the quality reporting to identify which of these cases applies.
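One way to report which case applies is to classify each entry as a current value, a legacy value or unrecognised. The Python below is a sketch under stated assumptions: the “preferred channel” field, its value lists and the case-insensitive matching rule are all invented for the example.

```python
from collections import Counter


def classify_against_lov(values, current_values, legacy_values):
    """Count entries that match current values, legacy values, or neither."""
    counts = Counter({"current": 0, "legacy": 0, "unrecognised": 0})
    for value in values:
        # Assumption: matching ignores case and surrounding spaces.
        key = (value or "").strip().casefold()
        if key in current_values:
            counts["current"] += 1
        elif key in legacy_values:
            counts["legacy"] += 1
        else:
            counts["unrecognised"] += 1
    return counts


# Hypothetical lists for a "preferred channel" field.
current_values = {"email", "phone", "post"}
legacy_values = {"fax"}  # still meaningful, but no longer offered at input

channels = ["Email", "fax", "pigeon", "post "]
lov_counts = classify_against_lov(channels, current_values, legacy_values)
print(dict(lov_counts))
```

Reporting the legacy count separately shows how much of the base still relies on values that new records can no longer take.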

Inter-field replication

This is a less-used measure. But it can add substantial value for a small amount of extra effort. Some data replication will be identified by the previous measures. For instance, a title (Mr, Mrs, Ms etc.) entered in the First Name field may be caught by the format check on that field. But there is no guarantee of that, whereas a dedicated data replication check is more likely to identify it. In a B2B scenario, entry replication can often be identified between fields such as Company Name and Trading Name.
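A replication check compares fields within the same record. The Python sketch below covers the two examples mentioned: a title repeated in the First Name field, and Company Name duplicated in Trading Name. The field names, the list of titles and the helper `replication_flags` are assumptions for illustration, and a flagged record is a prompt for review rather than a confirmed error.

```python
# Hypothetical list of titles to look for in the First Name field.
KNOWN_TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def normalise(value):
    """Lower-case, trim and drop a trailing full stop so 'Mr.' and 'mr' compare equal."""
    return (value or "").strip().rstrip(".").casefold()


def replication_flags(record):
    """Return the names of any inter-field replication checks that the record fails."""
    flags = []
    first_name = normalise(record.get("first_name"))
    title = normalise(record.get("title"))
    if first_name and (first_name == title or first_name in KNOWN_TITLES):
        flags.append("title_in_first_name")

    company = normalise(record.get("company_name"))
    trading = normalise(record.get("trading_name"))
    if company and company == trading:
        flags.append("company_equals_trading_name")
    return flags


suspect = {"title": "Mrs", "first_name": "Mrs",
           "company_name": "Acme Ltd", "trading_name": "ACME LTD "}
clean = {"title": "Mr", "first_name": "Raj",
         "company_name": "Acme Holdings", "trading_name": "Acme"}

suspect_flags = replication_flags(suspect)
clean_flags = replication_flags(clean)
print(suspect_flags, clean_flags)
```

Some companies genuinely trade under their registered name, so the second flag in particular needs a human judgement before anything is corrected.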

More Data Quality Reporting to follow next time

OK, you can calm down now & sit back from being on the edge of your seat with excitement! All teasing aside, thanks to Paul for many good points there, most of which are overlooked in most businesses.

How are you served by your data quality reporting? Are you comfortable that you have the monitoring & control MI needed? Reading Paul’s advice, I hope you’ll agree with me, that there is more to this than you first imagine.

If you have either positive examples to share, or ‘war stories’ about the bad decisions resulting from lack of such reporting – we’d love to hear from you. Feel free to add your comments via the link below, or to respond on social media.

Better data quality serves us all, as analysts, leaders & citizens. So, let’s share how to get this fundamental building block right, so we all know where we stand.

Thanks again Paul & look out for post 2 in this series, coming soon…