Susan Walsh examines data discovery at #RISK London


Data discovery expert Susan Walsh will be speaking at #RISK London, coming to the ExCeL this month.


Susan Walsh is founder and MD of The Classification Guru, a specialist data classification, taxonomy customisation and data cleansing consultancy. An industry thought-leader, Susan is the author of “Between the Spreadsheets: Classifying and Fixing Dirty Data”.

Susan has developed a methodology to accurately and efficiently classify, cleanse and check data for errors, helping to prevent costly mistakes – a topic she will explore in depth exclusively at #RISK London.

Ahead of her appearance, Susan tells us about her career and offers insight into the data minimisation challenges that all organisations face as they move to meet evolving compliance standards.

Could you outline your professional pathway to date?

I obtained a degree in commerce, went into merchandising and retail and then into sales. I subsequently opened my own clothing shop but that failed. I went bankrupt and needed a job. I ended up falling into data and went to work for a spend analytics company, classifying spend data and normalising suppliers.

For the first time in my life, I found a real passion for what I was doing, and ended up spending five years with this analytics company, growing a team there. When it was time to move on, I didn’t really know where I could get a job doing the same thing as I hadn’t come from the world of procurement or data.

Five years ago, I started my own business and just wanted to keep doing this line of work because I love it. Since then, I’ve been building my brand and growing awareness of what I do: data classification, and data deep-cleaning as a standalone service.

There’s some amazing tech out there, but none of it works if you don’t have clean data first, particularly if you’re using AI. The only way you are going to get clean training datasets is by improving people’s data handling behaviours. But I’m discovering that it’s not a subject that’s covered at university level – there are no practical tips on understanding the data you’re working with; establishing whether that data is clean; what’s right and what’s wrong.

I’m passionate about educating others about cleaning data and classifying data for clients. I wrote a book, “Between the Spreadsheets”, which ties in nicely with my #RISK London talk on skeletons in your data closet.

What does it mean to possess a healthy data inventory?

You can’t have a blanket approach to data – it really depends on your datasets and your individual business needs, as well as the needs within your teams and departments. What counts as accurate and clean data will differ, so just make sure you know what the requirements are for the data you’re working with.

If you’re working in finance, obviously your numbers must be accurate. When it comes to GDPR requirements and databases, it’s not enough just to protect the data; you must hold only the information that you need. I might only need a client’s name and email address. I don’t need their phone number, date of birth, or home address, so I need to ensure that I’m not retaining that data.

Spelling is a good example of having clean data. If a client’s name is spelt incorrectly, then that’s dirty data. It’s about the right people within your organisation collecting the key data points, and then having communication channels in place to ensure that everyone is on the same page. That’s where the data COAT comes from – you must be consistent, organised and accurate. Only then can data become clean and trustworthy.

What are the benefits of data minimisation beyond legislative compliance?

Essentially, minimising your data leads to an easier life – less stress, greater efficiency across the board and less time wasted trying to figure out problems. That might concern deciding if the right notes are being put on the right account in a CRM system or figuring out the right address for a supplier. If you’re dealing with personal information, then it’s hard to verify unless you’ve been diligent in minimising and cleaning your data.

What are the primary challenges that organisations face as they bid to minimise data?

Data problems are people problems: a lot of the issues stem from people inputting data incorrectly, and this is a task that needs to become less intimidating.

We also need people to understand the consequences of incorrect data input. If incorrect information is put into a product line, then in the real world it can cause so many problems. For example, poor data might mean a lorry goes into a warehouse to find that storage dimensions are wrong, so a product can’t fit into its allocated space. Issues like these can cause severe delays and impact on the bottom line. By enabling people to understand the consequences, we can begin to improve data handling behaviours.

Change is happening – for such a long time, clean data hasn’t been a priority, but we’re slowly starting to change mindsets and culturally we’re understanding that it leads to improved revenues.

It could be a question of tidying up spreadsheets and pulling in a few extra formulas, saving two people ten hours per week of manual input time simply by automating a process.

We need to be speaking in detail within our businesses to understand these challenges so that decision-makers buy into what’s going on, what needs to be done and what the benefits are. The challenge is to keep these issues at the forefront of people’s minds. I think it’s very hard for larger companies to keep things in order when they’re dealing with so many people and other organisations.