Data Cleaning Checklist for Recruiting Operations

Data cleaning is the art of managing your recruiting data with best practices and involving everyone in the process.

Data cleaning is a skill that recruiters are learning these days out of necessity, but it’s far from universal. And unless all of your hiring managers happen to be data analysts, they may not be familiar with it either.

What is data cleaning in recruiting?

The way your hiring team manages the data in your applicant tracking system (ATS) can have a big impact on the story the data tells and the strategy you base on it. Data cleaning means applying best practices so that data stays accurate, consistent, and complete, and it requires everyone’s participation.

Data cleaning: Your approach

The first part of this data cleaning checklist concerns your hiring team’s approach to data. Data cleaning isn’t difficult if you approach it the right way from the start.

1. Teach data management 101

Everyone who uses your ATS needs training on the basics. They should understand how literal the system is (e.g., it doesn’t know that ‘LinkedIn’ and ‘LI’ are the same source). They should also know exactly how your team uses it, along with any hidden fields or quirks it has (e.g., whether it supports bulk rejections).
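Because the ATS treats ‘LinkedIn’ and ‘LI’ as two different sources, a common cleanup step is to map spelling variants onto one canonical label before counting. Below is a minimal Python sketch, assuming source values are exported as plain strings; the alias table `SOURCE_ALIASES` and the helper names are hypothetical and would need to reflect the variants that actually appear in your own ATS.

```python
from collections import Counter

# Hypothetical alias table: normalized variant -> canonical source label.
# Build it from the variants that actually appear in your own ATS export.
SOURCE_ALIASES = {
    "linkedin": "LinkedIn",
    "li": "LinkedIn",
    "linked in": "LinkedIn",
    "referral": "Employee referral",
    "employee referral": "Employee referral",
}

def normalize_source(raw: str) -> str:
    """Return the canonical label for a raw source value.

    Unknown values are kept (trimmed) so they can be reviewed rather than lost.
    """
    key = " ".join(raw.strip().lower().split())  # lowercase, collapse whitespace
    return SOURCE_ALIASES.get(key, raw.strip())

def count_sources(raw_sources: list[str]) -> Counter:
    """Count candidates per canonical source label."""
    return Counter(normalize_source(s) for s in raw_sources)

if __name__ == "__main__":
    raw = ["LinkedIn", "LI", " linked  in ", "Referral", "Indeed"]
    print(count_sources(raw))
```

Unmatched values pass through unchanged rather than being forced into a bucket, so new variants show up in the counts where someone can add them to the table.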

2. Attach candidates to jobs

Your ATS has to be more than just a database for data cleaning to work. Attaching every candidate to a job, with complete information, gives you a full picture of that job’s pipeline. Moving candidates through the hiring stages in real time creates an accurate record of how they progressed. Together, these let you assess each job’s pipeline and address any issues in your hiring stages.

3. Incentivize data cleaning

Recruiters and hiring managers need a reason to prioritize clean data. To understand why a recruiting effort didn’t yield a hire, you need data on that effort, so every member of your hiring team has to update each candidate’s status in real time, including candidates who drop out. To keep people on top of it, add data cleaning to their key performance indicators (KPIs).

4. Use thoughtful reasoning

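Finding real meaning in the data is finickier than it may seem at first. For example, users may want to benchmark newer jobs against older ones, but the comparison only means something if you look at the right metric. The ratio of qualified to unqualified candidates isn’t that metric, because the number of qualified candidates is what matters. A job with 5 applicants who are all qualified has a perfect ratio but only 5 people to choose from, while a job with 200 applicants, 20 of them qualified, has a much lower ratio and four times as many qualified candidates.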

Data cleaning: Your data

The second part of this data cleaning checklist deals with your data directly. Getting an accurate story that helps your overall talent acquisition effort requires complete data on every job, organized in the right way.

5. Collect end-to-end data on every job

End-to-end data is crucial to data cleaning because it’s the only way to get a full view of every hiring effort. How can you gauge time to hire for a job or average time to hire across all your jobs, for example, if you don’t close jobs out? And how can you diagnose issues within each stage of the hiring process if you don’t collect data on each stage?

Collecting end-to-end data means staying on top of data cleaning best practices like updating candidate status in real time. It also means getting rid of so-called evergreen jobs that you hire for repeatedly and leave open for convenience.
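To see why closing jobs out matters, here is a minimal Python sketch that measures time to hire only for jobs with a recorded fill date and lists the jobs it cannot measure. It assumes time to hire is counted from the day a job opened to the day it was filled (teams define this differently), and the `Job` record and its fields are hypothetical. It aggregates with the median, in line with point 9 below.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional

@dataclass
class Job:
    title: str
    opened: date
    filled: Optional[date]  # None while the job is still open

def time_to_hire_days(job: Job) -> Optional[int]:
    """Days from opening to fill, or None if the job was never closed out."""
    if job.filled is None:
        return None
    return (job.filled - job.opened).days

def summarize(jobs: list[Job]) -> tuple[Optional[float], list[str]]:
    """Return the median time to hire over closed jobs, plus the titles of
    jobs that cannot be measured because they were never closed out."""
    durations = [d for d in (time_to_hire_days(j) for j in jobs) if d is not None]
    still_open = [j.title for j in jobs if j.filled is None]
    return (median(durations) if durations else None), still_open

if __name__ == "__main__":
    # Made-up jobs for illustration only.
    jobs = [
        Job("Backend engineer", date(2024, 1, 8), date(2024, 2, 19)),
        Job("Account executive", date(2024, 1, 15), date(2024, 2, 5)),
        Job("Support specialist", date(2024, 2, 1), None),
    ]
    print(summarize(jobs))  # (31.5, ['Support specialist'])
```

Every job left open, evergreen or not, shows up in the second list instead of quietly disappearing from the time-to-hire figure.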

6. Use enough data to be significant

It takes a certain amount of data for analytics to be meaningful. You may be tempted to compare two jobs against each other, but two jobs aren’t enough for actionable analytics. Rather than comparing individual jobs or single-digit counts, use comparison groups of at least 10 jobs.
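A simple way to enforce this is to refuse a comparison when either group is too small. The Python sketch below compares the median of a per-job metric (such as days to hire) between two groups and raises an error if either has fewer than 10 jobs, the threshold this checklist suggests. It is a size guard only, not a formal statistical significance test, and the function name is hypothetical.

```python
from statistics import median

MIN_JOBS_PER_GROUP = 10  # minimum group size suggested in this checklist

def compare_medians(group_a: list[float], group_b: list[float],
                    min_size: int = MIN_JOBS_PER_GROUP) -> float:
    """Return median(group_b) - median(group_a) for a per-job metric.

    Raises ValueError when either group has too few jobs to compare.
    """
    for name, group in (("group_a", group_a), ("group_b", group_b)):
        if len(group) < min_size:
            raise ValueError(
                f"{name} has {len(group)} jobs; need at least {min_size} to compare"
            )
    return median(group_b) - median(group_a)

if __name__ == "__main__":
    # Made-up days-to-hire values for older and newer jobs.
    older = [30, 35, 40, 28, 45, 33, 38, 41, 29, 36]
    newer = [25, 27, 30, 22, 35, 28, 31, 26, 29, 24]
    print(compare_medians(older, newer))  # -8.0
```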

7. Separate your data into thoughtful buckets

Every situation poses its own challenges to the hiring effort. Hiring teams may struggle because of where they are, who they’re trying to hire, or what industry they’re in. By segmenting your data into thoughtful buckets such as location, seniority level, and job type, you can see what’s happening on the ground in each situation.
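Segmenting can be as simple as grouping job records by the fields you care about and summarizing each group separately. This minimal Python sketch assumes each job is a dict with hypothetical `location`, `seniority`, `job_type`, and `days_to_hire` fields; rename them to match your own export.

```python
from collections import defaultdict
from statistics import median

def median_by_bucket(jobs: list[dict], keys: tuple[str, ...], metric: str) -> dict[tuple, float]:
    """Group job records by the given fields and return the median of
    `metric` for each bucket (field names are hypothetical)."""
    buckets: dict[tuple, list[float]] = defaultdict(list)
    for job in jobs:
        bucket = tuple(job[k] for k in keys)  # e.g. ("Berlin", "Senior", "Engineering")
        buckets[bucket].append(job[metric])
    return {bucket: median(values) for bucket, values in buckets.items()}

if __name__ == "__main__":
    # Made-up records for illustration only.
    jobs = [
        {"location": "Berlin", "seniority": "Senior", "job_type": "Engineering", "days_to_hire": 52},
        {"location": "Berlin", "seniority": "Senior", "job_type": "Engineering", "days_to_hire": 48},
        {"location": "Austin", "seniority": "Junior", "job_type": "Sales", "days_to_hire": 20},
    ]
    result = median_by_bucket(jobs, ("location", "seniority", "job_type"), "days_to_hire")
    for bucket, value in result.items():
        print(bucket, value)
```

The finer you slice, the fewer jobs land in each bucket, so it is worth checking each bucket against the minimum group size from point 6 before drawing conclusions.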

8. Remove outlying pieces of data

Some types of jobs consistently behave differently from the rest. Evergreen jobs, internal hires, internships, and new-grad hires, for example, reliably follow their own patterns. It’s important to remove these regular outliers from your talent pipeline data so they don’t skew your analytics.
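One way to apply this consistently is to tag such jobs when they are created and filter on the tags before running pipeline analytics. The Python sketch below assumes a hypothetical `categories` field holding tags like `evergreen` or `internship`; it returns the removed jobs as well, so the exclusions can be reviewed rather than silently dropped.

```python
# Job categories this checklist names as regular outliers (tag names are hypothetical).
EXCLUDED_CATEGORIES = {"evergreen", "internal", "internship", "new_grad"}

def exclude_regular_outliers(jobs: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split jobs into (kept, removed) using their hypothetical 'categories' tags.

    A job is removed if it carries any excluded tag; untagged jobs are kept.
    """
    kept, removed = [], []
    for job in jobs:
        tags = set(job.get("categories", ()))
        (removed if tags & EXCLUDED_CATEGORIES else kept).append(job)
    return kept, removed

if __name__ == "__main__":
    # Made-up records for illustration only.
    jobs = [
        {"title": "Data analyst", "categories": set()},
        {"title": "Summer intern", "categories": {"internship"}},
        {"title": "Warehouse associate", "categories": {"evergreen"}},
        {"title": "Product manager"},
    ]
    kept, removed = exclude_regular_outliers(jobs)
    print([j["title"] for j in kept])
    print([j["title"] for j in removed])
```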

9. Use median calculations over mean calculations

Applicant pool data can contain lots of outliers. Because a single outlier can distort the picture of your talent pipeline, use median calculations instead of mean calculations. (The median is the middle value when the numbers are sorted; the mean is the arithmetic average, which one extreme value can pull far up or down.)
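The effect is easy to see with Python’s built-in `statistics` module. The applicant counts below are made up: five similar jobs plus one posting that drew an unusually large pool. The helper name `compare_center` is hypothetical.

```python
from statistics import mean, median

def compare_center(values: list[float]) -> tuple[float, float]:
    """Return (mean, median) so the two summaries can be compared side by side."""
    return mean(values), median(values)

# Made-up applicant counts: five similar jobs and one outlier posting.
applicants_per_job = [40, 45, 38, 52, 47, 600]

avg, mid = compare_center(applicants_per_job)
print(f"mean:   {avg:.1f}")  # mean:   137.0
print(f"median: {mid:.1f}")  # median: 46.0
```

The single extreme value pulls the mean to 137, roughly three times what a typical job here receives, while the median stays at 46.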

Follow the data cleaning checklist

While data cleaning isn’t particularly difficult, it does take basic data management skills and constant vigilance. And it’s vital: if your talent data isn’t clean, the picture it shows you is inaccurate, and any strategy you base on that analysis will be misguided. That’s why it pays to stick to a data cleaning checklist like this one.

Maryam J.

Head of R&D