To balance the scales of gender equity, it's critical, yet very difficult, to understand the gender identity of your candidates

The Importance and Challenges of Understanding Candidate Gender in Talent Acquisition

By Kelsie T.

Datapeople

Since its establishment in 1965, the Equal Employment Opportunity Commission (EEOC) has worked to combat workplace discrimination across the United States. In support of that effort, the commission established reporting standards for key demographic fields, namely gender, race/ethnicity, veteran status, and disability status. All private sector employers with 100 or more employees (and federal contractors with 50 or more employees meeting certain criteria) are required by law to submit an annual survey. The fields and their response options are determined by the EEOC, which enables the agency to compile information across all industries in a consistent and uniform way to monitor equity in the workplace.

EEOC in the hiring process

The EEOC strives to ensure “all segments of the population” are provided with equal opportunities. While equity amongst employees is essential, what about equity in the recruiting process? In this area, the EEOC is less prescriptive and less consistent in its requirements. But as we all know, your talent pipeline is instrumental to achieving equity amongst employees. Thus, we strongly believe it is critical to also track “would-be employees”, AKA your candidates! As with any demographic information, candidate self-reports are the gold standard for identity data. After all, self-reports reflect how candidates see themselves. While the EEOC does offer a reporting form to capture the demographic information of applicants, completing it is entirely voluntary and, importantly, the responses may not be considered as part of the hiring process. Optional participation is important for respecting an individual’s right to privacy. However, it also poses a challenge for data analysis: it creates a “blindspot” of missing self-reported data. The bigger the “blindspot”, the less reliable the results can be.

You don’t know what you don’t know

Due to the optional nature of the EEOC candidate survey, companies often have an incomplete data set as a result of two factors: declined and missing responses. 

  1. Declined, sometimes written as “Prefer not to say”, means the candidate selected the survey option to not disclose their identity for that item. 
  2. Missing data occurs when data that is expected cannot be found. When it comes to EEOC data, we generally see that about half of the applicants don’t have corresponding EEOC survey data. This can be a result of multiple factors that may on the surface “look” similar and may be challenging to diagnose.
    • Refusal to complete the survey is the most obvious cause of missing data. This occurs when a candidate simply chooses not to answer a survey item (aka “refusal”), which is similar to, yet distinct from, an explicit “Prefer not to say” response. It may reflect a candidate’s belief that responding will hurt their chances in the hiring process or that the data itself is not of value. Our research suggests that nearly a third of missing data is caused by candidates refusing to answer EEOC survey questions.
    • Accessibility issues, caused by technology on either the applicant or the company side, can prevent the survey from being presented to the candidate during the application process. Applicant Tracking Systems (ATS) may charge extra for this feature and often take different approaches to default options, leaving organizations to bear the burden of configuring jobs to offer EEOC surveys during the application process. Validating that your jobs are configured to offer the EEOC survey (for all sources) is one way to reduce missing data and improve your candidate-level EEOC data integrity.

Together, declined and missing responses make up your incomplete data. As with any analysis, if there is too much incomplete data, the resulting statistics may be unusable or, possibly worse, may convince you of a reality that is not representative of the total population.
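
The distinction between declined and missing responses can be made concrete with a small tally. The sketch below is a minimal illustration, not a description of any ATS export: the record layout (one entry per application holding an answer, the string "Prefer not to say", or `None` when no survey data exists) is an assumption made for the example.

```python
from collections import Counter

DECLINED = "Prefer not to say"  # assumed label for an explicit decline


def classify(response):
    """Bucket one survey response as answered, declined, or missing."""
    if response is None or response.strip() == "":
        return "missing"    # no survey data found for this application
    if response == DECLINED:
        return "declined"   # candidate explicitly chose not to disclose
    return "answered"


def tally(responses):
    """Count applications per bucket and add the incomplete total."""
    counts = Counter(classify(r) for r in responses)
    counts["incomplete"] = counts["declined"] + counts["missing"]
    return counts


# Hypothetical applications: one entry per application, None = no survey data.
sample = ["Female", "Male", None, "Prefer not to say", "", "Female"]
result = tally(sample)
print(dict(result))
```

Note that this tally cannot tell a refusal apart from a survey that was never presented; both simply show up as missing, which is why the two causes are hard to diagnose from the data alone.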

How big of a problem is it, really?

So how much of the candidate demographic data collected via EEOC surveys is actually incomplete? Most companies evaluate their compliance with EEOC surveys by looking at the number of candidates that answered specific questions. For gender, this would look like:
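
Written out from the description above, the calculation for gender takes roughly the following form. The labels are shorthand assumed here rather than official EEOC terminology: "answered" counts candidates who selected a gender option, and "declined" counts those who selected "Prefer not to say".

```latex
\text{Compliance Rate}_{\text{gender}}
  = \frac{\text{answered}}{\text{answered} + \text{declined}}
```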

Based on our research, this calculation generally results in a compliance rate of 90% or higher. However, this approach completely ignores missing data, which often accounts for more than 25% of a company’s total dataset per EEOC field. By a conservative estimate, this means that a company with a 90% compliance rate based on the above calculation would have a compliance rate of only about 65% in reality.

A compliance rate calculation which considers all of the incomplete data (not only the declined responses) would look like:
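
Following that description, the fuller calculation can be written roughly as below. The labels are again shorthand assumed here: "answered" counts candidates who selected a gender option, "declined" counts explicit "Prefer not to say" responses, and "missing" counts applications with no survey data at all.

```latex
\text{Compliance Rate}_{\text{gender}}
  = \frac{\text{answered}}{\text{answered} + \text{declined} + \text{missing}}
  = \frac{\text{answered}}{\text{total applications}}
```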

Continuing with the previous example, this calculation would yield the “realistic” compliance rate of 65%. 
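
To see how the two rates diverge, here is a small worked example. The counts are hypothetical and chosen only so that the figures land near the 90% and 65% discussed above; the function names are illustrative.

```python
def survey_only_rate(answered, declined):
    """Share of completed surveys with a substantive answer (ignores missing data)."""
    return answered / (answered + declined)


def realistic_rate(answered, declined, missing):
    """Share of all applications with a substantive answer."""
    return answered / (answered + declined + missing)


# Hypothetical pool of 1,000 applications.
answered, declined, missing = 650, 72, 278
total = answered + declined + missing

naive = survey_only_rate(answered, declined)
realistic = realistic_rate(answered, declined, missing)

print(f"Survey-only rate: {naive:.1%}")
print(f"Realistic rate:   {realistic:.1%}")
print(f"Missing share:    {missing / total:.1%}")
```

With these counts, the survey-only rate comes out at about 90% while the realistic rate is 65%, with just under 28% of applications missing survey data.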

While Compliance Rate offers a quantifiable measure that may be helpful for diversity assessment, it is important to recognize several ways “blindspots” (or areas for special consideration) can manifest in this analysis:

  1. Missing data has an impact: As we saw in the two calculations above, the inclusion or exclusion of missing data can dramatically affect the rate you see, thereby influencing your perception of the results and the decisions that follow. 
  2. Sample size matters: In general, the lower the Compliance Rate, the higher the probability that the responses differ from those of the full candidate pool. Results based on a 10% Compliance Rate are much less reliable than results based on a 50% Compliance Rate. Again, this is only detectable if you consider the total number of applications in your calculation (rather than only completed surveys).
  3. The same rate does not mean the same sample: Some samples are biased. If candidates from particular backgrounds tend not to fill out the surveys, then you cannot simply extrapolate from the 65% who responded to the rest of the population. To be truly generalizable, samples must be constructed carefully and methodically. For example, sampling for election polling involves extensive population research in order to yield fairly accurate estimates based on the small number of people polled. Unfortunately, the collection of EEOC candidate surveys does not lend itself to the statistical rigor necessary to ensure a “fair” sample. Therefore, you cannot assume that the candidates who did fill out the surveys are representative of your entire applicant population.
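
A short calculation shows why a healthy-looking rate can still hide a skewed sample. All numbers below are hypothetical, and the example uses only two groups for simplicity: a pool that is half women and half men, where women complete the survey 80% of the time and men 50% of the time.

```python
def observed_share(true_share, response_rate_group, response_rate_other):
    """Return (overall response rate, group's share among respondents)."""
    responded_group = true_share * response_rate_group
    responded_other = (1 - true_share) * response_rate_other
    overall = responded_group + responded_other
    return overall, responded_group / overall


overall_rate, share_among_respondents = observed_share(
    true_share=0.50,           # hypothetical: women are half of the applicant pool
    response_rate_group=0.80,  # hypothetical: women's survey response rate
    response_rate_other=0.50,  # hypothetical: men's survey response rate
)

print(f"Compliance rate: {overall_rate:.0%}")
print(f"Women among respondents: {share_among_respondents:.1%}")
print("Women in the full pool: 50.0%")
```

With these inputs the Compliance Rate is 65%, yet women make up roughly 62% of respondents against 50% of the actual pool. The rate alone gives no hint of that gap.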

No analysis is without some type of “blindspot”, but being aware of where they might exist and how pervasive they may be in your data sets allows your company to interpret its data more reliably and accurately.

EEOC data is rich, but often limited in potential use

It is easy to imagine valuable use cases for rich EEOC data. However, using that data to proactively and positively influence your hiring practices remains challenging.

  • Declined or Missing Data, as highlighted above, is the most common impediment to harnessing this valuable candidate data. 
  • Inflexible Requirements dictated by the US federal government mean that individual companies have no ability to edit or expand response options. While this methodology is ideal for comparability purposes, it may feel limiting to some identities and may constrain your future analysis efforts.
  • US-centric standards may not apply globally, and often do not, due to local compliance regulations, norms, or other unique considerations.

There is no question about the value of better understanding your candidate make-up and the influence of your hiring process on the hires you eventually secure. However, most companies have found it challenging to reliably trust and use EEOC data, and thus the majority of the market ignores it.

How else can we “fill the gap”?

So if we cannot look to self-reported EEOC data to provide companies with important gender identity information across their entire candidate pipeline, what other options exist? At Datapeople we saw our customers seeking alternatives, but these efforts often added complexity to their hiring process or consumed unsustainable amounts of team resources. To support our customers and the talent populations they engage, we committed to innovation and built our own custom inference model.

An inference model is a machine learning model that has been trained and can make predictions on new, previously unseen data based on the patterns it learned from the training data. At Datapeople, we leverage a proprietary algorithm trained on vast public datasets of self-reported gender and first names, encompassing a global range of combinations beyond just US and English names. This model is proven to have over 90% accuracy when compared to actual self-reports, and is generally able to provide an inferred gender for candidates more than 85% of the time. With this approach, our platform and customers are able to utilize more candidate data for analysis than previously achievable with self-reported EEOC data, delivering more reliable and generalizable overall trends and insights to improve gender equity. To learn more about our Gender Inference Model, click here.
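
Figures such as accuracy and coverage can be checked against whatever self-reports you do have. The sketch below is not Datapeople's model or evaluation method; it is a generic illustration with made-up labels, assuming each candidate has an inferred label (or `None` when the model abstains) and, where available, a self-reported label.

```python
def coverage(inferred):
    """Fraction of candidates for whom an inferred label is available."""
    return sum(label is not None for label in inferred) / len(inferred)


def accuracy(inferred, self_reported):
    """Agreement rate where both an inferred and a self-reported label exist."""
    pairs = [
        (i, s) for i, s in zip(inferred, self_reported)
        if i is not None and s is not None
    ]
    if not pairs:
        return None  # nothing to compare
    return sum(i == s for i, s in pairs) / len(pairs)


# Made-up labels for six candidates; None means "no label available".
inferred      = ["F", "M", None, "F", "M", "F"]
self_reported = ["F", "M", "F",  None, "F", "F"]

print(f"Coverage: {coverage(inferred):.0%}")
print(f"Accuracy: {accuracy(inferred, self_reported):.0%}")
```

Metrics like these describe aggregate behavior only; they are meant for judging trends, not for drawing conclusions about any individual candidate.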

Beyond legal requirements, ensuring members from all communities feel safe and welcome to pursue employment with a company is considered the utmost priority for most employers. Employers want to know: Are we attracting individuals with different identities to our roles? Some ways to explore this question are:

  • Most companies today leverage existing data collection efforts (including EEOC surveys) whenever possible while being mindful of “blindspots” and caveats when determining calculations and interpreting trends. However, this approach puts the onus on the company to accurately identify and mitigate “blindspots” before using the data, adding a manual step that many are not willing to deploy. 
  • Mature companies that have established priorities of equity and data-driven insights to unlock efficiency have recognized the need to look beyond self-reported data, using approaches such as inference modeling to capture candidate characteristics. Although modeling is not suitable for granular analysis of individual candidates, it is helpful in identifying trends that will power the future of their hiring process and put them in a better position to achieve their business and talent acquisition/population goals.

Are you ready to take advantage of the data you collect? By embracing a comprehensive hiring platform, like Datapeople, that supercharges your ATS and brings together the entire hiring team to make data-driven collaboration a reality, you will unlock deeper and more consistently available insights. Learn more about how our Datapeople Insights offering will turn recruiting unknowns into opportunities. 

Together let’s build a future where everyone has the opportunity to thrive.
