How we developed the Tech Hiring Report 2022

For this report, we analyzed jobs data from over 10,000 employers for the years 2019, 2020, and 2021. Here’s a closer look at how we did it.

What's new

Our unique contribution was the combination of a large and diverse dataset of job descriptions and job outcome data with annotations (including job postings, job types, companies, applicants, and application sources) that gave us significant depth in interpreting trends.

Unlike most other market reports, we categorized jobs by more than their job titles and categorized companies using more information than employer names. We used our language models to extract attributes of each job (including the type of job as well as the skills and qualifications it required), attributes of the company advertising the job, and the job's location.

→ This provided much more depth than the occupational and industry classifications available in government data.

→ It also provided much more descriptive power than approaches that use the job title to classify a job. Job titles can be deceptive (e.g., a Data Scientist may go by many titles).

Together, this enabled us to better map the trends in tech hiring, going beyond which specific skills and qualifications were becoming more or less popular. We were able to identify where they were becoming more popular: whether that was based on physical location (e.g., Austin, Texas), a specific job type (e.g., Mobile Engineering), or a company type (e.g., Startups).

→ It also enabled us to map trends like changes in titles for particular job types (something not described in other hiring reports).

With these annotations, we controlled for the following factors in our analysis:

→ Job Attributes (e.g., job type, seniority/years of experience)
→ Location
→ Company Attributes (e.g., company size, industry)

In the analysis presented in this report, we primarily discussed areas where we saw significant and robust change.

Although we tested for other attributes, we chose to focus on robust core takeaways.

Details of the data

Sections 1, 2, 3, and 4 used our database of job descriptions, which contains annotations of job attributes, job-language sentiments, company attributes, and location.

Sections 5, 6, and 7 used our database of job outcomes. These also contain annotations of job attributes, job-language sentiments, company attributes, and location. They also contain annotations of applicant sources and inferences around gender representation in applicant pools.

Data cleaning

We restricted our sample of jobs in three key ways:

→ By focusing on full-time, U.S.-based tech jobs: Within this, we included jobs that were remote but based in the United States. (Note: We plan to focus on other geographies in upcoming editions of this report.)

→ By focusing on Employer-verified job descriptions: Data from Job Boards can suffer from duplication issues (e.g., a company can post the same job to multiple job boards). Instead, we used data from Applicant Tracking Systems (ATSs) (e.g., Greenhouse, Lever, Workday, iCIMS, and Taleo) to collect company-listed open jobs directly. These reflect job requisitions created by Recruiting Teams and are therefore credible sources of information about a company’s hiring intent.

→ In addition, we deduplicated jobs in our database, comparing incidence among Subsidiary and Parent companies.

→ By excluding jobs that do not closely reflect market conditions: Generally speaking, tech jobs at universities, nonprofits, and government entities are less sensitive to job market realities. We excluded these, along with Internship jobs, from our analytic dataset.
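
A minimal sketch of how deduplication across Subsidiary and Parent companies could work. It is an illustration only, not the method used for this report: the `Posting` fields, the `PARENT_OF` lookup, and the choice of matching key are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    company: str
    title: str
    location: str


# Hypothetical subsidiary -> parent lookup; a real one would come from company research.
PARENT_OF = {"Acme Cloud": "Acme"}


def dedup_key(posting):
    """Build a comparison key that ignores case, extra spaces, and subsidiary names."""
    parent = PARENT_OF.get(posting.company, posting.company)
    title = " ".join(posting.title.casefold().split())
    return (parent.casefold(), title, posting.location.casefold())


def deduplicate(postings):
    """Keep the first posting seen for each key, preserving input order."""
    seen = set()
    unique = []
    for posting in postings:
        key = dedup_key(posting)
        if key not in seen:
            seen.add(key)
            unique.append(posting)
    return unique
```

In this sketch, a job listed by both a parent and its subsidiary with the same title and location collapses to a single record. Matching on the title alone is an assumed simplification; a stricter key could also compare the description text.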

In total, we collected job descriptions from 122 different ATSs and Company Career Sites, including more than 10,000 U.S. employers. This coverage provided a broad sample of tech jobs online in the United States.

Our focus was largely on jobs that were advertised online and were available to external job seekers. Some jobs were not posted online or may have been posted through Internal Systems (e.g., Internal Jobs). These were not included in our analyses.

Data labeling and augmentation

We annotated jobs in three major ways:

Location

→ The locations we present in this report are based on mapping job locations to Metropolitan Statistical Areas (MSAs), which are geographic areas used by the government for statistical analyses.

→ These are geographical regions built around an urban core of at least 50,000 people, together with areas that have close economic ties to it (i.e., nearby counties from which people may commute). This enabled us to map jobs in Manhattan and Brooklyn to the New York City MSA and treat jobs equivalently, regardless of the level of detail provided about their location.

→ We used the Federal designation of MSAs for almost all cases. The only change we introduced was to create a Bay Area MSA that contained the San Francisco-Oakland-Berkeley MSA and the San Jose-Sunnyvale-Santa Clara MSA.
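
The mapping described above can be sketched as a two-step lookup: first from a raw job location to an MSA, then from the two Bay Area MSAs to a combined region. The `CITY_TO_MSA` table and the parsing rule are hypothetical and far smaller than any real lookup would be.

```python
# Hypothetical lookup from a city or borough name to an MSA label.
CITY_TO_MSA = {
    "manhattan": "New York City",
    "brooklyn": "New York City",
    "san francisco": "San Francisco-Oakland-Berkeley",
    "san jose": "San Jose-Sunnyvale-Santa Clara",
}

# The one custom grouping described in the text: two federal MSAs merged into one.
MERGED_MSAS = {
    "San Francisco-Oakland-Berkeley": "Bay Area",
    "San Jose-Sunnyvale-Santa Clara": "Bay Area",
}


def to_msa(raw_location):
    """Map a raw location such as 'Brooklyn, NY' to its (possibly merged) MSA."""
    city = raw_location.split(",")[0].strip().casefold()
    msa = CITY_TO_MSA.get(city)
    if msa is None:
        return None  # unknown locations are left unmapped in this sketch
    return MERGED_MSAS.get(msa, msa)
```

With this lookup, "Manhattan, NY" and "Brooklyn, NY" resolve to the same region, and both San Francisco and San Jose resolve to the combined Bay Area.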

Company attributes

→ We segmented companies based on various attributes in this analysis. These were based on our own internal research and classification of companies and did not correspond to government agency mappings (e.g., the North American Industry Classification System [NAICS]) or other industry mappings. These are further described in the Glossary.

Job attributes

→ Our analysis was based on decomposing job descriptions into their representative job types, skills, qualifications, and language attributes. These were based on our own internal research and training of language models on more than 40 million job descriptions. Our job taxonomy is proprietary and does not correspond to government agency mappings (e.g., the Bureau of Labor Statistics Standard Occupational Classification) or other industry mappings.

We annotated our job outcomes data in three additional ways:

Application sources

→ ATSs define application sources in a variety of different ways. Some enable user-inputted data, which is where we get ‘LinkedIn,’ ‘Linked-in,’ ‘linkedin,’ and other ways to denote LinkedIn. Meanwhile, others define whether a source like LinkedIn is being used as a sourcing/prospecting tool or as a Job Board.

→ Our platform organizes over 300,000 application sources into core categories. In this report, we present a top-level grouping focused on whether applicants applied through Company Career Sites, LinkedIn Job Posts, Other Organic Sources, Prospecting, Referrals, or Internal Job Postings.
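
As a rough illustration of why this normalization is needed, the sketch below collapses spelling variants of one source name and then assigns a top-level category. The regular expression, the `used_for_prospecting` flag, and the fallback category are assumptions for the example, not a description of our platform's rules.

```python
import re

# Matches 'LinkedIn', 'Linked-in', 'linkedin', 'linked in', and similar variants.
_LINKEDIN = re.compile(r"^linked[\s_-]?in$")


def canonical_name(raw):
    """Collapse user-entered spelling variants to one canonical source name."""
    text = raw.strip().casefold()
    if _LINKEDIN.match(text):
        return "LinkedIn"
    return raw.strip()


def categorize(raw, used_for_prospecting=False):
    """Assign a top-level category; the prospecting flag is a hypothetical ATS field."""
    if canonical_name(raw) == "LinkedIn":
        return "Prospecting" if used_for_prospecting else "LinkedIn Job Posts"
    return "Other Organic Sources"
```

The flag reflects the distinction noted above: the same source name can describe either a Job Board application or a sourced candidate, so the name alone is not enough to pick a category.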

Applicant gender representation

→ We present the results of gender inferences about applicant pool attributes. These were based on our proprietary inference engine.

→ Comparisons against Equal Employment Opportunity (EEO) self-reports suggest the engine is 94% accurate and does not suffer from the non-response issues associated with EEO data.

Data hygiene

→ Because we analyzed applicant pool sizes, we excluded requisitions with recruiting data hygiene issues that would skew our ability to define the average applicant pool size/makeup for a certain type of tech job. These included:

→ Internal (-only) requisitions
→ Requisitions with a single application
→ Requisitions with incomplete job content (e.g., that don’t describe the skills/qualifications of the jobs or only include placeholder language)
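
The three exclusions above can be sketched as a single filter. The `Requisition` fields, the word-count threshold, and the placeholder phrases are hypothetical stand-ins chosen for the example; the report does not specify how incomplete content was detected.

```python
from dataclasses import dataclass


@dataclass
class Requisition:
    internal_only: bool
    application_count: int
    description: str


MIN_DESCRIPTION_WORDS = 50  # hypothetical cutoff for "incomplete" content
PLACEHOLDER_PHRASES = ("lorem ipsum", "description coming soon")  # hypothetical


def passes_hygiene(req):
    """Return True if the requisition should stay in the applicant-pool analysis."""
    if req.internal_only:
        return False
    if req.application_count == 1:
        return False
    text = req.description.casefold()
    if len(text.split()) < MIN_DESCRIPTION_WORDS:
        return False
    if any(phrase in text for phrase in PLACEHOLDER_PHRASES):
        return False
    return True
```

A list of requisitions can then be filtered with `[r for r in requisitions if passes_hygiene(r)]` before computing average applicant pool sizes.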

Data analysis

In this analysis, we make the assumption that a job posting represents hiring intent. At Datapeople, we know that this may not always be the case: a single job posting can represent multiple positions (e.g., Evergreen jobs, Volume jobs) or vice versa.

→ Our goal in this analysis was to map industry-wide trends. Therefore, we looked for significant and robust changes. The analysis we present was the result of multiple tests with subsamples of data. We did this to exclude the impact of particular companies so that we could validate that a trend was broadly applicable and not just the result of an individual company’s recruiting hygiene.
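
One simple way to run this kind of subsample test is a leave-one-company-out check: recompute the change with each company removed and confirm that its direction holds. The sketch below assumes this technique for illustration; the report does not state which subsampling scheme was used, and the row layout `(company, year, has_skill)` is hypothetical.

```python
def share_with_skill(rows, year, skip_company=None):
    """Share of jobs in `year` that mention the skill, optionally dropping one company."""
    hits = 0
    total = 0
    for company, row_year, has_skill in rows:
        if row_year != year or company == skip_company:
            continue
        total += 1
        hits += 1 if has_skill else 0
    return hits / total if total else 0.0


def trend_is_robust(rows, start_year, end_year):
    """True if the direction of change survives removing any single company."""
    full_change = share_with_skill(rows, end_year) - share_with_skill(rows, start_year)
    if full_change == 0:
        return False
    companies = {company for company, _, _ in rows}
    for company in companies:
        change = (share_with_skill(rows, end_year, company)
                  - share_with_skill(rows, start_year, company))
        if change * full_change <= 0:  # change vanished or reversed direction
            return False
    return True
```

If removing one employer erases or reverses the change, the trend is more likely a product of that employer's postings than a market-wide shift.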

Moreover, to understand changes in the trends we observe, we tested for compositional biases in the attributes of the job, company, and location in our data.

→ Although these tests could yield multiple variables that could correlate with a trend, we identified and discussed the ones that were the most robust.
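
A compositional check can be illustrated with a standard shift-share decomposition, which splits an overall change into a part from rates changing within each group and a part from the mix of groups changing. This is an assumed technique for illustration, not necessarily the test used in this report; the group names and numbers in the usage note are made up.

```python
def decompose_change(before, after):
    """Split an overall rate change into within-group and composition effects.

    `before` and `after` map a group name to (share_of_jobs, rate_within_group).
    The two returned values sum to the overall change in the weighted rate.
    """
    within = 0.0
    composition = 0.0
    for group in set(before) | set(after):
        w0, r0 = before.get(group, (0.0, 0.0))
        w1, r1 = after.get(group, (0.0, 0.0))
        within += w0 * (r1 - r0)        # rate changed, mix held at the start
        composition += (w1 - w0) * r1   # mix changed, rate held at the end
    return within, composition
```

For example, if startups rise from 50% to 70% of postings while the rate within startups and within enterprises stays the same, the within-group effect is zero and the whole change is compositional. A trend of that kind reflects who was hiring rather than a change in what employers asked for.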
