OpenIntro Statistics Dataset - resume


This dataset was taken from the list of OpenIntro dataset files found at

OpenIntro features a number of free books that can be used in high school and AP statistics courses. The license on these datasets is currently unknown. You can find out more about OpenIntro at


This experiment data comes from a study that sought to understand the influence of race and gender on job application callback rates. The study monitored job postings in Boston and Chicago for several months during 2001 and 2002 and used this to build up a set of test cases. Over this time period, the researchers randomly generated the attributes of each resume sent out to a job posting, such as years of experience and education details, to create a realistic-looking resume. They then randomly assigned a name to the resume that would communicate the applicant's gender and race. The first names chosen for the study were selected so that the names would predominantly be recognized as belonging to black or white individuals. For example, Lakisha was a name that their survey indicated would be interpreted as belonging to a black woman, while Greg was a name that would generally be interpreted as belonging to a white man.


  • job_ad_id - Unique ID associated with the advertisement.
  • job_city - City where the job was located.
  • job_industry - Industry of the job.
  • job_type - Type of role.
  • job_fed_contractor - Indicator for if the employer is a federal contractor.
  • job_equal_opp_employer - Indicator for if the employer is an Equal Opportunity Employer.
  • job_ownership - The type of company, e.g. a nonprofit or a private company.
  • job_req_any - Indicator for if any job requirements are listed. If so, the other job_req_* fields give more detail.
  • job_req_communication - Indicator for if communication skills are required.
  • job_req_education - Indicator for if some level of education is required.
  • job_req_min_experience - Amount of experience required.
  • job_req_computer - Indicator for if computer skills are required.
  • job_req_organization - Indicator for if organization skills are required.
  • job_req_school - Level of education required.
  • received_callback - Indicator for if there was a callback from the job posting for the person listed on this resume.
  • firstname - The first name used on the resume.
  • race - Inferred race associated with the first name on the resume.
  • gender - Inferred gender associated with the first name on the resume.
  • years_college - Years of college education listed on the resume.
  • college_degree - Indicator for if the resume listed a college degree.
  • honors - Indicator for if the resume listed that the candidate has been awarded some honors.
  • worked_during_school - Indicator for if the resume listed working while in school.
  • years_experience - Years of experience listed on the resume.
  • computer_skills - Indicator for if computer skills were listed on the resume. These skills were adapted for listings, though the skills were assigned independently of other details on the resume.
  • special_skills - Indicator for if any special skills were listed on the resume.
  • volunteer - Indicator for if volunteering was listed on the resume.
  • military - Indicator for if military experience was listed on the resume.
  • employment_holes - Indicator for if there were holes in the person's employment history.
  • has_email_address - Indicator for if the resume lists an email address.
  • resume_quality - Each resume was generally classified as either lower or higher quality.
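
As a rough illustration of how these fields might be used, the sketch below computes the callback rate for each value of a grouping column such as race or gender. It assumes that received_callback is stored as 0 or 1 and that race takes values like "black" and "white"; neither encoding is stated above, so check the actual file first. The function name callback_rate_by is hypothetical, and the toy rows are made up for illustration; they are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def callback_rate_by(rows, group_col):
    """Return {group value: share of rows that received a callback}."""
    totals = defaultdict(int)
    callbacks = defaultdict(int)
    for row in rows:
        group = row[group_col]
        totals[group] += 1
        # Assumption: received_callback is encoded as "0" or "1" in the CSV.
        callbacks[group] += int(row["received_callback"])
    return {group: callbacks[group] / totals[group] for group in totals}


# Made-up toy rows, for illustration only (NOT real study data).
toy_csv = """race,gender,received_callback
white,f,1
white,m,0
black,f,0
black,m,0
"""
toy_rows = list(csv.DictReader(io.StringIO(toy_csv)))
toy_rates = callback_rate_by(toy_rows, "race")
print(toy_rates)
```

To run the same summary on the attached file, the rows could be read with csv.DictReader from an open handle on dataset-448801590.csv instead of the in-memory toy text.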


Bertrand M, Mullainathan S. 2004. "Are Emily and Greg More Employable than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination". The American Economic Review 94:4 (991-1013).


Because this is an experiment, where the race and gender attributes are being randomly assigned to the resumes, we can conclude that any statistically significant difference in callback rates is causally linked to these attributes.
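
One common way to check whether a difference in callback rates between two groups is statistically significant is a two-proportion z-test. The sketch below is a minimal version using only the standard library. The choice of test is an assumption made here for illustration and is not necessarily the method used in the original study; the function name and the counts in the example call are hypothetical, made-up values.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z statistic, p-value), using the pooled-proportion
    standard error and a normal approximation.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts for illustration only: 60 of 100 vs. 40 of 100 callbacks.
z_stat, p_val = two_proportion_z_test(60, 100, 40, 100)
print(f"z = {z_stat:.3f}, p = {p_val:.4f}")
```

The normal approximation is reasonable only when each group has a fair number of callbacks and non-callbacks; for very small counts an exact test would be a safer choice.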

Do you think it's reasonable to make a causal conclusion? You may have some healthy skepticism. However, do take care to appreciate that this was an experiment: the first name (and so the inferred race and gender) was randomly assigned to each resume, and the quality and attributes of a resume were assigned independently of the race and gender. This means that any effects we observe are in fact causal, and the effects related to race are both statistically significant and very large: white applicants had about a 50% better chance of getting a callback than black candidates.
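
The "50% better chance" figure is a relative difference between two callback rates, not a gap of 50 percentage points. The sketch below shows the arithmetic, assuming the two group rates have already been computed. The function name is hypothetical and the rates in the example are made up for illustration; they are not the study's results.

```python
def rate_differences(rate_a, rate_b):
    """Compare two rates.

    Returns (absolute difference, relative difference), where the
    relative difference is expressed as a fraction of rate_b.
    """
    if rate_b == 0:
        raise ValueError("rate_b must be non-zero to compute a relative difference")
    absolute = rate_a - rate_b
    relative = absolute / rate_b
    return absolute, relative


# Made-up rates for illustration only: 15% vs. 10% callback rates.
abs_diff, rel_diff = rate_differences(0.15, 0.10)
print(f"absolute: {abs_diff:.2%} points, relative: {rel_diff:.0%}")
```

With these toy numbers the absolute gap is 5 percentage points while the relative gap is 50%, which is why the two ways of reporting a difference should not be confused.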

Do you still have doubts lingering in the back of your mind about the validity of this study? Maybe a counterargument about why the standard conclusions from this study may not apply? The article summarizing the results was exceptionally well-written, and it addresses many potential concerns about the study's approach. So if you're feeling skeptical about the conclusions, please find the link above and explore!

Taken from:

Attachment Size
dataset-448801590.csv 688.84 KB
Dataset License
Documentation License
No license (All rights reserved)