OpenIntro Statistics Dataset - photo_classify

Attachment Size
dataset-151428313.csv 32.33 KB
Dataset License
Documentation License
No license (All rights reserved)

This dataset was taken from the list of OpenIntro dataset files found at

OpenIntro features a number of free books that can be used in high school and AP statistics courses. The license on these datasets is currently unknown. You can find out more about OpenIntro at


Photo classifications: fashion or not

This is a simulated data set that compares a machine learning algorithm's photo classifications with the true classifications of the same photos. While the data are not real, they resemble performance that would be reasonable to expect from a well-built classifier.


  • mach_learn - The prediction by the machine learning system as to whether the photo is about fashion or not.
  • truth - The actual classification of the photo by a team of humans.
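The two columns above can be cross-tabulated to see how often the algorithm and the human reviewers agree. The following is a minimal sketch that assumes the CSV has a header row containing `mach_learn` and `truth`. The level labels in the sample text (`pred_fashion`, `pred_not`, `fashion`, `not`) and the four sample rows are assumptions made up for illustration, not values taken from the file.

```python
import csv
import io
from collections import Counter


def cross_tabulate(csv_file):
    """Count each (mach_learn, truth) combination in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter((row["mach_learn"], row["truth"]) for row in reader)


# Hypothetical sample rows, invented for illustration only.
sample = io.StringIO(
    "mach_learn,truth\n"
    "pred_fashion,fashion\n"
    "pred_not,not\n"
    "pred_not,fashion\n"
    "pred_not,not\n"
)

table = cross_tabulate(sample)
for (predicted, actual), n in sorted(table.items()):
    print(predicted, actual, n)
```

To apply the same function to the attached file, open it with `open("dataset-151428313.csv", newline="")` and pass the file object in. Any extra columns in the file (such as a row index) are ignored, since only the two named columns are read.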


The hypothetical ML algorithm has a precision of 90%, meaning that of those photos it claims are fashion, about 90% are actually about fashion. The recall of the ML algorithm is about 64%, meaning that of the photos that are about fashion, it correctly predicts that they are about fashion about 64% of the time.
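As a rough illustration of these two definitions, precision is the number of true positives divided by all predicted positives, and recall is the number of true positives divided by all actual positives. The sketch below computes both from (prediction, truth) pairs. The function name, the level labels, and the toy counts are all assumptions chosen for illustration; the toy counts are not the dataset's actual counts and were picked only so the ratios land near the figures quoted above.

```python
from collections import Counter


def precision_recall(rows, pred_positive="pred_fashion", true_positive="fashion"):
    """Return (precision, recall) for an iterable of (mach_learn, truth) pairs.

    The default label strings are assumptions; pass the labels used in
    your copy of the data if they differ.
    """
    counts = Counter()
    for pred, truth in rows:
        counts[(pred == pred_positive, truth == true_positive)] += 1

    tp = counts[(True, True)]    # predicted fashion, actually fashion
    fp = counts[(True, False)]   # predicted fashion, actually not
    fn = counts[(False, True)]   # predicted not, actually fashion

    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


# Toy rows, invented for illustration only; not taken from the dataset.
toy_rows = (
    [("pred_fashion", "fashion")] * 9
    + [("pred_fashion", "not")] * 1
    + [("pred_not", "fashion")] * 5
    + [("pred_not", "not")] * 35
)

precision, recall = precision_recall(toy_rows)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# prints "precision = 0.90, recall = 0.64"
```

In the toy rows, 9 of the 10 photos predicted as fashion really are fashion (precision 9/10), while only 9 of the 14 true fashion photos are caught (recall 9/14). A classifier can therefore be right most of the time when it says "fashion" and still miss a sizable share of fashion photos.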


The data are simulated / hypothetical.

Taken from:
