# R Dataset / Package datasets / crimtab


This Picostat.com statistics page describes the crimtab data set, which pertains to Student's 3000 Criminals Data. The crimtab data set is found in the datasets R package. You can load it in R by issuing the command data("crimtab") at the console. This loads the data into a variable called crimtab. The datasets package is part of the standard R distribution and is attached by default, so it cannot (and need not) be installed with install.packages("datasets"); if R says the crimtab data set is not found, try running library(datasets) and then reload the data. If you need to download R, you can go to the R project website. You can download a CSV (comma-separated values) version of the crimtab R data set. The size of this file is about 2,143 bytes.

## Student's 3000 Criminals Data

## Description

Data of 3000 male criminals over 20 years old undergoing their sentences in the chief prisons of England and Wales.

## Usage

crimtab

## Format

A ... The 42 ...

## Details

Student is the pseudonym of William Sealy Gosset.
In his 1908 paper he wrote (on page 13), at the beginning of section VI: “Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. MacDonell (...

The table is in fact on page 216, not page 219, of MacDonell (1902). In the MacDonell table, the middle finger lengths were given in mm and the heights in feet/inches intervals; both are converted into cm here. The midpoints of intervals were used, e.g., where MacDonell has ...

MacDonell credited the source of data (page 178) as follows: ...
## Source

http://pbil.univ-lyon1.fr/R/donnees/criminals1902.txt, thanks to Jean R. Lobry and Anne-Béatrice Dufour.

## References

Garson, J.G. (1900) The metric system of identification of criminals, as used in Great Britain and Ireland.

MacDonell, W.R. (1902) On criminal anthropometry and the identification of criminals.

Student (1908) The probable error of a mean.
## Examples

    require(stats)
    dim(crimtab)
    utils::str(crimtab)

    ## for nicer printing:
    local({
      cT <- crimtab
      colnames(cT) <- substring(colnames(cT), 2, 3)
      print(cT, zero.print = " ")
    })

    ## Repeat Student's experiment:

    # 1) Reconstitute 3000 raw data for heights in inches and rounded to
    #    nearest integer as in Student's paper:
    (heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
    d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))

    # 2) shuffle the data:
    set.seed(1)
    d.hei <- d.hei[sample(1:3000), , drop = FALSE]

    # 3) Make 750 samples each of size 4:
    d.hei$sample <- as.factor(rep(1:750, each = 4))

    # 4) Compute the means and standard deviations (n) for the 750 samples:
    h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
    h.sd <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)

    # 5) Compute the difference between the mean of each sample and
    #    the mean of the population and then divide by the
    #    standard deviation of the sample:
    zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd

    # 6) Replace infinite values by +/- 6 as in Student's paper:
    zobs[infZ <- is.infinite(zobs)] # 3 of them
    zobs[infZ] <- 6 * sign(zobs[infZ])

    # 7) Plot the distribution:
    require(grDevices); require(graphics)
    hist(x = zobs, probability = TRUE, xlab = "Student's z",
         col = grey(0.8), border = grey(0.5),
         main = "Distribution of Student's z score for 'crimtab' data")

Dataset imported from https://www.r-project.org.
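Before rerunning Student's experiment, it can help to check how the table is laid out. The following is a minimal sketch, assuming a standard R installation where the datasets package is attached by default. It relies on the column names holding height midpoints in cm (as the example code above does) and assumes the row names hold the finger-length midpoints in cm. All variable names (`total`, `height_cm`, `finger_cm`, and so on) are illustrative and do not come from the original documentation.

```r
# Load the table (a no-op in most sessions, since 'datasets' is attached by default).
data("crimtab")

# Total number of individuals counted in the table.
total <- sum(crimtab)

# Assumption: column names are height midpoints (cm),
# row names are finger-length midpoints (cm).
height_cm <- as.numeric(colnames(crimtab))
finger_cm <- as.numeric(rownames(crimtab))

# Marginal counts for each variable.
height_counts <- colSums(crimtab)
finger_counts <- rowSums(crimtab)

# Grouped-data means: each midpoint weighted by its count.
mean_height_cm <- sum(height_cm * height_counts) / total
mean_finger_cm <- sum(finger_cm * finger_counts) / total

# Convert the height midpoints back to inches (1 in = 2.54 cm).
height_in <- height_cm / 2.54

print(dim(crimtab))
print(total)
print(round(c(height_cm = mean_height_cm, finger_cm = mean_finger_cm), 2))
```

Working from the marginal counts keeps the data in its grouped form. The Examples section instead expands the counts into one row per individual with `rep()`, which is needed there because the experiment draws samples of size 4.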


Attachment | Size |
---|---|
dataset-78939.csv | 2.09 KB |