R crimtab: Student's 3000 Criminals Data

The R dataset crimtab is part of the datasets package.

Description

Data on 3000 male criminals over 20 years old, held in the chief prisons of England and Wales.

Usage

crimtab

Format

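A table object of integer counts, of dimension 42 × 22, with a total count, sum(crimtab), of 3000.

The 42 rownames ("9.4", "9.5", ...) correspond to midpoints of intervals of finger lengths, whereas the 22 column names (colnames) ("142.24", "144.78", ...) correspond to the (body) heights of the 3000 criminals; see also below.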
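As a quick check of the format described above, the following sketch inspects the table's dimensions, total, and labels. The variable names finger_mid and height_mid are chosen here for illustration only.

```r
# crimtab ships with the datasets package, which is attached by default
dim(crimtab)                 # rows = finger-length classes, columns = height classes
sum(crimtab)                 # total number of criminals

# Row and column labels are character strings; convert them to numbers (cm)
finger_mid <- as.numeric(rownames(crimtab))   # finger-length midpoints
height_mid <- as.numeric(colnames(crimtab))   # height midpoints

head(finger_mid, 2)
head(height_mid, 2)

# Marginal counts: number of criminals per height class
colSums(crimtab)
```

Because the labels are stored as strings, converting them with as.numeric() is needed before doing any arithmetic on the midpoints.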

Details

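Student is the pseudonym of William Sealy Gosset. In his 1908 paper he wrote (on page 13), at the beginning of section VI, entitled Practical Test of the Foregoing Equations:

"Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. MacDonell (Biometrika, Vol. I., p. 219). The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book, which thus contains the measurements of 3000 criminals in a random order. Finally, each consecutive set of 4 was taken as a sample (750 in all) and the mean, standard deviation, and correlation of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the z of Section III."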

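The table is in fact on page 216, not page 219, of MacDonell (1902). In MacDonell's table, the middle finger lengths were given in mm and the heights in feet/inches intervals; both are converted to cm here. The midpoints of the intervals are used: for example, where MacDonell has 4' 7''9/16 -- 8''9/16, we have 142.24, which is 2.54*56 = 2.54*(4' 8'').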
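The unit conversion above can be reproduced with a few lines of R. This is only an illustrative sketch: to_cm is a hypothetical helper defined here, not a function from any package.

```r
# Hypothetical helper: convert a height in feet and inches to centimetres
to_cm <- function(feet, inches) 2.54 * (12 * feet + inches)

# Midpoint of MacDonell's first height interval: 4' 8'' = 56 inches
to_cm(4, 8)

# Compare with the first column label of crimtab
first_label <- as.numeric(colnames(crimtab)[1])
all.equal(to_cm(4, 8), first_label)

# Finger lengths are stored in cm; multiply by 10 to recover MacDonell's mm
as.numeric(rownames(crimtab)[1]) * 10
```

all.equal() is used rather than == so that floating-point rounding in the multiplication does not cause a spurious mismatch.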

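MacDonell credited the source of the data (page 178) as follows: "The data on which the memoir is based were obtained, through the kindness of Dr Garson, from the Central Metric Office, New Scotland Yard..." He pointed out on page 179 that "The forms were drawn at random from the mass on the office shelves; we are therefore dealing with a random sampling."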

Examples

require(stats)
dim(crimtab)
utils::str(crimtab)
## for nicer printing:
local({cT <- crimtab
       colnames(cT) <- substring(colnames(cT), 2, 3)
       print(cT, zero.print = " ")
})

## Repeat Student's experiment:

# 1) Reconstitute 3000 raw data for heights in inches and rounded to
#    nearest integer as in Student's paper:

(heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))

# 2) shuffle the data:

set.seed(1)
d.hei <- d.hei[sample(1:3000), , drop = FALSE]

# 3) Make 750 samples each of size 4:

d.hei$sample <- as.factor(rep(1:750, each = 4))

# 4) Compute the means and standard deviations (divisor n, hence the
#    sqrt(3/4) factor applied to sd()) for the 750 samples:

h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
h.sd   <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)

# 5) Compute the difference between the mean of each sample and
#    the mean of the population and then divide by the
#    standard deviation of the sample:

zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd

# 6) Replace infinite values by +/- 6 as in Student's paper:

zobs[infZ <- is.infinite(zobs)] # none of them 
zobs[infZ] <- 6 * sign(zobs[infZ])

# 7) Plot the distribution:

require(grDevices); require(graphics)
hist(x = zobs, probability = TRUE, xlab = "Student's z",
     col = grey(0.8), border = grey(0.5),
     main = "Distribution of Student's z score  for 'crimtab' data")
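As a possible follow-up to the histogram, the sketch below overlays the density that Student's z would have if heights were exactly normal. It relies on the relation z * sqrt(n - 1) ~ t with n - 1 degrees of freedom, which holds because the sample standard deviation here uses divisor n. The function name dz and the choice breaks = 30 are assumptions made for this illustration; the z values are rebuilt with the same steps and seed as above so that the block runs on its own.

```r
# Rebuild Student's z values (same steps and seed as in the example above)
heIn   <- round(as.numeric(colnames(crimtab)) / 2.54)   # heights in inches
height <- rep(heIn, colSums(crimtab))                   # 3000 raw heights
set.seed(1)
height <- height[sample(1:3000)]                        # shuffle
grp    <- rep(1:750, each = 4)                          # 750 samples of size 4
h.mean <- tapply(height, grp, mean)
h.sd   <- tapply(height, grp, sd) * sqrt(3/4)           # divisor n = 4
zobs   <- (h.mean - mean(height)) / h.sd
inf    <- is.infinite(zobs)
zobs[inf] <- 6 * sign(zobs[inf])                        # as in Student's paper

# Density of z under normality (assumption): z * sqrt(n - 1) follows a
# t distribution with n - 1 degrees of freedom
dz <- function(z, n = 4) sqrt(n - 1) * dt(z * sqrt(n - 1), df = n - 1)

hist(zobs, probability = TRUE, breaks = 30, xlab = "Student's z",
     col = grey(0.8), border = grey(0.5),
     main = "Student's z with theoretical density")
curve(dz(x), add = TRUE, lwd = 2)
```

The heights are rounded to whole inches, so the empirical histogram can only approximate the continuous curve.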

Source

https://pbil.univ-lyon1.fr/R/donnees/criminals1902.txt, thanks to Jean R. Lobry and Anne-Béatrice Dufour.

References

Garson, J.G. (1900). The metric system of identification of criminals, as used in Great Britain and Ireland. The Journal of the Anthropological Institute of Great Britain and Ireland, 30, 161-198. doi:10.2307/2842627.

MacDonell, W.R. (1902). On criminal anthropometry and the identification of criminals. Biometrika, 1(2), 177-227. doi:10.2307/2331487.

Student (1908). The probable error of a mean. Biometrika, 6, 1-25. doi:10.2307/2331554.

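Note: This article was selected and compiled by 纯净天空 from the original English documentation by R-devel, Student's 3000 Criminals Data. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.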