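R crimtab: Student's 3000 Criminals Data

The R dataset crimtab is in the datasets package.

Description

Data on 3000 male criminals over 20 years old undergoing their sentences in the chief prisons of England and Wales.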

Usage

crimtab

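Format

A table object of integer counts, of dimension 42 x 22, with a total count, sum(crimtab), of 3000.

The 42 rownames ("9.4", "9.5", ...) correspond to midpoints of intervals of finger lengths, whereas the 22 column names (colnames) ("142.24", "144.78", ...) correspond to midpoints of intervals of (body) heights of the 3000 criminals; see also below.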
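The structure described above can be checked directly. The following is a minimal sketch, assuming only that the datasets package is available (it ships with R); the variable names ct, dims, total, finger and height are illustrative and not part of the dataset.

```r
# crimtab ships with the base 'datasets' package, so nothing needs installing.
ct <- datasets::crimtab

dims   <- dim(ct)                    # rows = finger lengths, columns = heights
total  <- sum(ct)                    # total number of criminals counted
finger <- as.numeric(rownames(ct))   # finger-length interval midpoints (cm)
height <- as.numeric(colnames(ct))   # height interval midpoints (cm)

print(dims)
print(total)
print(head(finger, 2))
print(head(height, 2))
```

Converting the dimnames with as.numeric() is what makes the table usable for computation, since a table object stores its midpoints as character labels.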

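Details

Student is the pseudonym of William Sealy Gosset. In his 1908 paper he wrote (on page 13), at the beginning of Section VI, entitled Practical Test of the foregoing Equations:

"Before I had succeeded in solving my problem analytically, I had endeavoured to do so empirically. The material used was a correlation table containing the height and left middle finger measurements of 3000 criminals, from a paper by W. R. MacDonell (Biometrika, Vol. I., p. 219). The measurements were written out on 3000 pieces of cardboard, which were then very thoroughly shuffled and drawn at random. As each card was drawn its numbers were written down in a book, which thus contains the measurements of 3000 criminals in a random order. Finally, each consecutive set of 4 was taken as a sample -- 750 in all -- and the mean, standard deviation, and correlation of each sample determined. The difference between the mean of each sample and the mean of the population was then divided by the standard deviation of the sample, giving us the z of Section III."

The table is in fact on page 216, not page 219, of MacDonell (1902). In MacDonell's table, the middle finger lengths were given in mm and the heights in feet/inches intervals; both are converted to cm here. The midpoints of the intervals were used: for example, where MacDonell has 4' 7''9/16 -- 8''9/16, we have 142.24, which is 2.54*56 = 2.54*( 4' 8'' ).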
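The unit conversion can be reproduced as follows. This is a small sketch, assuming the usual 2.54 cm per inch; the names inch_to_cm, mid_in, mid_cm, heights_cm and heights_in are chosen here for illustration.

```r
inch_to_cm <- 2.54

# Midpoint of MacDonell's interval 4' 7''9/16 -- 8''9/16 is 4 ft 8 in = 56 in.
mid_in <- 4 * 12 + 8
mid_cm <- mid_in * inch_to_cm
print(mid_cm)

# Going the other way: the column names of crimtab, converted back to inches.
heights_cm <- as.numeric(colnames(datasets::crimtab))
heights_in <- heights_cm / inch_to_cm
print(heights_in)
```

The back-conversion is the same step the Examples section uses (with round()) to recover heights in inches.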

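MacDonell credited the source of the data (page 178) as follows: The data on which the memoir is based were obtained, through the kindness of Dr Garson, from the Central Metric Office, New Scotland Yard... He pointed out on page 179 that: The forms were drawn at random from the mass on the office shelves; we are therefore dealing with a random sampling.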

Examples

require(stats)
dim(crimtab)
utils::str(crimtab)
## for nicer printing:
local({cT <- crimtab
       colnames(cT) <- substring(colnames(cT), 2, 3)
       print(cT, zero.print = " ")
})

## Repeat Student's experiment:

# 1) Reconstitute 3000 raw data for heights in inches and rounded to
#    nearest integer as in Student's paper:

(heIn <- round(as.numeric(colnames(crimtab)) / 2.54))
d.hei <- data.frame(height = rep(heIn, colSums(crimtab)))

# 2) shuffle the data:

set.seed(1)
d.hei <- d.hei[sample(1:3000), , drop = FALSE]

# 3) Make 750 samples each of size 4:

d.hei$sample <- as.factor(rep(1:750, each = 4))

# 4) Compute the means and standard deviations for the 750 samples.
#    Student used divisor n rather than n - 1, hence the sqrt(3/4) factor:

h.mean <- with(d.hei, tapply(height, sample, FUN = mean))
h.sd   <- with(d.hei, tapply(height, sample, FUN = sd)) * sqrt(3/4)

# 5) Compute the difference between the mean of each sample and
#    the mean of the population and then divide by the
#    standard deviation of the sample:

zobs <- (h.mean - mean(d.hei[,"height"]))/h.sd

# 6) Replace infinite values by +/- 6 as in Student's paper:

zobs[infZ <- is.infinite(zobs)] # none of them 
zobs[infZ] <- 6 * sign(zobs[infZ])

# 7) Plot the distribution:

require(grDevices); require(graphics)
hist(x = zobs, probability = TRUE, xlab = "Student's z",
     col = grey(0.8), border = grey(0.5),
     main = "Distribution of Student's z score for 'crimtab' data")
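As a possible follow-up to step 7, the histogram can be compared with the density that z would have if heights were exactly normal. This is a sketch under that assumption, not part of the steps above: with samples of size n and the divisor-n standard deviation, sqrt(n - 1) * z follows a t distribution with n - 1 degrees of freedom. The helper dz() is a hypothetical name introduced here, and z is recomputed compactly so the block runs on its own.

```r
# Recompute Student's z in compact form (same steps as in the Examples).
heIn   <- round(as.numeric(colnames(datasets::crimtab)) / 2.54)
height <- rep(heIn, colSums(datasets::crimtab))
set.seed(1)
height <- height[sample(length(height))]          # shuffle the 3000 heights
grp    <- rep(1:750, each = 4)                    # 750 samples of size 4
h.mean <- tapply(height, grp, mean)
h.sd   <- tapply(height, grp, sd) * sqrt(3/4)     # divisor n instead of n - 1
zobs   <- (h.mean - mean(height)) / h.sd
infZ   <- is.infinite(zobs)
zobs[infZ] <- 6 * sign(zobs[infZ])                # cap infinite values at +/- 6

# Hypothetical helper: density of z under normality, for samples of size n.
# Since sqrt(n - 1) * z ~ t(n - 1), a change of variables gives:
dz <- function(z, n = 4) sqrt(n - 1) * stats::dt(sqrt(n - 1) * z, df = n - 1)

hist(zobs, probability = TRUE, xlab = "Student's z",
     col = grDevices::grey(0.8), border = grDevices::grey(0.5),
     main = "Observed z with theoretical density")
curve(dz(x), add = TRUE, lwd = 2)                 # overlay the density
```

Because the heights are rounded to whole inches and are only approximately normal, some discrepancy between the histogram and the curve is to be expected.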

Source

https://pbil.univ-lyon1.fr/R/donnees/criminals1902.txt, thanks to Jean R. Lobry and Anne-Béatrice Dufour.

References

Garson, J.G. (1900). The metric system of identification of criminals, as used in Great Britain and Ireland. The Journal of the Anthropological Institute of Great Britain and Ireland, 30, 161-198. doi:10.2307/2842627.

MacDonell, W.R. (1902). On criminal anthropometry and the identification of criminals. Biometrika, 1(2), 177-227. doi:10.2307/2331487.

Student (1908). The probable error of a mean. Biometrika, 6, 1-25. doi:10.2307/2331554.
