R broom tidy.prcomp 整理 a(n) prcomp 對象

Tidy 總結了有關模型組件的信息。模型組件可能是回歸中的單個項、單個假設、聚類或類。 tidy 所認為的模型組件的確切含義因模型而異，但通常是不言而喻的。如果模型具有多種不同類型的組件，您將需要指定要返回哪些組件。

用法

# S3 method for prcomp
tidy(x, matrix = "u", ...)

參數

x

stats::prcomp() 返回的 prcomp 對象。

matrix

指定應整理 PCA 的哪個組件的字符。

"u" 、 "samples" 、 "scores" 或 "x" ：返回有關從原始空間到主成分空間的映射的信息。
"v" 、 "rotation" 、 "loadings" 或 "variables" ：將有關從主成分空間映射回原始空間的信息返回。
"d" 、 "eigenvalues" 或 "pcs" ：返回有關特征值的信息。

...

附加參數。不曾用過。僅需要匹配通用簽名。注意：拚寫錯誤的參數將被吸收到 ... 中，並被忽略。如果拚寫錯誤的參數有默認值，則將使用默認值。例如，如果您傳遞 conf.lvel = 0.9 ，所有計算將使用 conf.level = 0.95 進行。這裏有兩個異常：

tidy() 方法在提供 exponentiate 參數時會發出警告(如果該參數將被忽略)。
augment() 方法在提供 newdata 參數時會發出警告(如果該參數將被忽略)。

值

tibble::tibble，其列取決於正在整理的 PCA 的組件。

如果 matrix 是 "u" 、 "samples" 、 "scores" 或 "x" ，整理輸出中的每一行對應於 PCA 空間中的原始數據。這些列是：

row: 原始觀察的 ID(即原始數據中的行名稱)。
PC: 表示主成分的整數。
value: 該特定主成分的觀察分數。即 PCA 空間中觀測的位置。

如果 matrix 是 "v" 、 "rotation" 、 "loadings" 或 "variables" ，則整理輸出中的每一行對應於原始空間中主成分的信息。這些列是：

row: 執行 PCA 的數據集的變量標簽(列名)。
PC: 指示主成分的整數向量。
value: 指定主成分上的特征向量(軸得分)值。

如果 matrix 是 "d" 、 "eigenvalues" 或 "pcs" ，則列為：

PC: 指示主成分的整數向量。
std.dev: 此 PC 解釋的標準偏差。
percent: 該分量解釋的變異分數(0 到 1 之間的數值)。
cumulative: 由主要成分解釋的累積變異分數，直至該成分(0 到 1 之間的數值)。

細節

有關如何解釋各種整理矩陣的信息，請參閱 https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca。請注意，SVD 僅相當於中心數據上的 PCA。

也可以看看

stats::prcomp()、svd_tidiers

其他 svd 整理器：augment.prcomp()、tidy_irlba()、tidy_svd()

例子


pc <- prcomp(USArrests, scale = TRUE)

# information about rotation
tidy(pc)
#> # A tibble: 200 × 3
#>    row        PC  value
#>    <chr>   <dbl>  <dbl>
#>  1 Alabama     1 -0.976
#>  2 Alabama     2 -1.12 
#>  3 Alabama     3  0.440
#>  4 Alabama     4  0.155
#>  5 Alaska      1 -1.93 
#>  6 Alaska      2 -1.06 
#>  7 Alaska      3 -2.02 
#>  8 Alaska      4 -0.434
#>  9 Arizona     1 -1.75 
#> 10 Arizona     2  0.738
#> # ℹ 190 more rows

# information about samples (states)
tidy(pc, "samples")
#> # A tibble: 200 × 3
#>    row        PC  value
#>    <chr>   <dbl>  <dbl>
#>  1 Alabama     1 -0.976
#>  2 Alabama     2 -1.12 
#>  3 Alabama     3  0.440
#>  4 Alabama     4  0.155
#>  5 Alaska      1 -1.93 
#>  6 Alaska      2 -1.06 
#>  7 Alaska      3 -2.02 
#>  8 Alaska      4 -0.434
#>  9 Arizona     1 -1.75 
#> 10 Arizona     2  0.738
#> # ℹ 190 more rows

# information about PCs
tidy(pc, "pcs")
#> # A tibble: 4 × 4
#>      PC std.dev percent cumulative
#>   <dbl>   <dbl>   <dbl>      <dbl>
#> 1     1   1.57   0.620       0.620
#> 2     2   0.995  0.247       0.868
#> 3     3   0.597  0.0891      0.957
#> 4     4   0.416  0.0434      1    

# state map
library(dplyr)
library(ggplot2)
library(maps)

pc %>%
  tidy(matrix = "samples") %>%
  mutate(region = tolower(row)) %>%
  inner_join(map_data("state"), by = "region") %>%
  ggplot(aes(long, lat, group = group, fill = value)) +
  geom_polygon() +
  facet_wrap(~PC) +
  theme_void() +
  ggtitle("Principal components of arrest data")
#> Warning: Detected an unexpected many-to-many relationship between `x` and `y`.
#> ℹ Row 1 of `x` matches multiple rows in `y`.
#> ℹ Row 1 of `y` matches multiple rows in `x`.
#> ℹ If a many-to-many relationship is expected, set `relationship =
#>   "many-to-many"` to silence this warning.


au <- augment(pc, data = USArrests)

au
#> # A tibble: 50 × 9
#>    .rownames   Murder Assault UrbanPop  Rape .fittedPC1 .fittedPC2
#>    <chr>        <dbl>   <int>    <int> <dbl>      <dbl>      <dbl>
#>  1 Alabama       13.2     236       58  21.2    -0.976     -1.12  
#>  2 Alaska        10       263       48  44.5    -1.93      -1.06  
#>  3 Arizona        8.1     294       80  31      -1.75       0.738 
#>  4 Arkansas       8.8     190       50  19.5     0.140     -1.11  
#>  5 California     9       276       91  40.6    -2.50       1.53  
#>  6 Colorado       7.9     204       78  38.7    -1.50       0.978 
#>  7 Connecticut    3.3     110       77  11.1     1.34       1.08  
#>  8 Delaware       5.9     238       72  15.8    -0.0472     0.322 
#>  9 Florida       15.4     335       80  31.9    -2.98      -0.0388
#> 10 Georgia       17.4     211       60  25.8    -1.62      -1.27  
#> # ℹ 40 more rows
#> # ℹ 2 more variables: .fittedPC3 <dbl>, .fittedPC4 <dbl>

ggplot(au, aes(.fittedPC1, .fittedPC2)) +
  geom_point() +
  geom_text(aes(label = .rownames), vjust = 1, hjust = 1)

源代碼：R/stats-prcomp-tidiers.R

相關用法

注：本文由純淨天空篩選整理自等大神的英文原創作品 Tidy a(n) prcomp object。非經特殊聲明，原始代碼版權歸原作者所有，本譯文未經允許或授權，請勿轉載或複製。