R dplyr select: Keep or drop columns using their names and types

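Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right) or type (e.g. where(is.numeric) selects all numeric columns).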

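Overview of selection features

Tidyverse selections implement a dialect of R in which operators make it easy to select variables:

: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.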

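In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.
last_col(): Selects the last variable, possibly with an offset.
group_cols(): Selects all grouping columns.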
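As a rough illustration of these three helpers, the sketch below uses a small made-up tibble (`df` and its column names are hypothetical and not part of the original page). It assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- tibble(
  id  = 1:3,
  grp = c("a", "a", "b"),
  x   = c(1.5, 2.5, 3.5),
  y   = c(10, 20, 30)
)

# everything() after a named column moves that column to the front
front <- df %>% select(y, everything())        # columns: y, id, grp, x

# last_col() picks the final column; offset counts back from the end
last <- df %>% select(last_col())              # column: y
second_last <- df %>% select(last_col(offset = 1))  # column: x

# group_cols() picks the grouping columns of a grouped data frame
groups_only <- df %>% group_by(grp) %>% select(group_cols())  # column: grp
```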

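Other helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.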
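The pattern helpers can be compared side by side on one data frame. This is a minimal sketch: the tibble `df` and its column names are invented for the example, and the comments list the columns each call is expected to keep.

```r
library(dplyr)

# Hypothetical one-row data frame with patterned column names
df <- tibble(x01 = 1, x02 = 2, x03 = 3, y_total = 4, total_z = 5)

by_prefix  <- names(select(df, starts_with("x")))     # x01, x02, x03
by_suffix  <- names(select(df, ends_with("total")))   # y_total
by_literal <- names(select(df, contains("total")))    # y_total, total_z
by_regex   <- names(select(df, matches("^x0[12]$")))  # x01, x02

# width = 2 zero-pads the numbers, so 1:2 is matched as "01" and "02"
by_range   <- names(select(df, num_range("x", 1:2, width = 2)))  # x01, x02
```

Note that contains() treats its argument as a literal string, while matches() interprets it as a regular expression.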

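Or from variables stored in a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.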
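The difference between the two is easiest to see with a vector that contains a name absent from the data. The sketch below uses the built-in iris data; the vector `vars` and the name "not_a_column" are made up for the example.

```r
library(dplyr)

# "not_a_column" is deliberately missing from iris
vars <- c("Sepal.Length", "Species", "not_a_column")

# any_of() silently skips the missing name
loose <- iris %>% select(any_of(vars))   # columns: Sepal.Length, Species

# all_of() is strict: the missing name raises an error,
# which is caught here and replaced by a marker string
strict <- tryCatch(
  iris %>% select(all_of(vars)),
  error = function(e) "error"
)
```

any_of() is convenient when dropping columns that may or may not exist, for example select(!any_of(vars)).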

Or using a predicate function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
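A short sketch of where() with a built-in predicate and with a custom one, using the built-in iris data. The threshold of 5 in the second call is arbitrary and chosen only for illustration.

```r
library(dplyr)

# Built-in predicate: keep the numeric columns, dropping the factor Species
numeric_cols <- iris %>% select(where(is.numeric))

# Custom predicate: numeric columns whose maximum exceeds 5
# (the threshold is an arbitrary choice for this example)
large_cols <- iris %>%
  select(where(function(col) is.numeric(col) && max(col) > 5))
```

The predicate must return a single TRUE or FALSE for each column, which is why the custom function checks is.numeric() before calling max().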

Usage

select(.data, ...)

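Arguments

.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.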
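Because names behave like positions, a range written with names and the same range written with integers should select the same columns. A minimal check on the built-in iris data:

```r
library(dplyr)

# Range by name: from Sepal.Width through Petal.Width
by_name <- iris %>% select(Sepal.Width:Petal.Width)

# The same range by position: columns 2 through 4
by_pos <- iris %>% select(2:4)

same_columns <- identical(names(by_name), names(by_pos))
```

Selecting by name is usually the more robust choice, since positions change whenever columns are added or reordered.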

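Value

An object of the same type as .data. The output has the following properties:

Rows are not affected.
Output columns are a subset of input columns, potentially with a different order. Columns will be renamed if the new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you can't select off grouping variables.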
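Two of these properties, renaming and the preservation of grouping variables, can be sketched on the built-in iris data. The new names `species` and `sepal_len` are invented for the example.

```r
library(dplyr)

# new_name = old_name renames while selecting; rows are untouched
renamed <- iris %>% select(species = Species, sepal_len = Sepal.Length)

# Grouping variables cannot be dropped: Species is added back
# (dplyr prints a message saying it is adding the missing grouping variable)
grouped <- iris %>% group_by(Species) %>% select(Petal.Length)
```

To drop a grouping column, call ungroup() before select().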

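Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).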

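Examples

Here we show the usage of the basic selection operators. See the specific help pages to learn about helpers like starts_with().

The selection language can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse: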

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

Select variables by name:

starwars %>% select(height)
#> # A tibble: 87 x 1
#>   height
#>    <int>
#> 1    172
#> 2    167
#> 3     96
#> 4    202
#> # i 83 more rows

iris %>% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#>   Sepal.Width Petal.Length Petal.Width Species name         value
#>         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
#> 2         3            1.4         0.2 setosa  Sepal.Length   4.9
#> 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
#> 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
#> # i 146 more rows

Select multiple variables by separating them with commas. Note how the order of columns is determined by the order of inputs:

starwars %>% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # i 83 more rows

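Functions like tidyr::pivot_longer() don't take their selection through dots (...), so several comma-separated variables cannot be passed directly. In this case use c() to select multiple variables: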

iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # i 296 more rows

Operators:

The : operator selects a range of consecutive variables:

starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # i 83 more rows

The ! operator negates a selection:

starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_color  eye_color birth_year sex   gender    homeworld species
#>   <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>  
#> 1 blond      fair        blue            19   male  masculine Tatooine  Human  
#> 2 <NA>       gold        yellow         112   none  masculine Tatooine  Droid  
#> 3 <NA>       white, blue red             33   none  masculine Naboo     Droid  
#> 4 none       white       yellow          41.9 male  masculine Tatooine  Human  
#> # i 83 more rows
#> # i 3 more variables: films <list>, vehicles <list>, starships <list>

iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>  
#> 1         3.5         0.2 setosa 
#> 2         3           0.2 setosa 
#> 3         3.2         0.2 setosa 
#> 4         3.1         0.2 setosa 
#> # i 146 more rows

iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>  
#> 1          5.1          1.4 setosa 
#> 2          4.9          1.4 setosa 
#> 3          4.7          1.3 setosa 
#> 4          4.6          1.5 setosa 
#> # i 146 more rows

& and | take the intersection or the union of two selections:

iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # i 146 more rows

iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3  
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # i 146 more rows

To take the difference between two selections, combine the & and ! operators:

iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # i 146 more rows

See also

Other single table verbs: arrange(), filter(), mutate(), reframe(), rename(), slice(), summarise()

Source code: R/select.R


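Note: This article was compiled by 純淨天空 from the original English work by Hadley Wickham and others, "Keep or drop columns using their names and types". Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission.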