R dplyr select: Keep or drop columns using their names and types

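Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name (e.g. a:f selects all columns from a on the left to f on the right) or their type (e.g. where(is.numeric) selects all numeric columns).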

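Overview of selection features

Tidyverse selections implement a dialect of R in which operators make it easy to select variables:

: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.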

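In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.
last_col(): Selects the last variable, possibly with an offset.
group_cols(): Selects all grouping columns.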
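
As a minimal sketch of these three helpers, the snippet below uses a small made-up tibble; `df` and its column names are illustrative assumptions, not data from this page.

```r
library(dplyr)

# Hypothetical example data (not part of the original documentation)
df <- tibble(
  id  = 1:3,
  grp = c("a", "a", "b"),
  x   = c(1.5, 2.5, 3.5),
  y   = c(10, 20, 30)
)

# everything() matches all columns; listing `y` first moves it to the front
# and keeps the remaining columns in their original order.
front <- df %>% select(y, everything())
names(front)

# last_col() picks the last column; offset = 1 picks the one before it.
names(df %>% select(last_col()))
names(df %>% select(last_col(offset = 1)))

# group_cols() selects the grouping columns of a grouped data frame.
names(df %>% group_by(grp) %>% select(group_cols()))
```

A common use of everything() is reordering: name the columns you want first, then let everything() fill in the rest.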

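Other helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.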
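
The name-matching helpers can be compared side by side on one data frame. This is a sketch only; the tibble `df` and its column names are invented for illustration.

```r
library(dplyr)

# Hypothetical columns chosen so that each helper matches a different subset
df <- tibble(x01 = 1, x02 = 2, x03 = 3, x10 = 4, total_x = 5, note = "a")

by_prefix <- names(select(df, starts_with("x")))        # prefix "x"
by_suffix <- names(select(df, ends_with("_x")))         # suffix "_x"
by_substr <- names(select(df, contains("ot")))          # literal substring "ot"
by_regex  <- names(select(df, matches("^x0[12]$")))     # regular expression

# num_range() builds names from a prefix and a numeric range;
# width = 2 pads the numbers with zeros (x01, x02, x03).
by_range  <- names(select(df, num_range("x", 1:3, width = 2)))
```

Note that contains() treats its argument as a literal string, while matches() interprets it as a regular expression.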

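Or from variables stored in a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that do not exist.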
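
The difference between the two helpers shows up when a requested name is missing. The sketch below assumes a made-up tibble `df` and two hypothetical character vectors, `wanted` and `maybe`.

```r
library(dplyr)

# Hypothetical data and name vectors (not from the original page)
df <- tibble(a = 1:2, b = 3:4, c = 5:6)
wanted <- c("a", "c")      # both names exist in df
maybe  <- c("a", "zzz")    # "zzz" does not exist in df

strict_ok <- names(select(df, all_of(wanted)))  # all names present: works
lenient   <- names(select(df, any_of(maybe)))   # missing names are skipped

# all_of() is strict: a missing name raises an error, caught here
strict_missing <- tryCatch(
  names(select(df, all_of(maybe))),
  error = function(e) "error"
)
```

any_of() is convenient for dropping columns that may or may not be present, for example select(df, !any_of(maybe)).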

Or using a predicate function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
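
A short sketch of where() with built-in predicates and with a custom one follows. The tibble `df` and the threshold 1.8 are illustrative assumptions.

```r
library(dplyr)

# Hypothetical data with one column of each common type
df <- tibble(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(0.5, 1.5, 2.5),
  flag  = c(TRUE, FALSE, TRUE)
)

# Built-in predicate: keep the numeric columns
numeric_cols <- names(select(df, where(is.numeric)))

# where() calls can be combined with the selection operators
chr_or_lgl <- names(select(df, where(is.character) | where(is.logical)))

# Custom predicate: numeric columns whose mean exceeds 1.8.
# && short-circuits, so mean() is only computed for numeric columns.
high_mean <- names(select(df, where(function(col) is.numeric(col) && mean(col) > 1.8)))
```

The predicate must return a single TRUE or FALSE for each column.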

Usage

select(.data, ...)

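Arguments

.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <tidy-select> One or more unquoted expressions separated by commas. Variable names can be used as if they were positions in the data frame, so expressions like x:y can be used to select a range of variables.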
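
Because names behave like positions inside a selection, name ranges and integer ranges are interchangeable. The sketch below illustrates this on a made-up four-column tibble; `df` is an assumption for illustration only.

```r
library(dplyr)

# Hypothetical data: columns a, b, c, d sit at positions 1 to 4
df <- tibble(a = 1, b = 2, c = 3, d = 4)

by_name     <- names(select(df, b:d))    # range given by names
by_position <- names(select(df, 2:4))    # the same range given by positions
reversed    <- names(select(df, d:b))    # a descending range reverses the order
mixed       <- names(select(df, 1, c))   # positions and names can be mixed
```

Selecting by name is usually more robust than selecting by position, since positions change whenever columns are added or reordered.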

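Value

An object of the same type as .data. The output has the following properties:

Rows are not affected.
Output columns are a subset of input columns, potentially in a different order. Columns are renamed if the new_name = old_name form is used.
Data frame attributes are preserved.
Groups are maintained; you cannot drop grouping variables with select().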
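
Two of these properties, renaming and the preservation of groups, can be checked with a short sketch. The tibble `df` and the new column name `value` are hypothetical.

```r
library(dplyr)

# Hypothetical example data (not from the original page)
df <- tibble(grp = c("a", "a", "b"), x = 1:3, y = 4:6)

# new_name = old_name renames while selecting; the row count is unchanged
renamed <- select(df, value = x, grp)
names(renamed)
nrow(renamed) == nrow(df)

# Grouping variables cannot be dropped: select() adds `grp` back
# (with an informational message) even though only `y` was requested.
by_grp <- df %>% group_by(grp) %>% select(y)
names(by_grp)
group_vars(by_grp)
```

To actually drop a grouping column, call ungroup() before select().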

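Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).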

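Examples

Here we show the usage of the basic selection operators. See the specific help pages to learn about helpers like starts_with().

The selection language can be used in functions like dplyr::select() or tidyr::pivot_longer(). Let's first attach the tidyverse: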

library(tidyverse)

# For better printing
iris <- as_tibble(iris)

Select variables by name:

starwars %>% select(height)
#> # A tibble: 87 x 1
#>   height
#>    <int>
#> 1    172
#> 2    167
#> 3     96
#> 4    202
#> # i 83 more rows

iris %>% pivot_longer(Sepal.Length)
#> # A tibble: 150 x 6
#>   Sepal.Width Petal.Length Petal.Width Species name         value
#>         <dbl>        <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5          1.4         0.2 setosa  Sepal.Length   5.1
#> 2         3            1.4         0.2 setosa  Sepal.Length   4.9
#> 3         3.2          1.3         0.2 setosa  Sepal.Length   4.7
#> 4         3.1          1.5         0.2 setosa  Sepal.Length   4.6
#> # i 146 more rows

Select multiple variables by separating them with commas. Note how the order of the columns is determined by the order of the inputs:

starwars %>% select(homeworld, height, mass)
#> # A tibble: 87 x 3
#>   homeworld height  mass
#>   <chr>      <int> <dbl>
#> 1 Tatooine     172    77
#> 2 Tatooine     167    75
#> 3 Naboo         96    32
#> 4 Tatooine     202   136
#> # i 83 more rows

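Functions like tidyr::pivot_longer() do not take variables through dots (...); they take the selection as a single argument. In this case, use c() to select multiple variables: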

iris %>% pivot_longer(c(Sepal.Length, Petal.Length))
#> # A tibble: 300 x 5
#>   Sepal.Width Petal.Width Species name         value
#>         <dbl>       <dbl> <fct>   <chr>        <dbl>
#> 1         3.5         0.2 setosa  Sepal.Length   5.1
#> 2         3.5         0.2 setosa  Petal.Length   1.4
#> 3         3           0.2 setosa  Sepal.Length   4.9
#> 4         3           0.2 setosa  Petal.Length   1.4
#> # i 296 more rows

Operators:

The : operator selects a range of consecutive variables:

starwars %>% select(name:mass)
#> # A tibble: 87 x 3
#>   name           height  mass
#>   <chr>           <int> <dbl>
#> 1 Luke Skywalker    172    77
#> 2 C-3PO             167    75
#> 3 R2-D2              96    32
#> 4 Darth Vader       202   136
#> # i 83 more rows

The ! operator negates a selection:

starwars %>% select(!(name:mass))
#> # A tibble: 87 x 11
#>   hair_color skin_color  eye_color birth_year sex   gender    homeworld species
#>   <chr>      <chr>       <chr>          <dbl> <chr> <chr>     <chr>     <chr>  
#> 1 blond      fair        blue            19   male  masculine Tatooine  Human  
#> 2 <NA>       gold        yellow         112   none  masculine Tatooine  Droid  
#> 3 <NA>       white, blue red             33   none  masculine Naboo     Droid  
#> 4 none       white       yellow          41.9 male  masculine Tatooine  Human  
#> # i 83 more rows
#> # i 3 more variables: films <list>, vehicles <list>, starships <list>

iris %>% select(!c(Sepal.Length, Petal.Length))
#> # A tibble: 150 x 3
#>   Sepal.Width Petal.Width Species
#>         <dbl>       <dbl> <fct>  
#> 1         3.5         0.2 setosa 
#> 2         3           0.2 setosa 
#> 3         3.2         0.2 setosa 
#> 4         3.1         0.2 setosa 
#> # i 146 more rows

iris %>% select(!ends_with("Width"))
#> # A tibble: 150 x 3
#>   Sepal.Length Petal.Length Species
#>          <dbl>        <dbl> <fct>  
#> 1          5.1          1.4 setosa 
#> 2          4.9          1.4 setosa 
#> 3          4.7          1.3 setosa 
#> 4          4.6          1.5 setosa 
#> # i 146 more rows

& and | take the intersection or the union of two selections:

iris %>% select(starts_with("Petal") & ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Width
#>         <dbl>
#> 1         0.2
#> 2         0.2
#> 3         0.2
#> 4         0.2
#> # i 146 more rows

iris %>% select(starts_with("Petal") | ends_with("Width"))
#> # A tibble: 150 x 3
#>   Petal.Length Petal.Width Sepal.Width
#>          <dbl>       <dbl>       <dbl>
#> 1          1.4         0.2         3.5
#> 2          1.4         0.2         3  
#> 3          1.3         0.2         3.2
#> 4          1.5         0.2         3.1
#> # i 146 more rows

To take the difference between two selections, combine the & and ! operators:

iris %>% select(starts_with("Petal") & !ends_with("Width"))
#> # A tibble: 150 x 1
#>   Petal.Length
#>          <dbl>
#> 1          1.4
#> 2          1.4
#> 3          1.3
#> 4          1.5
#> # i 146 more rows

See also

Other single table verbs: arrange(), filter(), mutate(), reframe(), rename(), slice(), summarise()

Source: R/select.R

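Note: This article was compiled by 纯净天空 from the original English work "Keep or drop columns using their names and types" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.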