

R dplyr filter: Keep rows that match a condition


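The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for every condition. Note that when a condition evaluates to NA the row is dropped, unlike base subsetting with [.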
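The difference in NA handling can be seen on a small example. The sketch below uses a made-up data frame (`df`, `id`, and `score` are illustrative names, not part of the original page) and assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: one score is missing (NA).
df <- data.frame(id = 1:4, score = c(10, NA, 30, 5))

# filter() keeps only rows where the condition is TRUE;
# the row whose condition evaluates to NA is dropped.
kept_dplyr <- filter(df, score > 8)

# Base `[` returns an all-NA row wherever the condition is NA.
kept_base <- df[df$score > 8, ]

# To keep rows with missing values in filter(), ask for them explicitly.
kept_with_na <- filter(df, score > 8 | is.na(score))

nrow(kept_dplyr)    # 2
nrow(kept_base)     # 3 (includes one row of NAs)
nrow(kept_with_na)  # 3
```

Adding `| is.na(score)` is one way to opt back in to the missing rows; whether that is desirable depends on the analysis.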

Usage

filter(.data, ..., .by = NULL, .preserve = FALSE)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

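<data-masking> Expressions that return a logical value and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.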

.by

[Experimental]

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
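As a minimal sketch of per-operation grouping, the example below compares `.by` with the equivalent `group_by()` pipeline. The tibble and its column names are invented for illustration, and the code assumes a dplyr version that supports `.by` (1.1.0 or later).

```r
library(dplyr)

# Hypothetical toy data.
df <- tibble(
  team = c("a", "a", "b", "b"),
  points = c(1, 5, 10, 30)
)

# Per-operation grouping: each row is compared with its own team's mean.
by_result <- filter(df, points > mean(points), .by = team)

# The same filter written with group_by(), then explicitly ungrouped.
grouped_result <- df %>%
  group_by(team) %>%
  filter(points > mean(points)) %>%
  ungroup()

by_result$points          # 5 30
is_grouped_df(by_result)  # FALSE: .by does not leave the result grouped
```

With `.by` the grouping applies only to this one call, so no `ungroup()` is needed afterwards.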

.preserve

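Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data; otherwise the grouping is kept as is.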
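The effect of `.preserve` is easiest to see when a filter empties one group completely. The following is a small sketch on made-up data (`g` and `x` are illustrative names).

```r
library(dplyr)

# Hypothetical grouped data: group "b" has a single row with x = 3.
df <- tibble(g = c("a", "a", "b"), x = 1:3) %>% group_by(g)

# Default: groups are recomputed, so the now-empty group "b" disappears.
recomputed <- filter(df, x < 3)

# .preserve = TRUE keeps the original grouping structure, including "b".
preserved <- filter(df, x < 3, .preserve = TRUE)

n_groups(recomputed)  # 1
n_groups(preserved)   # 2
```

Both results contain the same two rows; only the recorded grouping structure differs.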

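Value

An object of the same type as .data. The output has the following properties:

  • Rows are a subset of the input, but appear in the same order.

  • Columns are not modified.

  • The number of groups may be reduced (if .preserve is not TRUE).

  • Data frame attributes are preserved.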
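These properties can be checked directly. The snippet below is only an illustrative sketch on an invented tibble; it verifies row order, column names, and the class of the result.

```r
library(dplyr)

# Hypothetical input whose ids are deliberately out of order.
df <- tibble(id = c(3L, 1L, 2L, 4L), x = c(10, 20, 30, 40))

out <- filter(df, x >= 20)

out$id                             # 1 2 4: same relative order as the input
identical(names(out), names(df))   # TRUE: columns are not modified
identical(class(out), class(df))   # TRUE: same type as the input
```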

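Details

The filter() function is used to subset the rows of .data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data (see group_by() and ungroup()). However, dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.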
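When a condition does not depend on the group (no aggregate, lag, or rank), one possible pattern is to drop the grouping before filtering. The sketch below uses invented data and only shows that the same rows come back; it makes no timing claims of its own.

```r
library(dplyr)

# Hypothetical grouped data.
df <- tibble(g = rep(c("a", "b"), each = 3), x = 1:6) %>% group_by(g)

# x > 2 is a row-wise comparison, so the grouping adds nothing here.
grouped_out <- filter(df, x > 2)

# Ungrouping first selects the same rows without grouped evaluation.
ungrouped_out <- df %>% ungroup() %>% filter(x > 2)

identical(grouped_out$x, ungrouped_out$x)  # TRUE
is_grouped_df(ungrouped_out)               # FALSE
```

Note that the ungrouped result no longer carries the grouping, so `group_by(g)` would have to be reapplied if later steps rely on it.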

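Useful filter functions

There are many functions and operators that are useful when constructing the expressions used to filter the data:

  • ==, >, >= etc

  • &, |, !, xor()

  • is.na()

  • between(), near()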
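As a rough illustration, the sketch below exercises a few helpers that often appear in filter conditions: between(), near(), is.na(), ! and xor(). The tibble is invented; the first value of `y` is computed as `sqrt(2)^2` so that it differs from 2 only by floating-point error.

```r
library(dplyr)

# Hypothetical toy data.
df <- tibble(
  x = c(1, 5, 10, NA),
  y = c(sqrt(2)^2, 2, 2, 2)
)

# between(x, 2, 10) is shorthand for x >= 2 & x <= 10 (bounds inclusive).
in_range <- filter(df, between(x, 2, 10))

# near() compares doubles with a tolerance, whereas == is exact.
exact <- filter(df, y == 2)
approx <- filter(df, near(y, 2))

# is.na() selects missing values explicitly; ! negates a condition.
missing_x <- filter(df, is.na(x))
present_x <- filter(df, !is.na(x))

# xor() keeps rows where exactly one of the two conditions is TRUE.
one_of <- filter(df, xor(x > 3, near(y, 2)))

nrow(exact)   # 3: sqrt(2)^2 is not exactly equal to 2
nrow(approx)  # 4
```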

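Grouped tibbles

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering: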

starwars %>% filter(mass > mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))

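In the ungrouped version, filter() compares the value of mass in each row to the global average (taken over the whole data set), keeping only the rows with mass greater than this global average. In contrast, the grouped version calculates the average mass separately for each gender group, and keeps rows with mass greater than the relevant within-gender average.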
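The same contrast can be worked through by hand on a tiny data set. The values below are made up so that the thresholds are easy to verify: the overall mean is (10 + 20 + 100 + 200) / 4 = 82.5, while the per-group means are 15 and 150.

```r
library(dplyr)

# Hypothetical toy data with two groups.
df <- tibble(
  g = c("a", "a", "b", "b"),
  mass = c(10, 20, 100, 200)
)

# Ungrouped: every row is compared with the overall mean, 82.5.
global <- filter(df, mass > mean(mass))

# Grouped: rows are compared with their own group's mean (15 or 150).
per_group <- df %>%
  group_by(g) %>%
  filter(mass > mean(mass))

global$mass     # 100 200
per_group$mass  # 20 200
```

The row with mass 20 survives only in the grouped version, and the row with mass 100 only in the ungrouped one.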

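Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, ts).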

See also

Other single table verbs: arrange(), mutate(), reframe(), rename(), select(), slice(), summarise()

Examples

# Filtering by one criterion
filter(starwars, species == "Human")
#> # A tibble: 35 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Luke Sky…    172    77 blond      fair       blue            19   male 
#>  2 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  3 Leia Org…    150    49 brown      light      brown           19   fema…
#>  4 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  5 Beru Whi…    165    75 brown      light      blue            47   fema…
#>  6 Biggs Da…    183    84 black      light      brown           24   male 
#>  7 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male 
#>  8 Anakin S…    188    84 blond      fair       blue            41.9 male 
#>  9 Wilhuff …    180    NA auburn, g… fair       blue            64   male 
#> 10 Han Solo     180    80 brown      fair       brown           29   male 
#> # ℹ 25 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
filter(starwars, mass > 1000)
#> # A tibble: 1 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Jabba Des…    175  1358 NA         green-tan… orange           600 herm…
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Filtering by multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Nien Nunb     160    68 none       grey       black             NA male 
#> 2 Gasgano       122    NA none       white, bl… black             NA male 
#> 3 Kit Fisto     196    87 none       green      black             NA male 
#> 4 Plo Koon      188    80 none       orange     black             22 male 
#> 5 Lama Su       229    88 none       grey       black             NA male 
#> 6 Taun We       213    NA none       grey       black             NA fema…
#> 7 Shaak Ti      178    57 none       red, blue… black             NA fema…
#> 8 Tion Medon    206    80 none       grey       black             NA male 
#> 9 BB8            NA    NA none       none       black             NA none 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 38 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Greedo       173    74 NA         green      black           44   male 
#>  3 IG-88        200   140 none       metal      red             15   none 
#>  4 Bossk        190   113 none       green      red             53   male 
#>  5 Lobot        175    79 none       light      blue            37   male 
#>  6 Ackbar       180    83 none       brown mot… orange          41   male 
#>  7 Nien Nunb    160    68 none       grey       black           NA   male 
#>  8 Nute Gun…    191    90 none       mottled g… red             NA   male 
#>  9 Jar Jar …    196    66 none       orange     orange          52   male 
#> 10 Roos Tar…    224    82 none       grey       orange          NA   male 
#> # ℹ 28 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# When multiple expressions are used, they are combined using &
filter(starwars, hair_color == "none", eye_color == "black")
#> # A tibble: 9 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Nien Nunb     160    68 none       grey       black             NA male 
#> 2 Gasgano       122    NA none       white, bl… black             NA male 
#> 3 Kit Fisto     196    87 none       green      black             NA male 
#> 4 Plo Koon      188    80 none       orange     black             22 male 
#> 5 Lama Su       229    88 none       grey       black             NA male 
#> 6 Taun We       213    NA none       grey       black             NA fema…
#> 7 Shaak Ti      178    57 none       red, blue… black             NA fema…
#> 8 Tion Medon    206    80 none       grey       black             NA male 
#> 9 BB8            NA    NA none       none       black             NA none 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  3 Chewbacca    228   112 brown      unknown    blue           200   male 
#>  4 Jabba De…    175  1358 NA         green-tan… orange         600   herm…
#>  5 Jek Tono…    180   110 brown      fair       blue            NA   male 
#>  6 IG-88        200   140 none       metal      red             15   none 
#>  7 Bossk        190   113 none       green      red             53   male 
#>  8 Dexter J…    198   102 none       brown      yellow          NA   male 
#>  9 Grievous     216   159 none       brown, wh… green, y…       NA   male 
#> 10 Tarfful      234   136 brown      brown      blue            NA   male 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 14 × 14
#> # Groups:   gender [2]
#>    name     height   mass hair_color skin_color eye_color birth_year sex  
#>    <chr>     <int>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth V…    202  136   none       white      yellow          41.9 male 
#>  2 Owen La…    178  120   brown, gr… light      blue            52   male 
#>  3 Beru Wh…    165   75   brown      light      blue            47   fema…
#>  4 Chewbac…    228  112   brown      unknown    blue           200   male 
#>  5 Jabba D…    175 1358   NA         green-tan… orange         600   herm…
#>  6 Jek Ton…    180  110   brown      fair       blue            NA   male 
#>  7 IG-88       200  140   none       metal      red             15   none 
#>  8 Bossk       190  113   none       green      red             53   male 
#>  9 Ayla Se…    178   55   none       blue       hazel           48   fema…
#> 10 Luminar…    170   56.2 black      yellow     blue            58   fema…
#> 11 Zam Wes…    168   55   blonde     fair, gre… yellow          NA   fema…
#> 12 Shaak Ti    178   57   none       red, blue… black           NA   fema…
#> 13 Grievous    216  159   none       brown, wh… green, y…       NA   male 
#> 14 Tarfful     234  136   brown      brown      blue            NA   male 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
#> # A tibble: 21 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  3 Biggs Da…    183    84 black      light      brown           24   male 
#>  4 Anakin S…    188    84 blond      fair       blue            41.9 male 
#>  5 Chewbacca    228   112 brown      unknown    blue           200   male 
#>  6 Jabba De…    175  1358 NA         green-tan… orange         600   herm…
#>  7 Jek Tono…    180   110 brown      fair       blue            NA   male 
#>  8 IG-88        200   140 none       metal      red             15   none 
#>  9 Bossk        190   113 none       green      red             53   male 
#> 10 Ackbar       180    83 none       brown mot… orange          41   male 
#> # ℹ 11 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
# Learn more in ?rlang::args_data_masking
Source: R/filter.R


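Note: This article was selected and compiled by 纯净天空 from the original English work "Keep rows that match a condition" by Hadley Wickham and others. Unless otherwise stated, copyright in the original code belongs to the original authors; do not reproduce or copy this translation without permission or authorization.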