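The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row is dropped, unlike base subsetting with [.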
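The difference in NA handling between filter() and base [ can be seen on a tiny data frame. This is only a minimal sketch: the data frame `df` and its column `x` are made up for illustration and do not come from the original documentation.

```r
library(dplyr)

# Illustrative data (hypothetical): one missing value in `x`
df <- data.frame(x = c(1, NA, 3))

# filter() drops the row where the condition evaluates to NA
kept_by_filter <- filter(df, x > 1)

# Base `[` keeps a row of NAs where the condition evaluates to NA
kept_by_bracket <- df[df$x > 1, , drop = FALSE]

# To keep rows with missing values, ask for them explicitly
kept_with_na <- filter(df, is.na(x) | x > 1)

nrow(kept_by_filter)
nrow(kept_by_bracket)
nrow(kept_with_na)
```

If rows with missing values should survive the filter, stating that in the condition itself (as in the last call) is usually clearer than relying on subsetting behaviour.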
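Arguments
- .data
  A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...
  <data-masking> Expressions that return a logical value, and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.
- .by
  <tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
- .preserve
  Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.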
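The .by and .preserve arguments are easier to follow on a small example. The sketch below uses a made-up tibble `df` with columns `g` and `x` (both hypothetical), and assumes a dplyr version that supports `.by` (1.1.0 or later).

```r
library(dplyr)

# Illustrative data (hypothetical): two groups of two rows each
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 30))

# Per-operation grouping with `.by`; the result is not grouped
by_result <- filter(df, x > mean(x), .by = g)

# The same rows via group_by(), which needs an explicit ungroup() afterwards
grouped_result <- df %>%
  group_by(g) %>%
  filter(x > mean(x)) %>%
  ungroup()

# `.preserve` only matters for grouped input: this condition empties group "a"
recalculated <- df %>% group_by(g) %>% filter(x > 5)
preserved <- df %>% group_by(g) %>% filter(x > 5, .preserve = TRUE)

n_groups(recalculated)
n_groups(preserved)
```

With the default `.preserve = FALSE` the emptied group disappears from the grouping structure, while `.preserve = TRUE` keeps it as a group with no rows.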
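Details
The filter() function is used to subset the rows of .data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data (see group_by() and ungroup()). However, dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.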
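When a condition does not depend on the grouping, dropping the groups before filtering gives the same rows. The following is a rough sketch with made-up data (`df`, `g`, and `x` are hypothetical); it illustrates the equivalence only and does not measure speed.

```r
library(dplyr)

# Illustrative data (hypothetical)
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 30))
grouped <- group_by(df, g)

# `x > 2` is evaluated row by row, so grouping cannot change which rows match
via_groups <- grouped %>% filter(x > 2) %>% ungroup()

# Dropping the grouping first avoids the per-group evaluation
via_ungroup <- grouped %>% ungroup() %>% filter(x > 2)

identical(via_groups$x, via_ungroup$x)
```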
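Grouped tibbles
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering: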
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
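In the ungrouped version, filter() compares the value of mass in each row to the global average (taken over the whole data set), keeping only the rows with mass greater than this global average. In contrast, the grouped version calculates the average mass separately for each gender group, and keeps rows with mass greater than the relevant within-gender average.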
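The same effect appears with ranking functions, not only with aggregates such as mean(). As a small sketch on made-up data (the tibble `scores` and its columns `team` and `score` are hypothetical), the filter below keeps the top-ranked row overall when ungrouped, and the top-ranked row of each team when grouped.

```r
library(dplyr)

# Illustrative data (hypothetical)
scores <- tibble(
  team = c("a", "a", "b", "b"),
  score = c(5, 9, 2, 4)
)

# Ungrouped: min_rank() ranks across the whole data set
top_overall <- scores %>%
  filter(min_rank(desc(score)) == 1)

# Grouped: min_rank() ranks within each team
top_per_team <- scores %>%
  group_by(team) %>%
  filter(min_rank(desc(score)) == 1) %>%
  ungroup()

top_overall$score
top_per_team$score
```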
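Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, ts).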
Examples
# Filtering by one criterion
filter(starwars, species == "Human")
#> # A tibble: 35 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Luke Sky… 172 77 blond fair blue 19 male
#> 2 Darth Va… 202 136 none white yellow 41.9 male
#> 3 Leia Org… 150 49 brown light brown 19 fema…
#> 4 Owen Lars 178 120 brown, gr… light blue 52 male
#> 5 Beru Whi… 165 75 brown light blue 47 fema…
#> 6 Biggs Da… 183 84 black light brown 24 male
#> 7 Obi-Wan … 182 77 auburn, w… fair blue-gray 57 male
#> 8 Anakin S… 188 84 blond fair blue 41.9 male
#> 9 Wilhuff … 180 NA auburn, g… fair blue 64 male
#> 10 Han Solo 180 80 brown fair brown 29 male
#> # ℹ 25 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
filter(starwars, mass > 1000)
#> # A tibble: 1 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Jabba Des… 175 1358 NA green-tan… orange 600 herm…
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Filtering by multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male
#> 2 Gasgano 122 NA none white, bl… black NA male
#> 3 Kit Fisto 196 87 none green black NA male
#> 4 Plo Koon 188 80 none orange black 22 male
#> 5 Lama Su 229 88 none grey black NA male
#> 6 Taun We 213 NA none grey black NA fema…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema…
#> 8 Tion Medon 206 80 none grey black NA male
#> 9 BB8 NA NA none none black NA none
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 38 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Greedo 173 74 NA green black 44 male
#> 3 IG-88 200 140 none metal red 15 none
#> 4 Bossk 190 113 none green red 53 male
#> 5 Lobot 175 79 none light blue 37 male
#> 6 Ackbar 180 83 none brown mot… orange 41 male
#> 7 Nien Nunb 160 68 none grey black NA male
#> 8 Nute Gun… 191 90 none mottled g… red NA male
#> 9 Jar Jar … 196 66 none orange orange 52 male
#> 10 Roos Tar… 224 82 none grey orange NA male
#> # ℹ 28 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# When multiple expressions are used, they are combined using &
filter(starwars, hair_color == "none", eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male
#> 2 Gasgano 122 NA none white, bl… black NA male
#> 3 Kit Fisto 196 87 none green black NA male
#> 4 Plo Koon 188 80 none orange black 22 male
#> 5 Lama Su 229 88 none grey black NA male
#> 6 Taun We 213 NA none grey black NA fema…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema…
#> 8 Tion Medon 206 80 none grey black NA male
#> 9 BB8 NA NA none none black NA none
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Owen Lars 178 120 brown, gr… light blue 52 male
#> 3 Chewbacca 228 112 brown unknown blue 200 male
#> 4 Jabba De… 175 1358 NA green-tan… orange 600 herm…
#> 5 Jek Tono… 180 110 brown fair blue NA male
#> 6 IG-88 200 140 none metal red 15 none
#> 7 Bossk 190 113 none green red 53 male
#> 8 Dexter J… 198 102 none brown yellow NA male
#> 9 Grievous 216 159 none brown, wh… green, y… NA male
#> 10 Tarfful 234 136 brown brown blue NA male
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 14 × 14
#> # Groups: gender [2]
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male
#> 2 Owen La… 178 120 brown, gr… light blue 52 male
#> 3 Beru Wh… 165 75 brown light blue 47 fema…
#> 4 Chewbac… 228 112 brown unknown blue 200 male
#> 5 Jabba D… 175 1358 NA green-tan… orange 600 herm…
#> 6 Jek Ton… 180 110 brown fair blue NA male
#> 7 IG-88 200 140 none metal red 15 none
#> 8 Bossk 190 113 none green red 53 male
#> 9 Ayla Se… 178 55 none blue hazel 48 fema…
#> 10 Luminar… 170 56.2 black yellow blue 58 fema…
#> 11 Zam Wes… 168 55 blonde fair, gre… yellow NA fema…
#> 12 Shaak Ti 178 57 none red, blue… black NA fema…
#> 13 Grievous 216 159 none brown, wh… green, y… NA male
#> 14 Tarfful 234 136 brown brown blue NA male
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
#> # A tibble: 21 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Owen Lars 178 120 brown, gr… light blue 52 male
#> 3 Biggs Da… 183 84 black light brown 24 male
#> 4 Anakin S… 188 84 blond fair blue 41.9 male
#> 5 Chewbacca 228 112 brown unknown blue 200 male
#> 6 Jabba De… 175 1358 NA green-tan… orange 600 herm…
#> 7 Jek Tono… 180 110 brown fair blue NA male
#> 8 IG-88 200 140 none metal red 15 none
#> 9 Bossk 190 113 none green red 53 male
#> 10 Ackbar 180 83 none brown mot… orange 41 male
#> # ℹ 11 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Learn more in ?rlang::args_data_masking
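The .data pronoun is also convenient inside a wrapper function that receives a column name as a string. The helper `filter_above()` below is hypothetical (it is not part of dplyr), and the tibble `df` is made up for illustration; treat it as a sketch of the pattern rather than a recommended API.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep rows where the column named
# by the string `column` is greater than `threshold`
filter_above <- function(data, column, threshold) {
  filter(data, .data[[column]] > threshold)
}

# Illustrative data (hypothetical)
df <- tibble(mass = c(70, 90, 120), height = c(160, 180, 200))

heavy <- filter_above(df, "mass", 80)
tall <- filter_above(df, "height", 190)

heavy$mass
tall$height
```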
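Note: this article was selected and compiled by 纯净天空 from the original English work "Keep rows that match a condition" by Hadley Wickham and others. Unless otherwise stated, copyright of the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.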