filter()
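The filter() function subsets a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row is dropped, unlike base subsetting with [.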
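The difference from base [ subsetting can be seen on a three-row data frame. This is a minimal sketch; the data frame df and its column x are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data: the middle value is missing
df <- data.frame(x = c(1, NA, 3))

# The condition evaluates to FALSE, NA, TRUE for the three rows
cond <- df$x > 1

# Base subsetting keeps a row for the NA index, filled with NA
base_res <- df[cond, , drop = FALSE]

# filter() treats an NA condition like FALSE and drops that row
dplyr_res <- filter(df, x > 1)

nrow(base_res)   # 2: an all-NA row plus the row with x = 3
nrow(dplyr_res)  # 1: only the row with x = 3
```

In base R, df[which(cond), , drop = FALSE] and subset(df, x > 1) also drop the NA row, because which() discards NA indices and subset() takes missing values as false.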
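Arguments
- .data: A data frame, data frame extension (e.g. a tibble), or lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
- ...: <data-masking> Expressions that return a logical value and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.
- .by: <tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
- .preserve: Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data; otherwise the grouping is kept as is.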
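The two grouping-related arguments, .by and .preserve, can be sketched on a small tibble. This assumes dplyr 1.1.0 or later (the release that introduced .by); the data and the column names g and x are hypothetical.

```r
library(dplyr)

# Hypothetical data with two groups
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

# .by: group by g for this call only; the result comes back ungrouped
by_res <- filter(df, x == max(x), .by = g)
by_res$x               # c(3, 20): the largest x within each group
is_grouped_df(by_res)  # FALSE

# .preserve: the condition below removes every row of group "a"
grouped <- group_by(df, g)
dropped <- filter(grouped, x > 5)                     # default .preserve = FALSE
kept    <- filter(grouped, x > 5, .preserve = TRUE)
n_groups(dropped)  # 1: grouping recalculated, the empty group "a" disappears
n_groups(kept)     # 2: grouping kept as is, group "a" remains with zero rows
```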
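Details
The filter() function is used to subset the rows of .data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data (see group_by() and ungroup()). However, dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.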
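When a condition only looks at the current row, grouping cannot change which rows are kept, so one option is to drop the grouping, filter, and restore the grouping afterwards. The sketch below uses hypothetical data and only checks that both routes give the same rows and groups; it makes no claim about how much faster the ungrouped route is on any particular data set or dplyr version.

```r
library(dplyr)

# Hypothetical grouped data
grouped <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20)) %>%
  group_by(g)

# x > 2 is a row-wise condition: no aggregate, lag, or rank involved,
# yet filtering the grouped data still evaluates it group by group.
grouped_res <- filter(grouped, x > 2)

# Same result via the ungrouped path, then re-grouped by the same column
regrouped_res <- grouped %>%
  ungroup() %>%
  filter(x > 2) %>%
  group_by(g)
```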
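Grouped tibbles
Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering: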
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
With the grouped equivalent:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
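In the ungrouped version, filter() compares the value of mass in each row to the global average (taken over the whole data set), keeping only the rows with mass greater than this global average. In contrast, the grouped version calculates the average mass separately for each gender group, and keeps rows with mass greater than the relevant within-gender average.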
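The same contrast can be checked by hand on a four-row toy tibble: the global mean of x is 8.5, the mean in group "a" is 2, and the mean in group "b" is 15. The data are hypothetical.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 3, 10, 20))

# Compared with the global mean (8.5): keeps x = 10 and x = 20
ungrouped_res <- df %>% filter(x > mean(x))

# Compared with each group's own mean (2 for "a", 15 for "b"): keeps x = 3 and x = 20
grouped_res <- df %>% group_by(g) %>% filter(x > mean(x))
```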
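Methods
This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, ts).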
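Because filter() is an S3 generic, code can add a method for its own class. The sketch below uses a hypothetical class my_tbl layered on top of data.frame and a hypothetical constructor new_my_tbl(); the method reports how many rows were dropped and then hands off to dplyr's data frame method with NextMethod(). It only illustrates S3 dispatch and is not how dbplyr or any other package implements its method. Inside a package, the method would normally be registered in the NAMESPACE file (for example with an S3method() directive) rather than defined at top level.

```r
library(dplyr)

# Hypothetical constructor: tag a data frame with an extra class
new_my_tbl <- function(df) {
  structure(df, class = c("my_tbl", class(df)))
}

# Hypothetical method for the filter() generic
filter.my_tbl <- function(.data, ..., .by = NULL, .preserve = FALSE) {
  out <- NextMethod()  # dispatches to dplyr's data.frame method
  message("filter.my_tbl: dropped ", nrow(.data) - nrow(out), " row(s)")
  out
}

x <- new_my_tbl(data.frame(a = 1:5))
res <- filter(x, a > 2)  # message: "filter.my_tbl: dropped 2 row(s)"
```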
Examples
library(dplyr)
# Filtering by one criterion
filter(starwars, species == "Human")
#> # A tibble: 35 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Luke Sky… 172 77 blond fair blue 19 male
#> 2 Darth Va… 202 136 none white yellow 41.9 male
#> 3 Leia Org… 150 49 brown light brown 19 fema…
#> 4 Owen Lars 178 120 brown, gr… light blue 52 male
#> 5 Beru Whi… 165 75 brown light blue 47 fema…
#> 6 Biggs Da… 183 84 black light brown 24 male
#> 7 Obi-Wan … 182 77 auburn, w… fair blue-gray 57 male
#> 8 Anakin S… 188 84 blond fair blue 41.9 male
#> 9 Wilhuff … 180 NA auburn, g… fair blue 64 male
#> 10 Han Solo 180 80 brown fair brown 29 male
#> # ℹ 25 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
filter(starwars, mass > 1000)
#> # A tibble: 1 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Jabba Des… 175 1358 NA green-tan… orange 600 herm…
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Filtering by multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male
#> 2 Gasgano 122 NA none white, bl… black NA male
#> 3 Kit Fisto 196 87 none green black NA male
#> 4 Plo Koon 188 80 none orange black 22 male
#> 5 Lama Su 229 88 none grey black NA male
#> 6 Taun We 213 NA none grey black NA fema…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema…
#> 8 Tion Medon 206 80 none grey black NA male
#> 9 BB8 NA NA none none black NA none
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 38 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Greedo 173 74 NA green black 44 male
#> 3 IG-88 200 140 none metal red 15 none
#> 4 Bossk 190 113 none green red 53 male
#> 5 Lobot 175 79 none light blue 37 male
#> 6 Ackbar 180 83 none brown mot… orange 41 male
#> 7 Nien Nunb 160 68 none grey black NA male
#> 8 Nute Gun… 191 90 none mottled g… red NA male
#> 9 Jar Jar … 196 66 none orange orange 52 male
#> 10 Roos Tar… 224 82 none grey orange NA male
#> # ℹ 28 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# When multiple expressions are used, they are combined using &
filter(starwars, hair_color == "none", eye_color == "black")
#> # A tibble: 9 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Nien Nunb 160 68 none grey black NA male
#> 2 Gasgano 122 NA none white, bl… black NA male
#> 3 Kit Fisto 196 87 none green black NA male
#> 4 Plo Koon 188 80 none orange black 22 male
#> 5 Lama Su 229 88 none grey black NA male
#> 6 Taun We 213 NA none grey black NA fema…
#> 7 Shaak Ti 178 57 none red, blue… black NA fema…
#> 8 Tion Medon 206 80 none grey black NA male
#> 9 BB8 NA NA none none black NA none
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Owen Lars 178 120 brown, gr… light blue 52 male
#> 3 Chewbacca 228 112 brown unknown blue 200 male
#> 4 Jabba De… 175 1358 NA green-tan… orange 600 herm…
#> 5 Jek Tono… 180 110 brown fair blue NA male
#> 6 IG-88 200 140 none metal red 15 none
#> 7 Bossk 190 113 none green red 53 male
#> 8 Dexter J… 198 102 none brown yellow NA male
#> 9 Grievous 216 159 none brown, wh… green, y… NA male
#> 10 Tarfful 234 136 brown brown blue NA male
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 14 × 14
#> # Groups: gender [2]
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth V… 202 136 none white yellow 41.9 male
#> 2 Owen La… 178 120 brown, gr… light blue 52 male
#> 3 Beru Wh… 165 75 brown light blue 47 fema…
#> 4 Chewbac… 228 112 brown unknown blue 200 male
#> 5 Jabba D… 175 1358 NA green-tan… orange 600 herm…
#> 6 Jek Ton… 180 110 brown fair blue NA male
#> 7 IG-88 200 140 none metal red 15 none
#> 8 Bossk 190 113 none green red 53 male
#> 9 Ayla Se… 178 55 none blue hazel 48 fema…
#> 10 Luminar… 170 56.2 black yellow blue 58 fema…
#> 11 Zam Wes… 168 55 blonde fair, gre… yellow NA fema…
#> 12 Shaak Ti 178 57 none red, blue… black NA fema…
#> 13 Grievous 216 159 none brown, wh… green, y… NA male
#> 14 Tarfful 234 136 brown brown blue NA male
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
#> # A tibble: 21 × 14
#> name height mass hair_color skin_color eye_color birth_year sex
#> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 Darth Va… 202 136 none white yellow 41.9 male
#> 2 Owen Lars 178 120 brown, gr… light blue 52 male
#> 3 Biggs Da… 183 84 black light brown 24 male
#> 4 Anakin S… 188 84 blond fair blue 41.9 male
#> 5 Chewbacca 228 112 brown unknown blue 200 male
#> 6 Jabba De… 175 1358 NA green-tan… orange 600 herm…
#> 7 Jek Tono… 180 110 brown fair blue NA male
#> 8 IG-88 200 140 none metal red 15 none
#> 9 Bossk 190 113 none green red 53 male
#> 10 Ackbar 180 83 none brown mot… orange 41 male
#> # ℹ 11 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> # films <list>, vehicles <list>, starships <list>
# Learn more in ?rlang::args_data_masking
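Besides the .data pronoun, a common way to pass a column into a function that calls filter() is rlang's embrace operator {{ }}, which forwards the caller's unquoted column name into the data-masked expression. A minimal sketch; filter_above() is a hypothetical helper, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper: keep rows where `col` exceeds `threshold`.
# {{ col }} forwards the caller's unquoted column name into filter();
# .env$threshold reads the function argument even if the data happens
# to contain a column called `threshold`.
filter_above <- function(data, col, threshold) {
  filter(data, {{ col }} > .env$threshold)
}

heavy <- filter_above(starwars, mass, 1000)
nrow(heavy)  # 1: same single row as filter(starwars, mass > 1000)
```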
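Note: This page was compiled by 純淨天空 from the original English work by Hadley Wickham et al., Keep rows that match a condition. Unless otherwise stated, copyright in the original code belongs to the original authors; this translation may not be reproduced or copied without permission.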