

R dplyr filter: Keep rows that match a condition


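The filter() function is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA the row is dropped, unlike base subsetting with [.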
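The difference in NA handling is easy to miss, so here is a minimal sketch of it. The data frame `df` and its columns `id` and `score` are hypothetical, made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: one score is missing.
df <- data.frame(id = 1:4, score = c(10, NA, 30, 5))

# filter() drops the row where `score > 8` evaluates to NA.
kept_filter <- filter(df, score > 8)

# Base `[` keeps that row and fills it with NA values.
kept_base <- df[df$score > 8, ]

# To keep the NA row with filter(), ask for it explicitly.
kept_with_na <- filter(df, score > 8 | is.na(score))
```

Comparing `nrow(kept_filter)` with `nrow(kept_base)` shows that the base version carries an extra all-NA row.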

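Usage

filter(.data, ..., .by = NULL, .preserve = FALSE)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Expressions that return a logical value, and are defined in terms of the variables in .data. If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

.by

[Experimental]

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.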
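As a rough illustration of per-operation grouping, the sketch below keeps the highest-scoring row per team with .by and compares it with the group_by() route. The data frame `df` and its columns `team` and `points` are hypothetical, and the example assumes dplyr 1.1.0 or later, where .by is available.

```r
library(dplyr)

# Hypothetical toy data.
df <- data.frame(
  team = c("a", "a", "b", "b"),
  points = c(1, 9, 4, 6)
)

# Group by `team` for this call only; the result is not grouped.
top_by <- filter(df, points == max(points), .by = team)

# Roughly equivalent pipeline using persistent grouping.
top_grouped <- df %>%
  group_by(team) %>%
  filter(points == max(points)) %>%
  ungroup()
```

With .by there is no grouping to clean up afterwards, which is why the second pipeline needs an explicit ungroup().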

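.preserve

Relevant when the .data input is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data, otherwise the grouping is kept as is.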

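Value

An object of the same type as .data. The output has the following properties:

  • Rows are a subset of the input, but appear in the same order.

  • Columns are not modified.

  • The number of groups may be reduced (if .preserve is not TRUE).

  • Data frame attributes are preserved.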
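The effect of .preserve on the number of groups can be sketched as follows. The data frame `df` and its columns `g` and `x` are hypothetical; the filter removes every row of group "a".

```r
library(dplyr)

# Hypothetical toy data with two groups.
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, 30))
grouped <- group_by(df, g)

# Default (.preserve = FALSE): groups are recomputed from the remaining rows.
dropped <- filter(grouped, x > 10)

# .preserve = TRUE: the original grouping structure is kept as is.
kept <- filter(grouped, x > 10, .preserve = TRUE)

n_dropped <- n_groups(dropped)
n_kept <- n_groups(kept)
```

Both results contain the same single row; only the grouping metadata differs, which matters for later grouped operations.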

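Details

The filter() function is used to subset the rows of .data, applying the expressions in ... to the column values to determine which rows should be retained. It can be applied to both grouped and ungrouped data (see group_by() and ungroup()). However, dplyr is not yet smart enough to optimise the filtering operation on grouped datasets that do not need grouped calculations. For this reason, filtering is often considerably faster on ungrouped data.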
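One way to act on this advice, sketched under the assumption that the condition does not depend on the groups, is to ungroup first and restore the grouping afterwards. The data frame `df` and its columns `g` and `x` are hypothetical, and no timing is claimed for data this small.

```r
library(dplyr)

# Hypothetical toy data.
df <- data.frame(g = c("a", "a", "b", "b"), x = c(1, 5, 2, 8))
grouped <- group_by(df, g)

# `x > 3` is the same test in every group, so grouping is not needed.
via_groups <- filter(grouped, x > 3)

# Ungroup, filter, then regroup if later steps need the groups.
via_ungroup <- grouped %>%
  ungroup() %>%
  filter(x > 3) %>%
  group_by(g)
```

The two results hold the same rows. The shortcut is not valid for conditions such as `x > mean(x)`, whose value depends on the group.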

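Useful filter functions

There are many functions and operators that are useful when constructing the expressions used to filter the data, such as the comparison operators (==, >, >= and so on), the logical operators (&, |, !, xor()), is.na(), between(), and near().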
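A few of these helpers in action, as a minimal sketch. The data frame `df` and its columns `x` and `y` are hypothetical; `sqrt(2)^2` is used because it is not exactly equal to 2 in floating point.

```r
library(dplyr)

# Hypothetical toy data.
df <- data.frame(
  x = c(1, 2.5, NA, 4),
  y = c(sqrt(2)^2, 2, 3, 4)
)

# between() is inclusive on both ends; the NA row is dropped.
in_range <- filter(df, between(x, 2, 4))

# is.na() combined with ! keeps only rows where x is present.
not_missing <- filter(df, !is.na(x))

# near() compares with a tolerance, so it also matches sqrt(2)^2.
approx_two <- filter(df, near(y, 2))

# == is an exact comparison and misses sqrt(2)^2.
exact_two <- filter(df, y == 2)
```

near() is usually the safer choice when comparing computed floating point values.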

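Grouped tibbles

Because filtering expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped filtering: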

starwars %>% filter(mass > mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))

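In the ungrouped version, filter() compares the value of mass in each row to the global average (taken over the whole data set), keeping only the rows with mass greater than this global average. In contrast, the grouped version calculates the average mass separately for each gender group, and keeps rows with mass greater than the relevant within-gender average.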
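The same contrast can be checked by hand on a small table. The data frame `df` and its columns `g` and `mass` are hypothetical: the global mean of mass is 82.5, while the group means are 15 for "a" and 150 for "b".

```r
library(dplyr)

# Hypothetical toy data.
df <- data.frame(
  g = c("a", "a", "b", "b"),
  mass = c(10, 20, 100, 200)
)

# Compared against the global mean (82.5): keeps 100 and 200.
global <- filter(df, mass > mean(mass))

# Compared against each group's own mean (15 and 150): keeps 20 and 200.
within <- df %>%
  group_by(g) %>%
  filter(mass > mean(mass))
```

The row with mass 20 survives only in the grouped version, and the row with mass 100 only in the ungrouped one.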

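Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame, ts).

See also

Other single table verbs: arrange(), mutate(), reframe(), rename(), select(), slice(), summarise()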

Examples

# Filtering by one criterion
filter(starwars, species == "Human")
#> # A tibble: 35 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Luke Sky…    172    77 blond      fair       blue            19   male 
#>  2 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  3 Leia Org…    150    49 brown      light      brown           19   fema…
#>  4 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  5 Beru Whi…    165    75 brown      light      blue            47   fema…
#>  6 Biggs Da…    183    84 black      light      brown           24   male 
#>  7 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male 
#>  8 Anakin S…    188    84 blond      fair       blue            41.9 male 
#>  9 Wilhuff …    180    NA auburn, g… fair       blue            64   male 
#> 10 Han Solo     180    80 brown      fair       brown           29   male 
#> # ℹ 25 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
filter(starwars, mass > 1000)
#> # A tibble: 1 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Jabba Des…    175  1358 NA         green-tan… orange           600 herm…
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Filtering by multiple criteria within a single logical expression
filter(starwars, hair_color == "none" & eye_color == "black")
#> # A tibble: 9 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Nien Nunb     160    68 none       grey       black             NA male 
#> 2 Gasgano       122    NA none       white, bl… black             NA male 
#> 3 Kit Fisto     196    87 none       green      black             NA male 
#> 4 Plo Koon      188    80 none       orange     black             22 male 
#> 5 Lama Su       229    88 none       grey       black             NA male 
#> 6 Taun We       213    NA none       grey       black             NA fema…
#> 7 Shaak Ti      178    57 none       red, blue… black             NA fema…
#> 8 Tion Medon    206    80 none       grey       black             NA male 
#> 9 BB8            NA    NA none       none       black             NA none 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
filter(starwars, hair_color == "none" | eye_color == "black")
#> # A tibble: 38 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Greedo       173    74 NA         green      black           44   male 
#>  3 IG-88        200   140 none       metal      red             15   none 
#>  4 Bossk        190   113 none       green      red             53   male 
#>  5 Lobot        175    79 none       light      blue            37   male 
#>  6 Ackbar       180    83 none       brown mot… orange          41   male 
#>  7 Nien Nunb    160    68 none       grey       black           NA   male 
#>  8 Nute Gun…    191    90 none       mottled g… red             NA   male 
#>  9 Jar Jar …    196    66 none       orange     orange          52   male 
#> 10 Roos Tar…    224    82 none       grey       orange          NA   male 
#> # ℹ 28 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# When multiple expressions are used, they are combined using &
filter(starwars, hair_color == "none", eye_color == "black")
#> # A tibble: 9 × 14
#>   name       height  mass hair_color skin_color eye_color birth_year sex  
#>   <chr>       <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#> 1 Nien Nunb     160    68 none       grey       black             NA male 
#> 2 Gasgano       122    NA none       white, bl… black             NA male 
#> 3 Kit Fisto     196    87 none       green      black             NA male 
#> 4 Plo Koon      188    80 none       orange     black             22 male 
#> 5 Lama Su       229    88 none       grey       black             NA male 
#> 6 Taun We       213    NA none       grey       black             NA fema…
#> 7 Shaak Ti      178    57 none       red, blue… black             NA fema…
#> 8 Tion Medon    206    80 none       grey       black             NA male 
#> 9 BB8            NA    NA none       none       black             NA none 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# The filtering operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
#
# The following filters rows where `mass` is greater than the
# global average:
starwars %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 10 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  3 Chewbacca    228   112 brown      unknown    blue           200   male 
#>  4 Jabba De…    175  1358 NA         green-tan… orange         600   herm…
#>  5 Jek Tono…    180   110 brown      fair       blue            NA   male 
#>  6 IG-88        200   140 none       metal      red             15   none 
#>  7 Bossk        190   113 none       green      red             53   male 
#>  8 Dexter J…    198   102 none       brown      yellow          NA   male 
#>  9 Grievous     216   159 none       brown, wh… green, y…       NA   male 
#> 10 Tarfful      234   136 brown      brown      blue            NA   male 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

# Whereas this keeps rows with `mass` greater than the gender
# average:
starwars %>% group_by(gender) %>% filter(mass > mean(mass, na.rm = TRUE))
#> # A tibble: 14 × 14
#> # Groups:   gender [2]
#>    name     height   mass hair_color skin_color eye_color birth_year sex  
#>    <chr>     <int>  <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth V…    202  136   none       white      yellow          41.9 male 
#>  2 Owen La…    178  120   brown, gr… light      blue            52   male 
#>  3 Beru Wh…    165   75   brown      light      blue            47   fema…
#>  4 Chewbac…    228  112   brown      unknown    blue           200   male 
#>  5 Jabba D…    175 1358   NA         green-tan… orange         600   herm…
#>  6 Jek Ton…    180  110   brown      fair       blue            NA   male 
#>  7 IG-88       200  140   none       metal      red             15   none 
#>  8 Bossk       190  113   none       green      red             53   male 
#>  9 Ayla Se…    178   55   none       blue       hazel           48   fema…
#> 10 Luminar…    170   56.2 black      yellow     blue            58   fema…
#> 11 Zam Wes…    168   55   blonde     fair, gre… yellow          NA   fema…
#> 12 Shaak Ti    178   57   none       red, blue… black           NA   fema…
#> 13 Grievous    216  159   none       brown, wh… green, y…       NA   male 
#> 14 Tarfful     234  136   brown      brown      blue            NA   male 
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>


# To refer to column names that are stored as strings, use the `.data` pronoun:
vars <- c("mass", "height")
cond <- c(80, 150)
starwars %>%
  filter(
    .data[[vars[[1]]]] > cond[[1]],
    .data[[vars[[2]]]] > cond[[2]]
  )
#> # A tibble: 21 × 14
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  2 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  3 Biggs Da…    183    84 black      light      brown           24   male 
#>  4 Anakin S…    188    84 blond      fair       blue            41.9 male 
#>  5 Chewbacca    228   112 brown      unknown    blue           200   male 
#>  6 Jabba De…    175  1358 NA         green-tan… orange         600   herm…
#>  7 Jek Tono…    180   110 brown      fair       blue            NA   male 
#>  8 IG-88        200   140 none       metal      red             15   none 
#>  9 Bossk        190   113 none       green      red             53   male 
#> 10 Ackbar       180    83 none       brown mot… orange          41   male 
#> # ℹ 11 more rows
#> # ℹ 6 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>
# Learn more in ?rlang::args_data_masking
Source: R/filter.R



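Note: This article was selected and compiled by 純淨天空 from the original English work "Keep rows that match a condition" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.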