R dplyr distinct: Keep distinct/unique rows

Keep only unique/distinct rows from a data frame. This is similar to unique.data.frame() but considerably faster.
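
As a rough illustration of the comparison with base R, the sketch below applies both functions to a small made-up data frame and checks that they keep the same rows. The name `toy` and its values are assumptions chosen for illustration only.

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 are exact duplicates
toy <- data.frame(
  x = c(1, 1, 2),
  y = c("a", "a", "b")
)

base_result  <- unique(toy)    # base R method for data frames
dplyr_result <- distinct(toy)  # dplyr equivalent

# Both results keep two rows holding the same values
nrow(base_result)
nrow(dplyr_result)
identical(base_result$x, dplyr_result$x)
```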

Usage

distinct(.data, ..., .keep_all = FALSE)

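Arguments

.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row is preserved. If omitted, all variables in the data frame are used.
.keep_all: If TRUE, keep all variables in .data. If a combination of ... is not distinct, the first row of values is kept.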
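
A minimal sketch of how `...` and `.keep_all` interact, using a made-up tibble. The name `orders` and its columns are hypothetical and not part of dplyr.

```r
library(dplyr)

# Hypothetical data: two rows share id = 1
orders <- tibble(
  id   = c(1, 1, 2),
  item = c("a", "b", "c")
)

# Only `id` decides uniqueness; other columns are dropped by default
ids_only <- distinct(orders, id)

# With .keep_all = TRUE, the first row found for each id is kept in full,
# so id = 1 keeps item "a" rather than "b"
first_rows <- distinct(orders, id, .keep_all = TRUE)
```

Here `ids_only` has a single `id` column, while `first_rows` keeps both columns.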

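Value

An object of the same type as .data. The output has the following properties:

Rows are a subset of the input but appear in the same order.
Columns are not modified if ... is empty or .keep_all is TRUE. Otherwise, distinct() first calls mutate() to create new columns.
Groups are not modified.
Data frame attributes are preserved.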
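
The properties listed above can be checked on a small example. This is a sketch under assumed data: the tibble `scores` and its columns are invented for illustration.

```r
library(dplyr)

# Hypothetical data: team "b" appears before team "a"
scores <- tibble(
  team  = c("b", "a", "b", "a"),
  score = c(2, 1, 2, 3)
)

# Rows keep their input order, so "b" comes first
by_team <- distinct(scores, team)

# A computed expression creates a new column, as mutate() would
parity <- distinct(scores, odd = score %% 2 == 1)

# Grouping is retained in the output
grouped <- scores %>% group_by(team) %>% distinct(score)
group_vars(grouped)
```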

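Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).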
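
One way to inspect which methods are registered in the current session is base R's `methods()`. The result depends on which packages are loaded, so no output is shown here.

```r
library(dplyr)

# List the S3 methods currently registered for the distinct() generic
available <- methods("distinct")
print(available)
```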

Examples

library(dplyr)

# Two columns of 100 random integers drawn from 1:10 with replacement
df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
#> [1] 100
nrow(distinct(df))
#> [1] 67
nrow(distinct(df, x, y))
#> [1] 67

distinct(df, x)
#> # A tibble: 10 × 1
#>        x
#>    <int>
#>  1    10
#>  2     5
#>  3     9
#>  4     7
#>  5     8
#>  6     6
#>  7     2
#>  8     3
#>  9     4
#> 10     1
distinct(df, y)
#> # A tibble: 10 × 1
#>        y
#>    <int>
#>  1     2
#>  2     8
#>  3     4
#>  4     6
#>  5    10
#>  6     7
#>  7     9
#>  8     3
#>  9     1
#> 10     5

# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1    10     2
#>  2     5     8
#>  3     9     4
#>  4     7     4
#>  5     8    10
#>  6     6     2
#>  7     2    10
#>  8     3     6
#>  9     4     3
#> 10     1     7
distinct(df, y, .keep_all = TRUE)
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1    10     2
#>  2     5     8
#>  3     9     4
#>  4    10     6
#>  5     8    10
#>  6    10     7
#>  7     9     9
#>  8     4     3
#>  9    10     1
#> 10     5     5

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))
#> # A tibble: 10 × 1
#>     diff
#>    <int>
#>  1     8
#>  2     3
#>  3     5
#>  4     4
#>  5     2
#>  6     0
#>  7     1
#>  8     9
#>  9     6
#> 10     7

# Use `pick()` to select columns with tidy-select
distinct(starwars, pick(contains("color")))
#> # A tibble: 67 × 3
#>    hair_color    skin_color  eye_color
#>    <chr>         <chr>       <chr>    
#>  1 blond         fair        blue     
#>  2 NA            gold        yellow   
#>  3 NA            white, blue red      
#>  4 none          white       yellow   
#>  5 brown         light       brown    
#>  6 brown, grey   light       blue     
#>  7 brown         light       blue     
#>  8 NA            white, red  red      
#>  9 black         light       brown    
#> 10 auburn, white fair        blue-gray
#> # ℹ 57 more rows

# Grouping -------------------------------------------------

df <- tibble(
  g = c(1, 1, 2, 2, 2),
  x = c(1, 1, 2, 1, 2),
  y = c(3, 2, 1, 3, 1)
)
df <- df %>% group_by(g)

# With grouped data frames, distinctness is computed within each group
df %>% distinct(x)
#> # A tibble: 3 × 2
#> # Groups:   g [2]
#>       g     x
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     2
#> 3     2     1

# When `...` are omitted, `distinct()` still computes distinctness using
# all variables in the data frame
df %>% distinct()
#> # A tibble: 4 × 3
#> # Groups:   g [2]
#>       g     x     y
#>   <dbl> <dbl> <dbl>
#> 1     1     1     3
#> 2     1     1     2
#> 3     2     2     1
#> 4     2     1     3

Source code: R/distinct.R

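Note: This article was selected and compiled by 純淨天空 from the original English work "Keep distinct/unique rows" by Hadley Wickham et al. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reprint or copy this translation without permission or authorization.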