R dplyr distinct: Keep distinct/unique rows

Keep only unique/distinct rows from a data frame. This is similar to unique.data.frame() but considerably faster.
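As a rough illustration of the comparison with unique.data.frame(), the sketch below checks that both approaches keep the same rows. The data frame `toy` is made up for this example, and the results are compared column by column because the two objects may carry different row names.

```r
library(dplyr)

# Hypothetical toy data: rows 1 and 2 are exact duplicates
toy <- data.frame(
  x = c(1, 1, 2, 2),
  y = c("a", "a", "b", "c")
)

via_distinct <- distinct(toy)  # dplyr
via_unique <- unique(toy)      # base R, dispatches to unique.data.frame()

# Both keep the first occurrence of each row, in the original order
identical(via_distinct$x, via_unique$x)
identical(via_distinct$y, via_unique$y)
```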

Usage

distinct(.data, ..., .keep_all = FALSE)

Arguments

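.data: A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
...: <data-masking> Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, all variables in the data frame are used.
.keep_all: If TRUE, keep all variables in .data. If a combination of ... is not distinct, the first row of values is kept.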
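
The interaction between ... and .keep_all can be seen on a small, deterministic input. This is only a sketch; the tibble `visits` and its column names are invented for illustration.

```r
library(dplyr)

# Hypothetical data: id 1 appears twice with different values
visits <- tibble(
  id = c(1, 2, 1, 3),
  value = c("first", "second", "third", "fourth")
)

# By default only the variables named in `...` are returned
ids_only <- distinct(visits, id)
names(ids_only)
#> [1] "id"

# With .keep_all = TRUE the other columns are kept, taking the values
# from the first row seen for each distinct id
with_values <- distinct(visits, id, .keep_all = TRUE)
with_values$value
#> [1] "first"  "second" "fourth"
```

The row with value "third" is dropped because id 1 was already seen in the first row.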

Value

An object of the same type as .data. The output has the following properties:

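Rows are a subset of the input but appear in the same order.
Columns are not modified if ... is empty or .keep_all is TRUE. Otherwise, distinct() first calls mutate() to create new columns.
Groups are not modified.
Data frame attributes are preserved.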
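
A minimal sketch of the first three properties, using a made-up tibble `scores` (the names `team`, `points`, and `high` are assumptions for this example only):

```r
library(dplyr)

scores <- tibble(
  team = c("b", "a", "b", "c", "a"),
  points = c(3, 1, 3, 2, 5)
)

# Rows come out in order of first appearance; they are not sorted
first_seen <- distinct(scores, team)
first_seen$team
#> [1] "b" "a" "c"

# A computed expression becomes a new column, and only that column is kept
high_low <- distinct(scores, high = points > 2)
names(high_low)
#> [1] "high"

# Grouping survives the call, and the grouping variable is retained
by_team <- scores %>% group_by(team) %>% distinct(points)
group_vars(by_team)
#> [1] "team"
```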

Methods

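This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.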

The following methods are currently available in loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).

Examples

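library(dplyr)

# 100 rows of random integers from 1 to 10, so exact outputs vary between runs
df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)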
nrow(df)
#> [1] 100
nrow(distinct(df))
#> [1] 67
nrow(distinct(df, x, y))
#> [1] 67

distinct(df, x)
#> # A tibble: 10 × 1
#>        x
#>    <int>
#>  1    10
#>  2     5
#>  3     9
#>  4     7
#>  5     8
#>  6     6
#>  7     2
#>  8     3
#>  9     4
#> 10     1
distinct(df, y)
#> # A tibble: 10 × 1
#>        y
#>    <int>
#>  1     2
#>  2     8
#>  3     4
#>  4     6
#>  5    10
#>  6     7
#>  7     9
#>  8     3
#>  9     1
#> 10     5

# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1    10     2
#>  2     5     8
#>  3     9     4
#>  4     7     4
#>  5     8    10
#>  6     6     2
#>  7     2    10
#>  8     3     6
#>  9     4     3
#> 10     1     7
distinct(df, y, .keep_all = TRUE)
#> # A tibble: 10 × 2
#>        x     y
#>    <int> <int>
#>  1    10     2
#>  2     5     8
#>  3     9     4
#>  4    10     6
#>  5     8    10
#>  6    10     7
#>  7     9     9
#>  8     4     3
#>  9    10     1
#> 10     5     5

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))
#> # A tibble: 10 × 1
#>     diff
#>    <int>
#>  1     8
#>  2     3
#>  3     5
#>  4     4
#>  5     2
#>  6     0
#>  7     1
#>  8     9
#>  9     6
#> 10     7

# Use `pick()` to select columns with tidy-select
distinct(starwars, pick(contains("color")))
#> # A tibble: 67 × 3
#>    hair_color    skin_color  eye_color
#>    <chr>         <chr>       <chr>    
#>  1 blond         fair        blue     
#>  2 NA            gold        yellow   
#>  3 NA            white, blue red      
#>  4 none          white       yellow   
#>  5 brown         light       brown    
#>  6 brown, grey   light       blue     
#>  7 brown         light       blue     
#>  8 NA            white, red  red      
#>  9 black         light       brown    
#> 10 auburn, white fair        blue-gray
#> # ℹ 57 more rows

# Grouping -------------------------------------------------

df <- tibble(
  g = c(1, 1, 2, 2, 2),
  x = c(1, 1, 2, 1, 2),
  y = c(3, 2, 1, 3, 1)
)
df <- df %>% group_by(g)

# With grouped data frames, distinctness is computed within each group
df %>% distinct(x)
#> # A tibble: 3 × 2
#> # Groups:   g [2]
#>       g     x
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     2
#> 3     2     1

# When `...` are omitted, `distinct()` still computes distinctness using
# all variables in the data frame
df %>% distinct()
#> # A tibble: 4 × 3
#> # Groups:   g [2]
#>       g     x     y
#>   <dbl> <dbl> <dbl>
#> 1     1     1     3
#> 2     1     1     2
#> 3     2     2     1
#> 4     2     1     3

Source code: R/distinct.R

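Note: this article was selected and compiled by 纯净天空 from the original English work "Keep distinct/unique rows" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors. Please do not reproduce or copy this translation without permission or authorization.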