R dplyr mutate: Create, modify, and delete columns

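mutate() creates new columns that are functions of existing variables. It can also modify columns (if the name is the same as an existing column) and delete columns (by setting their value to NULL).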

Usage

mutate(.data, ...)

# S3 method for data.frame
mutate(
  .data,
  ...,
  .by = NULL,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

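...

<data-masking> Name-value pairs. The name gives the name of the column in the output.

The value can be:

A vector of length 1, which will be recycled to the correct length.
A vector the same length as the current group (or the whole data frame if ungrouped).
NULL, to remove the column.
A data frame or tibble, to create multiple columns in the output.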
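
As a minimal sketch of the value types listed above (the tibble `df` and the column names `source`, `total`, `double_x`, and `half_y` are made up for illustration), the following shows a recycled length-1 value, a full-length vector, and an unnamed tibble that is unpacked into several columns:

```r
library(dplyr)

# Hypothetical toy data, not part of the original documentation
df <- tibble(x = 1:3, y = c(10, 20, 30))

res <- df %>%
  mutate(
    # A length-1 value is recycled to every row
    source = "demo",
    # A vector as long as the data (3 rows here) is used as-is
    total = x + y,
    # An unnamed tibble is unpacked into one output column per tibble column
    tibble(double_x = x * 2, half_y = y / 2)
  )

res
```

The result should contain the original `x` and `y` followed by the four new columns in the order they were created.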

.by

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
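
A small sketch of per-operation grouping, assuming a dplyr version that supports `.by` (1.1.0 or later) and using a made-up tibble `df`. It compares `.by` with the `group_by()` plus `ungroup()` form:

```r
library(dplyr)

# Hypothetical toy data, not part of the original documentation
df <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10))

# Group only for this one mutate(); the result comes back ungrouped
by_result <- df %>%
  mutate(group_mean = mean(x), .by = g)

# The same computation with persistent grouping, removed afterwards
grouped_result <- df %>%
  group_by(g) %>%
  mutate(group_mean = mean(x)) %>%
  ungroup()

by_result
```

Both forms compute the mean of `x` within each value of `g`; the `.by` form simply avoids the explicit `ungroup()` step.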

.keep

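Control which columns from .data are retained in the output. Grouping columns and columns created by ... are always kept.

"all" retains all columns from .data. This is the default.
"used" retains only the columns used in ... to create new columns. This is useful for checking your work, as it displays inputs and outputs side by side.
"unused" retains only the columns not used in ... to create new columns. This is useful if you generate new columns but no longer need the columns used to generate them.
"none" doesn't retain any extra columns from .data. Only the grouping variables and columns created by ... are kept.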

.before, .after

<tidy-select> Optionally, control where new columns should appear (the default is to add to the right-hand side). See relocate() for more details.

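Value

An object of the same type as .data. The output has the following properties:

Columns from .data will be preserved according to the .keep argument.
Existing columns that are modified by ... will always be returned in their original location.
New columns created through ... will be placed according to the .before and .after arguments.
The number of rows is not affected.
Columns given the value NULL will be removed.
Groups will be recomputed if a grouping variable is mutated.
Data frame attributes are preserved.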
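
Several of these properties can be checked on a small grouped tibble. This is only a sketch with made-up data (`df`, `g`, `x`, `y` are illustrative names): it modifies one column in place, removes another, and changes the grouping variable.

```r
library(dplyr)

# Hypothetical toy data with three groups (g = 1, 2, 3)
df <- tibble(g = c(1, 2, 3), x = c(1, 2, 3), y = c(4, 5, 6)) %>%
  group_by(g)

res <- df %>%
  mutate(
    x = x * 100,  # modified in place: x keeps its original position
    y = NULL,     # a NULL value removes the column
    g = g %% 2    # grouping variable changes, so groups are recomputed
  )

names(res)     # remaining columns, in their original order
n_groups(res)  # number of groups after the grouping variable changed
nrow(res)      # the row count is unchanged
```

Here the three original groups collapse into two because `g %% 2` takes only the values 0 and 1.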

Useful mutate functions

+, -, log(), etc., for their usual mathematical meanings
lead(), lag()
dense_rank(), min_rank(), percent_rank(), row_number(), cume_dist(), ntile()
cumsum(), cummean(), cummin(), cummax(), cumany(), cumall()
na_if(), coalesce()
if_else(), recode(), case_when()
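
To illustrate how a few of these helpers combine inside a single mutate() call, here is a short sketch on made-up data (the tibble `df` and all column names are assumptions for illustration only):

```r
library(dplyr)

# Hypothetical toy data with one missing value
df <- tibble(day = 1:4, sales = c(10, NA, 15, 5))

res <- df %>%
  mutate(
    sales_filled = coalesce(sales, 0),          # replace NA with 0
    change = sales_filled - lag(sales_filled),  # difference from the previous row
    running_total = cumsum(sales_filled),       # cumulative sum
    rank = min_rank(desc(sales_filled)),        # 1 = largest value
    level = case_when(
      sales_filled >= 10 ~ "high",
      sales_filled > 0 ~ "low",
      TRUE ~ "none"                             # fallback for everything else
    )
  )

res
```

Because newly created columns are available immediately, `sales_filled` can be reused by the later expressions in the same call.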

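Grouped tibbles

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate: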

starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

The former normalises mass by the global average, whereas the latter normalises by the averages within species levels.

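Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).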

See also

Other single table verbs: arrange(), filter(), reframe(), rename(), select(), slice(), summarise()

Examples

# Newly created variables are available immediately
starwars %>%
  select(name, mass) %>%
  mutate(
    mass2 = mass * 2,
    mass2_squared = mass2 * mass2
  )
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows

# As well as adding new variables, you can use mutate() to
# remove variables and modify existing variables.
starwars %>%
  select(name, height, mass, homeworld) %>%
  mutate(
    mass = NULL,
    height = height * 0.0328084 # convert to feet
  )
#> # A tibble: 87 × 3
#>    name               height homeworld
#>    <chr>               <dbl> <chr>    
#>  1 Luke Skywalker       5.64 Tatooine 
#>  2 C-3PO                5.48 Tatooine 
#>  3 R2-D2                3.15 Naboo    
#>  4 Darth Vader          6.63 Tatooine 
#>  5 Leia Organa          4.92 Alderaan 
#>  6 Owen Lars            5.84 Tatooine 
#>  7 Beru Whitesun lars   5.41 Tatooine 
#>  8 R5-D4                3.18 Tatooine 
#>  9 Biggs Darklighter    6.00 Tatooine 
#> 10 Obi-Wan Kenobi       5.97 Stewjon  
#> # ℹ 77 more rows

# Use across() with mutate() to apply a transformation
# to multiple columns in a tibble.
starwars %>%
  select(name, homeworld, species) %>%
  mutate(across(!name, as.factor))
#> # A tibble: 87 × 3
#>    name               homeworld species
#>    <chr>              <fct>     <fct>  
#>  1 Luke Skywalker     Tatooine  Human  
#>  2 C-3PO              Tatooine  Droid  
#>  3 R2-D2              Naboo     Droid  
#>  4 Darth Vader        Tatooine  Human  
#>  5 Leia Organa        Alderaan  Human  
#>  6 Owen Lars          Tatooine  Human  
#>  7 Beru Whitesun lars Tatooine  Human  
#>  8 R5-D4              Tatooine  Droid  
#>  9 Biggs Darklighter  Tatooine  Human  
#> 10 Obi-Wan Kenobi     Stewjon   Human  
#> # ℹ 77 more rows
# see more in ?across

# Window functions are useful for grouped mutates:
starwars %>%
  select(name, mass, homeworld) %>%
  group_by(homeworld) %>%
  mutate(rank = min_rank(desc(mass)))
#> # A tibble: 87 × 4
#> # Groups:   homeworld [49]
#>    name                mass homeworld  rank
#>    <chr>              <dbl> <chr>     <int>
#>  1 Luke Skywalker        77 Tatooine      5
#>  2 C-3PO                 75 Tatooine      6
#>  3 R2-D2                 32 Naboo         6
#>  4 Darth Vader          136 Tatooine      1
#>  5 Leia Organa           49 Alderaan      2
#>  6 Owen Lars            120 Tatooine      2
#>  7 Beru Whitesun lars    75 Tatooine      6
#>  8 R5-D4                 32 Tatooine      8
#>  9 Biggs Darklighter     84 Tatooine      3
#> 10 Obi-Wan Kenobi        77 Stewjon       1
#> # ℹ 77 more rows
# see `vignette("window-functions")` for more details

# By default, new columns are placed on the far right.
df <- tibble(x = 1, y = 2)
df %>% mutate(z = x + y)
#> # A tibble: 1 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
df %>% mutate(z = x + y, .before = 1)
#> # A tibble: 1 × 3
#>       z     x     y
#>   <dbl> <dbl> <dbl>
#> 1     3     1     2
df %>% mutate(z = x + y, .after = x)
#> # A tibble: 1 × 3
#>       x     z     y
#>   <dbl> <dbl> <dbl>
#> 1     1     3     2

# By default, mutate() keeps all columns from the input data.
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df %>% mutate(z = x + y, .keep = "all") # the default
#> # A tibble: 1 × 5
#>       x     y a     b         z
#>   <dbl> <dbl> <chr> <chr> <dbl>
#> 1     1     2 a     b         3
df %>% mutate(z = x + y, .keep = "used")
#> # A tibble: 1 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
df %>% mutate(z = x + y, .keep = "unused")
#> # A tibble: 1 × 3
#>   a     b         z
#>   <chr> <chr> <dbl>
#> 1 a     b         3
df %>% mutate(z = x + y, .keep = "none")
#> # A tibble: 1 × 1
#>       z
#>   <dbl>
#> 1     3

# Grouping ----------------------------------------
# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 × 4
#>    name                mass species mass_norm
#>    <chr>              <dbl> <chr>       <dbl>
#>  1 Luke Skywalker        77 Human       0.791
#>  2 C-3PO                 75 Droid       0.771
#>  3 R2-D2                 32 Droid       0.329
#>  4 Darth Vader          136 Human       1.40 
#>  5 Leia Organa           49 Human       0.504
#>  6 Owen Lars            120 Human       1.23 
#>  7 Beru Whitesun lars    75 Human       0.771
#>  8 R5-D4                 32 Droid       0.329
#>  9 Biggs Darklighter     84 Human       0.863
#> 10 Obi-Wan Kenobi        77 Human       0.791
#> # ℹ 77 more rows

# Whereas this normalises `mass` by the averages within species
# levels:
starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 × 4
#> # Groups:   species [38]
#>    name                mass species mass_norm
#>    <chr>              <dbl> <chr>       <dbl>
#>  1 Luke Skywalker        77 Human       0.930
#>  2 C-3PO                 75 Droid       1.08 
#>  3 R2-D2                 32 Droid       0.459
#>  4 Darth Vader          136 Human       1.64 
#>  5 Leia Organa           49 Human       0.592
#>  6 Owen Lars            120 Human       1.45 
#>  7 Beru Whitesun lars    75 Human       0.906
#>  8 R5-D4                 32 Droid       0.459
#>  9 Biggs Darklighter     84 Human       1.01 
#> 10 Obi-Wan Kenobi        77 Human       0.930
#> # ℹ 77 more rows

# Indirection ----------------------------------------
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
#> # A tibble: 87 × 15
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Luke Sky…    172    77 blond      fair       blue            19   male 
#>  2 C-3PO        167    75 NA         gold       yellow         112   none 
#>  3 R2-D2         96    32 NA         white, bl… red             33   none 
#>  4 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  5 Leia Org…    150    49 brown      light      brown           19   fema…
#>  6 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  7 Beru Whi…    165    75 brown      light      blue            47   fema…
#>  8 R5-D4         97    32 NA         white, red red             NA   none 
#>  9 Biggs Da…    183    84 black      light      brown           24   male 
#> 10 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male 
#> # ℹ 77 more rows
#> # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>, prod <dbl>
# Learn more in ?rlang::args_data_masking

Source: R/mutate.R


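Note: This article was selected and compiled by 纯净天空 from the original English work "Create, modify, and delete columns" by Hadley Wickham and other contributors. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.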