R dplyr mutate: Create, modify, and delete columns


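mutate() creates new columns that are functions of existing variables. It can also modify columns (if the name is the same as an existing column) and delete columns (by setting their value to NULL).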

Usage

mutate(.data, ...)

# S3 method for data.frame
mutate(
  .data,
  ...,
  .by = NULL,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

Arguments

.data

A data frame, a data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

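<data-masking> Name-value pairs. The name gives the name of the column in the output.

The value can be:

  • A vector of length 1, which will be recycled to the correct length.

  • A vector the same length as the current group (or the whole data frame if ungrouped).

  • NULL, to remove the column.

  • A data frame or tibble, to create multiple columns in the output.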
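
The four kinds of value listed above can be sketched together in one call. This is a minimal illustration on a made-up tibble; the names `df`, `out`, `one`, `doubled`, `lo`, and `hi` are hypothetical and not part of the dplyr documentation.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

out <- df %>%
  mutate(
    one = 0,                        # length-1 vector, recycled to every row
    doubled = x * 2,                # vector as long as the data (or current group)
    g = NULL,                       # NULL removes the column
    tibble(lo = x - 1, hi = x + 1)  # unnamed data frame: adds `lo` and `hi`
  )

names(out)
```

Passing the data frame without a name is what makes its columns appear as separate output columns.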

.by

[Experimental]

<tidy-select> Optionally, a selection of columns to group by for just this operation, functioning as an alternative to group_by(). For details and examples, see ?dplyr_by.
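
As a rough sketch of how `.by` compares with `group_by()`, assuming a dplyr version that supports `.by` (1.1.0 or later); the data and the names `df`, `by_arg`, and `by_verb` are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(g = c("a", "a", "b"), x = c(1, 3, 10))

# Per-operation grouping: the grouping applies to this call only,
# so the result is not a grouped tibble
by_arg <- df %>% mutate(x_mean = mean(x), .by = g)

# The same computation with group_by(), followed by an explicit ungroup()
by_verb <- df %>%
  group_by(g) %>%
  mutate(x_mean = mean(x)) %>%
  ungroup()
```

Both versions compute the mean of `x` within each level of `g`; the `.by` form simply avoids the separate grouping and ungrouping steps.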

.keep

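Controls which columns from .data are retained in the output. Grouping columns and columns created by ... are always kept.

  • "all" retains all columns from .data. This is the default.

  • "used" retains only the columns used in ... to create new columns. This is useful for checking your work, as it displays inputs and outputs side by side.

  • "unused" retains only the columns not used in ... to create new columns. This is useful if you generate new columns but no longer need the columns used to generate them.

  • "none" doesn't retain any extra columns from .data. Only the grouping variables and columns created by ... are kept.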

.before, .after

<tidy-select> Optionally, control where new columns should appear (the default is to add them to the right-hand side). See relocate() for more details.

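Value

An object of the same type as .data. The output has the following properties:

  • Columns from .data will be preserved according to the .keep argument.

  • Existing columns that are modified by ... will always be returned in their original location.

  • New columns created through ... will be placed according to the .before and .after arguments.

  • The number of rows is not affected.

  • Columns given the value NULL will be removed.

  • Groups will be recomputed if a grouping variable is mutated.

  • Data frame attributes are preserved.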
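
A few of these properties can be checked on a small example. This is only a sketch on made-up data; `df`, `out`, and `regrouped` are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(g = c("a", "b", "c"), x = 1:3, y = 4:6)

# `x` is modified, so it stays in its original position;
# `z` is new, so it is added on the right by default
out <- df %>% mutate(x = x * 10, z = x + y)

# Mutating the grouping variable causes the groups to be recomputed:
# three groups before the mutate, two afterwards
regrouped <- df %>%
  group_by(g) %>%
  mutate(g = if_else(g == "c", "b", g))

names(out)
nrow(out) == nrow(df)
n_groups(regrouped)
```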

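Grouped tibbles

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate: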

starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

The former normalises mass by the global average, whereas the latter normalises it by the averages within species levels.
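
The same effect appears with lagging functions. The following sketch uses made-up data (`df`, `ungrouped`, and `grouped` are hypothetical names) to show lag() restarting at each group boundary:

```r
library(dplyr)

# Hypothetical toy data
df <- tibble(g = c("a", "a", "b", "b"), x = c(1, 2, 10, 20))

# Ungrouped: lag() runs over the whole column,
# so the first "b" row picks up the last value from group "a"
ungrouped <- df %>% mutate(prev = lag(x))

# Grouped: lag() is evaluated within each group,
# so the first row of every group gets NA
grouped <- df %>%
  group_by(g) %>%
  mutate(prev = lag(x)) %>%
  ungroup()
```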

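Methods

This function is a generic, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages: dbplyr (tbl_lazy), dplyr (data.frame).

See also

Other single table verbs: arrange(), filter(), reframe(), rename(), select(), slice(), summarise()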

Examples

library(dplyr)

# Newly created variables are available immediately
starwars %>%
  select(name, mass) %>%
  mutate(
    mass2 = mass * 2,
    mass2_squared = mass2 * mass2
  )
#> # A tibble: 87 × 4
#>    name                mass mass2 mass2_squared
#>    <chr>              <dbl> <dbl>         <dbl>
#>  1 Luke Skywalker        77   154         23716
#>  2 C-3PO                 75   150         22500
#>  3 R2-D2                 32    64          4096
#>  4 Darth Vader          136   272         73984
#>  5 Leia Organa           49    98          9604
#>  6 Owen Lars            120   240         57600
#>  7 Beru Whitesun lars    75   150         22500
#>  8 R5-D4                 32    64          4096
#>  9 Biggs Darklighter     84   168         28224
#> 10 Obi-Wan Kenobi        77   154         23716
#> # ℹ 77 more rows

# As well as adding new variables, you can use mutate() to
# remove variables and modify existing variables.
starwars %>%
  select(name, height, mass, homeworld) %>%
  mutate(
    mass = NULL,
    height = height * 0.0328084 # convert to feet
  )
#> # A tibble: 87 × 3
#>    name               height homeworld
#>    <chr>               <dbl> <chr>    
#>  1 Luke Skywalker       5.64 Tatooine 
#>  2 C-3PO                5.48 Tatooine 
#>  3 R2-D2                3.15 Naboo    
#>  4 Darth Vader          6.63 Tatooine 
#>  5 Leia Organa          4.92 Alderaan 
#>  6 Owen Lars            5.84 Tatooine 
#>  7 Beru Whitesun lars   5.41 Tatooine 
#>  8 R5-D4                3.18 Tatooine 
#>  9 Biggs Darklighter    6.00 Tatooine 
#> 10 Obi-Wan Kenobi       5.97 Stewjon  
#> # ℹ 77 more rows

# Use across() with mutate() to apply a transformation
# to multiple columns in a tibble.
starwars %>%
  select(name, homeworld, species) %>%
  mutate(across(!name, as.factor))
#> # A tibble: 87 × 3
#>    name               homeworld species
#>    <chr>              <fct>     <fct>  
#>  1 Luke Skywalker     Tatooine  Human  
#>  2 C-3PO              Tatooine  Droid  
#>  3 R2-D2              Naboo     Droid  
#>  4 Darth Vader        Tatooine  Human  
#>  5 Leia Organa        Alderaan  Human  
#>  6 Owen Lars          Tatooine  Human  
#>  7 Beru Whitesun lars Tatooine  Human  
#>  8 R5-D4              Tatooine  Droid  
#>  9 Biggs Darklighter  Tatooine  Human  
#> 10 Obi-Wan Kenobi     Stewjon   Human  
#> # ℹ 77 more rows
# see more in ?across

# Window functions are useful for grouped mutates:
starwars %>%
  select(name, mass, homeworld) %>%
  group_by(homeworld) %>%
  mutate(rank = min_rank(desc(mass)))
#> # A tibble: 87 × 4
#> # Groups:   homeworld [49]
#>    name                mass homeworld  rank
#>    <chr>              <dbl> <chr>     <int>
#>  1 Luke Skywalker        77 Tatooine      5
#>  2 C-3PO                 75 Tatooine      6
#>  3 R2-D2                 32 Naboo         6
#>  4 Darth Vader          136 Tatooine      1
#>  5 Leia Organa           49 Alderaan      2
#>  6 Owen Lars            120 Tatooine      2
#>  7 Beru Whitesun lars    75 Tatooine      6
#>  8 R5-D4                 32 Tatooine      8
#>  9 Biggs Darklighter     84 Tatooine      3
#> 10 Obi-Wan Kenobi        77 Stewjon       1
#> # ℹ 77 more rows
# see `vignette("window-functions")` for more details

# By default, new columns are placed on the far right.
df <- tibble(x = 1, y = 2)
df %>% mutate(z = x + y)
#> # A tibble: 1 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
df %>% mutate(z = x + y, .before = 1)
#> # A tibble: 1 × 3
#>       z     x     y
#>   <dbl> <dbl> <dbl>
#> 1     3     1     2
df %>% mutate(z = x + y, .after = x)
#> # A tibble: 1 × 3
#>       x     z     y
#>   <dbl> <dbl> <dbl>
#> 1     1     3     2

# By default, mutate() keeps all columns from the input data.
df <- tibble(x = 1, y = 2, a = "a", b = "b")
df %>% mutate(z = x + y, .keep = "all") # the default
#> # A tibble: 1 × 5
#>       x     y a     b         z
#>   <dbl> <dbl> <chr> <chr> <dbl>
#> 1     1     2 a     b         3
df %>% mutate(z = x + y, .keep = "used")
#> # A tibble: 1 × 3
#>       x     y     z
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
df %>% mutate(z = x + y, .keep = "unused")
#> # A tibble: 1 × 3
#>   a     b         z
#>   <chr> <chr> <dbl>
#> 1 a     b         3
df %>% mutate(z = x + y, .keep = "none")
#> # A tibble: 1 × 1
#>       z
#>   <dbl>
#> 1     3

# Grouping ----------------------------------------
# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 × 4
#>    name                mass species mass_norm
#>    <chr>              <dbl> <chr>       <dbl>
#>  1 Luke Skywalker        77 Human       0.791
#>  2 C-3PO                 75 Droid       0.771
#>  3 R2-D2                 32 Droid       0.329
#>  4 Darth Vader          136 Human       1.40 
#>  5 Leia Organa           49 Human       0.504
#>  6 Owen Lars            120 Human       1.23 
#>  7 Beru Whitesun lars    75 Human       0.771
#>  8 R5-D4                 32 Droid       0.329
#>  9 Biggs Darklighter     84 Human       0.863
#> 10 Obi-Wan Kenobi        77 Human       0.791
#> # ℹ 77 more rows

# Whereas this normalises `mass` by the averages within species
# levels:
starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 × 4
#> # Groups:   species [38]
#>    name                mass species mass_norm
#>    <chr>              <dbl> <chr>       <dbl>
#>  1 Luke Skywalker        77 Human       0.930
#>  2 C-3PO                 75 Droid       1.08 
#>  3 R2-D2                 32 Droid       0.459
#>  4 Darth Vader          136 Human       1.64 
#>  5 Leia Organa           49 Human       0.592
#>  6 Owen Lars            120 Human       1.45 
#>  7 Beru Whitesun lars    75 Human       0.906
#>  8 R5-D4                 32 Droid       0.459
#>  9 Biggs Darklighter     84 Human       1.01 
#> 10 Obi-Wan Kenobi        77 Human       0.930
#> # ℹ 77 more rows

# Indirection ----------------------------------------
# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
#> # A tibble: 87 × 15
#>    name      height  mass hair_color skin_color eye_color birth_year sex  
#>    <chr>      <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
#>  1 Luke Sky…    172    77 blond      fair       blue            19   male 
#>  2 C-3PO        167    75 NA         gold       yellow         112   none 
#>  3 R2-D2         96    32 NA         white, bl… red             33   none 
#>  4 Darth Va…    202   136 none       white      yellow          41.9 male 
#>  5 Leia Org…    150    49 brown      light      brown           19   fema…
#>  6 Owen Lars    178   120 brown, gr… light      blue            52   male 
#>  7 Beru Whi…    165    75 brown      light      blue            47   fema…
#>  8 R5-D4         97    32 NA         white, red red             NA   none 
#>  9 Biggs Da…    183    84 black      light      brown           24   male 
#> 10 Obi-Wan …    182    77 auburn, w… fair       blue-gray       57   male 
#> # ℹ 77 more rows
#> # ℹ 7 more variables: gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>, prod <dbl>
# Learn more in ?rlang::args_data_masking

Source code: R/mutate.R

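Note: This article was selected and compiled by 純淨天空 from the original English work "Create, modify, and delete columns" by Hadley Wickham et al. Unless otherwise stated, copyright in the original code belongs to the original authors; please do not reproduce or copy this translation without permission or authorization.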