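R textrecipes step_clean_names Clean Variable Names

step_clean_names() creates a specification of a recipe step that will clean variable names so that they consist only of letters, numbers, and underscores.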
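To give a feel for what "clean" means here, the following base R sketch approximates the rule described above. The helper approx_clean_names() is hypothetical and written only for illustration; it is not the function the step calls internally, and it ignores edge cases such as names that start with a digit.

```r
# Hypothetical helper: a rough approximation of the cleaning rule,
# NOT the implementation used by step_clean_names().
approx_clean_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # replace runs of other characters with "_"
  x <- gsub("^_+|_+$", "", x)         # drop leading and trailing underscores
  x <- tolower(x)                     # lower-case, as in the Examples output
  make.unique(x, sep = "_")           # disambiguate any duplicates
}

cleaned <- approx_clean_names(c("Ozone", "Solar.R", "Wind Speed (mph)"))
cleaned
#> [1] "ozone"          "solar_r"        "wind_speed_mph"
```

The real step may treat special characters differently, so use this only as a mental model.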

Usage

step_clean_names(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  clean = NULL,
  skip = FALSE,
  id = rand_id("clean_names")
)

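Arguments

recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: One or more selector functions to choose which variables are affected by the step. See recipes::selections() for more details.
role: Not used by this step since no new variables are created.
trained: A logical to indicate if the quantities for preprocessing have been estimated.
clean: A named character vector used to clean the variable names. This is NULL until computed by recipes::prep.recipe().
skip: A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE, as it may affect the computations for subsequent operations.
id: A character string that is unique to this step to identify it.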
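The trained and clean arguments are normally filled in by prep() rather than set by hand. The sketch below inspects them before and after training. It assumes the recipes, textrecipes, and janitor packages are installed (to my knowledge the step relies on janitor), and it reaches into rec$steps[[1]], which is the internal step list rather than a documented accessor.

```r
library(recipes)
library(textrecipes)

rec <- recipe(~ ., data = airquality) %>%
  step_clean_names(all_predictors())

# Before prep(): nothing has been estimated yet
is.null(rec$steps[[1]]$clean)
rec$steps[[1]]$trained

trained_rec <- prep(rec, training = airquality)

# After prep(): a named character vector linking original and cleaned names
str(trained_rec$steps[[1]]$clean)
trained_rec$steps[[1]]$trained
```

For everyday use, tidy() is the supported way to look at this mapping; inspecting the step object directly is mainly useful for debugging.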

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Tidying

When you tidy() this step, a tibble is returned with columns terms (the new clean variable names) and value (the original variable names).
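One practical use of the tidied output is to build a lookup from original names to cleaned names, for example to translate column references in downstream code. The sketch below assumes recipes, textrecipes, and janitor are installed; the object names renames and name_map are made up for illustration.

```r
library(recipes)
library(textrecipes)

rec <- recipe(~ ., data = airquality) %>%
  step_clean_names(all_predictors()) %>%
  prep(training = airquality)

renames <- tidy(rec, number = 1)

# Named vector: names are the original columns, values are the cleaned ones
name_map <- setNames(renames$terms, renames$value)
name_map[["Solar.R"]]
```

Given the mapping shown in the Examples section, looking up "Solar.R" should give "solar_r".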

Case weights

The underlying operation does not allow for case weights.

See also

step_clean_levels(), recipes::step_factor2string(), recipes::step_string2factor(), recipes::step_regex(), recipes::step_unknown(), recipes::step_novel(), recipes::step_other()

Other Steps for Text Cleaning: step_clean_levels()

Examples

library(recipes)
library(textrecipes) # step_clean_names() is provided by textrecipes
data(airquality)

air_tr <- tibble(airquality[1:100, ])
air_te <- tibble(airquality[101:153, ])

rec <- recipe(~., data = air_tr)

rec <- rec %>%
  step_clean_names(all_predictors())
rec <- prep(rec, training = air_tr)
tidy(rec, number = 1)
#> # A tibble: 6 × 3
#>   terms   value   id               
#>   <chr>   <chr>   <chr>            
#> 1 ozone   Ozone   clean_names_Q6XEj
#> 2 solar_r Solar.R clean_names_Q6XEj
#> 3 wind    Wind    clean_names_Q6XEj
#> 4 temp    Temp    clean_names_Q6XEj
#> 5 month   Month   clean_names_Q6XEj
#> 6 day     Day     clean_names_Q6XEj

bake(rec, air_tr)
#> # A tibble: 100 × 6
#>    ozone solar_r  wind  temp month   day
#>    <int>   <int> <dbl> <int> <int> <int>
#>  1    41     190   7.4    67     5     1
#>  2    36     118   8      72     5     2
#>  3    12     149  12.6    74     5     3
#>  4    18     313  11.5    62     5     4
#>  5    NA      NA  14.3    56     5     5
#>  6    28      NA  14.9    66     5     6
#>  7    23     299   8.6    65     5     7
#>  8    19      99  13.8    59     5     8
#>  9     8      19  20.1    61     5     9
#> 10    NA     194   8.6    69     5    10
#> # ℹ 90 more rows
bake(rec, air_te)
#> # A tibble: 53 × 6
#>    ozone solar_r  wind  temp month   day
#>    <int>   <int> <dbl> <int> <int> <int>
#>  1   110     207   8      90     8     9
#>  2    NA     222   8.6    92     8    10
#>  3    NA     137  11.5    86     8    11
#>  4    44     192  11.5    86     8    12
#>  5    28     273  11.5    82     8    13
#>  6    65     157   9.7    80     8    14
#>  7    NA      64  11.5    79     8    15
#>  8    22      71  10.3    77     8    16
#>  9    59      51   6.3    79     8    17
#> 10    23     115   7.4    76     8    18
#> # ℹ 43 more rows
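As a quick sanity check on the example above, the sketch below confirms that the renaming learned on the training rows is also applied to held-out rows, and that every resulting name uses only letters, digits, and underscores. It assumes recipes, textrecipes, and janitor are installed; the regular expression is my own reading of the rule in the description, not something taken from the package.

```r
library(recipes)
library(textrecipes)

air_tr <- airquality[1:100, ]
air_te <- airquality[101:153, ]

rec <- recipe(~ ., data = air_tr) %>%
  step_clean_names(all_predictors()) %>%
  prep(training = air_tr)

baked_te <- bake(rec, new_data = air_te)

# TRUE if every column name uses only letters, digits, and underscores
names_ok <- all(grepl("^[A-Za-z0-9_]+$", names(baked_te)))
names_ok
```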

Source: R/clean_names.R
