R textrecipes step_clean_names: Clean Variable Names

step_clean_names() creates a specification of a recipe step that will clean variable names so that they consist only of letters, numbers, and underscores.
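
To make "clean" concrete, here is a minimal base R sketch of the kind of transformation described above. The helper `sketch_clean_names()` is hypothetical and is not the function textrecipes uses internally; it only approximates the result for simple ASCII names and does not handle duplicates or non-ASCII letters.

```r
# Hypothetical helper (an assumption for illustration, not part of textrecipes):
# a rough approximation of what "clean" variable names look like.
sketch_clean_names <- function(x) {
  x <- tolower(x)                   # lower-case everything
  x <- gsub("[^a-z0-9]+", "_", x)   # runs of other characters become one "_"
  x <- gsub("^_+|_+$", "", x)       # drop leading and trailing underscores
  x
}

sketch_clean_names(c("Ozone", "Solar.R", "Wind Speed (mph)"))
#> [1] "ozone"          "solar_r"        "wind_speed_mph"
```

The first two results match the names shown in the example output further down this page (`ozone`, `solar_r`).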

Usage

step_clean_names(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  clean = NULL,
  skip = FALSE,
  id = rand_id("clean_names")
)

Arguments

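recipe: A recipe object. The step will be added to the sequence of operations for this recipe.
...: One or more selector functions to choose which variables are affected by the step. See recipes::selections() for more details.
role: Not used by this step since no new variables are created.
trained: A logical to indicate if the quantities for preprocessing have been estimated.
clean: A named character vector used to clean the variable names. This is NULL until computed by recipes::prep.recipe().
skip: A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = FALSE.
id: A character string that is unique to this step to identify it.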
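
As a sketch of how the selector, `clean`, and `id` arguments fit together, the snippet below cleans only two columns and sets an explicit id. It assumes the recipes and textrecipes packages (and janitor, which this step relies on) are installed; the id string `"clean_air_names"` is an arbitrary choice made for this illustration.

```r
library(recipes)
library(textrecipes)

# Clean only two columns (bare names work as selectors) and set an explicit id
rec <- recipe(~., data = airquality) %>%
  step_clean_names(Ozone, Solar.R, id = "clean_air_names")

# prep() estimates the step using the data supplied to recipe()
rec <- prep(rec)

# `clean` is NULL in the specification; after prep() it holds the name mapping
rec$steps[[1]]$clean

# The id supplied above is what tidy() reports in its `id` column
tidy(rec, number = 1)
```

Setting `id` by hand can make the step easier to find later than the random default produced by `rand_id("clean_names")`.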

Value

An updated version of recipe with the new step added to the sequence of any existing operations.

Tidying

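When you tidy() this step, a tibble is returned with columns terms (the new clean variable names) and value (the original variable names), along with the step's id.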
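
One possible use of the tidy() result is a lookup from clean names back to the originals. The following is a sketch under the assumption that recipes, textrecipes, and janitor are installed; the object names `name_map` and `lookup` are illustrative only.

```r
library(recipes)
library(textrecipes)

rec <- recipe(~., data = airquality) %>%
  step_clean_names(all_predictors()) %>%
  prep()

# One row per variable: `terms` is the clean name, `value` the original
name_map <- tidy(rec, number = 1)

# Named vector: names are the clean names, values are the original names
lookup <- setNames(name_map$value, name_map$terms)
lookup[["solar_r"]]
```

With the airquality data, looking up `solar_r` should give back `Solar.R`, in line with the tidy() output shown in the examples on this page.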

Case weights

The underlying operation does not allow for case weights.

See also

step_clean_levels(), recipes::step_factor2string(), recipes::step_string2factor(), recipes::step_regex(), recipes::step_unknown(), recipes::step_novel(), recipes::step_other()

Other steps for text cleaning: step_clean_levels()

Examples

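library(recipes)
library(textrecipes) # provides step_clean_names()
data(airquality)

# Split the data into training and test sets
air_tr <- tibble(airquality[1:100, ])
air_te <- tibble(airquality[101:153, ])

rec <- recipe(~., data = air_tr)

# Add the step, then estimate it on the training data
rec <- rec %>%
  step_clean_names(all_predictors())
rec <- prep(rec, training = air_tr)

# Mapping from clean names (terms) to original names (value)
tidy(rec, number = 1)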
#> # A tibble: 6 × 3
#>   terms   value   id               
#>   <chr>   <chr>   <chr>            
#> 1 ozone   Ozone   clean_names_Q6XEj
#> 2 solar_r Solar.R clean_names_Q6XEj
#> 3 wind    Wind    clean_names_Q6XEj
#> 4 temp    Temp    clean_names_Q6XEj
#> 5 month   Month   clean_names_Q6XEj
#> 6 day     Day     clean_names_Q6XEj

bake(rec, air_tr)
#> # A tibble: 100 × 6
#>    ozone solar_r  wind  temp month   day
#>    <int>   <int> <dbl> <int> <int> <int>
#>  1    41     190   7.4    67     5     1
#>  2    36     118   8      72     5     2
#>  3    12     149  12.6    74     5     3
#>  4    18     313  11.5    62     5     4
#>  5    NA      NA  14.3    56     5     5
#>  6    28      NA  14.9    66     5     6
#>  7    23     299   8.6    65     5     7
#>  8    19      99  13.8    59     5     8
#>  9     8      19  20.1    61     5     9
#> 10    NA     194   8.6    69     5    10
#> # ℹ 90 more rows
bake(rec, air_te)
#> # A tibble: 53 × 6
#>    ozone solar_r  wind  temp month   day
#>    <int>   <int> <dbl> <int> <int> <int>
#>  1   110     207   8      90     8     9
#>  2    NA     222   8.6    92     8    10
#>  3    NA     137  11.5    86     8    11
#>  4    44     192  11.5    86     8    12
#>  5    28     273  11.5    82     8    13
#>  6    65     157   9.7    80     8    14
#>  7    NA      64  11.5    79     8    15
#>  8    22      71  10.3    77     8    16
#>  9    59      51   6.3    79     8    17
#> 10    23     115   7.4    76     8    18
#> # ℹ 43 more rows

Source: R/clean_names.R
