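R parsnip set_new_model: Tools to Register Models

These functions are similar to constructors and can be used to validate that there are no conflicts with the underlying model structures used by the package.

Usage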

set_new_model(model)

set_model_mode(model, mode)

set_model_engine(model, mode, eng)

set_model_arg(model, eng, parsnip, original, func, has_submodel)

set_dependency(model, eng, pkg = "parsnip", mode = NULL)

get_dependency(model)

set_fit(model, mode, eng, value)

get_fit(model)

set_pred(model, mode, eng, type, value)

get_pred_type(model, type)

show_model_info(model)

pred_value_template(pre = NULL, post = NULL, func, ...)

set_encoding(model, mode, eng, options)

get_encoding(model)

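Arguments

model: A single character string for the model type (e.g. "rand_forest").
mode: A single character string for the model mode (e.g. "regression").
eng: A single character string for the model engine.
parsnip: A single character string for the "harmonized" argument name that parsnip exposes.
original: A single character string for the argument name that the underlying model function uses.
func: A named character vector that describes how to call a function. func should have elements pkg and fun. The former is optional but recommended; the latter is required. For example, c(pkg = "stats", fun = "lm") would be used to invoke the usual linear regression function. In some cases, it is helpful to use c(fun = "predict") when using a package's predict method.
has_submodel: A single logical for whether the argument can make predictions on multiple submodels at once.
pkg: An optional character string for a package name.
value: A list that conforms to the fit_obj or pred_obj description below, depending on context.
type: A single character value for the type of prediction. Possible values are: class, conf_int, numeric, pred_int, prob, quantile, and raw.
pre, post: Optional functions for pre- and post-processing of prediction results.
...: Optional arguments that should be passed into the args slot for prediction objects.
options: A list of options for engine-specific preprocessing encodings. See Details below.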

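Details

These functions are available for users to add their own models or engines (in a package or otherwise) so that they can be accessed using parsnip. This is documented more thoroughly on the package web site (see the References below).

In short, parsnip stores an environment object that contains all of the information and code about how models are used (e.g. fitting, prediction, etc.). These functions can be used to add models to that environment, and they also include helper functions that make sure the model data is in the right format.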
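As a rough illustration of the registration workflow, the sketch below registers a hypothetical model type, "example_linear_model", that wraps stats::lm(). The model name and the particular fit, encoding, and prediction settings are assumptions chosen for illustration; they are not taken from this page. The shape of the value lists follows the "How to build a parsnip model" article cited in the References.

```r
library(parsnip)

# Hypothetical model name, used only for this sketch. set_new_model() is
# expected to fail if the name is already registered in the current session.
set_new_model("example_linear_model")
set_model_mode(model = "example_linear_model", mode = "regression")
set_model_engine("example_linear_model", mode = "regression", eng = "lm")
set_dependency("example_linear_model", eng = "lm", pkg = "stats")

# Describe how the model is fit: lm() has a formula interface, and the
# `formula` and `data` arguments are protected from being overridden.
set_fit(
  model = "example_linear_model",
  mode = "regression",
  eng = "lm",
  value = list(
    interface = "formula",
    protect = c("formula", "data"),
    func = c(pkg = "stats", fun = "lm"),
    defaults = list()
  )
)

# Engine-specific encoding options (assumed values for this sketch).
set_encoding(
  model = "example_linear_model",
  mode = "regression",
  eng = "lm",
  options = list(
    predictor_indicators = "traditional",
    compute_intercept = TRUE,
    remove_intercept = TRUE,
    allow_sparse_x = FALSE
  )
)

# Describe how numeric predictions are produced via the predict() method.
set_pred(
  model = "example_linear_model",
  mode = "regression",
  eng = "lm",
  type = "numeric",
  value = pred_value_template(
    func = c(fun = "predict"),
    object = quote(object$fit),
    newdata = quote(new_data),
    type = "response"
  )
)

# Inspect what was stored in the model environment.
show_model_info("example_linear_model")
fit_info <- get_fit("example_linear_model")
enc_info <- get_encoding("example_linear_model")
```

This only registers the model; a user-facing constructor function would still be needed before the model could be fit through parsnip, as the article in the References describes.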

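check_model_exists() checks the model value and ensures that the model has already been registered. check_model_doesnt_exist() checks the model value and also checks that it is new to the environment.

The options for engine-specific encodings dictate how the predictors should be handled. These options ensure that the data parsnip gives to the underlying model allows for a model fit that is as similar as possible to what the model would have produced if called directly.

For example, if fit() is used to fit a model that does not have a formula interface, some predictor preprocessing typically must be done. glmnet is a good example of this.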
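To see which encodings are registered for an existing model, get_encoding() can be called on the model type. The sketch below assumes a reasonably recent parsnip release in which linear_reg has a registered glmnet engine; the exact values returned may differ between versions, so none are shown here.

```r
library(parsnip)

# One row per engine/mode combination, with one column per encoding option.
enc <- get_encoding("linear_reg")
print(enc)

# Narrow the result to the glmnet engine mentioned above.
glmnet_enc <- enc[enc$engine == "glmnet", ]
print(glmnet_enc)
```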

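There are four options that can be used for the encodings:

predictor_indicators describes whether and how to create indicator/dummy variables from factor predictors. There are three options: "none" (do not expand factor predictors), "traditional" (apply the standard model.matrix() encodings), and "one_hot" (create the complete set including the baseline level for all factors). This encoding only affects cases when fit.model_spec() is used and the underlying model has an x/y interface.

Another option is compute_intercept; this controls whether model.matrix() should include the intercept in its formula. This affects more than the inclusion of an intercept column. With an intercept, model.matrix() computes dummy variables for all but one factor level. Without an intercept, model.matrix() computes a full set of indicators for the first factor variable, but an incomplete set for the remaining ones.

Next, the option remove_intercept will remove the intercept column after model.matrix() is finished. This can be useful if the model function (e.g. lm()) automatically generates an intercept.

Finally, allow_sparse_x specifies whether the model function can natively accommodate a sparse matrix representation for predictors during fitting and tuning.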
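The model.matrix() behavior that these options rely on can be seen with base R alone. The following is a minimal sketch on a made-up two-factor data frame; it illustrates the underlying encodings and is not parsnip's internal code. The one-hot step in particular is an approximation built from full (non-contrast) indicator matrices.

```r
# Toy data: factor f has levels a, b, c; factor g has levels x, y.
d <- data.frame(
  f = factor(c("a", "b", "c", "a")),
  g = factor(c("x", "y", "x", "y"))
)

# With an intercept (compare compute_intercept = TRUE): one level of each
# factor is dropped.
with_int <- model.matrix(~ f + g, data = d)
colnames(with_int)
#> [1] "(Intercept)" "fb"          "fc"          "gy"

# Without an intercept: the first factor gets a full set of indicators,
# the second still drops its baseline level.
no_int <- model.matrix(~ 0 + f + g, data = d)
colnames(no_int)
#> [1] "fa" "fb" "fc" "gy"

# Dropping the intercept column afterwards (compare remove_intercept = TRUE).
dropped <- with_int[, colnames(with_int) != "(Intercept)", drop = FALSE]
colnames(dropped)
#> [1] "fb" "fc" "gy"

# A one-hot style encoding: keep every level of every factor.
full_contrasts <- lapply(d, contrasts, contrasts = FALSE)
one_hot <- model.matrix(~ f + g, data = d, contrasts.arg = full_contrasts)
colnames(one_hot)
```

Comparing no_int with dropped shows why compute_intercept and remove_intercept are separate options: both omit the intercept column, but they yield different sets of indicator columns.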

References

"How to build a parsnip model" https://www.tidymodels.org/learn/develop/models/

Examples

# set_new_model("shallow_learning_model")

# Show the information about a model:
show_model_info("rand_forest")
#> Information for `rand_forest`
#>  modes: unknown, classification, regression, censored regression 
#> 
#>  engines: 
#>    classification: randomForest, ranger¹, spark
#>    regression:     randomForest, ranger¹, spark
#> 
#> ¹The model can use case weights.
#> 
#>  arguments: 
#>    ranger:       
#>       mtry  --> mtry
#>       trees --> num.trees
#>       min_n --> min.node.size
#>    randomForest: 
#>       mtry  --> mtry
#>       trees --> ntree
#>       min_n --> nodesize
#>    spark:        
#>       mtry  --> feature_subset_strategy
#>       trees --> num_trees
#>       min_n --> min_instances_per_node
#> 
#>  fit modules:
#>          engine           mode
#>          ranger classification
#>          ranger     regression
#>    randomForest classification
#>    randomForest     regression
#>           spark classification
#>           spark     regression
#> 
#>  prediction modules:
#>              mode       engine                    methods
#>    classification randomForest           class, prob, raw
#>    classification       ranger class, conf_int, prob, raw
#>    classification        spark                class, prob
#>        regression randomForest               numeric, raw
#>        regression       ranger     conf_int, numeric, raw
#>        regression        spark                    numeric
#>

Source code: R/aaa_models.R

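Note: This page was compiled by 纯净天空 from the original English documentation "Tools to Register Models" by Max Kuhn et al. Unless otherwise stated, copyright in the original code belongs to the original authors.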