Python ArcGIS forest usage and code examples

This article briefly introduces the usage of arcgis.geoanalytics.analyze_patterns.forest in Python.

Usage: arcgis.geoanalytics.analyze_patterns.forest(input_layer, var_prediction, var_explanatory, trees, max_tree_depth=None, random_vars=None, sample_size=100, min_leaf_size=None, prediction_type='train', features_to_predict=None, validation=10, importance_tbl=False, exp_var_matching=None, output_name=None, gis=None, context=None, future=False, return_tuple=False)

Returns:

If `return_tuple` is set to `True`, the result is a named tuple with the following keys:
- output : FeatureLayer
- output_predicted : FeatureLayer
- coefficient_table : Table
- process_info : list
Otherwise, a FeatureLayer.

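The `forest` method performs forest-based classification and regression. It creates models and generates predictions using an adaptation of Leo Breiman's random forest algorithm, a supervised machine learning method. Predictions can be made for both categorical variables (classification) and continuous variables (regression). Explanatory variables can take the form of fields in the attribute table of the training features. In addition to validating model performance against the training data, the model can be used to make predictions for another feature dataset.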
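
In Breiman's formulation, a random forest combines the outputs of its individual trees: for regression the per-tree predictions are averaged, and for classification the trees vote and the most common class wins. The plain-Python sketch below illustrates only that final combining step, with made-up tree outputs; it is a conceptual illustration, not the ArcGIS implementation.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(tree_predictions):
    """Combine per-tree numeric predictions by averaging (regression)."""
    return mean(tree_predictions)


def aggregate_classification(tree_predictions):
    """Combine per-tree class labels by majority vote (classification)."""
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs from five trees for a single feature
print(aggregate_regression([12.0, 15.0, 11.0, 14.0, 13.0]))  # 13.0
print(aggregate_classification(["present", "absent", "present", "present", "absent"]))  # present
```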

The following are some example applications:

Given data on the occurrence of seagrass, along with a number of environmental explanatory variables represented as attributes that have been enriched using a multivariable grid to calculate distances to upstream factories and major ports, future seagrass occurrence can be predicted from future projections of those same environmental explanatory variables.

Suppose you have crop yield data at hundreds of farms across the country along with other attributes at each of those farms (number of employees, acreage, and so on). Using these pieces of data, you can provide a set of features representing farms where you don’t have crop yield (but you do have all of the other variables), and make a prediction about crop yield.

Housing values can be predicted from the prices of houses sold in the current year. The sale prices of those homes, together with the number of bedrooms, distance to schools, proximity to major highways, average income, and crime counts, can be used to predict the sale prices of similar homes.

Note:

Forest-based classification and regression is available in ArcGIS Enterprise 10.7.

Parameter	Description
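input_layer	Required layer. The features that will be used as the training dataset. This layer must include fields representing the variable to predict and the explanatory variables. See Feature Input.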
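var_prediction	Required dictionary. The variable from the `input_layer` parameter containing the values used to train the model, together with a Boolean value denoting whether it is categorical. This field contains the known (training) values of the variable that will be predicted at unknown locations. Syntax: `{"fieldName":"<field name>", "categorical":bool}`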
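var_explanatory	Required list. A list of fields representing the explanatory variables, each with a Boolean value denoting whether the field is categorical. The explanatory variables help predict the value or category of the `var_prediction` parameter. Set `categorical` to `True` for any variable that represents classes or categories (such as land cover or presence/absence), and to `False` if the variable is continuous. Syntax: `[{"fieldName":"<field name>", "categorical":bool},...]`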
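trees	Required integer. The number of trees to create in the forest model. More trees generally result in more accurate model predictions, but the model will take longer to calculate.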
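max_tree_depth	Optional integer. The maximum number of splits that will be made down a tree. A larger maximum depth creates more splits, which may increase the chance of overfitting the model. The default is data driven and depends on the number of trees created and the number of variables included. `max_tree_depth` must be positive and less than or equal to 30.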
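random_vars	Optional integer. Specifies the number of explanatory variables used to create each decision tree. Each decision tree in the forest is created using a random subset of the specified explanatory variables. Increasing the number of variables used in each decision tree increases the chance of overfitting the model, particularly if there are one or a few dominant variables. A common practice is to use the square root of the total number of explanatory variables (fields, distances, and rasters combined) if `var_prediction` is numeric, or to divide the total number of explanatory variables (fields, distances, and rasters combined) by 3 if `var_prediction` is categorical.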
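sample_size	Optional integer. Specifies the percentage of the `input_layer` used for each decision tree. Samples for each tree are taken randomly from two-thirds of the specified data. The default is 100 percent of the data.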
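min_leaf_size	Optional integer. The minimum number of observations required to keep a leaf (that is, a terminal node on a tree with no further splits). For very large data, increasing this number will decrease the run time of the tool. The default minimum is 5 for regression and 1 for classification.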
prediction_type	Optional string. Specifies the operation mode of the tool. The tool can be run to train a model only to assess its performance, or to train a model and predict features. The prediction types are as follows: `Train` - A model will be trained, but no predictions will be generated. Use this option to assess the accuracy of your model before generating predictions. This option will output model diagnostics in the messages window and a chart of variable importance. `TrainAndPredict` - Predictions or classifications will be generated for features. Explanatory variables must be provided for both the training features and the features to be predicted. The output of this option will be a feature service, model diagnostics, and an optional table of variable importance. The default is `Train`.
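features_to_predict (required if using `TrainAndPredict`)	Optional layer. A feature layer representing the locations where predictions will be made. This layer must contain explanatory variable fields that correspond to the fields used in `input_layer`. This parameter is only used when `prediction_type` is `TrainAndPredict`, and it is required in that case. See Feature Input.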
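validation	Optional integer. Specifies the percentage (between 10 and 50 percent) of `input_layer` features to reserve as the test dataset for validation. The model will be trained without this random subset of data, and the observed values for those features will be compared to the predicted values. The default is 10 percent.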
importance_tbl	Optional Boolean. Specifies whether an output table will be generated that describes the importance of each explanatory variable used in the model.
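exp_var_matching	Optional list of dictionaries. A list of fields representing the explanatory variables, each with a Boolean value denoting whether the field is categorical. The explanatory variables help predict the value or category of `var_prediction`. Set `categorical` to `True` for any variable that represents classes or categories (such as land cover or presence/absence), and to `False` if the variable is continuous. Syntax: `[{"fieldName":"<explanatory field name>", "categorical":bool}]` `fieldName` is the name of the field in the `input_layer` used to predict `var_prediction`. `categorical` is either `True` or `False`. A string field should always be `True`, and a continuous value should always be set to `False`.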
output_name	Optional string. The task will create a feature service of the results. You define the name of the service.
gis	Optional `GIS`. The GIS on which this tool runs. If not specified, the active GIS is used.
context	Optional dictionary. The context parameter contains additional settings that affect task execution. For this task, there are four settings: `extent` - A bounding box that defines the analysis area. Only those features that intersect the bounding box will be analyzed. `processSR` - The features will be projected into this coordinate system for analysis. `outSR` - After the analysis, the features will be projected into this coordinate system before being saved. The output spatial reference for the spatiotemporal big data store is always WGS84. `dataStore` - Results will be saved to the specified data store. For ArcGIS Enterprise, the default is the spatiotemporal big data store.
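future	Optional Boolean. If `True`, a GPJob is returned instead of the results. The GPJob can be queried for the status of the execution. The default is `False`.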
return_tuple	Optional Boolean. If `True`, a named tuple with multiple output keys is returned. The default is `False`.
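
The `var_prediction`, `var_explanatory`, and `exp_var_matching` parameters all use the same `{"fieldName": ..., "categorical": ...}` shape, and the `random_vars` and `min_leaf_size` rows give rules of thumb and defaults. The sketch below collects these into small helpers. All function names are hypothetical (they are not part of the arcgis package); rounding the `random_vars` suggestion to the nearest integer with a minimum of 1 is an assumption, and reading field metadata as dicts with `"name"` and `"type"` keys (such as `"esriFieldTypeString"`) assumes the usual ArcGIS layer field JSON.

```python
import math


def field_spec(name, categorical=False):
    """Build one {"fieldName": ..., "categorical": ...} entry."""
    return {"fieldName": name, "categorical": bool(categorical)}


def explanatory_specs(names, categorical_names=()):
    """Build a var_explanatory list; fields named in categorical_names get True."""
    categorical_names = set(categorical_names)
    return [field_spec(name, name in categorical_names) for name in names]


def specs_from_field_types(fields):
    """Derive specs from field metadata dicts with "name" and "type" keys.

    Follows the exp_var_matching rule: string fields are categorical; every
    other type is treated as continuous (an assumption for non-numeric types).
    """
    return [field_spec(f["name"], f["type"] == "esriFieldTypeString") for f in fields]


def suggested_random_vars(n_explanatory, prediction_is_categorical):
    """Rule of thumb from the random_vars row.

    sqrt(total) when the variable to predict is numeric, total / 3 when it is
    categorical. Rounding to the nearest integer (minimum 1) is an assumption.
    """
    if prediction_is_categorical:
        value = n_explanatory / 3
    else:
        value = math.sqrt(n_explanatory)
    return max(1, round(value))


def default_min_leaf_size(prediction_is_categorical):
    """Documented defaults: 1 for classification, 5 for regression."""
    return 1 if prediction_is_categorical else 5


# Rebuild the specs for the fields used in the 911-calls example
var_prediction = field_spec("Calls")
var_explanatory = explanatory_specs(["Pop", "Unemployed", "AlcoholX", "UnEmpRate", "MedAge00"])
print(suggested_random_vars(len(var_explanatory), var_prediction["categorical"]))  # 2
print(default_min_leaf_size(var_prediction["categorical"]))  # 5
```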

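Several rows in the table state hard constraints: `max_tree_depth` must be positive and at most 30, `validation` must lie between 10 and 50, and `features_to_predict` is required when `prediction_type` is `TrainAndPredict`. A hypothetical pre-flight check such as the one below can catch these locally before a job is submitted. The function name is illustrative only; because the signature's default is lowercase `'train'` while the table spells `Train`, the sketch compares `prediction_type` case-insensitively, and whether the service does the same is an assumption.

```python
VALID_PREDICTION_TYPES = ("train", "trainandpredict")


def check_forest_args(prediction_type="Train", features_to_predict=None,
                      max_tree_depth=None, validation=10):
    """Return a list of problems found against the documented constraints."""
    problems = []
    mode = str(prediction_type).lower()  # case-insensitive comparison (assumption)
    if mode not in VALID_PREDICTION_TYPES:
        problems.append("prediction_type must be 'Train' or 'TrainAndPredict'")
    if mode == "trainandpredict" and features_to_predict is None:
        problems.append("features_to_predict is required for 'TrainAndPredict'")
    if max_tree_depth is not None and not (0 < max_tree_depth <= 30):
        problems.append("max_tree_depth must be positive and at most 30")
    if not (10 <= validation <= 50):
        problems.append("validation must be between 10 and 50")
    return problems


# A TrainAndPredict call that omits features_to_predict is flagged
print(check_forest_args("TrainAndPredict", max_tree_depth=10, validation=10))
```
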
Example:

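# Usage example: train a model on block groups with known 911 call counts,
# then predict the number of 911 calls in another set of block groups.
from arcgis.gis import GIS
from arcgis.geoanalytics.analyze_patterns import forest

# Connect to an ArcGIS Enterprise portal with GeoAnalytics Server (placeholder URL and credentials)
gis = GIS("https://<your-portal>/portal", "<username>", "<password>")

# Hypothetical item IDs: training block groups (with a Calls field) and block groups to predict
call_lyr = gis.content.get("<training item id>").layers[0]
predict_lyr = gis.content.get("<prediction item id>").layers[0]

predicted_result = forest(input_layer=call_lyr,
                          var_prediction={"fieldName": "Calls", "categorical": False},
                          var_explanatory=[{"fieldName": "Pop", "categorical": False},
                                           {"fieldName": "Unemployed", "categorical": False},
                                           {"fieldName": "AlcoholX", "categorical": False},
                                           {"fieldName": "UnEmpRate", "categorical": False},
                                           {"fieldName": "MedAge00", "categorical": False}],
                          trees=50,
                          max_tree_depth=10,
                          random_vars=3,
                          sample_size=100,
                          min_leaf_size=5,
                          prediction_type='TrainAndPredict',
                          # Required when prediction_type is 'TrainAndPredict'
                          features_to_predict=predict_lyr,
                          validation=10,
                          importance_tbl=True,
                          output_name='train and predict number of 911 calls')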

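To limit the analysis area or control spatial references, a `context` dictionary can be passed to `forest`. The sketch below builds one with the `extent`, `processSR`, and `outSR` settings from the parameter table. The helper name is hypothetical, the `{"wkid": ...}` spatial-reference form and the `xmin`/`ymin`/`xmax`/`ymax` extent layout are assumed from common ArcGIS REST JSON conventions, and the coordinates are made up. `dataStore` is left out because its accepted values are not listed above.

```python
def build_context(xmin, ymin, xmax, ymax, extent_wkid=4326,
                  process_wkid=None, out_wkid=None):
    """Assemble a context dict with extent, processSR and outSR settings.

    The {"wkid": ...} and xmin/ymin/xmax/ymax layouts are assumed from
    common ArcGIS REST JSON conventions.
    """
    context = {
        "extent": {
            "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
            "spatialReference": {"wkid": extent_wkid},
        }
    }
    if process_wkid is not None:
        context["processSR"] = {"wkid": process_wkid}
    if out_wkid is not None:
        # Per the table, output to the spatiotemporal big data store is always WGS84
        context["outSR"] = {"wkid": out_wkid}
    return context


# Made-up bounding box in WGS84 longitude/latitude, processed in Web Mercator (3857)
ctx = build_context(-118.7, 33.7, -117.6, 34.4, process_wkid=3857)
print(ctx)
# Hypothetical usage: forest(..., context=ctx)
```
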
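Note: This article was selected and compiled by 純淨天空 from the original English work arcgis.geoanalytics.analyze_patterns.forest on arcgis.com. Unless otherwise stated, the copyright of the original code belongs to the original author. Please do not reproduce or copy this translation without permission or authorization.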