本文整理汇总了Python中pyspark.ml.feature.VectorAssembler.getInputCols方法的典型用法代码示例。如果您正苦于以下问题:Python VectorAssembler.getInputCols方法的具体用法?Python VectorAssembler.getInputCols怎么用?Python VectorAssembler.getInputCols使用的例子?那么恭喜您, 这里精选的方法代码示例或许可以为您提供帮助。您也可以进一步了解该方法所在类pyspark.ml.feature.VectorAssembler
的用法示例。
在下文中一共展示了VectorAssembler.getInputCols方法的1个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于系统推荐出更棒的Python代码示例。
示例1: main
# 需要导入模块: from pyspark.ml.feature import VectorAssembler [as 别名]
# 或者: from pyspark.ml.feature.VectorAssembler import getInputCols [as 别名]
#.........这里部分代码省略.........
rfc = RandomForestClassifier(
featuresCol="Features_vec",
labelCol="ArrDelayBucket",
predictionCol="Prediction",
maxBins=4896,
)
model = rfc.fit(training_data)
# Save the new model over the old one
model_output_path = "{}/models/spark_random_forest_classifier.flight_delays.baseline.bin".format(
base_path
)
model.write().overwrite().save(model_output_path)
# Evaluate model using test data
predictions = model.transform(test_data)
# Evaluate this split's results for each metric
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
for metric_name in metric_names:
evaluator = MulticlassClassificationEvaluator(
labelCol="ArrDelayBucket",
predictionCol="Prediction",
metricName=metric_name
)
score = evaluator.evaluate(predictions)
scores[metric_name].append(score)
print("{} = {}".format(metric_name, score))
#
# Collect feature importances
#
feature_names = vector_assembler.getInputCols()
feature_importance_list = model.featureImportances
for feature_name, feature_importance in zip(feature_names, feature_importance_list):
feature_importances[feature_name].append(feature_importance)
#
# Evaluate average and STD of each metric and print a table
#
import numpy as np
score_averages = defaultdict(float)
# Compute the table data
average_stds = [] # ha
for metric_name in metric_names:
metric_scores = scores[metric_name]
average_accuracy = sum(metric_scores) / len(metric_scores)
score_averages[metric_name] = average_accuracy
std_accuracy = np.std(metric_scores)
average_stds.append((metric_name, average_accuracy, std_accuracy))
# Print the table
print("\nExperiment Log")
print("--------------")
print(tabulate(average_stds, headers=["Metric", "Average", "STD"]))
#
# Persist the score to a sccore log that exists between runs
#
import pickle