This article collects typical usage examples of the Python method pyspark.mllib.util.MLUtils.convertVectorColumnsFromML. If you have been wondering what MLUtils.convertVectorColumnsFromML does and how to call it, the curated example below may help. You can also read further about its containing class, pyspark.mllib.util.MLUtils.
One code example of MLUtils.convertVectorColumnsFromML is shown below.
Example 1: print
# Required import: from pyspark.mllib.util import MLUtils
# The snippet below additionally assumes these imports and an active
# SparkSession named `spark`:
import os
import tempfile

from pyspark.sql import SparkSession
from pyspark.mllib.stat import Statistics
from pyspark.mllib.util import MLUtils

spark = SparkSession.builder.appName("DataFrameExample").getOrCreate()

input = "data/mllib/sample_libsvm_data.txt"
# Load input data.
print("Loading LIBSVM file with UDT from " + input + ".")
df = spark.read.format("libsvm").load(input).cache()
print("Schema from LIBSVM:")
df.printSchema()
print("Loaded training data as a DataFrame with " +
      str(df.count()) + " records.")
# Show statistical summary of labels.
labelSummary = df.describe("label")
labelSummary.show()
# Convert features column to an RDD of vectors.
features = MLUtils.convertVectorColumnsFromML(df, "features") \
    .select("features").rdd.map(lambda r: r.features)
summary = Statistics.colStats(features)
print("Selected features column with average values:\n" +
      str(summary.mean()))
# Save the records to a Parquet file under a fresh temporary path.
tempdir = tempfile.NamedTemporaryFile(delete=False).name
os.unlink(tempdir)  # delete the file so the name can be reused as a directory
print("Saving to " + tempdir + " as Parquet file.")
df.write.parquet(tempdir)
# Load the records back.
print("Loading Parquet file with UDT from " + tempdir)
newDF = spark.read.parquet(tempdir)
print("Schema from Parquet:")
newDF.printSchema()