This article collects typical usage examples of the pyspark.ml.feature.VectorAssembler.getOutputCol method in Python. If you are wondering how VectorAssembler.getOutputCol is used in practice, the example here may help. You can also look up usage examples for the class that defines the method, pyspark.ml.feature.VectorAssembler.
One code example of VectorAssembler.getOutputCol is shown in this article.
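Before the Spark example, it may help to see the pattern in isolation: a stage stores the name of the column it writes, and a later stage asks for that name through a getter instead of hard-coding it. The following is only a plain-Python sketch of that idea. `TinyAssembler` is a hypothetical stand-in invented for illustration and is not part of pyspark; only the method name `getOutputCol` and the parameter names `inputCols` and `outputCol` are borrowed from `VectorAssembler`.

```python
# Plain-Python illustration only. TinyAssembler is a hypothetical stand-in,
# NOT the pyspark class; it just mimics the inputCols/outputCol/getOutputCol shape.

class TinyAssembler:
    """Concatenates selected numeric fields of each row into one list."""

    def __init__(self, inputCols, outputCol="features"):
        self._input_cols = list(inputCols)
        self._output_col = outputCol

    def getOutputCol(self):
        # Report the name of the column this stage writes.
        return self._output_col

    def transform(self, rows):
        # Return copies of the rows with the assembled vector added;
        # the input rows are left unchanged.
        out = []
        for row in rows:
            new_row = dict(row)
            new_row[self._output_col] = [float(row[c]) for c in self._input_cols]
            out.append(new_row)
        return out


rows = [
    {"stars": 4.5, "review_count": 120},
    {"stars": 3.0, "review_count": 8},
]
assembler = TinyAssembler(inputCols=["stars", "review_count"], outputCol="features")
assembled = assembler.transform(rows)

# A downstream stage asks the assembler for the column name
# rather than repeating the literal "features".
features_col = assembler.getOutputCol()
vectors = [row[features_col] for row in assembled]
print(features_col)
print(vectors)
```

If the output column is later renamed in the assembler's constructor, the downstream lookup keeps working because it never repeats the literal name. This is the same reason the Spark example passes `assembler.getOutputCol()` to `KMeans` as `featuresCol`.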
Example 1: OneHotEncoder
# Required imports:
from pyspark.ml.feature import OneHotEncoder, VectorAssembler
from pyspark.ml.clustering import KMeans
# getOutputCol is a method called on a VectorAssembler instance; it is not imported on its own.
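# NOTE: the function header and loop are missing from this excerpt; they are
# reconstructed from the call that follows and should be treated as an assumption.
def oneHotEncodeColumns(df, cols):
    newdf = df
    for c in cols:
        onehotenc = OneHotEncoder(inputCol=c, outputCol=c+"-onehot", dropLast=False)
        # Spark 3.x: OneHotEncoder is an Estimator, so fit it first.
        # On Spark 2.x the legacy transformer is called as onehotenc.transform(newdf).
        newdf = onehotenc.fit(newdf).transform(newdf).drop(c)
        newdf = newdf.withColumnRenamed(c+"-onehot", c)
    return newdf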
dfhot = oneHotEncodeColumns(dfnumeric, ["Take-out","GoodFor_lunch", "GoodFor_dinner", "GoodFor_breakfast"])
dfhot.show(5)
# Training set
assembler = VectorAssembler(inputCols = list(set(dfhot.columns) | set(['stars','review_count'])), outputCol="features")
train = assembler.transform(dfhot)
# Set up KMeans for 5 clusters
knum = 5
kmeans = KMeans(featuresCol=assembler.getOutputCol(), predictionCol="cluster", k=knum, seed=0)
model = kmeans.fit(train)
print("Model Created!")
# See cluster centers:
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)
# Apply the clustering model to our data:
prediction = model.transform(train)
prediction.groupBy("cluster").count().orderBy("cluster").show()
# Look at the features of each cluster
customerCluster = {}