This article collects typical usage examples of the Python method pyspark.ml.feature.StringIndexer._call_java. If you have been wondering how exactly to use StringIndexer._call_java in Python, or what it looks like in practice, the curated code examples below may help. You can also read further about the class this method belongs to, pyspark.ml.feature.StringIndexer.
Below, 1 code example of the StringIndexer._call_java method is shown; examples are sorted by popularity by default. You can upvote the examples you like or find useful; your feedback helps the system recommend better Python code examples.
Example 1: applyModel
# Module required: from pyspark.ml.feature import StringIndexer [as alias]
# Or: from pyspark.ml.feature.StringIndexer import _call_java [as alias]
#......... part of the code is omitted here .........
    print('Loaded and prepared %d entries' % df.count())

    #########
    # Keep only the needed features
    #########
    features = ['ADLOADINGTIME',
                'PLACEMENTID',
                'TIMESTAMP',
                'CREATIVETYPE',
                'UA_HARDWARETYPE',
                'UA_VENDOR',
                'UA_MODEL',
                'UA_BROWSER',
                'UA_BROWSERVERSION',
                'FILESJSON',
                'ERRORSJSON',
                'TOPMOSTREACHABLEWINDOWAREA',
                'FILESJSON_SIZE',
                'COMBINEDID',
                'COMBINEDEXTERNALID',
                'PLATFORMCOMBINED',
                'UA_OSCOMB',
                'SDK',
                'EXTERNALADSERVER']
    df = df.select(features)

    #########
    # Convert categorical features to numerical
    #########
    featuresCat = ['PLACEMENTID',
                   'CREATIVETYPE',
                   'UA_HARDWARETYPE',
                   'UA_VENDOR',
                   'UA_MODEL',
                   'UA_BROWSER',
                   'UA_BROWSERVERSION',
                   'FILESJSON',
                   'ERRORSJSON',
                   'COMBINEDID',
                   'COMBINEDEXTERNALID',
                   'PLATFORMCOMBINED',
                   'UA_OSCOMB',
                   'SDK',
                   'EXTERNALADSERVER']

    for col in featuresCat:
        # Index each categorical column, then drop the original.
        indexer = StringIndexer(inputCol=col, outputCol='_' + col).setHandleInvalid("skip").fit(df)
        df = indexer.transform(df).drop(col)
        # Persist the fitted indexer through the underlying Java writer.
        writer = indexer._call_java("write")
        writer.overwrite().save("indexer_" + col)

    featuresCat = ['_' + col for col in featuresCat]

    features = featuresCat[:]
    features.append('TIMESTAMP')
    features.append('FILESJSON_SIZE')
    features.append('TOPMOSTREACHABLEWINDOWAREA')

    #########
    # Assemble features
    #########
    assembler = VectorAssembler(inputCols=features, outputCol="features")
    df = assembler.transform(df)

    #########
    # Convert to labeled points (RDD API, for the mllib model below)
    #########
    # Note: DataFrame.map works directly only on Spark 1.x; on 2.x use df.rdd.map.
    lp = (df.select(func.col("ADLOADINGTIME").alias("label"), func.col("features"))
          .map(lambda row: LabeledPoint(row.label, row.features)))
    lp.cache()

    #########
    # Load trained model
    #########
    model = RandomForestModel.load(sc, loadModelName)
    print('Model loaded!')
    predictions = model.predict(lp.map(lambda x: x.features)).collect()
    return predictions