This article collects typical usage examples of the pyspark.ml.feature.StringIndexer._call_java method in Python. If you are wondering how StringIndexer._call_java is used in practice, the selected code example below may help. You can also look further into usage examples of the enclosing class, pyspark.ml.feature.StringIndexer.
One code example of the StringIndexer._call_java method is shown below; examples are sorted by popularity by default.
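For orientation, here is a minimal pure-Python sketch of what a fitted StringIndexer does to a single column: labels are ordered by descending frequency, the most frequent label gets index 0.0, and with setHandleInvalid("skip") (as used in Example 1) rows whose label was not seen at fit time are dropped. The function names and sample values are hypothetical, and the alphabetical tie-break is an assumption of this sketch rather than a statement about Spark.

```python
from collections import Counter


def fit_string_indexer(values):
    """Return a label -> index mapping, most frequent label first.

    Ties are broken alphabetically here; that tie-break is an assumption
    of this sketch.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    # StringIndexer emits double-valued indices, so floats are used here too.
    return {label: float(index) for index, label in enumerate(ordered)}


def transform_skip(values, mapping):
    """Index each value and drop values unseen at fit time ("skip" behaviour)."""
    return [mapping[value] for value in values if value in mapping]


# Hypothetical sample data standing in for one categorical column.
mapping = fit_string_indexer(
    ["mobile", "tablet", "mobile", "desktop", "mobile", "tablet"])
indexed = transform_skip(["tablet", "tv", "mobile"], mapping)
```

Because the mapping depends on the data seen at fit time, the same fitted model has to be reused when scoring new data, which is why the example below saves each indexer.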
Example 1: applyModel
# Required import: from pyspark.ml.feature import StringIndexer [as alias]
# Or: from pyspark.ml.feature.StringIndexer import _call_java [as alias]
#.........part of the code is omitted here.........
print('Loaded and prepared %d entries' % df.count())
#########
# keep only needed features
#########
features = [
    'ADLOADINGTIME',
    'PLACEMENTID',
    'TIMESTAMP',
    'CREATIVETYPE',
    'UA_HARDWARETYPE',
    'UA_VENDOR',
    'UA_MODEL',
    'UA_BROWSER',
    'UA_BROWSERVERSION',
    'FILESJSON',
    'ERRORSJSON',
    'TOPMOSTREACHABLEWINDOWAREA',
    'FILESJSON_SIZE',
    'COMBINEDID',
    'COMBINEDEXTERNALID',
    'PLATFORMCOMBINED',
    'UA_OSCOMB',
    'SDK',
    'EXTERNALADSERVER'
]
df = df.select(features)
#########
# Convert categorical features to numerical
#########
featuresCat = [
    'PLACEMENTID',
    'CREATIVETYPE',
    'UA_HARDWARETYPE',
    'UA_VENDOR',
    'UA_MODEL',
    'UA_BROWSER',
    'UA_BROWSERVERSION',
    'FILESJSON',
    'ERRORSJSON',
    'COMBINEDID',
    'COMBINEDEXTERNALID',
    'PLATFORMCOMBINED',
    'UA_OSCOMB',
    'SDK',
    'EXTERNALADSERVER'
]
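for name in featuresCat:
    # Fit one indexer per categorical column; "skip" drops rows with invalid labels
    indexer = StringIndexer(inputCol=name, outputCol='_' + name).setHandleInvalid("skip").fit(df)
    # Replace the string column with its indexed counterpart
    df = indexer.transform(df).drop(name)
    # _call_java is a private PySpark helper; here it returns the Java-side writer
    # of the fitted model (Spark 2.0+ exposes the same writer publicly as indexer.write())
    writer = indexer._call_java("write")
    writer.overwrite().save("indexer_" + name)
# The indexed columns carry a leading underscore
featuresCat = ['_' + name for name in featuresCat]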
features = featuresCat[:]
features.append('TIMESTAMP')
features.append('FILESJSON_SIZE')
features.append('TOPMOSTREACHABLEWINDOWAREA')
#########
# Assemble features
#########
assembler = VectorAssembler(
    inputCols=features,
    outputCol="features")
df = assembler.transform(df)
#########
# Convert to labeled point
#########
# DataFrame.map was removed in Spark 2.0; going through .rdd works on both 1.x and 2.x
lp = (df.select(func.col("ADLOADINGTIME").alias("label"), func.col("features"))
      .rdd
      .map(lambda row: LabeledPoint(row.label, row.features)))
lp.cache()
#########
# Load trained model
#########
model = RandomForestModel.load(sc, loadModelName)
print('Model loaded!')
predictions = model.predict(lp.map(lambda x: x.features)).collect()
return predictions
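The save step in the loop above can also be written against PySpark's public persistence API rather than the private _call_java helper. The following is a minimal sketch, assuming Spark 2.0 or later, where a fitted StringIndexerModel supports write() and StringIndexerModel.load(). The helper names indexer_path, save_indexer and load_indexer are hypothetical; only the "indexer_" path prefix is taken from the example.

```python
def indexer_path(column, prefix="indexer_"):
    """Build the save path used in the example: 'indexer_' + column name."""
    return prefix + column


def save_indexer(model, column):
    """Persist a fitted StringIndexerModel through the public writer API.

    Assumption: Spark 2.0+, where model.write() is available and plays the
    role of model._call_java("write") in the example above.
    """
    path = indexer_path(column)
    model.write().overwrite().save(path)
    return path


def load_indexer(column):
    """Reload a saved model so new data gets the same label -> index mapping."""
    # Imported lazily so these helpers can be imported without Spark installed.
    from pyspark.ml.feature import StringIndexerModel
    return StringIndexerModel.load(indexer_path(column))
```

Inside the loop this would be called as save_indexer(indexer, featuresCat[i]); at scoring time, load_indexer(column).transform(df) would reuse the stored mapping instead of fitting a new indexer on the data being scored.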