This article collects typical usage examples of the Java method org.apache.spark.api.java.JavaSparkContext.hadoopConfiguration. Wondering what JavaSparkContext.hadoopConfiguration does, or how to use it? The curated example below may help. You can also explore further usages of its containing class, org.apache.spark.api.java.JavaSparkContext.
The following shows 1 code example of the JavaSparkContext.hadoopConfiguration method.
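Before the full example, here is a minimal, self-contained sketch of obtaining the shared Hadoop Configuration from a JavaSparkContext. The app name, master URL, and the property being set are illustrative assumptions, not part of the example below:

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HadoopConfSketch {
  public static void main(String[] args) {
    // Illustrative local context; app name and master are assumptions
    SparkConf sparkConf = new SparkConf().setAppName("HadoopConfSketch").setMaster("local[*]");
    JavaSparkContext jsc = new JavaSparkContext(sparkConf);
    // hadoopConfiguration() returns the Configuration used for all Hadoop I/O
    // performed through this context; values set here affect later reads/writes
    Configuration hadoopConf = jsc.hadoopConfiguration();
    hadoopConf.set("mapreduce.output.fileoutputformat.compress", "true"); // illustrative setting
    jsc.stop();
  }
}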
Example 1: start
import org.apache.spark.api.java.JavaSparkContext; // import the package/class this method depends on

public synchronized void start() { // synchronized: at most one thread starts the layer
  String id = getID();
  if (id != null) {
    log.info("Starting Batch Layer {}", id);
  }
  streamingContext = buildStreamingContext();
  JavaSparkContext sparkContext = streamingContext.sparkContext(); // underlying Spark context
  Configuration hadoopConf = sparkContext.hadoopConfiguration();
  // Set the checkpoint directory under the model directory
  Path checkpointPath = new Path(new Path(modelDirString), ".checkpoint");
  log.info("Setting checkpoint dir to {}", checkpointPath);
  sparkContext.setCheckpointDir(checkpointPath.toString());
  // Read the input Kafka topic as a Spark stream
  log.info("Creating message stream from topic");
  JavaInputDStream<ConsumerRecord<K,M>> kafkaDStream = buildInputDStream(streamingContext);
  JavaPairDStream<K,M> pairDStream =
      kafkaDStream.mapToPair(mAndM -> new Tuple2<>(mAndM.key(), mAndM.value()));
  Class<K> keyClass = getKeyClass();
  Class<M> messageClass = getMessageClass();
  // Run the batch update against each RDD of messages read from Kafka
  pairDStream.foreachRDD(
      new BatchUpdateFunction<>(getConfig(),
                                keyClass,
                                messageClass,
                                keyWritableClass,
                                messageWritableClass,
                                dataDirString,
                                modelDirString,
                                loadUpdateInstance(),
                                streamingContext));
  // "Inline" saveAsNewAPIHadoopFiles to be able to skip saving empty RDDs
  // Write each batch of Kafka data that Spark reads out to HDFS
  pairDStream.foreachRDD(new SaveToHDFSFunction<>(
      dataDirString + "/oryx",
      "data",
      keyClass,
      messageClass,
      keyWritableClass,
      messageWritableClass,
      hadoopConf));
  // Must use the raw Kafka stream to get offsets
  kafkaDStream.foreachRDD(new UpdateOffsetsFn<>(getGroupID(), getInputTopicLockMaster()));
  if (maxDataAgeHours != NO_MAX_AGE) {
    pairDStream.foreachRDD(new DeleteOldDataFn<>(hadoopConf,
                                                 dataDirString,
                                                 Pattern.compile("-(\\d+)\\."),
                                                 maxDataAgeHours));
  }
  if (maxModelAgeHours != NO_MAX_AGE) {
    pairDStream.foreachRDD(new DeleteOldDataFn<>(hadoopConf,
                                                 modelDirString,
                                                 Pattern.compile("(\\d+)"),
                                                 maxModelAgeHours));
  }
  log.info("Starting Spark Streaming");
  streamingContext.start();
}
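Note the design choice here: hadoopConfiguration() is called once on the driver, and the same Configuration object is then handed to both SaveToHDFSFunction and DeleteOldDataFn, so the HDFS writes and the old-data cleanup in this example all share one set of Hadoop settings.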