This article collects typical usage examples of the Java method org.apache.spark.api.java.JavaSparkContext.parallelizePairs. If you are wondering how JavaSparkContext.parallelizePairs is used in practice, or are looking for working examples of it, the curated code samples here may help. You can also explore further usage examples of the enclosing class, org.apache.spark.api.java.JavaSparkContext.
Three code examples of the JavaSparkContext.parallelizePairs method are shown below, sorted by popularity by default.
Example 1: main
import java.util.Arrays;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext; // package/class that the method depends on
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.SparkSession;
import scala.Tuple2;
import scala.Tuple3;

public static void main(String[] args) {
    SparkSession sparkSession = SparkSession.builder().master("local").appName("My App")
            .config("spark.sql.warehouse.dir", "file:////C:/Users/sgulati/spark-warehouse").getOrCreate();
    JavaSparkContext jsc = new JavaSparkContext(sparkSession.sparkContext());
    // Larger side of the join: (userId, cityId) pairs.
    JavaPairRDD<String, String> userIdToCityId = jsc.parallelizePairs(
            Arrays.asList(new Tuple2<String, String>("1", "101"), new Tuple2<String, String>("2", "102"),
                    new Tuple2<String, String>("3", "107"), new Tuple2<String, String>("4", "103"),
                    new Tuple2<String, String>("11", "101"), new Tuple2<String, String>("12", "102"),
                    new Tuple2<String, String>("13", "107"), new Tuple2<String, String>("14", "103")));
    // Smaller side of the join: (cityId, name) pairs.
    JavaPairRDD<String, String> cityIdToCityName = jsc.parallelizePairs(
            Arrays.asList(new Tuple2<String, String>("101", "India"), new Tuple2<String, String>("102", "UK"),
                    new Tuple2<String, String>("103", "Germany"), new Tuple2<String, String>("107", "USA")));
    // Collect the smaller side to the driver and broadcast it to the executors.
    Broadcast<Map<String, String>> citiesBroadcasted = jsc.broadcast(cityIdToCityName.collectAsMap());
    // Map-side join: look each cityId up in the broadcast map instead of shuffling both RDDs.
    JavaRDD<Tuple3<String, String, String>> joined = userIdToCityId.map(
            v1 -> new Tuple3<String, String, String>(v1._1(), v1._2(), citiesBroadcasted.value().get(v1._2())));
    System.out.println(joined.collect());
}
Developer ID: PacktPublishing, Project: Apache-Spark-2x-for-Java-Developers, Lines of code: 26, Source: MapSideJoinBroadcast.java
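The example above joins the two data sets by broadcasting the smaller one as a map and looking each record up in it. The per-record lookup can be sketched without Spark, as below. The class name `MapSideJoinSketch` and the method `join` are hypothetical and are not part of the original project; plain `java.util` maps stand in for the pair RDD and the broadcast variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, Spark-free sketch of the per-record lookup in a map-side join.
class MapSideJoinSketch {

    // Joins (userId, cityId) entries against a small cityId -> name table.
    // A cityId missing from the table yields a null name, as Map.get does.
    static List<List<String>> join(Map<String, String> userIdToCityId,
                                   Map<String, String> cityIdToCityName) {
        List<List<String>> joined = new ArrayList<>();
        for (Map.Entry<String, String> user : userIdToCityId.entrySet()) {
            String cityId = user.getValue();
            joined.add(Arrays.asList(user.getKey(), cityId, cityIdToCityName.get(cityId)));
        }
        return joined;
    }

    public static void main(String[] args) {
        // Stand-in for the larger pair RDD.
        Map<String, String> users = new LinkedHashMap<>();
        users.put("1", "101");
        users.put("2", "102");
        users.put("3", "107");

        // Stand-in for the broadcast lookup table.
        Map<String, String> cities = new HashMap<>();
        cities.put("101", "India");
        cities.put("102", "UK");
        cities.put("107", "USA");

        System.out.println(join(users, cities));
    }
}
```

The lookup table has to fit in memory on every executor, which is why this pattern is normally reserved for joins where one side is small.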
Example 2: main
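import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext; // package/class that the method depends on
import scala.Tuple2;

public static void main(String[] args) {
    // Windows only: tells Hadoop where the winutils binaries are installed.
    System.setProperty("hadoop.home.dir", "C:\\softwares\\Winutils");
    SparkConf conf = new SparkConf().setMaster("local").setAppName("Partitioning");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // Create a pair RDD of (country, continent) spread over 3 partitions.
    JavaPairRDD<String, String> pairRdd = jsc.parallelizePairs(
            Arrays.asList(new Tuple2<String, String>("India", "Asia"), new Tuple2<String, String>("Germany", "Europe"),
                    new Tuple2<String, String>("Japan", "Asia"), new Tuple2<String, String>("France", "Europe")),
            3);
    // CustomPartitioner is defined elsewhere in the same project and is not shown here.
    JavaPairRDD<String, String> customPartitioned = pairRdd.partitionBy(new CustomPartitioner());
    System.out.println(customPartitioned.getNumPartitions());
    // Tag every key with the index of the partition it ended up in.
    JavaRDD<String> mapPartitionsWithIndex = customPartitioned.mapPartitionsWithIndex((index, tupleIterator) -> {
        List<String> list = new ArrayList<>();
        while (tupleIterator.hasNext()) {
            list.add("Partition number:" + index + ",key:" + tupleIterator.next()._1());
        }
        return list.iterator();
    }, true);
    System.out.println(mapPartitionsWithIndex.collect());
}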
Developer ID: PacktPublishing, Project: Apache-Spark-2x-for-Java-Developers, Lines of code: 30, Source: CustomPartitionerExample.java
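The example above relies on a `CustomPartitioner` class that is not included in the excerpt. A Spark partitioner has to answer two questions: how many partitions exist, and which partition a given key belongs to. The sketch below illustrates that contract in plain Java. It is not the project's actual `CustomPartitioner`; the class name `PartitionerSketch` and the hashing rule are assumptions made for illustration. In a real Spark job the class would extend `org.apache.spark.Partitioner`, which declares `numPartitions()` and `getPartition(Object key)`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Spark-free sketch of the contract a custom partitioner fulfils.
class PartitionerSketch {
    private final int numPartitions;

    PartitionerSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Total number of partitions the keys are distributed over.
    int numPartitions() {
        return numPartitions;
    }

    // Maps a key to a partition index in the range [0, numPartitions).
    // Assumed rule: non-negative modulo of the key's hash code; null keys go to partition 0.
    int getPartition(Object key) {
        if (key == null) {
            return 0;
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionerSketch partitioner = new PartitionerSketch(2);
        List<String> keys = Arrays.asList("India", "Germany", "Japan", "France");
        for (String key : keys) {
            System.out.println("Partition number:" + partitioner.getPartition(key) + ",key:" + key);
        }
    }
}
```

`Math.floorMod` is used instead of `%` so that keys with a negative hash code still map to a valid, non-negative partition index.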
Example 3: main
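import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.RangePartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext; // package/class that the method depends on
import org.apache.spark.rdd.RDD;
import scala.Tuple2;

public static void main(String[] args) {
    // Windows only: tells Hadoop where the winutils binaries are installed.
    System.setProperty("hadoop.home.dir", "C:\\softwares\\Winutils");
    SparkConf conf = new SparkConf().setMaster("local").setAppName("Partitioning");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    // Create a pair RDD of (number, letter) spread over 3 partitions.
    JavaPairRDD<Integer, String> pairRdd = jsc.parallelizePairs(
            Arrays.asList(new Tuple2<Integer, String>(1, "A"), new Tuple2<Integer, String>(2, "B"),
                    new Tuple2<Integer, String>(3, "C"), new Tuple2<Integer, String>(4, "D"),
                    new Tuple2<Integer, String>(5, "E"), new Tuple2<Integer, String>(6, "F"),
                    new Tuple2<Integer, String>(7, "G"), new Tuple2<Integer, String>(8, "H")),
            3);
    // RangePartitioner is a Scala class and needs the underlying Scala RDD.
    RDD<Tuple2<Integer, String>> rdd = JavaPairRDD.toRDD(pairRdd);
    System.out.println(pairRdd.getNumPartitions());
    // Alternative kept from the original source: hash partitioning into 2 partitions.
    // JavaPairRDD<Integer, String> hashPartitioned = pairRdd.partitionBy(new HashPartitioner(2));
    // System.out.println(hashPartitioned.getNumPartitions());
    // Range-partition into 4 partitions in ascending key order; the Scala Ordering and
    // ClassTag, implicit in Scala, must be passed explicitly from Java.
    RangePartitioner rangePartitioner = new RangePartitioner(4, rdd, true,
            scala.math.Ordering.Int$.MODULE$, scala.reflect.ClassTag$.MODULE$.apply(Integer.class));
    JavaPairRDD<Integer, String> rangePartitioned = pairRdd.partitionBy(rangePartitioner);
    // Tag every key with the index of the partition it ended up in.
    JavaRDD<String> mapPartitionsWithIndex = rangePartitioned.mapPartitionsWithIndex((index, tupleIterator) -> {
        List<String> list = new ArrayList<>();
        while (tupleIterator.hasNext()) {
            list.add("Partition number:" + index + ",key:" + tupleIterator.next()._1());
        }
        return list.iterator();
    }, true);
    System.out.println(mapPartitionsWithIndex.collect());
}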
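The example above hands the keys to a `RangePartitioner`, which assigns each key to the partition whose key range contains it. The assignment step can be sketched in plain Java as below. The class name `RangePartitionSketch` is hypothetical, and the range bounds are chosen by hand here as an assumption; Spark derives its bounds by sampling the RDD, so the actual split of the eight keys may differ from what this sketch produces.

```java
import java.util.Arrays;

// Hypothetical, Spark-free sketch of how range bounds map a key to a partition.
class RangePartitionSketch {
    // Inclusive upper bounds of every partition except the last, in ascending order.
    // The bounds are assumed to be distinct.
    private final int[] upperBounds;

    RangePartitionSketch(int[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds); // binarySearch requires sorted input
    }

    // One partition per bound, plus a final partition for keys above the last bound.
    int numPartitions() {
        return upperBounds.length + 1;
    }

    // Returns the index of the first partition whose upper bound is >= key,
    // or the last partition if the key exceeds every bound.
    int getPartition(int key) {
        int pos = Arrays.binarySearch(upperBounds, key);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -pos - 1;
    }

    public static void main(String[] args) {
        // Hand-picked bounds that split the keys 1..8 into four ranges.
        RangePartitionSketch partitioner = new RangePartitionSketch(new int[] {2, 4, 6});
        for (int key = 1; key <= 8; key++) {
            System.out.println("Partition number:" + partitioner.getPartition(key) + ",key:" + key);
        }
    }
}
```

Because neighbouring keys land in the same or adjacent partitions, range partitioning keeps the data globally ordered across partitions, which hash partitioning does not.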