本文整理汇总了Java中org.apache.crunch.io.ReadableSource类的典型用法代码示例。如果您正苦于以下问题:Java ReadableSource类的具体用法?Java ReadableSource怎么用?Java ReadableSource使用的例子?那么, 这里精选的类代码示例或许可以为您提供帮助。
ReadableSource类属于org.apache.crunch.io包,在下文中一共展示了ReadableSource类的2个代码示例,这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞,您的评价将有助于系统推荐出更棒的Java代码示例。
示例1: asSource
import org.apache.crunch.io.ReadableSource; //导入依赖的package包/类
/**
* Expose the given {@link Dataset} as a Crunch {@link ReadableSource}.
*
* Only the FileSystem {@code Dataset} implementation is supported and the
* file format must be {@code Formats.PARQUET} or {@code Formats.AVRO}.
*
* @param dataset the dataset to read from
* @param type the Java type of the entities in the dataset
* @param <E> the type of entity produced by the source
* @return the {@link ReadableSource}, or <code>null</code> if the dataset is not
* filesystem-based.
*/
@SuppressWarnings("unchecked")
public static <E> ReadableSource<E> asSource(Dataset<E> dataset, Class<E> type) {
Path directory = Accessor.getDefault().getDirectory(dataset);
if (directory != null) {
List<Path> paths = Lists.newArrayList(
Accessor.getDefault().getPathIterator(dataset));
AvroType<E> avroType;
if (type.isAssignableFrom(GenericData.Record.class)) {
avroType = (AvroType<E>) Avros.generics(dataset.getDescriptor().getSchema());
} else {
avroType = Avros.records(type);
}
final Format format = dataset.getDescriptor().getFormat();
if (Formats.PARQUET.equals(format)) {
return new AvroParquetFileSource<E>(paths, avroType);
} else if (Formats.AVRO.equals(format)) {
return new AvroFileSource<E>(paths, avroType);
} else {
throw new UnsupportedOperationException(
"Not a supported format: " + format);
}
}
return null;
}
示例2: run
import org.apache.crunch.io.ReadableSource; //导入依赖的package包/类
@Override
public int run(String[] args) throws Exception {
final long startOfToday = startOfDay();
// the destination dataset
Dataset<Record> persistent = Datasets.load(
"dataset:file:/tmp/data/logs", Record.class);
// the source: anything before today in the staging area
Dataset<Record> staging = Datasets.load(
"dataset:file:/tmp/data/logs_staging", Record.class);
View<Record> ready = staging.toBefore("timestamp", startOfToday);
ReadableSource<Record> source = CrunchDatasets.asSource(ready);
PCollection<Record> stagedLogs = read(source);
getPipeline().write(stagedLogs,
CrunchDatasets.asTarget(persistent), Target.WriteMode.APPEND);
PipelineResult result = run();
if (result.succeeded()) {
// remove the source data partition from staging
ready.deleteAll();
return 0;
} else {
return 1;
}
}