Java CrunchDatasets Class Code Examples

This article collects typical usage examples of the Java class org.kitesdk.data.crunch.CrunchDatasets. If you are wondering how the CrunchDatasets class is used in practice, the selected examples below may help.


The CrunchDatasets class belongs to the org.kitesdk.data.crunch package. Four code examples of the class are shown below, sorted by popularity by default.

Example 1: run

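import org.apache.crunch.PCollection;
import org.apache.crunch.PipelineResult;
import org.apache.crunch.Target;
import org.apache.crunch.io.ReadableSource;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.View;
import org.kitesdk.data.crunch.CrunchDatasets; // import the required package/class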
@Override
public int run(String[] args) throws Exception {
  final long startOfToday = startOfDay();

  // the destination dataset
  Dataset<Record> persistent = Datasets.load(
      "dataset:file:/tmp/data/logs", Record.class);

  // the source: anything before today in the staging area
  Dataset<Record> staging = Datasets.load(
      "dataset:file:/tmp/data/logs_staging", Record.class);
  View<Record> ready = staging.toBefore("timestamp", startOfToday);

  ReadableSource<Record> source = CrunchDatasets.asSource(ready);

  PCollection<Record> stagedLogs = read(source);

  getPipeline().write(stagedLogs,
      CrunchDatasets.asTarget(persistent), Target.WriteMode.APPEND);

  PipelineResult result = run();

  if (result.succeeded()) {
    // remove the source data partition from staging
    ready.deleteAll();
    return 0;
  } else {
    return 1;
  }
}
 
Developer: kite-sdk, Project: kite-examples, Lines of code: 31, Source: StagingToPersistent.java
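
Example 1 calls `startOfDay()` without showing its body. The following is a minimal sketch of such a helper, assuming the cutoff is midnight UTC of the current day in epoch milliseconds and that the dataset's `timestamp` field uses the same unit. The class name `StartOfDaySketch` and the `Instant` parameter are illustrative and not taken from kite-examples.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class StartOfDaySketch {
  // Hypothetical helper: midnight UTC of the day containing `now`, in epoch millis.
  // Assumes the "timestamp" field stores UTC epoch milliseconds.
  static long startOfDay(Instant now) {
    return now.truncatedTo(ChronoUnit.DAYS).toEpochMilli();
  }

  public static void main(String[] args) {
    long cutoff = startOfDay(Instant.now());
    // Records strictly before this cutoff would be selected by toBefore("timestamp", cutoff).
    System.out.println("Records with timestamp < " + cutoff + " are ready to move");
  }
}
```

Taking the clock value as a parameter keeps the helper easy to check against a fixed instant; the original zero-argument form would simply pass the current time.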

Example 2: run

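import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pipeline;
import org.apache.crunch.types.avro.Avros;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.RefinableView;
import org.kitesdk.data.View;
import org.kitesdk.data.crunch.CrunchDatasets; // import the required package/class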
public void run() {

    // TODO: Switch to parameterized views.
    View<ExampleEvent> view = Datasets.load(ScheduledReportApp.EXAMPLE_DS_URI,
        ExampleEvent.class);

    RefinableView<GenericRecord> target = Datasets.load(ScheduledReportApp.REPORT_DS_URI,
        GenericRecord.class);

    // Get the view into which this report will be written.
    DateTime dateTime = getNominalTime().toDateTime(DateTimeZone.UTC);

    View<GenericRecord> output = target
        .with("year", dateTime.getYear())
        .with("month", dateTime.getMonthOfYear())
        .with("day", dateTime.getDayOfMonth())
        .with("hour", dateTime.getHourOfDay())
        .with("minute", dateTime.getMinuteOfHour());

    Pipeline pipeline = getPipeline();

    PCollection<ExampleEvent> events = pipeline.read(CrunchDatasets.asSource(view));

    PTable<Long, ExampleEvent> eventsByUser = events.by(new GetEventId(), Avros.longs());

    // Count of events by user ID.
    PTable<Long, Long> userEventCounts = eventsByUser.keys().count();

    PCollection<GenericData.Record> report = userEventCounts.parallelDo(
        new ToUserReport(),
        Avros.generics(SCHEMA));

    pipeline.write(report, CrunchDatasets.asTarget(output));

    pipeline.run();
}
 
Developer: rbrush, Project: kite-apps, Lines of code: 37, Source: ScheduledReportJob.java
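
In Example 2, `eventsByUser.keys().count()` produces one (key, count) pair per distinct key. The sketch below performs the same computation on an in-memory list as a plain-Java stand-in. It does not use Crunch, and the class name and sample keys are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class KeyCountSketch {
  // Counts how many times each key occurs; a TreeMap keeps the keys sorted.
  static Map<Long, Long> countByKey(List<Long> keys) {
    return keys.stream().collect(
        Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    // Hypothetical keys extracted from six events.
    System.out.println(countByKey(Arrays.asList(7L, 3L, 7L, 7L, 3L, 9L)));
    // prints {3=2, 7=3, 9=1}
  }
}
```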

Example 3: run

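import java.util.Calendar;
import java.util.TimeZone;
import org.apache.crunch.PCollection;
import org.apache.crunch.Target;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.fs.Path;
import org.kitesdk.data.Dataset;
import org.kitesdk.data.Datasets;
import org.kitesdk.data.View;
import org.kitesdk.data.crunch.CrunchDatasets; // import the required package/class
import org.kitesdk.data.event.StandardEvent;
import org.kitesdk.data.spi.filesystem.FileSystemDatasets;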
@Override
public int run(String[] args) throws Exception {
  // Turn debug on while in development.
  getPipeline().enableDebug();
  getPipeline().getConfiguration().set("crunch.log.job.progress", "true");

  Dataset<StandardEvent> eventsDataset = Datasets.load(
      "dataset:hdfs:/tmp/data/default/events", StandardEvent.class);

  View<StandardEvent> eventsToProcess;
  if (args.length == 0 || (args.length == 1 && args[0].equals("LATEST"))) {
    // get the current minute
    Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    cal.set(Calendar.SECOND, 0);
    cal.set(Calendar.MILLISECOND, 0);
    long currentMinute = cal.getTimeInMillis();
    // restrict events to before the current minute
    // in the workflow, this also has a lower bound for the timestamp
    eventsToProcess = eventsDataset.toBefore("timestamp", currentMinute);

  } else if (isView(args[0])) {
    eventsToProcess = Datasets.load(args[0], StandardEvent.class);
  } else {
    eventsToProcess = FileSystemDatasets.viewForPath(eventsDataset, new Path(args[0]));
  }

  if (eventsToProcess.isEmpty()) {
    LOG.info("No records to process.");
    return 0;
  }

  // Create a parallel collection from the working partition
  PCollection<StandardEvent> events = read(
      CrunchDatasets.asSource(eventsToProcess));

  // Group events by user and cookie id, then create a session for each group
  PCollection<Session> sessions = events
      .by(new GetSessionKey(), Avros.strings())
      .groupByKey()
      .parallelDo(new MakeSession(), Avros.specifics(Session.class));

  // Write the sessions to the "sessions" Dataset
  getPipeline().write(sessions,
      CrunchDatasets.asTarget("dataset:hive:/tmp/data/default/sessions"),
      Target.WriteMode.APPEND);

  return run().succeeded() ? 0 : 1;
}
 
Developer: kite-sdk, Project: kite-examples, Lines of code: 49, Source: CreateSessions.java
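
Example 3 selects its input in one of three ways: a time-bounded view when no argument (or `LATEST`) is given, a view URI, or a filesystem path. The two pieces of plain logic involved can be sketched without Kite. The body of `isView` is not shown in the example, so the version below assumes it only checks for the `view:` URI scheme. The class name and the sample arguments are illustrative.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class InputSelectionSketch {
  // Assumption: an argument counts as a view when it uses the "view:" URI scheme.
  static boolean isView(String uriOrPath) {
    return uriOrPath.startsWith("view:");
  }

  // Start of the minute containing `now`, in epoch millis. This matches
  // zeroing SECOND and MILLISECOND on a UTC Calendar, as Example 3 does.
  static long currentMinute(Instant now) {
    return now.truncatedTo(ChronoUnit.MINUTES).toEpochMilli();
  }

  public static void main(String[] args) {
    System.out.println(isView("view:hdfs:/tmp/data/default/events?year=2015")); // true
    System.out.println(isView("/tmp/data/default/events/year=2015"));           // false
    System.out.println(currentMinute(Instant.now()));
  }
}
```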

Example 4: run

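import org.apache.crunch.PCollection;
import org.apache.crunch.Pipeline;
import org.kitesdk.data.View;
import org.kitesdk.data.crunch.CrunchDatasets; // import the required package/class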
public void run(@DataIn(name="example_events", type=ExampleEvent.class) View<ExampleEvent> input,
                @DataOut(name="odd_users", type=ExampleEvent.class) View<ExampleEvent> output) {

  Pipeline pipeline = getPipeline();

  PCollection<ExampleEvent> events = pipeline.read(CrunchDatasets.asSource(input));

  PCollection<ExampleEvent> oddUsers = events.filter(new KeepOddUsers());

  pipeline.write(oddUsers, CrunchDatasets.asTarget(output));

  pipeline.run();
}
 
Developer: rbrush, Project: kite-apps, Lines of code: 14, Source: TriggeredJob.java


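Note: The org.kitesdk.data.crunch.CrunchDatasets class examples in this article were compiled by 純淨天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects, and copyright of the source code belongs to the original authors. For distribution and use, refer to the license of the corresponding project. Do not reproduce without permission.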