

Java ParquetFileWriter.start Method Code Examples

This article collects typical usage examples of the Java method org.apache.parquet.hadoop.ParquetFileWriter.start. If you are wondering what ParquetFileWriter.start does or how to use it in practice, the examples selected here from open-source projects may help. You can also explore other usages of the enclosing class, org.apache.parquet.hadoop.ParquetFileWriter.


Two code examples of the ParquetFileWriter.start method are shown below, ordered by popularity.
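Both examples below follow the same lifecycle: construct the writer, call start() to write the file header, append data, then call end() with the footer's key/value metadata. Here is a minimal sketch of that lifecycle; the class name, output path, and one-column schema are illustrative, not taken from either project:

```java
import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class StartExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A hypothetical single-column schema, parsed from its string form
        MessageType schema = MessageTypeParser.parseMessageType(
                "message example { required int32 id; }");
        ParquetFileWriter writer = new ParquetFileWriter(
                conf, schema, new Path("/tmp/start-example.parquet"),
                ParquetFileWriter.Mode.CREATE);
        writer.start(); // writes the "PAR1" magic bytes at the head of the file
        // ... append row groups or whole files between start() and end() ...
        writer.end(Collections.<String, String>emptyMap()); // writes footer metadata
    }
}
```

Note that start() must be called exactly once, before any data is appended, and end() finalizes the file by writing the footer; a file on which end() was never called is unreadable.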

Example 1: mergeOutput

import org.apache.parquet.hadoop.ParquetFileWriter; // import the required package/class
@Override
protected boolean mergeOutput(FileSystem fs, String sourceFolder, String targetFile) {
    try {
        FileStatus[] sourceStatuses = FileSystemUtil.listSubFiles(fs, sourceFolder);
        List<Path> sourceFiles = new ArrayList<>();
        for (FileStatus sourceStatus : sourceStatuses) {
            sourceFiles.add(sourceStatus.getPath());
        }
        FileMetaData mergedMeta = ParquetFileWriter.mergeMetadataFiles(sourceFiles, fs.getConf()).getFileMetaData();
        ParquetFileWriter writer = new ParquetFileWriter(fs.getConf(), mergedMeta.getSchema(), new Path(targetFile),
                ParquetFileWriter.Mode.CREATE);
        writer.start();
        for (Path input : sourceFiles) {
            writer.appendFile(fs.getConf(), input);
        }
        writer.end(mergedMeta.getKeyValueMetaData());
    } catch (Exception e) {
        // Passing the exception as the last argument preserves the stack trace in the log
        LOG.error("Error when merging files in {}.", sourceFolder, e);
        return false;
    }
    return true;
}
 
Developer: Talend; Project: components; Lines: 23; Source: ParquetHdfsFileSink.java

Example 2: execute

import org.apache.parquet.hadoop.ParquetFileWriter; // import the required package/class
@Override
public void execute(CommandLine options) throws Exception {
  // Prepare arguments
  List<String> args = options.getArgList();
  List<Path> inputFiles = getInputFiles(args.subList(0, args.size() - 1));
  Path outputFile = new Path(args.get(args.size() - 1));

  // Merge schema and extraMeta
  FileMetaData mergedMeta = mergedMetadata(inputFiles);
  PrintWriter out = new PrintWriter(Main.out, true);

  // Merge data
  ParquetFileWriter writer = new ParquetFileWriter(conf,
          mergedMeta.getSchema(), outputFile, ParquetFileWriter.Mode.CREATE);
  writer.start();
  boolean tooSmallFilesMerged = false;
  for (Path input: inputFiles) {
    if (input.getFileSystem(conf).getFileStatus(input).getLen() < TOO_SMALL_FILE_THRESHOLD) {
      out.format("Warning: file %s is too small, length: %d\n",
        input,
        input.getFileSystem(conf).getFileStatus(input).getLen());
      tooSmallFilesMerged = true;
    }

    writer.appendFile(HadoopInputFile.fromPath(input, conf));
  }

  if (tooSmallFilesMerged) {
    out.println("Warning: you merged too small files. " +
      "Although the size of the merged file is bigger, it STILL contains small row groups, thus you don't have the advantage of big row groups, " +
      "which usually leads to bad query performance!");
  }
  writer.end(mergedMeta.getKeyValueMetaData());
}
 
Developer: apache; Project: parquet-mr; Lines: 35; Source: MergeCommand.java


Note: the org.apache.parquet.hadoop.ParquetFileWriter.start examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets were selected from open-source projects contributed by their respective authors; copyright remains with the original authors, and any distribution or use should comply with the corresponding project's license. Do not reproduce without permission.