

Java ColumnIOFactory Class Code Examples

This article collects typical usage examples of the Java class org.apache.parquet.io.ColumnIOFactory. If you are wondering what ColumnIOFactory is for and how it is used in practice, the selected examples below should help.


The ColumnIOFactory class belongs to the org.apache.parquet.io package. The 14 code examples below show how it is used, sorted by popularity by default.
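Before looking at the individual examples, here is a minimal, self-contained sketch of the pattern they all share: obtain a MessageType schema, hand it to a ColumnIOFactory, and use the resulting MessageColumnIO to create record readers or writers. The class name ColumnIOFactoryDemo and the sample schema string are invented for illustration; the parquet-mr calls themselves (MessageTypeParser.parseMessageType, ColumnIOFactory, getColumnIO) are standard API.

import org.apache.parquet.io.ColumnIOFactory;
import org.apache.parquet.io.MessageColumnIO;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ColumnIOFactoryDemo {
  public static void main(String[] args) {
    // A trivial schema; in the examples below the schema usually comes from a
    // Parquet file footer or from a Thrift/Pig/Arrow schema converter.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message example { required int32 id; optional binary name (UTF8); }");

    // The factory maps the schema onto a column I/O tree. The boolean flag
    // controls schema validation when writing records (examples 5 and 8 pass
    // false to disable it, example 10 passes true).
    MessageColumnIO columnIO = new ColumnIOFactory(true).getColumnIO(schema);

    // From here, writers call columnIO.getRecordWriter(columnWriteStore) to get a
    // RecordConsumer, and readers call columnIO.getRecordReader(pages, converter)
    // to get a RecordReader, as the examples below demonstrate.
    System.out.println(columnIO);
  }
}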

Example 1: initialize

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(FileMetaData parquetFileMetadata,
                       Path file, List<BlockMetaData> blocks, Configuration configuration)
    throws IOException {
  // initialize a ReadContext for this file
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.fileSchema = parquetFileMetadata.getSchema();
  this.file = file;
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = configuration.getBoolean(STRICT_TYPE_CHECKING, true);
  List<ColumnDescriptor> columns = requestedSchema.getColumns();
  reader = new ParquetFileReader(configuration, parquetFileMetadata, file, blocks, columns);
  for (BlockMetaData block : blocks) {
    total += block.getRowCount();
  }
  this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
  LOG.info("RecordReader initialized will read a total of " + total + " records.");
}
 
Developer: apache, Project: tajo, Lines: 24, Source: InternalParquetRecordReader.java

Example 2: initialize

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(MessageType fileSchema,
                       FileMetaData parquetFileMetadata,
                       Path file, List<BlockMetaData> blocks, Configuration configuration)
        throws IOException {
  // initialize a ReadContext for this file
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
          configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.fileSchema = fileSchema;
  this.file = file;
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
          configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = true;
  List<ColumnDescriptor> columns = requestedSchema.getColumns();
  reader = new ParquetFileReader(configuration, parquetFileMetadata, file, blocks, columns);
  for (BlockMetaData block : blocks) {
    total += block.getRowCount();
  }
  Log.info("RecordReader initialized will read a total of " + total + " records.");
}
 
Developer: h2oai, Project: h2o-3, Lines: 24, Source: H2OInternalParquetReader.java

Example 3: initialize

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public void initialize(ParquetFileReader reader, Configuration configuration)
    throws IOException {
  // initialize a ReadContext for this file
  this.reader = reader;
  FileMetaData parquetFileMetadata = reader.getFooter().getFileMetaData();
  this.fileSchema = parquetFileMetadata.getSchema();
  Map<String, String> fileMetadata = parquetFileMetadata.getKeyValueMetaData();
  ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
      configuration, toSetMultiMap(fileMetadata), fileSchema));
  this.columnIOFactory = new ColumnIOFactory(parquetFileMetadata.getCreatedBy());
  this.requestedSchema = readContext.getRequestedSchema();
  this.columnCount = requestedSchema.getPaths().size();
  this.recordConverter = readSupport.prepareForRead(
      configuration, fileMetadata, fileSchema, readContext);
  this.strictTypeChecking = configuration.getBoolean(STRICT_TYPE_CHECKING, true);
  this.total = reader.getRecordCount();
  this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
  this.filterRecords = configuration.getBoolean(RECORD_FILTERING_ENABLED, true);
  reader.setRequestedSchema(requestedSchema);
  LOG.info("RecordReader initialized will read a total of {} records.", total);
}
 
Developer: apache, Project: parquet-mr, Lines: 22, Source: InternalParquetRecordReader.java

Example 4: validateSameTupleAsEB

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
/**
 * Steps:
 * <ul>
 * <li>Writes using the thrift mapping</li>
 * <li>Reads using the pig mapping</li>
 * <li>Uses Elephant Bird to convert from thrift to pig</li>
 * <li>Checks that both transformations give the same result</li>
 * </ul>
 * @param o the object to convert
 * @throws TException
 */
public static <T extends TBase<?,?>> void validateSameTupleAsEB(T o) throws TException {
  final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
  @SuppressWarnings("unchecked")
  final Class<T> class1 = (Class<T>) o.getClass();
  final MessageType schema = thriftSchemaConverter.convert(class1);

  final StructType structType = ThriftSchemaConverter.toStructType(class1);
  final ThriftToPig<T> thriftToPig = new ThriftToPig<T>(class1);
  final Schema pigSchema = thriftToPig.toSchema();
  final TupleRecordMaterializer tupleRecordConverter = new TupleRecordMaterializer(schema, pigSchema, true);
  RecordConsumer recordConsumer = new ConverterConsumer(tupleRecordConverter.getRootConverter(), schema);
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
  o.write(p);
  final Tuple t = tupleRecordConverter.getCurrentRecord();
  final Tuple expected = thriftToPig.getPigTuple(o);
  assertEquals(expected.toString(), t.toString());
  final MessageType filtered = new PigSchemaConverter().filter(schema, pigSchema);
  assertEquals(schema.toString(), filtered.toString());
}
 
Developer: apache, Project: parquet-mr, Lines: 30, Source: TestThriftToPigCompatibility.java

Example 5: newSchema

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void newSchema() throws IOException {
  // Reset it to half of current number and bound it within the limits
  recordCountForNextMemCheck = min(max(MINIMUM_RECORD_COUNT_FOR_CHECK, recordCountForNextMemCheck / 2), MAXIMUM_RECORD_COUNT_FOR_CHECK);

  String json = new Schema(batchSchema).toJson();
  extraMetaData.put(DREMIO_ARROW_SCHEMA, json);
  List<Type> types = Lists.newArrayList();
  for (Field field : batchSchema) {
    if (field.getName().equalsIgnoreCase(WriterPrel.PARTITION_COMPARATOR_FIELD)) {
      continue;
    }
    Type childType = getType(field);
    if (childType != null) {
      types.add(childType);
    }
  }
  Preconditions.checkState(types.size() > 0, "No types for parquet schema");
  schema = new MessageType("root", types);

  int dictionarySize = (int)context.getOptions().getOption(ExecConstants.PARQUET_DICT_PAGE_SIZE_VALIDATOR);
  final ParquetProperties parquetProperties = new ParquetProperties(dictionarySize, writerVersion, enableDictionary,
    new ParquetDirectByteBufferAllocator(columnEncoderAllocator), pageSize, true, enableDictionaryForBinary);
  pageStore = ColumnChunkPageWriteStoreExposer.newColumnChunkPageWriteStore(codecFactory.getCompressor(codec), schema, parquetProperties);
  store = new ColumnWriteStoreV1(pageStore, pageSize, parquetProperties);
  MessageColumnIO columnIO = new ColumnIOFactory(false).getColumnIO(this.schema);
  consumer = columnIO.getRecordWriter(store);
  setUp(schema, consumer);
}
 
Developer: dremio, Project: dremio-oss, Lines: 29, Source: ParquetRecordWriter.java

Example 6: ParquetRowReader

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public ParquetRowReader(Configuration configuration, Path filePath, ReadSupport<T> readSupport) throws IOException
{
    this.filePath = filePath;

    ParquetMetadata parquetMetadata = ParquetFileReader.readFooter(configuration, filePath, ParquetMetadataConverter.NO_FILTER);
    List<BlockMetaData> blocks = parquetMetadata.getBlocks();

    FileMetaData fileMetadata = parquetMetadata.getFileMetaData();
    this.fileSchema = fileMetadata.getSchema();
    Map<String, String> keyValueMetadata = fileMetadata.getKeyValueMetaData();
    ReadSupport.ReadContext readContext = readSupport.init(new InitContext(
            configuration, toSetMultiMap(keyValueMetadata), fileSchema));
    this.columnIOFactory = new ColumnIOFactory(fileMetadata.getCreatedBy());

    this.requestedSchema = readContext.getRequestedSchema();
    this.recordConverter = readSupport.prepareForRead(
            configuration, fileMetadata.getKeyValueMetaData(), fileSchema, readContext);

    List<ColumnDescriptor> columns = requestedSchema.getColumns();

    reader = new ParquetFileReader(configuration, fileMetadata, filePath, blocks, columns);

    long total = 0;
    for (BlockMetaData block : blocks) {
        total += block.getRowCount();
    }
    this.total = total;

    this.unmaterializableRecordCounter = new UnmaterializableRecordCounter(configuration, total);
    logger.info("ParquetRowReader initialized will read a total of " + total + " records.");
}
 
Developer: CyberAgent, Project: embulk-input-parquet_hadoop, Lines: 32, Source: ParquetRowReader.java

Example 7: load

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
public ITable load() {
    try {
        Configuration conf = new Configuration();
        System.setProperty("hadoop.home.dir", "/");
        conf.set("hadoop.security.authentication", "simple");
        conf.set("hadoop.security.authorization", "false");
        Path path = new Path(this.filename);
        ParquetMetadata md = ParquetFileReader.readFooter(conf, path,
                ParquetMetadataConverter.NO_FILTER);
        MessageType schema = md.getFileMetaData().getSchema();
        ParquetFileReader r = new ParquetFileReader(conf, path, md);
        IAppendableColumn[] cols = this.createColumns(md);
        MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);

        PageReadStore pages;
        while (null != (pages = r.readNextRowGroup())) {
            final long rows = pages.getRowCount();
            RecordReader<Group> recordReader = columnIO.getRecordReader(
                    pages, new GroupRecordConverter(schema));
            for (int i = 0; i < rows; i++) {
                Group g = recordReader.read();
                appendGroup(cols, g, md.getFileMetaData().getSchema().getColumns());
            }
        }

        for (IAppendableColumn c: cols)
            c.seal();
        return new Table(cols);
    } catch (IOException ex) {
        throw new RuntimeException(ex);
    }
}
 
Developer: vmware, Project: hillview, Lines: 33, Source: ParquetReader.java

Example 8: newSchema

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void newSchema() throws IOException {
  List<Type> types = Lists.newArrayList();
  for (MaterializedField field : batchSchema) {
    if (field.getName().equalsIgnoreCase(WriterPrel.PARTITION_COMPARATOR_FIELD)) {
      continue;
    }
    types.add(getType(field));
  }
  schema = new MessageType("root", types);

  // We don't want this number to be too small, ideally we divide the block equally across the columns.
  // It is unlikely all columns are going to be the same size.
  // Its value is likely below Integer.MAX_VALUE (2GB), although rowGroupSize is a long type.
  // Therefore this size is cast to int, since allocating byte array in under layer needs to
  // limit the array size in an int scope.
  int initialBlockBufferSize = max(MINIMUM_BUFFER_SIZE, blockSize / this.schema.getColumns().size() / 5);
  // We don't want this number to be too small either. Ideally, slightly bigger than the page size,
  // but not bigger than the block buffer
  int initialPageBufferSize = max(MINIMUM_BUFFER_SIZE, min(pageSize + pageSize / 10, initialBlockBufferSize));
  // TODO: Use initialSlabSize from ParquetProperties once drill will be updated to the latest version of Parquet library
  int initialSlabSize = CapacityByteArrayOutputStream.initialSlabSizeHeuristic(64, pageSize, 10);
  // TODO: Replace ParquetColumnChunkPageWriteStore with ColumnChunkPageWriteStore from parquet library
  // once PARQUET-1006 will be resolved
  pageStore = new ParquetColumnChunkPageWriteStore(codecFactory.getCompressor(codec), schema, initialSlabSize,
      pageSize, new ParquetDirectByteBufferAllocator(oContext));
  store = new ColumnWriteStoreV1(pageStore, pageSize, initialPageBufferSize, enableDictionary,
      writerVersion, new ParquetDirectByteBufferAllocator(oContext));
  MessageColumnIO columnIO = new ColumnIOFactory(false).getColumnIO(this.schema);
  consumer = columnIO.getRecordWriter(store);
  setUp(schema, consumer);
}
 
Developer: axbaretto, Project: drill, Lines: 32, Source: ParquetRecordWriter.java

Example 9: initStore

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void initStore() {
  pageStore = new ColumnChunkPageWriteStore(compressor, schema, props.getAllocator());
  columnStore = props.newColumnWriteStore(schema, pageStore);
  MessageColumnIO columnIO = new ColumnIOFactory(validating).getColumnIO(schema);
  this.recordConsumer = columnIO.getRecordWriter(columnStore);
  writeSupport.prepareForWrite(recordConsumer);
}
 
Developer: apache, Project: parquet-mr, Lines: 8, Source: InternalParquetRecordWriter.java

Example 10: validate

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private <T extends TBase<?,?>> void validate(T expected) throws TException {
  @SuppressWarnings("unchecked")
  final Class<T> thriftClass = (Class<T>)expected.getClass();
  final MemPageStore memPageStore = new MemPageStore(1);
  final ThriftSchemaConverter schemaConverter = new ThriftSchemaConverter();
  final MessageType schema = schemaConverter.convert(thriftClass);
  LOG.info("{}", schema);
  final MessageColumnIO columnIO = new ColumnIOFactory(true).getColumnIO(schema);
  final ColumnWriteStoreV1 columns = new ColumnWriteStoreV1(memPageStore,
      ParquetProperties.builder()
          .withPageSize(10000)
          .withDictionaryEncoding(false)
          .build());
  final RecordConsumer recordWriter = columnIO.getRecordWriter(columns);
  final StructType thriftType = schemaConverter.toStructType(thriftClass);
  ParquetWriteProtocol parquetWriteProtocol = new ParquetWriteProtocol(recordWriter, columnIO, thriftType);

  expected.write(parquetWriteProtocol);
  recordWriter.flush();
  columns.flush();

  ThriftRecordConverter<T> converter = new TBaseRecordConverter<T>(thriftClass, schema, thriftType);
  final RecordReader<T> recordReader = columnIO.getRecordReader(memPageStore, converter);

  final T result = recordReader.read();

  assertEquals(expected, result);
}
 
Developer: apache, Project: parquet-mr, Lines: 29, Source: TestParquetReadProtocol.java

Example 11: validateThrift

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private void validateThrift(String[] expectations, TBase<?, ?> a)
    throws TException {
  final ThriftSchemaConverter thriftSchemaConverter = new ThriftSchemaConverter();
//  System.out.println(a);
  final Class<TBase<?,?>> class1 = (Class<TBase<?,?>>)a.getClass();
  final MessageType schema = thriftSchemaConverter.convert(class1);
  LOG.info("{}", schema);
  final StructType structType = thriftSchemaConverter.toStructType(class1);
  ExpectationValidatingRecordConsumer recordConsumer = new ExpectationValidatingRecordConsumer(new ArrayDeque<String>(Arrays.asList(expectations)));
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  ParquetWriteProtocol p = new ParquetWriteProtocol(new RecordConsumerLoggingWrapper(recordConsumer), columnIO, structType);
  a.write(p);
}
 
Developer: apache, Project: parquet-mr, Lines: 14, Source: TestParquetWriteProtocol.java

Example 12: prepareForWrite

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
@Override
public void prepareForWrite(RecordConsumer recordConsumer) {
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  this.parquetWriteProtocol = new ParquetWriteProtocol(recordConsumer, columnIO, thriftStruct);
}
 
Developer: apache, Project: parquet-mr, Lines: 6, Source: AbstractThriftWriteSupport.java

Example 13: prepareForWrite

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
@Override
public void prepareForWrite(RecordConsumer recordConsumer) {
  final MessageColumnIO columnIO = new ColumnIOFactory().getColumnIO(schema);
  this.parquetWriteProtocol = new ParquetWriteProtocol(recordConsumer, columnIO, thriftStruct);
  thriftWriteSupport.prepareForWrite(recordConsumer);
}
 
Developer: apache, Project: parquet-mr, Lines: 7, Source: ThriftBytesWriteSupport.java

Example 14: newColumnFactory

import org.apache.parquet.io.ColumnIOFactory; // import the required package/class
private static MessageColumnIO newColumnFactory(String pigSchemaString) throws ParserException {
  MessageType schema = new PigSchemaConverter().convert(Utils.getSchemaFromString(pigSchemaString));
  return new ColumnIOFactory().getColumnIO(schema);
}
 
Developer: apache, Project: parquet-mr, Lines: 5, Source: TupleConsumerPerfTest.java


Note: The org.apache.parquet.io.ColumnIOFactory class examples in this article were compiled by 纯净天空 from open-source code and documentation platforms such as GitHub and MSDocs. The code snippets are taken from open-source projects contributed by their original authors, and copyright of the source code remains with those authors. Please consult the corresponding project's license before distributing or using the code; do not reproduce without permission.