Python tf.io.RaggedFeature usage and code examples

Configuration for passing a RaggedTensor input feature.

Usage

tf.io.RaggedFeature(
    dtype, value_key=None, partitions=(), row_splits_dtype=tf.dtypes.int32,
    validate=False
)

Attributes

dtype: alias for namedtuple field number 0
value_key: alias for namedtuple field number 1
partitions: alias for namedtuple field number 2
row_splits_dtype: alias for namedtuple field number 3
validate: alias for namedtuple field number 4
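Because RaggedFeature is a namedtuple, each setting can be read either by attribute name or by field index. A minimal sketch, assuming TensorFlow 2.x is installed; the variable name `feature` is illustrative:

```python
import tensorflow as tf

# A RaggedFeature with one RowSplits partition; other fields keep their defaults.
feature = tf.io.RaggedFeature(
    dtype=tf.int64,
    value_key="v",
    partitions=[tf.io.RaggedFeature.RowSplits("s1")],
)

# Field names in index order (fields 0 through 4).
print(feature._fields)

# Attribute access and index access refer to the same field.
assert feature.dtype == feature[0] == tf.int64
assert feature.value_key == feature[1] == "v"
assert feature.row_splits_dtype == feature[3] == tf.int32  # default
assert feature.validate is feature[4] is False              # default
```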

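value_key specifies the feature key for a variable-length list of values, and partitions specifies zero or more feature keys used to partition those values into higher dimensions. Each element of partitions must be one of the following: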

- tf.io.RaggedFeature.RowSplits(key: string)
- tf.io.RaggedFeature.RowLengths(key: string)
- tf.io.RaggedFeature.RowStarts(key: string)
- tf.io.RaggedFeature.RowLimits(key: string)
- tf.io.RaggedFeature.ValueRowIds(key: string)
- tf.io.RaggedFeature.UniformRowLength(length: int)

where key is a feature key whose values are used to partition the values. Partitions are listed from outermost to innermost.
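The five key-based partition types are different encodings of the same row boundaries. The pure-Python sketch below illustrates how they relate; it is not TensorFlow's implementation, and every helper name in it is hypothetical. It uses the values and the "s1" row splits [0, 2, 3, 3, 6] from the first Example in the examples section.

```python
from itertools import accumulate

# Hypothetical helpers (not TensorFlow APIs): each converts one partition
# encoding into row splits, where row i covers values[splits[i]:splits[i + 1]].

def splits_from_row_lengths(row_lengths):
    # Running total of the row lengths, starting at 0.
    return [0] + list(accumulate(row_lengths))

def splits_from_row_starts(row_starts, nvals):
    # Row starts omit the final end offset, so append the number of values.
    return list(row_starts) + [nvals]

def splits_from_row_limits(row_limits):
    # Row limits omit the initial 0.
    return [0] + list(row_limits)

def splits_from_value_rowids(value_rowids, nrows):
    # Count how many values belong to each row, then accumulate the counts.
    lengths = [0] * nrows
    for rowid in value_rowids:
        lengths[rowid] += 1
    return splits_from_row_lengths(lengths)

def splits_from_uniform_row_length(length, nvals):
    # Every row has the same length.
    return list(range(0, nvals + 1, length))

def partition(values, splits):
    # Split a flat list into rows according to row splits.
    return [values[start:end] for start, end in zip(splits, splits[1:])]

values = [3, 1, 4, 1, 5, 9]
expected_splits = [0, 2, 3, 3, 6]  # the "s1" feature of the first Example

assert splits_from_row_lengths([2, 1, 0, 3]) == expected_splits
assert splits_from_row_starts([0, 2, 3, 3], nvals=6) == expected_splits
assert splits_from_row_limits([2, 3, 3, 6]) == expected_splits
assert splits_from_value_rowids([0, 0, 1, 3, 3, 3], nrows=4) == expected_splits
print(partition(values, expected_splits))  # [[3, 1], [4], [], [1, 5, 9]]
```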

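If len(partitions) == 0 (the default), then:
- A feature from a single tf.Example is parsed to a 1D tf.Tensor.
- A feature from a batch of tf.Examples is parsed to a 2D tf.RaggedTensor, where the outer dimension is the batch dimension and the inner (ragged) dimension is the feature length in each example.
If len(partitions) == 1, then:
- A feature from a single tf.Example is parsed to a 2D tf.RaggedTensor, where the values taken from value_key are separated into rows using the partition key.
- A feature from a batch of tf.Examples is parsed to a 3D tf.RaggedTensor, where the outer dimension is the batch dimension and the two inner dimensions are formed by separating each example's value_key values into rows using that example's partition key.
If len(partitions) > 1, then:
- A feature from a single tf.Example is parsed to a tf.RaggedTensor whose rank is len(partitions)+1 and whose ragged_rank is len(partitions).
- A feature from a batch of tf.Examples is parsed to a tf.RaggedTensor whose rank is len(partitions)+2 and whose ragged_rank is len(partitions)+1, where the outer dimension is the batch dimension.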
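The three cases follow one pattern: each partition adds one ragged dimension, and batching adds one more. A small pure-Python sketch that encodes this pattern (the function `parsed_rank` is hypothetical, and it ignores the UniformRowLength exception for the innermost partition):

```python
def parsed_rank(num_partitions, batched):
    """Return (rank, ragged_rank) for a parsed RaggedFeature.

    Hypothetical helper restating the rules above. A ragged_rank of 0
    means the result is a plain tf.Tensor. The UniformRowLength
    exception is not modeled.
    """
    extra = 1 if batched else 0  # batching adds the outer batch dimension
    return num_partitions + 1 + extra, num_partitions + extra

for n in range(3):
    for batched in (False, True):
        rank, ragged_rank = parsed_rank(n, batched)
        print(f"partitions={n} batched={batched}: rank={rank}, ragged_rank={ragged_rank}")
```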

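There is one exception: if the final (i.e., innermost) element(s) of partitions are UniformRowLength partitions, then the values are simply reshaped (as a higher-dimensional tf.Tensor) rather than being wrapped in a tf.RaggedTensor.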
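Conceptually, a trailing UniformRowLength partition groups the flat values into rows of equal length, so the result is rectangular and needs no ragged wrapper. A pure-Python sketch of that grouping, assuming the number of values is a multiple of the row length (the function `reshape_uniform` is hypothetical and is not how TensorFlow implements it):

```python
def reshape_uniform(values, length):
    """Group a flat list into rows of a fixed length (hypothetical helper).

    Every row has the same length, so the result is a rectangular
    (dense) structure rather than a ragged one.
    """
    if length <= 0 or len(values) % length != 0:
        raise ValueError("number of values must be a positive multiple of length")
    return [values[i:i + length] for i in range(0, len(values), length)]

rows = reshape_uniform([3, 1, 4, 1, 5, 9], 2)
print(rows)  # [[3, 1], [4, 1], [5, 9]]
```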

Examples

import tensorflow as tf
import google.protobuf.text_format as pbtext
example_batch = [
  pbtext.Merge(r'''
    features {
      feature {key:"v" value {int64_list {value:[3, 1, 4, 1, 5, 9]} } }
      feature {key:"s1" value {int64_list {value:[0, 2, 3, 3, 6]} } }
      feature {key:"s2" value {int64_list {value:[0, 2, 3, 4]} } }
    }''', tf.train.Example()).SerializeToString(),
  pbtext.Merge(r'''
    features {
      feature {key:"v" value {int64_list {value:[2, 7, 1, 8, 2, 8, 1]} } }
      feature {key:"s1" value {int64_list {value:[0, 3, 4, 5, 7]} } }
      feature {key:"s2" value {int64_list {value:[0, 1, 1, 4]} } }
    }''', tf.train.Example()).SerializeToString()]

features = {
    # Zero partitions: returns 1D tf.Tensor for each Example.
    'f1': tf.io.RaggedFeature(value_key="v", dtype=tf.int64),
    # One partition: returns 2D tf.RaggedTensor for each Example.
    'f2': tf.io.RaggedFeature(value_key="v", dtype=tf.int64, partitions=[
        tf.io.RaggedFeature.RowSplits("s1")]),
    # Two partitions: returns 3D tf.RaggedTensor for each Example.
    'f3': tf.io.RaggedFeature(value_key="v", dtype=tf.int64, partitions=[
        tf.io.RaggedFeature.RowSplits("s2"),
        tf.io.RaggedFeature.RowSplits("s1")])
}

feature_dict = tf.io.parse_single_example(example_batch[0], features)
for (name, val) in sorted(feature_dict.items()):
  print('%s:%s' % (name, val))
# f1:tf.Tensor([3 1 4 1 5 9], shape=(6,), dtype=int64)
# f2:<tf.RaggedTensor [[3, 1], [4], [], [1, 5, 9]]>
# f3:<tf.RaggedTensor [[[3, 1], [4]], [[]], [[1, 5, 9]]]>

feature_dict = tf.io.parse_example(example_batch, features)
for (name, val) in sorted(feature_dict.items()):
  print('%s:%s' % (name, val))
# f1:<tf.RaggedTensor [[3, 1, 4, 1, 5, 9],
#                      [2, 7, 1, 8, 2, 8, 1]]>
# f2:<tf.RaggedTensor [[[3, 1], [4], [], [1, 5, 9]],
#                      [[2, 7, 1], [8], [2], [8, 1]]]>
# f3:<tf.RaggedTensor [[[[3, 1], [4]], [[]], [[1, 5, 9]]],
#                      [[[2, 7, 1]], [], [[8], [2], [8, 1]]]]>
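The nested output of f3 can be reproduced by hand: split the values with the innermost partition ("s1") first, then split the resulting rows with the next partition out ("s2"). A pure-Python sketch using the same feature values as the two Examples above (the helpers `partition` and `apply_partitions` are hypothetical, not TensorFlow APIs):

```python
def partition(values, splits):
    # Split a flat list into rows: row i is values[splits[i]:splits[i + 1]].
    return [values[start:end] for start, end in zip(splits, splits[1:])]

def apply_partitions(values, splits_outer_to_inner):
    # Partitions are listed outermost first, so apply the innermost one first.
    result = values
    for splits in reversed(splits_outer_to_inner):
        result = partition(result, splits)
    return result

examples = [
    {"v": [3, 1, 4, 1, 5, 9], "s1": [0, 2, 3, 3, 6], "s2": [0, 2, 3, 4]},
    {"v": [2, 7, 1, 8, 2, 8, 1], "s1": [0, 3, 4, 5, 7], "s2": [0, 1, 1, 4]},
]

# f2 uses partitions [RowSplits("s1")].
f2 = [apply_partitions(ex["v"], [ex["s1"]]) for ex in examples]
# f3 uses partitions [RowSplits("s2"), RowSplits("s1")] (outermost first).
f3 = [apply_partitions(ex["v"], [ex["s2"], ex["s1"]]) for ex in examples]
print(f2)
print(f3)
```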

Fields:

dtype: Data type of the RaggedTensor. Must be one of: tf.dtypes.int64, tf.dtypes.float32, tf.dtypes.string.
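value_key: (Optional.) Key for a Feature in the input Example whose parsed Tensor will be the resulting RaggedTensor.flat_values. If not specified, defaults to the key of this RaggedFeature (that is, its key in the features dictionary).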
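A minimal sketch of the value_key default, assuming TensorFlow 2.x with eager execution; the feature names "v" and "values" are illustrative. Omitting value_key reads the Feature with the same key as the dictionary entry, while setting it lets the output key differ from the Example key:

```python
import tensorflow as tf

# One serialized Example with a single int64 feature named "v".
example = tf.train.Example(features=tf.train.Features(feature={
    "v": tf.train.Feature(int64_list=tf.train.Int64List(value=[3, 1, 4])),
}))
serialized = example.SerializeToString()

# No value_key: the values are read from the Feature keyed "v".
parsed_default = tf.io.parse_single_example(
    serialized, {"v": tf.io.RaggedFeature(dtype=tf.int64)})

# Explicit value_key: the output key is "values", the input Feature is "v".
parsed_renamed = tf.io.parse_single_example(
    serialized, {"values": tf.io.RaggedFeature(dtype=tf.int64, value_key="v")})

print(parsed_default["v"])
print(parsed_renamed["values"])
```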
partitions: (Optional.) A list of objects specifying the row-partitioning tensors (from outermost to innermost). Each entry in this list must be one of:
- tf.io.RaggedFeature.RowSplits(key: string)
- tf.io.RaggedFeature.RowLengths(key: string)
- tf.io.RaggedFeature.RowStarts(key: string)
- tf.io.RaggedFeature.RowLimits(key: string)
- tf.io.RaggedFeature.ValueRowIds(key: string)
- tf.io.RaggedFeature.UniformRowLength(length: int)
where key is the key for a Feature in the input Example whose parsed Tensor will be the resulting row-partitioning tensor.
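The partition types are interchangeable encodings of the same rows. As a sketch, assuming TensorFlow 2.x with eager execution (the feature key "len" is hypothetical), the row splits [0, 2, 3, 3, 6] of the first Example can instead be stored as row lengths [2, 1, 0, 3] and parsed with RowLengths:

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "v": tf.train.Feature(int64_list=tf.train.Int64List(value=[3, 1, 4, 1, 5, 9])),
    # Row lengths equivalent to the row splits [0, 2, 3, 3, 6].
    "len": tf.train.Feature(int64_list=tf.train.Int64List(value=[2, 1, 0, 3])),
}))

features = {
    "f2": tf.io.RaggedFeature(
        value_key="v", dtype=tf.int64,
        partitions=[tf.io.RaggedFeature.RowLengths("len")]),
}
parsed = tf.io.parse_single_example(example.SerializeToString(), features)
print(parsed["f2"].to_list())  # [[3, 1], [4], [], [1, 5, 9]]
```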
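row_splits_dtype: (Optional.) Data type for the row-partitioning tensor(s). One of int32 or int64. Defaults to int32.
validate: (Optional.) Boolean indicating whether or not to validate that the input values form a valid RaggedTensor. Defaults to False.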
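A sketch of row_splits_dtype and validate, assuming TensorFlow 2.x with eager execution. With row_splits_dtype=tf.int64 the parsed RaggedTensor should carry int64 row splits, and validate=True asks TensorFlow to check that the partition data forms a valid RaggedTensor:

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    "v": tf.train.Feature(int64_list=tf.train.Int64List(value=[3, 1, 4, 1, 5, 9])),
    "s1": tf.train.Feature(int64_list=tf.train.Int64List(value=[0, 2, 3, 3, 6])),
}))

feature = tf.io.RaggedFeature(
    value_key="v", dtype=tf.int64,
    partitions=[tf.io.RaggedFeature.RowSplits("s1")],
    row_splits_dtype=tf.int64,  # instead of the default int32
    validate=True)              # check that "s1" forms valid row splits
parsed = tf.io.parse_single_example(example.SerializeToString(), {"f2": feature})
print(parsed["f2"].row_splits.dtype)
```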

Child classes

class UniformRowLength

class ValueRowIds


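Note: This article was selected and compiled by 纯净天空 from the English original tf.io.RaggedFeature on tensorflow.org. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.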