Python cudf.DataFrame.info usage and code examples

Usage: DataFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, null_counts=None)

Print a concise summary of a DataFrame.

This method prints information about a DataFrame, including the index dtype and column dtypes, non-null values, and memory usage.

Parameters:

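verbose: bool, optional: Whether to print the full summary. By default, the setting in pandas.options.display.max_info_columns is followed.
buf: writable buffer, defaults to sys.stdout: Where to send the output. By default, the output is printed to sys.stdout. Pass a writable buffer if you need to process the output further.
max_cols: int, optional: When to switch from the verbose to the truncated output. If the DataFrame has more than max_cols columns, the truncated output is used. By default, the setting in pandas.options.display.max_info_columns is used.
memory_usage: bool, str, optional: Specifies whether the total memory usage of the DataFrame elements (including the index) should be displayed. By default, this follows the pandas.options.display.memory_usage setting. True always shows memory usage. False never shows memory usage. A value of 'deep' is equivalent to "True with deep introspection". Memory usage is shown in human-readable units (base-2 representation). Without deep introspection, the memory estimate is based on the column dtype and the number of rows, assuming that values consume the same amount of memory for corresponding dtypes. With deep memory introspection, the real memory usage is calculated at the cost of computational resources.
null_counts: bool, optional: Whether to show the non-null counts. By default, they are shown only if the frame is smaller than pandas.options.display.max_info_rows and pandas.options.display.max_info_columns. A value of True always shows the counts, and False never shows them.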

Returns:

None: This method prints a summary of a DataFrame and returns None.

Examples:

>>> import cudf
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = cudf.DataFrame({"int_col": int_values,
...                     "text_col": text_values,
...                     "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

Print information about all columns:

>>> df.info(verbose=True)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes

Print a summary of the column count and the dtypes, but not per-column information:

>>> df.info(verbose=False)
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes
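
The interplay between verbose and max_cols described in the parameter list can be sketched as a small decision rule. This is only an illustration, not the library's implementation: use_verbose_output is a hypothetical helper, and the assumption that an explicit verbose value takes precedence over max_cols is inferred from the parameter descriptions.

```python
def use_verbose_output(n_columns, max_cols, verbose=None):
    """Hypothetical helper: choose the per-column or the truncated summary."""
    # Assumption: an explicit verbose=True/False overrides the threshold.
    if verbose is not None:
        return bool(verbose)
    # Otherwise, more than max_cols columns means truncated output.
    return n_columns <= max_cols


# The three-column frame above, checked against different thresholds.
print(use_verbose_output(3, max_cols=100))
print(use_verbose_output(3, max_cols=2))
print(use_verbose_output(3, max_cols=2, verbose=True))
```

Under these assumptions, the three-column frame gets the per-column table unless the threshold drops below three and verbose is left unset.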

Pipe the output of DataFrame.info to a buffer instead of sys.stdout and print the buffer contents:

>>> import io
>>> buffer = io.StringIO()
>>> df.info(buf=buffer)
>>> print(buffer.getvalue())
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   int_col    5 non-null      int64
 1   text_col   5 non-null      object
 2   float_col  5 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 130.0+ bytes
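
Once the summary is in a buffer, it can be processed as ordinary text. The sketch below shows one possible way to pull the per-column rows out of it. To stay runnable without a GPU, it fills an io.StringIO by hand with the text shown above instead of calling df.info(buf=buffer). The function parse_info_columns is a hypothetical helper, and it assumes column names contain no whitespace.

```python
import io

# Stand-in for a buffer filled by df.info(buf=buffer); text copied from above.
buffer = io.StringIO(
    "<class 'cudf.core.dataframe.DataFrame'>\n"
    "RangeIndex: 5 entries, 0 to 4\n"
    "Data columns (total 3 columns):\n"
    " #   Column     Non-Null Count  Dtype\n"
    "---  ------     --------------  -----\n"
    " 0   int_col    5 non-null      int64\n"
    " 1   text_col   5 non-null      object\n"
    " 2   float_col  5 non-null      float64\n"
    "dtypes: float64(1), int64(1), object(1)\n"
    "memory usage: 130.0+ bytes\n"
)


def parse_info_columns(text):
    """Hypothetical helper: return (column, non_null_count, dtype) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Per-column rows split into: index, name, count, "non-null", dtype.
        if len(parts) == 5 and parts[0].isdigit() and parts[3] == "non-null":
            rows.append((parts[1], int(parts[2]), parts[4]))
    return rows


columns = parse_info_columns(buffer.getvalue())
for name, count, dtype in columns:
    print(name, count, dtype)
```

Parsing the printed text is fragile by nature. When the dtypes or null counts are needed programmatically, reading them from the DataFrame itself is usually the more robust choice.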

The memory_usage parameter enables deep introspection mode, which is especially useful for large DataFrames and for fine-tuning memory optimization:

>>> import numpy as np
>>> random_strings_array = np.random.choice(['a', 'b', 'c'], 10 ** 6)
>>> df = cudf.DataFrame({
...     'column_1': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_2': np.random.choice(['a', 'b', 'c'], 10 ** 6),
...     'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info(memory_usage='deep')
<class 'cudf.core.dataframe.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
 #   Column    Non-Null Count    Dtype
---  ------    --------------    -----
 0   column_1  1000000 non-null  object
 1   column_2  1000000 non-null  object
 2   column_3  1000000 non-null  object
dtypes: object(3)
memory usage: 14.3 MB
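
The memory usage line is reported in base-2 units, as noted for the memory_usage parameter. A minimal sketch of that kind of formatting follows. It is not the library's own code: format_size is a hypothetical helper, and the unit labels and one-decimal rounding are assumptions chosen to match the output shown on this page.

```python
def format_size(num_bytes, qualifier=""):
    """Hypothetical helper: render a byte count in base-2 units."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        # Stop at the first unit where the value is below 1024.
        if size < 1024.0:
            return f"{size:.1f}{qualifier} {unit}"
        size /= 1024.0
    return f"{size:.1f}{qualifier} PB"


print(format_size(130, "+"))
print(format_size(15 * 10**6))
```

Because each step divides by 1024 rather than 1000, a count of 15,000,000 bytes is rendered as 14.3 MB rather than 15.0 MB.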

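Note: This article was compiled by 纯净天空 from the original English documentation for cudf.DataFrame.info published by rapids.ai. Unless otherwise stated, copyright in the original code belongs to the original authors. Do not reproduce or copy this translation without permission.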
