Python dask.dataframe.Series.describe usage and code examples

Usage: Series.describe(split_every=False, percentiles=None, percentiles_method='default', include=None, exclude=None, datetime_is_numeric=False)

Generate descriptive statistics.

This docstring was copied from pandas.core.frame.DataFrame.describe.

Some inconsistencies with the Dask version may exist.

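Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset's distribution, excluding NaN values.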
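A quick pandas check of the NaN rule: missing values are skipped rather than propagated, so they lower `count` and do not affect the other statistics.

```python
import numpy as np
import pandas as pd

# One of the three values is missing
s = pd.Series([1.0, np.nan, 3.0])
summary = s.describe()

# count reflects only the non-NaN values, and mean/min/max ignore the NaN
print(summary["count"])  # 2.0
print(summary["mean"])   # 2.0
```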

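Analyzes both numeric and object Series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.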

Parameters:

percentiles: list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.
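A short pandas sketch of custom percentiles. Note that pandas always adds the 50th percentile (the median) to the output, even when it is not in the requested list; the row labels are formatted as percent strings.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th and 90th percentiles instead of the default quartiles
summary = s.describe(percentiles=[0.1, 0.9])

print(list(summary.index))
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
```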

include: 'all', list-like of dtypes or None (default), optional

A whitelist of data types to include in the result. Ignored for Series. Here are the options:

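'all': All columns of the input will be included in the output.
A list-like of dtypes: Limits the results to the provided data types. To limit the result to numeric types, submit numpy.number. To limit it instead to object columns, submit the object data type (numpy.object was only an alias for it and was removed in NumPy 1.24). Strings can also be used in the style of select_dtypes (e.g. df.describe(include=['O'])). To select pandas categorical columns, use 'category'.
None (default): The result will include all numeric columns.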

exclude: list-like of dtypes or None (default), optional

A blacklist of data types to omit from the result. Ignored for Series. Here are the options:

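A list-like of dtypes: Excludes the provided data types from the result. To exclude numeric types, submit numpy.number. To exclude object columns, submit the object data type (numpy.object was only an alias for it and was removed in NumPy 1.24). Strings can also be used in the style of select_dtypes (e.g. df.describe(exclude=['O'])). To exclude pandas categorical columns, use 'category'.
None (default): The result will exclude nothing.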

datetime_is_numeric: bool, default False

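Whether to treat datetime dtypes as numeric. This affects the statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default. In pandas 2.0 and later, pandas' own describe no longer accepts this keyword and always summarizes datetime data as numeric.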

Returns:

Series or DataFrame: Summary statistics of the Series or DataFrame provided.
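The return type follows the input type. A minimal pandas illustration:

```python
import pandas as pd

# A Series in gives a Series out
series_summary = pd.Series([1, 2, 3]).describe()

# A DataFrame in gives a DataFrame out, one column per described column
frame_summary = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]}).describe()

print(type(series_summary).__name__)  # Series
print(type(frame_summary).__name__)   # DataFrame
print(list(frame_summary.columns))    # ['a', 'b']
```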

Notes:

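For numeric data, the result's index will include count, mean, std, min, max as well as the lower, 50th and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.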
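The numeric rows can be cross-checked against the corresponding pandas and NumPy calls. This sketch shows that the 50% row equals the median, that the quartiles agree with NumPy's default linear-interpolation percentiles, and that std is the sample standard deviation (ddof=1, the default of Series.std).

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])
summary = s.describe()

# The 50% row is the median
assert summary["50%"] == s.median()

# The quartiles match NumPy's default (linear interpolation) percentiles
assert np.isclose(summary["25%"], np.percentile(s, 25))
assert np.isclose(summary["75%"], np.percentile(s, 75))

# std is the sample standard deviation (ddof=1)
assert np.isclose(summary["std"], s.std(ddof=1))
```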

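For object data (e.g. strings or timestamps), the result's index will include count, unique, top, and freq. top is the most common value, and freq is that value's frequency. Timestamps also include the first and last items.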
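For an object Series of strings, the unique, top and freq rows correspond to nunique() and value_counts(), as this pandas sketch shows:

```python
import pandas as pd

s = pd.Series(["a", "a", "b", "c"])
summary = s.describe()

counts = s.value_counts()
# unique is the number of distinct values, top the most common one, freq its count
print(summary["unique"] == s.nunique())   # True
print(summary["top"] == counts.idxmax())  # True
print(summary["freq"] == counts.max())    # True
```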

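If multiple object values have the highest count, the top result will be chosen arbitrarily from among them, and freq reports that shared highest count.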
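A pandas sketch of a tie: freq is deterministic, but code should not depend on which tied value is reported as top.

```python
import pandas as pd

# "x" and "y" are tied for the highest count
s = pd.Series(["x", "y", "x", "y", "z"])
summary = s.describe()

# freq is the tied count; top is one of the tied values, but which one is not guaranteed
print(summary["freq"])               # 2
print(summary["top"] in {"x", "y"})  # True
```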

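For mixed data types provided via a DataFrame, the default is to return only an analysis of the numeric columns. If the DataFrame consists only of object and categorical data without any numeric columns, the default is to return an analysis of both the object and categorical columns. If include='all' is provided as an option, the result will include a union of the attributes of each type.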
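The object-and-categorical-only case can be checked in pandas: with no numeric column to fall back on, a plain describe() covers both non-numeric columns.

```python
import pandas as pd

# No numeric columns: only a categorical and an object column
df = pd.DataFrame({
    "cat": pd.Categorical(["d", "e", "f"]),
    "obj": ["a", "b", "c"],
})

# The default analysis now covers the object and categorical columns
summary = df.describe()
print(list(summary.columns))  # ['cat', 'obj']
print(list(summary.index))    # ['count', 'unique', 'top', 'freq']
```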

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output. The parameters are ignored when analyzing a Series.
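A minimal pandas check that include is ignored for a Series: asking a numeric Series to include only object dtypes still returns the full numeric summary.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# include/exclude only filter DataFrame columns; on a Series they have no effect
plain = s.describe()
filtered = s.describe(include=["O"])  # on a DataFrame this would drop numeric columns

print(plain.equals(filtered))  # True
```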

Examples:

Describing a numeric Series.

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s.describe()  
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series (here, an object Series of strings).

>>> s = pd.Series(['a', 'a', 'b', 'c'])  
>>> s.describe()  
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([  
...   np.datetime64("2000-01-01"),
...   np.datetime64("2010-01-01"),
...   np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)  
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),  
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                   })
>>> df.describe()  
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')  
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()  
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])  
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])  
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])  
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])  
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])  
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0


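Note: This article was compiled by 純淨天空 from the original English work dask.dataframe.Series.describe by the dask.org authors. Unless otherwise stated, the copyright of the original code belongs to the original authors; please do not reprint or copy this translation without permission.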
