Python Pandas DataFrame rolling method: usage and code examples

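The Pandas DataFrame.rolling(~) method computes statistics over a moving window. Note that a window is simply a sequence of values from which a statistic, such as the mean, is computed.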

Parameters

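1. window | int or offset or BaseIndexer subclass

The size of the moving window. An int gives the number of observations in each window.

When working with time series, that is, when the index of the source DataFrame is a DatetimeIndex, an offset specifies the time span covered by each window.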
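As a quick sketch of how an integer window relates to an offset window (the data and its regular 1-second spacing are assumptions for illustration): when timestamps are exactly one second apart with no gaps, a "2s" offset window covers the same two observations as window=2. Offset windows default to min_periods=1, so the integer window is given min_periods=1 to make the two comparable.

```python
import pandas as pd

# Hypothetical data: five readings taken exactly one second apart.
idx = pd.date_range("2020-12-20 15:00:00", periods=5, freq="s")
s = pd.Series([1, 10, 100, 1000, 10000], index=idx)

# Offset-based window: every observation within the last 2 seconds.
by_offset = s.rolling(window="2s").sum()

# Fixed-size window: the last 2 observations. min_periods=1 mirrors the
# offset window's default, so the first row is not NaN.
by_count = s.rolling(window=2, min_periods=1).sum()

print(by_offset.equals(by_count))  # True, because the spacing is regular
```

If the timestamps had gaps, the two kinds of window would no longer agree.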

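2. min_periods | int | optional

The minimum number of observations required in a window. If a window contains fewer than min_periods non-missing observations, NaN is returned as the statistic for that window. The default depends on the kind of window:

If the window is offset-based, min_periods=1.
Otherwise, min_periods=window.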
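A minimal sketch of the integer-window default (the Series below is made up for illustration): with window=3 and no explicit min_periods, the first two windows are incomplete and produce NaN; lowering min_periods lets those partial windows produce a value.

```python
import pandas as pd

s = pd.Series([2, 4, 8, 10])

# Integer window: min_periods defaults to window (3 here), so the first two
# rows do not have enough observations and come out as NaN.
default_result = s.rolling(window=3).sum()

# Lowering min_periods lets partial windows at the start produce a value.
relaxed_result = s.rolling(window=3, min_periods=1).sum()

print(default_result.tolist())  # [nan, nan, 14.0, 22.0]
print(relaxed_result.tolist())  # [2.0, 6.0, 14.0, 22.0]
```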

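3. center | boolean | optional

If True, the observation is placed at the center of the window.
If False, the observation is placed at the right edge of the window.

By default, center=False. See the examples below for an illustration.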

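4. win_type | string | optional

The type of window (e.g. boxcar, triang), given as the name of a scipy.signal window function. For more information, consult the official documentation.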

5. on | string | optional

The label of a datetime-like column to use instead of the DatetimeIndex. This is only relevant when working with time series.
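A short sketch of on, assuming the timestamps are stored in an ordinary column (named "time" here purely for illustration) rather than in the index. The "time" column is carried through to the result, and column A should hold the sums 1, 11, 110, 1000 and 11000.

```python
import pandas as pd

# Hypothetical data: the timestamps live in a regular column, not the index.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2020-12-20 15:00:00", "2020-12-20 15:00:01", "2020-12-20 15:00:02",
        "2020-12-20 15:00:04", "2020-12-20 15:00:05",
    ]),
    "A": [1, 10, 100, 1000, 10000],
})

# on="time" tells rolling to measure the 2-second window against that column.
result = df.rolling(window="2s", on="time").sum()
print(result)
```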

6. axis | int or string | optional

Whether to compute the statistic for each column or for each row. By default, axis=0, that is, the statistic is computed for each column.
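Since pandas 2.1 the axis keyword of rolling is deprecated, and the pandas documentation suggests transposing the DataFrame instead of passing axis=1. A minimal sketch of a row-wise rolling sum done that way, using a small example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]}, index=["a", "b", "c", "d"])

# Row-wise rolling sum without axis=1: transpose so rows become columns,
# roll down the former rows, then transpose back.
row_wise = df.T.rolling(window=2).sum().T
print(row_wise)
```

Column A is all NaN (each row-wise window there holds only one value), and column B holds A + B for each row.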

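7. closed | string | optional

Whether the window endpoints are inclusive or exclusive:

Value	Description
`"left"`	The left endpoint is inclusive; the right endpoint is exclusive.
`"right"`	The left endpoint is exclusive; the right endpoint is inclusive.
`"both"`	Both endpoints are inclusive.
`"neither"`	Both endpoints are exclusive.

By default (closed=None), closed="right" is used for both offset-based and fixed windows.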
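A small sketch of closed on a fixed window, assuming pandas 1.2 or later (where closed is honored for fixed windows): the default behaves like closed="right", while closed="both" also includes the left endpoint, so each window of size 2 can span three values.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Default (closed=None) behaves like closed="right": each window holds the
# current value and the one before it.
default_sum = s.rolling(window=2).sum()
right_sum = s.rolling(window=2, closed="right").sum()

# closed="both" also includes the left endpoint, so each window of size 2
# spans up to three values.
both_sum = s.rolling(window=2, closed="both").sum()

print(default_sum.equals(right_sum))  # True
print(both_sum.tolist())              # [nan, 3.0, 6.0, 9.0]
```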

Return value

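A Window object if win_type is specified, otherwise a Rolling object. The statistic itself is computed by calling a method such as sum() or mean() on this object.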
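A short sketch showing that rolling() on its own computes nothing: it returns a Rolling object (no win_type is passed here), and the statistic is chosen afterwards, either through a method or through agg().

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]})

# rolling() only sets up the windows; it returns a Rolling object.
r = df.rolling(window=2)
print(type(r).__name__)  # Rolling

# The statistic is picked afterwards, via a method or via agg().
print(r.mean())
print(r.agg(["sum", "max"]))
```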

Examples

Basic usage

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame({"A":[2,4,8,10],"B":[4,5,6,7]}, index=["a","b","c","d"])
df

    A  B
a   2  4
b   4  5
c   8  6
d  10  7

To compute the sum of the values using a moving window of size 2:

df.rolling(window=2).sum()

      A     B
a   NaN   NaN
b   6.0   9.0
c  12.0  11.0
d  18.0  13.0

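Note the following:

Since axis=0 (the default), the statistic (the sum) is computed for each column.
window=2 means that each sum is computed from two consecutive observations:
- In the first column we get 6.0 because 2+4=6.
- We get 12.0 because 4+8=12.
- We get 18.0 because 8+10=18.
The first row is NaN because, when the window is not offset-based, min_periods equals the value passed to window. At least 2 observations are therefore needed to compute the statistic, but the window for the first row contains only one value, so NaN is returned.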
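With window=2, each rolling sum is the current row plus the previous row. As a sanity check of the arithmetic above (a sketch, not how pandas computes it internally), the same frame can be rebuilt with shift:

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]}, index=["a", "b", "c", "d"])

rolled = df.rolling(window=2).sum()

# Current row + previous row. shift(1) puts NaN in the first row, which
# mirrors the NaN that min_periods=2 produces there.
manual = df + df.shift(1)

pd.testing.assert_frame_equal(rolled, manual)  # raises if they differ
print(manual)
```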

Specifying center

Consider the following DataFrame:

df = pd.DataFrame({"A":[2,4,8,10]}, index=["a","b","c","d"])
df

    A
a   2
b   4
c   8
d  10

By default, center=False, which means the window is not centered on the observation; the observation sits at the window's right edge:

df.rolling(window=3, min_periods=0).sum()   # center=False

      A
a   2.0
b   6.0
c  14.0
d  22.0

Here, the numbers are computed as follows:

A[a]: 2 = 2
A[b]: 2 + 4 = 6 # the observation is 4 (see how 4 is right-aligned)
A[c]: 2 + 4 + 8 = 14 # the observation is 8
A[d]: 4 + 8 + 10 = 22 # the observation is 10
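The right-aligned sums above can be reproduced without pandas. A minimal sketch using a hypothetical trailing_sums helper, where each result sums the current value and up to window - 1 values before it, and partial windows at the start still count (as min_periods=0 allows):

```python
def trailing_sums(values, window):
    """Hypothetical helper: sum of each value and up to window - 1 predecessors."""
    sums = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # clip the window at the start of the data
        sums.append(sum(values[start:i + 1]))
    return sums

print(trailing_sums([2, 4, 8, 10], window=3))  # [2, 6, 14, 22]
```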

Compare this with the output for center=True:

df.rolling(window=3, min_periods=0, center=True).sum()

      A
a   6.0
b  14.0
c  22.0
d  18.0

Here, the numbers are computed as follows:

A[a]: 2 + 4 = 6
A[b]: 2 + 4 + 8 = 14 # the observation is 4 (see how 4 is centered here)
A[c]: 4 + 8 + 10 = 22 # the observation is 8
A[d]: 8 + 10 = 18
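The centered sums above can also be checked in plain Python. This hypothetical centered_sums helper is a sketch that only handles odd window sizes: the window reaches window // 2 positions on each side of the current value and is clipped at the edges of the data, as min_periods=0 allows.

```python
def centered_sums(values, window):
    """Hypothetical helper for odd window sizes: sums centered on each value."""
    half = window // 2
    sums = []
    for i in range(len(values)):
        start = max(0, i - half)                # clip at the left edge
        stop = min(len(values), i + half + 1)   # clip at the right edge
        sums.append(sum(values[start:stop]))
    return sums

print(centered_sums([2, 4, 8, 10], window=3))  # [6, 14, 22, 18]
```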

Time series example

Consider the following time series DataFrame:

idx = [pd.Timestamp('20201220 15:00:00'),
       pd.Timestamp('20201220 15:00:01'),
       pd.Timestamp('20201220 15:00:02'),
       pd.Timestamp('20201220 15:00:04'),
       pd.Timestamp('20201220 15:00:05')]
df = pd.DataFrame({"A":[1,10,100,1000,10000]}, index=idx)
df

                         A
2020-12-20 15:00:00      1
2020-12-20 15:00:01     10
2020-12-20 15:00:02    100
2020-12-20 15:00:04   1000
2020-12-20 15:00:05  10000

To sum over windows spanning 2 seconds:

df.rolling(window="2s").sum()

                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    110.0
2020-12-20 15:00:04   1000.0
2020-12-20 15:00:05  11000.0

Note that because the window is offset-based, min_periods=1 by default.
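The sums above can be checked by hand. A minimal pure-Python sketch, assuming the default closed="right", so the window for each timestamp t is the half-open interval (t - 2s, t]. There is no reading at 15:00:03, which is why the window at 15:00:04 contains only 1000.

```python
from datetime import datetime, timedelta

# Same timestamps and values as the DataFrame above.
times = [datetime(2020, 12, 20, 15, 0, sec) for sec in (0, 1, 2, 4, 5)]
values = [1, 10, 100, 1000, 10000]
width = timedelta(seconds=2)

# For each t, sum the values whose timestamp lies in (t - 2s, t].
sums = [sum(v for ts, v in zip(times, values) if t - width < ts <= t) for t in times]

print(sums)  # [1, 11, 110, 1000, 11000]
```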

You can pass the closed parameter to specify whether the endpoints are included or excluded:

df.rolling(window="2s", closed="both").sum()   # both endpoints are inclusive

                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    111.0
2020-12-20 15:00:04   1100.0
2020-12-20 15:00:05  11000.0
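To compare all four closed options side by side, the sketch below recomputes the 2-second sum once per option and collects the results in an illustrative comparison DataFrame. Because min_periods=1 for offset windows, a window that ends up empty (for example "left" at the first timestamp) yields NaN.

```python
import pandas as pd

idx = pd.to_datetime([
    "2020-12-20 15:00:00", "2020-12-20 15:00:01", "2020-12-20 15:00:02",
    "2020-12-20 15:00:04", "2020-12-20 15:00:05",
])
df = pd.DataFrame({"A": [1, 10, 100, 1000, 10000]}, index=idx)

# One column per closed setting; empty windows come out as NaN.
comparison = pd.DataFrame({
    closed: df["A"].rolling(window="2s", closed=closed).sum()
    for closed in ["right", "left", "both", "neither"]
})
print(comparison)
```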


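Note: This article was selected and compiled by 純淨天空 from the English original by Isshin Inada, Pandas DataFrame | rolling method. Unless otherwise stated, copyright of the original code belongs to the original author; please do not reproduce or copy this translation without permission.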