Python sklearn TimeSeriesSplit: usage and code examples

This article briefly describes how to use sklearn.model_selection.TimeSeriesSplit in Python.

Usage: class sklearn.model_selection.TimeSeriesSplit(n_splits=5, *, max_train_size=None, test_size=None, gap=0)

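Time series cross-validator

Provides train/test indices to split time series samples, observed at fixed time intervals, into train and test sets. In each split, the test indices must be higher than those of the previous split, so shuffling in this cross-validator is inappropriate.

This cross-validation object is a variation of KFold. In the k-th split, it returns the first k folds as the train set and the (k+1)-th fold as the test set.

Note that, unlike standard cross-validation methods, each successive training set is a superset of the training sets that come before it.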
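
The two properties described above (test indices always follow the training indices, and each training set contains the previous one) can be checked directly. The following is a minimal sketch; the array X and the choice of n_splits=4 are illustrative assumptions, not part of the original documentation.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 10 samples assumed to be in chronological order (illustrative data)
X = np.arange(20).reshape(10, 2)
tscv = TimeSeriesSplit(n_splits=4)

splits = list(tscv.split(X))
previous_train = set()
for train_index, test_index in splits:
    # Every test index lies strictly after every training index
    assert train_index.max() < test_index.min()
    # Each training set contains the previous training set
    assert previous_train <= set(train_index)
    previous_train = set(train_index)
    print("TRAIN:", train_index, "TEST:", test_index)
```

Both assertions pass for every split, which is why the order of the rows in X must reflect time order before splitting.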

Read more in the User Guide.

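Parameters:

n_splits: int, default=5. Number of splits. Must be at least 2.
max_train_size: int, default=None. Maximum size of a single training set.
test_size: int, default=None. Used to limit the size of the test set. Defaults to n_samples // (n_splits + 1), which is the maximum allowed with gap=0.
gap: int, default=0. Number of samples to exclude from the end of each training set, before the test set.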
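
The examples in this article do not exercise max_train_size, so here is a small sketch of its effect. With max_train_size set, the training set stops growing once it reaches that size and becomes a rolling window of the most recent samples. The data and parameter values below are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((12, 1))  # placeholder data: 12 samples in time order
tscv = TimeSeriesSplit(n_splits=3, max_train_size=4)

# Collect the splits as plain lists so they are easy to inspect
rolling = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in rolling:
    print("TRAIN:", train, "TEST:", test)
# TRAIN: [0, 1, 2] TEST: [3, 4, 5]
# TRAIN: [2, 3, 4, 5] TEST: [6, 7, 8]
# TRAIN: [5, 6, 7, 8] TEST: [9, 10, 11]
```

Note that once max_train_size takes effect, successive training sets are no longer supersets of the earlier ones, because the oldest samples are dropped.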

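Notes:

With the default settings, the training set in the i-th split (i = 1, ..., n_splits) has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1), and the test set has size n_samples // (n_splits + 1), where n_samples is the number of samples. The integer division is applied before multiplying by i.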
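
The size formula can be verified numerically. This sketch assumes n_samples=10 and n_splits=3 (values chosen only for illustration) and compares the sizes produced by the splitter with the formula, counting splits from 1.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples, n_splits = 10, 3  # illustrative values
X = np.zeros((n_samples, 1))

fold = n_samples // (n_splits + 1)       # default test set size
remainder = n_samples % (n_splits + 1)   # leftover samples go to the training sets

sizes = []
splitter = TimeSeriesSplit(n_splits=n_splits)
for i, (train_index, test_index) in enumerate(splitter.split(X), start=1):
    # Training size in the i-th split: i * fold + remainder
    assert len(train_index) == i * fold + remainder
    assert len(test_index) == fold
    sizes.append((len(train_index), len(test_index)))

print(sizes)  # [(4, 2), (6, 2), (8, 2)]
```

Here 10 % 4 = 2 leftover samples are absorbed by the training sets, so the first training set already has 4 samples instead of 2.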

Examples:

>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0 1 2 3 4 5] TEST: [6 7]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8 9] TEST: [10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0 1 2 3] TEST: [6 7]
TRAIN: [0 1 2 3 4 5] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [10 11]
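
In practice the splitter is usually passed to a model-evaluation helper rather than looped over by hand. The sketch below passes a TimeSeriesSplit instance as the cv argument of cross_val_score; the Ridge estimator and the synthetic data are assumptions made for illustration and are not part of the original documentation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic regression data, assumed to be ordered in time
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

tscv = TimeSeriesSplit(n_splits=5)
# Each score comes from a model fitted only on samples that precede its test fold
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=tscv)
print(scores.shape)  # (5,)
```

One score is returned per split, so the array has n_splits entries; the default score for a regressor is R^2.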


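Note: This article was selected and compiled by 純淨天空 from the original English documentation for sklearn.model_selection.TimeSeriesSplit on scikit-learn.org. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.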