This article briefly introduces the usage of sklearn.model_selection.TimeSeriesSplit in Python.
Usage:
class sklearn.model_selection.TimeSeriesSplit(n_splits=5, *, max_train_size=None, test_size=None, gap=0)
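Time series cross-validator
Provides train/test indices to split time series data samples, observed at fixed time intervals, into train/test sets. In each split, the test indices must be higher than in the previous split, so shuffling in the cross-validator is inappropriate.
This cross-validation object is a variation of KFold. In the k-th split, it returns the first k folds as the training set and the (k+1)-th fold as the test set. Note that, unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
Read more in the User Guide.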
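The two properties described above (test indices always come after the training indices, and each training set contains the previous one) can be checked directly. The following is a minimal sketch on toy data; the array `X` and the helper name `previous_train` are illustrative assumptions, not part of the original documentation.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 12 time-ordered samples (the values themselves are arbitrary).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

previous_train = set()
for train_index, test_index in splits:
    # Every test index comes after every training index.
    assert train_index.max() < test_index.min()
    # Each training set is a superset of the previous training set.
    assert previous_train <= set(train_index)
    previous_train = set(train_index)
    print("TRAIN:", train_index, "TEST:", test_index)
```

With 12 samples and 3 splits, each test fold has 12 // 4 = 3 samples, so the training sets grow from 3 to 6 to 9 samples.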
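Parameters:
- n_splits: int, default=5
Number of splits. Must be at least 2.
- max_train_size: int, default=None
Maximum size for a single training set.
- test_size: int, default=None
Used to limit the size of the test set. Defaults to n_samples // (n_splits + 1), which is the maximum allowed value with gap=0.
- gap: int, default=0
Number of samples to exclude from the end of each training set before the test set.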
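As a rough illustration of max_train_size, the sketch below caps each training set at the 4 most recent samples, which turns the expanding window into a rolling one. The toy array `X` and the chosen values (n_splits=3, test_size=2, max_train_size=4) are assumptions made for this example only.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data: 12 time-ordered samples (the values themselves are arbitrary).
X = np.arange(12).reshape(-1, 1)

# max_train_size=4 keeps only the 4 samples immediately before each test set.
tscv = TimeSeriesSplit(n_splits=3, test_size=2, max_train_size=4)
rolling_splits = list(tscv.split(X))

for train_index, test_index in rolling_splits:
    print("TRAIN:", train_index, "TEST:", test_index)
```

Here the training sets no longer grow between splits: each one holds exactly 4 indices and slides forward by test_size.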
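Notes:
With the default settings (gap=0, max_train_size=None), the training set in the i-th split has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1), and the test set has size n_samples // (n_splits + 1) by default, where n_samples is the number of samples.
Examples: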
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0] TEST: [1]
TRAIN: [0 1] TEST: [2]
TRAIN: [0 1 2] TEST: [3]
TRAIN: [0 1 2 3] TEST: [4]
TRAIN: [0 1 2 3 4] TEST: [5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0 1 2 3 4 5] TEST: [6 7]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7 8 9] TEST: [10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for train_index, test_index in tscv.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [0 1 2 3] TEST: [6 7]
TRAIN: [0 1 2 3 4 5] TEST: [8 9]
TRAIN: [0 1 2 3 4 5 6 7] TEST: [10 11]
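The training-set size formula from the notes can be checked numerically. The sketch below assumes the default gap=0 and max_train_size=None, and reads the formula as multiplying i by the fold size n_samples // (n_splits + 1), then adding the remainder n_samples % (n_splits + 1). The values n_samples=15 and n_splits=3 and the names `fold`, `remainder`, and `sizes` are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_samples, n_splits = 15, 3
X = np.zeros((n_samples, 1))  # only the number of rows matters here

fold = n_samples // (n_splits + 1)       # default test-set size
remainder = n_samples % (n_splits + 1)   # leftover samples end up in the first training set

sizes = []
splitter = TimeSeriesSplit(n_splits=n_splits)
for i, (train_index, test_index) in enumerate(splitter.split(X), start=1):
    # Compare the observed sizes against the formula for the i-th split.
    assert len(train_index) == i * fold + remainder
    assert len(test_index) == fold
    sizes.append((len(train_index), len(test_index)))

print(sizes)
```

Note that n_samples is deliberately not a multiple of n_splits + 1 here, so the remainder term is exercised.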
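Note: This article was selected and compiled by 純淨天空 from the original English documentation at scikit-learn.org, sklearn.model_selection.TimeSeriesSplit. Unless otherwise stated, copyright of the original code belongs to the original authors. Do not reproduce or copy this translation without permission or authorization.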