K-Fold Cross Validation with Python Libraries
k-fold cross validation
- See the official documentation: http://scikit-learn.org/stable/modules/cross_validation.html
- A detailed k-fold cross validation walkthrough on CSDN: http://blog.csdn.net/liuweiyuxiang/article/details/78489867
Example 1
The simplest approach is to call cross_val_score directly.
First, let's look at how KFold partitions the data. X has four elements and is split into 2 folds; in each row of the output, the last array is the test set and the array before it is the training set. Each row is one fold:
import numpy as np
from sklearn.model_selection import KFold
X = ["a", "b", "c", "d"]
kf = KFold(n_splits=2)
for train, test in kf.split(X):
    print("%s %s" % (train, test))
[2 3] [0 1]
[0 1] [2 3]
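As noted above, the simplest approach is cross_val_score, which runs the whole fit/score loop for you. A minimal sketch on scikit-learn's built-in iris dataset (the LogisticRegression estimator and cv=5 here are illustrative choices, not from the original text):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# one accuracy score per fold; cv=5 means 5-fold cross validation
scores = cross_val_score(clf, X, y, cv=5)
print(scores)
print(scores.mean())
```

Note that for classifiers, cross_val_score uses stratified folds by default, so each fold preserves the class proportions of y.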
Example 2
# The code below demonstrates how K-fold cross validation splits the data
# simulate splitting a dataset of 25 observations into 5 folds
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed in scikit-learn 0.20
kf = KFold(n_splits=5, shuffle=False)
# print the contents of each training and testing set
print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))
for iteration, (train, test) in enumerate(kf.split(range(25)), start=1):
    print('{:^9} {} {:^25}'.format(iteration, str(train), str(test)))
Iteration Training set observations Testing set observations
1 [ 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [0 1 2 3 4]
2 [ 0 1 2 3 4 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] [5 6 7 8 9]
3 [ 0 1 2 3 4 5 6 7 8 9 15 16 17 18 19 20 21 22 23 24] [10 11 12 13 14]
4 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 20 21 22 23 24] [15 16 17 18 19]
5 [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19] [20 21 22 23 24]
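As the table above shows, KFold splits in index order by default. Passing shuffle=True (with a random_state for reproducibility) permutes the indices before splitting; a small sketch, with parameter values chosen purely for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(25)
kf = KFold(n_splits=5, shuffle=True, random_state=1)

# collect the test fold of each split
test_folds = [test for _, test in kf.split(X)]
for test in test_folds:
    print(test)  # still 5 indices per fold, but no longer contiguous
```

Every observation still appears in exactly one test fold; only the assignment of indices to folds changes.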
LeaveOneOut
With the same data X, let's see what LeaveOneOut produces: the four elements are split into 4 folds. In each row of the output, the last array is the test set, which contains exactly one element, and the array before it is the training set. Each row is one fold:
from sklearn.model_selection import LeaveOneOut
X = [1, 2, 3, 4]
loo = LeaveOneOut()
for train, test in loo.split(X):
    print("%s %s" % (train, test))
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]
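In other words, LeaveOneOut on n samples behaves like KFold with n_splits=n (without shuffling). A quick sanity check of this equivalence:

```python
from sklearn.model_selection import KFold, LeaveOneOut

X = [1, 2, 3, 4]
# materialize both sets of splits as plain lists for comparison
loo_splits = [(list(tr), list(te)) for tr, te in LeaveOneOut().split(X)]
kf_splits = [(list(tr), list(te)) for tr, te in KFold(n_splits=len(X)).split(X)]
print(loo_splits == kf_splits)  # True
```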