Implementing Cross-validation in Python
Published: 2019-04-30


Here we only evaluate the model with cross-validation. The final model can then be built separately.

 

import pandas as pd
from sklearn import tree
from sklearn.model_selection import cross_validate
import sklearn.metrics as mt

df = pd.read_csv("diabetes_data_upload.csv")
X = pd.get_dummies(df.drop(columns="class"))
y = df["class"]

# Define the different scoring metrics to be used
scorer = {
    # Define "Positive" as the positive label for the F-measure
    'f1': mt.make_scorer(mt.f1_score, pos_label="Positive"),
    'accuracy': 'accuracy'
}

# Cross-validate with a decision tree classifier
dtc = tree.DecisionTreeClassifier()
scores = cross_validate(dtc, X, y, cv=10, scoring=scorer)
print(scores)
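The dictionary returned by cross_validate holds one array per metric, keyed as "test_f1" and "test_accuracy" for the scorer names used above, plus timing entries such as "fit_time". A minimal sketch of turning it into a mean and standard deviation per metric is shown below. The helper name summarize_scores and the example numbers are made up for illustration. They are not real results from the diabetes data.

```python
from statistics import mean, pstdev


def summarize_scores(scores):
    """Return {metric: (mean, std)} for every 'test_*' entry of a cross_validate result."""
    summary = {}
    for key, values in scores.items():
        if key.startswith("test_"):
            values = [float(v) for v in values]
            # Strip the 'test_' prefix so the key matches the scorer name
            summary[key[len("test_"):]] = (mean(values), pstdev(values))
    return summary


# Made-up per-fold numbers, only to show the shape of the dictionary
example_scores = {
    "fit_time": [0.01, 0.01, 0.01, 0.01],
    "score_time": [0.0, 0.0, 0.0, 0.0],
    "test_f1": [1.0, 0.5, 1.0, 0.5],
    "test_accuracy": [0.75, 0.75, 0.75, 0.75],
}

for metric, (avg, std) in summarize_scores(example_scores).items():
    print(f"{metric}: mean={avg:.3f}, std={std:.3f}")
```

With the real result, summarize_scores(scores) can be called directly, since the helper only iterates over the per-fold values.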

 

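Parameter tuning + cross-validation: an inner GridSearchCV picks the best parameters on each training fold, and the outer 10-fold split evaluates the tuned model on the held-out fold.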

import numpy as np
import pandas as pd
import sklearn.tree as tree
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
from sklearn.metrics import confusion_matrix

df = pd.read_csv("diabetes_data_upload.csv")
X = pd.get_dummies(df.drop(columns="class"))
y = df["class"]
labels = np.unique(y)  # Fixed label order, so every fold yields a matrix of the same shape

parameters = {'min_impurity_decrease': [0.05*i for i in range(3)],
              'criterion': ["gini", "entropy"]}
dtc = tree.DecisionTreeClassifier()

# Set up 10-fold CV
kf = KFold(n_splits=10, shuffle=True)

# Accumulated confusion matrix (rows: true class, columns: predicted class)
matrix = np.zeros((2, 2), dtype=int)

# Iterate through each fold of the CV
for train_index, test_index in kf.split(X):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

    # Perform GridSearchCV using the training data and get the best parameters
    gs_dtc = GridSearchCV(dtc, parameters, scoring="accuracy", cv=10)
    gs_dtc.fit(X_train, y_train)
    best_index = gs_dtc.cv_results_['rank_test_score'].argmin()
    best_param = gs_dtc.cv_results_['params'][best_index]

    # Build the model and evaluate it with the testing data
    dtc_cv = tree.DecisionTreeClassifier(**best_param)
    dtc_cv.fit(X_train, y_train)
    y_pred = dtc_cv.predict(X_test)
    matrix = np.add(matrix, confusion_matrix(y_test, y_pred, labels=labels))

print(matrix)
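The summed confusion matrix can be reduced to the usual summary metrics. The sketch below assumes the scikit-learn layout (rows are true classes, columns are predicted classes, labels in sorted order) and assumes that the second label is the positive one. The helper name metrics_from_confusion and the counts in example_cm are made up for illustration.

```python
def metrics_from_confusion(cm):
    """Compute summary metrics from a 2x2 confusion matrix given as nested lists.

    Assumed layout (scikit-learn convention, labels in sorted order):
    cm[0] = [true negatives, false positives]
    cm[1] = [false negatives, true positives]
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    # Guard every ratio against an empty denominator
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts, not results from the diabetes data
example_cm = [[40, 10], [5, 45]]

for name, value in metrics_from_confusion(example_cm).items():
    print(f"{name}: {value:.3f}")
```

To use it on the matrix printed above, pass matrix.tolist(), which converts the NumPy object to plain nested lists.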

 

 

Reposted from: http://qaygf.baihongyu.com/
