- sklearn core object type: the estimator
The central step of the sklearn modeling workflow is training a model through an estimator object.
- sklearn modeling workflow 1: regression
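To make the estimator idea concrete, here is a minimal sketch. The tiny arrays are made up for illustration and do not come from the data files used in these notes. It shows that a regressor and a classifier share the same interface: create the object, call fit, then call predict.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.linear_model import LinearRegression, LogisticRegression

# Made-up toy data (an assumption for illustration only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y_reg = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2*x + 1
y_clf = np.array([0, 0, 1, 1])          # two classes

reg = LinearRegression()
clf = LogisticRegression()

for est, target in [(reg, y_reg), (clf, y_clf)]:
    # Every sklearn estimator derives from BaseEstimator
    assert isinstance(est, BaseEstimator)
    est.fit(X, target)        # training always goes through fit
    preds = est.predict(X)    # prediction always goes through predict
    print(type(est).__name__, preds)
```

Because the interface is uniform, swapping one model for another usually means changing only the line that creates the estimator.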
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# 1. Read the data (y is a linear function of the features x1 and x2)
df = pd.read_csv('demo_data.csv')
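# 2. Convert the DataFrame columns to arrays
# sklearn works with array inputs, so it is best to convert both the feature matrix and the label array to array objects before passing them in
# The feature matrix is called the Features Matrix; the label array is called the Target Vector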
X = df[['x1','x2']].values
y = df.y.values
# 3. Instantiate the estimator
model = LinearRegression()
# 4. Call the model's fit method to train it on the data
model.fit(X, y)
# 5. Inspect the fitted model: feature coefficients and intercept
print(model.coef_,model.intercept_)
# 6. Call the model's predict method to generate predictions
y_pre = model.predict(X)
# 7. Evaluate the model: import the MSE function from sklearn.metrics
from sklearn.metrics import mean_squared_error
# Signature is mean_squared_error(y_true, y_pred)
mean_squared_error(y, y_pre)
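The regression steps above can be tried without demo_data.csv. The sketch below generates synthetic data instead; the target formula y = 2*x1 - x2 + 1 and the random seed are assumptions chosen for illustration, not values taken from the file. It also checks that mean_squared_error matches the mean of the squared residuals computed by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for demo_data.csv (hypothetical data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 1   # assumed noiseless linear target

# Fit and predict, same steps as in the notes
model = LinearRegression().fit(X, y)
y_pre = model.predict(X)

# MSE by hand: mean of squared residuals
mse_manual = np.mean((y - y_pre) ** 2)
# MSE via sklearn: arguments are (y_true, y_pred)
mse_sklearn = mean_squared_error(y, y_pre)

print(model.coef_, model.intercept_)
print(mse_manual, mse_sklearn)
```

With a noiseless linear target the fitted coefficients and intercept should match the generating formula up to floating-point error, and the MSE should be close to zero.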
- sklearn modeling workflow 2: classification
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv('iris_data.csv')
X = df[df.columns[:-1]].values
y = df[df.columns[-1]].values
# Split the data into a training set and a test set
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2,stratify=y)
clf_test = LogisticRegression()
clf_test.fit(X_train, y_train)
# Model evaluation: accuracy on the training set and on the test set
clf_test.score(X_train, y_train)
clf_test.score(X_test, y_test)
from sklearn.metrics import accuracy_score
accuracy_score(y_test, clf_test.predict(X_test))
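The classification steps above can be reproduced with the iris data bundled in sklearn. In this sketch, load_iris stands in for iris_data.csv, and random_state=0 and max_iter=1000 are assumptions added for reproducibility and convergence. It illustrates two points: stratify=y keeps the class proportions in the test split, and a classifier's score method reports the same accuracy as accuracy_score.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Bundled iris data as a stand-in for iris_data.csv (assumption)
X, y = load_iris(return_X_y=True)

# 80/20 split; stratify=y preserves the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

clf_test = LogisticRegression(max_iter=1000)
clf_test.fit(X_train, y_train)

# For classifiers, score() returns mean accuracy
acc_from_score = clf_test.score(X_test, y_test)
acc_from_metric = accuracy_score(y_test, clf_test.predict(X_test))

print(np.bincount(y_test))            # samples per class in the test set
print(acc_from_score, acc_from_metric)
```

Comparing the training-set score with the test-set score, as the notes do, is a quick check for overfitting: a large gap suggests the model does not generalize well.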
- Saving and loading sklearn models: joblib
import joblib
joblib.dump(clf_test,'clf_test.model')
model = joblib.load('clf_test.model')
model.predict(X_test)
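A self-contained sketch of the save and load round trip follows. It writes to a temporary directory instead of the working directory, and uses the bundled iris data with max_iter=1000; both are assumptions for illustration. The check at the end confirms that the restored model predicts exactly as the original does.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a small model to persist (bundled iris data, an assumption)
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clf_test.model")
    joblib.dump(clf, path)        # serialize the fitted estimator
    restored = joblib.load(path)  # read it back into a new object

# The restored estimator should behave identically to the original
same_predictions = np.array_equal(clf.predict(X), restored.predict(X))
print(same_predictions)
```

joblib files are pickle-based, so only load files from sources you trust, and load them with the same sklearn version that wrote them where possible.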