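Previously, we used a naive Bayes classifier (NBC) to build a model that predicts the K-means clustering results, so that the energy-saving scenario of a new energy-saving residential community can be predicted and fine-grained energy-saving measures can be drawn up for that scenario. That model reached an accuracy of 0.88, a recall of 0.88, and an F1 score of 0.87, which is a good result. Here we use Python's scikit-learn to compare several candidate algorithms and tune their parameters automatically, in order to find the best predictive model and further improve prediction accuracy and stability.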
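Automated modeling
We keep the feature engineering method and the feature variables chosen for K-means and naive Bayes, and only reselect the predictive model.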
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_clean[selected_features], y, test_size=0.2, random_state=42)

# Define the models and parameter grids to try
models = [
    {'name': 'Random Forest',
     'model': RandomForestClassifier(),
     'params': {'n_estimators': [10, 50, 100], 'max_depth': [None, 5, 10]}},
    {'name': 'Support Vector Machine',
     'model': SVC(),
     'params': {'C': [1, 10, 100], 'kernel': ['linear', 'rbf']}},
]

best_model = None
best_accuracy = 0.0

# Try each model in turn
for model_info in models:
    name = model_info['name']
    model = model_info['model']
    params = model_info['params']

    # Tune the parameters with a 5-fold cross-validated grid search
    grid_search = GridSearchCV(model, params, cv=5)
    grid_search.fit(X_train, y_train)

    # Retrieve the best estimator and its parameters
    best_estimator = grid_search.best_estimator_
    best_params = grid_search.best_params_

    # Evaluate the tuned model on the test set
    y_pred = best_estimator.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    # Print the results
    print(f"{name}:")
    print(f"Best Parameters: {best_params}")
    print(f"Accuracy: {accuracy}\n")

    # Keep the model with the highest test accuracy so far
    if accuracy > best_accuracy:
        best_model = best_estimator
        best_accuracy = accuracy
Random Forest:
Best Parameters: {'max_depth': 10, 'n_estimators': 100}
Accuracy: 0.848314606741573
Support Vector Machine:
Best Parameters: {'C': 10, 'kernel': 'linear'}
Accuracy: 0.9157303370786517
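The code tries two models: random forest and support vector machine (SVM). For each one, a grid search (GridSearchCV) with 5-fold cross-validation tunes the parameters on the training set, and the tuned model is then scored on the test set. The model with the highest test accuracy is kept and used for prediction. Here the support vector machine performs better, with an accuracy of 0.9157 against 0.8483 for random forest; its best parameter combination is {'C': 10, 'kernel': 'linear'}.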
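The loop above keeps the model with the highest test-set accuracy. A common variant compares the candidates by their cross-validated score (`best_score_`) and touches the test set only once, for the winner. The sketch below illustrates that variant under stated assumptions: the data is a synthetic stand-in produced by `make_classification` (not the article's `X_clean[selected_features]` and `y`), the parameter grids are shortened, and the names `candidates`, `best_name`, and `best_search` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: synthetic 3-class data standing in for the article's features and labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Candidate estimators with (shortened) parameter grids.
candidates = {
    "Random Forest": (RandomForestClassifier(random_state=42),
                      {"n_estimators": [10, 50], "max_depth": [None, 5]}),
    "Support Vector Machine": (SVC(),
                               {"C": [1, 10], "kernel": ["linear", "rbf"]}),
}

best_name, best_search = None, None
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    # best_score_ is the mean cross-validated accuracy of the best parameter set.
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

# The test set is used once, only for the selected model.
test_accuracy = accuracy_score(y_test, best_search.predict(X_test))
print(best_name, best_search.best_params_, round(test_accuracy, 3))
```

Selecting on the cross-validated score keeps the final test accuracy an estimate on data that played no part in choosing the model.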
Model evaluation
from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix

# Predict with the selected best model (here the SVM)
y_pred = best_model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
# Macro-averaged recall
recall = recall_score(y_test, y_pred, average='macro')
# Macro-averaged F1 score
f1 = f1_score(y_test, y_pred, average='macro')
print("Accuracy: {:.2f}".format(accuracy))
print("Recall: {:.2f}".format(recall))
print("F1 score: {:.2f}".format(f1))

Accuracy: 0.92
Recall: 0.91
F1 score: 0.91
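To make `average='macro'` concrete, the following sketch computes macro-averaged recall and F1 by hand and checks them against scikit-learn. The labels in `y_true` and `y_hat` are made-up toy values, not the article's data, and the helper `per_class_counts` is hypothetical.

```python
from sklearn.metrics import f1_score, recall_score

# Toy labels for three classes (assumption: illustrative values only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_hat = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]


def per_class_counts(label):
    """Return (true positives, false negatives, false positives) for one class."""
    pairs = list(zip(y_true, y_hat))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    return tp, fn, fp


recalls, f1s = [], []
for label in sorted(set(y_true)):
    tp, fn, fp = per_class_counts(label)
    recalls.append(tp / (tp + fn))            # recall of this class
    f1s.append(2 * tp / (2 * tp + fp + fn))   # F1 of this class

# Macro averaging: the unweighted mean over classes, regardless of class size.
macro_recall = sum(recalls) / len(recalls)
macro_f1 = sum(f1s) / len(f1s)

print(recalls)  # per-class recall: [0.75, 1.0, 0.75]
print(macro_recall, recall_score(y_true, y_hat, average="macro"))
print(macro_f1, f1_score(y_true, y_hat, average="macro"))
```

Because every class counts equally, macro averages can differ from accuracy when class sizes are unequal; in this toy case accuracy is 0.8 while macro recall is about 0.83.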
Model validation
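import matplotlib.pyplot as plt
import seaborn as sns

# Compute the confusion matrix (rows: true classes, columns: predicted classes)
cm = confusion_matrix(y_test, y_pred)
# print("Confusion Matrix:")
# print(cm)
# Plot it as a heatmap; fmt='d' shows the cell counts as integers
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='coolwarm')
plt.title('Confusion Matrix')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()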
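As an aid to reading the heatmap, the sketch below shows how the confusion matrix relates to per-class recall: with rows as true classes, dividing each row by its sum puts each class's recall on the diagonal. The labels are made-up toy values, not the article's predictions; `normalize="true"` is the scikit-learn option for row normalization.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for three classes (assumption: illustrative values only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_hat = [0, 0, 0, 1, 1, 1, 2, 2, 0, 2]

# Raw counts: entry (i, j) is the number of class-i samples predicted as class j.
cm_counts = confusion_matrix(y_true, y_hat)

# Row-normalized version: each row sums to 1, so the diagonal is per-class recall.
cm_norm = confusion_matrix(y_true, y_hat, normalize="true")
per_class_recall = np.diag(cm_norm)

print(cm_counts)
print(per_class_recall)
```

Off-diagonal cells show which classes are confused with each other, for instance one class-0 sample predicted as class 1 in this toy case.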
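All evaluation metrics exceed 0.9 (accuracy 0.92, recall 0.91, F1 score 0.91), up from 0.88, 0.88, and 0.87 for the naive Bayes classifier, so prediction accuracy and stability have both improved. On this test set the support vector machine outperforms both random forest and the naive Bayes classifier and has the stronger predictive ability.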
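One way to put a number on stability is to look at the spread of accuracy across cross-validation folds. The sketch below is a minimal illustration of that idea, assuming synthetic data from `make_classification` in place of the article's dataset and reusing the reported best parameters (`C=10`, `kernel='linear'`); the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Assumption: synthetic 3-class data standing in for the article's features and labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

model = SVC(C=10, kernel="linear")

# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

mean_acc, std_acc = scores.mean(), scores.std()
print(f"accuracy per fold: {scores.round(3)}")
print(f"mean = {mean_acc:.3f}, std = {std_acc:.3f}")
```

A small standard deviation across folds suggests the model's accuracy does not depend strongly on which samples land in the training split.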