Using Naive Bayes in sklearn (Python)
Abstract: This article demonstrates how to use Naive Bayes models for classification, covering three variants: Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes (binomial Naive Bayes).
00 Installing the scikit-learn library
pip install scikit-learn
01 Loading the sklearn digit-recognition dataset
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, naive_bayes
digits=datasets.load_digits()
# Randomly pick 1500 of the 1797 samples as training indices.
dex1=np.random.choice(1797,1500,replace=False)
# The remaining 297 indices form the test set.
dex2=[]
for i in range(1797):
    if i not in dex1:
        dex2.append(i)
train_x=digits.data[dex1]
train_y=digits.target[dex1]
test_x=digits.data[dex2]
test_y=digits.target[dex2]
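As a side note, the same kind of split can be produced more concisely with sklearn's built-in helper; a minimal sketch, assuming scikit-learn >= 0.18 (where train_test_split lives in sklearn.model_selection):
from sklearn.model_selection import train_test_split
# Put 1500 samples into the training set and the remaining 297 into the test set.
train_x,test_x,train_y,test_y=train_test_split(
    digits.data,digits.target,train_size=1500,random_state=0)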
02 Gaussian Naive Bayes model
classi=naive_bayes.GaussianNB()
classi.fit(train_x,train_y)
classi.score(test_x,test_y)
Out[44]: 0.835016835016835
classi.predict(test_x)
Out[45]:
array([0, 7, 1, 3, 7, 5, 4, 7, 8, 0, 1, 6, 3, 3, 4, 5, 1, 7, 7, 5, 8, 8,
1, 8, 3, 6, 8, 0, 7, 7, 3, 8, 7, 1, 3, 7, 3, 9, 0, 6, 9, 7, 4, 1,
7, 9, 8, 0, 7, 6, 2, 6, 7, 4, 6, 8, 3, 8, 4, 3, 7, 4, 9, 7, 2, 3,
6, 1, 5, 9, 5, 7, 2, 1, 6, 6, 1, 5, 8, 2, 1, 3, 1, 4, 5, 4, 8, 9,
0, 8, 8, 9, 6, 7, 8, 8, 8, 9, 1, 5, 6, 3, 1, 1, 3, 7, 9, 1, 6, 2,
2, 6, 1, 0, 3, 8, 7, 4, 9, 8, 0, 3, 8, 9, 1, 5, 8, 4, 3, 7, 1, 7,
6, 7, 7, 6, 4, 8, 8, 4, 9, 8, 7, 5, 0, 2, 7, 4, 7, 1, 2, 6, 9, 1,
5, 6, 8, 8, 6, 4, 4, 7, 2, 4, 1, 0, 3, 9, 6, 0, 3, 3, 4, 6, 1, 0,
7, 6, 5, 7, 9, 6, 6, 0, 7, 8, 3, 6, 3, 7, 7, 6, 4, 8, 1, 4, 6, 7,
3, 5, 8, 2, 3, 6, 1, 5, 8, 1, 7, 6, 4, 2, 5, 7, 3, 7, 9, 5, 5, 7,
0, 8, 6, 1, 6, 3, 6, 4, 8, 2, 8, 6, 5, 5, 6, 5, 3, 7, 1, 7, 2, 3,
3, 3, 4, 9, 1, 5, 6, 3, 9, 4, 6, 9, 4, 5, 5, 7, 2, 4, 7, 0, 7, 0,
0, 7, 8, 6, 5, 7, 1, 7, 5, 8, 0, 7, 7, 8, 5, 8, 4, 7, 9, 5, 9, 9,
0, 7, 6, 6, 9, 0, 1, 3, 1, 2, 5])
classi.predict_proba(test_x)
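predict_proba returns, for each test sample, the estimated probability of each of the ten digit classes; each row sums to 1 and the predicted label is the column with the largest value. A minimal sketch of how one might inspect it (the variable name proba is just illustrative):
proba=classi.predict_proba(test_x)
print(proba.shape)        # (number of test samples, 10)
print(proba[0].argmax())  # matches classi.predict(test_x)[0]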
03 Multinomial Naive Bayes model
classi=naive_bayes.MultinomialNB()
classi.fit(train_x,train_y)
classi.score(test_x,test_y)
Out[46]: 0.9023569023569024
Examining the effect of the alpha parameter on the model's predictive performance:
alphas=np.logspace(-2,6,200)
scores=[]
for i in alphas:
    classi=naive_bayes.MultinomialNB(alpha=i)
    classi.fit(train_x,train_y)
    scores.append(classi.score(test_x,test_y))
plt.plot(alphas,scores)
plt.xscale('log')
plt.ylim(0,1)
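Rather than sweeping alpha by hand, a cross-validated grid search can select it using the training set alone; a minimal sketch with GridSearchCV (assuming sklearn.model_selection is available; the grid size and cv=5 below are illustrative choices):
from sklearn.model_selection import GridSearchCV
# Search a log-spaced alpha grid with 5-fold cross-validation on the training data.
search=GridSearchCV(naive_bayes.MultinomialNB(),
                    {'alpha':np.logspace(-2,6,20)},cv=5)
search.fit(train_x,train_y)
print(search.best_params_,search.best_score_)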
04 Bernoulli Naive Bayes model
classi=naive_bayes.BernoulliNB()
classi.fit(train_x,train_y)
classi.score(train_x,train_y)
Out[59]: 0.8653333333333333
classi.score(test_x,test_y)
Out[60]: 0.8451178451178452
Examining the effect of the alpha parameter on the model's predictive performance:
alphas=np.logspace(-2,5,200)
scores=[]
for i in alphas:
    classi=naive_bayes.BernoulliNB(alpha=i)
    classi.fit(train_x,train_y)
    scores.append(classi.score(test_x,test_y))
plt.plot(alphas,scores)
plt.xscale('log')
plt.ylim(0,1)
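BernoulliNB treats every feature as binary: by default it binarizes the input at a threshold of 0.0, so any non-zero pixel counts as "on". Since the digits pixels take values from 0 to 16, the binarize threshold itself can also be tuned; a minimal sketch (the threshold values are illustrative):
# Pixels above the threshold count as 1, the rest as 0.
for th in [0.0,4.0,8.0]:
    classi=naive_bayes.BernoulliNB(binarize=th)
    classi.fit(train_x,train_y)
    print(th,classi.score(test_x,test_y))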
05 Manual image recognition (compared with the Bernoulli Naive Bayes model)
Recognition results of the Bernoulli Naive Bayes model:
Viewing the images at the following indices:
pdex=[3,13,14,17,26,37,45,46,47,53]
fig,ax=plt.subplots(3,4)
fig.set_size_inches(10,8)
# Show each selected image in grayscale; the last two axes remain empty.
for axi,idx in zip(ax.flat,pdex):
    axi.imshow(digits.images[idx],cmap=plt.cm.gray_r)
fig.show()
Manual recognition (* = could not tell): 3, 3, *, *, *, *, 3, *, 1, *
Result: every digit that a human could recognize was also identified correctly by the model.
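To make this comparison reproducible in code, one can print the model's predictions next to the true labels for the selected indices; a minimal sketch that refits the default Bernoulli model (since the alpha sweep above overwrote classi):
classi=naive_bayes.BernoulliNB()
classi.fit(train_x,train_y)
print(classi.predict(digits.data[pdex]))  # model predictions for the ten images
print(digits.target[pdex])                # true labels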
06 Conclusions
01 Key concepts: Bayes' theorem, the law of total probability, and conditional probability distributions (see the formula below);
02 Features (also called attributes, parameters, variables, or characteristics): feature values can be discrete or continuous;
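For reference, all three models rest on Bayes' theorem, with the denominator expanded by the law of total probability:
P(y | x) = P(x | y) P(y) / P(x),  where  P(x) = Σ_y' P(x | y') P(y')
The three variants differ only in how they model the class-conditional likelihood P(x | y): as a Gaussian, a multinomial, or a product of per-feature Bernoulli distributions.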