Using Naive Bayes in sklearn (Python)

Abstract: This article demonstrates how to use naive Bayes models for classification, covering three variants: Gaussian naive Bayes, multinomial naive Bayes, and Bernoulli (binary) naive Bayes.

00 Installing the scikit-learn library

pip install scikit-learn
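To confirm the installation worked, a quick sketch that imports the package and prints its version:

```python
# Verify the installation by importing sklearn and printing its version
import sklearn
print(sklearn.__version__)
```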


01 Loading the sklearn handwritten-digits dataset

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, naive_bayes
digits=datasets.load_digits()

# Randomly pick 1500 of the 1797 samples for training; the rest form the test set
dex1=np.random.choice(1797,1500,replace=False)
dex2=np.setdiff1d(np.arange(1797),dex1)

train_x=digits.data[dex1]
train_y=digits.target[dex1]
test_x=digits.data[dex2]
test_y=digits.target[dex2]
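sklearn also ships a helper, `train_test_split`, that produces the same kind of split more concisely. A sketch (the `random_state=0` seed is my choice, so the exact indices will differ from the manual split above):

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# 1500 training samples out of 1797; the remaining 297 form the test set
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)
print(train_x.shape, test_x.shape)  # (1500, 64) (297, 64)
```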

02 Gaussian naive Bayes model

classi=naive_bayes.GaussianNB()
classi.fit(train_x,train_y)
classi.score(test_x,test_y)
Out[44]: 0.835016835016835

classi.predict(test_x)
Out[45]:
array([0, 7, 1, 3, 7, 5, 4, 7, 8, 0, 1, 6, 3, 3, 4, 5, 1, 7, 7, 5, 8, 8,
       1, 8, 3, 6, 8, 0, 7, 7, 3, 8, 7, 1, 3, 7, 3, 9, 0, 6, 9, 7, 4, 1,
       7, 9, 8, 0, 7, 6, 2, 6, 7, 4, 6, 8, 3, 8, 4, 3, 7, 4, 9, 7, 2, 3,
       6, 1, 5, 9, 5, 7, 2, 1, 6, 6, 1, 5, 8, 2, 1, 3, 1, 4, 5, 4, 8, 9,
       0, 8, 8, 9, 6, 7, 8, 8, 8, 9, 1, 5, 6, 3, 1, 1, 3, 7, 9, 1, 6, 2,
       2, 6, 1, 0, 3, 8, 7, 4, 9, 8, 0, 3, 8, 9, 1, 5, 8, 4, 3, 7, 1, 7,
       6, 7, 7, 6, 4, 8, 8, 4, 9, 8, 7, 5, 0, 2, 7, 4, 7, 1, 2, 6, 9, 1,
       5, 6, 8, 8, 6, 4, 4, 7, 2, 4, 1, 0, 3, 9, 6, 0, 3, 3, 4, 6, 1, 0,
       7, 6, 5, 7, 9, 6, 6, 0, 7, 8, 3, 6, 3, 7, 7, 6, 4, 8, 1, 4, 6, 7,
       3, 5, 8, 2, 3, 6, 1, 5, 8, 1, 7, 6, 4, 2, 5, 7, 3, 7, 9, 5, 5, 7,
       0, 8, 6, 1, 6, 3, 6, 4, 8, 2, 8, 6, 5, 5, 6, 5, 3, 7, 1, 7, 2, 3,
       3, 3, 4, 9, 1, 5, 6, 3, 9, 4, 6, 9, 4, 5, 5, 7, 2, 4, 7, 0, 7, 0,
       0, 7, 8, 6, 5, 7, 1, 7, 5, 8, 0, 7, 7, 8, 5, 8, 4, 7, 9, 5, 9, 9,
       0, 7, 6, 6, 9, 0, 1, 3, 1, 2, 5])

classi.predict_proba(test_x)
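`predict_proba` returns one row of class probabilities per test sample: each row sums to 1, and the column with the highest probability corresponds to the class that `predict` returns. A quick check (using a seeded `train_test_split` in place of the manual split, so scores may differ slightly from the article's):

```python
import numpy as np
from sklearn import datasets, naive_bayes
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)

clf = naive_bayes.GaussianNB().fit(train_x, train_y)
proba = clf.predict_proba(test_x)

print(proba.shape)                                           # (297, 10): one column per digit class
print(np.allclose(proba.sum(axis=1), 1.0))                   # True: rows are probability distributions
print((proba.argmax(axis=1) == clf.predict(test_x)).all())   # True: argmax agrees with predict
```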

03 Multinomial naive Bayes model

classi=naive_bayes.MultinomialNB()
classi.fit(train_x,train_y)
classi.score(test_x,test_y)
Out[46]: 0.9023569023569024

Examining how the parameter alpha affects the model's predictive performance:

alphas=np.logspace(-2,6,200)
scores=[]
for i in alphas:
    classi=naive_bayes.MultinomialNB(alpha=i)
    classi.fit(train_x,train_y)
    scores.append(classi.score(test_x,test_y))
plt.plot(alphas,scores)
plt.xscale('log')
plt.ylim(0,1) 
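The sweep above can also report the best alpha directly, without reading it off the plot. A sketch (seeded split, so the exact best value may differ from the figure):

```python
import numpy as np
from sklearn import datasets, naive_bayes
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)

# Score MultinomialNB across a log-spaced range of smoothing strengths
alphas = np.logspace(-2, 6, 200)
scores = [naive_bayes.MultinomialNB(alpha=a).fit(train_x, train_y)
                     .score(test_x, test_y) for a in alphas]

best = alphas[int(np.argmax(scores))]
print(best, max(scores))
```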

Figure 1: test accuracy of MultinomialNB versus alpha (log-scale x-axis)

04 Bernoulli naive Bayes model

classi=naive_bayes.BernoulliNB()
classi.fit(train_x,train_y)
classi.score(train_x,train_y)
Out[59]: 0.8653333333333333

classi.score(test_x,test_y)
Out[60]: 0.8451178451178452
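BernoulliNB works on binary features, so it first binarizes the input using its `binarize` threshold (default 0.0, meaning every nonzero pixel becomes 1). Since digits pixels range from 0 to 16, raising the threshold keeps only stronger strokes and can change accuracy. A sketch sweeping a few thresholds (seeded split, scores are illustrative):

```python
from sklearn import datasets, naive_bayes
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)

# Compare test accuracy at several binarization thresholds
results = {}
for thr in (0.0, 4.0, 8.0):
    clf = naive_bayes.BernoulliNB(binarize=thr).fit(train_x, train_y)
    results[thr] = clf.score(test_x, test_y)
    print(thr, results[thr])
```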

Examining how the parameter alpha affects the model's predictive performance:

alphas=np.logspace(-2,5,200)
scores=[]
for i in alphas:
    classi=naive_bayes.BernoulliNB(alpha=i)
    classi.fit(train_x,train_y)
    scores.append(classi.score(test_x,test_y))
plt.plot(alphas,scores)
plt.xscale('log')
plt.ylim(0,1)

Figure 2: test accuracy of BernoulliNB versus alpha (log-scale x-axis)


05 Manual image recognition (compared against the Bernoulli model)

Recognition results of the Bernoulli model:

Figure 3: recognition results of the Bernoulli model on the selected images

Displaying the images at the selected indices:

pdex=[3,13,14,17,26,37,45,46,47,53]
fig,ax=plt.subplots(3,4)
fig.set_size_inches(10,8)
# Plot each selected image on its own subplot (the last two subplots stay empty)
for axi,idx in zip(ax.ravel(),pdex):
    axi.imshow(digits.images[idx],cmap=plt.cm.gray_r)
fig.show()
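Rather than inspecting images by hand, the misclassified test samples can also be located programmatically. A sketch (seeded split, so the indices found will not match `pdex` above):

```python
import numpy as np
from sklearn import datasets, naive_bayes
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)

clf = naive_bayes.BernoulliNB().fit(train_x, train_y)
pred = clf.predict(test_x)

# Indices (within the test set) where the prediction disagrees with the label
wrong = np.flatnonzero(pred != test_y)
print(len(wrong), wrong[:10])
```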

Figure 4: the selected images. Manual identification (* = unrecognizable to the human eye): 3, 3, *, *, *, *, 3, *, 1, *

Result: every digit that a human could identify was also identified correctly by the model.


06 Conclusion

01 Key concepts: Bayes' theorem, the law of total probability, and conditional probability distributions;

02 Features (also called attributes, parameters, or variables): feature values may be either discrete or continuous;
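As a closing summary, the three variants can be compared side by side on one split. A sketch (seeded split, so the exact scores will differ somewhat from the ones shown earlier):

```python
from sklearn import datasets, naive_bayes
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
train_x, test_x, train_y, test_y = train_test_split(
    digits.data, digits.target, train_size=1500, random_state=0)

# Fit each naive Bayes variant on the same data and compare test accuracy
scores = {}
for clf in (naive_bayes.GaussianNB(),
            naive_bayes.MultinomialNB(),
            naive_bayes.BernoulliNB()):
    clf.fit(train_x, train_y)
    scores[type(clf).__name__] = clf.score(test_x, test_y)
    print(type(clf).__name__, scores[type(clf).__name__])
```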
