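Linear models for classification (sklearn)
Abstract: This post shows how to use scikit-learn's linear models for classification problems, covering logistic regression and linear discriminant analysis.
00 Install the scikit-learn library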
pip install scikit-learn
01 Load the iris dataset from sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
iris=datasets.load_iris()
dex1=np.random.choice(150,size=120,replace=False)
dex2=[]
for i in range(150):
    if i not in dex1:
        dex2.append(i)
train_x=iris.data[dex1,:]
train_y=iris.target[dex1]
test_x=iris.data[dex2,:]
test_y=iris.target[dex2]
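The manual index split above can also be written with scikit-learn's own helper. The following is a minimal sketch, not the approach used in this post: it assumes `train_test_split` from `sklearn.model_selection`, and the seed `random_state=0` and the `stratify` argument are illustrative choices that the original code does not make.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 30 of the 150 samples, matching the 120/30 split used above.
# random_state=0 is an arbitrary seed chosen only for reproducibility;
# stratify keeps the three classes equally represented in both subsets.
train_x, test_x, train_y, test_y = train_test_split(
    iris.data, iris.target, test_size=30, random_state=0, stratify=iris.target
)

print(train_x.shape, test_x.shape)
```

Unlike `np.random.choice` without a seed, a fixed `random_state` gives the same split on every run, which makes scores comparable between experiments.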
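02 Logistic regression
# One-vs-rest logistic regression with the liblinear solver
# (note: the multi_class parameter is deprecated as of scikit-learn 1.5)
regre=linear_model.LogisticRegression(multi_class='ovr',solver='liblinear')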
regre.fit(train_x,train_y)
regre.score(test_x,test_y)
regre.coef_
Out[40]:
array([[ 0.39001649, 1.4110123 , -2.14837944, -0.97686956],
[ 0.54586613, -1.70617607, 0.38138451, -1.1176497 ],
[-1.63246048, -1.17046085, 2.28632906, 2.21395272]])
regre.intercept_
Out[41]: array([ 0.25529718, 0.900473 , -1.07104004])
regre.predict(test_x)
Out[42]:
array([0, 0, 0, 0, 0, 0, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2])
regre.predict_proba(test_x)
regre.predict_log_proba(test_x)
# Multinomial (softmax) logistic regression with the lbfgs solver, capped at 102 iterations
regre=linear_model.LogisticRegression(multi_class='multinomial',solver='lbfgs',max_iter=102)
regre.fit(train_x,train_y)
regre.score(test_x,test_y)
regre.n_iter_
Out[43]: array([96])
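To make the relationship between `predict`, `predict_proba` and `predict_log_proba` concrete, here is a small sketch. It assumes the default `LogisticRegression` settings rather than the exact configurations above, fits on the full iris data for brevity, and uses `max_iter=1000` as an arbitrary, generous limit.

```python
import numpy as np
from sklearn import datasets, linear_model

iris = datasets.load_iris()
X, y = iris.data, iris.target

# max_iter=1000 is an arbitrary limit chosen to avoid a convergence warning.
clf = linear_model.LogisticRegression(max_iter=1000)
clf.fit(X, y)

proba = clf.predict_proba(X)          # shape (n_samples, n_classes)
log_proba = clf.predict_log_proba(X)  # elementwise logarithm of proba

# Each row of proba is a probability distribution over the classes.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
# The predicted label is the class with the highest probability.
labels_match = np.array_equal(clf.classes_[proba.argmax(axis=1)], clf.predict(X))
# Exponentiating the log-probabilities recovers the probabilities.
logs_match = np.allclose(np.exp(log_proba), proba)

print(rows_sum_to_one, labels_match, logs_match)
```

The columns of `proba` follow the order of `clf.classes_`, so that attribute is what maps a column index back to a label.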
03 Linear discriminant analysis
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, discriminant_analysis
iris=datasets.load_iris()
dex1=np.random.choice(150,size=120,replace=False)
dex2=[]
for i in range(150):
    if i not in dex1:
        dex2.append(i)
train_x=iris.data[dex1,:]
train_y=iris.target[dex1]
test_x=iris.data[dex2,:]
test_y=iris.target[dex2]
regre=discriminant_analysis.LinearDiscriminantAnalysis()
regre.fit(train_x,train_y)
regre.score(test_x,test_y)
Out[52]: 1.0
regre.coef_
Out[54]:
array([[ 6.34755316, 13.66153017, -16.63757493, -22.43052621],
[ -1.61851372, -4.47717549, 4.1348641 , 3.09718096],
[ -4.26521741, -8.34418599, 11.26989661, 17.27715514]])
regre.intercept_
Out[55]: array([-18.84873469, 0.66754016, -30.35506047])
regre.predict_proba(test_x)
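The `coef_` and `intercept_` values printed above define one linear score per class. The sketch below illustrates this under the assumption of the default `LinearDiscriminantAnalysis` settings, fitted on the full iris data for brevity: it recomputes the scores by hand and compares them with `decision_function` and `predict`.

```python
import numpy as np
from sklearn import datasets, discriminant_analysis

iris = datasets.load_iris()
X, y = iris.data, iris.target

lda = discriminant_analysis.LinearDiscriminantAnalysis()
lda.fit(X, y)

# One linear score per class and sample: X @ coef_.T + intercept_
scores = X @ lda.coef_.T + lda.intercept_

# The hand-computed scores should agree with the estimator's own scores.
same_scores = np.allclose(scores, lda.decision_function(X))
# The predicted label is the class with the largest score.
same_labels = np.array_equal(lda.classes_[scores.argmax(axis=1)], lda.predict(X))

print(scores.shape, same_scores, same_labels)
```

This is what makes the model "linear": classification reduces to comparing a handful of weighted sums of the input features.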
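04 Summary
01 Linear models can be used not only for regression but also for classification.
02 For LogisticRegression and LinearDiscriminantAnalysis, the number of features (attributes, variables) equals the number of columns of coef_. In this three-class example, the number of classes (labels, targets) equals the number of rows of coef_, which is also the length of intercept_. For a two-class problem, scikit-learn stores only a single row.
03 LogisticRegression and LinearDiscriminantAnalysis return not only the predicted class of each sample but also its per-class probabilities (via predict_proba).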
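The shape statement in the summary can be checked directly. This is a minimal sketch that assumes default settings for both estimators (with `max_iter=1000` as an arbitrary limit for the logistic regression) and fits on the full iris data, which has 4 features and 3 classes.

```python
from sklearn import datasets, discriminant_analysis, linear_model

iris = datasets.load_iris()
X, y = iris.data, iris.target  # 4 features, 3 classes

models = {
    # max_iter=1000 is an arbitrary limit chosen to avoid a convergence warning.
    "LogisticRegression": linear_model.LogisticRegression(max_iter=1000),
    "LinearDiscriminantAnalysis": discriminant_analysis.LinearDiscriminantAnalysis(),
}

shapes = {}
for name, model in models.items():
    model.fit(X, y)
    # coef_ has shape (n_classes, n_features); intercept_ has shape (n_classes,)
    shapes[name] = (model.coef_.shape, model.intercept_.shape)
    print(name, model.coef_.shape, model.intercept_.shape)
```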