Technical notes.
Most machine learning models have parameters that can be adjusted to change how the model learns. Logistic regression, for example, has a parameter C that controls the strength of regularization, which affects the complexity of the model. A grid search tries out different values for such a parameter and picks the value that gives the best score.

Higher values of C tell the model that the training data resembles real-world information, so it should place greater weight on the training data; lower values of C do the opposite.
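As a quick illustration of what C does (a minimal sketch, not part of the original notes): with L2 regularization, a smaller C means a stronger penalty, which shrinks the learned coefficients toward zero.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Small C = strong regularization; large C = weak regularization.
small_c = LogisticRegression(C=0.01, max_iter=10000).fit(X, y)
large_c = LogisticRegression(C=100, max_iter=10000).fit(X, y)

# The strongly regularized model has a smaller coefficient norm.
print(np.linalg.norm(small_c.coef_))
print(np.linalg.norm(large_c.coef_))
```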
sklearn.datasets.load_iris(): Load and return the iris dataset (classification).
The iris dataset is a classic and very easy multi-class classification dataset.
These measures were used to create a linear discriminant analysis model to classify the species.
| Property | Value |
|---|---|
| Classes | 3 |
| Samples per class | 50 |
| Samples total | 150 |
| Dimensionality | 4 |
| Features | real, positive |
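The linear discriminant model mentioned above can be sketched with scikit-learn's `LinearDiscriminantAnalysis` (an illustration added here, not part of the iris documentation):

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Fit LDA on all 150 samples and check the training accuracy.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)
print(lda.score(X, y))
```

LDA separates the three species almost perfectly on this dataset.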
```python
from sklearn import datasets

iris = datasets.load_iris()
X = iris['data']
y = iris['target']
print("X", X)
print("y", y)
```
We use a logistic regression model to classify the iris flowers. Set max_iter (int, default=100) to a higher value to ensure that the solver converges.
```python
from sklearn.linear_model import LogisticRegression

predict_model = LogisticRegression(max_iter=10000)
predict_model.fit(X, y)
```
sklearn.linear_model.LogisticRegression.score(): Return the mean accuracy on the given test data and labels.
With the default setting of C = 1, we achieved a score of 0.973.
```python
score = predict_model.score(X, y)
print("score", score)
# score 0.9733333333333334
```
Since the default value for C is 1, we will set a range of values surrounding it.
```python
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
scores = []

for choice in C:
    # set_params(): set the parameters of this estimator
    predict_model.set_params(C=choice)
    # refit the data after setting the parameter
    predict_model.fit(X, y)
    scores.append(predict_model.score(X, y))

print(scores)
# [0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
```
The training score rises as C increases up to 1.75; increasing C beyond that does not improve accuracy further.
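The manual loop above can also be automated with scikit-learn's `GridSearchCV`, which additionally cross-validates each candidate instead of scoring on the training data (a sketch of an alternative approach, not what the notes above did):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Same candidate values for C as in the manual loop.
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# 5-fold cross-validated grid search over C.
search = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Note that the cross-validated scores will differ from the training scores printed earlier.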
These scores were computed on the same data the model was trained on, however, which risks overfitting. To avoid being misled by scores on the training data, we can set aside a portion of our data and use it specifically for testing the model.
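A minimal sketch of such a train/test split, using scikit-learn's `train_test_split` (the 25% split size and `random_state=0` are assumptions for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 25% of the data; the model never sees it during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Accuracy on unseen data is a fairer estimate of generalization.
print(model.score(X_test, y_test))
```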