Project maintained by simonangel-fong Hosted on GitHub Pages — Theme by mattgraham

Machine Learning - Grid Search




Example: Find the best C

Using Default Parameters

  1. Load the iris dataset, then fit and score a logistic regression model using the default parameters.

from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()

X = iris['data']
y = iris['target']

print("X", X)
print("y", y)

# Fit with the default C (1.0); max_iter is raised so the solver converges
predict_model = LogisticRegression(max_iter=10000)
predict_model.fit(X, y)

# Note: this score is computed on the same data the model was trained on
score = predict_model.score(X, y)

print("score", score)
#  score 0.9733333333333334

Since the default value for C is 1, we will set a range of values surrounding it.

  1. Create a range of values around 1, the default value.
  2. Loop over the range, refit the model for each value of C, and append each score to a list.
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]

scores = []
for choice in C:
    # set_params(): Set the parameters of this estimator.
    predict_model.set_params(C=choice)
    # fit data after setting parameters
    predict_model.fit(X, y)
    scores.append(predict_model.score(X, y))

print(scores)
# [0.9666666666666667, 0.9666666666666667, 0.9733333333333334, 0.9733333333333334, 0.98, 0.98, 0.9866666666666667, 0.9866666666666667]
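To pick the winner programmatically, we can pair each candidate C with its score from the run above (values rounded here for readability); a small sketch:

```python
# Candidate C values and their training scores (rounded from the output above)
C = [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]
scores = [0.9667, 0.9667, 0.9733, 0.9733, 0.98, 0.98, 0.9867, 0.9867]

# max() on (C, score) pairs keyed by score; on ties it keeps the first
# (i.e. smallest) C that reaches the best score
best_C, best_score = max(zip(C, scores), key=lambda pair: pair[1])

print("best C:", best_C)        # best C: 1.75
print("best score:", best_score)
```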

Results Explained

The lower values of C (0.25 to 0.75) scored slightly worse than the default of 1, while raising C to 1.75 improved training accuracy to about 0.987. The score is unchanged at C = 2, so it seems that increasing C beyond 1.75 does not help increase model accuracy.
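The manual loop above can also be automated with scikit-learn's GridSearchCV, which additionally cross-validates each candidate instead of scoring on the training data. A minimal sketch (the parameter grid mirrors the values above; cv=5 is an arbitrary choice, not part of the note):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Same candidate values as the manual loop
param_grid = {"C": [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]}

# 5-fold cross-validation over every candidate in the grid
grid = GridSearchCV(LogisticRegression(max_iter=10000), param_grid, cv=5)
grid.fit(iris.data, iris.target)

print("best params:", grid.best_params_)
print("best CV score:", grid.best_score_)
```

Because the score is averaged over held-out folds, the best C found here may differ from the one found on the training data alone.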


Note on Best Practices

The scores above were computed on the same data the model was trained on, which can be misleading. To avoid this, we can set aside a portion of the data and use it only for testing the model.
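A sketch of that idea using scikit-learn's train_test_split (the 25% split and random_state are arbitrary choices, not part of the note):

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Hold out 25% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=10000)

test_scores = []
for choice in [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2]:
    model.set_params(C=choice)
    model.fit(X_train, y_train)                       # fit only on the training portion
    test_scores.append(model.score(X_test, y_test))   # score on the held-out data

print(test_scores)
```

Scores on the held-out set may differ from the training scores above, which is exactly the point of the split.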

