Predict whether a customer will go to a comedy show, based on a series of data.
Different Results
pandas.read_csv()
import pandas as pd
FILE_PATH = "./data_decision_tree.csv"
file_data = pd.read_csv(FILE_PATH)
# file_data.shape
# file_data.columns
# file_data.info()
# print(file_data)
file_data.head()
Missing data
To make a decision tree, all data has to be numerical.
pandas.Series.map()
# Convert nationality into numerical values
map_nationality = {'UK': 0, 'USA': 1, 'N': 2}
file_data['Nationality'] = file_data['Nationality'].map(map_nationality)
# Convert go into numerical values
map_go = {'YES': 1, 'NO': 0}
file_data['Go'] = file_data['Go'].map(map_go)
print(file_data)
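One thing worth checking after the conversion: `Series.map()` with a dictionary returns NaN for any value that has no key in the dictionary, so a typo or an unexpected category silently becomes missing data. The sketch below uses a small hypothetical sample (the value 'FR' is an assumption made up for illustration, not part of the original CSV) to list the values that were not mapped.

```python
import pandas as pd

# Hypothetical sample: 'FR' is deliberately absent from the mapping below.
sample = pd.DataFrame({'Nationality': ['UK', 'USA', 'N', 'FR']})
map_nationality = {'UK': 0, 'USA': 1, 'N': 2}

# Values without a matching dictionary key become NaN.
mapped = sample['Nationality'].map(map_nationality)

# Collect the original values that could not be converted.
unmapped = sample.loc[mapped.isna(), 'Nationality'].tolist()
print(unmapped)  # ['FR']
```

If the list is not empty, the mapping dictionary probably needs another entry before the model is fitted.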
feature columns: the columns containing the values that the prediction is based on.
target column: the column containing the values to be predicted.
features = ['Age', 'Experience', 'Rank', 'Nationality']
feature_list = file_data[features]
target_list = file_data['Go']
# print(feature_list)
# print(target_list)
sklearn.tree.DecisionTreeClassifier(): create a decision tree model object
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt
predict_model = DecisionTreeClassifier()
predict_model = predict_model.fit(feature_list.values, target_list.values)
# sklearn.tree.plot_tree(): Plot a decision tree.
# decision_tree: The decision tree to be plotted.
# feature_names: Names of each of the features.
tree.plot_tree(predict_model, feature_names=features)
plt.show()

Rank <= 6.5 means that every comedian with a rank of 6.5 or lower will follow the True arrow (to the left), and the rest will follow the False arrow (to the right).
gini = 0.497 is the Gini impurity of the samples at this node, a measure of how mixed the classes are. With two classes it is always a number between 0.0 and 0.5, where 0.0 means all of the samples at the node have the same result, and 0.5 means the samples are split evenly between the two results.
samples = 13 means that there are 13 comedians left at this point in the decision, which is all of them since this is the first step.
value = [6, 7] means that of these 13 comedians, 6 have a “NO” in the Go column, and 7 have a “YES”.
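The gini figure at the root can be reproduced by hand from the value counts, using the standard Gini impurity formula 1 - sum(p_i^2), where p_i is the share of samples in class i. The helper name `gini_impurity` below is made up for this sketch and is not a scikit-learn function.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

# Root node of the tree above: value = [6, 7], samples = 13
root_gini = gini_impurity([6, 7])
print(round(root_gini, 3))  # 0.497
```

A node that holds only one class, such as value = [13, 0], gives 0.0, and an even two-class split gives 0.5.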
predict_value = predict_model.predict([[40, 10, 7, 1]])
# print(predict_value)
print("Yes" if predict_value[0] else "No") # No
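On the "Different Results" point: a fitted tree, and therefore a prediction like the one above, can vary between runs, because scikit-learn may break ties between equally good splits at random. A minimal sketch, assuming reproducible output is wanted, is to pass a fixed `random_state` to `DecisionTreeClassifier`. The small data set below is hypothetical and only there to make the example self-contained; the seed value 0 is arbitrary.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical rows in the order [Age, Experience, Rank, Nationality]
X = [
    [36, 10, 9, 0],
    [42, 12, 4, 1],
    [23, 4, 6, 2],
    [52, 4, 4, 1],
    [43, 21, 8, 1],
    [44, 14, 5, 0],
]
y = [0, 0, 0, 0, 1, 0]  # hypothetical Go values (1 = YES, 0 = NO)

# Two models built with the same seed make the same random choices.
model_a = DecisionTreeClassifier(random_state=0).fit(X, y)
model_b = DecisionTreeClassifier(random_state=0).fit(X, y)

pred_a = model_a.predict(X).tolist()
pred_b = model_b.predict(X).tolist()
print(pred_a == pred_b)  # True
```

Fixing the seed only makes runs repeatable; it does not make the tree more accurate.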