
Machine Learning - Hierarchical Clustering



Hierarchical Clustering


Create Dataset

x_list = [4, 5, 10, 4, 3, 11, 14, 6, 10, 12]
y_list = [21, 19, 24, 17, 16, 25, 24, 22, 21, 21]

# zip() pairs the two lists element-wise and returns a zip object;
# list() turns it into a list of (x, y) points.
data = list(zip(x_list, y_list))

print(data)
# [(4, 21), (5, 19), (10, 24), (4, 17), (3, 16), (11, 25), (14, 24), (6, 22), (10, 21), (12, 21)]
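
Before clustering, a quick scatter plot of the raw points makes the grouping visible by eye. A minimal sketch using the lists defined above:

import matplotlib.pyplot as plt

# Plot the unlabeled points to inspect the data before clustering.
plt.scatter(x_list, y_list)
plt.show()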

Method 1: SciPy Library

import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# scipy.cluster.hierarchy.linkage():
# Perform hierarchical/agglomerative clustering.
#   method: method='ward' uses the Ward variance-minimization algorithm.
#   metric: the distance metric to use when the input is a collection of observation vectors.
# Returns an ndarray: the hierarchical clustering encoded as a linkage matrix.
linkage_data = linkage(data, method='ward', metric='euclidean')

# dendrogram(): Plot the hierarchical clustering as a dendrogram
# (a tree diagram showing the order in which clusters merge).
dendrogram(linkage_data)

plt.show()

Figure: dendrogram of the linkage result
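
The dendrogram only visualizes the merge order; to get flat cluster labels from the same linkage matrix, SciPy's fcluster() can cut the tree at a chosen number of clusters. A minimal sketch (t=2 is an assumption, chosen to match the Scikit-Learn example below):

from scipy.cluster.hierarchy import fcluster

# criterion='maxclust' cuts the tree so that at most t flat clusters remain.
# Labels are 1-based: each point gets a label in {1, ..., t}.
scipy_labels = fcluster(linkage_data, t=2, criterion='maxclust')

print(scipy_labels)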


Method 2: Scikit-Learn Library

import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering

# AgglomerativeClustering()
#   n_clusters: The number of clusters to find.
#   metric: The metric used when calculating distances between instances
#           (named `affinity` before scikit-learn 1.2; `affinity` was removed in 1.4).
#   linkage: Which linkage criterion to use.
# Returns an estimator object.
hierarchical_cluster = AgglomerativeClustering(
    n_clusters=2, metric='euclidean', linkage='ward')

# fit_predict(): fit the model and return the cluster label of each sample.
labels = hierarchical_cluster.fit_predict(data)     # [0 0 1 0 0 1 1 0 1 1]

# Color each point by its assigned cluster label.
plt.scatter(x_list, y_list, c=labels)
plt.show()

Figure: scatter plot of the points, colored by cluster label
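
After fit_predict(), the fitted estimator also exposes the result as attributes. A short sketch (the attribute names come from the Scikit-Learn API; the printed values assume the dataset above):

# labels_ holds the same cluster assignments returned by fit_predict().
print(hierarchical_cluster.labels_)       # [0 0 1 0 0 1 1 0 1 1]

# children_ records the agglomeration steps: row i merges the two nodes
# listed, creating new node n_samples + i (the same idea as the SciPy
# linkage matrix, without the distance column).
print(hierarchical_cluster.children_)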

