leaders(Z, T)
Returns the root nodes in a hierarchical clustering corresponding to a cut defined by a flat cluster assignment vector T. See the fcluster function for more information on the format of T.
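To make the notion of a cut concrete, here is a pure-Python sketch that cuts a small linkage tree at a height t and records, for every resulting flat cluster, the node at the top of that cluster; those top nodes are the roots that leaders reports for the resulting T. The helper cut_at_height and the toy matrix Z_toy are hypothetical. The sketch assumes a monotonic linkage (no merge lower than the merges beneath it, as with single, complete, average, or Ward linkage), and the flat cluster ids it hands out may be numbered differently from the ids fcluster assigns.

```python
def cut_at_height(Z, n, t):
    """Hypothetical helper: cut a linkage tree at height t.

    Z follows the linkage-matrix layout: row r is [left, right, dist, count]
    and creates node n + r. Returns (T, roots), where T[p] is the flat
    cluster id of observation p and roots[j] is the node at the top of
    flat cluster j.
    """
    # Observation ids under every node; nodes 0..n-1 are the observations.
    leaves = [[p] for p in range(n)]
    for left, right, _dist, _count in Z:
        leaves.append(leaves[int(left)] + leaves[int(right)])

    T = [0] * n          # 0 means "not assigned yet"
    roots = {}
    next_id = 1
    # Higher node ids are created later, so ancestors are visited before
    # their descendants; the first node at or below t covering a leaf wins.
    for node in range(len(leaves) - 1, -1, -1):
        height = Z[node - n][2] if node >= n else 0.0
        if height <= t and T[leaves[node][0]] == 0:
            for p in leaves[node]:
                T[p] = next_id
            roots[next_id] = node
            next_id += 1
    return T, roots


# Toy tree over observations 0..3: node 4 = {0, 1}, node 5 = {2, 3},
# node 6 = {0, 1, 2, 3}.
Z_toy = [[0, 1, 1.0, 2], [2, 3, 1.0, 2], [4, 5, 9.0, 4]]
T, roots = cut_at_height(Z_toy, 4, 5.0)
print(T, roots)  # prints [2, 2, 1, 1] {1: 5, 2: 4}
```

Raising t above 9.0 puts every observation under node 6, while a t below 1.0 leaves each observation as its own singleton leader.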
For each flat cluster j of the k flat clusters represented in the n-sized flat cluster assignment vector T, this function finds the lowest cluster node i in the linkage tree Z such that:
- all leaf descendants of i belong only to flat cluster j (i.e., T[p]==j for all p in S(i), where S(i) is the set of leaf ids of the leaf nodes descended from cluster node i);
- no leaf that is not a descendant of i also belongs to cluster j (i.e., T[q]!=j for all q not in S(i)). If this condition is violated, T is not a valid cluster assignment vector, and an exception is raised.
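Taken together, the two conditions say that the leader of cluster j is the node whose leaf set S(i) is exactly the set of observations labeled j. The brute-force sketch below checks that directly. The names leaf_sets, find_leaders, and Z_toy are hypothetical; the code favors clarity over speed (each cluster is compared against every node's leaf set), and it returns clusters sorted by id, which may not be the order SciPy's leaders uses.

```python
def leaf_sets(Z, n):
    """Entry i is the frozenset of observation ids under linkage node i."""
    sets = [frozenset([p]) for p in range(n)]      # nodes 0..n-1: observations
    for left, right, _dist, _count in Z:           # row r creates node n + r
        sets.append(sets[int(left)] | sets[int(right)])
    return sets


def find_leaders(Z, T):
    """Hypothetical brute-force version: return (L, M) for assignment T."""
    n = len(T)
    sets = leaf_sets(Z, n)
    L, M = [], []
    for j in sorted(set(T)):
        members = frozenset(p for p in range(n) if T[p] == j)
        # S(i) == members means every leaf under i is in cluster j and no
        # leaf of cluster j lies outside i. Distinct nodes have distinct
        # leaf sets, so at most one node can match.
        matches = [i for i, s in enumerate(sets) if s == members]
        if not matches:
            raise ValueError(f"T is not a valid assignment vector: "
                             f"cluster {j} is not a subtree of Z")
        L.append(matches[0])
        M.append(j)
    return L, M


# Toy tree over observations 0..3: node 4 = {0, 1}, node 5 = {2, 3}, node 6 = all.
Z_toy = [[0, 1, 1.0, 2], [2, 3, 1.0, 2], [4, 5, 9.0, 4]]
print(find_leaders(Z_toy, [1, 1, 2, 2]))   # prints ([4, 5], [1, 2])
print(find_leaders(Z_toy, [1, 2, 3, 3]))   # prints ([0, 1, 5], [1, 2, 3])
```

An assignment such as [1, 2, 1, 2] raises ValueError in this sketch, because {0, 2} is not the leaf set of any node in the toy tree.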
Parameters
Z : ndarray
    The hierarchical clustering encoded as a matrix. See linkage for more information.
T : ndarray
    The flat cluster assignment vector.
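For n observations, a linkage matrix has n - 1 rows of four columns (the two merged node ids, the merge distance, and the number of original observations in the new cluster), and row r creates node n + r; T supplies one flat cluster id per observation. A minimal consistency check between the two inputs could look like the following. check_inputs is a hypothetical name, and this is not SciPy's own validation code.

```python
def check_inputs(Z, T):
    """Hypothetical consistency check between a linkage matrix Z and T."""
    n = len(T)                           # one flat cluster id per observation
    if len(Z) != n - 1:
        raise ValueError(f"Z has {len(Z)} rows; expected n - 1 = {n - 1}")
    for r, row in enumerate(Z):
        if len(row) != 4:
            raise ValueError(f"row {r} of Z has {len(row)} columns; expected 4")
        left, right = int(row[0]), int(row[1])
        # Row r creates node n + r, so it may only merge nodes that already exist.
        if not (0 <= left < n + r and 0 <= right < n + r):
            raise ValueError(f"row {r} of Z refers to a node that does not exist yet")
    return n


Z_toy = [[0, 1, 1.0, 2], [2, 3, 1.0, 2], [4, 5, 9.0, 4]]
print(check_inputs(Z_toy, [1, 1, 2, 2]))   # prints 4
```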
Returns
L : ndarray
    The leader linkage node ids, stored as a k-element 1-D array, where k is the number of flat clusters found in T. L[j]=i is the linkage cluster node id that is the leader of the flat cluster with id M[j]. If i < n, i corresponds to an original observation; otherwise it corresponds to a non-singleton cluster.
M : ndarray
    The flat cluster ids of the leaders in L, stored as a k-element 1-D array, where k is the number of flat clusters found in T. This allows the set of flat cluster ids to be any arbitrary set of k integers. For example, if L[3]=2 and M[3]=8, the leader of the flat cluster with id 8 is linkage node 2.
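Because row r of the linkage matrix creates node n + r, a leader id can be decoded with Z alone: ids below n are original observations, and any larger id i points at row i - n. The sketch below pairs L with M to look leaders up by flat cluster id, then lists the observations under each leader. The helper observations_under, the toy matrix Z_toy, and the L and M values (what one would expect for T = [8, 8, 3, 5], listed in an arbitrary order) are hypothetical.

```python
def observations_under(Z, n, i):
    """Hypothetical helper: sorted observation ids below linkage node i."""
    if i < n:
        return [i]                       # i is itself an original observation
    left, right = int(Z[i - n][0]), int(Z[i - n][1])   # node i came from row i - n
    return sorted(observations_under(Z, n, left) + observations_under(Z, n, right))


# Toy tree over observations 0..3: node 4 = {0, 1}, node 5 = {2, 3}, node 6 = all.
Z_toy = [[0, 1, 1.0, 2], [2, 3, 1.0, 2], [4, 5, 9.0, 4]]
n = 4

# Hypothetical leaders output for T = [8, 8, 3, 5]; cluster ids need not be 1..k.
L = [4, 2, 3]
M = [8, 3, 5]

leader_of = dict(zip(M, L))              # flat cluster id -> leader node id
for cluster_id in sorted(leader_of):
    node = leader_of[cluster_id]
    kind = "observation" if node < n else "non-singleton cluster"
    print(cluster_id, node, kind, observations_under(Z_toy, n, node))
```

The recursion is fine for small trees; for very deep linkage trees an explicit stack would avoid Python's recursion limit.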
See also
fcluster
    For the creation of flat cluster assignments such as T.
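from scipy.cluster.hierarchy import ward, fcluster, leaders
from scipy.spatial.distance import pdist

# Twelve 2-D observations arranged in four tight groups of three.
X = [[0, 0], [0, 1], [1, 0],
     [0, 4], [0, 3], [1, 4],
     [4, 0], [3, 0], [4, 1],
     [4, 4], [3, 4], [4, 3]]
# Ward linkage computed from the condensed pairwise distance matrix.
Z = ward(pdist(X))
print(Z)
# Cut the tree at distance 3: T[p] is the flat cluster id of observation p.
T = fcluster(Z, 3, criterion='distance')
print(T)
# L[j] is the linkage node at the root of flat cluster M[j].
L, M = leaders(Z, T)
print(L)
print(M)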