
I want to color my clusters with a color map that I made in the form of a dictionary (i.e. {leaf: color}).

I've tried following https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/ but the colors get messed up for some reason. The default plot looks good; I just want to assign those colors differently. I saw that there is a link_color_func parameter, but when I passed my color map (the D_leaf_colors dictionary) I got an error because it isn't a function. I created D_leaf_colors to customize the colors of the leaves associated with particular clusters. In my actual dataset the colors mean something, so I'm steering away from arbitrary color assignments.

I don't want to use color_threshold because in my actual data I have way more clusters and SciPy repeats the colors, hence this question.

How can I use my leaf-color dictionary to customize the color of my dendrogram clusters?

I opened a GitHub issue, https://github.com/scipy/scipy/issues/6346, where I further elaborated on the approach to coloring the leaves from "Interpreting the output of SciPy's hierarchical clustering dendrogram? (maybe found a bug...)", but I still can't figure out how to either (i) use the dendrogram output to reconstruct my dendrogram with my specified color dictionary or (ii) reformat my D_leaf_colors dictionary for the link_color_func parameter.

# Init

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns; sns.set()

# Load data

from sklearn.datasets import load_diabetes

# Clustering

from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list

from scipy.spatial import distance

from fastcluster import linkage # You can use SciPy one too

%matplotlib inline

# Dataset

A_data = load_diabetes().data

DF_diabetes = pd.DataFrame(A_data, columns = ["attr_%d" % j for j in range(A_data.shape[1])])

# Absolute value of correlation matrix, then subtract from 1 for dissimilarity

DF_dism = 1 - np.abs(DF_diabetes.corr())

# Compute average linkage

A_dist = distance.squareform(DF_dism.values)  # .as_matrix() was removed in pandas 1.0

Z = linkage(A_dist,method="average")

# Color mapping

D_leaf_colors = {"attr_1": "#808080", # Unclustered gray

                 "attr_4": "#B061FF", # Cluster 1 indigo

                 "attr_5": "#B061FF",

                 "attr_2": "#B061FF",

                 "attr_8": "#B061FF",

                 "attr_6": "#B061FF",

                 "attr_7": "#B061FF",

                 "attr_0": "#61ffff", # Cluster 2 cyan

                 "attr_3": "#61ffff",

                 "attr_9": "#61ffff",

                 }

# Dendrogram

# The default coloring shown below was produced with `color_threshold=0.7`

D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None, leaf_font_size=12, leaf_rotation=45, link_color_func=D_leaf_colors)

# TypeError: 'dict' object is not callable


1 Answer


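link_color_func has to be a callable that takes a link (cluster) id and returns a color, so a dictionary cannot be passed to it directly. Instead, walk the linkage matrix Z returned by linkage() to build a dictionary of link colors, then wrap the lookup in a lambda: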

dflt_col = "#808080"   # Unclustered gray

D_leaf_colors = {"attr_1": dflt_col,

                 "attr_4": "#B061FF", # Cluster 1 indigo

                 "attr_5": "#B061FF",

                 "attr_2": "#B061FF",

                 "attr_8": "#B061FF",

                 "attr_6": "#B061FF",

                 "attr_7": "#B061FF",

                 "attr_0": "#61ffff", # Cluster 2 cyan

                 "attr_3": "#61ffff",

                 "attr_9": "#61ffff",

                 }

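# Map every link id to a color (Z has len(Z) + 1 leaves, so link ids start there)

link_cols = {}

for i, i12 in enumerate(Z[:,:2].astype(int)):

  # Ids up to len(Z) are leaves; larger ids are links created by earlier rows

  c1, c2 = (link_cols[x] if x > len(Z) else D_leaf_colors["attr_%d"%x]

    for x in i12)

  # Keep the color only when both children agree; otherwise fall back to gray

  link_cols[i+1+len(Z)] = c1 if c1 == c2 else dflt_col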

# Dendrogram

D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None,

  leaf_font_size=12, leaf_rotation=45, link_color_func=lambda x: link_cols[x])
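The loop above relies on the labels being named "attr_%d". As a rough sketch of the same idea without that assumption, the helper below takes the leaf colors as a list indexed by observation id. The function name link_colors_from_leaves and the small Z_demo matrix are made up for illustration; only the row layout (child id, child id, distance, cluster size) follows SciPy's linkage format.

```python
def link_colors_from_leaves(Z, leaf_colors, default="#808080"):
    """Map each link id of a linkage matrix to a color.

    Z           : linkage matrix with n-1 rows; columns 0 and 1 hold child ids
    leaf_colors : colors indexed by original observation id 0..n-1
    default     : color for links whose two children disagree
    """
    n = len(Z) + 1  # number of original observations (leaves)
    link_cols = {}
    for i, row in enumerate(Z):
        children = (int(row[0]), int(row[1]))
        # Ids below n are leaves; ids >= n are links created by earlier rows
        c1, c2 = (leaf_colors[c] if c < n else link_cols[c] for c in children)
        # Row i of Z creates the link with id n + i
        link_cols[n + i] = c1 if c1 == c2 else default
    return link_cols


# Hand-written 4-leaf linkage matrix (hypothetical data, not from the question)
Z_demo = [
    [0, 1, 0.1, 2],  # link 4 joins leaves 0 and 1
    [2, 3, 0.2, 2],  # link 5 joins leaves 2 and 3
    [4, 5, 0.9, 4],  # link 6 joins links 4 and 5
]
leaf_colors_demo = ["#B061FF", "#B061FF", "#61ffff", "#61ffff"]
link_cols_demo = link_colors_from_leaves(Z_demo, leaf_colors_demo)
print(link_cols_demo)  # {4: '#B061FF', 5: '#61ffff', 6: '#808080'}
```

With the data from the question, the list of leaf colors could be built as [D_leaf_colors[name] for name in DF_dism.index], and the resulting dictionary passed to dendrogram through link_color_func=lambda k: link_cols[k], exactly as in the call above.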
