0 votes
4 views
in Python by (12.7k points)

I have a dataset made up of 22 categorical variables (unordered). I would like to visualize their correlation in a heatmap. Since the pandas built-in function

DataFrame.corr(method='pearson', min_periods=1)

only computes correlation coefficients for numerical variables (Pearson, Kendall, Spearman), I need to aggregate the data myself to run a chi-square test or something similar, and I am not sure which function to use to do it in one elegant step (as opposed to iterating through all the cat1*cat2 pairs). Ideally, this is what I would like to end up with (a dataframe):

       cat1  cat2  cat3
  cat1|  coef  coef  coef
  cat2|  coef  coef  coef
  cat3|  coef  coef  coef

Would pd.pivot_table be of any help here?

Thanks in advance

1 Answer

0 votes
by (26.4k points)

You can try using pd.factorize to encode each column as integers before calling corr:

df.apply(lambda x : pd.factorize(x)[0]).corr(method='pearson', min_periods=1)

Out[32]: 
     a    c    d
a  1.0  1.0  1.0
c  1.0  1.0  1.0
d  1.0  1.0  1.0

Data input:

df=pd.DataFrame({'a':['a','b','c'],'c':['a','b','c'],'d':['a','b','c']})
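One caveat worth checking on your own data: pd.factorize assigns integer codes in order of first appearance, so a Pearson coefficient computed on those codes depends on an arbitrary numbering. A minimal sketch, using made-up columns x and y (hypothetical, not from the question) where y is fully determined by x, yet the coefficient comes out close to zero:

```python
import pandas as pd

# Hypothetical toy data: y is a function of x (a -> u, b -> v, c -> u).
toy = pd.DataFrame({'x': list('abcabc'), 'y': list('uvuuvu')})

# Same recipe as above: integer-encode each column, then take Pearson.
codes = toy.apply(lambda s: pd.factorize(s)[0])
pearson_xy = codes.corr(method='pearson').loc['x', 'y']

print(pearson_xy)  # close to 0 even though y depends entirely on x
```

So the factorize trick is probably best treated as a quick first look rather than a measure of association between unordered categories.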

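Update (using a chi-square statistic on the integer codes):

import pandas as pd
from scipy.stats import chisquare

# Encode each column as integers starting at 1, so the expected values are never zero
df=df.apply(lambda x : pd.factorize(x)[0])+1

# For each column, compare its codes (observed) against every column's codes (expected)
pd.DataFrame([chisquare(df[x].values,f_exp=df.values.T,axis=1)[0] for x in df])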

Out[123]: 
     0    1    2    3
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
3  0.0  0.0  0.0  0.0

Data input:

df=pd.DataFrame({'a':['a','d','c'],'c':['a','b','c'],'d':['a','b','c'],'e':['a','b','c']})
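If what you are after is the cat x cat matrix of coefficients from the question, one common alternative is Cramér's V, which is derived from the chi-square statistic of each pair's contingency table. This is a different technique from the chisquare call above, not a rewrite of it. A rough sketch, assuming pd.crosstab and scipy.stats.chi2_contingency; the helper names cramers_v and cramers_v_matrix are made up for illustration, and the loop over pairs is still there, just hidden inside a function:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v(x, y):
    """Cramér's V between two categorical Series (0 = none, 1 = perfect)."""
    table = pd.crosstab(x, y)
    k = min(table.shape) - 1
    if k == 0:  # a constant column carries no association information
        return np.nan
    # correction=False skips Yates' continuity correction on 2x2 tables
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    return np.sqrt(chi2 / (n * k))


def cramers_v_matrix(df):
    """Symmetric DataFrame of pairwise Cramér's V, diagonal set to 1."""
    cols = list(df.columns)
    out = pd.DataFrame(np.ones((len(cols), len(cols))), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            out.loc[a, b] = out.loc[b, a] = cramers_v(df[a], df[b])
    return out


# Same toy data as in the answer (raw strings, no factorize needed)
df = pd.DataFrame({'a': ['a', 'd', 'c'], 'c': ['a', 'b', 'c'],
                   'd': ['a', 'b', 'c'], 'e': ['a', 'b', 'c']})
result = cramers_v_matrix(df)
print(result)
```

The resulting DataFrame has the shape asked for in the question and can be handed to a plotting function such as seaborn.heatmap. Note that plain Cramér's V tends to be biased upward on small samples, so treat the values on tiny tables like this one with caution.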
