Not directly. Correlation coefficients such as Pearson, Kendall, and Spearman are defined for numeric data, so you cannot compute them for categorical variables with DataFrame.corr() or pd.pivot_table().
To compute the correlation between categorical variables, you can use the chi-square test of independence. Here's an approach you can follow:
Create a contingency table using pd.crosstab() to count the occurrences of each combination of categorical variables.
contingency_table = pd.crosstab(df['cat1'], df['cat2'])
Apply the chi-square test of independence using scipy.stats.chi2_contingency() to obtain the chi-square statistic, p-value, and other relevant information.
from scipy.stats import chi2_contingency
chi2, p_value, _, _ = chi2_contingency(contingency_table)
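Putting these two steps together, a minimal self-contained sketch might look like this (the column names and data are hypothetical placeholders):

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical example data with two categorical columns.
df = pd.DataFrame({
    "cat1": ["a", "a", "a", "a", "b", "b", "b", "b"],
    "cat2": ["x", "x", "x", "y", "y", "y", "y", "x"],
})

# Step 1: contingency table of observed counts.
contingency_table = pd.crosstab(df["cat1"], df["cat2"])

# Step 2: chi-square test of independence.
chi2, p_value, dof, expected = chi2_contingency(contingency_table)

print(contingency_table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")
```

A small p-value suggests the two variables are not independent; note that for 2x2 tables chi2_contingency applies Yates' continuity correction by default.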
You can convert the chi-square statistic into a measure of association such as Cramér's V to quantify the strength of the relationship between the variables.
import numpy as np

n = contingency_table.to_numpy().sum()
# Cramér's V: note the parentheses around (min(shape) - 1)
phi_c = np.sqrt(chi2 / (n * (min(contingency_table.shape) - 1)))
Note that phi_c is a single number summarizing the association between the pair cat1 and cat2 (0 means no association, 1 means perfect association). To build a correlation-style matrix across several categorical columns, compute Cramér's V for every pair of columns and store the results in a DataFrame indexed by the column names, analogous to the output of DataFrame.corr().
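As a sketch of that pairwise approach (column names and data below are hypothetical; correction=False is used so a variable's association with itself comes out as exactly 1):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V between two categorical series."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    return float(np.sqrt(chi2 / (n * (min(table.shape) - 1))))

# Hypothetical frame with three categorical columns.
df = pd.DataFrame({
    "cat1": ["a", "a", "b", "b", "a", "b"],
    "cat2": ["x", "y", "x", "y", "y", "x"],
    "cat3": ["p", "p", "q", "q", "q", "p"],
})

# Pairwise Cramér's V matrix, analogous to DataFrame.corr().
cols = df.columns
correlation_df = pd.DataFrame(
    [[cramers_v(df[c1], df[c2]) for c2 in cols] for c1 in cols],
    index=cols, columns=cols,
)
print(correlation_df)
```

The resulting matrix is symmetric, with 1.0 on the diagonal and pairwise association strengths off the diagonal.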
Note: Cramér's V also works for binary variables (for a 2x2 table it reduces to the phi coefficient). If one variable is binary and the other is continuous, consider the point biserial correlation instead; the tetrachoric correlation applies when two binary variables are assumed to reflect underlying continuous normal traits.
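For the binary-vs-continuous case, scipy provides pointbiserialr; a quick sketch with made-up data:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data: a binary group indicator and a continuous measurement.
binary = np.array([0, 0, 0, 1, 1, 1])
continuous = np.array([1.2, 0.9, 1.1, 2.8, 3.1, 2.9])

# Point biserial correlation (equivalent to Pearson's r with a 0/1 variable).
r, p = pointbiserialr(binary, continuous)
print(f"r={r:.3f}, p={p:.4f}")
```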
I hope this helps you quantify the association between your categorical variables using a chi-square-based approach!