I have the below dataset
Labels Usernames
1 Londonderry
1 Londoncalling
1 Steveonder43
0 Maryclare_re
1 Patent107391
0 Anonymous
1 _24londonqr
...
I am trying to show there is a correlation between usernames containing the word London and label 1. To do it, I created the second label to see where the word London was
for idx, username in df['Usernames']:
if 'London' in username:
df['London'].iloc[idx] = 1
else:
df['London'].iloc[idx] = 0
Then I compared these binary variables, using the Pearson correlation coefficient:
import scipy.stats.pearsonr as rho
corr = rho(df['labels'], df['London'])
However, it is not working. Am I missing something?