Estimate correlation in Python

Question

asked Sep 5, 2020 in Data Science by blackindya (18.4k points)

I have the dataset below:

Labels Usernames
1 Londonderry
1 Londoncalling
1 Steveonder43
0 Maryclare_re
1 Patent107391
0 Anonymous
1 _24londonqr
...

I am trying to show that there is a correlation between usernames containing the word London and label 1. To do this, I created a second binary column that marks which usernames contain the word London:

for idx, username in df['Usernames']:
    if 'London' in username:
        df['London'].iloc[idx] = 1
    else:
        df['London'].iloc[idx] = 0

Then I compared these binary variables, using the Pearson correlation coefficient:

import scipy.stats.pearsonr as rho
corr = rho(df['labels'], df['London'])

However, it is not working. Am I missing something?

1 Answer

supriya · Answer 1 · 2020-09-05T05:19:58+0000

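The error comes from the column name: your code uses df['labels'], but the column is called Labels. Note also that pearsonr is a function, not a module, so it cannot be imported with import scipy.stats.pearsonr as rho; use from scipy.stats import pearsonr, or import stats and call stats.pearsonr. I have also simplified the code by replacing the loop with a vectorized str.contains: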

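from scipy import stats

# Flag usernames containing 'London' (case-sensitive) as 1, all others as 0
df['London'] = df['Usernames'].str.contains('London').astype(int)

# Returns the Pearson correlation coefficient and the two-sided p-value
stats.pearsonr(df['Labels'], df['London'])
Out[12]: (0.4, 0.37393392381774704)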
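To try this end to end, here is a minimal, self-contained sketch. It assumes the DataFrame holds only the seven rows shown in the question (the rest of the dataset is elided there), so the numbers apply to that sample only. The column name London_ci is made up for illustration. The second part uses the case parameter of str.contains to show how a case-insensitive match would also flag _24londonqr, which the case-sensitive version misses.

```python
import pandas as pd
from scipy import stats

# Only the seven rows shown in the question; the full dataset is not available.
df = pd.DataFrame({
    "Labels": [1, 1, 1, 0, 1, 0, 1],
    "Usernames": ["Londonderry", "Londoncalling", "Steveonder43",
                  "Maryclare_re", "Patent107391", "Anonymous", "_24londonqr"],
})

# Case-sensitive match: "_24londonqr" is NOT flagged.
df["London"] = df["Usernames"].str.contains("London").astype(int)
r, p = stats.pearsonr(df["Labels"], df["London"])

# Case-insensitive variant (hypothetical column name): also flags "_24londonqr".
df["London_ci"] = df["Usernames"].str.contains("london", case=False).astype(int)
r_ci, p_ci = stats.pearsonr(df["Labels"], df["London_ci"])

print(f"case-sensitive:   r={r:.3f}, p={p:.3f}")
print(f"case-insensitive: r={r_ci:.3f}, p={p_ci:.3f}")
```

With so few rows the p-value is large in both cases, so this sample alone would not be enough to claim a correlation; the sketch is only meant to show the mechanics.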
