
I am working through the Kaggle Titanic tutorial on the Datacamp platform.

I understand what .loc does in Pandas: it selects values by row and column labels.

My confusion is this: in the Datacamp tutorial, we want to find every "male" entry in the "Sex" column and replace it with the value 0. The tutorial uses the following code to do it:

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0

Can someone please explain how this works? I thought .loc took inputs of row and column, so what is the == for?

Shouldn't it be:

titanic.loc["male", "Sex"] = 0

Thanks!

1 Answer


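The row selector here is not a label but a condition. It compares every value in the Sex column with "male" and returns a boolean Series (one True/False per row), and .loc keeps only the rows where it is True. So wherever the condition is True, the assignment sets column Sex to 0, and all other values are unchanged. (In titanic.loc["male", "Sex"], by contrast, "male" would be treated as a row index label, not as a value in the column.) The condition is: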

titanic["Sex"] == "male"
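To see that .loc is simply receiving a boolean row selector, the sketch below (on a hypothetical three-row frame, not the real Titanic data) passes the mask produced by == and then the same mask written out by hand as a plain list of booleans. Both select the same rows.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic data
titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})

# The comparison builds one True/False per row, aligned with the index
mask = titanic["Sex"] == "male"

# .loc accepts that boolean Series as the row selector...
by_mask = titanic.loc[mask, "Sex"]

# ...and a plain list of booleans of the same length works the same way
by_list = titanic.loc[[True, False, True], "Sex"]

print(by_mask.equals(by_list))
```

So the == is not special .loc syntax; it just produces the True/False row selector that .loc already understands.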

So, let’s have a look at the sample:

import pandas as pd

titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']})

print(titanic)

      Sex
0    male
1  female
2    male

print(titanic["Sex"] == "male")

0     True
1    False
2     True
Name: Sex, dtype: bool

titanic.loc[titanic["Sex"] == "male", "Sex"] = 0

print(titanic)

      Sex
0       0
1  female
2       0
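One side effect worth knowing: the assignment replaces only the matching cells, so the column ends up mixing the integer 0 with the string "female" and keeps the generic object dtype. The sketch below, on the same hypothetical three-row sample, checks this and then encodes the remaining category and converts the column to integers. The second assignment and the astype(int) step are an illustrative assumption, not part of the tutorial.

```python
import pandas as pd

# Hypothetical sample, same shape as the one above
titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0

# Only the matched rows changed, so the column mixes int and str values
print(titanic["Sex"].tolist())  # [0, 'female', 0]
print(titanic["Sex"].dtype)     # object

# Encode the remaining category as well, then convert to a numeric dtype
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1
titanic["Sex"] = titanic["Sex"].astype(int)
print(titanic["Sex"].tolist())  # [0, 1, 0]
```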

This is ordinary boolean indexing with .loc: the mask picks the rows and "Sex" picks the column. Reading the same selection, instead of assigning to it, shows exactly which values get replaced:

print(titanic.loc[titanic["Sex"] == "male", "Sex"])

0    male
2    male
Name: Sex, dtype: object
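As for the titanic.loc["male", "Sex"] = 0 version from the question: the first .loc position is matched against the row index labels (0, 1, 2 here), not against the column's values. When assigning, a label missing from the index is not an error; pandas appends a new row with that label ("setting with enlargement") and leaves the existing rows alone. A sketch on a hypothetical three-row sample:

```python
import pandas as pd

titanic = pd.DataFrame({"Sex": ["male", "female", "male"]})

# "male" is looked up as a row label, and no row is labelled "male"...
print("male" in titanic.index)  # False

# ...so assigning through it adds a new row instead of changing values
titanic.loc["male", "Sex"] = 0
print(titanic)
```

The original three "male"/"female" values are untouched, and the frame now has a fourth row labelled "male".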

If the column contains only "male" and "female" and both should be converted to other values, map with a dictionary is simpler:

import pandas as pd

titanic = pd.DataFrame({'Sex': ['male', 'female', 'male']})

titanic["Sex"] = titanic["Sex"].map({'male': 0, 'female': 1})

print(titanic)

   Sex
0    0
1    1
2    0
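One caveat with map and a dictionary: any value that is not a key in the dictionary becomes NaN, and the column is then upcast to float. The sketch below uses a hypothetical "unknown" entry to show this.

```python
import pandas as pd

# Hypothetical data with a value the mapping does not cover
titanic = pd.DataFrame({"Sex": ["male", "female", "unknown"]})
encoded = titanic["Sex"].map({"male": 0, "female": 1})

# "unknown" has no key, so it maps to NaN and the dtype becomes float64
print(encoded.isna().tolist())  # [False, False, True]
print(encoded.dtype)            # float64
```

Checking encoded.isna().any() after mapping is a cheap way to catch such gaps, whereas the .loc approach leaves unmatched values as they were.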
