
I'm working with the Airbnb dataset on Kaggle:

https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings

and want to collapse the values in the language column into two groups: English and non-English.

For instance:

users.language.value_counts()

en    15011
zh      101
fr       99
de       53
es       53
ko       43
ru       21
it       20
ja       19
pt       14
sv       11
no        6
da        5
nl        4
el        2
pl        2
tr        2
cs        1
fi        1
is        1
hu        1
Name: language, dtype: int64

And the result I want is:

users.language.value_counts()

english        15011
non-english      459
Name: language, dtype: int64

This is sort of the solution I want:

def language_groupings():
    for i in users:
        if users.language !='en':
            replace(users.language.str, 'non-english')
        else:
            replace(users.language.str, 'english')
    return users

users['language'] = users.apply(lambda row: language_groupings)

Except there's obviously something wrong with this, as it returns an empty Series when I run value_counts on the column.

1 Answer


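Try this. np.where labels each row as 'english' when the language is 'en' and 'non-english' otherwise, and value_counts then counts the labels. Note that assign returns a new DataFrame with an extra lang column and leaves users.language itself unchanged:

import numpy as np

users.assign(lang=np.where(users.language == 'en', 'english', 'non-english'))['lang'].value_counts()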
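If the goal is to overwrite the language column itself, as the question's last line attempts, the same np.where call can be assigned back to the column. The following is a minimal sketch, not the Kaggle data: the small `users` frame is a hypothetical stand-in, so the counts differ from the ones in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the Kaggle users table.
users = pd.DataFrame({"language": ["en", "en", "zh", "fr", "en", "de", "en"]})

# Vectorized relabel: 'en' becomes 'english', every other value 'non-english'.
users["language"] = np.where(users["language"] == "en", "english", "non-english")

# Count how many rows fall into each of the two groups.
counts = users["language"].value_counts()
print(counts)
```

This works on the whole column at once, so no loop or apply is needed. In the question's code, the lambda returns the function language_groupings itself rather than calling it, so no per-row labels are ever produced. One thing to watch for: a missing language value compares unequal to 'en' and would therefore be labelled 'non-english'.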
