
I am new to data science and I am trying to build a pie chart with the matplotlib package that shows the percentage of website traffic coming from each device model, for example 10% iPhone, 50% iPad, and 40% Mac.

useragent   count
iPhone      11298
Mac          3206
iPad          627
SM-N960F      433
SM-N950F      430
...           ...
K330            1
K220            1
SM-J737P        1
SM-J737T1       1
0PFJ50          1

[1991 rows x 2 columns]

As you can see, the dataset has 1991 rows, but I don't want to display all of them. I want to show only the top user agents and combine everything else into a single "Others" row, for five rows in total.

The desired output should be something like this:

useragent  count
iPhone     11298
Mac         3206
iPad         627
SM-N960F     433
Others      9000

Can anyone help me achieve my output?

1 Answer


Use the code below: 

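import pandas as pd

# sort by count first, if the data is not already sorted
df1 = df.sort_values('count', ascending=False)

# take the top 4 rows
df2 = df1.head(4)

# sum the `count` column for every row after the first 4
summed = df1.loc[df1.index[4:], 'count'].sum()

# build a one-row DataFrame holding the combined remainder
df3 = pd.DataFrame({'useragent': ['Other'], 'count': [summed]})

# join the top rows and the "Other" row
df4 = pd.concat([df2, df3], sort=False, ignore_index=True)

# the output below comes from the ten sample rows shown in the question,
# not from the full 1991-row dataset
print(df4)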

  useragent  count
0    iPhone  11298
1       Mac   3206
2      iPad    627
3  SM-N960F    433
4     Other    435
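For reference, the steps above can be wrapped in a small helper. This is only a sketch: the function name `top_n_with_other` and its parameters are made up for illustration, and the sample frame contains just the ten rows visible in the question, not the full dataset.

```python
import pandas as pd

# Sample data: only the ten rows visible in the question (an assumption for
# illustration), not the full 1991-row dataset.
df = pd.DataFrame({
    "useragent": ["iPhone", "Mac", "iPad", "SM-N960F", "SM-N950F",
                  "K330", "K220", "SM-J737P", "SM-J737T1", "0PFJ50"],
    "count": [11298, 3206, 627, 433, 430, 1, 1, 1, 1, 1],
})


def top_n_with_other(frame, n, other_label="Other"):
    """Keep the n largest rows by `count` and sum the rest into one row.

    Hypothetical helper; assumes columns named `useragent` and `count`.
    """
    ordered = frame.sort_values("count", ascending=False, ignore_index=True)
    top = ordered.iloc[:n]
    # positional slicing, so the result does not depend on index labels
    rest_total = ordered["count"].iloc[n:].sum()
    other = pd.DataFrame({"useragent": [other_label], "count": [rest_total]})
    return pd.concat([top, other], ignore_index=True)


grouped = top_n_with_other(df, 4)
print(grouped)
```

Using `iloc` for the remainder is a design choice: it slices by position, so it still works if the index has duplicate or non-sequential labels.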

EDITED: Alternatively, keep every row above a count threshold and group the rest:

# build a boolean mask for rows above the threshold
mask = df['count'] > 500

# keep the rows above the threshold (boolean indexing)
df2 = df[mask]

# invert the mask and sum the counts of the remaining rows
summed = df.loc[~mask, 'count'].sum()

# same as above: one-row DataFrame for the remainder, then join
df3 = pd.DataFrame({'useragent': ['Other'], 'count': [summed]})
df5 = pd.concat([df2, df3], sort=False, ignore_index=True)

print(df5)

  useragent  count
0    iPhone  11298
1       Mac   3206
2      iPad    627
3     Other    868
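Since the question asks for a pie chart, here is a minimal sketch of plotting the grouped result with matplotlib. It rebuilds the four-row frame from the output shown above so that it runs on its own. The chart title and the output file name are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Grouped data copied from the sample output above (not the full dataset)
df5 = pd.DataFrame({
    "useragent": ["iPhone", "Mac", "iPad", "Other"],
    "count": [11298, 3206, 627, 868],
})

fig, ax = plt.subplots()
# autopct prints each slice's share of the total as a percentage
wedges, texts, autotexts = ax.pie(
    df5["count"].tolist(),
    labels=df5["useragent"].tolist(),
    autopct="%1.1f%%",
    startangle=90,
)
ax.axis("equal")  # equal aspect ratio so the pie is drawn as a circle
ax.set_title("Traffic share by user agent")  # illustrative title

fig.savefig("useragent_share.png")  # illustrative file name
```

matplotlib computes the percentages from the raw counts, so there is no need to normalize the `count` column beforehand.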
