in Data Science by (18.4k points)

I am using a census dataset that I downloaded from the internet, and I am trying to manipulate it.

Here is the dataset and the way I am loading it:

import pandas as pd
census_df = pd.read_csv('https://raw.githubusercontent.com/Qian-Han/coursera-Applied-Data-Science-with-Python/master/Introduction-to-Data-Science-in-Python/original_data/census.csv')

# Sort states A-Z, then CENSUS2010POP largest-first within each state
sortedit = census_df.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=[True, False])

In the code above, I sort the rows by state name ('STNAME') in alphabetical order and, within each state, by 'CENSUS2010POP' in descending order.

So far, this gives me the output I want.
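A small illustration of how multi-column sorting behaves, using a hypothetical three-row frame (the names and numbers are made up, not taken from census.csv): passing a list to `ascending` sets the direction for each key separately, whereas `ascending=False` reverses every key, so the states come out in reverse alphabetical order.

```python
import pandas as pd

# Hypothetical toy data; not taken from census.csv
toy = pd.DataFrame({
    'STNAME': ['Alabama', 'Alabama', 'Alaska'],
    'CENSUS2010POP': [100, 300, 200],
})

# ascending=False applies to BOTH keys: states come out Z-A
both_desc = toy.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=False)

# One flag per key: states A-Z, population largest-first within each state
mixed = toy.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=[True, False])

print(both_desc['STNAME'].tolist())     # ['Alaska', 'Alabama', 'Alabama']
print(mixed['CENSUS2010POP'].tolist())  # [300, 100, 200]
```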

Next, I want to keep only the three highest 'CENSUS2010POP' values for each 'STNAME'.

To achieve this, do I need to build 146x3 rows in a new data frame? Can anyone help me solve this?

1 Answer

by (36.8k points)

You can use the code below to get the desired output:

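# Sort states A-Z and, within each state, CENSUS2010POP largest-first
df = census_df.sort_values(['STNAME', 'CENSUS2010POP'], ascending=[True, False]).reset_index(drop=True)

# Keep the first 3 rows of each state, i.e. its 3 highest CENSUS2010POP values
df.groupby('STNAME').head(3)[['STNAME', 'CENSUS2010POP']]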

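In the code above, the first statement sorts the rows so that, within each STNAME, CENSUS2010POP is in descending order.

The second statement keeps the first three rows of each state, which are its three highest CENSUS2010POP values.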
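A self-contained check of the sort-then-`head(3)` pattern, as a minimal sketch on invented rows. It assumes the census file's usual layout, where `SUMLEV == 40` marks a whole-state summary row and `SUMLEV == 50` marks a county row; if that holds for your copy, filtering to counties first keeps each state's own total from being counted among its top three. Because `head(3)` returns at most three rows per group, there is no need to build a fixed-size frame by hand.

```python
import pandas as pd

# Hypothetical rows mimicking census.csv columns (values are invented)
census_df = pd.DataFrame({
    'SUMLEV': [40, 50, 50, 50, 50, 40, 50, 50],
    'STNAME': ['Alabama'] * 5 + ['Alaska'] * 3,
    'CTYNAME': ['Alabama', 'A1', 'A2', 'A3', 'A4', 'Alaska', 'K1', 'K2'],
    'CENSUS2010POP': [1000, 400, 100, 300, 200, 500, 50, 450],
})

# Assumption: SUMLEV 50 = county rows; drop the state-level summary rows
counties = census_df[census_df['SUMLEV'] == 50]

# Sort states A-Z, population descending within each state
ranked = counties.sort_values(['STNAME', 'CENSUS2010POP'], ascending=[True, False])

# First 3 rows per state = its 3 most populous counties (fewer if it has fewer)
top3 = ranked.groupby('STNAME').head(3)[['STNAME', 'CTYNAME', 'CENSUS2010POP']]
print(top3)
```

An alternative is `counties.groupby('STNAME')['CENSUS2010POP'].nlargest(3)`, which returns a Series indexed by state and original row label instead of a DataFrame.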
