I am using a census dataset that I downloaded from the internet, and I am trying to manipulate it.

This is how I load and sort the dataset:

import pandas as pd

census_df = df = pd.read_csv('')

sortedit = census_df.sort_values(by=['STNAME', 'CENSUS2010POP'], ascending=[True, False])

Here, in the code above, I sort by the 'STNAME' column so that the states are in alphabetical order, and by the 'CENSUS2010POP' column in descending order within each state.

Up to this point I get the desired output.

Next, I want to keep only the 3 highest 'CENSUS2010POP' values for each 'STNAME'.

If I do this correctly, the new data frame should have 146x3 rows. Can anyone help me solve it?

1 Answer

You can use the code below to get your desired output.

df = census_df.groupby(["STNAME"]).apply(lambda x: x.sort_values(["CENSUS2010POP"], ascending=False)).reset_index(drop=True)
df = df.groupby(["STNAME"]).head(3)

In the code above, the first line sorts the rows of each STNAME by CENSUS2010POP in descending order.

The second line keeps the top 3 records of each STNAME.
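
For anyone who wants to try the idea end to end, here is a minimal sketch on a tiny made-up table. The county names and population numbers are hypothetical, not real census figures, and `top3` is just an illustrative variable name. It sorts once with a per-column `ascending` list and then takes `head(3)` per group, which avoids `groupby(...).apply`; as far as I know, how `apply` handles the grouping column has changed between pandas versions, so this form may be the safer one.

```python
import pandas as pd

# Hypothetical miniature stand-in for the census data (made-up numbers).
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Alaska"] * 2 + ["Arizona"] * 3,
    "CTYNAME": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    "CENSUS2010POP": [100, 400, 300, 200, 50, 70, 10, 30, 20],
})

top3 = (
    census_df
    # States A-Z, and population from largest to smallest within each state.
    .sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])
    # Keep at most the first 3 rows of each state, in the sorted order.
    .groupby("STNAME")
    .head(3)
    .reset_index(drop=True)
)

print(top3)
```

Note that a state with fewer than 3 rows (Alaska in this sketch) simply contributes all of its rows, so the result can have fewer than 3 rows per state.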

