Intellipaat
in Data Science by (17.6k points)

To filter a DataFrame (df) by a single column, for example data containing males and females, we might write:

males = df[df['Gender'] == 'Male']

Question 1: What if the data spans multiple years and I only want to see males for 2014?

In other languages I might do something like:

if A = "Male" and B = "2014" then

(except that I want the result to be a subset of the original DataFrame, stored in a new DataFrame object)

Question 2: How do I do this in a loop and create a DataFrame object for each unique combination of year and gender (i.e. one DataFrame each for 2013-Male, 2013-Female, 2014-Male, and 2014-Female)?

for y in year:
    for g in gender:
        df = .....

2 Answers

by (41.4k points)

Using the & operator to combine conditions:

males = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
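To see the & filter end to end, here is a minimal sketch on a small DataFrame. The column names Gender and Year follow the question; the rows and the Score column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Year': [2013, 2013, 2014, 2014, 2014],
    'Score': [70, 82, 65, 90, 77],
})

# Each comparison yields a boolean Series; & keeps rows where both are True
males_2014 = df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]

print(males_2014)
```

The result is a new DataFrame containing only the matching rows. If you intend to modify it afterwards, calling .copy() on it avoids pandas' SettingWithCopyWarning.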

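Using a for loop to store each subset in a dictionary of dictionaries (keyed by gender, then year):

from collections import defaultdict

# defaultdict(dict) creates the inner dict for each gender automatically
dic = defaultdict(dict)

for g in ['Male', 'Female']:
    for y in [2013, 2014]:
        # store each filtered DataFrame in the dict of dicts
        dic[g][y] = df[(df['Gender'] == g) & (df['Year'] == y)]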
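As an alternative to the nested loop (this swaps in a different technique, groupby, rather than the answer's own approach), pandas can build every year/gender subset in one pass. This sketch assumes the same hypothetical Gender and Year columns; the resulting dictionary is keyed by (year, gender) tuples instead of nested dicts.

```python
import pandas as pd

# Hypothetical sample data with the question's column names
df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'Year': [2013, 2013, 2014, 2014, 2014],
    'Score': [70, 82, 65, 90, 77],
})

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs;
# with two grouping columns the key is a (Year, Gender) tuple.
subsets = {key: group for key, group in df.groupby(['Year', 'Gender'])}

for (year, gender), group in subsets.items():
    print(year, gender, len(group))

# Look up a single subset by its (year, gender) key
males_2014 = subsets[(2014, 'Male')]
```

One design difference from the loop: groupby only produces keys for combinations that actually occur in the data, so you never get empty DataFrames for missing year/gender pairs.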

by (3.5k points)

The most common way to filter a pandas DataFrame by multiple columns is to apply a condition to each column and combine the resulting boolean masks with logical operators. Here are a few ways to do it:

Using logical operators (&, |, ~)

Logical operators combine two or more conditions into a single filter. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [10, 20, 30, 40, 50]
})

# Filter by multiple conditions
filtered_df = df[(df['A'] > 2) & (df['B'] == 'c')]

print(filtered_df)
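The heading also mentions | (or) and ~ (not). A minimal sketch of both on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [10, 20, 30, 40, 50]
})

# | keeps rows where at least one condition is True
either = df[(df['A'] < 2) | (df['C'] > 40)]

# ~ negates a boolean mask: rows where B is NOT 'c'
not_c = df[~(df['B'] == 'c')]

# Operators can be mixed; parentheses make the grouping explicit
mixed = df[((df['A'] > 1) & (df['A'] < 5)) & ~(df['B'] == 'c')]

print(either)
print(not_c)
print(mixed)
```

Note that Python's and/or/not keywords do not work on boolean Series here; they raise a "truth value of a Series is ambiguous" error, which is why the element-wise &, | and ~ are used.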

Using the query() method

pandas also provides the DataFrame.query() method, which filters rows using a boolean expression written as a string:

filtered_df = df.query("A > 2 and B == 'c'")

print(filtered_df)
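A slightly fuller sketch of query() on the same sample DataFrame. Inside the query string, and/or/not are allowed, and Python variables can be referenced with an @ prefix; the variable names min_a and allowed are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [10, 20, 30, 40, 50]
})

# Same filter as the & example, written as a query string
filtered_df = df.query("A > 2 and B == 'c'")

# Local Python variables are referenced inside the string with @
min_a = 2
allowed = ['c', 'd']
via_vars = df.query("A > @min_a and B in @allowed")

print(filtered_df)
print(via_vars)
```

query() tends to read more cleanly when there are many conditions, while the bracket-and-mask style is easier to build up programmatically.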

Using the loc[] indexer

loc[] lets you filter rows with a boolean mask and, optionally, select specific columns at the same time. You can pass multiple combined conditions to loc[].

filtered_df = df.loc[(df['A'] > 2) & (df['B'] == 'c')]

print(filtered_df)
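Because loc[] takes a row selector and a column selector, one call can filter rows and pick columns. A minimal sketch on the same sample data, keeping only columns A and C:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': ['a', 'b', 'c', 'd', 'e'],
    'C': [10, 20, 30, 40, 50]
})

# Build the combined boolean mask once and reuse it
mask = (df['A'] > 2) & (df['C'] < 50)

# loc[rows, columns]: filter rows with the mask, keep only columns A and C
subset = df.loc[mask, ['A', 'C']]

print(subset)
```

Storing the mask in a variable also makes it easy to reuse the same condition for assignment, e.g. df.loc[mask, 'C'] = 0.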
