Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

If columns are the same then I want to merge the rows. To do it I am using grouby command then replace the value of the column based on the condition given. Is there any other way better than this. Any suggestion is appreciated 

This is the code:

data={'Name': {0: 'Sam', 1: 'Amy', 2: 'Cat', 3: 'Sam', 4: 'Kathy'},

 'Subject1': {0: 'Math', 1: 'Science', 2: 'Art', 3: np.nan, 4: 'Science'},

 'Subject2': {0: np.nan, 1: np.nan, 2: np.nan, 3: 'English', 4: np.nan},

 'Result': {0: 'Pass', 1: 'Pass', 2: 'Fail', 3: 'TBD', 4: 'Pass'}}

df=pd.DataFrame(data)

df=df.groupby('Name').agg({

 'Subject1': 'first',

 'Subject2': 'first',

 'Result': ', '.join}).reset_index()

df['Result']=df['Result'].apply(lambda x: 'RESULT_FAILED' if x=='Pass, TBD'  else x )

data frame looks like this:

    Name    Subject1    Subject2    Result

0   Sam     Math    NaN     Pass

1   Amy     Science     NaN     Pass

2   Cat     Art     NaN     Fail

3   Sam     NaN     English     TBD

4   Kathy   Science     NaN     Pass

1 Answer

0 votes
by (36.8k points)

In your sample, you have unique names that contain no NAN subjectX values. That is your SubjectX column has a single non-NAN value for duplicating the Name. So you can use the code below for the solution:

import numpy as np

df_final = (df.fillna('').groupby('Name', as_index=False).agg(''.join)

                         .replace({'':np.nan, 'PassTBD': 'RESULT_FAILED'}))

Out[16]:

    Name Subject1 Subject2         Result

0    Amy  Science      NaN           Pass

1    Cat      Art      NaN           Fail

2  Kathy  Science      NaN           Pass

3    Sam     Math  English  RESULT_FAILED

 If you are a beginner and want to know more about Data Science the do check out the Data Science course

Browse Categories

...