in Data Science by (17.6k points)

I have 2 dataframes, one of which has supplemental information for some (but not all) of the rows in the other.

import pandas as pd
from pandas import DataFrame as df  # assumed: the question uses df as shorthand for pd.DataFrame

names = df({'names':['bob','frank','james','tim','ricardo','mike','mark','joan','joe'],
            'position':['dev','dev','dev','sys','sys','sys','sup','sup','sup']})

info = df({'names':['joe','mark','tim','frank'],
           'classification':['thief','thief','good','thief']})

I would like to take the classification column from the info DataFrame above and add it to the names DataFrame above. However, when I do combined = pd.merge(names, info), the resulting DataFrame is only 4 rows long: all of the rows that do not have supplemental info are dropped.

Ideally, the missing values in that column would be set to "unknown", resulting in a DataFrame where some people are thieves, some are good, and the rest are unknown.
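For reference, a minimal sketch of why only 4 rows survive: with no how argument, pd.merge performs an inner join on the columns the two frames share (here only 'names'), so any name missing from info is dropped. The variable name explicit is illustrative only.

```python
import pandas as pd

names = pd.DataFrame({'names': ['bob', 'frank', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
                      'position': ['dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup']})
info = pd.DataFrame({'names': ['joe', 'mark', 'tim', 'frank'],
                     'classification': ['thief', 'thief', 'good', 'thief']})

# Default call: how='inner', joined on the shared column 'names'.
combined = pd.merge(names, info)
# The same join spelled out explicitly.
explicit = pd.merge(names, info, on='names', how='inner')

print(len(combined))              # 4: only frank, tim, mark and joe are in both frames
print(combined.equals(explicit))  # True
```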

EDIT: One of the first answers I received suggested an outer merge, which seems to do some weird things. Here is a code sample:

names = df({'names':['bob','frank','bob','bob','bob''james','tim','ricardo','mike','mark','joan','joe'],
            'position':['dev','dev','dev','dev','dev','dev''sys','sys','sys','sup','sup','sup']})

info = df({'names':['joe','mark','tim','frank','joe','bill'],
           'classification':['thief','thief','good','thief','good','thief']})

what = pd.merge(names, info, how="outer")
what.fillna("unknown")

The strange thing is that the output contains a row whose name is "bobjames" and another whose position is "devsys". And even though bill does not appear in the names DataFrame, he shows up in the result. What I really need is a way to say: look up each name in this other DataFrame, and if a match is found, tack on its columns.
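The "bobjames" and "devsys" values most likely come from the sample code rather than from merge: Python concatenates adjacent string literals at compile time, so 'bob''james' (no comma) becomes the single string 'bobjames'. A minimal sketch using the lists exactly as written in the EDIT, next to hypothetical corrected lists that assume a comma was intended in both places:

```python
# Lists copied from the EDIT as written: two commas are missing.
names_col = ['bob','frank','bob','bob','bob''james','tim','ricardo','mike','mark','joan','joe']
position_col = ['dev','dev','dev','dev','dev','dev''sys','sys','sys','sup','sup','sup']

# 12 literals each, but one pair per list was fused into a single string.
print(len(names_col), len(position_col))  # 11 11
print(names_col[4], position_col[5])      # bobjames devsys

# Assumed intent: the same lists with the commas restored.
names_fixed = ['bob','frank','bob','bob','bob','james','tim','ricardo','mike','mark','joan','joe']
position_fixed = ['dev','dev','dev','dev','dev','dev','sys','sys','sys','sup','sup','sup']
print(len(names_fixed), len(position_fixed))  # 12 12
```

Because both lists lost exactly one element, their lengths still matched and DataFrame construction raised no error.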

1 Answer

by (41.4k points)

I think you want to perform an outer merge:

In [60]: pd.merge(names, info, how='outer')
Out[60]:
     names position classification
0      bob      dev            NaN
1    frank      dev          thief
2    james      dev            NaN
3      tim      sys           good
4  ricardo      sys            NaN
5     mike      sys            NaN
6     mark      sup          thief
7     joan      sup            NaN
8      joe      sup          thief
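With the EDIT data, an outer merge also brings in bill, because how='outer' keeps keys from both frames. A minimal sketch of the lookup described in the EDIT, assuming the two missing commas in that sample are restored, is a left merge (every row of names, only matching rows of info) followed by fillna:

```python
import pandas as pd

# EDIT data with the missing commas restored (assumed to be the asker's intent).
names = pd.DataFrame({
    'names': ['bob', 'frank', 'bob', 'bob', 'bob', 'james', 'tim', 'ricardo', 'mike', 'mark', 'joan', 'joe'],
    'position': ['dev', 'dev', 'dev', 'dev', 'dev', 'dev', 'sys', 'sys', 'sys', 'sup', 'sup', 'sup'],
})
info = pd.DataFrame({
    'names': ['joe', 'mark', 'tim', 'frank', 'joe', 'bill'],
    'classification': ['thief', 'thief', 'good', 'thief', 'good', 'thief'],
})

# how='left' keeps all rows of `names`; 'bill' exists only in `info`, so he is not added.
combined = pd.merge(names, info, on='names', how='left')

# Unmatched rows get NaN. fillna returns a new Series, so assign it back.
combined['classification'] = combined['classification'].fillna('unknown')
print(combined)
```

Note that joe appears twice in info, so he gets two rows (thief and good) in the result; if each name should map to a single classification, de-duplicate info before merging.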

