Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (50.2k points)

Suppose I have two DataFrames like so:

left = pd.DataFrame({'key1': ['foo', 'bar'], 'lval': [1, 2]})

right = pd.DataFrame({'key2': ['foo', 'bar'], 'rval': [4, 5]})

I want to merge them, so I try something like this:

pd.merge(left, right, left_on='key1', right_on='key2')

And I'm happy

     key1    lval key2    rval

0   foo     1 foo     4

1   bar     2 bar     5

But I'm trying to use the join method, which I've been lead to believe is pretty similar.

left.join(right, on=['key1', 'key2'])

And I get this:

 //anaconda/lib/python2.7/site-packages/pandas/tools/merge.pyc in _validate_specification(self)

    406             if self.right_index:

    407                 if not ((len(self.left_on) == self.right.index.nlevels)):

--> 408                     raise AssertionError()

    409                 self.right_on = [None] * n

    410         elif self.right_on is not None:

AssertionError: 

What am I missing?

1 Answer

0 votes
by (108k points)

Here are the main differences between df.join() and df.merge():

  • lookup on right table: df1.join(df2) always joins with the help of the index of df2, but df1.merge(df2) can join to one or more columns of df2 (default) or to the index of df2 (with right_index=True).

  • lookup on left table: By default, df1.join(df2) utilizes the index of df1 and df1.merge(df2) uses column(s) of df1. That can be overridden by stating df1.join(df2, on=key_or_keys) or df1.merge(df2, left_index=True).

  • left vs inner join: df1.join(df2) does a left join by default (keeps all rows of df1), but df.merge does an inner join by default (returns only matching rows of df1 and df2).

If you want to learn more about Pandas then visit this Python Course designed by the industrial experts.

Related questions

Browse Categories

...