Pandas merge two dataframes with different columns

Question

asked Jul 31, 2019 in Data Science by sourav (17.6k points)

I'm surely missing something simple here. I'm trying to merge two dataframes in pandas that have mostly the same column names, but the right dataframe has some columns that the left doesn't have, and vice versa.

>df_may
   id  quantity  attr_1  attr_2
0   1        20       0       1
1   2        23       1       1
2   3        19       1       1
3   4        19       0       0
>df_jun
   id  quantity  attr_1  attr_3
0   5         8       1       0
1   6        13       0       1
2   7        20       1       1
3   8        25       1       1

I've tried joining with an outer join:

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer")

But that yields:

Left data columns not unique: Index([....

I've also tried joining on a single column (e.g. on="id"), but that duplicates every other shared column with suffixes, such as attr_1_x and attr_1_y, which is not ideal. I've also passed the entire list of columns (there are many) to "on":

mayjundf = pd.DataFrame.merge(df_may, df_jun, how="outer", on=list(df_may.columns.values))

Which yields:

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

What am I missing? I'd like to get a df with all rows appended, and attr_1, attr_2, attr_3 populated where possible, NaN where they don't show up. This seems like a pretty typical workflow for data munging, but I'm stuck.

Thanks in advance.

1 Answer

Shlok Pandey · Answer 1 · 2019-08-01T09:19:36+0000

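Use pd.concat rather than merge here. It stacks the rows of both dataframes and aligns them by column name, so a column that exists in only one dataframe is filled with NaN for the rows that come from the other:

mayjundf = pd.concat([df_may, df_jun], axis=0, ignore_index=True)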

This will give the output:

   attr_1  attr_2  attr_3  id  quantity
0       0       1     NaN   1        20
1       1       1     NaN   2        23
2       1       1     NaN   3        19
3       0       0     NaN   4        19
4       1     NaN       0   5         8
5       0     NaN       1   6        13
6       1     NaN       1   7        20
7       1     NaN       1   8        25
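
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The two dataframes are rebuilt by hand from the tables in the question, and sort=False is my own addition, not part of the original answer. It should keep the columns in order of first appearance, whereas the alphabetically sorted column order shown above is what older pandas versions produced by default.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Stack the rows; columns are aligned by name, and a column missing
# from one dataframe is filled with NaN for that dataframe's rows.
# sort=False (an assumption added here) keeps the columns unsorted.
mayjundf = pd.concat([df_may, df_jun], axis=0, ignore_index=True, sort=False)

print(mayjundf)
print(mayjundf.dtypes)
```

Note that attr_2 and attr_3 come back as float64, because NaN cannot be stored in a plain integer column. Depending on the pandas version, the values may therefore be displayed as 1.0 and 0.0 rather than 1 and 0.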

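
If you would rather stay with merge, as attempted in the question, one possible sketch is an outer join on only the columns the two dataframes share. The sample data is again rebuilt from the question, and the name shared_cols is purely illustrative. This is an alternative I am suggesting, not part of the answer above, and it assumes the column names are unique within each dataframe.

```python
import pandas as pd

# Sample data rebuilt from the tables in the question.
df_may = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "quantity": [20, 23, 19, 19],
    "attr_1": [0, 1, 1, 0],
    "attr_2": [1, 1, 1, 0],
})
df_jun = pd.DataFrame({
    "id": [5, 6, 7, 8],
    "quantity": [8, 13, 20, 25],
    "attr_1": [1, 0, 1, 1],
    "attr_3": [0, 1, 1, 1],
})

# Columns present in both dataframes (hypothetical helper name).
shared_cols = list(df_may.columns.intersection(df_jun.columns))

# Outer join on the shared columns only, so no _x/_y suffixes appear.
# attr_2 and attr_3 are filled with NaN where they do not apply.
merged = df_may.merge(df_jun, how="outer", on=shared_cols)

print(merged)
```

The two approaches differ when rows repeat: merge combines rows whose shared columns all match into a single row, while concat always appends every row. For "all rows appended", concat is the closer fit.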
