0 votes
2 views
in Python by (19.9k points)

I took a section of a large DataFrame (called 'df') with .copy() (named 'df_copy') and applied certain functions to create a new column, 'Category'.

However, I also gave 'df' a column called 'Category', and assigned some other values to that column. All of the other values/columns in 'df_copy' are the same as their respective values/columns in 'df': the only difference is the 'Category' column.

To illustrate:

Original DataFrames:

df is 100 rows with 3 columns.

df_copy is 5 rows from df, with the same columns.

After Processing:

df is 100 rows with 4 columns (the new column is 'Category'); 5 of those rows have NaN in the 'Category' column.

df_copy is 5 rows with 4 columns, new column is 'Category' which has values that are not in df.

Basically, I want to replace the rows which I took from df (the ones that were in the original df_copy DataFrame) with the current, post-processing rows from df_copy.

I have tried different forms of merges:

left merge, don't specify 'on': Results in 'NA' for Category column of the rows which were originally copied into df_copy

right merge, don't specify 'on': Is the same as df_copy

left merge, on one column that did not change between df and df_copy (for example, "Number"): Every column is duplicated: "Number_x","Number_y","Category_x","Category_y".

1 Answer

0 votes
by (25.1k points)

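You can do this with the DataFrame where method. Note that where keeps the original values wherever the condition is True and takes them from the other object wherever it is False, so the condition should be notna() rather than isna(). This relies on df_copy still having the same index labels as the rows it was copied from (which is the case after .copy() unless you reset the index):

df = df.where(df['Category'].notna(), df_copy)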
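Here is a small sketch of the idea, in case it helps. The data is made up (10 rows instead of 100, and the column names "Number", "Name" and "Value" are only placeholders), and it assumes df_copy kept the index labels of the rows it was taken from. It shows the where approach next to a plain index-based assignment with .loc, which touches only the 'Category' column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 100-row frame in the question (10 rows here).
df = pd.DataFrame({
    "Number": range(10),
    "Name": [f"item{i}" for i in range(10)],
    "Value": np.arange(10) * 1.5,
})

# Take 5 rows with .copy(); the copy keeps the original index labels.
df_copy = df.iloc[2:7].copy()
df_copy["Category"] = "from_copy"

# df gets its own 'Category'; the 5 copied rows are left as NaN.
df["Category"] = "from_df"
df.loc[df_copy.index, "Category"] = np.nan

# Option 1: where() keeps df where the condition is True and takes
# index-aligned values from df_copy where it is False.
result_where = df.where(df["Category"].notna(), df_copy)

# Option 2: overwrite only the 'Category' values of the matching rows,
# selecting them by index label.
result_loc = df.copy()
result_loc.loc[df_copy.index, "Category"] = df_copy["Category"]

print(result_where["Category"].tolist())
print(result_loc["Category"].tolist())
```

If the index of df_copy was reset at some point, neither option will line the rows up correctly; in that case you would need to match on a key column instead.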
