I took a subset of a large DataFrame ('df') with .copy(), named it 'df_copy', and applied some functions to it to create a new column, 'Category'.
Separately, I also added a 'Category' column to 'df' and assigned different values to it. Every other column and value in 'df_copy' is identical to the corresponding rows in 'df'; the only difference is the 'Category' column.
To illustrate:
Original DataFrames:
df has 100 rows and 3 columns.
df_copy has 5 rows taken from df, with the same columns.
After Processing:
df has 100 rows and 4 columns (the new column is 'Category'); 5 of those rows have NaN in the 'Category' column.
df_copy has 5 rows and 4 columns; its new 'Category' column holds values that are not in df.
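A minimal sketch that reproduces this setup. The column names "Number", "A" and "B", the chosen row positions, and the 'Category' values are all made up for illustration; only the shapes match the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, 3 columns (names are assumptions).
df = pd.DataFrame({
    "Number": range(100),
    "A": np.arange(100) * 2,
    "B": np.arange(100) % 7,
})

# Take 5 arbitrary rows as an independent copy; the index labels are kept.
df_copy = df.iloc[[3, 10, 42, 57, 99]].copy()

# Processing on the copy produces its own 'Category' values.
df_copy["Category"] = ["v", "w", "x", "y", "z"]

# Separately, df gets a 'Category' column; the 5 copied rows are left as NaN.
df["Category"] = "other"
df.loc[df_copy.index, "Category"] = np.nan

print(df.shape, df_copy.shape)
print(df["Category"].isna().sum())
```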
Basically, I want to replace the rows in df that I originally copied into df_copy with the current, post-processing rows from df_copy.
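To make the target concrete, here is one possible way to express the desired result in code. It is only a sketch and rests on an assumption: df_copy still carries the index labels it had in df (that is, the index was not reset after .copy()). The data and column names are hypothetical, and only 'Category' is written back because the other columns are stated to be identical.

```python
import numpy as np
import pandas as pd

# Hypothetical small-scale version of the setup (10 rows instead of 100).
df = pd.DataFrame({"Number": range(10), "A": range(10, 20), "B": range(20, 30)})
df_copy = df.iloc[[1, 4, 7]].copy()
df_copy["Category"] = ["x", "y", "z"]
df["Category"] = "other"
df.loc[df_copy.index, "Category"] = np.nan

# Desired outcome: the rows that came from df carry df_copy's 'Category'.
# The assignment aligns on index labels, so this relies on the shared index.
result = df.copy()
result.loc[df_copy.index, "Category"] = df_copy["Category"]

print(result)
```

If the index had been reset on either frame, the labels would no longer line up and a key column such as "Number" would be needed instead.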
I have tried several kinds of merge:
left merge without specifying 'on': the rows that were originally copied into df_copy end up with NaN in the 'Category' column.
right merge without specifying 'on': the result is the same as df_copy.
left merge on one column that did not change between df and df_copy (for example, "Number"): every column is duplicated with _x/_y suffixes: "Number_x", "Number_y", "Category_x", "Category_y".
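The three attempts can be reproduced roughly as follows, again on hypothetical data with made-up column names. When 'on' is omitted, merge joins on every column the two frames share, which here includes 'Category', so the copied rows never match. In this sketch the key column passed to 'on' keeps its name and only the other shared columns receive suffixes.

```python
import numpy as np
import pandas as pd

# Hypothetical small-scale version of the setup (10 rows instead of 100).
df = pd.DataFrame({"Number": range(10), "A": range(10, 20), "B": range(20, 30)})
df_copy = df.iloc[[1, 4, 7]].copy()
df_copy["Category"] = ["x", "y", "z"]
df["Category"] = "other"
df.loc[df_copy.index, "Category"] = np.nan

# 1) Left merge, no 'on': joins on all shared columns, including 'Category'.
#    NaN in df never equals "x"/"y"/"z", so the copied rows stay NaN.
left_default = df.merge(df_copy, how="left")

# 2) Right merge, no 'on': keeps only df_copy's rows, none of which match df.
right_default = df.merge(df_copy, how="right")

# 3) Left merge on "Number": the remaining shared columns get _x/_y suffixes.
left_on_number = df.merge(df_copy, on="Number", how="left")

print(left_default["Category"].isna().sum())
print(len(right_default))
print(list(left_on_number.columns))
```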