I want to append (merge) all the CSV files in a folder using Python pandas.
For example, say the folder has two CSV files, test1.csv and test2.csv, as follows:
A_Id P_Id CN1 CN2 CN3
AAA 111 702 709 740
BBB 222 1727 1734 1778
and
A_Id P_Id CN1 CN2 CN3
CCC 333 710 750 750
DDD 444 180 734 778
The Python script I wrote is as follows:
#!/usr/bin/python
import pandas as pd
import glob
all_data = pd.DataFrame()
for f in glob.glob("testfolder/*.csv"):
    df = pd.read_csv(f)
    all_data = all_data.append(df)
all_data.to_csv('testfolder/combined.csv')
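Two facts about the loop above help explain part of the output. First, `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so on current pandas this loop raises `AttributeError`; the usual replacement is to collect the frames in a list and call `pd.concat` once. Second, each `pd.read_csv` call gives its frame a fresh 0-based index, and concatenation keeps those labels unless told otherwise. A minimal in-memory sketch using the example data (the frames are built directly rather than read from disk, purely for illustration):

```python
import pandas as pd

# The two example files, built in memory for illustration only.
test1 = pd.DataFrame({"A_Id": ["AAA", "BBB"], "P_Id": [111, 222],
                      "CN1": [702, 1727], "CN2": [709, 1734], "CN3": [740, 1778]})
test2 = pd.DataFrame({"A_Id": ["CCC", "DDD"], "P_Id": [333, 444],
                      "CN1": [710, 180], "CN2": [750, 734], "CN3": [750, 778]})

# pd.concat stands in for the removed DataFrame.append.
# By default every frame keeps its own 0-based index labels.
kept = pd.concat([test1, test2])
print(kept.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True drops those labels and numbers the rows 0..n-1.
renumbered = pd.concat([test1, test2], ignore_index=True)
print(renumbered.index.tolist())    # [0, 1, 2, 3]
print(renumbered.columns.tolist())  # ['A_Id', 'P_Id', 'CN1', 'CN2', 'CN3']
```

When every frame has the same columns, `pd.concat` keeps them in the order they appear in the first frame.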
Although combined.csv seems to contain all the appended rows, it looks like this:
CN1 CN2 CN3 A_Id P_Id
0 710 750 750 CCC 333
1 180 734 778 DDD 444
0 702 709 740 AAA 111
1 1727 1734 1778 BBB 222
Whereas it should look like this:
A_Id P_Id CN1 CN2 CN3
AAA 111 702 709 740
BBB 222 1727 1734 1778
CCC 333 710 750 750
DDD 444 180 734 778
Why are the first two columns moved to the end?
Why are the rows from test2.csv placed before the rows from test1.csv instead of after them?
What am I missing? And how can I get rid of the 0s and 1s in the first column?
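A sketch of the whole script addressing these points, assuming every file shares the same header and that alphabetical file-name order is the order wanted. `glob.glob` returns files in arbitrary, filesystem-dependent order, which is one plausible reason the test2.csv rows came out first. The 0/1 column is the row index, which `to_csv` writes by default. Selecting the first file's columns explicitly pins the column order however the frames were combined. The function name `combine_csv_folder` is hypothetical, and the sketch also skips the output file so that a rerun does not re-read combined.csv:

```python
import glob
import os

import pandas as pd


def combine_csv_folder(folder: str, out_name: str = "combined.csv") -> pd.DataFrame:
    """Stack every CSV in `folder` (except the output file) and write the result."""
    out_path = os.path.join(folder, out_name)
    # glob.glob returns files in arbitrary order; sorting gives test1, test2, ...
    # The output file is excluded so a second run does not read it back in.
    paths = sorted(p for p in glob.glob(os.path.join(folder, "*.csv"))
                   if os.path.abspath(p) != os.path.abspath(out_path))
    frames = [pd.read_csv(p) for p in paths]
    # One concat at the end (DataFrame.append no longer exists in pandas 2.x).
    # ignore_index=True renumbers rows 0..n-1 instead of repeating 0, 1, 0, 1.
    combined = pd.concat(frames, ignore_index=True)
    # Keep the column order of the first file.
    combined = combined[frames[0].columns]
    # index=False stops to_csv from writing the row labels as an extra column.
    combined.to_csv(out_path, index=False)
    return combined
```

With the two example files, combined.csv starts with the header `A_Id,P_Id,CN1,CN2,CN3` followed by the AAA, BBB, CCC, DDD rows in that order.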
P.S. Since these are large CSV files, I thought of using pandas.
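If the files are too large to hold in memory all at once, one option within pandas is to stream each file in pieces with `read_csv`'s `chunksize` parameter and write each piece with `to_csv(mode="a")`, emitting the header only once. A minimal sketch, assuming all files have the same column names; the function name `stream_combine` and the default chunk size are illustrative choices, not tuned values:

```python
import glob
import os

import pandas as pd


def stream_combine(folder: str, out_name: str = "combined.csv",
                   chunksize: int = 100_000) -> int:
    """Copy the rows of every CSV in `folder` into `out_name`, chunk by chunk."""
    out_path = os.path.join(folder, out_name)
    paths = sorted(p for p in glob.glob(os.path.join(folder, "*.csv"))
                   if os.path.abspath(p) != os.path.abspath(out_path))
    columns = None  # header of the first file, reused for every later chunk
    rows = 0
    for path in paths:
        # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows.
        for chunk in pd.read_csv(path, chunksize=chunksize):
            first = columns is None
            if first:
                columns = chunk.columns
            else:
                # Reorder to the first file's columns (assumes identical names).
                chunk = chunk[columns]
            # The first write truncates the output and writes the header;
            # later writes append data rows only.
            chunk.to_csv(out_path, mode="w" if first else "a",
                         header=first, index=False)
            rows += len(chunk)
    return rows
```

Peak memory then depends on the chunk size rather than on the size of the largest file.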