
I am just starting to use pandas DataFrames and I am stuck at this point. I am doing image segmentation on real-time traffic images, so I need to organize the results properly. I have two CSV files with N rows, as follows:

File1.csv

Id  Cam_id  Image  Timestamp
0   1501           2020-06-29T16:20:57+08:00
1   1502           2020-06-29T16:20:57+08:00
2   1503           2020-06-29T16:20:57+08:00
...

File2.csv 

Id  Detection_Class  Detection_Score
0   3                0.9345
1   82               0.9016
2   73               0.1456
0   3                0.9283
1   1                0.8499
2   1                0.4658
3   3                0.9944
4   1                0.3422
5   3                0.2174
...

Each time the Id counter in File2.csv restarts at 0, the rows that follow count the objects detected in the next image from the Image column of File1.csv.

What I want is to merge the files so that, for each image, the values of Detection_Class and Detection_Score are collected into lists and stored in two new columns, as below:

Id  Cam_id  Image  Timestamp                  Detection_Class  Detection_Score
0   1501           2020-06-29T16:20:57+08:00  [3,82,73]        [0.9345,0.9016,0.1456]
1   1502           2020-06-29T16:20:57+08:00  [3,1,1,3,1,3]    [0.9283,0.8499,0.4658,0.9944,0.3422,0.2174]
...

How can I accomplish this?

1 Answer


You can use cumsum() to label the Id blocks in File2.csv, then merge with File1.csv and group by the new Id:

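import pandas as pd

df1 = pd.read_csv('File1.csv')
df2 = pd.read_csv('File2.csv')

# Every Id == 0 in df2 starts a new block, so the running count of zeros
# (minus 1) gives the Id of the matching image in df1.
(df2.assign(Id=df2.Id.eq(0).cumsum()-1)
    .merge(df1, on='Id')
    .groupby('Id')
    .agg({'Cam_id':'first','Image':'first','Timestamp':'first',
          'Detection_Class':list, 'Detection_Score':list})
    .reset_index()
)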

Output:

  Id    Cam_id  Image    Timestamp                  Detection_Class     Detection_Score
----  --------  -------  -------------------------  ------------------  ------------------------------------------------
   0      1501           2020-06-29T16:20:57+08:00  [3, 82, 73]         [0.9345, 0.9016, 0.1456]
   1      1502           2020-06-29T16:20:57+08:00  [3, 1, 1, 3, 1, 3]  [0.9283, 0.8499, 0.4658, 0.9944, 0.3422, 0.2174]
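
If you want to see what the cumsum() step does on its own, here is a small self-contained sketch built from the sample rows in the question. It is a variant of the chain above, not the only way to do it: it aggregates File2 first and then merges with how="left", which (as an assumption about what you want) keeps images that have no detections, with NaN in the list columns. The Image values img_0, img_1, img_2 are placeholders I made up, since the real ones are not shown.

```python
import io
import pandas as pd

# Stand-ins for File1.csv and File2.csv; the Image values are hypothetical placeholders.
file1 = io.StringIO(
    "Id,Cam_id,Image,Timestamp\n"
    "0,1501,img_0,2020-06-29T16:20:57+08:00\n"
    "1,1502,img_1,2020-06-29T16:20:57+08:00\n"
    "2,1503,img_2,2020-06-29T16:20:57+08:00\n"
)
file2 = io.StringIO(
    "Id,Detection_Class,Detection_Score\n"
    "0,3,0.9345\n1,82,0.9016\n2,73,0.1456\n"
    "0,3,0.9283\n1,1,0.8499\n2,1,0.4658\n"
    "3,3,0.9944\n4,1,0.3422\n5,3,0.2174\n"
)
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)

# True wherever a new image's detections begin; the running sum of those
# flags (minus 1) labels each block with the Id of its image.
block_id = df2["Id"].eq(0).cumsum() - 1
# block_id is 0, 0, 0, 1, 1, 1, 1, 1, 1 for the sample rows

# Collect the detections of each block into lists.
detections = (
    df2.assign(Id=block_id)
       .groupby("Id")
       .agg({"Detection_Class": list, "Detection_Score": list})
       .reset_index()
)

# Left merge keeps every image row, including those without detections.
result = df1.merge(detections, on="Id", how="left")
print(result)
```

In this sample the third image (Id 2) has no rows in File2, so it shows NaN in both list columns; with the inner merge from the chain above that row would be dropped instead.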
