I am new to pandas DataFrames and I am stuck at this point. I am doing image segmentation on real-time traffic images, so I need to organize the detection results properly. I have two CSV files with N rows, as follows:

File1.csv

Id  Cam_id  Image  Timestamp
0   1501           2020-06-29T16:20:57+08:00
1   1502           2020-06-29T16:20:57+08:00
2   1503           2020-06-29T16:20:57+08:00
...

File2.csv 

Id  Detection_Class  Detection_Score
0   3                0.9345
1   82               0.9016
2   73               0.1456
0   3                0.9283
1   1                0.8499
2   1                0.4658
3   3                0.9944
4   1                0.3422
5   3                0.2174
...

Each time the Id counter in File2.csv restarts at 0, the rows that follow are the objects detected in the next image listed in the Image column of File1.csv.

What I want is to merge the files so that, for each image, the Detection_Class and Detection_Score values are collected into lists and stored in two new columns, as below:

Id  Cam_id  Image  Timestamp                  Detection_Class  Detection_Score
0   1501           2020-06-29T16:20:57+08:00  [3,82,73]        [0.9345,0.9016,0.1456]
1   1502           2020-06-29T16:20:57+08:00  [3,1,1,3,1,3]    [0.9283,0.8499,0.4658,0.9944,0.3422,0.2174]
...

How can I accomplish this?

1 Answer

You can use cumsum() to label the Id blocks in File2.csv (a new block starts wherever Id equals 0), then merge with File1.csv and aggregate with groupby:

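import pandas as pd

# Assumes comma-separated files with the column headers shown in the question.
df1 = pd.read_csv('File1.csv')
df2 = pd.read_csv('File2.csv')
# Id == 0 starts a new image, so the running count of zeros minus 1 is that image's Id in df1.
result = (df2.assign(Id=df2.Id.eq(0).cumsum() - 1)
    .merge(df1, on='Id')
    .groupby('Id')
    .agg({'Cam_id': 'first', 'Image': 'first', 'Timestamp': 'first',
          'Detection_Class': list, 'Detection_Score': list})
    .reset_index()
)
print(result)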

Output:

Id  Cam_id  Image  Timestamp                  Detection_Class     Detection_Score
--  ------  -----  -------------------------  ------------------  ------------------------------------------------
 0    1501         2020-06-29T16:20:57+08:00  [3, 82, 73]         [0.9345, 0.9016, 0.1456]
 1    1502         2020-06-29T16:20:57+08:00  [3, 1, 1, 3, 1, 3]  [0.9283, 0.8499, 0.4658, 0.9944, 0.3422, 0.2174]
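
To try this without the CSV files, here is a minimal self-contained sketch built from the sample rows in the question. The Image values are not shown in the question, so the strings used here are placeholders, and the names block_id, detections and result are only illustrative. It is a variant of the approach above rather than the same chain: it aggregates File2 first and then uses a left merge, so images with no matching block stay in the result with missing values.

```python
import pandas as pd

# Sample rows from the question. The Image values are placeholders
# (an assumption), since the question does not show them.
df1 = pd.DataFrame({
    "Id": [0, 1, 2],
    "Cam_id": [1501, 1502, 1503],
    "Image": ["image_0", "image_1", "image_2"],
    "Timestamp": ["2020-06-29T16:20:57+08:00"] * 3,
})
df2 = pd.DataFrame({
    "Id": [0, 1, 2, 0, 1, 2, 3, 4, 5],
    "Detection_Class": [3, 82, 73, 3, 1, 1, 3, 1, 3],
    "Detection_Score": [0.9345, 0.9016, 0.1456, 0.9283, 0.8499,
                        0.4658, 0.9944, 0.3422, 0.2174],
})

# Each Id == 0 marks the first detection of a new image, so the running
# count of zeros (minus one) is the index of the image a row belongs to.
block_id = df2["Id"].eq(0).cumsum() - 1

# Collapse each block into one row holding lists of classes and scores.
detections = (
    df2.assign(Id=block_id)
       .groupby("Id")
       .agg({"Detection_Class": list, "Detection_Score": list})
       .reset_index()
)

# how="left" keeps every row of df1; images without a block in df2
# (here Id 2) get NaN in the two detection columns.
result = df1.merge(detections, on="Id", how="left")
print(result)
```

Both versions assume that every image contributes at least one row to File2.csv and that the blocks appear in the same order as the rows of File1.csv. If an image in the middle has no detections at all, the block numbers no longer line up with the Id column of File1.csv.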
