Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (18.4k points)

I have a set of oil wells compiled in the panda's data frame. It looks like this:

wells = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01', '2019-01-01','2019-01-01', '2019-01-01'],

                      'well': ['FIELDX-1D', 'FIELDX-2D', 'FIELDY-1C', 'FIELDY-1B', 'FIELDZ-1A', 'FIELDZ-1J'],

                      'oil_mmbd':[3.1, 5.6, 9.8, 2.5, 4.7, 6.1]})

         date       well  oil_mmbd

0  2019-01-01  FIELDX-1D       3.1

1  2019-01-01  FIELDX-2D       5.6

2  2019-01-01  FIELDY-1C       9.8

3  2019-01-01  FIELDY-1B       2.5

4  2019-01-01  FIELDZ-1A       4.7

5  2019-01-01  FIELDZ-1J       6.1

What I am trying to achieve is to create a new variable (ie 'Field') that compiles the aggregate output of each field. For this, I need to get rid of the last part of each well's name; but I am not able to find the solution for this using Python and Pandas.

I want to get to the table that looks like this:

wells_agg = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01'],

                      'field': ['FIELDX', 'FIELDY', 'FIELDZ'],

                      'oil_mmbd':[8.7, 12.3, 10.8]})

         date   field  oil_mmbd

0  2019-01-01  FIELDX       8.7

1  2019-01-01  FIELDY      12.3

2  2019-01-01  FIELDZ      10.8

1 Answer

0 votes
by (36.8k points)

You can strip the suffix of a well column using wells.well.str.split('-').str[0]). Therefore instead of wells.groupby('well'), use wells.groupby(wells.well.str.split('-').str[0])). See solution below with this in mind.

wells.groupby(['date',wells.well.str.split('-')\

                     .str[0]]).oil_mmbd.sum().reset_index()

     date    well  oil_mmbd

0  2019-01-01  FIELDX       8.7

1  2019-01-01  FIELDY      12.3

2  2019-01-01  FIELDZ      10.8

If you want to know more about the Data Science then do check out the following Data Science which will help you in understanding Data Science from scratch

Browse Categories

...