How to remove the last part of each entry in a column based on a pattern?

Question

asked Aug 12, 2020 in Data Science by blackindya (18.4k points)

I have a set of oil wells compiled in the panda's data frame. It looks like this:

wells = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01', '2019-01-01','2019-01-01', '2019-01-01'],
'well': ['FIELDX-1D', 'FIELDX-2D', 'FIELDY-1C', 'FIELDY-1B', 'FIELDZ-1A', 'FIELDZ-1J'],
'oil_mmbd':[3.1, 5.6, 9.8, 2.5, 4.7, 6.1]})
date well oil_mmbd
0 2019-01-01 FIELDX-1D 3.1
1 2019-01-01 FIELDX-2D 5.6
2 2019-01-01 FIELDY-1C 9.8
3 2019-01-01 FIELDY-1B 2.5
4 2019-01-01 FIELDZ-1A 4.7
5 2019-01-01 FIELDZ-1J 6.1

What I am trying to achieve is to create a new variable (ie 'Field') that compiles the aggregate output of each field. For this, I need to get rid of the last part of each well's name; but I am not able to find the solution for this using Python and Pandas.

I want to get to the table that looks like this:

wells_agg = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01'],
'field': ['FIELDX', 'FIELDY', 'FIELDZ'],
'oil_mmbd':[8.7, 12.3, 10.8]})
date field oil_mmbd
0 2019-01-01 FIELDX 8.7
1 2019-01-01 FIELDY 12.3
2 2019-01-01 FIELDZ 10.8

1 Answer

supriya · Answer 1 · 2020-08-12T10:43:45+0000

You can strip the suffix of a well column using wells.well.str.split('-').str[0]). Therefore instead of wells.groupby('well'), use wells.groupby(wells.well.str.split('-').str[0])). See solution below with this in mind.

wells.groupby(['date',wells.well.str.split('-')\
.str[0]]).oil_mmbd.sum().reset_index()
date well oil_mmbd
0 2019-01-01 FIELDX 8.7
1 2019-01-01 FIELDY 12.3
2 2019-01-01 FIELDZ 10.8

If you want to know more about the Data Science then do check out the following Data Science which will help you in understanding Data Science from scratch

How to remove the last part of each entry in a column based on a pattern?

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources