I have a set of oil wells compiled in the panda's data frame. It looks like this:
wells = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01', '2019-01-01','2019-01-01', '2019-01-01'],
'well': ['FIELDX-1D', 'FIELDX-2D', 'FIELDY-1C', 'FIELDY-1B', 'FIELDZ-1A', 'FIELDZ-1J'],
'oil_mmbd':[3.1, 5.6, 9.8, 2.5, 4.7, 6.1]})
date well oil_mmbd
0 2019-01-01 FIELDX-1D 3.1
1 2019-01-01 FIELDX-2D 5.6
2 2019-01-01 FIELDY-1C 9.8
3 2019-01-01 FIELDY-1B 2.5
4 2019-01-01 FIELDZ-1A 4.7
5 2019-01-01 FIELDZ-1J 6.1
What I am trying to achieve is to create a new variable (ie 'Field') that compiles the aggregate output of each field. For this, I need to get rid of the last part of each well's name; but I am not able to find the solution for this using Python and Pandas.
I want to get to the table that looks like this:
wells_agg = pd.DataFrame({'date':['2019-01-01', '2019-01-01','2019-01-01'],
'field': ['FIELDX', 'FIELDY', 'FIELDZ'],
'oil_mmbd':[8.7, 12.3, 10.8]})
date field oil_mmbd
0 2019-01-01 FIELDX 8.7
1 2019-01-01 FIELDY 12.3
2 2019-01-01 FIELDZ 10.8