I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'var': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'value': [1, 2, 1, 2, 3, 4, 5],
                   'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})
Within each input group, I would like to keep var only on the row with the highest value and set var to NaN on all other rows.
So I would like to end up with:
import numpy as np
df = pd.DataFrame({'var': [np.nan, 'A', np.nan, 'B', np.nan, np.nan, 'C'],
                   'value': [1, 2, 1, 2, 3, 4, 5],
                   'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})
Any ideas?
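One possible approach, as a minimal sketch rather than a definitive solution: compute the maximum value per input group with groupby and transform, then mask var wherever a row's value differs from its group's maximum. The name group_max is only an illustrative helper. This assumes that ties should all keep their var; if exactly one row per group must survive, something based on idxmax would likely be needed instead.

```python
import pandas as pd

df = pd.DataFrame({'var': ['A', 'A', 'B', 'B', 'C', 'C', 'C'],
                   'value': [1, 2, 1, 2, 3, 4, 5],
                   'input': [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.3]})

# Maximum of value within each input group, aligned back to every row
# (group_max is an illustrative name, not part of the original data)
group_max = df.groupby('input')['value'].transform('max')

# Keep var only where the row holds its group's maximum; other rows become NaN.
# Assumption: rows tied for the maximum all keep their var.
df['var'] = df['var'].where(df['value'] == group_max)

print(df)
```

Series.where leaves entries unchanged where the condition is True and fills the rest with a missing value by default, so no explicit np.nan assignment is needed here.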