I am facing the following problem: I need to port MATLAB code to pandas.
The problem is as follows: I have height difference data. Using a rolling window, I compute a moving average and standard deviation (std) of the height difference. When the height difference in a row is greater than the moving average + 2*std, that row is considered a 'peak' (which I need to identify). The reason is that a peak could indicate a mounting point, and mounting points are not given in the dataset. So far, so good.
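To make the detection step concrete, it could be sketched roughly as below. The function name flag_peaks, the window size and the centred window are assumptions for illustration only; the rule itself (value > rolling mean + 2 * rolling std) is the one described above.

```python
import pandas as pd


def flag_peaks(height_diff: pd.Series, window: int = 50, n_std: float = 2.0) -> pd.Series:
    """Return a boolean Series that is True where a row counts as a 'peak'.

    window and the centred alignment are placeholder choices, not known values.
    """
    rolling = height_diff.rolling(window, center=True, min_periods=1)
    # Threshold per row: moving average plus n_std moving standard deviations.
    threshold = rolling.mean() + n_std * rolling.std()
    # Rows where the threshold is NaN (e.g. a single-value window) compare as False.
    return height_diff > threshold
```

The result can be stored as the heightdiff_peak column, for example df['heightdiff_peak'] = flag_peaks(df['abs_diff_height']).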
Now comes the harder part, which I cannot solve: there can be multiple peaks near each other. When a peak is within 10 indices of another peak (1 index / row = 0.25 meter, so within 2.5 meters), the peaks need to be 'merged': only the peak with the biggest height difference is kept. If a peak has no other peak within 10 indices, that value is simply kept as the highest mounting point.
Another solution could be to assign that biggest height difference and its index to the surrounding peaks.
I tried something with idxmax() of a rolling window, which didn't work. Then I tried the following, but still cannot figure it out.
First, I converted the index to a column. Then I filtered the dataframe to the rows where heightdiff_peak == True. Then I calculated the difference to the next index and tried to take the max value over the rows where that difference is less than 10. However, this does not give the right result.
The dataframe looks as follows:
df:
     Location  abs_diff_height  heightdiff_peak  index  difference_next_index
277      9.00         4.000000             True    277                    1.0
278      9.25         5.000000             True    278                   74.0
352     27.75         6.900000             True    352                   39.0
391     37.50         6.000000             True    391                  169.0
560     79.75         6.000000             True    560                    1.0
561     80.00         5.900000             True    561                    1.0
562     80.25         5.900000             True    562                    1.0
563     80.50         8.900000             True    563                    1.0
564     80.75         9.900000             True    564                    1.0
565     81.00        10.900000             True    565                    1.0
566     81.25        13.900000             True    566                    1.0
I tried the following code, but it doesn't work.
import numpy as np

def get_max_value(df):
    return df.assign(
        max_diff_height=lambda df: np.where(
            df['difference_next_index'] < 10,
            df['abs_diff_height'].rolling(2).max().shift(1),
            df['abs_diff_height'],
        )
    )
I also tried something like:
df[['highest_peak']].rolling(20, center=True).apply(lambda s: s.idxmax(), raw=False)
However, this only results in NaNs.
The MATLAB code is:
%% Snap multiple detections in a row to the highest point of that peak.
% Initialise variables based on first detection value
x=2;
Remember=PeakIndexT(1);
PeakIndex=PeakIndexT(1);
PeakValue=Dataset(PeakIndexT(1));
while x<=length(PeakIndexT)
    if PeakIndexT(x)-Remember>10 % If there are more than 10 points (2.5 meters) between this and the previous detection, identify this one as a new one
        PeakIndex=[PeakIndex,PeakIndexT(x)];
        PeakValue=[PeakValue,Dataset(PeakIndexT(x))];
    else % Else merge the detections and use the highest absolute value as the detection peak
        if PeakValue(end)<Dataset(PeakIndexT(x))
            PeakValue(end)=Dataset(PeakIndexT(x));
            PeakIndex(end)=PeakIndexT(x);
        end
    end
    Remember=PeakIndexT(x); % Store previous value for reference in loop
    x=x+1;
end
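A literal port of this loop to plain Python could look like the sketch below. The function name merge_peaks and the max_gap parameter are my own naming, and data is assumed to be anything that can be indexed by the peak positions (a dict, or a Series with an integer index). It is meant to reproduce the MATLAB behaviour step by step, not to be the vectorised pandas version I am after.

```python
def merge_peaks(peak_idx, data, max_gap=10):
    """Merge detections that are at most max_gap rows after the previous detection.

    Returns two lists: the index and the value of the highest point of each merged peak.
    """
    if len(peak_idx) == 0:
        return [], []
    merged_idx = [peak_idx[0]]
    merged_val = [data[peak_idx[0]]]
    previous = peak_idx[0]
    for idx in peak_idx[1:]:
        if idx - previous > max_gap:
            # Far enough from the previous detection: start a new peak.
            merged_idx.append(idx)
            merged_val.append(data[idx])
        elif merged_val[-1] < data[idx]:
            # Same peak, but this detection is higher: keep it instead.
            merged_val[-1] = data[idx]
            merged_idx[-1] = idx
        previous = idx  # Always compare against the immediately preceding detection.
    return merged_idx, merged_val


# Sample rows from the dataframe above (index -> abs_diff_height).
heights = {277: 4.0, 278: 5.0, 352: 6.9, 391: 6.0, 560: 6.0, 561: 5.9,
           562: 5.9, 563: 8.9, 564: 9.9, 565: 10.9, 566: 13.9}
idx, val = merge_peaks(sorted(heights), heights)
print(idx)  # [278, 352, 391, 566]
print(val)  # [5.0, 6.9, 6.0, 13.9]
```

Note that the gap is measured against the previous detection, not against the start of the group, so a long chain of closely spaced detections is merged into a single peak.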
The result I expect is the maximum value of each merged group of peaks and the index of that maximum, assigned to every peak in the group.
df:
     Location  abs_diff_height  heightdiff_peak  index  difference_next_index  max_value  index_max_value
277      9.00         4.000000             True    277                    1.0        5.0              278
278      9.25         5.000000             True    278                   74.0        5.0              278
352     27.75         6.900000             True    352                   39.0        6.9              352
391     37.50         6.000000             True    391                  169.0        6.0              391
560     79.75         6.000000             True    560                    1.0       13.9              566
561     80.00         5.900000             True    561                    1.0       13.9              566
562     80.25         5.900000             True    562                    1.0       13.9              566
563     80.50         8.900000             True    563                    1.0       13.9              566
564     80.75         9.900000             True    564                    1.0       13.9              566
565     81.00        10.900000             True    565                    1.0       13.9              566
566     81.25        13.900000             True    566                    1.0       13.9              566
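One vectorised way to get these two columns might be to start a new group wherever the gap to the previous peak exceeds 10, and then take the maximum per group. This is only a sketch on the sample rows above: the names gap and cluster are made up, and it assumes the dataframe contains only the peak rows and is sorted by index.

```python
import pandas as pd

# Sample peak rows from above (only the column needed for the merge).
df = pd.DataFrame(
    {"abs_diff_height": [4.0, 5.0, 6.9, 6.0, 6.0, 5.9, 5.9, 8.9, 9.9, 10.9, 13.9]},
    index=[277, 278, 352, 391, 560, 561, 562, 563, 564, 565, 566],
)

# Distance (in rows) from each peak to the previous peak; NaN for the first one.
gap = df.index.to_series().diff()
# A new cluster starts wherever that distance exceeds 10 rows.
cluster = (gap > 10).cumsum()

grouped = df.groupby(cluster)["abs_diff_height"]
peak_max = grouped.transform("max")       # highest value, broadcast to every row of the cluster
peak_idx = cluster.map(grouped.idxmax())  # index label of that highest value

df = df.assign(max_value=peak_max, index_max_value=peak_idx)
print(df)
```

Like the MATLAB loop, this compares each peak with the one immediately before it, and idxmax keeps the first occurrence when two peaks in a cluster are equally high.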