
I need to port some MATLAB code to pandas and am stuck on one step.

The problem is as follows: I have height difference data. Using a rolling window, I compute a moving average and a standard deviation of the height difference. When the height difference of a row is greater than the moving average + 2*std, the row is considered a 'peak' (which I need to identify). The reason is that a peak could indicate a mounting point, and mounting points are not given in the dataset. So far, so good.
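For reference, the peak-detection step described above could be sketched roughly as follows. The data, the trailing window, and the window length of 10 are assumptions made for illustration; the question does not state them.

```python
import pandas as pd

# Hypothetical height differences: a noisy baseline with one spike at position 20.
height_diff = pd.Series([1.0, 2.0] * 10 + [10.0] + [1.0, 2.0] * 5)

window = 10  # assumed window length, not given in the question

# Trailing rolling statistics; the first window-1 rows are NaN and never flagged.
rolling = height_diff.rolling(window)
threshold = rolling.mean() + 2 * rolling.std()

# A row is a peak when it exceeds the moving average plus two standard deviations.
is_peak = height_diff > threshold
```

Note that the window has to be reasonably long: the current row is part of its own window, so with very short windows a single spike inflates the standard deviation enough that it never exceeds the threshold.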

Now comes the harder part which I cannot solve: There can be multiple peaks near each other. When a peak is within 10 indices (1 index / row = 0.25 meter, hence when a peak is within 2.5 meters) of another peak, then the peaks need to be 'merged': only the peak with the biggest height diff needs to be kept. If the peak is not surrounded by another peak within 10 indices, then just that value is kept as the highest mounting point.

Another solution could be to assign that biggest height diff and index to the surrounding peaks.

I tried something with idxmax() of a rolling window, which didn't work. Then I tried the following, but still cannot figure it out.

First, I converted the index to a column. Then I filtered the dataframe to the rows where heightdiff_peak == True and calculated the difference to the next index. Finally, I tried to take the max value over the current and neighbouring rows wherever that difference is less than 10. However, this does not give the right result.

The dataframe looks as follows:

df:

     Location  abs_diff_height  heightdiff_peak  index  difference_next_index
277      9.00         4.000000             True    277                    1.0
278      9.25         5.000000             True    278                   74.0
352     27.75         6.900000             True    352                   39.0
391     37.50         6.000000             True    391                  169.0
560     79.75         6.000000             True    560                    1.0
561     80.00         5.900000             True    561                    1.0
562     80.25         5.900000             True    562                    1.0
563     80.50         8.900000             True    563                    1.0
564     80.75         9.900000             True    564                    1.0
565     81.00        10.900000             True    565                    1.0
566     81.25        13.900000             True    566                    1.0

I tried the following code, but it doesn't work.

import numpy as np

def get_max_value(df):
    return df.assign(
        max_diff_height=lambda df: np.where(
            df['difference_next_index'] < 10,
            df['abs_diff_height'].rolling(2).max().shift(1),
            df['abs_diff_height'],
        )
    )

I also tried something like:

df[['highest_peak']].rolling(20, center=True).apply(lambda s: s.idxmax(), raw=False)

However, this only results in NaNs.

The MATLAB code is:

%% Snap multiple detections in a row to the highest point of that peak.
% Initialise variables based on first detection value
x=2;
Remember=PeakIndexT(1);
PeakIndex=PeakIndexT(1);
PeakValue=Dataset(PeakIndexT(1));
while x<=length(PeakIndexT)
    if PeakIndexT(x)-Remember>10                        % If there are more than 10 points (2.5 meters) between this and the previous detection, identify this one as a new one
        PeakIndex=[PeakIndex,PeakIndexT(x)];
        PeakValue=[PeakValue,Dataset(PeakIndexT(x))];
    else                                                % Else merge the detections and use the highest absolute value as the detection peak
        if PeakValue(end)<Dataset(PeakIndexT(x))
            PeakValue(end)=Dataset(PeakIndexT(x));
            PeakIndex(end)=PeakIndexT(x);
        end
    end
    Remember=PeakIndexT(x);                             % Store previous value for reference in loop
    x=x+1;
end
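As a reference point for any vectorised version, the loop above can be translated line by line into plain Python. This is only a sketch: the function name `merge_peaks`, its parameters, and the dictionary of sample values (taken from the peak rows shown in the question) are illustrative choices, not part of the original code.

```python
def merge_peaks(peak_indices, values, max_gap=10):
    """Merge detections whose gap to the previous detection is <= max_gap.

    peak_indices: sorted row indices flagged as peaks (PeakIndexT in MATLAB).
    values: mapping from row index to height difference (Dataset in MATLAB).
    Returns one (index, value) pair per merged peak.
    """
    merged = []
    previous = None  # plays the role of Remember
    for idx in peak_indices:
        value = values[idx]
        if previous is None or idx - previous > max_gap:
            # Far enough from the previous detection: start a new peak.
            merged.append((idx, value))
        elif value > merged[-1][1]:
            # Same peak, but higher: keep this detection instead.
            merged[-1] = (idx, value)
        previous = idx
    return merged


# Peak rows from the question: row index -> abs_diff_height.
sample = {277: 4.0, 278: 5.0, 352: 6.9, 391: 6.0, 560: 6.0, 561: 5.9,
          562: 5.9, 563: 8.9, 564: 9.9, 565: 10.9, 566: 13.9}
result = merge_peaks(sorted(sample), sample)
print(result)  # [(278, 5.0), (352, 6.9), (391, 6.0), (566, 13.9)]
```

As in the MATLAB loop, the gap is measured against the previous detection rather than against the start of the run, so a long chain of closely spaced detections is merged into a single peak.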

The result I expect is the max_value and the index.

df:

     Location  abs_diff_height  heightdiff_peak  index  difference_next_index  max_value  index_max_value
277      9.00         4.000000             True    277                    1.0        5.0              278
278      9.25         5.000000             True    278                   74.0        5.0              278
352     27.75         6.900000             True    352                   39.0        6.9              352
391     37.50         6.000000             True    391                  169.0        6.0              391
560     79.75         6.000000             True    560                    1.0       13.9              566
561     80.00         5.900000             True    561                    1.0       13.9              566
562     80.25         5.900000             True    562                    1.0       13.9              566
563     80.50         8.900000             True    563                    1.0       13.9              566
564     80.75         9.900000             True    564                    1.0       13.9              566
565     81.00        10.900000             True    565                    1.0       13.9              566
566     81.25        13.900000             True    566                    1.0       13.9              566

1 Answer


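Group the peaks into runs first. Flag each row whose gap to the previous peak is more than 10, then take the cumulative sum of that flag so every run gets its own group id. Group on the cumulative sum only; adding the flag itself as a second key would split the first row of each run from the rows that follow it.

s = df.difference_next_index.shift().gt(10)
df['index_max_value'] = (df.abs_diff_height
                           .groupby(s.cumsum())
                           .transform('idxmax'))

Then look up the values. Strip the index with to_numpy() before assigning, because the lookup result is labelled with repeated index_max_value entries and cannot be aligned back to df:

df['max_value'] = df.loc[df['index_max_value'], 'abs_diff_height'].to_numpy()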
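To check the idea against the sample rows from the question, here is a small self-contained sketch. It groups on the cumulative sum of the "gap greater than 10" flag and derives the gap from the index directly instead of relying on the difference_next_index column; the variable names `gap`, `group_id` and `grouped` are illustrative.

```python
import pandas as pd

# Peak rows from the question, indexed by their original row number.
df = pd.DataFrame(
    {"abs_diff_height": [4.0, 5.0, 6.9, 6.0, 6.0, 5.9, 5.9, 8.9, 9.9, 10.9, 13.9]},
    index=[277, 278, 352, 391, 560, 561, 562, 563, 564, 565, 566],
)

# Distance to the previous peak row (NaN for the first row).
gap = df.index.to_series().diff()

# A gap above 10 starts a new run; the cumulative sum labels each run.
group_id = gap.gt(10).cumsum()

grouped = df["abs_diff_height"].groupby(group_id)
df["index_max_value"] = grouped.transform("idxmax")  # label of the highest row per run
df["max_value"] = grouped.transform("max")           # highest value per run

print(df)
```

Using transform('max') for max_value is one way to avoid the second lookup entirely; keeping only one row per merged peak would then be a matter of filtering on df.index == df['index_max_value'].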
