2 views

As part of an AI application I am toying with, I'd like to be able to rate a candidate array of integers based on its monotonicity, aka its "sortedness". At the moment, I'm using a heuristic that calculates the longest sorted run, and then divides that by the length of the array:

public double monotonicity(int[] array) {

if (array.length == 0) return 1d;

int longestRun = longestSortedRun(array);

return (double) longestRun / (double) array.length;

}

public int longestSortedRun(int[] array) {

if (array.length == 0) return 0;

int longestRun = 1;

int currentRun = 1;

for (int i = 1; i < array.length; i++) {

if (array[i] >= array[i - 1]) {

currentRun++;

} else {

currentRun = 1;

}

if (currentRun > longestRun) longestRun = currentRun;

}

return longestRun;

}

This is a good start, but it fails to take into account the possibility that there may be "clumps" of sorted sub-sequences. E.g.:

``````{ 4, 5, 6, 0, 1, 2, 3, 7, 8, 9}
``````

This array is partitioned into three sorted sub-sequences. My algorithm will rate it as only 40% sorted, but intuitively, it should get a higher score than that. Is there a standard algorithm for this sort of thing?

by (108k points)

I would like to suggest looking at the Pancake Problem and the reversal distance of the permutations. These algorithms are often used to find the distance between two permutations (the Identity and the permuted string). This distance measure should take into account more clumps of in order values, as well as reversals (monotonically decreasing instead of increasing subsequences). You can also see the approximations that are polynomial time[PDF].

It depends on what the number means and if this distance function makes sense in your context though.

By treating this as the Pancake problem, if the array is sorted descending, there is just one 'flip' operation needed to sort it, so it will be seen as 'almost sorted'. Thus, It is almost sorted. Besides, you only said monotonicity. Descending or Ascending, still, it shows the essence of sorted. I would say that 7654321 is more sorted then 4237516. It solves your 'clumping' issue.