Here, the solution is based on dynamic programming.
Now, with the following notations: (x_1, y_1), ..., (x_n, y_n) the points, with x_1 <= x_2 <= ... <= x_n to cut in K groups.
Var(i, j) the variance of the y's: y_i, ..., y_j.
F_K((x_1,y_1), ..., (x_n, y_n)) = F_k(1,n) the value of the best solution for the problem.
After that, we have the following:
F_k(i,j) = min for l in i...j-k+1 of (Var(i,l) + F_(k-1)(l+1, j) and
F_1(i,j) = Var(i,j).
Now, the best way to split the points in 'k' groups is to select the leftmost cut and for the remaining points, the best choice of k-1 cuts.
Now, we need a 3D array A of dimensions n*n*K to store the value of F_k(i,j) for all i,j,k.
So, the program will be like this:
function get_value(P: points, A: 3D array, i, j, k){
if A[i][j][k] is defined{
result = A[i][j][k]
} else if k == 1 {
A[i][j][k] = get_var(P, i, j)
result = A[i][j][k]
} else {
result = +INF
for l in i ... j-k+1 {
tmp = get_value(P, A, i, l, 1) + get_value(P, A, l+1, j, k-1)
if tmp < result {
result = tmp
}
}
}
return result
}
If you wish to learn more about how to use python for data science, then go through this data science python course by Intellipaat for more insights.