
What I actually want to do is fit all possible straight lines to some data and find the best group of fitted lines by measuring their average R squared.

The step where I got stuck is how to obtain, with a suitable method, all the possible sublists so that I can do the fits afterwards. That is also why I want a minimum length of 3: every line that passes through two points is a perfect fit, and I don't want that.
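To make the scoring idea concrete, here is a minimal sketch of how the average R squared of one group of sublists could be computed. The helper names `r_squared` and `mean_r_squared` are hypothetical, and the sketch assumes the x values are simply the positions 0, 1, 2, ... within each sublist, which may not match the real data.

```python
def r_squared(ys):
    """R squared of the least-squares line through (0, ys[0]), (1, ys[1]), ...

    Assumption: x values are the positions within the sublist.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if syy == 0:
        return 1.0  # constant data: a horizontal line fits exactly
    # For a straight-line fit, R squared equals the squared correlation.
    return (sxy * sxy) / (sxx * syy)


def mean_r_squared(group):
    """Average R squared over all sublists in one group."""
    return sum(r_squared(part) for part in group) / len(group)


# Two points always give a perfect fit, hence the minimum length of 3.
print(r_squared([489, 495]))  # 1.0
print(mean_r_squared([[489, 495, 501], [506, 508, 514, 520, 522]]))
```

A group with a higher `mean_r_squared` would then count as the better set of fitted lines.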

For example, my first try was something like this:

def sub_lists(lst):
    lr = [lst[:i] for i in range(3, len(lst) - 2)]
    rl = [lst[i:] for i in range(len(lst) - 3, 2, -1)]
    return [[lr[i], rl[-i-1]] for i in range(len(lr))]

>>> tst = [489, 495, 501, 506, 508, 514, 520, 522]

>>> sub_lists(tst)

[[[489, 495, 501], [506, 508, 514, 520, 522]],

[[489, 495, 501, 506], [508, 514, 520, 522]],

[[489, 495, 501, 506, 508], [514, 520, 522]]]

But then I came across the list below, which has a length of 5, and the function didn't work: it returned an empty list. The expected output would be just the list itself:

>>> tst = [489, 495, 501, 506, 508]

>>> sub_lists_revised(tst)

[489, 495, 501, 506, 508]

Following the same logic, when I have more data, for example a list of length 10:

>>> tst = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

>>> sub_lists_revised(tst)

# the whole list

[489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

# all possible pairs

[[[489, 495, 501], [506, 508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508], [514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514, 520], [525, 527, 529]]]

# and finally, all possible triplets, which I couldn't figure out

[[[489, 495, 501], [506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501], [506, 508, 514, 520], [525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520], [525, 527, 529]]]

So, to conclude, what I want is a general approach that will also work for more data, although I don't think I will really need more than triplets at the moment.

I have also added the figures from the first example after the fit: fig1, fig2, fig3


Here is a general approach that does what you want.

# This function generates the cut points for a list of length n
# so that every resulting sublist has at least 3 elements.
def cut_points(n, already_cut=None):
    # The first cut point is at 0
    if already_cut is None:
        already_cut = [0]

    # We can cut at all places between the last cut plus 3
    # and the length minus 3, and yield recursively the solutions for each choice
    for i in range(already_cut[-1] + 3, n - 2):
        cuts = already_cut + [i]
        yield from cut_points(n, cuts)

    # When we have tried all cut points and reached the total length, we yield the cut points list
    yield already_cut + [n]

# This provides the sublists
def all_possible_sublists(data):
    n = len(data)
    for cut in cut_points(n):
        yield [data[cut[i]:cut[i+1]] for i in range(len(cut) - 1)]

Some tests are as follows:

list(all_possible_sublists([0, 1, 2, 3]))

# [[[0, 1, 2, 3]]]

list(all_possible_sublists([0, 1, 2, 3, 4, 5, 6]))

# [[[0, 1, 2], [3, 4, 5, 6]],

#  [[0, 1, 2, 3], [4, 5, 6]],

#  [[0, 1, 2, 3, 4, 5, 6]]]

for sublist in all_possible_sublists([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]):
    print(sublist)

# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
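Since the question says that nothing beyond triplets is needed for now, the number of sublists per group could also be capped. The following is only a sketch of one alternative way to do that, using `itertools.combinations` to pick the cut positions instead of recursion. The function name `partitions` and the parameters `min_len` and `max_parts` are hypothetical and not part of the approach above.

```python
from itertools import combinations


def partitions(data, min_len=3, max_parts=3):
    """Yield every split of data into 1..max_parts consecutive sublists,
    each containing at least min_len elements."""
    n = len(data)
    for parts in range(1, max_parts + 1):
        # A split into `parts` sublists needs parts - 1 interior cut positions.
        for cuts in combinations(range(min_len, n - min_len + 1), parts - 1):
            bounds = (0, *cuts, n)
            # Keep the split only if every piece is long enough.
            if all(b - a >= min_len for a, b in zip(bounds, bounds[1:])):
                yield [data[a:b] for a, b in zip(bounds, bounds[1:])]


data = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]
for group in partitions(data):
    print(group)
```

For the 10-element list this yields the whole list first, then the five pairs, then the three triplets. To pick a best group, something like `max(partitions(data), key=score)` should work, where `score` is whatever measure you prefer, such as an average R squared over the sublists.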
