
What I actually want to do is fit straight lines to every possible split of some data into consecutive sublists, and then find the best group of fitted lines by comparing their average R squared.

The step where I got stuck is how to generate all those possible sublists efficiently, so that I can do the fits afterwards. That is also why I want a minimum length of 3: a line through any two points is a perfect fit, and I don't want that.

For example, my first try was something like this:

def sub_lists(lst):

    lr = [lst[:i] for i in range(3,len(lst)-2)]

    rl = [lst[i:] for i in range(len(lst)-3,2,-1)]

    return [[lr[i], rl[-i-1]] for i in range(len(lr))]

>>> tst = [489, 495, 501, 506, 508, 514, 520, 522]

>>> sub_lists(tst)

[[[489, 495, 501], [506, 508, 514, 520, 522]],

[[489, 495, 501, 506], [508, 514, 520, 522]],

[[489, 495, 501, 506, 508], [514, 520, 522]]]

But then I came across the list below, with a length of 5, and it didn't work: range(3, len(lst)-2) is empty for that length, so the function returns an empty list. The expected output here would be just the list itself:

>>> tst = [489, 495, 501, 506, 508]

>>> sub_lists_revised(tst)

[489, 495, 501, 506, 508]

Following the same logic, when I have more data, for example a length of 10:

>>> tst = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

>>> sub_lists_revised(tst)

# the whole list

[489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

# all possible pairs

[[[489, 495, 501], [506, 508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508], [514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514, 520], [525, 527, 529]]]

# and finally, all possible triplets, which I couldn't figure out

[[[489, 495, 501], [506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501], [506, 508, 514, 520], [525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520], [525, 527, 529]]]

So, to conclude, what I want is a general approach that will also work for more data, although I don't think I will really need more than triplets at the moment.

I have also added the figures from the first example after the fit: fig1, fig2, fig3

1 Answer


Here is a general approach that does what you want.

# This generator yields every list of cut points that splits a list of length n into sublists of at least 3 items.

def cut_points(n, already_cut=None):

    # The first cut point is at 0 

    if already_cut is None:

        already_cut = [0]

    # We can cut at all places between the last cut plus 3 

    # and the length minus 3, and yield recursively the solutions for each choice

    for i in range(already_cut[-1]+3, n-2):

        cuts = already_cut[:] + [i]

        yield from cut_points(n, cuts)

    # Once every further cut point has been tried, yield the cuts made so far, closed off with the total length

    yield already_cut[:] + [n]

# This generator turns each list of cut points into the corresponding sublists

def all_possible_sublists(data):

    n = len(data)

    for cut in cut_points(n):

        yield [data[cut[i]:cut[i+1]] for i in range(len(cut)-1)]

Some tests are as follows:

list(all_possible_sublists([0, 1, 2, 3]))

# [[[0, 1, 2, 3]]]

list(all_possible_sublists([0, 1, 2, 3, 4, 5, 6]))

# [[[0, 1, 2], [3, 4, 5, 6]],

#  [[0, 1, 2, 3], [4, 5, 6]],

#  [[0, 1, 2, 3, 4, 5, 6]]]

for sublist in all_possible_sublists([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]):

    print(sublist)

# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
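
The question mentions not needing more than triplets, whereas the generator above keeps cutting for as long as pieces of three still fit. If a cap is wanted, a variant along these lines might do. This is only a sketch: the function name `partitions` and the parameters `min_len` and `max_parts` are made up for illustration and are not part of the code above. Unlike the recursive version, it yields the whole list first and yields nothing at all when the list is shorter than `min_len`.

```python
from itertools import combinations


def partitions(data, min_len=3, max_parts=3):
    """Yield every split of `data` into 1..max_parts consecutive sublists,
    each holding at least `min_len` items (hypothetical helper)."""
    n = len(data)
    for parts in range(1, max_parts + 1):
        # Pick parts-1 interior cut positions; combinations() returns them
        # in increasing order.
        for inner in combinations(range(min_len, n - min_len + 1), parts - 1):
            cuts = (0, *inner, n)
            # Keep only the cut sets where every piece is long enough.
            if all(b - a >= min_len for a, b in zip(cuts, cuts[1:])):
                yield [data[a:b] for a, b in zip(cuts, cuts[1:])]


tst = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]
for parts in partitions(tst):
    print(parts)
```

For the 10-item list from the question this should give the whole list, then the five pairs, then the three triplets listed there.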

...
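
To get from the sublists to the stated goal of picking the group with the best average R squared, something like the following could be a starting point. It is a sketch under assumptions, not the asker's actual fitting code: `r_squared` and `best_partition` are hypothetical helper names, x is assumed to be the position of each value inside its sublist (evenly spaced points), and a constant sublist is scored as a perfect fit by convention.

```python
def r_squared(ys):
    """R squared of the least-squares line through (0, ys[0]), (1, ys[1]), ...
    (hypothetical helper; x is assumed to be the index within the sublist)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        return 1.0  # constant data: treated as a perfect fit (a convention)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Residuals of the fitted line, which passes through (mean_x, mean_y).
    ss_res = sum((y - mean_y - slope * (x - mean_x)) ** 2
                 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot


def best_partition(candidates):
    """Return the candidate whose sublists have the highest mean R squared."""
    return max(candidates,
               key=lambda parts: sum(map(r_squared, parts)) / len(parts))


# Hand-written candidates so the example stands alone.
candidates = [
    [[0, 1, 2, 3, 10, 20, 30, 40]],
    [[0, 1, 2], [3, 10, 20, 30, 40]],
    [[0, 1, 2, 3], [10, 20, 30, 40]],
]
best = best_partition(candidates)
print(best)
```

In practice `candidates` would be the generator itself, for example `best_partition(all_possible_sublists(data))`.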