What I actually want to do is fit straight lines to every possible run of consecutive points in some data and find the best group of fitted lines by measuring their average R squared.

The step where I got stuck is how to obtain all those possible sublists with a suitable method, so that I can do the fits afterwards. That is also why I want a minimum length of 3: every line that passes through two points is a perfect fit, and I don't want that.
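To make the two-point remark concrete, here is a minimal sketch of an R squared calculation in plain Python. The names `r_squared` and `mean_r_squared` are made up for illustration, and using the position in the list (0, 1, 2, ...) as the x value is an assumption, since only the y values are given here.

```python
def r_squared(ys):
    """R^2 of the least-squares line through (0, ys[0]), (1, ys[1]), ...

    Assumes at least two points; x is taken to be the list index.
    """
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in enumerate(ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # A flat segment has no variance to explain; counting it as a
    # perfect fit is a choice made for this sketch.
    return 1.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot


def mean_r_squared(group):
    """Average R^2 over the sublists of one candidate group."""
    return sum(r_squared(segment) for segment in group) / len(group)


print(r_squared([489, 495]))       # two points: always 1.0
print(r_squared([501, 506, 508]))  # three points: below 1.0 here
print(mean_r_squared([[489, 495, 501], [506, 508, 514, 520, 522]]))
```

With only two points the residuals are zero by construction, so the score carries no information, which is the reason for the minimum length of 3.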

For example, my first try was something like this:

def sub_lists(lst):

    lr = [lst[:i] for i in range(3,len(lst)-2)]

    rl = [lst[i:] for i in range(len(lst)-3,2,-1)]

    return [[lr[i], rl[-i-1]] for i in range(len(lr))]

>>> tst = [489, 495, 501, 506, 508, 514, 520, 522]

>>> sub_lists(tst)

[[[489, 495, 501], [506, 508, 514, 520, 522]],

[[489, 495, 501, 506], [508, 514, 520, 522]],

[[489, 495, 501, 506, 508], [514, 520, 522]]]

But then I came across the list below, which has a length of 5, and it didn't work. For this list the expected output would be just the list itself:

>>> tst = [489, 495, 501, 506, 508]

>>> sub_lists_revised(tst)

[489, 495, 501, 506, 508]

Following the same logic, when I have more data, for example a length of 10:

>>> tst = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

>>> sub_lists_revised(tst)

# the whole list

[489, 495, 501, 506, 508, 514, 520, 525, 527, 529]

# all possible pairs

[[[489, 495, 501], [506, 508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508], [514, 520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501, 506, 508, 514, 520], [525, 527, 529]]]

# and finally, all possible triplets, which I couldn't figure out

[[[489, 495, 501], [506, 508, 514], [520, 525, 527, 529]],

[[489, 495, 501], [506, 508, 514, 520], [525, 527, 529]],

[[489, 495, 501, 506], [508, 514, 520], [525, 527, 529]]]

So, to conclude, what I want is a general approach that will also work for more data, although I don't think I will really need more than triplets at the moment.

I have also added the figures from the first example after the fit: fig1, fig2, fig3

1 Answer

Here is a general approach that does what you want.

# This function generates the cut points for a list of length n so that every resulting sublist has a length of at least 3.

def cut_points(n, already_cut=None):

    # The first cut point is at 0 

    if already_cut is None:

        already_cut = [0]

    # We can cut at all places between the last cut plus 3 

    # and the length minus 3, and yield recursively the solutions for each choice

    for i in range(already_cut[-1]+3, n-2):

        cuts = already_cut[:] + [i]

        yield from cut_points(n, cuts)

    # Once every further cut has been tried, close the current cuts with the total length n and yield them

    yield already_cut[:] + [n]

# This yields the sublists for every valid list of cut points

def all_possible_sublists(data):

    n = len(data)

    for cut in cut_points(n):

        yield [data[cut[i]:cut[i+1]] for i in range(len(cut)-1)]

Some tests are as follows:

list(all_possible_sublists([0, 1, 2, 3]))

# [[[0, 1, 2, 3]]]

list(all_possible_sublists([0, 1, 2, 3, 4, 5, 6]))

# [[[0, 1, 2], [3, 4, 5, 6]],

#  [[0, 1, 2, 3], [4, 5, 6]],

#  [[0, 1, 2, 3, 4, 5, 6]]]

for sublist in all_possible_sublists([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]):

    print(sublist)

# [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2], [3, 4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]

# [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
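The question lists the results grouped as the whole list, then pairs, then triplets, and says nothing beyond triplets is needed. A minimal sketch of that grouping follows. It repeats the two generators so that it runs on its own; `partitions_by_size` and its `max_parts` parameter are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def cut_points(n, already_cut=None):
    # Same generator as above: consecutive cuts are at least 3 apart.
    if already_cut is None:
        already_cut = [0]
    for i in range(already_cut[-1] + 3, n - 2):
        yield from cut_points(n, already_cut + [i])
    yield already_cut + [n]


def all_possible_sublists(data):
    for cut in cut_points(len(data)):
        yield [data[cut[i]:cut[i + 1]] for i in range(len(cut) - 1)]


def partitions_by_size(data, max_parts=3):
    """Group the partitions by number of pieces, up to max_parts (hypothetical helper)."""
    groups = defaultdict(list)
    for partition in all_possible_sublists(data):
        if len(partition) <= max_parts:
            groups[len(partition)].append(partition)
    return dict(groups)


tst = [489, 495, 501, 506, 508, 514, 520, 525, 527, 529]
groups = partitions_by_size(tst)
for size in sorted(groups):
    print(size, "piece(s):", len(groups[size]), "partition(s)")
```

This version filters after generating everything, which is simple but wasteful for long lists; stopping the recursion once `max_parts` pieces are reached would probably be the better choice there.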
