Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Data Science by (17.6k points)

Assuming I have a multi-class sequence prediction problem, with the correct answer:

gold = [1,2,1,0,2,2]

There are N models which gives different predictions:

pred1 = [1,2,2,0,2,2]

pred2 = [2,2,1,1,1,1]

pred3 = [1,2,1,0,2,1]

pred4 = [1,1,0,2,1,2]

pred5 = [2,2,1,0,1,2]

I would like to find a subset of the predictions (e.g. [pred1, pred3, pred5]), such that if I take the most common item at each position, the number of correct voting results is maximized.

In the actually problem, the length of sequence > 10000, and N > 100, is there any efficient way to find the subset?

Currently I am simply randomly sampling the subset, since I couldn't find an exact search algorithm (nor do I know if it exists). Some heuristics could help to reduce the computation, e.g. remove the unanimous predictions, but not complexity-wise.

If an efficient answer doesn't exist, I would also like to have a solution on a relaxed problem: namely binary instead of multi-class prediction, such that the most common prediction is the majority (>50%) one.

(Solutions with Python/Numpy native functions are appreciated)

1 Answer

0 votes
by (41.4k points)

The first step is to sort the lists:

gold = [1,2,1,0,2,2]

 

preds = [

    [1,2,2,0,2,2],

    [2,2,1,1,1,1],

    [1,2,1,0,2,1],

    [1,1,0,2,1,2],

    [2,2,1,0,1,2]

]

 

def count_correct(pred, goal=gold):

    return sum(1 for a,b in zip(pred, goal) if a==b)

 

print(sorted(preds, key=count_correct, reverse=True))

# [[1, 2, 2, 0, 2, 2], [1, 2, 1, 0, 2, 1], [2, 2, 1, 0, 1, 2], [2, 2, 1, 1, 1, 1], [1, 1, 0, 2, 1, 2]]

# the predictions that are correct, calculate the indices for them, and then try to look for a set cover:

def correct_ids(pred, goal=gold):

    return [i for i,(a,b) in enumerate(zip(pred,goal)) if a==b]

print([correct_ids(pred) for pred in preds])

# [[0, 1, 3, 4, 5], [1, 2], [0, 1, 2, 3, 4], [0, 5], [1, 2, 3, 5]]

If you wish to learn more about how to use python for data science, then go through data science python programming course by Intellipaat for more insights.

31k questions

32.8k answers

501 comments

693 users

Browse Categories

...