in Machine Learning by (17.1k points)

Simple machine learning question. Probably numerous ways to solve this:

There is an infinite stream of 4 possible events:

'event_1', 'event_2', 'event_3', 'event_4'

The events do not arrive in completely random order. Assume that most events follow some complex patterns and the rest are just random. We do not know the patterns ahead of time, though.

After each event is received, I want to predict what the next event will be, based on the order in which events have arrived in the past. So my question is: what machine learning algorithm should I use for this predictor?

The predictor will then be told what the next event actually was:

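predictor = new_predictor()
prev_event = None

while True:
    event = get_event()
    if prev_event is not None:
        # Feed back the event the predictor saw on the previous step.
        predictor.last_event_was(prev_event)
    predicted_event = predictor.predict_next_event(event)
    # Remember the current event for the next iteration.
    prev_event = event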

This raises the question of how much history the predictor should maintain, since maintaining an infinite history is not possible. I'll leave that up to you, but for practical reasons the answer can't be infinite.

So I believe that the predictions will have to be done with some kind of rolling history. Adding a new event and expiring an old event should therefore be rather efficient, and not require rebuilding the entire predictor model, for example.

Specific code, rather than research papers, would add immense value to your responses. Python or C libraries are nice, but anything will do.

1 Answer

by (33.2k points)

This looks like a sequence prediction problem, which calls for recurrent neural networks (RNNs) or hidden Markov models (HMMs).

A fixed-window predictor can't look back further than the size of its window. RNNs and HMMs can, in principle, capture longer-range dependencies, provided the model is implemented and tuned properly.
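For comparison, the fixed-window idea mentioned above could be sketched roughly as follows. This is not an RNN or an HMM; it is a simple order-k Markov (n-gram) count model over a bounded rolling history, where adding a new event and expiring the oldest one only touch one counter each. The class name `RollingMarkovPredictor` and the methods `observe` and `predict` are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict, deque


class RollingMarkovPredictor:
    """Order-k Markov predictor over a bounded rolling history (illustrative)."""

    def __init__(self, order=2, history_size=1000):
        if history_size <= order:
            raise ValueError("history_size must be larger than order")
        self.order = order
        self.history_size = history_size
        self.history = deque()              # recent events, oldest first
        self.counts = defaultdict(Counter)  # context tuple -> successor counts

    def _update(self, window, delta):
        # window holds order + 1 events: a context followed by its successor.
        context, successor = tuple(window[:-1]), window[-1]
        self.counts[context][successor] += delta
        if self.counts[context][successor] <= 0:
            del self.counts[context][successor]
            if not self.counts[context]:
                del self.counts[context]

    def observe(self, event):
        """Add one event; expire the oldest one if the history is full."""
        k = self.order
        self.history.append(event)
        if len(self.history) > k:
            # Count the transition that ends at the new event.
            self._update([self.history[i] for i in range(-(k + 1), 0)], +1)
        if len(self.history) > self.history_size:
            # Forget the transition that starts at the oldest event.
            self._update([self.history[i] for i in range(k + 1)], -1)
            self.history.popleft()

    def predict(self):
        """Return the most frequent successor of the current context, or None."""
        k = self.order
        if len(self.history) < k:
            return None
        context = tuple(self.history[i] for i in range(-k, 0))
        successors = self.counts.get(context)
        if not successors:
            return None
        return successors.most_common(1)[0][0]
```

In the loop from the question, `observe(event)` would roughly play the role of `last_event_was` and `predict()` the role of `predict_next_event`. The model only ever sees patterns that fit inside `order` events, which is exactly the limitation described above.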

Here is PyBrain code for the recurrent-network approach:

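from pybrain.datasets import SequentialDataSet
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.tools.shortcuts import buildNetwork
from pybrain.structure import LSTMLayer, SigmoidLayer

INPUTS = 4   # one input unit per event type
HIDDEN = 10
OUTPUTS = 4  # one output unit per event type

net = buildNetwork(INPUTS, HIDDEN, OUTPUTS, hiddenclass=LSTMLayer, outclass=SigmoidLayer, recurrent=True)

ds = SequentialDataSet(INPUTS, OUTPUTS)

# your_sequences is assumed to be a list of event sequences, with each event
# encoded as a vector of length 4 (1.0 at the event's index, 0.0 elsewhere).
for sequence in your_sequences:
    ds.newSequence()  # start one dataset sequence per event sequence
    for (inpt, target) in zip(sequence, sequence[1:]):
        ds.appendLinked(inpt, target)  # input event -> event that followed it

net.randomize()

trainer = BackpropTrainer(net, ds, learningrate=0.05, momentum=0.99)

for _ in range(1000):
    print(trainer.train())  # one epoch per call; prints the training error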

The code will train the recurrent network for 1000 epochs and print out the error after every epoch. You can check for correct predictions like this:

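net.reset()  # clear the recurrent state before replaying a sequence

for i in sequence:
    # Threshold each output unit to get a boolean per event type.
    next_item = net.activate(i) > 0.5
    print(next_item)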

This will print an array of booleans for every event.
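A boolean array can contain no True entry or several, so it may help to map the output back to a single event name. The following is a minimal sketch, assuming the events are presumably fed to the network as one-hot vectors of length 4. The event names in `EVENTS` and the helpers `one_hot` and `decode` are illustrative assumptions, not part of PyBrain.

```python
# Assumed event names; adjust to whatever the real stream produces.
EVENTS = ["event_1", "event_2", "event_3", "event_4"]


def one_hot(event):
    """Encode an event name as a tuple with 1.0 at its index, 0.0 elsewhere."""
    index = EVENTS.index(event)
    return tuple(1.0 if i == index else 0.0 for i in range(len(EVENTS)))


def decode(activation):
    """Return the event whose output unit has the highest activation."""
    values = list(activation)
    return EVENTS[values.index(max(values))]


# Build training input in the shape the training loop iterates over:
# a list of sequences, each a list of one-hot vectors.
stream = ["event_1", "event_2", "event_3", "event_4", "event_1"]
your_sequences = [[one_hot(e) for e in stream]]
```

With these helpers, `decode(net.activate(i))` would give one predicted event name per step instead of a boolean array.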

The above code has not been tested against the current version of PyBrain, so it may need adjustments to run on newer releases.

Hope this answer helps.
