I'm looking at pybrain for taking server monitor alarms and determining the root cause of a problem. I'm happy with training it using supervised learning and curating the training data sets. The data is structured something like this.
* Server Type **A** #1
* Alarm type 1
* Alarm type 2
* Server Type **A** #2
* Alarm type 1
* Alarm type 2
* Server Type **B** #1
* Alarm type **99**
* Alarm type 2
So there are n servers, with x alarms that can be UP or DOWN. Both n and x are variable.
If Server A1 has alarm 1 & 2 as DOWN, then we can say that service a is down on that server and is the cause of the problem.
If alarm 1 is down on all servers, then we can say that service a is the cause.
There can potentially be multiple options for the cause, so straight classification doesn't seem appropriate.
I would also like to tie later sources of data to the net. Such as just scripts that ping some external service.
All the appropriate alarms may not be triggered at once, due to serial service checks, so it can start with one server down and then another server down 5 minutes later.
I'm trying to do some basic stuff at first:
from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer
INPUTS = 2
OUTPUTS = 1
# Build network
# 2 inputs, 3 hidden, 1 output neurons
net = buildNetwork(INPUTS, 3, OUTPUTS)
# Build dataset
# Dataset with 2 inputs and 1 output
ds = SupervisedDataSet(INPUTS, OUTPUTS)
# Add one sample, iterable of inputs and iterable of outputs
ds.addSample((0, 0), (0,))
# Train the network with the dataset
trainer = BackpropTrainer(net, ds)
# Train 1000 epochs
for x in xrange(10):
trainer.train()
# Train infinite epochs until the error rate is low
trainer.trainUntilConvergence()
# Run an input over the network
result = net.activate([2, 1])
But I[m having a hard time mapping variable numbers of alarms to static numbers of inputs. For example, if we add an alarm to a server, or add a server, the whole net needs to be rebuilt. If that is something that needs to be done, I can do it, but want to know if there's a better way.
Another option I'm trying to think of, is have a different net for each type of server, but I don't see how I can draw an environment-wide conclusion, since it will just make evaluations on a single host, instead of all hosts at once.
Which type of algorithm should I use and how do I map the dataset to draw environment-wide conclusions as a whole with variable inputs?
I'm very open to any algorithm that will work. Go is even better than python. Machine learning for monitoring servers.