I'm looking at PyBrain for taking server monitor alarms and determining the root cause of a problem. I'm happy to train it using supervised learning and to curate the training data sets myself. The data is structured something like this:

 * Server Type **A** #1
    * Alarm type 1
    * Alarm type 2
 * Server Type **A** #2
    * Alarm type 1
    * Alarm type 2
 * Server Type **B** #1
    * Alarm type **99**
    * Alarm type 2

So there are n servers, with x alarms that can be UP or DOWN. Both n and x are variable.
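
For illustration, the structure above could be held in a nested mapping from server to alarm states. This is only a sketch: the server keys, the alarm names, and the `down_alarms` helper are placeholders invented here, not part of PyBrain or of any monitoring API.

```python
UP, DOWN = "UP", "DOWN"

# Hypothetical snapshot: server -> {alarm type -> state}.
# Servers can be added or removed, and each server can carry
# a different set of alarms, so both n and x vary.
snapshot = {
    "A1": {"alarm_1": DOWN, "alarm_2": DOWN},
    "A2": {"alarm_1": DOWN, "alarm_2": UP},
    "B1": {"alarm_99": UP, "alarm_2": UP},
}


def down_alarms(snapshot):
    """Return a sorted list of (server, alarm) pairs that are DOWN."""
    return sorted(
        (server, alarm)
        for server, alarms in snapshot.items()
        for alarm, state in alarms.items()
        if state == DOWN
    )
```

A flat list of `(server, alarm)` pairs like this is easy to log, but its length changes with every new server or alarm, which is exactly what makes it awkward as a direct network input.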

If Server A1 has alarm 1 & 2 as DOWN, then we can say that service a is down on that server and is the cause of the problem.

If alarm 1 is down on all servers, then we can say that service a is the cause.

There can potentially be multiple options for the cause, so straight classification doesn't seem appropriate.
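
The two rules above can be written out as plain functions, which also shows why more than one cause can apply at once. This is a hand-written sketch, not a learned model; the function names and the reading of "all servers" as "all servers that report this alarm" are assumptions.

```python
DOWN, UP = "DOWN", "UP"


def servers_with_all_down(snapshot, required=("alarm_1", "alarm_2")):
    """Servers on which every alarm in `required` is DOWN."""
    return [
        server
        for server, alarms in sorted(snapshot.items())
        if all(alarms.get(alarm) == DOWN for alarm in required)
    ]


def alarm_down_everywhere(snapshot, alarm="alarm_1"):
    """True if `alarm` is DOWN on every server that reports it.

    Assumption: servers without this alarm are ignored.
    """
    states = [alarms[alarm] for alarms in snapshot.values() if alarm in alarms]
    return bool(states) and all(state == DOWN for state in states)


def candidate_causes(snapshot):
    """Collect every cause whose rule matches; several may match at once."""
    causes = ["service a down on " + s for s in servers_with_all_down(snapshot)]
    if alarm_down_everywhere(snapshot):
        causes.append("service a down environment-wide")
    return causes
```

Rules like these could also be used to label curated training samples before any network is involved.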

I would also like to tie other sources of data into the net later, such as scripts that ping some external service.

All the appropriate alarms may not be triggered at once, due to serial service checks, so it can start with one server down and then another server down 5 minutes later.
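
One possible way to cope with alarms arriving minutes apart is to evaluate a time window rather than a single instant. The sketch below assumes alarm events arrive as `(timestamp, server, alarm, state)` tuples and uses a 10-minute window; both the event format and the window length are assumptions made for illustration.

```python
def down_within_window(events, now, window_seconds=600):
    """Return the (server, alarm) pairs that are currently DOWN and
    whose latest DOWN event happened within the last `window_seconds`.

    events: iterable of (timestamp, server, alarm, state) tuples,
            with timestamps in seconds (hypothetical format).
    """
    latest = {}
    for ts, server, alarm, state in sorted(events):
        if ts <= now:
            # Later events overwrite earlier ones for the same alarm.
            latest[(server, alarm)] = (ts, state)
    return set(
        key
        for key, (ts, state) in latest.items()
        if state == "DOWN" and now - ts <= window_seconds
    )
```

With this grouping, a server that goes down 5 minutes after the first one still lands in the same evaluation as the first alarm.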

I'm trying to do some basic stuff at first:

    from pybrain.tools.shortcuts import buildNetwork
    from pybrain.datasets import SupervisedDataSet
    from pybrain.supervised.trainers import BackpropTrainer

    INPUTS = 2
    OUTPUTS = 1

    # Build network: 2 input, 3 hidden, and 1 output neurons
    net = buildNetwork(INPUTS, 3, OUTPUTS)

    # Build dataset with 2 inputs and 1 output
    ds = SupervisedDataSet(INPUTS, OUTPUTS)

    # Add one sample: iterable of inputs and iterable of outputs
    # (a real data set needs many samples)
    ds.addSample((0, 0), (0,))

    # Train the network with the dataset
    trainer = BackpropTrainer(net, ds)

    # Train for 10 epochs
    for _ in range(10):
        trainer.train()

    # Keep training until the error converges
    trainer.trainUntilConvergence()

    # Run an input over the network
    result = net.activate([2, 1])

But I'm having a hard time mapping variable numbers of alarms to a static number of inputs. For example, if we add an alarm to a server, or add a server, the whole net needs to be rebuilt. If that has to be done, I can do it, but I want to know if there's a better way.

Another option I'm considering is to have a different net for each type of server, but I don't see how I could draw an environment-wide conclusion, since each net would only evaluate a single host instead of all hosts at once.

Which type of algorithm should I use and how do I map the dataset to draw environment-wide conclusions as a whole with variable inputs?

I'm very open to any algorithm that will work, and a solution in Go would be even better than Python. The use case is machine learning for monitoring servers.

1 Answer


This is actually a challenging problem.

Representation of labels

It's difficult to represent your target labels for learning. As you pointed out,

  • If Server A1 has alarm 1 & 2 as DOWN, then we can say that service a is down on that server and is the cause of the problem.
  • If alarm 1 is down on all servers, then we can say that service a is the cause.
  • There can potentially be multiple options for the cause ...

I think you need to list all possible causes up front; otherwise we cannot expect an ML algorithm to generalize. To keep it simple, let's say you have only two possible causes of the problem:

1. Service problem 

2. Server problem  
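
With a fixed list of causes like this, one way to set up the learning problem is as a multi-label target: one output per cause, so several causes can be active at once. The sketch below also shows one possible fixed-length input encoding, the fraction of reporting servers on which each alarm type is DOWN. That aggregation, the `ALARM_TYPES` vocabulary, and the function names are assumptions made for illustration, not something the question or PyBrain prescribes.

```python
CAUSES = ["service_problem", "server_problem"]    # the two causes listed above
ALARM_TYPES = ["alarm_1", "alarm_2", "alarm_99"]  # hypothetical alarm vocabulary


def encode_targets(active_causes):
    """Multi-hot target vector: one slot per cause, 1.0 if it applies."""
    return [1.0 if cause in active_causes else 0.0 for cause in CAUSES]


def encode_features(snapshot):
    """Fixed-length input: for each alarm type, the fraction of servers
    reporting that alarm on which it is DOWN.

    snapshot: {server: {alarm type: "UP" or "DOWN"}} (hypothetical format).
    The vector length depends on ALARM_TYPES only, not on the server count.
    """
    features = []
    for alarm in ALARM_TYPES:
        states = [alarms[alarm] for alarms in snapshot.values() if alarm in alarms]
        down = sum(1 for state in states if state == "DOWN")
        features.append(down / float(len(states)) if states else 0.0)
    return features
```

Under this encoding, adding a server does not change the input size, so a data set built as `SupervisedDataSet(len(ALARM_TYPES), len(CAUSES))` would not need to be rebuilt. Adding a new alarm type still would, and the aggregation discards which specific server is affected, so per-server causes would need extra features.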

Hope this answer helps you!
