
I’ve been following Andrew Ng’s AI course, specifically the section on neural networks, and I’m planning to implement a neural network on log file data.

My log file contains data of this type :

<IP OF MACHINE INITIATING REQUEST><DATE OF REQUEST><TIME OF REQUEST><NAME OF RESOURCE BEING ACCESSED ON SERVER><RESPONSE CODE><TIME TAKEN FOR SERVER TO SERVE PAGE>

I’m aware there are other algorithms that could be used for this task, such as naïve Bayes and local outlier factor, but I want to gain exposure to neural networks using a real-world problem.

I have read about self-organizing maps, and they seem better suited to this type of problem since the log file does not have any structure, but they appear to be a more advanced topic.

Instead of using a self-organizing map, I plan to create the training data from the log file by grouping it into key-value pairs, where the key is

<IP OF MACHINE INITIATING REQUEST>

and the value for each key is

[<NAME OF RESOURCE BEING ACCESSED ON SERVER>, <TIME TAKEN FOR SERVER TO SERVE PAGE>]
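The grouping described above could be sketched roughly as follows. This is only an illustration: it assumes the six fields are separated by whitespace, and the sample lines and the `group_by_ip` helper are hypothetical, not taken from the real log.

```python
from collections import defaultdict

# Hypothetical sample lines; the real delimiter and field layout may differ.
SAMPLE_LOG = """\
1.2.3.4 2014-01-05 10:15:02 /index.html 200 0.120
1.2.3.4 2014-01-05 10:15:09 /login 200 0.340
5.6.7.8 2014-01-05 11:02:44 /index.html 404 0.015
"""


def group_by_ip(log_text):
    """Map each IP to a list of (resource, seconds_taken) tuples."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip lines that do not have the six expected fields
        ip, _date, _time, resource, _status, taken = parts
        grouped[ip].append((resource, float(taken)))
    return dict(grouped)


grouped = group_by_ip(SAMPLE_LOG)
for ip, visits in grouped.items():
    print(ip, visits)
```

Keeping the date and time fields instead of discarding them would be needed later for any per-hour analysis.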

From the above log file data I’m aiming to use a neural network(s) :

To classify similar IP behaviors based on what resources are being accessed. 

Classify behavior at specific periods or moments in time, i.e. which IPs are behaving similarly at a specific moment in time.

I’m not sure where to start with the above. I’ve implemented very basic neural networks that perform integer arithmetic, but now I want to implement a network that is useful for the data I have.

Based on the log data format, is this a good use case?

Any pointers on where to begin with this task?

I hope this question is not too generic; I'm just unsure what questions to consider when beginning the implementation of a neural network.

Update :

I would like to output data that is best suited to be generated from a neural network.

For this, I am thinking of outputting a classification of users over periods of time, based on a similarity score.

To generate the similarity score, I could first count the number of times each IP address accesses each resource:

e.g.:

1.2.3.A,4,3,1

1.2.3.B,0,1,2

1.2.3.C,3,7,3
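A count table like the one above could be built along these lines. This is a minimal sketch: the `requests` pairs, the resource names, and the `count_matrix` helper are made up for illustration, and the columns are simply the resources in sorted order.

```python
from collections import Counter

# Hypothetical (ip, resource) pairs extracted from the log.
requests = [
    ("1.2.3.A", "/home"), ("1.2.3.A", "/home"), ("1.2.3.A", "/login"),
    ("1.2.3.B", "/login"), ("1.2.3.B", "/report"), ("1.2.3.B", "/report"),
]


def count_matrix(pairs):
    """Return (resources, rows); rows maps ip -> counts, one per resource."""
    resources = sorted({res for _, res in pairs})
    ips = sorted({ip for ip, _ in pairs})
    counts = Counter(pairs)  # missing (ip, resource) pairs count as 0
    rows = {ip: [counts[(ip, res)] for res in resources] for ip in ips}
    return resources, rows


resources, rows = count_matrix(requests)
for ip, row in rows.items():
    print(",".join([ip] + [str(c) for c in row]))
```

Each row is a fixed-length numeric vector, which is the shape most neural network libraries expect as input.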

From this I would then generate rows of the form:

<HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMILARITY SCORE>

for example:

1,1.2.3.A,1.2.3.B,.3

1,1.2.3.C,1.2.3.B,.2

1,1.2.3.B,1.2.3.B,0

2,1.2.3.D,1.2.3.B,.764

2,1.2.3.E,1.2.3.B,.332

3,1.2.3.F,1.2.3.B,.631
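The question does not say how the similarity score is computed. One common choice is cosine similarity between the per-IP count vectors, sketched below as an assumption. The `counts_by_hour` structure and the `similarity_rows` helper are hypothetical, and the scores this produces will not match the illustrative figures above.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an IP with no requests is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical layout: hour of day -> ip -> resource counts for that hour.
counts_by_hour = {
    1: {"1.2.3.A": [4, 3, 1], "1.2.3.B": [0, 1, 2], "1.2.3.C": [3, 7, 3]},
}


def similarity_rows(counts_by_hour):
    """Build (hour, ip_x, ip_y, score) rows for every IP pair in each hour."""
    rows = []
    for hour in sorted(counts_by_hour):
        vectors = counts_by_hour[hour]
        for ip_x, ip_y in combinations(sorted(vectors), 2):
            score = cosine_similarity(vectors[ip_x], vectors[ip_y])
            rows.append((hour, ip_x, ip_y, round(score, 3)))
    return rows


sim_rows = similarity_rows(counts_by_hour)
for row in sim_rows:
    print(",".join(str(field) for field in row))
```

For non-negative counts the score falls between 0 and 1, with 1 meaning the two IPs requested resources in the same proportions.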

I can then begin to correlate how users behave over the course of the day.

Is this applicable to a neural network?

I realize I'm asking about a neural network looking for a problem, but is this a suitable problem?

1 Answer


To answer your question of whether this is a good use case given the log data format: yes, you can use the log as a dataset to train a neural network, either to predict future values or to classify records into labels. For some types of neural network (especially the multi-layer perceptron), the results depend on how you organize the dataset used during training. In other cases you can instead group the samples, which is known as clustering.
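To make the clustering idea concrete, here is a minimal sketch that groups per-IP count vectors with plain k-means. Note that k-means is not a neural network; it is used here only because it is short enough to show in full. The vectors, the `kmeans` helper, and the naive choice of the first `k` points as starting centroids are all assumptions made for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; returns a cluster index for each point."""
    centroids = [list(p) for p in points[:k]]  # naive, deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-IP resource counts (one row per IP address).
ips = ["1.2.3.A", "1.2.3.B", "1.2.3.C", "1.2.3.D"]
vectors = [[9, 1, 0], [0, 1, 8], [8, 2, 0], [1, 0, 9]]

labels = kmeans(vectors, k=2)
for ip, label in zip(ips, labels):
    print(ip, "-> cluster", label)
```

A self-organizing map would play a similar role if the goal is specifically to stay within neural network methods.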

Many tools make it easy to work with neural networks: you define the training sets as arrays of double values, and the library trains the network object for you. I have been using the Encog Framework from Heaton Research, which supports Java, C#, C++, and others.
