I’ve been following Andrew Ng’s AI course, specifically the section on neural networks, and I’m planning to implement a neural network on log file data.
My log file contains data of this type:
<IP OF MACHINE INITIATING REQUEST><DATE OF REQUEST><TIME OF REQUEST><NAME OF RESOURCE BEING ACCESSED ON SERVER><RESPONSE CODE><TIME TAKEN FOR SERVER TO SERVE PAGE>
I’m aware there are other algorithms that could be used for this task, such as naïve Bayes and local outlier factor, but I want to gain exposure to neural networks using a real-world problem.
I have read about self-organizing maps, and they seem better suited to this type of problem because the log file does not have any structure, but they appear to be a more advanced topic.
Instead of using a self-organizing map, I plan to create the training data from the log file by grouping it into key-value pairs, where the key is
<IP OF MACHINE INITIATING REQUEST>
and the value for each key is
[<NAME OF RESOURCE BEING ACCESSED ON SERVER>, <TIME TAKEN FOR SERVER TO SERVE PAGE>]
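A minimal sketch of this grouping, assuming each log line holds the six fields above separated by whitespace (the real delimiter may differ). The function name `group_by_ip` and the sample lines are made up for illustration.

```python
from collections import defaultdict


def group_by_ip(lines):
    """Map each IP to a list of (resource, time_taken) pairs."""
    groups = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed six-field layout
        ip, _date, _time, resource, _code, taken = parts
        groups[ip].append((resource, float(taken)))
    return dict(groups)


# Made-up sample lines in the assumed layout, not real log data.
sample = [
    "1.2.3.A 2013-01-01 10:00:01 /index.html 200 0.12",
    "1.2.3.B 2013-01-01 10:00:02 /login 200 0.30",
    "1.2.3.A 2013-01-01 10:00:05 /login 200 0.25",
]
groups = group_by_ip(sample)
```

Here `groups["1.2.3.A"]` holds both requests made by that IP, in the order they appear in the log.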
From the above log file data, I’m aiming to use one or more neural networks to:
Classify similar IP behaviors based on which resources are being accessed.
Classify behavior at specific moments in time, i.e. which IPs are behaving similarly at a given moment.
I’m not sure where to start with the above. I’ve implemented very basic neural networks that perform integer arithmetic, but I now want to implement a network that is useful for the data I have.
Based on the log data format, is this a good use case?
Any pointers on where to begin with this task?
I hope this question is not too generic; I'm just unsure what questions to consider when beginning to implement a neural network.
Update:
I would like to produce output that a neural network is well suited to generate.
For this, I am thinking of outputting a classification of users over periods of time, based on a similarity score.
To generate the similarity score, I could first count the number of times each IP address accesses each resource:
e.g.:
1.2.3.A,4,3,1
1.2.3.B,0,1,2
1.2.3.C,3,7,3
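A rough sketch of how such a count table could be built, assuming the log has already been reduced to (IP, resource) pairs. The resource names `/a`, `/b`, `/c` and the function name `count_matrix` are placeholders, since the example rows above do not name their columns.

```python
from collections import Counter


def count_matrix(requests):
    """Turn (ip, resource) pairs into one row of per-resource counts per IP."""
    counts = Counter(requests)
    resources = sorted({res for _, res in counts})
    ips = sorted({ip for ip, _ in counts})
    # Counter returns 0 for (ip, resource) pairs that never occur.
    rows = {ip: [counts[(ip, res)] for res in resources] for ip in ips}
    return resources, rows


# Expand the example table into individual requests (placeholder resources).
table = {"1.2.3.A": [4, 3, 1], "1.2.3.B": [0, 1, 2], "1.2.3.C": [3, 7, 3]}
requests = [
    (ip, res)
    for ip, row in table.items()
    for res, n in zip(["/a", "/b", "/c"], row)
    for _ in range(n)
]
resources, rows = count_matrix(requests)
```

With these inputs, `rows` reproduces the three example rows, one list of counts per IP.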
From this, I would then generate:
<HOUR OF DAY>,<IP ADDRESS X>,<IP ADDRESS Y>,<SIMILARITY SCORE>
e.g.:
1,1.2.3.A,1.2.3.B,.3
1,1.2.3.C,1.2.3.B,.2
1,1.2.3.B,1.2.3.B,0
2,1.2.3.D,1.2.3.B,.764
2,1.2.3.E,1.2.3.B,.332
3,1.2.3.F,1.2.3.B,.631
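I have not settled on a similarity measure; the scores above are only placeholders. As one possibility, here is a sketch that uses cosine similarity between the per-resource count vectors of two IPs within the same hour. The choice of cosine similarity, the function names, and the `counts_by_hour` layout are all assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hourly_similarity(counts_by_hour):
    """Return (hour, ip_x, ip_y, score) for every pair of IPs seen in each hour."""
    rows = []
    for hour in sorted(counts_by_hour):
        vectors = counts_by_hour[hour]
        for x, y in combinations(sorted(vectors), 2):
            rows.append((hour, x, y, cosine(vectors[x], vectors[y])))
    return rows


# Hour 1 reuses the example count rows; the hour assignment is made up.
counts_by_hour = {
    1: {"1.2.3.A": [4, 3, 1], "1.2.3.B": [0, 1, 2], "1.2.3.C": [3, 7, 3]},
}
rows = hourly_similarity(counts_by_hour)
```

Each tuple in `rows` has the same shape as the lines above: hour, the two IP addresses, and a score between 0 and 1 for non-negative counts.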
I can then begin to correlate how users behave over the course of the day.
Is this applicable to a neural network?
I realize I'm asking about a neural network looking for a problem, but is this a suitable problem?