0 votes
in Machine Learning by (19k points)

I recently got access to a huge amount of server log data (at my new job). I have some experience in machine learning from college. The log data includes server logs, database access logs, etc. I was wondering what kind of learning can be done on such data.

One small thing I tried was to predict the number of requests in a given hour of the day based on data from the past week, which seemed OK, but this is kind of trivial. So,

  • What kind of learning can be done on such data?
    • Maybe predicting the probability of an IP doing spam clicks on ads (yes, the company is into that), based on the usage patterns of previous spammers?
    • Maybe predicting at what time the traffic may shoot up?
  • Are there any existing tools/projects which specifically leverage such data?
  • Are there any interesting resources/papers which talk about similar stuff?
  • Also, there is data on process activity on the server over a period of time. Can this be of any use for learning?

1 Answer

0 votes
by (33.1k points)

Here are some techniques you can apply to this kind of log data:

  1. Extract the logging templates (the format strings of the logging statements) from the source code and match them against the logs to pull out identifiers: whatever appears in a log line where the template has %s is a candidate identifier. Heuristics are then used to distinguish identifiers from non-identifiers.
  2. Use ratios between values instead of raw numbers.
  3. Use Principal Component Analysis to discover anomalies in vectors of such features.
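The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact method the answer refers to: the template string, the log line, the event counts, and every function name here are made up, and `looks_like_identifier` is a stand-in heuristic. The anomaly score is the squared distance of each feature vector from the subspace spanned by the top principal components.

```python
import re
import numpy as np

def template_to_regex(template):
    """Step 1: compile a printf-style template into a regex, one group per %s."""
    literal_parts = [re.escape(part) for part in template.split("%s")]
    return re.compile("^" + r"(\S+)".join(literal_parts) + "$")

def extract_fields(line, pattern):
    """Return the values that filled the %s slots, or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None

def looks_like_identifier(field):
    """Stand-in heuristic (an assumption): plain numbers are values, the rest are identifiers."""
    return not field.replace(".", "", 1).isdigit()

def to_ratios(counts):
    """Step 2: divide each row by its total so busy and quiet windows are comparable."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

def pca_residual_scores(features, n_components):
    """Step 3: squared distance of each row from the top principal subspace."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T
    residual = centered - centered @ basis @ basis.T
    return np.sum(residual ** 2, axis=1)

# Hypothetical template and log line.
pattern = template_to_regex("Received block %s of size %s from %s")
fields = extract_fields("Received block blk_42 of size 1024 from 10.0.0.7", pattern)
identifiers = [f for f in fields if looks_like_identifier(f)]

# Made-up counts of [open, write, close] events per time window.
# In the first six windows opens and closes balance; in the last one they do not.
counts = [
    [10, 80, 10],
    [40, 120, 40],
    [5, 90, 5],
    [30, 140, 30],
    [50, 100, 50],
    [60, 80, 60],
    [10, 40, 50],
]
scores = pca_residual_scores(to_ratios(counts), n_components=1)
most_anomalous = int(np.argmax(scores))
print(identifiers, most_anomalous)
```

In practice you would pick the number of components and a score threshold from your own data. A single extreme outlier can also pull the principal directions towards itself, so it may be safer to fit the components on a period you believe is normal and score new windows against that.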

Hope this answer helps you! A broader explanation would also need to cover machine learning algorithms in general.
