We have an API server running that serves around 500.000 requests a day. We want to keep all these requests in a database to be able to analyze the data. We log things like:
- Who did the request
- How long time did it take
- Date and time
- Http response code
- What API resource was asked for (URL)
- Cached response or not (bool)
We want to keep these logs for 3 months, something which will result in about 45.000.000 records in that database. When records are older than 3 months they are deleted.
Storing these 45 million records in a SQL database is possible, but then it is really slow to perform any analysis on these data. We would like to do extensive analysis like - how many requests did a specific user do today, compared to the same day last week? How many percents of requests failed today compared to any other day? See a trend diagram showing if the number of requests is going up or down. See the top 10 resources being asked for at a given time. You get it - we want to be able to do all kinds of analyses like this.
Can you give any advice on where to store these logs to be able to do analysis like this in real-time (or near real-time)? Any NoSQL database that could be good for this? Azure? I see there is something called Azure SQL data warehouse, could that be used for this? I have looked at Microsoft Power Bi which will probably be great for doing the analysis on these data, but where do I store the data.
I would really appreciate it if someone has some suggestions for me.