in Azure by (17.6k points)

So the scenario is the following:

I have multiple instances of a web service, each of which writes blobs of data to Azure Storage. I need to be able to group the blobs into a container (or a virtual directory) depending on when they were received. Once in a while (at worst once a day), older blobs will get processed and then deleted.

I have two options:

Option 1

I make one container called "blobs" (for example) and store all the blobs in that container. Each blob uses a directory-style name in which the directory part is the time it was received (e.g. "hr0min0/data.bin", "hr0min0/data2.bin", "hr0min30/data3.bin", "hr1min45/data.bin", ... , "hr23min0/dataN.bin", etc., with a new directory every X minutes). The thing that processes these blobs will process the hr0min0 blobs first, then hr0minX, and so on (and new blobs are still being written while others are being processed).
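As a rough illustration of the naming scheme in Option 1, here is a minimal Python sketch that derives the directory-style blob name from the arrival time. The function names `bucket_label` and `blob_name` and the default interval of 30 minutes are assumptions made for this example, not part of the question; the sketch also assumes X divides 60 evenly.

```python
from datetime import datetime


def bucket_label(received: datetime, interval_minutes: int = 30) -> str:
    """Return a label such as 'hr0min30' for the interval containing `received`.

    Hypothetical helper: assumes the interval (X) divides 60 evenly.
    """
    if not 0 < interval_minutes <= 60 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must divide 60 evenly")
    # Round the minute down to the start of its interval.
    floored = (received.minute // interval_minutes) * interval_minutes
    return f"hr{received.hour}min{floored}"


def blob_name(received: datetime, filename: str, interval_minutes: int = 30) -> str:
    """Build a directory-style blob name like 'hr0min30/data3.bin'."""
    return f"{bucket_label(received, interval_minutes)}/{filename}"
```

For instance, a blob received at 00:44 with a 30-minute interval would be named `blob_name(datetime(2024, 1, 1, 0, 44), "data3.bin")`, which lands in the "hr0min30" directory.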

Option 2

I have many containers, each with a name based on the arrival time (so the first will be a container called blobs_hr0min0, then blobs_hr0minX, etc.), and each container holds the blobs that arrived at the named time. The thing that processes these blobs will process one container at a time.
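One practical detail for Option 2: Azure container names may contain only lowercase letters, digits, and hyphens (3 to 63 characters, no leading, trailing, or consecutive hyphens), so a name with an underscore such as blobs_hr0min0 would be rejected. The sketch below uses a hyphen instead. The helper names `container_name` and `is_valid_container_name`, the "blobs" prefix default, and the 30-minute interval are assumptions for illustration only.

```python
import re
from datetime import datetime

# Lowercase letters, digits, and single hyphens between alphanumeric runs.
_CONTAINER_NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def is_valid_container_name(name: str) -> bool:
    """Check a name against the container naming rules described above."""
    return 3 <= len(name) <= 63 and bool(_CONTAINER_NAME_RE.match(name))


def container_name(received: datetime, interval_minutes: int = 30,
                   prefix: str = "blobs") -> str:
    """Return a per-interval container name such as 'blobs-hr0min30'.

    Hypothetical helper: assumes the interval divides 60 evenly.
    """
    if not 0 < interval_minutes <= 60 or 60 % interval_minutes != 0:
        raise ValueError("interval_minutes must divide 60 evenly")
    # Round the minute down to the start of its interval.
    floored = (received.minute // interval_minutes) * interval_minutes
    name = f"{prefix}-hr{received.hour}min{floored}"
    if not is_valid_container_name(name):
        raise ValueError(f"invalid container name: {name!r}")
    return name
```

Each service instance would compute the same container name for the same interval, so no coordination is needed beyond making sure the container exists before the first write.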

So my question is, which option is better? Does option 2 give me better parallelization (since containers can be on different servers), or is option 1 better because having many containers can cause other, unknown issues?

1 Answer

by (47.2k points)

If you need to list the blobs in a container, you will likely see better performance with the many-container model. For example, consider a company that stores a massive number of blobs in a single container. It frequently lists the objects in the container and then performs actions against a subset of those blobs. It sees a performance hit, because the time to retrieve a full listing keeps growing as the container fills up.
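The access pattern described above (list everything, then act on a subset) can be modelled with a toy in-memory example. This is only a sketch of the counting argument, not a measurement of Azure Storage: the blob names, the bucket sizes, and the helper `list_then_filter` are all made up for illustration.

```python
def list_then_filter(names, wanted_prefix):
    """Enumerate every name in a listing, keeping those with the prefix.

    Returns the selected names and how many names had to be enumerated.
    """
    enumerated = 0
    subset = []
    for name in names:
        enumerated += 1
        if name.startswith(wanted_prefix):
            subset.append(name)
    return subset, enumerated


# Hypothetical data: 24 hourly buckets with 100 blobs each.
single_container = [
    f"hr{h}min0/data{i}.bin" for h in range(24) for i in range(100)
]
many_containers = {
    f"blobs-hr{h}min0": [f"data{i}.bin" for i in range(100)] for h in range(24)
}

# One big container: a full listing walks all 2400 names to find 100.
subset_a, cost_a = list_then_filter(single_container, "hr5min0/")

# One container per bucket: the listing only walks that bucket's 100 names.
subset_b, cost_b = list_then_filter(many_containers["blobs-hr5min0"], "")
```

In this model the full listing of the single container grows with the total number of blobs, while the per-container listing grows only with the size of one time bucket.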
