I'm building an Azure data lake using Data Factory at the moment, and I'm after some advice on having multiple data factories versus just one.
I currently have one data factory that sources data from one EBS instance, for one specific company under an enterprise. In the future, though, there may be other EBS instances, and other companies (with other applications as sources) to incorporate into the factory, and I'm concerned the diagram might get a bit messy.
I've searched around and found the site below, which recommends keeping everything in a single data factory so that linked services can be reused. I can see the value in that; however, since I have scripted the build for one data factory, it would be pretty easy to build the linked services again to point at the same data lake, for instance (see the sketch after the link).
https://www.purplefrogsystems.com/paul/2017/08/chaining-azure-data-factory-activities-and-datasets/
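For context, this is roughly what I mean by scripting the linked-service build. A minimal sketch using the Python azure-mgmt-datafactory SDK; the subscription, resource group, factory names, and data lake URI are placeholders for my setup, not real values:

```python
# Sketch: deploying the same Data Lake linked service into multiple factories,
# so each factory points at the same lake. All names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureDataLakeStoreLinkedService,
    LinkedServiceResource,
)

SUBSCRIPTION_ID = "<subscription-id>"                       # placeholder
RESOURCE_GROUP = "rg-datalake"                              # placeholder
DATA_LAKE_URI = "adl://mylake.azuredatalakestore.net/"      # placeholder

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# One linked-service definition, reusable across factories.
lake_linked_service = LinkedServiceResource(
    properties=AzureDataLakeStoreLinkedService(data_lake_store_uri=DATA_LAKE_URI)
)

# Deploy the same definition into each factory (hypothetical factory names).
for factory_name in ["adf-company-a", "adf-company-b"]:
    client.linked_services.create_or_update(
        RESOURCE_GROUP, factory_name, "LS_DataLake", lake_linked_service
    )
```

So losing linked-service reuse isn't a huge cost when the deployment is scripted; the definition lives in code once and gets pushed to however many factories exist.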
Pros of having only one Data Factory instance:
- Datasets and linked services only have to be created once
- Can see overall lineage in one diagram
Cons:
- Could get messy over time
- Could grow so large that it's hard to even find the pipeline you're after
Does anyone out there have large Azure Data Factory deployments that bring in potentially thousands of data sources, then mix and transform them? I'd be interested in hearing your thoughts.