I'm setting up a pipeline in Azure Data Factory to take flat files from storage and load them into tables in an Azure SQL DB.
The template for this pipeline specifies that I need a start time and an end time, which the tutorial says to set 1 day apart.
I'm trying to understand this. If it were a cron job in Linux or a scheduled task in Windows Server, I'd simply tell it when to start (e.g. daily at 6 am) and it would take however long it takes to complete.
This leads me to several related questions:
- Why would I need to specify an end time?
- What if I don't know how long it will take to run?
- If I set it too far in the future, do I run the risk of the data pipeline not completing in a timely manner?
- If I set it too soon, will the pipeline break?
- Why is it hardcoded as a date instead of a frequency? The tutorial says to use this format: "2014-10-14T16:32:41Z".
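
To make the format concrete, here is a minimal Python sketch of how I understand the two values. It is my own illustration, not something from the tutorial: the function name `one_day_window` is made up, and I am assuming that "1 day" means the end time is exactly 24 hours after the start time, with both expressed in UTC.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Timestamp layout matching the example "2014-10-14T16:32:41Z"
# (ISO 8601, UTC, with a trailing "Z").
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def one_day_window(start: datetime) -> Tuple[str, str]:
    """Return (start, end) strings one day apart in the format above.

    Hypothetical helper: the one-day gap is my reading of the tutorial,
    not a documented rule.
    """
    start_utc = start.astimezone(timezone.utc)
    end_utc = start_utc + timedelta(days=1)
    return start_utc.strftime(TIMESTAMP_FORMAT), end_utc.strftime(TIMESTAMP_FORMAT)


start, end = one_day_window(datetime(2014, 10, 14, 16, 32, 41, tzinfo=timezone.utc))
print(start)  # 2014-10-14T16:32:41Z
print(end)    # 2014-10-15T16:32:41Z
```

So the values are two fixed instants rather than a recurrence rule like "daily at 6 am", which is the part I don't understand.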