I'm having an issue with scheduling job in Data Factory. I'm trying to approach a scheduled job per hour which will execute the same script each hour with different condition.
Consider I have a bunch of Avro Files spread in Azure Data Lake Store with following pattern. /Data/SomeEntity/{date:yyyy}/{date:MM}/{date:dd}/SomeEntity_{date:yyyy}{date:MM}{date:dd}__{date:H}
Each hour new files are added to Data Lake Store. In order to process the files only once I decided to handle them by the help of U-SQL virtual file set column and some SyncTable which I created in Data Lake Store.
My query looks like the following.
DECLARE @file_set_path string = /Data/SomeEntity/{date:yyyy}/{date:MM}/{date:dd}/SomeEntity_{date:yyyy}_{date:MM}_{date:dd}__{date:H};
@result = EXTRACT [Id] long,
....
date DateTime
FROM @file_set_path
USING someextractor;
@rdate =
SELECT MAX(ProcessedDate) AS ProcessedDate
FROM dbo.SyncTable
WHERE EntityName== "SomeEntity";
@finalResult = SELECT [Id],... FROM @result
CROSS JOIN @rdate AS r
WHERE date >= r.ProcessedDate;
since I can't use a rowset variable in where clause I'm cross joining the singe row with a set , however even in this case U-SQL won't find the correct files and always return all files set.
Is there any workaround or other approaches?