virtual file set column and rowset variable U-SQL

Question

asked Jul 5, 2019 in Azure by Vaibhav Ameta (17.6k points)

I'm having an issue with scheduling job in Data Factory. I'm trying to approach a scheduled job per hour which will execute the same script each hour with different condition.

Consider I have a bunch of Avro Files spread in Azure Data Lake Store with following pattern. /Data/SomeEntity/{date:yyyy}/{date:MM}/{date:dd}/SomeEntity_{date:yyyy}{date:MM}{date:dd}__{date:H}

Each hour new files are added to Data Lake Store. In order to process the files only once I decided to handle them by the help of U-SQL virtual file set column and some SyncTable which I created in Data Lake Store.

My query looks like the following.

DECLARE @file_set_path string = /Data/SomeEntity/{date:yyyy}/{date:MM}/{date:dd}/SomeEntity_{date:yyyy}_{date:MM}_{date:dd}__{date:H};
@result = EXTRACT [Id] long,
....
date DateTime
FROM @file_set_path
USING someextractor;
@rdate =
SELECT MAX(ProcessedDate) AS ProcessedDate
FROM dbo.SyncTable
WHERE EntityName== "SomeEntity";
@finalResult = SELECT [Id],... FROM @result
CROSS JOIN @rdate AS r
WHERE date >= r.ProcessedDate;

since I can't use a rowset variable in where clause I'm cross joining the singe row with a set , however even in this case U-SQL won't find the correct files and always return all files set.

Is there any workaround or other approaches?

1 Answer

Ashok · Answer 1 · 2019-07-08T06:04:09+0000

On Dynamic joins, these files set cannot perform partition elimination as the values are not known to optimizer during the preparation phase.
Synch point needs to be passed as the parameter from Azure Data Factory to the processing script so that value will be known to optimize and file set partition will kick it. In the worst case, we must read the value from sync table in the previous script and use it as a parameter in the next.

https://intellipaat.com/community/5078/what-does-rerun-in-upstream-in-pipeline

virtual file set column and rowset variable U-SQL

1 Answer

Related questions

Browse Categories

Browse By Domains

Popular Courses

Popular Tutorials

Popular Resources