Intellipaat Back

Explore Courses Blog Tutorials Interview Questions
0 votes
2 views
in Azure by (17.6k points)

I'm trying to parse below the JSON file using USQL but keep getting an error.

Json file@

{"dimBetType_SKey":1,"BetType_BKey":1,"BetTypeName":"Test1"}

{"dimBetType_SKey":2,"BetType_BKey":2,"BetTypeName":"Test2"}

{"dimBetType_SKey":3,"BetType_BKey":3,"BetTypeName":"Test3"}

Below is the USQL script, I'm trying to extract the data from the above file. 

 REFERENCE ASSEMBLY [Newtonsoft.Json];

    REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

DECLARE @Full_Path string =

"adl://xxxx.azuredatalakestore.net/2017/03/28/00_0_66ffdd26541742fab57139e95080e704.json";

DECLARE @Output_Path = "adl://xxxx.azuredatalakestore.net/Output/Output.csv";

@logSchema =

EXTRACT dimBetType_SKey int

FROM @Full_Path

USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

OUTPUT @logSchema

TO @Output_Path 

USING Outputters.Csv();

 But the USQL keeps failing with Vertex error

Any help?

1 Answer

0 votes
by (47.2k points)
  • As you have new JSON blocks on each new line of the file, you need to parse it in a different manner rather than a straight JSON file

  • You can make use of text extractor to bring a new  line delimiter in each JSON element in the following way

DECLARE @Full_Path string = "etc"

@RawExtract = 

    EXTRACT 

        [RawString] string, 

        [FileName] string //optional, see below

    FROM

        @Full_Path

    USING 

        Extractors.Text(delimiter:'\b', quoting : false);

  • Later you need to shred the JSON with the assembly which you have referenced, by using the JSON tuple method in the following way:

REFERENCE ASSEMBLY [Newtonsoft.Json];

REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

@ParsedJSONLines = 

    SELECT 

        Microsoft.Analytics.Samples.Formats.Json.JsonFunctions.JsonTuple([RawString]) AS JSONLine,

        [FileName]

    FROM 

        @RawExtract

  • Then you will get the values in this way:

@StagedData =

    SELECT 

        JSONLine["dimBetType_SKey"] AS dimBetType_SKey,

        JSONLine["BetType_BKey"] AS BetType_BKey,

        JSONLine["BetTypeName"] AS BetTypeName

        [FileName]

    FROM 

        @ParsedJSONLines;

  • Finally, format your output to CSV or any other

DECLARE @Output_Path string = "etc"

OUTPUT @StagedData

TO @Output_Path 

USING Outputters.Csv();

  • You don't need to refer the complete data lake store path as the analytic engine already know the path of the root storage so you can replace the variables in this way

DECLARE @Full_Path string = "/2017/03/28/{FileName}";

...