Intellipaat

in Azure by (45.3k points)

I am trying to build a pipeline in Azure Data Factory V1 that runs an Azure ML Batch Execution on a file. I implemented it using Blob storage as input and output, and it worked. However, I am now trying to change the input and output to a folder in my Data Lake Store. When I try to deploy it, I get the following error:

Entity provisioning failed: AzureML Activity 'MLActivity' specifies 'DatalakeInput' in a property that requires an Azure Blob Dataset reference.  

How can I use Data Lake Store for the input and output instead of Blob storage?



    {
        "name": "MLPipeline",
        "properties": {
            "description": "use AzureML model",
            "activities": [
                {
                    "type": "AzureMLBatchExecution",
                    "typeProperties": {
                        "webServiceInput": "DatalakeInput",
                        "webServiceOutputs": {
                            "output1": "DatalakeOutput"
                        },
                        "webServiceInputs": {},
                        "globalParameters": {}
                    },
                    "inputs": [
                        {
                            "name": "DatalakeInput"
                        }
                    ],
                    "outputs": [
                        {
                            "name": "DatalakeOutput"
                        }
                    ],
                    "policy": {
                        "timeout": "02:00:00",
                        "concurrency": 3,
                        "executionPriorityOrder": "NewestFirst",
                        "retry": 1
                    },
                    "scheduler": {
                        "frequency": "Hour",
                        "interval": 1
                    },
                    "name": "MLActivity",
                    "description": "description",
                    "linkedServiceName": "MyAzureMLLinkedService"
                }
            ],
            "start": "2016-02-08T00:00:00Z",
            "end": "2016-02-08T00:00:00Z",
            "isPaused": false,
            "hubName": "hubname",
            "pipelineMode": "Scheduled"
        }
    }



Output dataset:


    {
        "name": "DatalakeOutput",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "folderPath": "/DATA_MANAGEMENT/"
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }




Input dataset:


    {
        "name": "DatalakeInput",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "fileName": "data.csv",
                "folderPath": "/RAW/",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ","
                }
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }






Linked service:

    {
        "name": "AzureDataLakeStoreLinkedService",
        "properties": {
            "description": "",
            "hubName": "xyzdatafactoryv1_hub",
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "",
                "authorization": "**********",
                "sessionId": "**********",
                "subscriptionId": "*****",
                "resourceGroupName": "xyzresourcegroup"
            }
        }
    }
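
The error message spells out the rule that deployment enforces: the dataset an AzureMLBatchExecution activity names in webServiceInput or webServiceOutputs must be an Azure Blob dataset, while both datasets above have type AzureDataLakeStore. The Python sketch below reproduces that check locally before deploying. It assumes the V1 blob dataset type string is "AzureBlob" and that webServiceInputs maps input names to dataset names the same way webServiceOutputs does; find_non_blob_ml_refs and the sample dictionaries are hypothetical, not part of any Azure SDK.

```python
# Hypothetical pre-deployment check; not an Azure SDK API.
# Assumption: "AzureBlob" is the ADF V1 dataset type the error message refers to.
REQUIRED_TYPE = "AzureBlob"


def find_non_blob_ml_refs(pipeline: dict, datasets: dict) -> list:
    """List AzureML Batch Execution dataset references that are not blob datasets.

    `datasets` maps a dataset name to its JSON definition. Each result is a
    tuple of (activity name, property, dataset name, dataset type).
    """
    problems = []
    for activity in pipeline["properties"]["activities"]:
        if activity.get("type") != "AzureMLBatchExecution":
            continue
        props = activity.get("typeProperties", {})
        refs = []
        if "webServiceInput" in props:
            refs.append(("webServiceInput", props["webServiceInput"]))
        # Assumption: webServiceInputs has the same name -> dataset shape.
        for key, name in props.get("webServiceInputs", {}).items():
            refs.append((f"webServiceInputs.{key}", name))
        for key, name in props.get("webServiceOutputs", {}).items():
            refs.append((f"webServiceOutputs.{key}", name))
        # Flag every referenced dataset whose type is not the blob type.
        for prop, name in refs:
            ds_type = datasets.get(name, {}).get("properties", {}).get("type")
            if ds_type != REQUIRED_TYPE:
                problems.append((activity.get("name"), prop, name, ds_type))
    return problems


# Trimmed copies of the pipeline and datasets from the question.
SAMPLE_PIPELINE = {
    "properties": {
        "activities": [
            {
                "name": "MLActivity",
                "type": "AzureMLBatchExecution",
                "typeProperties": {
                    "webServiceInput": "DatalakeInput",
                    "webServiceOutputs": {"output1": "DatalakeOutput"},
                    "webServiceInputs": {},
                },
            }
        ]
    }
}
SAMPLE_DATASETS = {
    "DatalakeInput": {"properties": {"type": "AzureDataLakeStore"}},
    "DatalakeOutput": {"properties": {"type": "AzureDataLakeStore"}},
}

if __name__ == "__main__":
    for problem in find_non_blob_ml_refs(SAMPLE_PIPELINE, SAMPLE_DATASETS):
        print(problem)
```

Run against the question's definitions, it flags both DatalakeInput and DatalakeOutput; with blob-typed datasets it returns an empty list.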




1 Answer

by (16.8k points)

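This is an issue with the AzureDataLakeStoreLinkedService: it authenticates with the session-based authorization and sessionId properties. Switch it to one of the following authentication methods.

Using service principal authentication: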


    {
        "name": "AzureDataLakeStoreLinkedService",
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://<accountname>",
                "servicePrincipalId": "<service principal id>",
                "servicePrincipalKey": {
                    "type": "SecureString",
                    "value": "<service principal key>"
                },
                "tenant": "<tenant info, e.g.>",
                "subscriptionId": "<subscription of ADLS>",
                "resourceGroupName": "<resource group of ADLS>"
            },
            "connectVia": {
                "referenceName": "<name of Integration Runtime>",
                "type": "IntegrationRuntimeReference"
            }
        }
    }
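
To avoid hand-editing the JSON for each environment, the service principal definition above can also be generated. A minimal Python sketch, assuming the key is supplied through a hypothetical ADLS_SP_KEY environment variable; the function name is illustrative, and it only reproduces the fields shown in the definition above.

```python
import json
import os


def service_principal_linked_service(
    account_uri: str,
    principal_id: str,
    principal_key: str,
    tenant: str,
    subscription_id: str,
    resource_group: str,
    integration_runtime: str,
    name: str = "AzureDataLakeStoreLinkedService",
) -> dict:
    """Build the service principal linked service definition as a dict (hypothetical helper)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": account_uri,
                "servicePrincipalId": principal_id,
                # The key is wrapped as a SecureString, as in the JSON definition.
                "servicePrincipalKey": {"type": "SecureString", "value": principal_key},
                "tenant": tenant,
                "subscriptionId": subscription_id,
                "resourceGroupName": resource_group,
            },
            "connectVia": {
                "referenceName": integration_runtime,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


if __name__ == "__main__":
    definition = service_principal_linked_service(
        account_uri="https://<accountname>",
        principal_id="<service principal id>",
        # Hypothetical environment variable so the secret stays out of source files.
        principal_key=os.environ.get("ADLS_SP_KEY", "<service principal key>"),
        tenant="<tenant info>",
        subscription_id="<subscription of ADLS>",
        resource_group="<resource group of ADLS>",
        integration_runtime="<name of Integration Runtime>",
    )
    print(json.dumps(definition, indent=4))
```

The printed JSON has the same shape as the definition above, so it can be pasted into the linked service or written to a file by a deployment script.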




Using managed service identity authentication:


    {
        "name": "AzureDataLakeStoreLinkedService",
        "properties": {
            "type": "AzureDataLakeStore",
            "typeProperties": {
                "dataLakeStoreUri": "https://<accountname>",
                "tenant": "<tenant info, e.g.>",
                "subscriptionId": "<subscription of ADLS>",
                "resourceGroupName": "<resource group of ADLS>"
            },
            "connectVia": {
                "referenceName": "<name of Integration Runtime>",
                "type": "IntegrationRuntimeReference"
            }
        }
    }
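
The service principal and managed identity definitions differ from the question's linked service only in how credentials are supplied. A small Python sketch that reports which of the three forms a definition uses, as a quick check before redeploying. The key-based heuristic is an assumption drawn from the JSON snippets on this page, and adls_auth_mode is a hypothetical helper.

```python
def adls_auth_mode(linked_service: dict) -> str:
    """Classify an AzureDataLakeStore linked service by its credential fields.

    Heuristic (assumption): the service principal form has servicePrincipalId
    and servicePrincipalKey, the question's form has authorization/sessionId,
    and the managed identity form has neither.
    """
    props = linked_service.get("properties", {}).get("typeProperties", {})
    if "servicePrincipalId" in props and "servicePrincipalKey" in props:
        return "service principal"
    if "authorization" in props or "sessionId" in props:
        return "session-based (authorization/sessionId)"
    return "managed service identity"


# Trimmed copy of the linked service from the question.
QUESTION_LINKED_SERVICE = {
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "",
            "authorization": "**********",
            "sessionId": "**********",
        },
    },
}

if __name__ == "__main__":
    print(adls_auth_mode(QUESTION_LINKED_SERVICE))
```

For the question's definition this prints the session-based label, which is the credential form the answer suggests replacing.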




You can find a similar reference here:

