0 votes
2 views
in Azure by (45.3k points)

I am trying to build a pipeline in Azure Data Factory V1 that runs an Azure ML Batch Execution activity on a file. I implemented it with a blob storage dataset as input and output, and it worked. However, I am now trying to change the input and output to a folder in my Data Lake Store. When I try to deploy it, I get the following error:

Entity provisioning failed: AzureML Activity 'MLActivity' specifies 'DatalakeInput' in a property that requires an Azure Blob Dataset reference.  

How can I use Data Lake Store datasets as the input and output instead of blob datasets?
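For reference, the blob-based input dataset that worked looked roughly like this (a minimal sketch; the names BlobInput and AzureStorageLinkedService are placeholders, not my actual definitions):

{
    "name": "BlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties": {
            "fileName": "data.csv",
            "folderPath": "mycontainer/RAW",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}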

Pipeline:

{
    "name": "MLPipeline",
    "properties": {
        "description": "use AzureML model",
        "activities": [
            {
                "type": "AzureMLBatchExecution",
                "typeProperties": {
                    "webServiceInput": "DatalakeInput",
                    "webServiceOutputs": {
                        "output1": "DatalakeOutput"
                    },
                    "webServiceInputs": {},
                    "globalParameters": {}
                },
                "inputs": [
                    {
                        "name": "DatalakeInput"
                    }
                ],
                "outputs": [
                    {
                        "name": "DatalakeOutput"
                    }
                ],
                "policy": {
                    "timeout": "02:00:00",
                    "concurrency": 3,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "MLActivity",
                "description": "description",
                "linkedServiceName": "MyAzureMLLinkedService"
            }
        ],
        "start": "2016-02-08T00:00:00Z",
        "end": "2016-02-08T00:00:00Z",
        "isPaused": false,
        "hubName": "hubname",
        "pipelineMode": "Scheduled"
    }
}

Output Dataset:

{
    "name": "DatalakeOutput",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "/DATA_MANAGEMENT/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

Input Dataset:

{
    "name": "DatalakeInput",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "fileName": "data.csv",
            "folderPath": "/RAW/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ","
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

AzureDataLakeStoreLinkedService:

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "description": "",
        "hubName": "xyzdatafactoryv1_hub",
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://xyzdatastore.azuredatalakestore.net/webhdfs/v1",
            "authorization": "**********",
            "sessionId": "**********",
            "subscriptionId": "*****",
            "resourceGroupName": "xyzresourcegroup"
        }
    }
}

1 Answer

0 votes
by (16.8k points)

This is an issue with the AzureDataLakeStoreLinkedService. Use service principal authentication, as shown below:

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
            "subscriptionId": "<subscription of ADLS>",
            "resourceGroupName": "<resource group of ADLS>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Alternatively, use managed service identity authentication:

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
            "subscriptionId": "<subscription of ADLS>",
            "resourceGroupName": "<resource group of ADLS>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
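Note that in either case the service principal or managed identity must also be granted the appropriate permissions (read/write/execute on the relevant folders) in the Data Lake Store account itself; the linked service definition only handles authentication.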

For more details, see the Azure Data Lake Store connector documentation:

https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-store
