0 votes
1 view
in Azure by (17.6k points)

I have a data factory with a pipeline copy activity like this:

{

  "type": "Copy",

  "name": "Copy from storage to SQL",

  "inputs": [

    {

      "name": "storageDatasetName"

    }

  ],

  "outputs": [

    {

      "name": "sqlOutputDatasetName"

    }

  ],

  "typeProperties": {

    "source": {

      "type": "BlobSource"

    },

    "sink": {

      "type": "SqlSink"

    }

  },

  "policy": {

    "concurrency": 1,

    "retry": 3

  },

  "scheduler": {

    "frequency": "Month",

    "interval": 1

  }

}

The input data is approx 90MB in size, about 1.5 million rows, broken into approx. 20 x 4.5MB block blob files in Azure Storage. Here's an example of the data (CSV):

A81001,1,1,1,2,600,3.0,0.47236654,141.70996,0.70854986 A81001,4,11,0,25,588,243.0,5.904582,138.87576,57.392536 A81001,7,4,1,32,1342,278.0,7.5578647,316.95795,65.65895 

The sink is an Azure SQL Server of type S2, which is rated at 50 DTUs. I've created a simple table with sensible data types, and no keys, indexes or anything fancy, just columns:

CREATE TABLE [dbo].[Prescriptions](

    [Practice] [char](6) NOT NULL,

    [BnfChapter] [tinyint] NOT NULL,

    [BnfSection] [tinyint] NOT NULL,

    [BnfParagraph] [tinyint] NOT NULL,

    [TotalItems] [int] NOT NULL,

    [TotalQty] [int] NOT NULL,

    [TotalActCost] [float] NOT NULL,

    [TotalItemsPerThousand] [float] NOT NULL,

    [TotalQtyPerThousand] [float] NOT NULL,

    [TotalActCostPerThousand] [float] NOT NULL

The source, sink and data factory are all in the same region (North Europe).

According to Microsoft's 'Copy activity performance and tuning guide', for Azure Storage Source and Azure SQL S2 sink, I should be getting about 0.4 MBps. By my calculation, that means 90MB should transfer in about half an hour (is that right?).

For some reason it copies 70,000 rows very quickly, then seems to hang. Using SQL management studio I can see the count of rows in the database table is exactly 70,000 and hasn't increased at all in 7 hours. Yet the copy task is still running with no errors:

Any ideas why this is hanging at 70,000 rows? I can't see anything unusual about the 70,001st data row which would cause a problem. I've tried completely trashing the data factory and starting again, and I always get the same behavior. I have another copy activity with a smaller table (8000 rows), which completes in 1 minute.

1 Answer

0 votes
by (47.2k points)
  • As there is a null value in one of the columns, your run was hanging at 70,000 rows where the row 76560 of your blob storage exists. 

  • Hive script which you have used to generate the blob file was written with NULL value as '\N', and your destination store(SQL Table) is also specified with 'NOT NULL' as a part of the column whose value was Float value actually.

  • Now you need to add the property to your blob dataset definition in this way: 

"nullValue": "\\N" and make your SQL table column Nullable.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...