0 votes
1 view
in Azure by (15.1k points)

I have an HDInsight Hadoop cluster (Linux, deployed separately) in an Azure VNet, with client IPs restricted by an NSG.

The Azure SQL firewall has an option called "Allow access to Azure services", which allows Data Factory to access Azure SQL.

In a VNet there is no such option: you have to either specify an IP address range or set a tag (Internet, Virtual Network, AzureLoadBalancer). I thought AzureLoadBalancer would solve the issue, but no, HDInsight is still hidden from Azure Data Factory.

I tried to find the Data Factory port ranges, without success.

Is there a way to access a secured HDInsight Linux cluster from Azure Data Factory?

2 Answers

0 votes
by (41.2k points)
Best answer
  • Azure Data Factory can only access resources that are publicly reachable. If your HDInsight cluster is in a VNet, it cannot be reached publicly, so Azure Data Factory cannot access or orchestrate it.

 

  • Azure Data Factory is not yet supported in a VNet environment; support is planned, but it will take some time to land.

by (150 points)
Azure Data Factory is now supported in a VNet environment.
0 votes
by (150 points)

Hello,

As the Hadoop cluster is inside a virtual network, you need to install a self-hosted integration runtime (IR) in the same virtual network.

You need to create a new VM, join it to the same virtual network, and install the self-hosted IR on it.

The self-hosted IR allows the Data Factory service to dispatch processing requests to a compute service such as HDInsight inside a virtual network. It also lets you move data between data stores inside a virtual network and Azure. A self-hosted IR is also what you use when the data store or compute is in an on-premises environment.
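To make the routing concrete, here is a minimal sketch of an HDInsight linked service definition that points at a self-hosted IR through its `connectVia` property. It only builds and prints the JSON; it does not call Azure. All names and values (`MyHDInsightLinkedService`, `MySelfHostedIR`, `MyStorageLinkedService`, the cluster URI and the credentials) are placeholders I made up for illustration, so check the property names against the current Data Factory documentation before using it.

```python
import json


def hdinsight_linked_service(name, cluster_uri, user_name, password,
                             storage_linked_service, integration_runtime):
    """Build a Data Factory linked service definition for an HDInsight
    cluster that is reached through a self-hosted integration runtime.

    All argument values are supplied by the caller; nothing here is
    looked up from Azure.
    """
    return {
        "name": name,
        "properties": {
            "type": "HDInsight",
            "typeProperties": {
                "clusterUri": cluster_uri,
                "userName": user_name,
                "password": {"type": "SecureString", "value": password},
                # Storage account used by the cluster, defined as its
                # own linked service in the same data factory.
                "linkedServiceName": {
                    "referenceName": storage_linked_service,
                    "type": "LinkedServiceReference",
                },
            },
            # This is the part that sends requests via the self-hosted
            # IR installed on a VM inside the virtual network.
            "connectVia": {
                "referenceName": integration_runtime,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Placeholder values (assumptions, not taken from the question).
definition = hdinsight_linked_service(
    name="MyHDInsightLinkedService",
    cluster_uri="https://<clustername>.azurehdinsight.net/",
    user_name="<cluster user>",
    password="<cluster password>",
    storage_linked_service="MyStorageLinkedService",
    integration_runtime="MySelfHostedIR",
)

print(json.dumps(definition, indent=2))
```

The printed JSON can then be used as the body of the linked service when you create it in the portal, with PowerShell, or through the REST API. In practice you would keep the password in Azure Key Vault rather than inline.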

For more details, refer to “Transform data in Azure Virtual Network using Hive activity in ADF”.

Hope this helps.
