How TensorFlow and Apache Spark Simplify Deep Learning

In this blog, we'll discuss how to use Apache Spark and TensorFlow together for deep learning models. You will also learn how Spark and a cluster of machines can improve deep learning pipelines built with TensorFlow. TensorFlow is a framework released by Google for building neural networks, and it is designed for the power user.

 22nd May, 2019

TensorFlow is a framework released by Google for state-of-the-art numerical computation and neural networks. Packages like TensorFlow are designed for power users. In every application you have to build a computation graph from scratch and write a lot of supporting code around it, for example to check a few parameters or keep track of an experiment. Scaling out is not built in either and requires further work.

You can run TensorFlow on a distributed back end, but you have to decide yourself which part of the computation goes on which device. The issues don't end there: it is also hard to expose these models in larger applications.

To address these problems, Databricks created Deep Learning Pipelines, a library that integrates well with Apache Spark's ML Pipelines. The good thing about ML Pipelines is that saving a model, loading it back later, evaluating it, and running a parameter search over different models are all built into the APIs. Deep Learning Pipelines offers similar features, which are useful when developing AI applications, and it has strong support for TensorFlow. Combining the two gives you the benefits of both TensorFlow and Deep Learning Pipelines.

A few lines of code are enough to build a use case. On Spark, everything scales out automatically. Models are also easy to expose, which was not the case with the earlier low-level APIs: they can be used in batch or streaming applications and in Spark SQL.

Neural networks built with these frameworks are very useful in areas such as image recognition and automated translation. Most of the time, deep learning computations with TensorFlow run on a single node. So you may wonder why Spark, a parallel processing framework, is used at all. Read on to find out.

Hyperparameter Tuning   

Spark can be used for hyperparameter tuning, that is, to find the best set of hyperparameters for neural network training. This can reduce training time by a factor of ten and lower the error rate by 34%.

Deep learning is a branch of machine learning built on Artificial Neural Networks (ANNs). Complex inputs such as images are passed through a series of strong mathematical transforms. The result is a vector of signals that is easier for machine learning algorithms to manipulate. Artificial neural networks perform this transformation by mimicking the way the human brain works.
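To make the idea of "transforms that turn an input into a vector of signals" concrete, here is a minimal sketch in plain Python. It is not TensorFlow code. The four-pixel input, the layer sizes, and all weights are made-up values chosen only for illustration, and `dense_layer` is a hypothetical helper name.

```python
import math


def dense_layer(inputs, weights, biases):
    """Apply one fully connected layer followed by a tanh activation."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        # Weighted sum of all inputs for this neuron, plus its bias.
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs


# A hypothetical 4-pixel "image" flattened into a list of intensities.
pixels = [0.0, 0.5, 1.0, 0.25]

# Hand-picked illustrative weights: 4 inputs -> 3 hidden neurons -> 2 outputs.
hidden_weights = [
    [0.2, -0.1, 0.4, 0.0],
    [-0.3, 0.8, 0.1, 0.5],
    [0.05, 0.05, -0.6, 0.9],
]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [
    [0.7, -0.2, 0.3],
    [-0.5, 0.6, 0.1],
]
output_biases = [0.0, 0.0]

hidden = dense_layer(pixels, hidden_weights, hidden_biases)
features = dense_layer(hidden, output_weights, output_biases)
print(features)  # a 2-element vector of signals derived from the 4 pixels
```

In a real network the weights are learned during training rather than written by hand, and a library such as TensorFlow handles that part.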

The TensorFlow library can automate the creation of training algorithms for neural networks of various shapes and sizes. Still, building a neural network is more complicated than running a function on a dataset. There are several hyperparameters to choose, and the right choice boosts the performance of the model. Machine learning practitioners rerun the same model many times with different hyperparameters to find the best fit. This process is called hyperparameter tuning.
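The "rerun the same model with different hyperparameters" loop can be sketched as a small grid search. This is a toy under stated assumptions: the "model" is a single weight fitted by gradient descent to made-up data for y = 3x, and `train_and_score`, the grid values, and the data are all hypothetical. A real run would train a TensorFlow network at this step.

```python
import itertools

# Toy data for the relationship y = 3x (illustrative values only).
DATA = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]


def train_and_score(learning_rate, steps):
    """Fit a single weight by gradient descent; return the final mean squared error."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)


# Hypothetical search space: every combination gets its own training run.
grid = {"learning_rate": [0.001, 0.01, 0.05], "steps": [10, 50]}

results = []
for lr, steps in itertools.product(grid["learning_rate"], grid["steps"]):
    results.append(((lr, steps), train_and_score(lr, steps)))

# Keep the combination with the lowest error.
best_params, best_error = min(results, key=lambda item: item[1])
print(best_params, best_error)
```

Each run in the loop is independent of the others, which is what makes this kind of search easy to spread across machines.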


How to choose the right hyperparameters

Certain factors need to be considered when choosing hyperparameters.

Number of neurons – Too few neurons reduce the expressive power of the network, while too many neurons introduce noise.

Learning rate – If the learning rate is too high, the neural network will only learn from the last states it has seen. If it is too low, the network will take too long to reach a good state.
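The effect of the learning rate can be seen even on a toy problem. The sketch below minimizes the made-up function (w - 3)^2 with plain gradient descent. It is not a neural network, and the three rates are arbitrary illustrative values. It does not cover the number of neurons.

```python
def distance_from_optimum(learning_rate, steps=50):
    """Run gradient descent on (w - 3)^2 and return how far w ends from 3."""
    w = 0.0
    for _ in range(steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
    return abs(w - 3.0)


too_low = distance_from_optimum(0.001)  # barely moves in 50 steps
good = distance_from_optimum(0.3)       # settles very close to the optimum
too_high = distance_from_optimum(1.1)   # overshoots further on every step

print(too_low, good, too_high)
```

With the low rate, the weight is still far from the optimum after 50 steps. With the high rate, every update overshoots and the distance keeps growing. Only the middle value reaches a good state quickly.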

The tuning runs are independent of each other, so the process parallelizes well even though TensorFlow itself is not distributed. This is why Apache Spark is used. Spark can broadcast the common elements, such as the data and the model description, and then schedule the individual repetitive computations across a cluster of machines in a fault-tolerant manner.
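The broadcast-then-schedule pattern described above might look roughly like the sketch below. The Spark branch uses the PySpark calls `SparkContext.broadcast`, `SparkContext.parallelize`, `RDD.map`, and `RDD.collect`. Everything else is an assumption for illustration: `train_one` is a toy stand-in for a real TensorFlow training run, `LocalBroadcast` and `tune` are hypothetical helpers, and the data and grid are made up. Without a SparkContext, the code falls back to a sequential loop so it can run on a laptop.

```python
class LocalBroadcast:
    """Hypothetical stand-in for a Spark broadcast variable when running locally."""

    def __init__(self, value):
        self.value = value


def train_one(params, data):
    """Toy training run: fit y = w * x by gradient descent and report the error."""
    learning_rate, steps = params
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    error = sum((w * x - y) ** 2 for x, y in data) / len(data)
    return params, error


def tune(grid, data, sc=None):
    """Score every hyperparameter combination, on Spark when `sc` is given."""
    if sc is None:
        # Local fallback: same logic, run sequentially on one machine.
        shared = LocalBroadcast(data)
        return [train_one(params, shared.value) for params in grid]
    # Ship the shared data to each worker once instead of once per task.
    shared = sc.broadcast(data)
    # One partition per combination, so each training run is its own task.
    return (
        sc.parallelize(grid, len(grid))
        .map(lambda params: train_one(params, shared.value))
        .collect()
    )


data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]
grid = [(lr, steps) for lr in (0.001, 0.01, 0.05) for steps in (10, 50)]

results = tune(grid, data)  # pass sc=SparkContext.getOrCreate() to use a cluster
best_params, best_error = min(results, key=lambda item: item[1])
print(best_params, best_error)
```

On a cluster, Spark retries a task if it fails on a worker, which is where the fault tolerance mentioned above comes from.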

With the default set of hyperparameters, the accuracy is around 99.2%. The computations scale linearly as nodes are added to the cluster. Suppose we have a 13-node cluster: we can then train 13 models in parallel, which gives up to a 7x speedup compared with training the models one at a time on a single machine.
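As a quick check on these figures, the snippet below compares the reported speedup with the ideal case in which 13 nodes would be exactly 13 times faster. It only does arithmetic on the two numbers quoted in this post, and the variable names are illustrative.

```python
nodes = 13
reported_speedup = 7.0

# In the ideal case, N nodes training N models in parallel are N times faster.
ideal_speedup = float(nodes)

# Fraction of the ideal speedup that is actually achieved.
efficiency = reported_speedup / ideal_speedup
print(f"parallel efficiency: {efficiency:.0%}")
```

So the quoted 7x on 13 nodes corresponds to a little over half of the ideal speedup.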

Key Takeaway

Deep learning continues to strengthen its claim to being the future of AI. Not long ago, self-driving cars seemed out of reach; now they are a reality. TensorFlow is a library that keeps receiving enhancements from big players in the technology sphere, and it is not alone: Amazon backs MXNet, which works well on multiple nodes. Research in deep learning is ongoing, and as new libraries appear, it should be no surprise that more sophisticated deep learning applications can be built easily with TensorFlow and Deep Learning Pipelines. As these applications demand high-speed parallel processing, Spark will be there to provide it.
