+1 vote
in Machine Learning by (4.2k points)

My machine has the following spec:

CPU: Xeon E5-1620 v4

GPU: Titan X (Pascal)

Ubuntu 16.04

Nvidia driver 375.26

CUDA toolkit 8.0

cuDNN 5.1

I've benchmarked the following Keras examples with TensorFlow as the backend, for reference:

SCRIPT NAME                  GPU       CPU
(LSTM example 1)             5sec      5sec
(LSTM example 2)             10sec     12sec
(LSTM example 3)             240sec    116sec
(LSTM example 4)             113sec    106sec

My GPU clearly outperforms my CPU on non-LSTM models.

SCRIPT NAME                  GPU       CPU
(non-LSTM example 1)         12sec     123sec
(non-LSTM example 2)         5sec      119sec
(non-LSTM example 3)         3sec      47sec

Has anyone else experienced this?

1 Answer

+1 vote
by (6.8k points)

This is mainly due to the sequential computation in the LSTM layer. An LSTM processes its input sequentially, computing the hidden states iteratively: you must wait for the hidden state at time t-1 before you can compute the hidden state at time t.
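The dependency can be seen in a minimal sketch. This uses a plain tanh RNN cell rather than a full LSTM (a real LSTM adds gates, but the step-to-step dependency is the same), and the sizes are made up for illustration:

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
T, H = 10, 4                        # timesteps, hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((T, H))     # input sequence
W = rng.standard_normal((H, H)) * 0.1  # input-to-hidden weights
U = rng.standard_normal((H, H)) * 0.1  # hidden-to-hidden weights

# Simplified recurrence: step t cannot start until h[t-1] exists,
# so the loop is inherently sequential and cannot be unrolled
# across timesteps in parallel.
h = np.zeros((T, H))
h_prev = np.zeros(H)
for t in range(T):
    h[t] = np.tanh(x[t] @ W + h_prev @ U)
    h_prev = h[t]
```

No matter how many cores are available, the `for` loop above still runs one step at a time.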

That's a poor fit for GPU cores: a GPU is made of many small cores designed for parallel computation, and a sequential workload can't fully utilize them. That's why we see GPU load around 10% - 20% most of the time.
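One way to picture the contrast: a dense (non-recurrent) layer hands the GPU one large matrix multiply over the whole sequence at once, while a recurrent layer hands it many small, dependent ones. A toy numpy sketch, with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
T, H = 100, 64                      # timesteps, hidden units
W = rng.standard_normal((H, H)) * 0.1
xs = rng.standard_normal((T, H))

# Feed-forward style: all T rows go through one large matmul,
# exposing T*H*H independent multiply-adds the hardware can
# schedule in parallel.
ff_out = xs @ W

# Recurrent style: T tiny matvecs, each waiting on the previous
# result, so only H*H multiply-adds are available at any moment.
h = np.zeros(H)
for t in range(T):
    h = np.tanh(xs[t] @ W + h)
```

Both loops do similar total arithmetic, but only the first exposes it all at once, which is what the GPU's many cores need to stay busy.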

During backpropagation, however, the GPU can run the derivative computations in parallel, which is why we see GPU load peak at around 80% in that phase.

For more on LSTMs, check out the Recurrent Neural Network tutorial by Intellipaat. It will also help a newcomer crack Machine Learning interview questions.
