I'm just learning to use TensorFlow's tf.data API, and I've found that it slows my code down a lot, measured in time per epoch, which is the opposite of what it's supposed to do. I wrote a simple linear regression program to test it out.

TL;DR: With 100,000 training examples, tf.data slows time per epoch down by about a factor of ten if you're using full-batch training, and it's worse with smaller batches. The opposite is true with 500 training examples, where tf.data is faster.

My question: What is going on? Is my implementation flawed? Other sources I've read report tf.data improving speeds by about 30%.

```python
import tensorflow as tf
import numpy as np
import timeit

def regress_without_tfData(n_epochs, input_dimension, training_inputs, training_labels):
    tf.reset_default_graph()
    # inputs and labels are fed via feed_dict
    X = tf.placeholder(tf.float32, shape=(None, input_dimension))
    Y = tf.placeholder(tf.float32, shape=(None, 1))
    weights = tf.get_variable("weights",
                              initializer=np.random.randn(input_dimension, 1).astype(np.float32))
    prediction = tf.matmul(X, weights)
    loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
    loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for _ in range(n_epochs):
            sess.run(loss_op, feed_dict={X: training_inputs, Y: training_labels})
```

The tf.data version builds the same graph, but reads X and Y from the dataset iterator instead of placeholders:

```python
# tf.data version: batches come from the iterator, not from feed_dict
X, Y = data_set.make_one_shot_iterator().get_next()
prediction = tf.matmul(X, weights)
loss = tf.reduce_mean(tf.square(tf.subtract(prediction, Y)))
loss_op = tf.train.AdamOptimizer(.01).minimize(loss)
init = tf.global_variables_initializer()
```

For the runs without tf.data, I generate the data like this:

```python
for input_dimension in input_dimensions_list:
    for data_size in [500, 100000]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
```

And for the tf.data runs:

```python
for input_dimension in input_dimensions_list:
    for data_size, batch_size in [(500, 50), (500, 500), (100000, 50), (100000, 100000)]:
        training_inputs = np.random.randn(data_size, input_dimension).astype(np.float32)
        random_covector = np.random.randint(-5, 5, size=(input_dimension, 1))
        training_labels = function_to_approximate(training_inputs)
        data_set = tf.data.Dataset.from_tensor_slices((training_inputs, training_labels))
        data_set = data_set.repeat(n_epochs)
        data_set = data_set.batch(batch_size)
```
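The timing calls themselves aren't shown above; roughly, each configuration is timed with timeit and the results averaged. A minimal, self-contained sketch of that pattern (the stand-in workload and repeat counts here are my own placeholders, not the real regression calls):

```python
import timeit

# Hypothetical stand-in for one timed workload, e.g. a call like
# regress_without_tfData(n_epochs, input_dimension, training_inputs, training_labels)
def run_10_epochs():
    total = 0.0
    for _ in range(10):  # 10 "epochs" of dummy work
        total += sum(x * x for x in range(1000))
    return total

# timeit.repeat runs the workload several times with number=1 call each;
# averaging gives a figure like "an average of ... seconds to run 10 epochs"
times = timeit.repeat(run_10_epochs, number=1, repeat=3)
average = sum(times) / len(times)
print(average)
```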

This outputs for me:

```
Not using tf.data, with data size 500, input dimension 10 and training with a full batch, it took an average of 0.20243382899980134 seconds to run 10 epochs.

Not using tf.data, with data size 100000, input dimension 10 and training with a full batch, it took an average of 0.2431719040000644 seconds to run 10 epochs.

Using tf.data, with data size 500, and input dimension 10, and training with batch size 50, it took an average of 0.09512088866661846 seconds to run 10 epochs.

Using tf.data, with data size 500, and input dimension 10, and training with batch size 500, it took an average of 0.07286913600000844 seconds to run 10 epochs.

Using tf.data, with data size 100000, and input dimension 10, and training with batch size 50, it took an average of 4.421892363666605 seconds to run 10 epochs.

Using tf.data, with data size 100000, and input dimension 10, and training with batch size 100000, it took an average of 2.2555197536667038 seconds to run 10 epochs.
```
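My current guess (which may be wrong) is per-batch dispatch overhead: with 100,000 examples and batch size 50, each epoch issues 2,000 separate batch computations instead of one. A plain-NumPy sketch of that general effect, which is not tf.data's internals and uses names I made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100000, 10)).astype(np.float32)   # mimics training_inputs
weights = rng.standard_normal((10, 1)).astype(np.float32)

def full_batch():
    # one vectorized pass over all 100,000 rows
    return data @ weights

def small_batches(batch_size=50):
    # 2,000 tiny passes: identical arithmetic, far more per-call overhead
    return np.vstack([data[i:i + batch_size] @ weights
                      for i in range(0, len(data), batch_size)])
```

Both functions compute identical predictions; timing them (e.g. with timeit) shows the batched loop paying for thousands of dispatches per pass, which is the shape of the slowdown I'm seeing.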