
I'm implementing a model relying on 3D convolutions (for a task similar to action recognition) and I want to use batch normalization (see [Ioffe & Szegedy 2015]). I could not find any tutorial focusing on 3D convs, so I'm writing a short one here, which I'd like you to review.

The code below refers to TensorFlow r0.12 and explicitly instantiates the variables, i.e. I'm not using tf.contrib.learn, except for the tf.contrib.layers.batch_norm() function. I'm doing this both to better understand how things work under the hood and to have more implementation freedom (e.g., variable summaries).

I will get to the 3D convolution case step by step, first writing the example for a fully-connected layer, then for a 2D convolution, and finally for the 3D case. While going through the code, it would be great if you could check that everything is done correctly; the code runs, but I'm not 100% sure about the way I apply batch normalization. I end this post with a more detailed question.

import tensorflow as tf

# This flag is used to allow/prevent batch normalization params updates
# depending on whether the model is being trained or used for prediction.
training = tf.placeholder_with_default(True, shape=())

Fully-connected (FC) case

# Input.
INPUT_SIZE = 512
u = tf.placeholder(tf.float32, shape=(None, INPUT_SIZE))

# FC params: weights only, no bias as per [Ioffe & Szegedy 2015].
FC_OUTPUT_LAYER_SIZE = 1024
w = tf.Variable(tf.truncated_normal(
    [INPUT_SIZE, FC_OUTPUT_LAYER_SIZE], dtype=tf.float32, stddev=1e-1))

# Layer output with no activation function (yet).
fc = tf.matmul(u, w)

# Batch normalization.
fc_bn = tf.contrib.layers.batch_norm(
    fc,
    center=True,
    scale=True,
    is_training=training,
    scope='fc-batch_norm')

# Activation function.
fc_bn_relu = tf.nn.relu(fc_bn)
print(fc_bn_relu)  # Tensor("Relu:0", shape=(?, 1024), dtype=float32)
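To convince myself of what batch_norm computes here, a quick NumPy sketch of the training-time math for the FC case (eps, gamma, beta and the toy shapes are my own stand-ins for what batch_norm creates when center=True and scale=True):

```python
import numpy as np

np.random.seed(0)
eps = 1e-3                      # stand-in for batch_norm's epsilon
x = np.random.randn(64, 1024)   # a toy batch of FC outputs: (batch, features)

# Per-feature statistics over the batch axis, as batch norm does in training.
mean = x.mean(axis=0)
var = x.var(axis=0)

gamma = np.ones(1024)           # "scale" variable (scale=True)
beta = np.zeros(1024)           # "center" variable (center=True)

x_bn = gamma * (x - mean) / np.sqrt(var + eps) + beta

# Each output feature is (approximately) zero-mean, unit-variance.
print(np.abs(x_bn.mean(axis=0)).max())  # ~0
print(x_bn.std(axis=0).mean())          # ~1
```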

I guess the code above also carries over to the 3D conv case. Indeed, when I define my model and print all the trainable variables, I also see the expected number of beta and gamma variables. For instance:

Tensor("conv3a/conv3d_weights/read:0", shape=(3, 3, 3, 128, 256), dtype=float32)
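To double-check the 3D case conceptually, here is a NumPy sketch (toy shapes and names are my own) of what batch norm does to a 5D NDHWC activation: statistics are pooled over every axis except channels, so a conv3d layer with 256 output channels should contribute one beta and one gamma of shape (256,):

```python
import numpy as np

np.random.seed(0)
# A fake conv3d activation in NDHWC layout: (batch, depth, height, width, channels).
x = np.random.randn(2, 4, 8, 8, 256)

# Batch norm for conv layers normalizes per channel, pooling statistics over
# every other axis; this is what makes the FC recipe carry over to 3D convs.
axes = (0, 1, 2, 3)
mean = x.mean(axis=axes)
var = x.var(axis=axes)

gamma = np.ones(256)   # one scale per channel, hence gamma.shape == (256,)
beta = np.zeros(256)   # one shift per channel, hence beta.shape == (256,)

x_bn = gamma * (x - mean) / np.sqrt(var + 1e-3) + beta
print(x_bn.shape)      # (2, 4, 8, 8, 256), same shape as the input
```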

Answer

In your case, batchnorm can be applied to any tensor of rank greater than 1.

A "standard" 2D batchnorm can be significantly faster in TensorFlow than the 3D (or higher) case, because it supports the fused_batch_norm implementation, which performs the whole operation in a single kernel:

Fused batch norm combines the multiple operations needed to do batch normalization into a single kernel. Batch norm is an expensive process that for some models makes up a large percentage of the operation time. Using fused batch norm can result in a 12%-30% speedup.

There is an issue on GitHub asking for fused support for 3D filters as well, but there hasn't been any recent activity and, at this point, the issue has been closed unresolved.

Although the original paper prescribes applying batchnorm before the ReLU activation (and that's what you did in the code above), there is some evidence that it may be better to apply batchnorm after the activation.
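Just to make the two orderings concrete, a toy NumPy comparison (the minimal bn helper is mine, with no learned scale/shift; this only illustrates the structural difference, not which ordering trains better):

```python
import numpy as np

def bn(x, eps=1e-3):
    # Toy per-feature batch norm (no gamma/beta) for illustration only.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

np.random.seed(1)
x = np.random.randn(32, 8)

pre_act = np.maximum(bn(x), 0.0)   # paper's ordering: BN -> ReLU (as in the question)
post_act = bn(np.maximum(x, 0.0))  # alternative ordering: ReLU -> BN

# With BN after ReLU the layer output is zero-mean per feature, even though
# ReLU discarded the negative half; with BN before ReLU it is not.
print(np.abs(post_act.mean(axis=0)).max())  # ~0
print(pre_act.mean(axis=0).min())           # strictly > 0
```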

For anyone interested in applying the idea of normalization in practice, there have been recent research developments of this idea, namely weight normalization and layer normalization, which fix certain disadvantages of the original batchnorm; for example, they work better for LSTMs and other recurrent networks.
