
I was reading the TensorFlow documentation for tf.nn.conv2d, but I can't understand what it does or what it is trying to achieve. The docs say:

1. Flattens the filter to a 2-D matrix with shape [filter_height * filter_width * in_channels, output_channels].

Now, what does that do? Is that element-wise multiplication or plain matrix multiplication? I also couldn't understand the other two points in the docs, which I've copied below:

2. Extracts image patches from the input tensor to form a virtual tensor of shape [batch, out_height, out_width, filter_height * filter_width * in_channels].

3. For each patch, right-multiplies the filter matrix and the image patch vector.

It would be really helpful if someone could give an example, ideally a piece of code, and explain what is going on and why the operation works this way.

I've tried coding a small example and printing the shape of the result, but I still don't understand it.

I tried something like this:

Code:

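import tensorflow as tf  # TensorFlow 1.x API (tf.random_normal, tf.Session)

# input:  [batch=1, in_height=10, in_width=10, in_channels=10]
# filter: [filter_height=2, filter_width=10, in_channels=10, out_channels=10]
op = tf.shape(tf.nn.conv2d(tf.random_normal([1, 10, 10, 10]),
                           tf.random_normal([2, 10, 10, 10]),
                           strides=[1, 2, 2, 1], padding='SAME'))

with tf.Session() as sess:
    result = sess.run(op)
    print(result)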

I understand bits and pieces of convolutional neural networks from what I have studied, but TensorFlow's implementation is not what I expected, hence this question.

1 Answer


It helps to understand what a convolutional layer does before implementing one with different filters. A convolution transforms the original signal (here, an image) into a new form that makes certain information easier to extract. In image processing, convolutions have long been used to blur and sharpen images, among many other applications.
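A minimal NumPy sketch of the blurring case, assuming NumPy 1.20 or later for `sliding_window_view`: a 3x3 box filter, whose weights are all 1/9, replaces each pixel with the mean of its neighbourhood, so a single bright pixel gets spread over the surrounding area. The toy image and the box filter are illustrative choices, not anything taken from TensorFlow.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Illustrative 5x5 image: dark everywhere except one bright centre pixel.
image = np.zeros((5, 5))
image[2, 2] = 9.0

# 3x3 box-blur kernel: every weight is 1/9, so each output pixel is the
# mean of the 3x3 neighbourhood under the filter.
box = np.full((3, 3), 1.0 / 9.0)

# All 3x3 windows of the image (no padding), shape (3, 3, 3, 3).
windows = sliding_window_view(image, box.shape)

# Multiply each window element-wise by the kernel and sum -> shape (3, 3).
blurred = np.einsum("rcij,ij->rc", windows, box)
print(blurred)  # a 3x3 array whose entries are all (approximately) 1.0
```

The bright value 9 ends up shared evenly across a 3x3 area, which is exactly the smoothing effect of a blur.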

When the image is larger than the filter, we slide the filter across the image and compute the convolution at each position: the overlapping values are multiplied element-wise and then summed. Each position produces one pixel of the output image. This two-dimensional case, commonly abbreviated conv2d, is the most widely used form.
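The sliding-window procedure can be written out directly. Below is a hypothetical helper, `slide_filter`, for a single-channel image with stride 1 and no padding (what TensorFlow calls "VALID"); it is a sketch for understanding, not an efficient implementation.

```python
import numpy as np


def slide_filter(image, kernel):
    """Naive 2-D sliding-window convolution: one channel, stride 1, no padding.

    Like tf.nn.conv2d, the kernel is not flipped, so strictly speaking
    this is cross-correlation.
    """
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(image, kernel))
    for r in range(out_h):
        for c in range(out_w):
            # Each filter position produces exactly one output pixel:
            # multiply the overlapping values element-wise, then sum them.
            out[r, c] = np.sum(image[r:r + k_h, c:c + k_w] * kernel)
    return out
```

With a 4x4 image and a 3x3 kernel, as in the example further down in this answer, the filter fits in 2x2 positions, so the output is 2x2.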

TensorFlow's conv2d computes convolutions over a whole batch of images and uses 4-D tensors: the input has shape [batch, in_height, in_width, in_channels] and the kernel has shape [filter_height, filter_width, in_channels, out_channels].
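To address the question's "element-wise or matrix multiplication?" directly: the three steps quoted from the docs add up to one ordinary matrix multiplication, where each patch row is dotted with each filter column. Below is a NumPy sketch of those steps (a technique often called im2col). `conv2d_im2col` is a hypothetical name; it assumes the same stride along height and width, and the SAME-padding arithmetic follows TensorFlow's documented rule of putting any odd extra row or column at the bottom/right. It illustrates the math only; TensorFlow's real kernels are not necessarily implemented this way.

```python
import math

import numpy as np


def conv2d_im2col(x, w, stride=1, padding="VALID"):
    """Sketch of the three steps described in the tf.nn.conv2d docs.

    x: input, shape [batch, in_height, in_width, in_channels] (NHWC)
    w: filter, shape [filter_height, filter_width, in_channels, out_channels]
    stride: used for both height and width, i.e. strides=[1, stride, stride, 1]
    """
    batch, in_h, in_w, in_c = x.shape
    f_h, f_w, f_c, out_c = w.shape
    assert f_c == in_c, "filter in_channels must match input in_channels"

    if padding == "SAME":
        out_h, out_w = math.ceil(in_h / stride), math.ceil(in_w / stride)
        # Total padding so the last window fits; odd extra goes bottom/right.
        pad_h = max((out_h - 1) * stride + f_h - in_h, 0)
        pad_w = max((out_w - 1) * stride + f_w - in_w, 0)
        x = np.pad(x, ((0, 0),
                       (pad_h // 2, pad_h - pad_h // 2),
                       (pad_w // 2, pad_w - pad_w // 2),
                       (0, 0)))
    else:  # "VALID": no padding, only positions where the filter fully fits
        out_h = (in_h - f_h) // stride + 1
        out_w = (in_w - f_w) // stride + 1

    # Step 1: flatten the filter to [f_h * f_w * in_c, out_c].
    w_mat = w.reshape(f_h * f_w * in_c, out_c)

    # Step 2: extract patches into a "virtual" tensor of shape
    # [batch, out_h, out_w, f_h * f_w * in_c].
    patches = np.empty((batch, out_h, out_w, f_h * f_w * in_c), dtype=x.dtype)
    for r in range(out_h):
        for c in range(out_w):
            window = x[:, r * stride:r * stride + f_h,
                       c * stride:c * stride + f_w, :]
            patches[:, r, c, :] = window.reshape(batch, -1)

    # Step 3: right-multiply each patch vector by the filter matrix.
    # This is a plain matrix product, giving [batch, out_h, out_w, out_c].
    return patches @ w_mat
```

Called on arrays shaped like those in the question, `conv2d_im2col(np.random.randn(1, 10, 10, 10), np.random.randn(2, 10, 10, 10), stride=2, padding="SAME")` returns shape (1, 5, 5, 10): with SAME padding each spatial size becomes ceil(10 / 2) = 5, and the last dimension is the filter's out_channels.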

For example:

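import tensorflow as tf  # TensorFlow 1.x API (uses tf.Session below)

k = tf.constant([
    [1, 0, 1],
    [2, 1, 0],
    [0, 0, 1]
], dtype=tf.float32, name='k')

i = tf.constant([
    [4, 3, 1, 0],
    [2, 1, 0, 1],
    [1, 2, 4, 1],
    [3, 1, 0, 2]
], dtype=tf.float32, name='i')

# conv2d needs a 4-D kernel [filter_height, filter_width, in_channels, out_channels]
# and a 4-D input [batch, in_height, in_width, in_channels].
kernel = tf.reshape(k, [3, 3, 1, 1], name='kernel')
image = tf.reshape(i, [1, 4, 4, 1], name='image')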

Then run the convolution in a session to compute its value:

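# strides=[1, 1, 1, 1] moves the filter one pixel at a time; "VALID" means no padding.
# tf.squeeze drops the size-1 batch and channel dimensions, leaving a 2x2 result.
res = tf.squeeze(tf.nn.conv2d(image, kernel, [1, 1, 1, 1], "VALID"))

with tf.Session() as sess:
    print(sess.run(res))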

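This prints the 2x2 result of the convolution, [[14, 6], [6, 12]]. Note that tf.nn.conv2d does not flip the kernel, so strictly speaking it computes a cross-correlation, which is the usual convention for convolutional layers.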
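To check that result without TensorFlow, here is a NumPy sketch that recomputes the same example by following the three documented steps from the question. It assumes NumPy 1.20 or later for `sliding_window_view`; the names `k`, `i` and `res` simply mirror the TensorFlow snippet.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

k = np.array([[1, 0, 1],
              [2, 1, 0],
              [0, 0, 1]], dtype=np.float32)
i = np.array([[4, 3, 1, 0],
              [2, 1, 0, 1],
              [1, 2, 4, 1],
              [3, 1, 0, 2]], dtype=np.float32)

# Step 1: flatten the 3x3 kernel into a 9x1 matrix
# (in_channels = out_channels = 1, so 3 * 3 * 1 rows and 1 column).
k_mat = k.reshape(9, 1)

# Step 2: extract every 3x3 patch (VALID, stride 1). That gives a 2x2 grid
# of patches; flattening each patch to a row of 9 values gives a 4x9 matrix.
patches = sliding_window_view(i, (3, 3)).reshape(4, 9)

# Step 3: a single matrix product computes all four output pixels at once.
res = (patches @ k_mat).reshape(2, 2)
print(res)  # values: [[14, 6], [6, 12]]
```

Each row of `patches` dotted with `k_mat` is the "multiply element-wise and sum" of one filter position, so the docs' matrix multiplication and the sliding-window picture give the same numbers.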

...