
I am running a CNN for a classification problem. I have 3 conv layers with 3 pooling layers. P3 is the output of the last pooling layer, with dimensions [Batch_size, 4, 12, 48], and I want to flatten it into a matrix of size [Batch_size, 2304], where 2304 = 4*12*48. I had been working with "Option A" (see below) for a while, but one day I wanted to try "Option B", which should in theory give me the same result. However, it did not. I checked the following thread beforehand:

https://intellipaat.com/community/733/is-tf-contrib-layers-flatten-x-the-same-as-tf-reshape-x-n-1

but that only added to the confusion, since trying "Option C" (taken from that thread) gave yet another result.

P3 = tf.nn.max_pool(A3, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
P3_shape = P3.get_shape().as_list()

P = tf.contrib.layers.flatten(P3)                                   # Option A
P = tf.reshape(P3, [-1, P3_shape[1] * P3_shape[2] * P3_shape[3]])   # Option B
P = tf.reshape(P3, [tf.shape(P3)[0], -1])                           # Option C

I am more inclined to go with "Option B", since that is the one I have seen in a video by Dandelion Mane, but I would like to understand why these 3 options are giving different results.

by (33.1k points)

In a CNN, the output of the last pooling layer is usually flattened so that it can be fed to a fully connected layer. Flattening only rearranges the tensor into shape [Batch_size, features]; it does not change the values themselves.

All three options can be used to flatten the tensor. A small test makes this easy to check:

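import numpy as np
import tensorflow as tf  # TensorFlow 1.x (tf.placeholder, tf.contrib and tf.Session are 1.x APIs)

# Small stand-in for P3: unknown batch size, 1 * 2 * 4 = 8 features per example
p3 = tf.placeholder(tf.float32, [None, 1, 2, 4])
p3_shape = p3.get_shape().as_list()

p_a = tf.contrib.layers.flatten(p3)                                   # Option A
p_b = tf.reshape(p3, [-1, p3_shape[1] * p3_shape[2] * p3_shape[3]])   # Option B
p_c = tf.reshape(p3, [tf.shape(p3)[0], -1])                           # Option C

# Static shapes, as known when the graph is built
print(p_a.get_shape())
print(p_b.get_shape())
print(p_c.get_shape())

with tf.Session() as sess:
    # Feed the same batch of 2 examples to all three options
    i_p3 = np.arange(16, dtype=np.float32).reshape([2, 1, 2, 4])
    print("a", sess.run(p_a, feed_dict={p3: i_p3}))
    print("b", sess.run(p_b, feed_dict={p3: i_p3}))
    print("c", sess.run(p_c, feed_dict={p3: i_p3}))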

As you can see, the code above yields the same values three times. The different results you are getting must therefore be caused by something else, not by the reshaping.
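If you do not have a TensorFlow 1.x environment at hand, the same check can be sketched with NumPy on the shape from the question. This is only an analogue, not the TensorFlow code itself: it assumes NumPy's default row-major reshape as a stand-in for tf.reshape, and the batch size of 2 and the names flat_a, flat_b and flat_c are made up for illustration.

```python
import numpy as np

# Stand-in for P3: a batch of feature maps with shape [batch, 4, 12, 48].
batch_size = 2  # hypothetical value, chosen only for illustration
p3 = np.arange(batch_size * 4 * 12 * 48, dtype=np.float32)
p3 = p3.reshape(batch_size, 4, 12, 48)

# Analogue of Option A: keep the batch dimension, collapse everything else.
flat_a = p3.reshape(len(p3), int(np.prod(p3.shape[1:])))

# Analogue of Option B: infer the batch size, multiply out the other dimensions.
flat_b = p3.reshape(-1, p3.shape[1] * p3.shape[2] * p3.shape[3])

# Analogue of Option C: fix the batch size, infer the number of features.
flat_c = p3.reshape(p3.shape[0], -1)

print(flat_a.shape, flat_b.shape, flat_c.shape)
print(np.array_equal(flat_a, flat_b) and np.array_equal(flat_b, flat_c))
```

All three arrays come out with shape (2, 2304) and hold identical values in identical order, which matches the TensorFlow test above.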