Recently I started toying with neural networks. I was trying to implement an AND gate with Tensorflow. I am having trouble understanding when to use different cost and activation functions. This is a basic neural network with only input and output layers, with no hidden layers.

First I tried to implement it in this way. As you can see this is poor implementation, but I think it gets the job done, at least in some way. So, I tried only the real outputs, no one hot true outputs. For activation functions, I used a sigmoid function and for cost function I used squared error cost function (I think its called that, correct me if I'm wrong).

I've tried using ReLU and Softmax as activation functions (with the same cost function) and it doesn't work. I figured out why they don't work. I also tried the sigmoid function with Cross Entropy cost function, it also doesn't work.

import tensorflow as tf

import numpy

train_X = numpy.asarray([[0,0],[0,1],[1,0],[1,1]])

train_Y = numpy.asarray([[0],[0],[0],[1]])

x = tf.placeholder("float",[None, 2])

y = tf.placeholder("float",[None, 1])

W = tf.Variable(tf.zeros([2, 1]))

b = tf.Variable(tf.zeros([1, 1]))

activation = tf.nn.sigmoid(tf.matmul(x, W)+b)

cost = tf.reduce_sum(tf.square(activation - y))/4

optimizer = tf.train.GradientDescentOptimizer(.1).minimize(cost)

init = tf.initialize_all_variables()

with tf.Session() as sess:

sess.run(init)

for i in range(5000):

train_data = sess.run(optimizer, feed_dict={x: train_X, y: train_Y})

result = sess.run(activation, feed_dict={x:train_X})

print(result)

after 5000 iterations:

[[ 0.0031316 ]

[ 0.12012422]

[ 0.12012422]

[ 0.85576665]]