feed forward neural network fails to classify due to dimensionality of biases

Question

asked Jul 2, 2019 in AI and Deep Learning by ashely (50.2k points)

I'm making a basic feedforward neural network to solve the XOR gate problem.

Standard settings: input layer + hidden layer + output layer, a constant learning rate of 0.01, and 500 epochs.

Sigmoid activation throughout. Stochastic gradient descent with backpropagation.

The hidden layer has 2 neurons. The input and output data:

input = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
output = [[0.0], [1.0], [1.0], [0.0]]

Now here's the problem: I know the bias is a (column) vector, and you complete one cycle (forward + backward pass) on a single sample. The predictions after training look like this:

(   0.4954120458511844  )
(   0.5081637529087711  )
(   0.5153967874989785  )
(   0.5653967874989785  )

By comparison, when I make the bias a matrix (with input.rows rows) and instead train on the full data set per cycle, the predictions are:

⎛   0.18379659987542804 ⎞
⎜   0.8220424701617579  ⎟
⎜   0.8217815808742437  ⎟
⎝   0.18653256456589742 ⎠

which are the correct ones?

I can post the full code here, but I am certain the problem comes from the biases; I just don't know why.

EDIT: As I said in the comments, the cause may be in the backpropagation part (stochastic gradient descent). Here's the full code (yes, it's in Swift, don't ask why); I am using the Surge matrix library.

It's long, though:

import Surge
// XOR TABLE DATA
let inputDataAsArray: [[Double]] = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let outputDataAsArray: [[Double]] = [[0.0], [1.0], [1.0], [0.0]]
let inputData: Matrix<Double> = Matrix<Double>(inputDataAsArray)
let outputData: Matrix<Double> = Matrix<Double>(outputDataAsArray)
var inputData_samples : Array<Matrix<Double>> = Array()
var outputData_samples : Array<Matrix<Double>> = Array()
for i in 0..<inputDataAsArray.count{
    inputData_samples.append(Matrix<Double>([inputDataAsArray[i]]))
    outputData_samples.append(Matrix<Double>([outputDataAsArray[i]]))
}
let size = inputData.rows
let neurons = 2 // NUMBER OF NEURONS IN HIDDEN LAYER
var weights0 : Matrix<Double> = random(rows: inputData.columns, columns: neurons)
var biases0 : Matrix<Double> = Matrix<Double>(rows: 1, columns: neurons, repeatedValue: 0.0)
var weights1 : Matrix<Double> = random(rows: neurons, columns: outputData.columns)
var biases1 : Matrix<Double> = Matrix<Double>(rows: 1, columns: outputData.columns, repeatedValue: 0.0)
print("Running...")
let alpha = 0.01
let loops = size * 500
var sampleIndex = 0
for i in 0..<loops{
    // FORWARD PROPAGATION
    // LAYER 1
    sampleIndex = i % size
    let j : Int = .random(in: ClosedRange<Int>(uncheckedBounds: (lower: 0, upper: size - 1)))
    let a0 = inputData_samples[j]
    let output = outputData_samples[j]
    let z1: Matrix<Double> = a0 * weights0 + biases0
    let a1: Matrix<Double> = sigmoidMatrix(x: z1)
    // LAYER 2
    let z2 : Matrix<Double> = a1 * weights1 + biases1
    let a2 : Matrix<Double> = sigmoidMatrix(x: z2)
    // let cost = cross_entropy(size: size, a: a2, y: output)
    // BACKPROPAGATION
    // LAYER 2
    var dz2 : Matrix<Double> = subtractMatrix(x: a2, y: output)
    let dw2 : Matrix<Double> = divideMatrix(x: transpose(a1) * dz2 , y: size)
    let db2 : Matrix<Double> = divideMatrix(x: dz2, y: size)
    // LAYER 1
    dz2 = dz2 * transpose(weights1)
    let dz1 : Matrix<Double> = sub(y: 1.0, x: a0)
        * transpose(a0) * dz2 // multiply(x: part1, y: sub(y: 1.0, x: part2))
    let dw1 : Matrix<Double> = divideMatrix(x: transpose(a0) * dz1 , y: size)
    let db1 : Matrix<Double> = divideMatrix(x: dz1, y: size)
    weights0 = subtractMatrix(x: weights0, y: mul(alpha, x: dw1))
    biases0 = subtractMatrix(x: biases0, y: mul(alpha, x: db1))
    weights1 = subtractMatrix(x: weights1, y: mul(alpha, x: dw2))
    biases1 = subtractMatrix(x: biases1, y: mul(alpha, x: db2))
}
for sample in inputData_samples{
    let z1: Matrix<Double> = sample * weights0 + biases0
    let a1: Matrix<Double> = sigmoidMatrix(x: z1)
    let z2 : Matrix<Double> = a1 * weights1 + biases1
    let a2 : Matrix<Double> = sigmoidMatrix(x: z2)
    print(a2.description)
}
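
For comparison with the LAYER 1 backpropagation step in the listing above: with a sigmoid output and a cross-entropy loss, the usual hidden-layer delta is (dz2 * transpose(weights1)) multiplied element-wise by a1 * (1 - a1), so it depends on the hidden activation a1 rather than on the input a0. The following is only a sketch of that formula, checked against finite differences. It uses plain Swift arrays instead of Surge, the names Net, forward, loss and hiddenBiasGradient are hypothetical, and the weights are arbitrary fixed values, not trained ones.

```swift
import Foundation

// Hypothetical helpers on plain arrays; this is not the Surge API used in the question.
func sigmoid(_ x: Double) -> Double { return 1.0 / (1.0 + exp(-x)) }

struct Net {
    var w0: [[Double]]   // input x hidden weights
    var b0: [Double]     // one bias per hidden neuron
    var w1: [Double]     // hidden -> single output weights
    var b1: Double       // output bias

    // Forward pass for one sample; returns hidden activations and the output.
    func forward(_ x: [Double]) -> (a1: [Double], a2: Double) {
        var a1 = [Double](repeating: 0.0, count: b0.count)
        for h in 0..<b0.count {
            var z = b0[h]
            for i in 0..<x.count { z += x[i] * w0[i][h] }
            a1[h] = sigmoid(z)
        }
        var z2 = b1
        for h in 0..<a1.count { z2 += a1[h] * w1[h] }
        return (a1, sigmoid(z2))
    }

    // Cross-entropy loss for one sample.
    func loss(_ x: [Double], _ y: Double) -> Double {
        let a2 = forward(x).a2
        return -(y * log(a2) + (1.0 - y) * log(1.0 - a2))
    }

    // Analytic gradient of the loss with respect to the hidden biases.
    func hiddenBiasGradient(_ x: [Double], _ y: Double) -> [Double] {
        let (a1, a2) = forward(x)
        let dz2 = a2 - y   // sigmoid output combined with cross-entropy
        // The sigmoid derivative is taken at the hidden activation a1.
        return (0..<a1.count).map { h in dz2 * w1[h] * a1[h] * (1.0 - a1[h]) }
    }
}

// Arbitrary fixed parameters, chosen only for the check.
let net = Net(w0: [[0.3, -0.2], [0.1, 0.4]], b0: [0.05, -0.1], w1: [0.7, -0.5], b1: 0.2)
let x = [1.0, 0.0]
let y = 1.0
let analytic = net.hiddenBiasGradient(x, y)

// Central finite differences as an independent check of the formula.
let eps = 1e-6
var numeric = [Double]()
for h in 0..<net.b0.count {
    var plus = net
    var minus = net
    plus.b0[h] += eps
    minus.b0[h] -= eps
    numeric.append((plus.loss(x, y) - minus.loss(x, y)) / (2.0 * eps))
}
print(analytic, numeric)
```

The two printed arrays should agree to several decimal places. Note also that when updating on one sample at a time, dividing each gradient by the full data set size shrinks an already small learning rate of 0.01 further.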

1 Answer

vinita · Answer 1 · 2019-07-02T09:42:12+0000

AND and OR are linearly separable but XOR’s outputs are not linearly separable.

Therefore, we have to introduce a hidden layer between the input and the output to solve it. It turns out that each node in the hidden layer can represent one of the simpler, linearly separable logical operations (AND, OR, NAND), and the output layer acts as another logical operation fed by the outputs of the previous layer.
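
As an illustration of that decomposition, here is a minimal sketch of a 2x2x1 network whose parameters are picked by hand rather than trained: one hidden neuron behaves like OR, the other like NAND, and the output neuron behaves like AND of the two. The function names neuron and xorNet and the specific weight values are assumptions made for this example.

```swift
import Foundation

func sigmoid(_ x: Double) -> Double { return 1.0 / (1.0 + exp(-x)) }

// A single sigmoid neuron: weighted sum of its inputs plus one bias.
func neuron(_ inputs: [Double], _ weights: [Double], _ bias: Double) -> Double {
    var z = bias
    for i in 0..<inputs.count { z += inputs[i] * weights[i] }
    return sigmoid(z)
}

// Hand-picked (not trained) parameters, chosen for illustration only.
func xorNet(_ x: [Double]) -> Double {
    let orGate = neuron(x, [20.0, 20.0], -10.0)     // high unless both inputs are 0
    let nandGate = neuron(x, [-20.0, -20.0], 30.0)  // high unless both inputs are 1
    return neuron([orGate, nandGate], [20.0, 20.0], -30.0)  // AND of the two gates
}

let inputs: [[Double]] = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let predictions = inputs.map { xorNet($0) }
print(predictions)
```

The printed values are close to 0 for the first and last input and close to 1 for the two middle ones, which is the XOR table. A trained network may settle on a different but equivalent combination of gates.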

To understand what logic our network uses to come up with results, we need to analyze its weights (and biases).

We do that with model.get_weights(layer.W) to get the weights vector and model.get_weights(layer.b) to get the biases vector. You can refer to the link for the code:

https://towardsdatascience.com/tflearn-soving-xor-with-a-2x2x1-feed-forward-neural-network-6c07d88689ed
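
On the bias shape raised in the question: a layer's bias normally holds one value per neuron and is shared by every sample, so its size does not depend on how many rows are fed in at once. A bias matrix with one row per training sample instead gives each sample its own trainable offsets, which can fit the four XOR rows without the shared weights having learned XOR. The sketch below, using plain Swift arrays and a hypothetical hiddenLayer function with arbitrary weights, shows that a shared bias gives the same hidden activations whether the samples are processed as a batch or one at a time.

```swift
import Foundation

func sigmoid(_ x: Double) -> Double { return 1.0 / (1.0 + exp(-x)) }

// Hidden-layer forward pass for a batch: every sample (row) gets the same bias added.
func hiddenLayer(batch: [[Double]], weights: [[Double]], bias: [Double]) -> [[Double]] {
    var out = [[Double]]()
    for sample in batch {
        var row = [Double]()
        for h in 0..<bias.count {
            var z = bias[h]   // shared across all samples
            for i in 0..<sample.count { z += sample[i] * weights[i][h] }
            row.append(sigmoid(z))
        }
        out.append(row)
    }
    return out
}

let inputs: [[Double]] = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let weights0: [[Double]] = [[0.3, -0.2], [0.1, 0.4]]  // arbitrary values
let biases0: [Double] = [0.05, -0.1]                  // one bias per hidden neuron

// Full batch in one call versus one sample per call.
let batched = hiddenLayer(batch: inputs, weights: weights0, bias: biases0)
let oneByOne = inputs.map { hiddenLayer(batch: [$0], weights: weights0, bias: biases0)[0] }
print(batched == oneByOne)
```

With a shared bias the parameter count stays at one value per neuron regardless of the data set size, which is also what lets the trained network be evaluated on a single new input.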

