
I'm building a basic feedforward neural network to solve the XOR gate problem.

Standard settings: input layer + hidden layer + output layer, a constant learning rate of 0.01, and 500 epochs.

Sigmoid activation throughout, and stochastic gradient descent for backpropagation.
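For reference, the sigmoid and its derivative (which the backprop step relies on) can be sketched in plain Swift, without Surge:

```swift
import Foundation

// Sigmoid squashes any real input into (0, 1).
func sigmoid(_ x: Double) -> Double {
    return 1.0 / (1.0 + exp(-x))
}

// Its derivative, expressed in terms of the activation a = sigmoid(x):
// d/dx sigmoid(x) = a * (1 - a)
func sigmoidDerivative(_ a: Double) -> Double {
    return a * (1.0 - a)
}

print(sigmoid(0.0))           // 0.5
print(sigmoidDerivative(0.5)) // 0.25
```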

The hidden layer has 2 neurons. The input and output data:

```
input = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
output = [[0.0], [1.0], [1.0], [0.0]]
```

Now here's the problem: I know the bias is a (column) vector, and you complete one cycle (forward + backward) on a single sample. The predictions after training look like this:

```
(   0.4954120458511844  )
(   0.5081637529087711  )
(   0.5153967874989785  )
(   0.5653967874989785  )
```

Compared to when I set the bias as a matrix (with `input.rows` rows) and instead train on the full sample data per cycle, the predictions are:

```
⎛   0.18379659987542804 ⎞
⎜   0.8220424701617579  ⎟
⎜   0.8217815808742437  ⎟
⎝   0.18653256456589742 ⎠
```

Which ones are correct?

I can post the full code here, but I am certain the problem comes from the biases; I just don't know why.

EDIT: As I said in the comments, the issue may be in the backpropagation part (stochastic gradient descent). Here's the full code (yes, it's in Swift, don't ask why); I am using the Surge matrix library.

It's long, though:

```swift
import Surge

// XOR TABLE DATA

let inputDataAsArray: [[Double]] = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
let outputDataAsArray: [[Double]] = [[0.0], [1.0], [1.0], [0.0]]

let inputData: Matrix<Double> = Matrix<Double>(inputDataAsArray)
let outputData: Matrix<Double> = Matrix<Double>(outputDataAsArray)

// Split the data into per-sample 1-row matrices for stochastic training.
var inputData_samples: Array<Matrix<Double>> = Array()
var outputData_samples: Array<Matrix<Double>> = Array()

for i in 0..<inputDataAsArray.count {
    inputData_samples.append(Matrix<Double>([inputDataAsArray[i]]))
    outputData_samples.append(Matrix<Double>([outputDataAsArray[i]]))
}

let size = inputData.rows
let neurons = 2 // NUMBER OF NEURONS IN HIDDEN LAYER

var weights0: Matrix<Double> = random(rows: inputData.columns, columns: neurons)
var biases0: Matrix<Double> = Matrix<Double>(rows: 1, columns: neurons, repeatedValue: 0.0)
var weights1: Matrix<Double> = random(rows: neurons, columns: outputData.columns)
var biases1: Matrix<Double> = Matrix<Double>(rows: 1, columns: outputData.columns, repeatedValue: 0.0)

print("Running...")

let alpha = 0.01
let loops = size * 500

for _ in 0..<loops {

    // FORWARD PROPAGATION

    // LAYER 1
    let j = Int.random(in: 0..<size) // pick one random sample per cycle
    let a0 = inputData_samples[j]
    let output = outputData_samples[j]

    let z1: Matrix<Double> = a0 * weights0 + biases0
    let a1: Matrix<Double> = sigmoidMatrix(x: z1)

    // LAYER 2
    let z2: Matrix<Double> = a1 * weights1 + biases1
    let a2: Matrix<Double> = sigmoidMatrix(x: z2)

    // let cost = cross_entropy(size: size, a: a2, y: output)

    // BACKPROPAGATION

    // LAYER 2
    var dz2: Matrix<Double> = subtractMatrix(x: a2, y: output)
    let dw2: Matrix<Double> = divideMatrix(x: transpose(a1) * dz2, y: size)
    let db2: Matrix<Double> = divideMatrix(x: dz2, y: size)

    // LAYER 1
    dz2 = dz2 * transpose(weights1)
    // Sigmoid derivative, elementwise: dz1 = dz2 ⊙ a1 ⊙ (1 - a1).
    // Note it uses the hidden activation a1, not the input a0.
    let dz1: Matrix<Double> = multiply(x: dz2, y: multiply(x: a1, y: sub(y: 1.0, x: a1)))
    let dw1: Matrix<Double> = divideMatrix(x: transpose(a0) * dz1, y: size)
    let db1: Matrix<Double> = divideMatrix(x: dz1, y: size)

    // NOTE: dividing a single-sample gradient by `size` shrinks the effective
    // learning rate to alpha / size; the full-batch version has no such mismatch.
    weights0 = subtractMatrix(x: weights0, y: mul(alpha, x: dw1))
    biases0 = subtractMatrix(x: biases0, y: mul(alpha, x: db1))
    weights1 = subtractMatrix(x: weights1, y: mul(alpha, x: dw2))
    biases1 = subtractMatrix(x: biases1, y: mul(alpha, x: db2))
}

for sample in inputData_samples {
    let z1: Matrix<Double> = sample * weights0 + biases0
    let a1: Matrix<Double> = sigmoidMatrix(x: z1)
    let z2: Matrix<Double> = a1 * weights1 + biases1
    let a2: Matrix<Double> = sigmoidMatrix(x: z2)
    print(a2.description)
}
```
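A quick way to validate the layer-1 gradient (the usual suspect in hand-rolled backprop) is a finite-difference check. Here's a plain-Swift sketch, no Surge, on a 2-2-1 network with made-up weights and a single sample; all the numbers and helper names are invented for illustration. It compares the analytic gradient from dz1 = dz2 ⊙ a1 ⊙ (1 − a1) against a numeric estimate for one weight:

```swift
import Foundation

func sigmoid(_ x: Double) -> Double { 1.0 / (1.0 + exp(-x)) }

// Cross-entropy loss for a single sigmoid output (matches dz2 = a2 - y).
func loss(_ a2: Double, _ y: Double) -> Double {
    -(y * log(a2) + (1.0 - y) * log(1.0 - a2))
}

// Forward pass for a 2-2-1 network on one sample; returns (a1, a2).
func forward(_ x: [Double], _ w0: [[Double]], _ b0: [Double],
             _ w1: [Double], _ b1: Double) -> ([Double], Double) {
    var a1 = [0.0, 0.0]
    for j in 0..<2 {
        a1[j] = sigmoid(x[0] * w0[0][j] + x[1] * w0[1][j] + b0[j])
    }
    let z2 = a1[0] * w1[0] + a1[1] * w1[1] + b1
    return (a1, sigmoid(z2))
}

// Fixed toy parameters and one training sample.
let x = [1.0, 0.0], y = 1.0
var w0 = [[0.3, -0.2], [0.1, 0.4]]
let b0 = [0.05, -0.05]
let w1 = [0.7, -0.6]
let b1 = 0.1

// Analytic gradient of the loss w.r.t. w0[0][0]:
// dz2 = a2 - y, dz1[j] = dz2 * w1[j] * a1[j] * (1 - a1[j]), dw0[i][j] = x[i] * dz1[j]
let (a1, a2) = forward(x, w0, b0, w1, b1)
let dz2 = a2 - y
let dz1_0 = dz2 * w1[0] * a1[0] * (1.0 - a1[0])
let analytic = x[0] * dz1_0

// Finite-difference estimate on the same weight.
let eps = 1e-6
w0[0][0] += eps
let lossPlus = loss(forward(x, w0, b0, w1, b1).1, y)
w0[0][0] -= 2 * eps
let lossMinus = loss(forward(x, w0, b0, w1, b1).1, y)
w0[0][0] += eps
let numeric = (lossPlus - lossMinus) / (2 * eps)

print(analytic, numeric) // the two values should agree to several decimal places
```

If the two numbers disagree, the analytic formula is wrong; with the a0-based variant from the original dz1 line they would not match.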


AND and OR are linearly separable, but XOR's outputs are not.

Therefore, we have to introduce a hidden layer to solve it. It turns out that each node in the hidden layer represents one of the simpler, linearly separable logical operations (AND, OR, NAND), and the output layer acts as another logical operation fed by the outputs of the previous layer.
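To illustrate that decomposition, here is a small Swift sketch (weights and thresholds chosen by hand, not taken from a trained model) where each "neuron" is a linear threshold unit, and XOR is built as AND(OR, NAND):

```swift
// Each "neuron" is a linear threshold unit: fires iff w1*x1 + w2*x2 + b > 0.
func unit(_ x1: Double, _ x2: Double,
          _ w1: Double, _ w2: Double, _ b: Double) -> Double {
    return (w1 * x1 + w2 * x2 + b > 0) ? 1.0 : 0.0
}

func xor(_ x1: Double, _ x2: Double) -> Double {
    let h1 = unit(x1, x2, 1, 1, -0.5)  // OR:   fires if x1 + x2 > 0.5
    let h2 = unit(x1, x2, -1, -1, 1.5) // NAND: fires if x1 + x2 < 1.5
    return unit(h1, h2, 1, 1, -1.5)    // AND:  fires only if both hidden units fire
}

for (a, b) in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)] {
    print(a, b, xor(a, b)) // reproduces the XOR truth table: 0, 1, 1, 0
}
```

A trained 2-2-1 network won't land on these exact weights, but its hidden units typically carve out the same two half-planes.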

To understand what logic our network uses to come up with its results, we need to analyze its weights (and biases).

We do that with model.get_weights(layer.W) to get the weights vector and model.get_weights(layer.b) to get the biases vector. You can refer to this link for the code:

https://towardsdatascience.com/tflearn-soving-xor-with-a-2x2x1-feed-forward-neural-network-6c07d88689ed
