Activation Function :
The activation function of a node defines the output of that node. The four most popular activation functions are:
- Step function – It restricts the output to two values, 0 or 1.
- Rectified linear unit (ReLU) – It suppresses negative values by setting them to 0 and passes positive values through unchanged. It is the most popular and widely used function.
- Sigmoid function – A smooth alternative to the step function. It also limits the output to the range 0 to 1, but it is continuous, so the output can be read as a probability. We use the sigmoid function for binary problems.
- Tanh function – Similar to sigmoid, but it limits the output to the range -1 to 1.
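The four functions listed above can be sketched in a few lines of plain Python. This is only an illustration: the step threshold at 0 and the function names are assumptions, not something the text fixes.

```python
import math

def step(x):
    # Step function: 0 for negative input, 1 otherwise (threshold at 0 is an assumption).
    return 1.0 if x >= 0 else 0.0

def relu(x):
    # ReLU: negative values become 0, positive values pass through unchanged.
    return max(0.0, x)

def sigmoid(x):
    # Sigmoid: smooth, continuous output strictly between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Tanh: same S shape as sigmoid, but the output lies between -1 and 1.
    return math.tanh(x)

for value in (-2.0, 0.0, 2.0):
    print(value, step(value), relu(value), sigmoid(value), tanh(value))
```

Printing a few values side by side shows the difference in output ranges: step and sigmoid stay within 0 to 1, tanh within -1 to 1, and ReLU is unbounded on the positive side.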
To introduce bias into the model, we add one more input node with the constant value ‘1’ and learn its weight, w0. For example, suppose we want a model that predicts the interest rate to charge on person A’s credit card, given a salary of x and an expenditure of y. The model maps each salary and expenditure to an interest rate. If person B has no salary, the interest rate should not be zero; a default interest rate applies instead, and this default is called the bias.
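A minimal sketch of the bias idea, assuming a simple linear model. The function name and the weight values below are hypothetical and chosen only to make the arithmetic visible; they are not real interest-rate figures.

```python
def predict_rate(salary, expenditure, w0, w1, w2):
    # The constant input 1.0 is the extra node; its weight w0 is the bias.
    inputs = [1.0, salary, expenditure]
    weights = [w0, w1, w2]
    return sum(w * x for w, x in zip(weights, inputs))

# Hypothetical weights, for illustration only.
W0, W1, W2 = 12.0, -0.00005, 0.0001

# Person B: no salary and no expenditure, so only the bias term contributes.
rate_b = predict_rate(0.0, 0.0, W0, W1, W2)

# Person A: the bias is still present, shifted by the weighted inputs.
rate_a = predict_rate(40000.0, 10000.0, W0, W1, W2)

print(rate_b, rate_a)
```

With every other input at zero, the prediction equals w0, which is the default rate described above.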
Neural Network Training Process Flow :
We start with the data and randomly initialize the weights. Random initialization gives training a different starting point each time, so it is less likely to settle in the same local minimum and has a better chance of reaching the global minimum. We feed in the data, apply an activation function such as sigmoid to predict the result, and calculate the cost from the predicted and actual values. We then take the derivative of the error and propagate it back through the network, adjusting the weights to move toward the minimum, where the error is smallest. This is called backward propagation.
In a multilayer perceptron, we have multiple neurons and every input goes to each neuron. One activation function may be applied in layer 1 and another in layer 2, so the structure becomes complex. It can be defined easily in TensorFlow layer by layer, with the layers connected to each other and each layer given its own activation function. Forward and backward propagation alternate: each forward pass is followed by a backward pass, and the gradient descent update happens during this process. It is repeated until the model is trained.
If the output is continuous, we can skip the activation function in the output layer. If the output is discrete, we can apply sigmoid in the last (output) layer to obtain the discrete output.
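The training flow described above can be sketched for a single sigmoid neuron in plain Python. The toy OR data set, the learning rate, the number of epochs, and all function names are assumptions made for this illustration; a real network repeats the same steps across many neurons and layers.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, x):
    # weights[0] is the bias weight w0; the rest pair up with the inputs.
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)

def cost(weights, data):
    # Mean cross-entropy cost over the data set.
    total = 0.0
    for x, y in data:
        y_hat = forward(weights, x)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return total / len(data)

def train(data, learning_rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_features = len(data[0][0])
    # Step 1: random initialization of the weights.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_features + 1)]
    initial_cost = cost(weights, data)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in data:
            # Step 2: forward pass, then the error of the prediction.
            error = forward(weights, x) - y
            # Step 3: derivative of the cost with respect to each weight.
            grads[0] += error
            for j, xi in enumerate(x):
                grads[j + 1] += error * xi
        # Step 4: gradient descent update.
        weights = [w - learning_rate * g / len(data)
                   for w, g in zip(weights, grads)]
    return weights, initial_cost, cost(weights, data)

# Toy data set (logical OR), used only for illustration.
OR_DATA = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

weights, cost_before, cost_after = train(OR_DATA)
predictions = [round(forward(weights, x)) for x, _ in OR_DATA]
print(cost_before, cost_after, predictions)
```

The cost after training is lower than the cost at the random starting point, which is the effect the repeated forward and backward passes are meant to produce.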
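As a rough sketch of the layer-by-layer idea, the forward pass of a small multilayer perceptron can be written in plain Python rather than TensorFlow. The layer sizes, the random weights, the ReLU hidden layer, and the 0.5 threshold are all assumptions for illustration.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    # "No activation": used for a continuous (regression) output.
    return z

def make_layer(n_inputs, n_neurons, rng):
    # Each neuron holds a bias weight (index 0) plus one weight per input.
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
            for _ in range(n_neurons)]

def layer_forward(layer, inputs, activation):
    # Every input goes to every neuron in the layer.
    outputs = []
    for neuron in layer:
        z = neuron[0] + sum(w * x for w, x in zip(neuron[1:], inputs))
        outputs.append(activation(z))
    return outputs

def network_forward(layers, activations, inputs):
    # Each layer can use a different activation function.
    values = inputs
    for layer, activation in zip(layers, activations):
        values = layer_forward(layer, values, activation)
    return values

rng = random.Random(42)
sizes = [3, 4, 1]  # 3 inputs, 4 hidden neurons, 1 output (illustrative)
layers = [make_layer(sizes[i], sizes[i + 1], rng) for i in range(len(sizes) - 1)]
x = [0.2, -0.5, 1.0]

# Continuous output: no activation on the last layer.
regression_out = network_forward(layers, [relu, identity], x)

# Discrete output: sigmoid on the last layer, then a 0.5 threshold (an assumption).
binary_prob = network_forward(layers, [relu, sigmoid], x)
binary_label = 1 if binary_prob[0] >= 0.5 else 0
print(regression_out, binary_prob, binary_label)
```

The same weights are reused for both calls; only the output activation changes, which is the choice described in the paragraph above.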
Multilayer neuron :
A real data set is large, with many features, and the network has multiple layers whose weights are initialized randomly. The learning rate is usually fixed in the range 0.001 to 0.01. We decide the number of layers and the number of neurons in each layer in advance. A common rule of thumb for the number of neurons in a layer is 2n+1, where n is the number of input variables: with 3 variables, 7 neurons is the ideal number. In further layers we keep adding neurons; neural networks are robust enough to handle them, but extra neurons increase training time.
The last layer has as many neurons as there are outputs. For a regression problem, there is one output neuron. For a classification problem, there are two types:
- Binary – the classes are 0 and 1, so we have a single neuron
- Multiple – there are k classes, so we have k neurons
Logistic regression uses the sigmoid function for a classification problem with a single neuron. A network with multiple layers handles more complex problems with multiple features. Suppose we have three inputs, followed by hidden layers with 4 and 3 neurons, and a final output layer with one neuron. The outputs of each layer are multiplied by the weights and passed to the next layer, where any activation function can be applied. This scales up with the number of neurons. This process is forward propagation. The cost function for logistic regression is
$$C = -\sum \left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right]$$
We apply the sigmoid function in logistic regression, and the derivative of the cost function with respect to the weight w is
$$\frac{dC}{dw} = \sum (\hat{y} - y)\, x$$
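For the cross-entropy cost with a sigmoid output, the derivative with respect to the weight is the sum of (ŷ - y) x over the samples. One way to check this is to compare it with a numerical derivative. The sketch below assumes a single weight with no bias, and the sample values are made up for the check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, xs, ys):
    # Cross-entropy cost summed over all samples, with y_hat = sigmoid(w * x).
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(w * x)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat)
    return total

def analytic_gradient(w, xs, ys):
    # dC/dw = sum of (y_hat - y) * x
    return sum((sigmoid(w * x) - y) * x for x, y in zip(xs, ys))

def numerical_gradient(w, xs, ys, h=1e-6):
    # Central difference approximation of dC/dw.
    return (cost(w + h, xs, ys) - cost(w - h, xs, ys)) / (2 * h)

# Made-up samples and weight, for illustration only.
XS = [0.5, -1.2, 2.0, 0.3]
YS = [1, 0, 1, 0]
W = 0.7

grad_analytic = analytic_gradient(W, XS, YS)
grad_numeric = numerical_gradient(W, XS, YS)
print(grad_analytic, grad_numeric)
```

The two values agree to several decimal places, which supports the sign and form of the derivative.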
In backward propagation, we start from the last layer, where the output of the activation function is ŷ.