
I'm trying to implement stochastic gradient descent in MATLAB, but I am not seeing any convergence. Mini-batch gradient descent worked as expected, so I think the cost function and gradient steps are correct.

The two main issues I am having are:

  1. Randomly shuffling the data in the training set before the for-loop
  2. Selecting one example at a time

Here is my MATLAB code:

Generating Data

alpha = 0.001;

num_iters = 10;

xrange =(-10:0.1:10); % input values (201 points)

ydata  = 5*(xrange)+30; % data with gradient 5, intercept 30

% plot(xrange,ydata); grid on;

noise  = (2*randn(1,length(xrange))); % generating noise 

target = ydata + noise; % adding noise to data

f1 = figure

subplot(2,2,1);

scatter(xrange,target); grid on; hold on; % scatter plot of the noisy data

title('Linear Regression')

xlabel('xrange')

ylabel('ydata')

tita0 = randn(1,1); %intercept (randomised)

tita1 = randn(1,1); %gradient  (randomised)

% Initialize Objective Function History

J_history = zeros(num_iters, 1);

% Number of training examples

m = (length(xrange));

Shuffling data, Gradient Descent and Cost Function

% STEP1 : we shuffle the data

data = [ xrange, ydata];

data = data(randperm(size(data,1)),:);

y = data(:,1);

X = data(:,2:end);

for iter = 1:num_iters

    for i = 1:m

        x = X(:,i); % STEP2 Select one example

        h = tita0 + tita1.*x; % hypothesis for the selected example (x is changed to xrange in BGD)

        %c = (1/(2*length(xrange)))*sum((h-target).^2)

        temp0 = tita0 - alpha*((1/m)*sum((h-target)));

        temp1 = tita1 - alpha*((1/m)*sum((h-target).*x));  %Changed to xrange in BGD

        tita0 = temp0;

        tita1 = temp1;

        fprintf("here\n %d; %d", i, x)

    end

        J_history(iter) = (1/(2*m))*sum((h-target).^2); % Calculating cost from data to estimate

        fprintf('Iteration #%d - Cost = %d... \r\n',iter, J_history(iter));

end

When I plot the cost vs. iterations and the linear regression fit, the MSE settles (local minimum?) at around 420, which is wrong.


On the other hand, if I re-run the exact same code using batch gradient descent, I get acceptable results. For batch gradient descent, the only change is replacing x with xrange.


Any suggestions on what I am doing wrong?


EDIT:

I also tried selecting random indexes using:

f = round(1+rand(1,1)*201);        %generating random indexes 

and then selecting one example:

x = xrange(f); % STEP2 Select one example

Using this x in the hypothesis and GD steps also yields a cost of 420.

1 Answer


First, shuffle the data so that the examples are visited in random order. This has two main advantages:

  • It improves the quality of the ML model

  • It improves predictive performance

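In your code, [ xrange, ydata] joins two row vectors into a single 1x402 row, so randperm(size(data,1)) has only one row to permute and nothing is shuffled. It also uses the noise-free ydata instead of target. This is how to shuffle your data:

data = [ xrange', target'];

data = data(randperm(size(data,1)),:);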

Now we have to index X and y correctly:

y = data(:,2);

X = data(:,1);
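The shuffling and indexing steps above can be checked with a short sketch like the following. It assumes the same xrange and target definitions as the question; the names bad and perm are introduced here only for illustration.

```matlab
xrange = -10:0.1:10;                                  % 1x201 row vector of inputs
target = 5*xrange + 30 + 2*randn(1, numel(xrange));   % noisy targets, same shape

% Concatenating two rows side by side gives a single 1x402 row,
% so randperm(size(bad,1)) is randperm(1) and nothing gets shuffled.
bad = [xrange, target];
disp(size(bad));                     % prints 1 402

% Transposing first gives one training example per row (201x2).
data = [xrange', target'];
m    = size(data, 1);                % number of training examples
perm = randperm(m);                  % random ordering of the row indices
data = data(perm, :);                % reorder rows; each (x, y) pair stays together

X = data(:, 1);                      % inputs
y = data(:, 2);                      % targets
```

Because whole rows are permuted, X(i) and y(i) still belong to the same example after the shuffle.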

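Then, during gradient descent, each update must be based on a single example: select x = X(i), compute h from it, and compare h with the single value y(i) rather than with the whole target vector:

tita0 = tita0 - alpha*((1/m)*((h-y(i))));

tita1 = tita1 - alpha*((1/m)*((h-y(i)).*x));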
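To make the difference from the batch version concrete, here is a minimal sketch of one pass over the data using the per-example update. The toy data, the zero starting values, and the name err are assumptions made for illustration; the 1/m factor is kept only to match the update above.

```matlab
% Assumed toy data on the line y = 5x + 30 (no noise), already in column form.
X = [-2; -1; 0; 1; 2];
y = 5*X + 30;
m = numel(X);

alpha = 0.001;
tita0 = 0;                           % intercept, fixed start so the run is repeatable
tita1 = 0;                           % gradient

for i = 1:m
    x   = X(i);                      % one scalar input
    h   = tita0 + tita1*x;           % prediction for this example only
    err = h - y(i);                  % scalar error, not a vector over all targets

    % Both parameters are updated from the same error term.
    temp0 = tita0 - alpha*(1/m)*err;
    temp1 = tita1 - alpha*(1/m)*err*x;
    tita0 = temp0;
    tita1 = temp1;
end
```

Every quantity inside the loop is a scalar, so no sum over the data set is needed.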

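With the changes above, the parameters converge toward the true values, a gradient (tita1) of about 5 and an intercept (tita0) of about 30, provided the loop runs for enough iterations. With the 1/m factor, each single-example step is very small, so num_iters = 10 may not be enough.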
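Putting the pieces together, a complete run might look like the sketch below. It is one possible arrangement rather than the only correct one, and it makes several assumptions that are not in the original post: the 1/m factor is dropped so that alpha applies directly to each example, num_iters is raised to 50, the data is reshuffled on every pass, and the cost is evaluated over the whole data set at the end of each pass.

```matlab
alpha     = 0.001;                   % step size per example (assumed: no 1/m factor)
num_iters = 50;                      % passes over the data (assumed: more than 10)

xrange = -10:0.1:10;
target = 5*xrange + 30 + 2*randn(1, numel(xrange));

X = xrange';                         % column of inputs
y = target';                         % column of noisy targets
m = numel(X);

tita0 = randn;                       % intercept (randomised)
tita1 = randn;                       % gradient  (randomised)
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    order = randperm(m);             % fresh random visiting order for this pass
    for k = 1:m
        i   = order(k);
        err = (tita0 + tita1*X(i)) - y(i);   % scalar error for one example
        tita0 = tita0 - alpha*err;
        tita1 = tita1 - alpha*err*X(i);
    end
    % Cost over the whole data set with the parameters after this pass.
    J_history(iter) = (1/(2*m))*sum((tita0 + tita1*X - y).^2);
    fprintf('Iteration #%d - Cost = %f\n', iter, J_history(iter));
end

fprintf('tita0 = %.3f, tita1 = %.3f\n', tita0, tita1);
```

If you prefer to draw a random example at each step instead of permuting, randi(m) returns a valid index in 1..m, whereas round(1+rand(1,1)*201) can return 202, which is out of range for a 201-element vector.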
