
I am trying to implement the SVM loss function and its gradient. I found some example projects that implement both, but I could not figure out how the loss function is used when computing the gradient.

Here is the formula of the loss function for a single training example x_i with label y_i, where s_j is the score of class j and delta is the margin:

L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta)

What I cannot understand is how I can use the loss function's result while computing the gradient.
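To make the loss concrete, here is a minimal sketch that evaluates it for one example with made-up scores (the values 3.2, 5.1, -1.7 and the correct class y_i = 0 are assumptions for illustration):

```python
import numpy as np

# Hypothetical class scores for one training example over 3 classes.
scores = np.array([3.2, 5.1, -1.7])
y_i = 0        # assume class 0 is the correct label
delta = 1.0

# L_i = sum over j != y_i of max(0, s_j - s_{y_i} + delta)
margins = np.maximum(0, scores - scores[y_i] + delta)
margins[y_i] = 0           # the correct class does not contribute
loss_i = margins.sum()
print(loss_i)              # only class 1 has a positive margin: 5.1 - 3.2 + 1 = 2.9
```

Only classes whose score comes within delta of the correct score (or exceeds it) add to the loss; here that is class 1 alone.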

The example project computes the gradient as follows:

for i in range(num_train):                 # xrange in the original Python 2 code
    scores = X[i].dot(W)                   # class scores for example i
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
        if j == y[i]:
            continue
        margin = scores[j] - correct_class_score + 1  # note delta = 1
        if margin > 0:
            loss += margin
            dW[:, j] += X[i]               # gradient w.r.t. column j of W
            dW[:, y[i]] -= X[i]            # gradient w.r.t. the correct class column

dW holds the gradient, and X is the array of training data. But I don't understand how the derivative of the loss function leads to this code.


The gradient can be derived with calculus by differentiating the loss function with respect to each column of W. With respect to w_{y_i} (the correct class's weights) it is:

grad_{w_{y_i}} L_i = -( sum over j != y_i of 1(w_j^T x_i - w_{y_i}^T x_i + delta > 0) ) * x_i

and with respect to w_j for j != y_i:

grad_{w_j} L_i = 1(w_j^T x_i - w_{y_i}^T x_i + delta > 0) * x_i

Here 1(.) is the indicator function: it equals 1 when the condition holds and 0 otherwise, so only the classes with a positive margin contribute. That is exactly what the code does: whenever margin > 0, it adds X[i] to column j of dW and subtracts X[i] from column y[i].
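To see that the loop really implements these derivatives, you can compare its analytic dW against a numerical gradient. A minimal sketch, where the data X, labels y, and weights W are made up for illustration:

```python
import numpy as np

np.random.seed(0)
num_train, dim, num_classes = 5, 4, 3
X = np.random.randn(num_train, dim)                # made-up training data
y = np.random.randint(num_classes, size=num_train) # made-up labels
W = np.random.randn(dim, num_classes) * 0.01       # made-up weights

def svm_loss_and_grad(W):
    """Same double loop as in the question, returning (loss, dW)."""
    loss = 0.0
    dW = np.zeros_like(W)
    for i in range(num_train):
        scores = X[i].dot(W)
        correct = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct + 1       # delta = 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]                   # indicator is 1: add x_i to column j
                dW[:, y[i]] -= X[i]                # ...and subtract it from column y_i
    return loss, dW

loss, dW = svm_loss_and_grad(W)

# Numerical gradient check on a few entries: (loss(W+h) - loss(W-h)) / 2h
h = 1e-5
for idx in [(0, 0), (2, 1), (3, 2)]:
    Wp = W.copy(); Wp[idx] += h
    Wm = W.copy(); Wm[idx] -= h
    num_grad = (svm_loss_and_grad(Wp)[0] - svm_loss_and_grad(Wm)[0]) / (2 * h)
    print(idx, dW[idx], num_grad)                  # analytic and numeric should agree
```

Because the hinge loss is piecewise linear, the analytic gradient from the indicator-function formulas matches the numerical estimate everywhere except exactly at margin == 0, which random data will essentially never hit.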

Hope this helps!
