
I successfully built and trained the network and introduced L2 regularization on all weights and biases. Right now I am trying out dropout on the hidden layer in order to improve generalization. Does it make sense to apply both L2 regularization and dropout to the same hidden layer? If so, how is this done properly?

During dropout with a keep probability of 0.5, we switch off roughly half of the activations of the hidden layer and double the output of the remaining neurons. With L2, we compute the L2 norm over all hidden weights. But I am not sure how to compute L2 when dropout is in use. Since some activations are switched off, shouldn't the weights that are 'not used' at that step be removed from the L2 calculation? Any references on this would be useful; I haven't found any information.
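To make the mechanism described above concrete, here is a minimal plain-Python sketch of inverted dropout, which is the scheme `tf.nn.dropout` applies when given a keep probability. The names `dropout`, `rng`, and `hidden` are illustrative and do not come from the code in this post. Note that the mask acts on activations only; no weight is removed or changed.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout: zero each activation with probability 1 - keep_prob
    and scale the surviving activations by 1 / keep_prob."""
    out = []
    for a in activations:
        if rng.random() < keep_prob:
            out.append(a / keep_prob)  # kept: scaled up (doubled for keep_prob=0.5)
        else:
            out.append(0.0)            # dropped: switched off for this step
    return out


rng = random.Random(0)                  # fixed seed so the run is repeatable
hidden = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]  # toy hidden-layer activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)
print(dropped)
```

Each entry of `dropped` is either 0.0 or exactly twice the matching entry of `hidden`, and a fresh mask is drawn on every call, so a weight that is "not used" in one step is typically used again in a later one.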

Just in case you are interested, my code for ANN with L2 regularization is below:

import tensorflow as tf

# The NeuralNetwork model code is below.
# We use SGD for training to save time. The code is from Assignment 2.
# beta is the new parameter: it controls the level of regularization.
# The default is 0.01, but feel free to play with it.
# Notice that we introduce L2 for both the biases and the weights of all layers.
beta = 0.01

# Building the TensorFlow graph
graph = tf.Graph()
with graph.as_default():
  # Input data. For the training data, we use a placeholder that will be fed
  # at run time with a training minibatch.
  tf_train_dataset = tf.placeholder(tf.float32,
                                    shape=(batch_size, image_size * image_size))
  tf_train_labels = tf.placeholder(tf.float32, shape=(batch_size, num_labels))
  tf_valid_dataset = tf.constant(valid_dataset)
  tf_test_dataset = tf.constant(test_dataset)

  # Now let's build our new hidden layer.
  # This is how many hidden neurons we want.
  num_hidden_neurons = 1024
  # Its weights and biases
  hidden_weights = tf.Variable(
    tf.truncated_normal([image_size * image_size, num_hidden_neurons]))
  hidden_biases = tf.Variable(tf.zeros([num_hidden_neurons]))
  # Now the layer itself. It multiplies the data by the weights, adds the biases,
  # and takes ReLU over the result.
  hidden_layer = tf.nn.relu(tf.matmul(tf_train_dataset, hidden_weights) + hidden_biases)

  # Time to go for the output linear layer.
  # The output weights connect the hidden neurons to the output labels;
  # the biases are added to the output labels.
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))
  out_biases = tf.Variable(tf.zeros([num_labels]))
  # Compute the output
  out_layer = tf.matmul(hidden_layer, out_weights) + out_biases

  # Our real output is a softmax of the prior result,
  # and we also compute its cross-entropy to get our loss.
  # Notice that we introduce our L2 terms here.
  # Named arguments avoid mixing up logits and labels (required in TensorFlow 1.x).
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
      logits=out_layer, labels=tf_train_labels) +
    beta * tf.nn.l2_loss(hidden_weights) +
    beta * tf.nn.l2_loss(hidden_biases) +
    beta * tf.nn.l2_loss(out_weights) +
    beta * tf.nn.l2_loss(out_biases))

  # Now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  # Predictions for the training, validation, and test data,
  # used to evaluate the performance so far
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)
  test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)

# Now comes the actual training of the ANN we built.
# We run it for a number of steps and evaluate the progress
# every 500 steps.
# Number of steps we will train our ANN
num_steps = 3001

# Actual training
with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset[offset:(offset + batch_size), :]
    batch_labels = train_labels[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
      print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
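For reference, the regularized loss built in the graph above can be sketched in plain Python. This is only an illustration under stated assumptions: `l2_loss` mirrors what `tf.nn.l2_loss` computes (half the sum of squares), while `total_loss` and the toy numbers are hypothetical and not part of the original code.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss returns."""
    return sum(v * v for v in values) / 2.0


def total_loss(per_example_losses, parameter_groups, beta):
    """Mean data loss plus beta times the L2 penalty of every parameter group."""
    data_loss = sum(per_example_losses) / len(per_example_losses)
    penalty = sum(l2_loss(group) for group in parameter_groups)
    return data_loss + beta * penalty


# Hypothetical toy values standing in for the real tensors
cross_entropy = [0.7, 0.3]  # per-example cross-entropy of one minibatch
weights = [1.0, -2.0]       # a flattened weight matrix
biases = [0.0, 2.0]         # a bias vector
loss = total_loss(cross_entropy, [weights, biases], beta=0.01)
print(loss)
```

The penalty depends only on the parameter values, not on which activations were active for a given minibatch.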

1 Answer


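Using L2 and dropout together gave a slight improvement over the same network without dropout. I am still not sure whether it is really worth the effort to introduce both L2 and dropout, but at least it works and slightly improves the results. In the code below, the L2 terms are still computed over all weights and biases; dropout only masks the hidden activations during training.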

Code:

  # Add dropout on the hidden layer.
  # keep_prob is the probability of keeping an activation;
  # the remaining activations are switched off.
  keep_prob = tf.placeholder(tf.float32)
  hidden_layer_drop = tf.nn.dropout(hidden_layer, keep_prob)

  # Time to go for the output linear layer.
  # The output weights connect the hidden neurons to the output labels;
  # the biases are added to the output labels.
  out_weights = tf.Variable(
    tf.truncated_normal([num_hidden_neurons, num_labels]))
  out_biases = tf.Variable(tf.zeros([num_labels]))
  # Compute the output.
  # Notice that during training we use the switched-off activations,
  # i.e. the variation of hidden_layer with dropout active.
  out_layer = tf.matmul(hidden_layer_drop, out_weights) + out_biases

  # Our real output is a softmax of the prior result,
  # and we also compute its cross-entropy to get our loss.
  # Notice that we introduce our L2 terms here, over all weights and biases.
  # Named arguments avoid mixing up logits and labels (required in TensorFlow 1.x).
  loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(
      logits=out_layer, labels=tf_train_labels) +
    beta * tf.nn.l2_loss(hidden_weights) +
    beta * tf.nn.l2_loss(hidden_biases) +
    beta * tf.nn.l2_loss(out_weights) +
    beta * tf.nn.l2_loss(out_biases))

  # Now we just minimize this loss to actually train the network
  optimizer = tf.train.GradientDescentOptimizer(0.5).minimize(loss)
  # Predictions for the training, validation, and test data,
  # used to evaluate the performance so far.
  # The validation and test predictions use the hidden layer without dropout.
  train_prediction = tf.nn.softmax(out_layer)
  valid_relu = tf.nn.relu(tf.matmul(tf_valid_dataset, hidden_weights) + hidden_biases)
  valid_prediction = tf.nn.softmax(tf.matmul(valid_relu, out_weights) + out_biases)
  test_relu = tf.nn.relu(tf.matmul(tf_test_dataset, hidden_weights) + hidden_biases)
  test_prediction = tf.nn.softmax(tf.matmul(test_relu, out_weights) + out_biases)



 

# Now comes the actual training of the ANN we built.
# We run it for a number of steps and evaluate the progress
# every 500 steps.
# Number of steps we will train our ANN
num_steps = 3001

# Actual training
with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print("Initialized")
  for step in range(num_steps):
    # Pick an offset within the training data, which has been randomized.
    # Note: we could use better randomization across epochs.
    offset = (step * batch_size) % (train_labels_2.shape[0] - batch_size)
    # Generate a minibatch.
    batch_data = train_dataset_2[offset:(offset + batch_size), :]
    batch_labels = train_labels_2[offset:(offset + batch_size), :]
    # Prepare a dictionary telling the session where to feed the minibatch.
    # The key of the dictionary is the placeholder node of the graph to be fed,
    # and the value is the numpy array to feed to it.
    # keep_prob is fed as 0.5, so about half of the hidden activations are kept.
    feed_dict = {tf_train_dataset : batch_data, tf_train_labels : batch_labels, keep_prob : 0.5}
    _, l, predictions = session.run(
      [optimizer, loss, train_prediction], feed_dict=feed_dict)
    if (step % 500 == 0):
      print("Minibatch loss at step %d: %f" % (step, l))
      print("Minibatch accuracy: %.1f%%" % accuracy(predictions, batch_labels))
      print("Validation accuracy: %.1f%%" % accuracy(
        valid_prediction.eval(), valid_labels))
      print("Test accuracy: %.1f%%" % accuracy(test_prediction.eval(), test_labels))
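The validation and test predictions above use the hidden layer without dropout and without any extra rescaling. A rough plain-Python sketch of why that is consistent: with inverted dropout, the average of the dropped-and-rescaled activations over many random masks stays close to the original activations. The helper `dropout`, the toy values, and the trial count are assumptions made for this illustration only.

```python
import random


def dropout(activations, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and
    scale kept values by 1 / keep_prob; dropped values become 0.0."""
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


rng = random.Random(42)   # fixed seed so the run is repeatable
hidden = [0.5, 1.0, 2.0]  # toy hidden-layer activations
keep_prob = 0.5
num_trials = 20000

# Accumulate the dropped activations over many independent masks
sums = [0.0] * len(hidden)
for _ in range(num_trials):
    for i, value in enumerate(dropout(hidden, keep_prob, rng)):
        sums[i] += value
averages = [s / num_trials for s in sums]

print(hidden)
print(averages)
```

Because the training-time activations match the undropped ones on average, evaluation can feed the plain hidden layer into the output layer directly.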

Hope this answer helps.
