I am using autoencoders for anomaly detection. I have finished training my model, and now I want to calculate the reconstruction loss for each entry in the dataset so that I can flag data points with a high reconstruction loss as anomalies.
This is my current code to calculate the reconstruction loss, but it is really slow. By my estimate, it would take 5 hours to go through the dataset, whereas training for one epoch takes approximately 55 minutes. I suspect that the conversion to tensors is the bottleneck, but I can't find a better way to do it.
I've tried changing the batch size, but it does not make much of a difference. I have to keep the convert-to-tensor step because K.eval throws an error without it.
for i in range(0, encoded_dataset.shape[0], batch_size):
    # y_true and y_pred are the tensors for rows i:i+batch_size (conversion step not shown)
    # Append the batch losses (numpy array) to the list
    reconstruction_loss_transaction.append(K.eval(loss_function(y_true, y_pred)))
I was able to train in 55 minutes per epoch, so I feel that a prediction pass over the same dataset should not take 5 hours. encoded_dataset is a variable that holds the entire dataset in main memory as a data frame. I am using an Azure VM instance. K.eval(loss_function(y_true, y_pred)) computes the loss for each row of the batch: y_true has shape (batch_size, 2000), and so does y_pred, so K.eval(loss_function(y_true, y_pred)) gives an output of shape (batch_size, 1), the binary cross-entropy evaluated on each row of y_true and y_pred.
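To make the per-row quantity concrete, here is a minimal NumPy sketch of a row-wise binary cross-entropy computed in batches. It is an illustration under assumptions, not the code from this post: `per_row_bce`, `reconstruction_losses`, `predict_fn` and the `eps` clipping value are hypothetical names and choices, and `predict_fn` merely stands in for whatever call produces the autoencoder's reconstruction. The sketch returns one value per row with shape (n_rows,) rather than (n_rows, 1).

```python
import numpy as np


def per_row_bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the feature axis, one value per row."""
    # Assumption: clip predictions so that log(0) never occurs.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return losses.mean(axis=1)  # shape: (n_rows,)


def reconstruction_losses(data, predict_fn, batch_size=1024):
    """Score a 2-D array in batches.

    `predict_fn` is a hypothetical stand-in for the trained model's
    prediction call; it must return an array with the same shape as its input.
    """
    batch_losses = []
    for i in range(0, data.shape[0], batch_size):
        batch = data[i:i + batch_size]
        batch_losses.append(per_row_bce(batch, predict_fn(batch)))
    return np.concatenate(batch_losses)


# Toy usage: 10 rows of 2000 binary features and a dummy "model" that
# always predicts 0.5, so every row's loss equals log(2).
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(10, 2000)).astype(np.float32)
losses = reconstruction_losses(data, lambda x: np.full_like(x, 0.5), batch_size=4)
```

With a data frame, the 2-D array would come from something like `encoded_dataset.values`. Because the loss is computed on plain arrays, no tensor conversion or K.eval call is needed per batch; whether that removes the slowdown described above has not been measured here.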