in Machine Learning by (19k points)
I want to predict the next frame of a (greyscale) video given N previous frames, using CNNs or RNNs in Keras. Most tutorials and other information on time series prediction with Keras feed a 1-dimensional input to the network, but mine would be 3D (N frames x rows x cols).

I'm currently really unsure what a good approach for this problem would be. My ideas include:

Using one or more LSTM layers. The problem here is that I'm not sure whether they're suited to take a series of images instead of a series of scalars as input. Wouldn't the memory consumption explode? If it is okay to use them: How can I use them in Keras for higher dimensions?

Using 3D convolution on the input (the stack of previous video frames). This raises other questions: Why would this help when I'm not doing a classification but a prediction? How can I stack the layers in such a way that the input of the network has dimensions (N x cols x rows) and the output (1 x cols x rows)?

I'm pretty new to CNNs/RNNs and Keras and would appreciate any hint in the right direction.

1 Answer

by (33.1k points)

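A convolutional LSTM layer fits this problem: it is a recurrent layer whose input and state are images rather than vectors. In Keras v1.2.2 (the current version at the time of writing), this layer is already included as ConvLSTM2D and can be imported using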

from keras.layers.convolutional_recurrent import ConvLSTM2D 

To use this layer, the video data has to be formatted as follows:

[nb_samples, nb_frames, rows, cols, channels] # if using dim_ordering = 'tf'

[nb_samples, nb_frames, channels, rows, cols] # if using dim_ordering = 'th'
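As a rough sketch of how this could look in practice, the snippet below slices a greyscale video into the channels-last ('tf') layout and stacks a ConvLSTM2D layer so that N input frames map to one output frame. The helper names make_windows and build_model, the filter count, and the kernel sizes are illustrative assumptions, not part of the original answer. The model code uses Keras 2+ argument names (filters, kernel_size, padding), which differ from those in v1.2.2, and the sigmoid output assumes pixel values scaled to [0, 1].

```python
import numpy as np


def make_windows(video, n_frames):
    """Slice a greyscale video of shape (T, rows, cols) into training pairs.

    Returns X with shape (samples, n_frames, rows, cols, 1) and
    y with shape (samples, rows, cols, 1), where y[i] is the frame
    that follows the n_frames frames in X[i].
    """
    video = np.asarray(video, dtype="float32")
    n_samples = video.shape[0] - n_frames
    if n_samples < 1:
        raise ValueError("video needs more than n_frames frames")
    X = np.stack([video[i:i + n_frames] for i in range(n_samples)])
    y = video[n_frames:]
    # Add the trailing channel axis (1 channel for greyscale).
    return X[..., np.newaxis], y[..., np.newaxis]


def build_model(n_frames, rows, cols):
    """Hypothetical minimal next-frame model (Keras 2+ argument names)."""
    # Imported here so make_windows can be used without Keras installed.
    from keras.layers import Input, ConvLSTM2D, Conv2D
    from keras.models import Model

    frames = Input(shape=(n_frames, rows, cols, 1))
    # return_sequences=False keeps only the output after the last frame,
    # collapsing (n_frames, rows, cols, 1) to (rows, cols, filters).
    x = ConvLSTM2D(filters=16, kernel_size=(3, 3), padding="same",
                   return_sequences=False)(frames)
    # A 2D convolution maps the feature maps back to one greyscale channel.
    # Sigmoid assumes pixel values were scaled to [0, 1].
    next_frame = Conv2D(filters=1, kernel_size=(3, 3), padding="same",
                        activation="sigmoid")(x)
    model = Model(frames, next_frame)
    model.compile(optimizer="adam", loss="mse")
    return model
```

With a video array of shape (T, rows, cols), one would call X, y = make_windows(video, N), then build_model(N, rows, cols).fit(X, y). Because the convolutional state is shared across positions, memory grows with the number of filters and the frame size rather than with a fully connected weight matrix over all pixels.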
