in Machine Learning by (19k points)
I want to predict the next frame of a (greyscale) video given the N previous frames, using CNNs or RNNs in Keras. Most tutorials and other material on time series prediction with Keras feed a 1-dimensional input to the network, but mine would be 3D (N frames x rows x cols).

I'm currently really unsure what a good approach for this problem would be. My ideas include:

Using one or more LSTM layers. The problem here is that I'm not sure whether they're suited to take a series of images instead of a series of scalars as input. Wouldn't the memory consumption explode? If it is okay to use them: How can I use them in Keras for higher dimensions?

Using 3D convolution on the input (the stack of previous video frames). This raises other questions: Why would this help when I'm not doing a classification but a prediction? How can I stack the layers in such a way that the input of the network has dimensions (N x cols x rows) and the output (1 x cols x rows)?

I'm pretty new to CNNs/RNNs and Keras and would appreciate any hint in the right direction.

1 Answer

by (33.1k points)

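A convolutional LSTM layer, ConvLSTM2D, fits this problem: it is a recurrent layer that takes a sequence of images rather than a sequence of scalars. In the current version of Keras (v1.2.2), this layer is already included and can be imported using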

from keras.layers.convolutional_recurrent import ConvLSTM2D 

To use this layer, the video data has to be formatted as follows:

[nb_samples, nb_frames, rows, cols, channels] # if using dim_ordering = 'tf' (called data_format = 'channels_last' in Keras 2)

[nb_samples, nb_frames, channels, rows, cols] # if using dim_ordering = 'th' (called data_format = 'channels_first' in Keras 2)
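
As a rough illustration of that layout, here is a minimal NumPy sketch that cuts a greyscale video into training pairs in the 'tf' ordering: each input sample holds N consecutive frames, and the target is the frame that follows them. The helper name make_windows, the parameter n_prev, and the random stand-in video are assumptions made for this example, not part of Keras.

```python
import numpy as np


def make_windows(video, n_prev):
    """Split a (T, rows, cols) greyscale video into (input, target) pairs.

    Hypothetical helper for illustration only; not a Keras function.
    """
    video = np.asarray(video, dtype="float32")
    if video.ndim != 3:
        raise ValueError("expected a video of shape (T, rows, cols)")
    n_samples = video.shape[0] - n_prev
    if n_samples < 1:
        raise ValueError("the video needs more than n_prev frames")

    # Each sample is a window of n_prev consecutive frames.
    X = np.stack([video[i:i + n_prev] for i in range(n_samples)])
    # The target for each window is the frame right after it.
    y = video[n_prev:]

    # Append a channels axis of size 1 (greyscale, 'tf' ordering).
    return X[..., np.newaxis], y[..., np.newaxis]


# Stand-in data: 20 random frames of 48 rows x 64 cols (assumed sizes).
video = np.random.rand(20, 48, 64)
X, y = make_windows(video, n_prev=5)
print(X.shape)  # (15, 5, 48, 64, 1) -> [nb_samples, nb_frames, rows, cols, channels]
print(y.shape)  # (15, 48, 64, 1)    -> one target frame per sample
```

For the 'th' ordering, np.moveaxis(X, -1, 2) would move the channels axis in front of the two spatial axes. Arrays shaped like X can then be fed to a model whose first layer is ConvLSTM2D. With return_sequences=False the layer emits a single feature map per sample instead of one per frame, so a final 2D convolution with one filter could plausibly map it to the predicted frame.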

