Flatten reshapes a tensor into a single dimension whose length equals the number of elements in the tensor.
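Suppose the output of a layer has shape (20, 4, 5, 3). Flattening unstacks all of its values into a 1-D tensor of shape (20*4*5*3,), i.e. 1200 elements. Note that the Keras Flatten layer keeps the first (batch) axis, so if 20 is the batch size the result has shape (20, 4*5*3) = (20, 60).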
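A minimal sketch of that shape arithmetic, using NumPy rather than Keras so it runs without a model. The array `x` and the variable names are only for illustration; the second reshape mimics what the Keras Flatten layer does when the first axis is the batch axis.

```python
import numpy as np

# Hypothetical tensor with the shape used in the example above.
x = np.zeros((20, 4, 5, 3))

# Collapse every axis into one: 20 * 4 * 5 * 3 = 1200 elements.
flat_all = x.reshape(-1)
print(flat_all.shape)  # (1200,)

# Keep the first axis as the batch axis and collapse the rest:
# 4 * 5 * 3 = 60 values per sample.
flat_per_sample = x.reshape(x.shape[0], -1)
print(flat_per_sample.shape)  # (20, 60)
```

In both cases the total number of elements is unchanged; only the way they are indexed differs.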
As per your code,
The above statement means that you get a dense layer with 2 inputs and 16 outputs, applied independently to each of the 3 steps. If D(x) transforms a 2-D vector into a 16-D vector, the output is the sequence of vectors [D(x[0,:]), D(x[1,:]), D(x[2,:])] with shape (3, 16). To get a single 16-D output instead, first flatten the input to a 6-D vector and then apply the dense layer:
model = Sequential()