I am trying to reconcile my understanding of LSTMs, as laid out in __this post by Christopher Olah__, with their implementation in Keras. I am following the __blog written by Jason Brownlee__ for the Keras tutorial. What I am mainly confused about is:

1. The reshaping of the data series into [samples, time steps, features], and

2. The stateful LSTMs

Let's concentrate on the above two questions with reference to the code posted below:

```python
# reshape input to be [samples, time steps, features]
trainX = numpy.reshape(trainX, (trainX.shape[0], look_back, 1))
testX = numpy.reshape(testX, (testX.shape[0], look_back, 1))

##########################
# The IMPORTANT BIT
##########################

# create and fit the LSTM network
batch_size = 1
model = Sequential()
model.add(LSTM(4, batch_input_shape=(batch_size, look_back, 1), stateful=True))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
for i in range(100):
    model.fit(trainX, trainY, nb_epoch=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
```

Note: create_dataset takes a sequence of length N and returns an array of length N - look_back, each element of which is a sequence of look_back values.
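For concreteness, here is a minimal sketch of a create_dataset along those lines, together with the reshape into [samples, time steps, features]. This is my paraphrase of what the tutorial's function does, not the tutorial's exact code; the toy series and look_back value are made up for illustration:

```python
import numpy

def create_dataset(dataset, look_back=1):
    # Slide a window of length look_back over the series; each window is
    # one sample, and the value just after the window is its target.
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back):
        dataX.append(dataset[i:i + look_back])
        dataY.append(dataset[i + look_back])
    return numpy.array(dataX), numpy.array(dataY)

series = numpy.arange(10, dtype=float)       # toy series of length N = 10
trainX, trainY = create_dataset(series, look_back=3)
print(trainX.shape)                          # (7, 3): N - look_back samples

# reshape to the [samples, time steps, features] layout the LSTM expects:
# 7 samples, 3 time steps per sample, 1 feature per time step
trainX = numpy.reshape(trainX, (trainX.shape[0], 3, 1))
print(trainX.shape)                          # (7, 3, 1)
```

So under this reading, each row of the original 2-D trainX becomes a sequence of look_back one-feature time steps.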

# Stateful LSTMs

Does a stateful LSTM mean that we save the cell memory values between runs of batches? If that is the case, batch_size is one and the memory is reset between the training runs, so what was the point of saying it was stateful? I'm guessing this is related to the fact that the training data is not shuffled, but I'm not sure how.
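My current mental model of what "stateful" buys you, sketched with a toy cell standing in for the LSTM (the class and its running-sum "state" are made up for illustration; this is not the Keras API, only the reset_states name mirrors it):

```python
class ToyStatefulCell:
    """Hypothetical stand-in for an LSTM: its state is just a running sum."""
    def __init__(self):
        self.state = 0.0

    def process_batch(self, batch):
        # stateful=True behaviour: state carries over from the previous batch
        for x in batch:
            self.state += x
        return self.state

    def reset_states(self):
        # analogue of model.reset_states(), called between epochs
        self.state = 0.0

cell = ToyStatefulCell()
samples = [[1.0], [2.0], [3.0]]   # three consecutive batches of batch_size=1

# one "epoch": state flows across batches because nothing resets it
outputs = [cell.process_batch(b) for b in samples]
print(outputs)       # [1.0, 3.0, 6.0] -> later samples see earlier state
cell.reset_states()  # state cleared only between epochs
print(cell.state)    # 0.0
```

If shuffle were True, the batch order would change each epoch, and carrying state from sample i into sample i+1 would no longer correspond to contiguity in the time series, which seems to be why shuffle=False goes together with stateful=True.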

Any thoughts? Image reference: __http://karpathy.github.io/2015/05/21/rnn-effectiveness/__