# fill missing values of sequence with neural networks

1 view

I want to make a little project and I want to use neural networks with python. I found that pybrain is the best solution. But until now, all the examples and questions I have found, cannot help me.

I have a sequence of numbers. Hundreds of rows. Some values are missing and instead of a number, there is an "x".

For example

1425234838636**x**40543485435097**x**43953458345345430843967067045764607457607645067045**x**04376037654067458674506704567408576405

and so on. This is just an example. Not my sequence.

I thought to read one by one the values and train my neural net and when I find one 'x' I will predict the number and I will continue training it with the following numbers.

What I have found until now are training like this one

with some inputs and some outputs.

Any advice on how can I continue with it?

Edit: I figure something and I would like to receive feedback because I don't know if it is right.

I still have the string for above. I split it in a list so I have a list where each entity is a number.

for ind in range(len(myList)):

if not myList[ind] == "x" and not myList[ind+1]=="x":

else:

break

net = FeedForwardNetwork()

inp = LinearLayer(1)

h1 = SigmoidLayer(1)

outp = LinearLayer(1)

net.sortModules()

trainer = BackpropTrainer(net, ds)

trainer.trainOnDataset(ds,1000)

trainer.testOnData(verbose=True)

lis[ind+1] = net.activate((ind,))

GO to the beginning and continue from the last "x" which replaced from the net.activate()

What do you think? Do you believe that something like this will work?

by (90.3k points)

Most often, researchers fill missing values using a heuristic or unsupervised imputation strategy. Adding complications, these values are not missing at random, but reflect decisions by caregivers. Thus, the pattern of recorded measurements holds potential information about the state of the patient.

It is very common to have missing observations from sequence data.

Data may be corrupt or unavailable, but it is also feasible that your data has variable-length sequences by definition. Those sequences with fewer timesteps may be estimated to have missing values.

For more information regarding the same, refer to the following link which will tell you more about how to handle missing Timesteps in Sequence Prediction Problems with Python

https://machinelearningmastery.com/handle-missing-timesteps-sequence-prediction-problems-python/