There are several advantages to preprocessing data, for example by scaling or normalization, before training a model. Scaling also makes it possible to use the same model for both training and evaluation.
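As a concrete illustration, here is a minimal NumPy sketch of standardization; the arrays `X_train` and `X_eval` and the specific statistics used are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # hypothetical training features
X_eval = rng.normal(loc=5.0, scale=2.0, size=(20, 3))     # hypothetical evaluation features

# Compute scaling statistics on the training set only, then apply the
# same transform to both splits so the model sees consistently scaled
# inputs during training and evaluation.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
X_eval_scaled = (X_eval - mean) / std
```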
If you use a single neural net at test time without dropout, the weights of that network are scaled-down versions of the trained weights: if a unit is retained with probability p during training, the outgoing weights of that unit are multiplied by p at test time.
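To make the scaling rule concrete, here is a minimal NumPy sketch; the values of p, x, and W and the single-layer forward pass are hypothetical and chosen only to illustrate multiplying the outgoing weights by p at test time.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 0.8                       # probability that a unit is retained during training
x = rng.normal(size=(1, 4))   # activations of the layer that dropout is applied to
W = rng.normal(size=(4, 3))   # trained outgoing weights of those units

# Training-time forward pass: units are zeroed with probability 1 - p.
mask = (rng.random((1, 4)) < p).astype(float)
train_out = (x * mask) @ W

# Test-time forward pass: no dropout, but the outgoing weights are
# multiplied by p so the expected input to the next layer matches training.
test_out = x @ (p * W)
```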
The TensorFlow implementation instead adds an op that scales the weights up by 1/keep_prob at training time, rather than adding ops to scale the weights down by keep_prob at test time.
The effect on performance is negligible, and the code is simpler, because the same graph can be used for both phases: keep_prob is defined as a tf.placeholder() and fed a different value depending on whether you are training or evaluating the network.
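Here is a minimal sketch of this pattern, assuming the TensorFlow 1.x API; the layer sizes and variable names are made up for illustration.

```python
import tensorflow as tf  # assumes the TensorFlow 1.x API

x = tf.placeholder(tf.float32, shape=[None, 784])  # hypothetical input size
keep_prob = tf.placeholder(tf.float32)             # 0.5 when training, 1.0 when evaluating

W = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1))
b = tf.Variable(tf.zeros([256]))
hidden = tf.nn.relu(tf.matmul(x, W) + b)

# tf.nn.dropout scales the retained activations up by 1/keep_prob at
# training time, so no extra rescaling op is needed at test time.
hidden_drop = tf.nn.dropout(hidden, keep_prob)
```

During training you would feed something like feed_dict={keep_prob: 0.5}, and during evaluation feed_dict={keep_prob: 1.0}, so a single graph covers both cases.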