I am working on a neural network based on the NEAT algorithm that learns to play an Atari Breakout clone in Python 2.7. All of the pieces work, but I think the evolution could be greatly improved with a better algorithm for calculating species fitness.

The inputs to the neural network are:

• X coordinate of the center of the paddle
• X coordinate of the center of the ball
• Y coordinate of the center of the ball
• ball's dx (velocity in X)
• ball's dy (velocity in Y)

The outputs are:

The parameters I have available to the species fitness calculation are:

• breakout_model.score - int: the final score of the game played by the species
• breakout_model.num_times_hit_paddle - int: the number of times the paddle hit the ball
• breakout_model.hits_per_life - list of int: the number of times the paddle hit the ball in each life; the first element is the value for the first life, the second element is the value for the second life, and so on up to 4
• breakout_model.avg_paddle_offset_from_ball - decimal: the average linear distance in the X direction between the ball and the center of the paddle
• breakout_model.avg_paddle_offset_from_center - decimal: the average linear distance in the X direction between the center of the frame and the center of the paddle
• breakout_model.time - int: the total duration of the game, measured in frames
• breakout_model.stale - boolean: whether or not the game was artificially terminated due to staleness (e.g. ball gets stuck bouncing directly vertical and paddle not moving)

If you think I need more data about the final state of the game than just these, I can likely implement a way to get it very easily.

Here is my current fitness calculation, which I don't think is very good:

def calculate_fitness(self):
    self.fitness = self.breakout_model.score
    # [the `if` branch that the following `else` belongs to is missing from the post]
    else:
        self.fitness -= 0.5
    self.fitness -= (1 / self.breakout_model.avg_paddle_offset_from_ball) * 100
    for hits in self.breakout_model.hits_per_life:
        if hits == 0:
            self.fitness -= 0.2
    if self.breakout_model.stale:
        self.fitness = 0 - self.fitness
    return self.fitness

by (33.1k points)

You should minimize the conditional logic in your fitness function, using it only where you want to force the fitness score to 0 or apply a major penalty. I would just decide how much weight each component of the score should have, multiply each component by its weight, and sum the results.

Negative components just add complexity to understanding the fitness function, with no real benefit; the model learns from the relative difference in scores. So my version of the function would look something like this:

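def fitness(...):
    # Force the fitness to 0 only when the paddle never hit the ball.
    if total_hits == 0:
        return 0
    # float() keeps Python 2.7 integer division from truncating each ratio.
    return (float(game_score) / max_score) * .7 \
        + float(game_score) / total_hits * .2 \
        + float(game_score_per_life) / hits_per_life * .1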
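
A runnable variant of the formula above might look like the following sketch. The function name `weighted_fitness`, the `weights` parameter, and the extra guard on `hits_per_life` are assumptions added for illustration. `max_score` and `game_score_per_life` are not among the values listed in the question, so the game would have to supply them, and both per-life arguments are treated here as single numbers rather than lists.

```python
def weighted_fitness(game_score, max_score, total_hits,
                     game_score_per_life, hits_per_life,
                     weights=(0.7, 0.2, 0.1)):
    """Weighted sum of three ratios (hypothetical helper, not from the post)."""
    # The only conditional: force the fitness to 0 when the paddle never
    # touched the ball, which also avoids dividing by zero below.
    if total_hits == 0 or hits_per_life == 0:
        return 0.0
    w_score, w_per_hit, w_per_life = weights
    # float() keeps the ratios from being truncated on Python 2.7.
    return (w_score * float(game_score) / max_score
            + w_per_hit * float(game_score) / total_hits
            + w_per_life * float(game_score_per_life) / hits_per_life)


# Example values are made up for illustration.
example = weighted_fitness(game_score=50, max_score=100, total_hits=10,
                           game_score_per_life=10, hits_per_life=2)
```

Only the first ratio is bounded by 1. The other two can exceed it, so the weights do not translate directly into each term's share of the total; scaling every term to a common range before weighting would be one way to address that.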
