Selection is the step of a genetic algorithm in which individual genomes are chosen from a population for later breeding (using the crossover operator).
A generic selection procedure may be implemented as follows:
The fitness function is evaluated for each, providing fitness values, which are then normalized. Normalization means dividing the fitness value of each by the sum of all fitness values so that the sum of all resulting fitness values equals 1.
The population is sorted by descending fitness values.
Accumulated normalized fitness values are computed: the accumulated fitness value of an individual is the sum of its fitness value plus the fitness values of all the previous individuals; the accumulated fitness of the last individual should be 1, otherwise, something went wrong in the normalization step.
A random number let say 'R' between 0 and 1 is chosen.
The selected individual is the last one whose accumulated normalized value is greater than or equal to R.
For a large number of individuals, the above algorithm might be computationally quite demanding. A simpler and faster alternative uses the so-called stochastic acceptance.
If this method is repeated until there are enough selected individuals, this selection method is called fitness proportionate selection or roulette-wheel selection. If instead of a single pointer spun multiple times, there are multiple, equally spaced pointers on a wheel that is spun once, it is called stochastic universal sampling. Frequently selecting the best individual of a randomly chosen subset is tournament selection. Taking the best half, third or another portion of the individuals is truncation selection.
Crossover attempts to distribute good parts of the genome among individuals that have arisen due to mutation. It would indeed be nice if one could just swap the good parts of the genome, and this has been attempted; for example, you can look at each gene and measure the average fitness of individuals possessing that gene.
There are two main problems with this though:
- It is computationally expensive.
- There might be interdependencies in the genome. Maybe gene A looks really good according to your metric, but gene B doesn't, so you leave it out. In reality, though, it might be that gene A doesn't actually work without gene B being present.