Ever since Darwin first proposed the theory of natural selection, scientists have been on an adventurous journey to uncover its biological basis. Genetics, in particular, advanced our understanding of inheritance by showing that the genetic information of two parents recombines to produce millions of possibilities for the next generation. Without bothering you with all the biology terms you once struggled to memorize, you should appreciate just how much genetic variation this recombination process can bring about. On a less abstract note: if evolution is an extremely challenging math problem that mother nature struggles to solve (who should survive and who should not?), genetics is so far the best solution. It's such a successful optimizer that, for the most part, it benefits both individuals and their populations.
Computer algorithm fans were so inspired by this optimizer that they decided to build an artificial version of it. Here comes the famous "genetic algorithm" (GA). But you may ask: how can such a complex process be captured by a single algorithm? While there is much room left for improvement (by smart readers like you in the future!), GA has so far captured at least one idea successfully: only the fittest individuals survive to the next generation. Ready for a quick snapshot of genetic algorithms? To help you shift gears from real biology to computer science, here is a conversion chart:
| Genetics | In biology | In computer science | Visual representation in the algorithm |
|---|---|---|---|
| "Population" | A collection of different chromosomes | The pool of candidate solutions to the math problem being optimized | 010000010000100010011111000011110011…… |
| "Individual" | One chromosome in the population | One possible solution to the math problem | 1000100111110 |
| "Genes" | The units that make up one chromosome | The variables that make up a particular solution | Each bit in the string |
| "Alleles" | All possible versions of a gene | All possible values for a given variable | 0 or 1 (binary in our case, but any other encoding system works) |
Now imagine the math problem you are given is to find the best path between two cities. Even though you already know several shortcuts for traveling from New York to Boston, you are desperate to find the best one! Intuitively, there is a great chance that the best solution embodies features you see here and there in these existing solutions. Given the great power of GA, what we can do is develop romantic relationships between these solutions and look for a child that inherits all the desirable traits we are after.
The first step is to encode existing solutions as binary strings for efficiency. For simplicity, let's say there are only four possible moves in a solution: east (00), west (01), north (10), and south (11). A solution like 01000011 would then denote going west, east, east, and south, in that order, to travel from NY to Boston. Now we have a pool of binary strings to work with.
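To make the encoding concrete, here is a minimal sketch of a decoder for the 2-bit scheme above. The dictionary and function names are illustrative, not part of any standard library:

```python
# Hypothetical decoder for the 2-bit move encoding described above.
MOVES = {"00": "east", "01": "west", "10": "north", "11": "south"}

def decode(chromosome: str) -> list[str]:
    """Split a binary string into 2-bit genes and map each to a direction."""
    return [MOVES[chromosome[i:i + 2]] for i in range(0, len(chromosome), 2)]

print(decode("01000011"))  # ['west', 'east', 'east', 'south']
```

Any fixed-width encoding works the same way; two bits per gene just happen to fit our four directions exactly.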
The next step is to perform crossover between individuals. However, if only the fittest individuals can be selected, how do we know who is the fittest? That's where the fitness function comes in handy: it tells you the competence of each individual. Note that the challenge we presented earlier is to find, not the shortest, but the best path. Therefore, based on our personal preferences, we can design a fitness function that, say, counts the number of turns (two non-identical moves next to each other in a string) in a solution, since you would rather keep going straight than make turns when driving; the fewer turns, the fitter the solution. For example, 00001100 is better than 00111001. With a fitness score for each string, only those that pass a threshold value are considered for the recombination process.
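The turn-counting preference above can be sketched as a fitness function. This is just one possible design, matching the example in the text; returning the negative turn count is an assumption so that a higher score always means a fitter individual:

```python
def fitness(chromosome: str) -> int:
    """Count turns: adjacent 2-bit moves that differ.
    Fewer turns is better, so return the negative count
    (higher score = fitter individual)."""
    moves = [chromosome[i:i + 2] for i in range(0, len(chromosome), 2)]
    turns = sum(1 for a, b in zip(moves, moves[1:]) if a != b)
    return -turns

print(fitness("00001100"))  # -2 (two turns)
print(fitness("00111001"))  # -3 (three turns)
```

As the text says, 00001100 scores higher than 00111001, so it would be more likely to make it past the selection threshold.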
Now we randomly choose two relatively fitter individuals as parents. Remember the first time you were amazed by the swap-over of chromosomes in the biology class? Now it’s your chance to do it! All we need to do is exchange the genes of two chromosomes. Here are two competent individuals to start with:
Parent 1: 00001100
Parent 2: 01010111
Assume the crossover point is in the middle (you can choose your own). After crossover we have two children:
Child 1: 01011100
Child 2: 00000111
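A single-point crossover like the one above can be written in a few lines. This sketch swaps the parents' prefixes at the given point, in the same order as the example (child 1 takes parent 2's prefix):

```python
def crossover(p1: str, p2: str, point: int) -> tuple[str, str]:
    """Single-point crossover: exchange the prefixes of the two parents.
    Ordered so the results match the worked example in the text."""
    child1 = p2[:point] + p1[point:]
    child2 = p1[:point] + p2[point:]
    return child1, child2

print(crossover("00001100", "01010111", point=4))  # ('01011100', '00000111')
```

With the point fixed at the middle of an 8-bit string (index 4), this reproduces exactly the two children shown above.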
You can also incorporate the idea of mutation by randomly flipping bits in a string. For instance, after a single-point mutation at position 0 in child 1, we now have two offspring:
11011100 and 00000111.
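Mutation can be sketched two ways: a deterministic single-bit flip (as in the example above) and the more common probabilistic version, where each bit flips independently with a small rate. Both function names here are illustrative:

```python
import random

def flip(chromosome: str, position: int) -> str:
    """Deterministically flip one bit, like the single-point mutation above."""
    flipped = "1" if chromosome[position] == "0" else "0"
    return chromosome[:position] + flipped + chromosome[position + 1:]

def mutate(chromosome: str, rate: float = 0.01) -> str:
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < rate else bit
        for bit in chromosome
    )

print(flip("01011100", 0))  # 11011100
```

A low mutation rate keeps the search stable while still letting the population escape local optima that crossover alone cannot leave.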
Just as evolution goes on for a long time, GA computes the fitness score of the current offspring, performs recombination among the fittest, and repeats the process until the children no longer differ much from their parents.
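Tying all the steps together, here is a minimal sketch of the full loop under some illustrative assumptions: a fixed number of generations stands in for the "children stop changing" stopping rule, the fitter half of the population serves as the mating pool (a stand-in for the threshold), and the crossover point sits at the middle, as in the example:

```python
import random

def fitness(chrom: str) -> int:
    # Fewer turns (adjacent 2-bit moves that differ) means a fitter route.
    moves = [chrom[i:i + 2] for i in range(0, len(chrom), 2)]
    return -sum(1 for a, b in zip(moves, moves[1:]) if a != b)

def crossover(p1: str, p2: str) -> tuple[str, str]:
    point = len(p1) // 2  # middle crossover point, as in the worked example
    return p2[:point] + p1[point:], p1[:point] + p2[point:]

def mutate(chrom: str, rate: float = 0.05) -> str:
    return "".join(("1" if b == "0" else "0") if random.random() < rate else b
                   for b in chrom)

def evolve(population: list[str], generations: int = 50) -> str:
    for _ in range(generations):
        # Selection: keep the fitter half of the population as the mating pool.
        ranked = sorted(population, key=fitness, reverse=True)
        pool = ranked[: max(2, len(ranked) // 2)]
        # Recombination + mutation until we have a full new generation.
        children = []
        while len(children) < len(population):
            p1, p2 = random.sample(pool, 2)
            for child in crossover(p1, p2):
                children.append(mutate(child))
        population = children[: len(population)]
    return max(population, key=fitness)

random.seed(0)  # reproducible demo run
pop = ["".join(random.choice("01") for _ in range(8)) for _ in range(10)]
best = evolve(pop)
print(best, fitness(best))
```

In practice you would also track the best-so-far individual across generations and stop when the population's average fitness plateaus, rather than running a fixed generation count.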
Is this algorithm guaranteed to give you the best solution? Unfortunately, not always. But isn't it fun to mimic evolution inside a computer?