Schemata, Building Blocks, and Everything Else

Genetic Algorithms (GAs), are a search and optimization method inspired in the way nature works with living entities, using evolutionary-based operators. These operators exchange genetic information through different generations until an ending condition, typically the desired solution, is found. In this entry, the formalism of why GAs work is described as proposed by Holland in the middle seventies and later by Goldberg. To do so, we first need to introduce some key concepts, assuming the classical ternary representation {0, 1, *}, where * is the don't care symbol.

A fundamental concept in GA theory is the one of schema. A schema is a particular subset among the set of all possible binary strings described by a template composed of the ternary alphabet {0, 1, *}. For instance, the schema 01**1 corresponds to the set of strings of length five (that is, strings composed of five symbols from the ternary alphabet) with a 0 in the first position, an 1 in the second position and an 1 in the last position. Every string that fits a template is called an instance of a schema. In our particular example, the strings 01001, 01011 and 01111, among others, are instances of the schema 01**1. It is very common to use the term schema to denote the subset as well as the template defining the subset itself.

Two more concepts are needed in order to understand the intrinsics of GAs: (1) the order of a schema and (2) the length of a schema.

(1) The order of a schema is defined as the number of bits that are non-don't care symbols. For example, the order of the schema 01**1 is 3.

(2) The length of a schema is defined as the distance between the first and the last non-don't care symbols. For example, the length of the schema 01**1 is 4.

With these definitions in mind we can model how different schemas evolve along a GA run using the schema theorem. This theorem states how the frequency of schema instances changes considering the effects of the evolutionary operators of selection, crossover and mutation. Going further, schema theorem states that the frequency of schemata with fitness higher than the average, short length and low-order ones increases in the following generation. One can find the mathematical foundations elsewhere in the bibliography. Anyway, schemata theorem is not enough to comprehend why GAs work; we need another conceptual leap. That is the case of building block hypothesis, which states that the combination of low-order, highly fit schemata (the building blocks) generate higher-order schemata. These new schemata will form high performance solutions. As in Goldberg's own words, the primary idea of GAs in that they work through a mechanism of decomposition and reassembly.


O. CorĂ³n, F. Herrera, F. Hoffmann, and L. Magdalena. Genetic fuzzy systems: Evolutionary tuning and learning of fuzzy knowledge bases, volume 19 of Advances in Fuzzy Systems— Aplications and Theory. World Scientific, 2001.

D. E. Goldberg. The design of innovation: Lessons from and for competent genetic algorithms. Kluwer Academic Publishers, 1 edition, 2002.

D. E. Goldberg. Genetic algorithms in search, optimization & machine learning. Addison Wesley, 1 edition, 1989.

J. H. Holland. Adaptation in natural and artificial systems. The University of Michigan Press, 1975.

A. Orriols-Puing. New Challenges in Learning Classifier Systems: Mining Rarities and Evolving Fuzzy Models. PhD thesis, Arquitectura i Enginyeria La Salle, Universitat Ramon Llull, Passeig de la Bonanova 8, 08022 - Barcelona, Nov 2008.