Posts

Showing posts from March, 2011

Schemata, Building Blocks, and Everything Else

Genetic Algorithms (GAs) are a search and optimization method inspired by the way nature works with living entities, using evolution-based operators. These operators exchange genetic information across successive generations until a stopping condition, typically finding the desired solution, is met. In this entry, the formalism of why GAs work is described as proposed by Holland in the mid-1970s and later by Goldberg. To do so, we first need to introduce some key concepts, assuming the classical ternary representation {0, 1, *}, where * is the don't care symbol. A fundamental concept in GA theory is that of a schema. A schema is a particular subset of the set of all possible binary strings, described by a template built from the ternary alphabet {0, 1, *}. For instance, the schema 01**1 corresponds to the set of strings of length five (that is, binary strings of five symbols, matched against a five-symbol template over the ternary alphabet) with a 0 in the first position, a 1 in the second position
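To make the idea of a schema as a template concrete, here is a minimal Python sketch. The function names `matches` and `instances` are illustrative choices, not part of Holland's or Goldberg's formalism; the only thing taken from the text is the ternary alphabet {0, 1, *} and the example schema 01**1.

```python
from itertools import product


def matches(schema: str, string: str) -> bool:
    """Return True if the binary string is an instance of the schema.

    A '*' in the schema is the don't care symbol and matches either bit;
    a '0' or '1' must match the string exactly at that position.
    """
    if len(schema) != len(string):
        return False
    return all(s == "*" or s == c for s, c in zip(schema, string))


def instances(schema: str) -> list:
    """Enumerate every binary string described by the schema."""
    # Positions holding '*' are free; every other position is fixed.
    free = [i for i, s in enumerate(schema) if s == "*"]
    result = []
    for bits in product("01", repeat=len(free)):
        chars = list(schema)
        for i, b in zip(free, bits):
            chars[i] = b
        result.append("".join(chars))
    return result


print(matches("01**1", "01101"))  # True: the fixed positions agree
print(matches("01**1", "11101"))  # False: first position is 1, not 0
print(instances("01**1"))         # the four strings in this schema
```

A schema with k don't care symbols describes 2^k binary strings, which is why 01**1 yields four instances here.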

From Market Baskets to Databases: Association Rule Mining

What do customers buy? Which products are bought together? With these two short questions the field of association rule (AR) mining makes its appearance. In this field of ML, the original aim was to find associations and correlations between the different items that customers place in their shopping baskets. More generally, the goal of AR mining is to find frequent and interesting patterns, associations, correlations, or causal structures among sets of items or elements in large databases, and to express these relationships as association rules. AR mining is an important part of the unsupervised learning paradigm, so the algorithm has no expert to teach it during the training stage. Why is AR mining so important? Many commercial applications generate huge amounts of unlabeled data (just think of Facebook for a moment), so our favorite classifier system will not work in this environment. With AR mining we can exploit such databases and extract any kind of useful in
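The question "which products are bought together?" can be illustrated with a small Python sketch that simply counts how often each pair of items shares a basket. This is only a first step toward frequent patterns, not a full AR mining algorithm; the basket contents and the name `pair_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"butter", "milk"},
]


def pair_counts(baskets):
    """Count in how many baskets each unordered pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical (a, b) ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
for pair, n in counts.most_common():
    print(pair, n)
```

No labels are involved at any point: the counts come from the baskets alone, which is what places this kind of analysis in the unsupervised setting.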