
Insights on Linkage Learning: Truly Competent Genetic Algorithms

In the last post we saw that competent GAs are those that automatically identify building blocks (BBs) and exchange them without disrupting them. As a simple example, the compact genetic algorithm (cGA) was presented for competently solving the trap-n functions. In spite of the good results obtained by cGA in those environments, the algorithm is far too simple for truly tough problems such as the m-concatenated trap-n problems depicted in Figure 1, in which m deceptive trap-n functions are concatenated into a single, bigger problem. cGA's probability-vector representation cannot capture the combinations of bits that form a BB, so once again a new strategy is required to tackle these challenging environments: we need an order-n probabilistic optimization algorithm (in contrast, cGA is an order-1 algorithm).

Figure 1: The 4-concatenated trap-3 function. It is composed of four trap-3 functions, and the objective is to find 111111111111, but the problem is deceptive and guides the learner towards 000000000000 (a local optimum!).
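
To see the deception in a single trap-3 block, here is a minimal sketch (my own illustration, separate from the implementation below): the fitness decreases with every additional 1 until all three bits are set, so a bit-by-bit hill climber is pulled towards 000.

 #A single trap-3 block: fitness as a function of the number of ones, u.
 trap3 <- function( u ) {
     if( u < 3 ) {
         return( 2 - u ) #Deceptive slope leading towards 000.
     } else {
         return( 3 ) #Global optimum at 111.
     }
 }
 sapply( 0:3, trap3 ) #Returns 2 1 0 3: the gradient points away from the optimum.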

Following the initial steps of early estimation-of-distribution algorithms (EDAs), a more complex probabilistic model is used to detect BBs (instead of the simple probability vector). One of the most successful EDAs built on this idea is the extended compact genetic algorithm (ECGA) [1]. This algorithm works by detecting the trap sub-functions and then computing the frequencies of all the combinations of the n bits in the corresponding string positions. ECGA does this by learning marginal product models (MPMs). The idea is that the MPM stores the probability of each bit combination within every BB, and these probabilities are used to model and sample forthcoming populations. To obtain the right grouping of bits into BBs, the algorithm uses a combined complexity criterion (CC), the sum of the model complexity and the compressed population complexity: it greedily tries to minimize the number of bits required to describe the model while keeping it an accurate description of the population.
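
As a rough sketch of this criterion, here is a toy example (my own illustration; combinedComplexity is a hypothetical helper, while the actual Cm and Cp functions appear later in the listing). For a candidate partition of the string into blocks, the CC is the model complexity plus the compressed population complexity, both measured in bits:

 #Toy illustration of the combined complexity criterion for one candidate partition.
 #pop is a binary matrix (one individual per row); partition is a list of position vectors.
 combinedComplexity <- function( pop, partition ) {
     N <- nrow( pop )
     cc <- 0
     for( bb in partition ) {
         k <- length( bb )
         #Observed frequencies of the 2^k bit patterns in this block.
         patterns <- apply( pop[ , bb, drop = FALSE ], 1, paste, collapse = "" )
         p <- table( patterns ) / N
         cm <- (2^k - 1) * log2( N + 1 ) #Bits needed to store the block's frequency table.
         cp <- -N * sum( p * log2( p ) ) #Bits needed to encode the population under that table.
         cc <- cc + cm + cp
     }
     return( cc )
 }
 #ECGA repeatedly merges the pair of blocks whose merge lowers this value the most.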

ECGA works as described by the following (rather simplified) pseudocode:
  1. Generate a random population of size N.
  2. Evaluate the fitness of each individual.
  3. Build the first MPM assuming that all the variables are independent, and compute its CC.
  4. Greedily compact the current MPM into newMPM, trying the merge of BBs with the best score (newCC).
  5. If newCC < CC, then set MPM = newMPM and go to step (4); otherwise continue with step (6).
  6. Sample a new population using MPM.
  7. If the termination criteria are met, stop; otherwise go to step (2).

I have coded ECGA in R to solve the m-concatenated trap-3 function; the full listing follows.

 #Class definition.  
 setClass( "Individual", representation( chromosome = "array", fitness = "numeric", size = "numeric" ) )  
 #Concatenated m-Trap-3 function fitness computation. Individuals must have length 3 * m (for instance 12)!  
 #ind is the individual to be evaluated.  
 computeFitness <- function( ind ) {  
     fit <- 0  
     size <- 3 #Size of each block  
     i <- 1  
     while( i <= length( ind ) ) {   
         bb <- ind[ i:(i + size - 1) ]  
         no <- sum( bb )  
         if( no <= (length( bb ) - 1) ) {  
             fit <- fit + (length( bb ) - 1 ) - no  
         } else {  
             fit <- fit + length( bb )  
         }  
         i <- i + size  
     }  
     return( fit )  
 }  
 #Constructor: it initializes individuals at random.  
 Individual <- function( indSize ) {  
     ind <- array( rbinom( indSize, 1, 0.5 ), indSize )  
     fit <- computeFitness( ind )  
     return( new( "Individual", chromosome = as.array( ind ), fitness = fit, size = indSize ) )  
 }  
 #Another Initialization function for Individual class.  
 #ind is an array containing the chromosomes of the individual.  
 Initialize <- function( ind ) {  
     fit <- computeFitness( ind )  
     return( new( "Individual", chromosome = as.array( ind ), fitness = fit, size = length( ind ) ) )  
 }  
 #Accessor methods.  
 getFitness <- function( obj ) obj@fitness  
 getSize <- function( obj ) obj@size  
 getChromosome <- function( obj ) obj@chromosome  
 #Class containing the model  
 setClass( "SModel", representation( Nij = "array", size = "numeric" ) )  
 #Initializer.  
 SModel <- function( partial, partSize ) {  
     return( new( "SModel", Nij = as.array( partial ), size = partSize ))  
 }  
 #Accessor methods.  
 getModel <- function( obj ) obj@Nij  
 #It computes the marginal product model.  
 #BBList is the building block list.  
 #Lbb is the array containing the number of (possibly) merged BBs.  
 #Nbb is the size of the Lbb array.  
 computeMPM <- function( pop, BBList, Lbb, Nbb ) {  
     MPM <- list()  
     for( i in 1:Nbb ) {  
         Nij <- array( 0, 2^Lbb[i] )  
         for( ps in 1:length( pop ) ) { #For the entire population...  
             tmp <- as.numeric( intToBits( 0 ) )  
             count <- Lbb[i] #Required for mapping the reversed binary numbers obtained by intToBits().  
             for( j in BBList[[i]] ) {   
                 tmp[ count ] <- pop[[ps]]@chromosome[j]  
                 count <- count - 1  
             }  
             dec <- packBits( as.integer( tmp ), "integer" ) #dec contains the combination  
             Nij[ dec + 1 ] <- 1 + Nij[ dec + 1 ]  
         }  
         MPM <- c( MPM, SModel( Nij, length( Nij ) ) )  
     }  
     return( MPM )  
 }  
 #It computes the complexity of the model.  
 #Lbb is the array containing the number of (possibly) merged BBs.  
 #Nbb is the size of the Lbb array.  
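 #In bits, Cm = log2(popSize + 1) * sum_i (2^Lbb[i] - 1): the cost of storing every block's frequency table.  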
 Cm <- function( popSize, Lbb, Nbb ) {  
     sum <- 0  
     for( i in 1:Nbb ) {  
         sum <- sum + (2^Lbb[i] - 1)  
     }  
     return (sum * (log2( popSize + 1 ) ) )  
 }  
 #It computes the compressed population complexity.  
 #MPM is the marginal product model.  
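 #In bits, Cp = popSize * sum_i H_i, where H_i is the entropy of block i's frequency distribution in the population.  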
 Cp <- function( MPM, popSize, Lbb, Nbb ) {  
     sum <- 0  
     for( i in 1:Nbb ) {  
         for( j in 1:(2^Lbb[i]) ) {  
             if( getModel( MPM[[i]] )[j] != 0 ) {  
                 sum <- sum - (getModel( MPM[[i]] )[j] / popSize) * log2( (getModel( MPM[[i]] )[j] / popSize) )  
             }  
         }  
     }  
     sum <- sum * popSize  
     return( sum )  
 }  
 #It chooses the best model obtainable by merging one pair of building blocks of the given model.  
 #It returns a list containing (1) the new MPM, (2) the building block list,  
 #(3) the array with the block sizes (Lbb), (4) the number of blocks (Nbb)  
 #and (5) the combined complexity of the new model.  
 chooseModel <- function( p, indSize, BBList, Lbb, Nbb ) {  
     popSize <- length( p )  
     Ccp <- .Machine$integer.max - 1  
     mpmp <- list()  
     BBListp <- list()  
     Nbbp <- 0  
     Lbbp <- array( 0, indSize )  
         for( i in 1:(Nbb - 1) ) { #For each building block (except for the last one)...  
             for( j in (i + 1):Nbb ) { #The remaining building blocks...  
                 BBList2 <- list()  
                 BBList2 <- list ( c( BBList[[i]], BBList[[j]] ) )  
                 for( k in 1:Nbb ) {  
                     if( k != i && k != j ) {  
                         BBList2 <- c( BBList2, BBList[k] )  
                     }  
                 }  
                 Lbb2 <- array( 0, indSize )  
                 for( k in 1:length( BBList2 ) ) {  
                     Lbb2[k] <- length( BBList2[[k]] )  
                 }  
                 Nbb2 <- length( BBList2 )  
                 #Check if the new model is better than the previous.  
                 mpm2 <- computeMPM( p, BBList2, Lbb2, Nbb2 )   
                 Cc2 <- Cm( popSize, Lbb2, Nbb2 ) + Cp( mpm2, popSize, Lbb2, Nbb2 )  
                 if( Cc2 < Ccp ) { #Choose the smallest one.  
                     Ccp <- Cc2  
                     mpmp <- mpm2  
                     Lbbp <- Lbb2  
                     Nbbp <- Nbb2  
                     BBListp <- BBList2  
                 }  
             } #end j-for.  
         } #end i-for.  
     return( list( mpmp, BBListp, Lbbp, Nbbp, Ccp ) )      
 }  
 #Selection operator: a tournament of size tau. It returns the selected individual.  
 tournamentSelection <- function( pop, tau ) {  
     best <- -1  
     for( i in 1:tau ) {  
         ind <- as.integer( runif( 1 ) * (length( pop ) + 1) ) #Pick a random contestant.  
         if( ind > length( pop ) ) {  
             ind <- length( pop )  
         } else if( ind <= 0 ) {  
             ind <- 1  
         }  
         if( best <= 0 ) {  
             best <- ind  
         } else if( getFitness( pop[[ ind ]] ) > getFitness( pop[[ best ]] ) ) {  
             best <- ind  
         }  
     }  
     return( pop[[ best ]] ) #The fittest contestant wins the tournament.  
 }  
 #It returns the best individuals of the given population.  
 #howMany tells the function how many of the best individuals to return.  
 getBests <- function( pop, howMany ) {  
     if( howMany > length( pop ) ) {  
         howMany <- length( pop )  
     } else if( howMany < 1 ) {  
         howMany <- 1  
     }  
     #Sort the population (insertion sort).  
     for( i in 1:length( pop ) ) {  
         valueToInsert <- pop[[i]]  
         holePos <- i  
         while( holePos > 1 && getFitness( valueToInsert ) > getFitness( pop[[ holePos - 1 ]] ) ) {  
             pop[[ holePos ]] <- pop[[ holePos - 1 ]]  
             holePos <- holePos - 1  
         }  
         pop[[ holePos ]] <- valueToInsert  
     }  
     bests <- list()  
     for( i in 1:howMany ) {  
         bests <- c( bests, c = pop[[i]] )  
     }  
     return( bests )  
 }  
 #It generates one new individual by sampling each building block from the model in a roulette-wheel fashion.  
 generateIndividual <- function( indSize, mpm, BBList, Lbb ) {  
     where <- 1  
     ind <- array( 0, indSize )  
     for( i in mpm ) {  
         propSum <- sum( i@Nij )  
         choicePoint <- runif( 1 ) * propSum  
         j <- 1  
         propSum <- i@Nij[[j]]  
         while( choicePoint > propSum ) {  
             j <- j + 1  
             if( i@size < j ) {   
                 break  
             } else {  
                 propSum <- propSum + i@Nij[[j]]  
             }  
         }  
         #Once a BB combination is selected, write its bits into the corresponding positions of the individual.  
         for( bits in 1:Lbb[ where ] ) {  
             tmp <- as.numeric( intToBits( (j - 1) ) )  
             ind[ BBList[[ where ]][ bits ] ] <- tmp[ Lbb[ where ] - bits + 1 ]  
         }  
         where <- where + 1  
     }  
     return( Initialize( ind ) )  
 }  
 #The main ECGA algorithm.  
 #indSize is the number of genes in an individual.  
 #popSize is the size of the population.  
 #Pc is the probability of crossover.  
 #tau is the selection pressure parameter for tournament selection.  
 #iterations is the number of loops the algorithm will perform.  
 #It returns a list containing (1) the final MPM, (2) the building block list,  
 #(3) the array with the block sizes (Lbb), (4) the number of blocks (Nbb),  
 #(5) the combined complexity of the final model and (6) the final population.  
 ECGA <- function( indSize, popSize, Pc, tau, iterations ) {  
     pop <- list()  
     for( i in 1:popSize ) { #Generate the initial population randomly.  
         pop <- c( pop, c = Individual( indSize ) )  
     }  
     tstamp <- 1  
     while( tstamp < iterations ) {  
         #Selection.  
         newPop <- list()  
         for( i in 1:tau ) {  
             newPop <- c( newPop, tournamentSelection( pop, tau ) )  
         }  
         newPopSize <- length( newPop )  
         #Start the building block list.  
         BBList <- list()  
         for( i in 1:indSize ) { #Assume that all variables are independent...  
             BBList <- c( BBList, list( c( i ) ) )  
         }  
         Lbb <- array( 1, indSize )   
         Nbb <- indSize  
         #Compute how well the initial model fits.  
         mpm <- computeMPM( newPop, BBList, Lbb, Nbb )   
         Cc <- Cm( newPopSize, Lbb, Nbb ) + Cp( mpm, newPopSize, Lbb, Nbb )  
         improvement <- TRUE  
         while( improvement == TRUE && length( BBList ) > 1 ) {  
             tmpList <- chooseModel( newPop, indSize, BBList, Lbb, Nbb )  
             if( tmpList[[5]] < Cc ) { #Evaluate the model.  
                 mpm <- tmpList[[1]]  
                 BBList <- tmpList[[2]]  
                 Lbb <- tmpList[[3]]  
                 Nbb <- tmpList[[4]]  
                 Cc <- tmpList[[5]]  
             } else {  
                 improvement <- FALSE  
             }  
         }  
         #Crossover.  
         if( runif( 1 ) < Pc ) {  
             newInd <- 0  
             while( newInd < round( Pc * popSize ) ) {  
                 newPop <- c( newPop, generateIndividual( indSize, mpm, BBList, Lbb ) )  
                 newInd <- newInd + 1  
             }  
         }  
         conserve <- round( popSize * (1.0 - Pc) )  
         if( conserve > 0 ) {  
             newPop <- c( newPop, getBests( pop, conserve ) )  
         }  
         #Replace the old population with the new one.  
         pop <- newPop  
         tstamp <- tstamp + 1  
     }  
     return( list( mpm, BBList, Lbb, Nbb, Cc, pop ) )  
 }  
 #usage: res <- ECGA( 12, 200, 0.9, 50, 20 )  
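
As a usage sketch (the parameter values are just those in the usage comment above, and results vary from run to run since the algorithm is stochastic), one might inspect the learned linkage and the best solution like this:

 res <- ECGA( 12, 200, 0.9, 50, 20 )
 res[[2]] #The learned building-block partition; ideally it groups {1,2,3}, {4,5,6}, {7,8,9} and {10,11,12}.
 best <- getBests( res[[6]], 1 )[[1]] #Fittest individual of the final population.
 getChromosome( best )
 getFitness( best ) #A fitness of 12 means every trap-3 block reached 111.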

References 

[1] Harik, G. R.; Lobo, F. G.; Sastry, K.: "Linkage Learning via Probabilistic Modeling in the Extended Compact Genetic Algorithm (ECGA)." In: Scalable Optimization via Probabilistic Modeling, 2006, pp. 39-61.
