Wednesday, June 4, 2014

Boosting the accuracy rate by means of AdaBoost

Several machine learning techniques foster the cooperation between subsolutions for obtaining an accurate outcome by combining many of them. Michigan-style LCSs are one of these families. However, the most well known are those that implement Boosting, and AdaBoost is the most successful of them (or, at least, the most studied one and the first to implement the ideas of Boosting).

AdaBoost generates accurate predictions by combining several weak classifiers (also referred to as “the wisdom of the crowd”). The most outstanding fact about AdaBoost is that it is a deadly simple algorithm. The key of its success lies in the combination of many weak classifiers: these are very limited and their error rate is just slightly better than a random choice (hence their name). The typical implementation uses decision stumps (i.e., binary trees) that minimize the error between the prediction and the ground truth (that is, the desired outcome), so the classifiers have the form "if variable_1 < 3.145 then class = -1; otherwise class = +1". Another characteristic is that AdaBoost only handles two classes, namely {-1} and {+1}.


Its scheme is very simple: it trains classifiers that predict correctly a small part of the problem space and then it combines all these by using a weighting mechanism. Then, AdaBoost decides the class of the example by computing the sign (+1 or -1) of the aggregated predictions. Recall that the weights are computed based on the error achieved by the distinct weak predictors (see the image below).


Figure 1. The learning process of AdaBoost. The Di's are the distinct weights applied to each weak classifier.

Boosting techniques have attracted my attention lately since these are the ones that provide the best results in the distinct Kaggle competitions. As usual, I implemented a little R code for playing a bit with this wonderful technique. It is listed in the following. Notice that I implemented the univariate version of the AdaBoost algorithm, keeping the code very simple and easy to understand and extend.


Wednesday, May 28, 2014

The challenge of learning from rare cases


An important challenge is learning from domains that do not have the same proportion of classes, that is, learning from problems that contain class imbalances (Orriols-Puig, 2008). Figure 1 show a toy example of this issue. It is challenging because (1) in many real-world problems we cannot assume a balanced distribution of classes and (2) traditional machine learning algorithms cannot induce accurate models in such domains. Oftentimes it happens that the key knowledge to solve a problem that previously eluded solution is hidden in patterns that are rate. To tackle this issue, practitioners rely on re-sampling techniques, that is, algorithms that pre-process the data sets and either (1) add synthetic instances of the minority pattern to the original data or (2) eliminate instances from the majority class. The first type is called over-sampling and the later, under-sampling. 
Figure 1. Our unbalanced data set. Black dots are the majority class (0) and red dots the minority (1). It has an imbalance ratio of 11.25, where the majority class correspods to the 91.8367% of the data, and the minority to the 8.1633%

In this post I will present the most successful over-sampling technique, the so called Synthetic Minority Over-sampling Technique (SMOTE), which was introduced by Chawla et al. (2002). It works in a very simple manner: it generates new samples out of the minority class by seeking the nearest neighbors of these. Figure 2 shows the results of applying this method to our toy problem.

Figure 2. The SMOTEd version of the data set (see Figure 1). Now we have a much more balanced domain (almost 50% of the instances are in each class)

In the following I provide the R code. In it one can select the requested number of samples to generate, the number of k-neighbors for the data generation and the distance metric used (one of the following two: the Euclidean distance or the Mahalanobis distance).
 



References

A. Orriols-Puig. New Challenges in Learning Classifier Systems: Mining Rarities and Evolving Fuzzy Models. PhD thesis, Universitat Ramon Llull, Barcelona (Spain). 2008. 

N. Chawla, K. Bowyer, L. Hall, and W. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.

Monday, March 24, 2014

Learning Classifier System Open Repository

It is time to announce the Learning Classifier System Open Repository (LCSOR), a meeting point to foster the popularization of these exciting evolutionary computation techniques. In this repository I will upload distinct implementations of different Learning Classifier Systems. To start with, I uploaded a first version of the sUpervised Classifier System (UCS). Notice I created a page for this contend under the label LCSOR.


I hope practitioners will enjoy this experience.