Beyond state-of-the-art accuracy by fostering ensemble generalization
Sometimes practitioners are forced to go beyond the standard methods to squeeze more accuracy out of their models. If the goal is a substantial jump in accuracy, ensembling is a good starting point. However, the trick lies in extracting enough generalization from the feature space. In this regard, ensemble generalization (not to be confused with classic or "standard" ensemble methods such as Random Forest or Gradient Boosting) is the right path to follow, however complex. The idea is to combine the predictions of several "base learners" and use them as meta-features to train a second-stage regressor. The trick is to use a J-fold cross-validation scheme, always with the same data partitions and random seed. This kind of ensemble is often called stacking, as we "stack" layers of classifiers.
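To fix ideas on the data flow, here is a toy sketch in Python; the sizes and names are purely illustrative and not part of any particular library.

    import numpy as np

    n_examples, k_base_learners = 1000, 3  # hypothetical sizes
    # One column per base learner: meta_features[i, j] will hold learner j's
    # out-of-fold prediction for training example i. The level-2 model is
    # then fit on this matrix instead of the raw feature space.
    meta_features = np.zeros((n_examples, k_base_learners))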
Let’s work through an example. Suppose we have three base learners, GBM, ET, and RF, and a linear model (LM) as the level-2 learner. First we divide the training data into J folds, say J = 4; recall that these 4 folds are stratified and disjoint. Then we train each model following the usual cross-validation scheme, that is, train on 3 folds and predict on the remaining one (this works best when the predictions come as probabilities). These out-of-fold predictions are stored and will later be used to train the level-2 model. Figure 1 depicts this process, and the sketch below shows one way to implement it.
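A minimal sketch of this out-of-fold step, assuming scikit-learn, NumPy arrays, and a binary classification task; the estimator settings, the fixed seed, and the helper name make_meta_features are illustrative choices, not a prescribed implementation.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold
    from sklearn.ensemble import (GradientBoostingClassifier,
                                  ExtraTreesClassifier,
                                  RandomForestClassifier)

    SEED = 42  # reuse this seed everywhere so the partitions stay identical

    base_learners = [
        GradientBoostingClassifier(random_state=SEED),  # GBM
        ExtraTreesClassifier(random_state=SEED),        # ET
        RandomForestClassifier(random_state=SEED),      # RF
    ]

    def make_meta_features(X, y, learners, n_folds=4):
        """Train each base learner with J-fold CV and collect the
        out-of-fold probability predictions as meta-features."""
        folds = StratifiedKFold(n_splits=n_folds, shuffle=True,
                                random_state=SEED)
        meta = np.zeros((X.shape[0], len(learners)))
        for train_idx, holdout_idx in folds.split(X, y):
            for j, learner in enumerate(learners):
                learner.fit(X[train_idx], y[train_idx])
                # positive-class probabilities for the held-out fold
                meta[holdout_idx, j] = learner.predict_proba(
                    X[holdout_idx])[:, 1]
        return meta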
After training the level-2 algorithm, we can proceed with the final predictions. To do so, we retrain the base learners, this time on the whole training set, so that they learn from every available example. It is important to make sure the random seeds are the same as in the J-fold training! Afterwards, for each test example we predict with the base learners and collect their predictions. These become the input of the level-2 algorithm, which produces the final prediction.
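Continuing the sketch above, the final stage could look like this; X_train, y_train, and X_test are assumed to exist as NumPy arrays, and LogisticRegression stands in for the LM level-2 learner.

    from sklearn.linear_model import LogisticRegression

    # Level-2 model trained on the out-of-fold meta-features.
    meta_train = make_meta_features(X_train, y_train, base_learners, n_folds=4)
    level2 = LogisticRegression(random_state=SEED).fit(meta_train, y_train)

    # Retrain each base learner on the whole training set (same seeds!) and
    # build the test-time meta-features from their predictions.
    meta_test = np.column_stack([
        learner.fit(X_train, y_train).predict_proba(X_test)[:, 1]
        for learner in base_learners
    ])
    final_predictions = level2.predict_proba(meta_test)[:, 1]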
I have used this technique on Kaggle a few times, and I have to say it makes a difference. However, I found it difficult to get working, and it requires a lot of processing power. There is a nice post by Triskelion explaining ensembles that inspired me to write this.