Sometimes practitioners are forced to go beyond the standard methods in order to gain more accuracy with their models. If one analyzes the problem of rocketing accuracy, ensembling is a good starting point. However, the trick lies in getting enough generalization from feature space. In this regard, ensemble generalization--do not confuse with classic or "standard" ensemble methods such as Random Forest or Gradient Boosting--is the right path to follow, however complex. The idea is to combine predictions from "base learners" to train a second stage regressor, using these predictions as metafeatures. The trick is to use a J-fold cross-validation scheme and use always the same data partitions and seed. This kind of ensemble is often called stacking--as we "stack" layers of classifiers.
Let’s do an example: suppose that we have three base learners: GBM, ET, and RF. Then assume we have a LM as level 2 learner. First we divide the training data into J-folds, for example in 4--recall that these 4 folds are stratified and disjoint. Then we train each model using the traditional cross-validation scheme, that is train with 3 folds and predict with the remaining (works best if the predictions are in form of probabilities). These predictions are stored and will be used for training the level 2 model. Figure 1 depicts this process.
After training the level 2 algorithm, we can proceed with the final predictions. To do so, we train again the base learners but using the whole training set. We do this to gain up to a 20% accuracy. It is important to highlight that we’ve to assure that the random seeds are the same that in the J-fold training! Afterwards, for each test example we predict with the base learners and collect the predictions. These are the input of the level 2 algorithm, which performs the final prediction.
I used these in Kaggle a few times and I’ve to say that it makes the difference. However, I found it to be difficult to get it working and it requires a lot of processing power. There is a nice post from Triskelion explaining ensembles that gave me the inspiration to write this.