Desktop Survival Guide
by Graham Williams
The term originates with Breiman (1994).
Bagging (bootstrap aggregating) was proposed by Leo Breiman in 1994. He improved classification accuracy by combining the classifications produced by models built from randomly generated training sets. See Breiman (1994), Technical Report No. 421, and Breiman, L. (1996), "Bagging predictors", Machine Learning 24(2): 123–140. doi:10.1007/BF00058655.
Bagging, also known as bootstrap aggregating, is a variance reduction method for model building: by building multiple models from samples of the training data, the aim is to reduce the variance in the resulting predictions. Bagging generates multiple training sets by sampling with replacement from the available training data. In an ideal world we could eliminate the variance due to a particular choice of training set by combining models built from many independent training sets of size N. In practice only one training set is available. By sampling with replacement from that training set to form new training sets, bagging simulates the ideal situation.
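Sampling with replacement can be sketched in a few lines. The following is an illustrative sketch, not code from the book; the function name bootstrap_sample is our own choice.

```python
import random

def bootstrap_sample(data, rng=random):
    """Draw a sample of size len(data) with replacement from data."""
    return [rng.choice(data) for _ in data]

random.seed(1)
train = list(range(10))
sample = bootstrap_sample(train)
# Some items appear more than once and others not at all; on average
# roughly 63% of the distinct items appear in each bootstrap sample.
print(sorted(sample))
```

Each call to bootstrap_sample produces a new training set of the same size as the original, which is exactly what bagging uses to build each of its models.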
A good introduction is available at http://www.idiap.ch/~bengio/lectures/tex_ensemble.png
Bagging is bootstrap aggregation. The underlying idea is that part of the error due to variance in building a model comes from the specific choice of the training dataset. So we create many similar training datasets and train a new model on each of them. The final prediction is then the average of each model's output (or, for classification, a majority vote).
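The averaging step described above can be sketched as follows. This is a minimal illustration with an assumed toy "model" (one that simply predicts the mean target of its bootstrap sample); the names bagged_prediction and mean_model are our own.

```python
import random

def bagged_prediction(train, build_model, x, n_models=25, rng=random):
    """Average the predictions of models built on bootstrap samples."""
    n = len(train)
    preds = []
    for _ in range(n_models):
        # New training set of size n, sampled with replacement.
        sample = [train[rng.randrange(n)] for _ in range(n)]
        preds.append(build_model(sample)(x))
    return sum(preds) / n_models

# Toy "model": predict the mean target of its training sample, ignoring x.
def mean_model(sample):
    mean_y = sum(y for _, y in sample) / len(sample)
    return lambda x: mean_y

random.seed(0)
train = [(i, 2.0 * i) for i in range(20)]
print(bagged_prediction(train, mean_model, x=5.0))
```

In practice build_model would fit a real learner such as a decision tree; averaging the outputs of many such models built on bootstrap samples is what reduces the variance.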
Copyright © Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.