Bagging Technique in Ensemble Learning

# Bagging Technique in Ensemble Learning

Updated on Sep 18, 2022 16:02 IST

In this article, we will discuss the concept how to solve machine learning problems using the ensemble learning bagging.

## Introduction

You need to search online for reviews and ask your friends for their recommendations and then eventually, you will be able to make a decision based on multiple sources of information.

Similarly, in the world of machine learning, there’s no one-size-fits-all, or rather a one-model-fits-all, when working on a problem. Different models perform well in different scenarios.

So, instead of relying on the outcome of one specific model, you can choose from different models by aggregating their results and obtaining a best-fit model for your problem.

This is where Ensemble Learning comes into the picture.

In this article, we will discuss a common Ensemble Learning technique in detail – Bagging.

## Quick Intro to Ensemble Learning

An Ensemble Learning model is an aggregation of multiple models to improve the overall performance and make a final decision.

The ensemble model combines multiple models (aka weak learner) to make a strong learner. These model solves a given ML problem that would not be resolved as efficiently by any of the standalone learners.

The two popular techniques to create an ensemble model are:

The two popular techniques to create an ensemble model are:

### Bias and Variance

Before moving ahead, let’s recall an important concept when estimating model performance. A good ML model must have a minimal error – which means it should theoretically have a low bias and low variance while learning the training data. Why is that? Because both these sources of error will prevent a model from generalizing the training data to perform well on any new data.

However, achieving both low bias and low variance at the same time is not quite possible due to what we call the bias-variance trade-off.

• Bias error comes up when an algorithm makes incorrect assumptions about the relationship between the features and target variable in the training data. High bias causes the underfitting of the model.
• Variance error is caused by over-sensitivity to minute fluctuations in the training data. Due to this, the model learns noise from the data. A high variance would result in the overfitting of the model.

To obtain good results, a model must have a low error rate and enough degrees of freedom to resolve the underlying complexity of the data. But as you can see from the graph above, high degrees of freedom would mean high variance, which would affect the robustness of our model.

So, to obtain an optimal model, we need to find a balance between bias and variance.

This is the idea of ensemble methods – reducing the bias and/or variance of weak learner models by different combining techniques that are chosen based on the source of error we are trying to reduce.

Top Free Machine Learning Courses to Take Up in 2024
Machine learning is among the most exciting fields of computer science and statistics that have been helping many industries to become more efficient and smart. The job market demands skilled...read more
Top 10 concepts and Technologies in Machine learning
This blog will make you acquainted with new technologies in machine learning

## What is Bagging in Ensemble Learning – Bootstrap Aggregating

Bagging is an ensemble learning method that is used to reduce the error by training homogeneous weak learners on different random samples from the training set, in parallel. The results of these base learners are then combined through voting or averaging approach to produce an ensemble model that is more robust and accurate.

Bagging mainly focuses on obtaining an ensemble model with lower variance than the individual base models composing it. Hence, bagging techniques help avoid the overfitting of the model.

### Bootstrapping: Random Sampling with Replacement

Let’s define bootstrapping first – It is a statistical technique that generates random samples (called bootstrap samples) from the initial dataset by randomly drawing with replacement observations.

The samples should have two properties:

• Representativity: The initial dataset should be large enough so that the samples are a good approximation of sampling from the underlying distribution of data.
• Independence: The initial dataset should be large enough compared to the sample size so that the samples are not much correlated.

In general, classification problems require more samples in comparison to regression problems.

As per our assumptions, each bootstrap sample shown above will act as an almost-independent dataset drawn from the true (unknown) underlying distribution.

### Ensemble Learning and Aggregating

Now, we fit a weak learner for each of the samples (ensemble learning) and finally combine their outputs to obtain an ensemble model (aggregating) with lower variance.

• For regression problems, predicted outcomes from base models are averaged.
• For classification problems, majority votes are considered.

## Demo: Implementing Bagging

#### Problem Statement:

Let’s build an ensemble model through the bagging technique. For this, we will implement a Decision Tree as the base learner using the Scikit-learn library in Python.

Dataset Description:

This dataset has the following columns:

• alcohol – Alcohol percentage in that particular type of wine
• malic_acid – Malic acid percentage in that particular type of wine
• ash – Amount of ash in that particular type of wine
• alcalinity_of_ash – Amount of alkalinity of ash in that particular type of wine
• magnesium – Amount of magnesium in that particular type of wine
• total_phenols – Amount of phenols in that particular type of wine
• flavanoids – Amount of flavonoids in that particular type of wine
• nonflavanoid_phenols – Amount of non flavonoid phenols in that particular type of wine
• proanthocyanins – Amount of proanthocyanins in that particular type of wine
• color_intensity – The color intensity of that particular type of wine
• hue – The hue of that particular type of wine
• od280/od315_of_diluted_wines – Amount of dilution of that particular type of wine
• proline – Amount of proline in that particular type of wine
• target – Class label of the wine (1,2, or 3)

The target column is used to predict the class of wine.

2. Split the data into training and testing sets
3. Build a Decision Tree Classifier
4. Build an Ensemble Model using Bagging
5. Train the Ensemble Model
6. Evaluate the Ensemble model

Split the data into training and testing sets

from sklearn.model_selection import train_test_split

#Split the dataset into 70% training set and 30% testing set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3,  random_state=23)

Build a Decision Tree Classifier

We are going to use a Decision Tree with fixed parameters as the base learner.

from sklearn.tree import DecisionTreeClassifier

#Decision tree classifier
dtree = DecisionTreeClassifier(max_depth=3, random_state=23)

Build an Ensemble Model using Bagging Technique

The bagging ensemble model is initialized with the following:

• base_estimator = Decision Tree
• n_estimators = 5 – To create 5 bootstrap samples to train 5 decision tree base models
• max_samples = 50 – The number of items per sample is 50
• bootstrap = True – The sampling will be with replacement
from sklearn.ensemble import BaggingClassifier

#Bagging ensemble model
bagging = BaggingClassifier(base_estimator=dtree, n_estimators=5, max_samples=50, bootstrap=True)

Train the Ensemble Model

#Training the model
bagging.fit(x_train, y_train)

Evaluate the Ensemble Model

#Evaluating the model
print(f"Train score: {bagging.score(x_train, y_train)}")
print(f"Test score: {bagging.score(x_test, y_test)}")

As we can see from the above result, a few learners or estimators are enough for small datasets. However, larger data may require more learners.

One of the most prominent advantages of bagging is that the samples are generated concurrently. So, when different base estimators are fitted independently, intensive parallelization techniques can be used as and when required.

## Conclusion

We have discussed how to solve Machine Learning problems based on an ensemble learning – bagging. Ensemble models usually perform more accurately than a single model because they alleviate the overfitting problem while also combining the strengths of different models.

Artificial Intelligence & Machine Learning is an increasingly growing domain that has hugely impacted big businesses worldwide.

Normalization vs Standardization
Normalization and standardization are two techniques used to transform data into a common scale. Normalization is a technique used to scale numerical data in the range of 0 to 1....read more
One hot encoding vs label encoding in Machine Learning
As in the previous blog, we come to know that the machine learning model can’t process categorical variables. So when we have categorical variables in our dataset then we...read more