
# Understanding Hierarchical Clustering in Data Science

Updated on Jan 20, 2023 18:51 IST

Large datasets can be challenging to comprehend. Clustering is a method of dividing objects into groups so that objects in the same cluster are similar to each other and dissimilar to the objects belonging to other clusters.

In this blog, we will look at Hierarchical Clustering, its types, and how it works.

## What is Hierarchical Clustering?

Hierarchical Clustering works by grouping data into a tree of clusters. The algorithm builds clusters by calculating the dissimilarities between data points. It starts by treating each data point as its own cluster and then repeats the following steps:

• It identifies the two clusters that are closest to each other.
• It merges those two clusters into one.

These steps are repeated until all the data points belong to a single cluster. The process produces a hierarchical series of nested clusters, which we can represent using a dendrogram.
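
The steps above can be sketched in a few lines of plain Python. This is a toy illustration, not a library implementation: it assumes 1-D points and single-linkage distance (the gap between the closest members of two clusters), both choices made here purely for readability.

```python
# Toy agglomerative clustering on 1-D points with single linkage.
# Each point starts as its own cluster; at every step the two closest
# clusters are merged, until only one cluster remains.

def single_link(a, b):
    """Smallest pairwise distance between two clusters of 1-D points."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points):
    clusters = [[p] for p in points]   # step 0: every point is its own cluster
    merges = []                        # record of each merged cluster, in order
    while len(clusters) > 1:
        # find the indices of the two closest clusters
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

steps = agglomerate([1.0, 2.0, 9.0, 10.0, 25.0])
for s in steps:
    print(sorted(s))
```

Running this shows the tight pairs (1, 2) and (9, 10) merging first, and the outlier 25 joining only in the final step, which is exactly the nested-cluster hierarchy a dendrogram would draw.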

### Why Do We Need Hierarchical Clustering?

Unlike algorithms such as k-means, Hierarchical Clustering does not require the number of clusters to be specified before training. The algorithm does not create clusters of equal size, but it is easy to understand and implement.

## Types of Hierarchical Clustering

Hierarchical Clustering can be divided into two types explained below:

### 1. Agglomerative Hierarchical Clustering

In Agglomerative Hierarchical Clustering, we start by assigning every point to its own cluster. If we have four data points, there will be four clusters. In each iteration, we merge the closest pair of clusters and repeat until only a single cluster is left. Because we combine clusters at each step, this is called Agglomerative Hierarchical Clustering or Additive Hierarchical Clustering.

### 2. Divisive Hierarchical Clustering

Divisive Hierarchical Clustering works in the opposite direction to Agglomerative Hierarchical Clustering. Instead of starting with n clusters, we start with a single cluster containing all of the data.

In each iteration, we split off the points that are farthest from the rest of the cluster and repeat until every cluster contains a single data point. Because we divide the clusters at each step, this is known as Divisive Hierarchical Clustering.
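
The divisive idea can be sketched under a deliberately simple splitting rule: at each step, take the widest cluster and split off the single point farthest from its centroid. Real divisive algorithms such as DIANA use more refined splitting rules; this toy version only illustrates the top-down direction on 1-D points.

```python
# Toy divisive clustering on 1-D points: repeatedly split the single
# farthest point off the widest cluster until all clusters are singletons.

def centroid(c):
    return sum(c) / len(c)

def divisive(points):
    clusters = [sorted(points)]                 # start: one cluster with everything
    history = [[list(c) for c in clusters]]     # snapshot after every split
    while any(len(c) > 1 for c in clusters):
        # choose the widest cluster (largest spread of values)
        k = max(range(len(clusters)), key=lambda i: max(clusters[i]) - min(clusters[i]))
        c = clusters.pop(k)
        # split off the point farthest from the cluster centroid
        far = max(c, key=lambda p: abs(p - centroid(c)))
        c.remove(far)
        clusters += [c, [far]]
        history.append([sorted(x) for x in clusters])
    return history

history = divisive([1.0, 2.0, 9.0, 10.0])
for snapshot in history:
    print(snapshot)
```

The printed snapshots run in the opposite order to the agglomerative case: one big cluster at the top, singletons at the bottom.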

## How does Hierarchical Clustering work?

Consider a few points on a 2D plane with x-y coordinates. Each data point starts as a cluster of its own. We need a way to compute the distance between every pair of points, and from those distances we find the shortest one between any two points to form the first cluster.

Once we know the shortest distances, we can start grouping the points and forming clusters, and we can represent the result in a tree-like structure called a dendrogram. Suppose we have six points, P1 to P6, arranged in three close pairs: after the first round of merging, the output is three clusters, P1 – P2, P3 – P4, and P5 – P6, with three small dendrograms denoting them.

Next, we bring two of the groups together. Let’s say P1 – P2 and P3 – P4 are closer to each other than either is to P5 – P6, so we merge them under the same dendrogram. That leaves two clusters, which we bring together in the final step, and now everything belongs to a single cluster.
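
The walkthrough above can be reproduced with SciPy, assuming six 2-D points laid out in three obvious pairs to stand in for P1 through P6 (the coordinates below are invented for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Six invented 2-D points in three obvious pairs, standing in for P1..P6.
points = np.array([[0, 0], [0, 1], [5, 0], [5, 1], [10, 0], [10, 1]])

# Merge history: each row of Z is (cluster_a, cluster_b, distance, new_size),
# one row per merge step, using single (shortest-distance) linkage.
Z = linkage(points, method="single")

# Cutting the tree at distance 2 recovers the three pairs.
labels = fcluster(Z, t=2, criterion="distance")
print(labels)   # three distinct labels, one per pair
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree-like diagram discussed next.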

### 1. What is a Dendrogram?

A dendrogram is a type of tree diagram that shows the hierarchical relationship between different sets of data. A dendrogram holds the memory of the hierarchical clustering algorithm, so we can read off exactly how each cluster was formed.

• The joins in the dendrogram represent the dissimilarities, i.e., the distances between data points.
• The height of each join represents the distance between the two clusters being merged.

A dendrogram can be drawn as a column graph or a row graph. Some are circular or have a fluid shape, but software usually produces a column or row layout. Whatever the orientation, the basic graph has the same parts:

Clades: The clades are the branches of the dendrogram. They are arranged according to how similar or dissimilar they are: clades at roughly the same height are similar, and the greater the difference in height, the greater the dissimilarity. A clade can hold any number of leaves, but the more leaves it has, the harder the graph becomes to read.

### 2. What is a Distance Measure?

Distance measures determine the similarity between two elements and influence the shape of the clusters. Some common ways of calculating distance are explained below:

Euclidean Distance Measure: The Euclidean distance is the most common way to measure the distance between two points. For two points A and B, it is the length of the direct straight line between them. When there are more than two dimensions, we square the difference along every dimension, add the squares together, and take the square root to get the actual distance.

Squared Euclidean Distance Measure: This is the same as the Euclidean distance except that we skip the square root at the end. Dropping the square root does not change which of two distances is smaller or larger, so when we only need to compare distances, removing it makes the computation faster.

Manhattan Distance Measure: The Manhattan distance is the simple sum of the horizontal and vertical components, that is, the distance between two points measured along axes at right angles. Instead of looking at the direct line, we take the absolute value of the difference along each axis and add them up. Most people use the squared Euclidean method because it is faster, but the Manhattan distance, built from the individual axis differences, can also provide good results.

Cosine Distance Measure: The cosine distance measures the angle between two vectors: as the directions of the two vectors separate, the cosine distance grows. It often gives results similar to the Euclidean distance, but not always, and it can introduce bias when the data is very skewed or when the two sets of values have a dramatic difference in magnitude.
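
The four measures can be written out directly in plain Python. The sketch below evaluates each one on a pair of orthogonal unit vectors, chosen for illustration because the expected values are easy to check by hand:

```python
import math

# Each measure compares the same pair of 2-D points; the vectors here
# are orthogonal unit vectors, chosen so the results are easy to verify.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

p, q = (1.0, 0.0), (0.0, 1.0)
print(euclidean(p, q))          # ≈ 1.414 (square root of 2)
print(squared_euclidean(p, q))  # 2.0
print(manhattan(p, q))          # 2.0
print(cosine_distance(p, q))    # 1.0 (orthogonal vectors)
```

Note how the squared Euclidean distance is just the Euclidean distance with the final `sqrt` removed, and how the cosine distance depends only on direction, not magnitude.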

### 3. Measuring the Similarities & Dissimilarities in Clustering

The common methods for measuring the similarities and dissimilarities in clustering are explained below:

• Maximum (Complete) Linkage Clustering: It computes all the pairwise dissimilarities between the items in two clusters and uses the greatest of these values as the distance between the clusters. It tends to produce compact clusters.
• Minimum (Single) Linkage Clustering: It uses the smallest pairwise dissimilarity between the items in two clusters as the linkage criterion. It tends to produce long, loose clusters.
• Average Linkage Clustering: It computes all the pairwise dissimilarities between the items in two clusters and uses their average as the distance between the clusters.
• Centroid Linkage Clustering: It uses the dissimilarity between the centroids of the two clusters.
• Ward’s Minimum Variance Method: It keeps the overall within-cluster variation as low as possible; at every step, we merge the pair of clusters whose merger causes the smallest increase in total within-cluster variance.

## Advantages and Disadvantages of Hierarchical Clustering

The common advantages of Hierarchical Clustering are as follows:

• Straightforward: The approach is simpler than many other clustering algorithms and can be applied directly to a wide range of problems.
• Easy to understand: It relies on simple, intuitive steps rather than complex mathematics, so anyone can follow how the clusters are formed.
• Clarity: The dendrogram records the entire merge history, so the output is easy to interpret and we can choose the number of clusters after the fact.
• Appealing output: The main output of this approach is the dendrogram, which is visually appealing and helps users grasp the cluster structure easily.

Here are the common disadvantages of Hierarchical Clustering:

• It is not possible to undo a previous step, i.e., once instances are assigned to a cluster, we cannot move them later.
• Initial seeds have a strong impact on the final results.
• The order of the data affects the final results.
• It can be sensitive to outliers.
• It is not suitable for larger datasets, as it consumes a lot of time and space.
• It does not directly optimize a single global mathematical objective.

## Applications of Hierarchical Clustering

Here are the top applications that use Hierarchical Clustering:

### 1. Fake News Identification

Fake news is not a new phenomenon, but it has become more prevalent. Due to technological advancements such as social media, fake news can be manufactured and circulated at an alarming rate. Clustering helps here by analyzing the words used in news articles and grouping similar articles together; the resulting clusters help determine which news is authentic and which is fraudulent.

### 2. Document Analysis

People run document analysis for many different reasons, and hierarchical clustering suits this application well. The system can analyze the text and categorize documents into different topics. Using features extracted from the text, we can easily cluster the papers and arrange them methodically.

### 3. Phylogenetic Trees for Tracking Viruses

Viral epidemics and their sources are major health issues that need proper tracking. Tracing the roots of these diseases gives scientists and doctors information about why and how an outbreak started, which helps in finding a suitable response. Viruses have rapid mutation rates; mutations occur in certain DNA sequences and vary depending on the transmission phase, time, and other health factors. This creates many different pathways to track, and hierarchical clustering helps us track them.

## Wrapping Up

Hierarchical Clustering is an unsupervised machine learning algorithm that works by grouping data into a tree of clusters. We have seen its two types, agglomerative and divisive. Hierarchical clustering is a useful way of segmenting observations, and not having to predefine the number of clusters gives it quite an edge over k-means. But it doesn’t work well with larger datasets.


Contributed by: Aswini R


## FAQs

What are the two techniques of Hierarchical Clustering?

The two techniques of Hierarchical Clustering are Agglomerative and Divisive. Agglomerative is a bottom-up approach where every observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy. Divisive is a top-down approach where all observations start in a single cluster, and splits are performed recursively as one moves down the hierarchy.

How is Hierarchical Clustering used in Machine Learning?

Hierarchical Clustering is an unsupervised machine-learning approach. It will help us group unlabeled datasets into clusters. With this technique, we develop the cluster hierarchy in a tree form, and the tree structure is known as a dendrogram. Hierarchical Clustering is also known as HCA or Hierarchical Cluster Analysis.

Which is better k-means or Hierarchical Clustering?

Experimental studies show that k-means clustering can outperform hierarchical clustering in terms of entropy and purity when using cosine similarity, while hierarchical clustering can outperform k-means when using Euclidean distance.

Is Hierarchical Clustering top-down or bottom-up?

Hierarchical Clustering is a bottom-up approach when it is Agglomerative. It is top-down when it is Divisive: we split a cluster and proceed recursively until we reach individual clusters.

Is Hierarchical Clustering good for large datasets?

Classical hierarchical clustering methods do not cope well with large datasets. The major reason is that maintaining the data, and the pairwise distance matrix, in main or temporary storage becomes complicated and expensive.

What are the key issues in Hierarchical Clustering?

Once two clusters are merged, they cannot be split up later, even if that would give a more favorable outcome. Agglomerative Hierarchical Clustering makes its merging decisions locally, whereas k-means optimizes a global objective. Though advantageous in other ways, hierarchical clustering is time- and space-consuming, which makes it expensive. We also have to decide how to treat clusters based on their size and which similar ones to merge, which can itself be time-consuming.