Introduction to Huffman Coding

4 mins read1.4K Views 1 Comment

Call 8585951111Got Doubts?

Senior Executive - Content

Updated on Feb 9, 2024 17:06 IST

Huffman coding is a greedy approach based lossless data compression algorithm. It uses variable-length encoding to compress the data. The main idea in Huffman coding is to assign each character with a variable length code. The length of each character is decided on the basis of its occurrence frequency. The character that occurred most time would have the smallest code and the character that occurred least in the data would have the largest code.

In this article, we are going to discuss Huffman coding, prefix rule, major steps in building a Huffman tree, and algorithm for Huffman coding, followed by the complexity and application of Huffman coding.

Huffman coding

Huffman coding was introduced by David Huffman. It is a lossless data compression methodology used before sending the data so that data can be compressed and sent using minimal bits, without redundancy, and without losing any of the details. It generally compresses the most frequently occurring characters.

Stay updated with the latest blogs on online courses and skills

Enter Mobile Number

The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code.

The codes are assigned in such a manner that each code will be unique and different for each character.

Also explore: Understanding Data Structures in C: Types And Operations

Recommended online courses

Best-suited Data Structures and Algorithms courses for you

Learn Data Structures and Algorithms with these high-rated online courses

DSA to Development A Complete Guide

GeeksforGeeksCertificate

Total Fees

– / –

Duration

150 hours

Python Foundation with Data Structures & Algorithms

Coding NinjasCertificate

4.6

Total Fees

– / –

Duration

4 months

Data Structures & Algorithm

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

6 months

Data Structures and Algorithms in Java [Level 1 - Foundation]

PepcodingCertificate

Total Fees

– / –

Duration

200 hours

Data Structures

Great LearningCertificate

5.0

Total Fees

Free

Duration

40 hours

Intro to Data Structures and Algorithms

Grow With GoogleCertificate

Total Fees

Free

Duration

4 weeks

Foundation for Competitive Programming

Coding NinjasCertificate

5.0

Total Fees

– / –

Duration

4 months

Programming, Data Structures And Algorithms Using Python

NPTELCertificate

4.7

Total Fees

– / –

Duration

8 hours

Numerical Methods

NPTELCertificate

Total Fees

– / –

Duration

8 weeks

Data Structures and Algorithms in Java

Great LearningCertificate

5.0

Total Fees

Free

Duration

1 hours

Prefix rule

Huffman coding is implemented using the prefix rule. The variable-length codes assigned to the characters based on their frequency are the Prefix code. Prefix codes are used to remove ambiguity during the decoding of the data, no two characters can have the same prefix code.

Major steps in building a Huffman tree

Huffman coding technique involves two major steps, which are as follows:

Creating a Huffman tree of the input characters.
Traversing the tree and assigning codes to the characters.

You can also explore: Tree Traversal in Data Structure

Explore: Data Structures Online Courses & Certifications

Working or Creation of Huffman Tree

Huffman Coding step-by-step working or creation of Huffman Tree is as follows:

Step-1: Calculate the frequency of each string.
Step-2: Sort all the characters on the basis of their frequency in ascending order.
Step-3: Mark each unique character as a leaf node.
Step-4: Create a new internal node.
Step-5: The frequency of the new node as the sum of the single leaf node
Step-6: Mark the first node as this left child and another node as the right child of the recently created node.
Step-7: Repeat all the steps from step-2 to step-6.

You can also explore: Introduction to Linked List Data Structure

Formulas Used in Huffman Tree

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= ∑ ( frequency_i x Code length_i)

Example:

Suppose a data file has the following characters and the frequencies. If huffman coding is used, calculate:

Huffman Code of each character
Average code length
Length of Huffman encoded data

Characters	Frequencies
A	12
B	15
C	7
D	13
E	9

Solution:

Initially, create the Huffman Tree:

Step-2:

Step-3:

Step-4:

Step-5:

The above tree is a Huffman Tree.

Now, assign weight to all the nodes.

Assign “0” to all left edges and “1” to all right edges.

The tree will become

Huffman Code of each character:

A: 00

B: 10

C: 110

D: 01

E: 111

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

= {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x 3)} / (12 + 13 + 15 + 7+ 9)

= (24+ 26+ 30 + 21 + 27)/56

= 128/56

= 2.28

Average code length per character = 2.28

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= 56 x 2.28

= 127.68

Algorithm for Huffman coding

For implementing Huffman coding, we need a priority queue that will have all the unique characters of the data. Then sort all the characters on the basis of their occurring frequency in ascending order. Take out the minimum value from the priority queue and assign it to the left child and similarly assign the value to the right child.

The algorithm for Huffman coding is as follows:

create a priority queue X having each unique character.

sort them in ascending order based on their frequencies.

for ( all the unique characters):

    create a newNode

    extract minimum value from X and assign it to the left child of newNode

    extract minimum value from X and assign it to the right child of newNode

    find the sum of these two minimum values and assign it to the value of newNode

    insert this newNode into the tree

return rootNode

Complexity

Time complexity is O (n logn).

Application of Huffman coding.

Huffman coding is used in traditional compression techniques like GZIP etc.
For tax and fax transmissions

FAQs

Is Huffman coding lossless or lossy compression?

Huffman coding is a lossless compression algorithm. It preserves all the original information from the input data during the compression and decompression process. When the compressed data is decompressed using the same Huffman tree, the original data is fully recovered without any loss.

What is the time complexity of the Huffman coding algorithm?

The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique characters or symbols in the input data. This complexity arises from building the Huffman tree by repeatedly merging nodes and constructing the variable-length codes.

Is Huffman coding used in practice?

Yes, Huffman coding is widely used in practice. It has been used in various applications, including file compression formats such as ZIP, GZIP, and DEFLATE. It is also used in image compression algorithms like JPEG and video compression algorithms like MPEG.

About the Author

Anshuman Singh

Senior Executive - Content

Anshuman Singh is an accomplished content writer with over three years of experience specializing in cybersecurity, cloud computing, networking, and software testing. Known for his clear, concise, and informative wr Read Full Bio

Comments

(1)

arun velan

a year ago

Report

"The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code. " I hope in the above line it must be less frequency instead of larger frequency. Kindly check

Reply to arun velan

Introduction to Huffman Coding

Huffman coding

Best-suited Data Structures and Algorithms courses for you

DSA to Development A Complete Guide

Python Foundation with Data Structures & Algorithms

Data Structures & Algorithm

Data Structures and Algorithms in Java [Level 1 - Foundation]

Data Structures

Intro to Data Structures and Algorithms

Foundation for Competitive Programming

Programming, Data Structures And Algorithms Using Python

Numerical Methods

Data Structures and Algorithms in Java

Prefix rule

Major steps in building a Huffman tree

Working or Creation of Huffman Tree

Formulas Used in Huffman Tree

Example:

Algorithm for Huffman coding

Complexity

Application of Huffman coding.

FAQs

Comments

Top Picks & New Arrivals