Introduction to Huffman Coding

Introduction to Huffman Coding

clickHere
Updated on Feb 9, 2024 17:06 IST

Huffman coding is a greedy approach based lossless data compression algorithm. It uses variable-length encoding to compress the data. The main idea in Huffman coding is to assign each character with a variable length code. The length of each character is decided on the basis of its occurrence frequency. The character that occurred most time would have the smallest code and the character that occurred least in the data would have the largest code.

In this article, we are going to discuss Huffman coding, prefix rule, major steps in building a Huffman tree, and algorithm for Huffman coding, followed by the complexity and application of Huffman coding.

Huffman coding

Huffman coding was introduced by David Huffman. It is a lossless data compression methodology used before sending the data so that data can be compressed and sent using minimal bits, without redundancy, and without losing any of the details. It generally compresses the most frequently occurring characters.

The characters which have more frequency are assigned with the smallest code and the characters with larger frequency get assigned with the largest code.

The codes are assigned in such a manner that each code will be unique and different for each character.

Also explore: Understanding Data Structures in C: Types And Operations

Prefix rule

Huffman coding is implemented using the prefix rule. The variable-length codes assigned to the characters based on their frequency are the Prefix code. Prefix codes are used to remove ambiguity during the decoding of the data, no two characters can have the same prefix code.

Major steps in building a Huffman tree

Huffman coding technique involves two major steps, which are as follows:

• Creating a Huffman tree of the input characters.
• Traversing the tree and assigning codes to the characters.

You can also explore: Tree Traversal in Data Structure

Explore: Data Structures Online Courses & Certifications

Working or Creation of Huffman Tree

Huffman Coding step-by-step working or creation of Huffman Tree is as follows:

• Step-1: Calculate the frequency of each string.
• Step-2: Sort all the characters on the basis of their frequency in ascending order.
• Step-3: Mark each unique character as a leaf node.
• Step-4: Create a new internal node.
• Step-5: The frequency of the new node as the sum of the single leaf node
• Step-6: Mark the first node as this left child and another node as the right child of the recently created node.
• Step-7: Repeat all the steps from step-2 to step-6.

You can also explore: Introduction to Linked List Data Structure

Formulas Used in Huffman Tree

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

= ∑ ( frequencyi x Code lengthi )

Example:

Suppose a data file has the following characters and the frequencies. If huffman coding is used, calculate:

• Huffman Code of each character
• Average code length
• Length of Huffman encoded data

Solution:

Initially, create the Huffman Tree:

Step-2:

Step-3:

Step-4:

Step-5:

The above tree is a Huffman Tree.

Now, assign weight to all the nodes.

Assign “0” to all left edges and “1” to all right edges.

The tree will become

Huffman Code of each character:

A: 00

B: 10

C: 110

D: 01

E: 111

Average code length per character = ∑(frequencyi x code lengthi)/ ∑ frequencyi

= {(12 x 2) + (13 x 2)+ (15 x 2)+ ( 7 x 3)+ (9 x 3)} / (12 + 13 + 15 + 7+ 9)

= (24+ 26+ 30 + 21 + 27)/56

= 128/56

= 2.28

Average code length per character = 2.28

Total number of bits in Huffman encoded message

= Total number of characters in the message x Average code length per character

=  56 x 2.28

=  127.68

Algorithm for Huffman coding

For implementing Huffman coding, we need a priority queue that will have all the unique characters of the data. Then sort all the characters on the basis of their occurring frequency in ascending order. Take out the minimum value from the priority queue and assign it to the left child and similarly assign the value to the right child.

The algorithm for Huffman coding is as follows:

`create a priority queue X having each unique character.`
`sort them in ascending order based on their frequencies.`
`for ( all the unique characters):`
`    create a newNode`
`    extract minimum value from X and assign it to the left child of newNode`
`    extract minimum value from X and assign it to the right child of newNode`
`    find the sum of these two minimum values and assign it to the value of newNode`
`    insert this newNode into the tree`
`return rootNode`

Complexity

Time complexity is O (n logn).

Application of Huffman coding.

• Huffman coding is used in traditional compression techniques like GZIP etc.
• For tax and fax transmissions

FAQs

Is Huffman coding lossless or lossy compression?

Huffman coding is a lossless compression algorithm. It preserves all the original information from the input data during the compression and decompression process. When the compressed data is decompressed using the same Huffman tree, the original data is fully recovered without any loss.

What is the time complexity of the Huffman coding algorithm?

The time complexity of the Huffman coding algorithm is O(n log n), where n is the number of unique characters or symbols in the input data. This complexity arises from building the Huffman tree by repeatedly merging nodes and constructing the variable-length codes.

Is Huffman coding used in practice?

Yes, Huffman coding is widely used in practice. It has been used in various applications, including file compression formats such as ZIP, GZIP, and DEFLATE. It is also used in image compression algorithms like JPEG and video compression algorithms like MPEG.