Huffman coding in C#

Huffman coding is an encoding algorithm used for data compression. So called Huffman trees are used for encoding and decoding of data. The leaves of these trees contain the symbols and their frequencies. Each parent node contains the sum of the frequencies of its children.

Example of a Huffman tree

Huffman trees are constructed as follows:

1. Create a list of nodes where each node holds a symbol and its frequency. Sort the list according to frequency.

Ordered list of Huffman nodes for the input string "foo bar"

2. Take the first two nodes in the list and create a new node (also known as the parent node) which holds the sum of the frequencies of those nodes and the nodes as its children.

3. Add the new node to the list of nodes and remove the two nodes you’ve used to create the parent node.

4. Repeat these steps until the list contains one node. This is the root of the Huffman tree.

Node class

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HuffmanTest
{
    public class Node
    {
        public char Symbol { get; set; }
        public int Frequency { get; set; }
        public Node Right { get; set; }
        public Node Left { get; set; }

        public List<bool> Traverse(char symbol,List<bool> data)
        {
            // Leaf
            if (Right == null && Left == null)
            {
                if (symbol.Equals(this.Symbol))
                {
                    return data;
                }
                else
                {
                    return null;
                }
            }
            else
            {
                List<bool> left = null;
                List<bool> right = null;

                if (Left != null)
                {
                    List<bool> leftPath = new List<bool>();
                    leftPath.AddRange(data);
                    leftPath.Add(false);

                    left = Left.Traverse(symbol, leftPath);
                }

                if (Right != null)
                {
                    List<bool> rightPath = new List<bool>();
                    rightPath.AddRange(data);
                    rightPath.Add(true);
                    right = Right.Traverse(symbol, rightPath);
                }

                if (left != null)
                {
                    return left;
                }
                else
                {
                    return right;
                }
            }
        }
    }
}

HuffmanTree class

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace HuffmanTest
{
    public class HuffmanTree
    {
        private List<Node> nodes = new List<Node>();
        public Node Root { get; set; }
        public Dictionary<char, int> Frequencies = new Dictionary<char, int>();

        public void Build(string source)
        {
            for (int i = 0; i < source.Length; i++)
            {
                if (!Frequencies.ContainsKey(source[i]))
                {
                    Frequencies.Add(source[i], 0);
                }

                Frequencies[source[i]]++;
            }

            foreach (KeyValuePair<char, int> symbol in Frequencies)
            {
                nodes.Add(new Node() { Symbol = symbol.Key, Frequency = symbol.Value });
            }

            while (nodes.Count > 1)
            {
                List<Node> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList<Node>();

                if (orderedNodes.Count >= 2)
                {
                    // Take first two items
                    List<Node> taken = orderedNodes.Take(2).ToList<Node>();

                    // Create a parent node by combining the frequencies
                    Node parent = new Node()
                    {
                        Symbol = '*',
                        Frequency = taken[0].Frequency + taken[1].Frequency,
                        Left = taken[0],
                        Right = taken[1]
                    };

                    nodes.Remove(taken[0]);
                    nodes.Remove(taken[1]);
                    nodes.Add(parent);
                }

                this.Root = nodes.FirstOrDefault();

            }

        }

        public BitArray Encode(string source)
        {
            List<bool> encodedSource = new List<bool>();

            for (int i = 0; i < source.Length; i++)
            {
                List<bool> encodedSymbol = this.Root.Traverse(source[i], new List<bool>());
                encodedSource.AddRange(encodedSymbol);
            }

            BitArray bits = new BitArray(encodedSource.ToArray()); 

            return bits;
        }

        public string Decode(BitArray bits)
        {
            Node current = this.Root;
            string decoded = "";

            foreach(bool bit in bits){
                if (bit)
                {
                    if (current.Right != null)
                    {
                        current = current.Right;
                    }
                }
                else
                {
                    if(current.Left != null)
                    {
                        current = current.Left;
                    }
                }

                if (IsLeaf(current))
                {
                    decoded += current.Symbol;
                    current = this.Root;
                }
            }

            return decoded;
        }

        public bool IsLeaf(Node node)
        {
            return (node.Left == null && node.Right == null);
        }

    }
}

The main program

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;

namespace HuffmanTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "foo bar";
            HuffmanTree huffmanTree = new HuffmanTree();

            // Build the Huffman tree
            huffmanTree.Build(input);

            // Encode
            BitArray encoded = huffmanTree.Encode(input);

            Console.Write("Encoded: ");
            foreach (bool bit in encoded)
            {
                Console.Write((bit ? 1 : 0) + "");
            }
            Console.WriteLine();

            // Decode
            string decoded = huffmanTree.Decode(encoded);

            Console.WriteLine("Decoded: " + decoded);

            Console.ReadLine();
        }
    }
}
This entry was posted in C# and tagged , , , , , , , , , . Bookmark the permalink. Trackbacks are closed, but you can post a comment.

11 Comments

  1. Posted April 24, 2012 at 10:11 am | Permalink

    THANK YOU

  2. Seth
    Posted October 10, 2012 at 8:34 pm | Permalink

    How would one use this to decode without knowing the original string?

    • Posted October 12, 2012 at 9:15 am | Permalink

      Hi Seth,

      The Huffman tree (with the symbols and their frequencies) is also transmitted with the compressed array of bits. The tree is needed to encode and to decode. However, the original message is not needed to decode.

      Regards,

      Ferhat

      • david
        Posted July 31, 2014 at 7:46 am | Permalink

        how can i use build metod without start string ?

      • david
        Posted July 31, 2014 at 7:47 am | Permalink

        how can i use build metod without original string ?

  3. Aldarion
    Posted October 28, 2012 at 5:11 pm | Permalink

    I just did this as homework. Using array instead of Dictionary is about 5 times faster and when you are trying to process GBs of data it really matters.
    I wanted to point it out, because i spent a lot of time today optimizing my code and Dictionary performed terrible.

    Cheers
    Alda

    • Posted November 1, 2012 at 7:08 pm | Permalink

      Hi Alda,

      Thanks for pointing this out!

      Regards,

      Ferhat

    • michael lynn
      Posted May 5, 2013 at 7:02 pm | Permalink

      hey can I see how you did yours faster with arrays?

  4. Royalmind
    Posted June 18, 2014 at 9:33 am | Permalink

    How can i read the media file rather than string ?

  5. Anas Jabar
    Posted October 29, 2014 at 5:56 am | Permalink

    I want to make changes in this program so that I can write the encoding in a text file along with the tree (encoding map) and later on read it and decode the binaries.

  6. Vahram
    Posted May 31, 2015 at 10:06 am | Permalink

    Thank you

Post a Reply to Royalmind Cancel reply

Your email is never published nor shared. Required fields are marked *

*
*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Why ask?