Description
The ‘coder’ object attached
to this article can work box implementation of an arithmetic
encoder. However, to use it, we used understand the basics of
statistical modeling for arithmetic coding and concepts of
arithmetic coding.
In general, using arithmetic coding effects creating a statistical model of the data. In this example, I will
assume that we are trying to encode the words “HELLO WORLD”.
Creating the statistical model of
the data proceeds as follows:
- Taking the number of independent characters in the
words to be encoded (HELLO WORLD), we obtain, in alphabetical order:
- D
- E
- H
- L
- O
- R
- W
The total number of characters to be encoded is 10. For convenience and clarity, I have ignored the space between the words HELLO and WORLD.- We can arrange these characters into a table with their
corresponding frequency, as below:
CharacterFrequencyD1E1H1L3O2R1W1- Each of the characters will be assigned a range based
on its frequency/ probability of occurrence. This range will be between 0
and 1, as below (note that I have not used any optimizations for the
frequency and probability model; for the most optimal compression, this is
necessary; however, for the purposes of our example, this should do).
CharacterFrequencyProbabilityRangeD11/100.0 – 0.1E11/100.1 – 0.2H11/100.2 – 0.3L33/100.3 – 0.6O22/100.6 – 0.8R11/100.8 – 0.9W11/100.9 – 1.0- The algorithm for encoding is as below:
Set low to 0.0Set high to 1.0While there are still input symbols doget an input symbolcode_range = high - low.- O
- The algorithm for decoding the number and retrieving
the encoded string/data is as below:
Find the symbol represented by the range that the number is in, output it. Remove the effects of encoding and repeat. In pseudo-code:- The algorithm for decoding the number and retrieving
the encoded string/data is as below: