Compression Concepts

Data compression principles

The main principles of data compression are listed below. Data compression:

Is the substitution of frequently occurring data items, or symbols, with short codes that require fewer bits of storage than the original symbol.
Saves storage space, but costs processing time to compress and decompress.
Varies in effectiveness with the type of data.
Works best on data with low spatial variability and limited possible values.
Works poorly with high spatial variability data or continuous surfaces.
Exploits inherent redundancy and irrelevancy by transforming a data file into a smaller one.

The compression ratio is the ratio of the original file size to the compressed file size. For example, if the original image is 100 MB and the compressed file is 10 MB, the compression ratio is 10:1.
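The principles above can be tried out with a short sketch. It assumes Python's standard zlib module as the compressor; the helper name compression_ratio and the two test buffers are illustrative and not part of the original text. It compares data with long runs of a few values against pseudo-random data.

```python
import random
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Return original size divided by compressed size (10.0 means 10:1)."""
    return len(original) / len(compressed)


# Low variability: long runs drawn from only two possible values.
low_variability = bytes([10] * 5000 + [20] * 5000)

# High variability: pseudo-random bytes (fixed seed for repeatability).
rng = random.Random(0)
high_variability = bytes(rng.getrandbits(8) for _ in range(10000))

low_ratio = compression_ratio(low_variability, zlib.compress(low_variability))
high_ratio = compression_ratio(high_variability, zlib.compress(high_variability))

print(f"low variability:  {low_ratio:.1f}:1")
print(f"high variability: {high_ratio:.1f}:1")
```

The exact figures depend on the compressor and its settings, but the low-variability buffer should shrink dramatically while the pseudo-random buffer should barely shrink at all.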

The two most widely used families of compression techniques are lossless and lossy compression.

Lossless compression

A lossless compression algorithm eliminates only redundant information, so that one can recover the data exactly upon decompression of the file.

Lossless data compression is compression without any loss of data quality: the decompressed file is an exact replica of the original. It is used when the original and the decompressed data must be identical. It works by rewriting the data in a more space-efficient way that removes repetition, which typically yields modest compression ratios of around 2:1.
Some image file formats, notably PNG, use only lossless compression, while those like TIFF may use either lossless or lossy methods.

Examples of LOSSLESS METHODS are:

run-length coding
Huffman coding
Lempel-Ziv-Welch (LZW) method
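As an illustration of the first method in this list, here is a minimal sketch of run-length coding. The function names rle_encode and rle_decode and the sample pixel row are hypothetical, and real file formats pack the pairs into bytes in format-specific ways that this sketch does not attempt to reproduce.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of equal values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


row = [255, 255, 255, 255, 0, 0, 255, 255, 255]  # hypothetical pixel row
encoded = rle_encode(row)
restored = rle_decode(encoded)
print(encoded)
print(restored == row)
```

Nine values become three pairs here, and decoding returns the row unchanged. On data without long runs the pairs can take more space than the input, which is one reason success varies with the type of data.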

Numerical example

An example of seven gray pixels:
128, 127, 126, 121, 124, 123, 120

can be rewritten as the first value followed by the differences between neighboring pixels, which are smaller numbers requiring fewer bits:
128, -1, -1, -5, +3, -1, -3

Lossless method
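The numerical example above stores the first pixel followed by the difference between each pixel and its predecessor, a scheme commonly called delta (or differential) coding. A minimal sketch follows; the function names delta_encode and delta_decode are illustrative, and a real codec would additionally pack the small differences into fewer bits.

```python
from itertools import accumulate


def delta_encode(pixels):
    """Keep the first value, then store each value minus its predecessor."""
    if not pixels:
        return []
    deltas = [pixels[0]]
    for previous, current in zip(pixels, pixels[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the pixels as the running sum of the stored differences."""
    return list(accumulate(deltas))


pixels = [128, 127, 126, 121, 124, 123, 120]
deltas = delta_encode(pixels)
restored = delta_decode(deltas)
print(deltas)
print(restored == pixels)
```

Because the running sum reproduces every original value, the round trip is exact, which is what makes the method lossless.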

Lossy compression

A lossy compression method is one where compressing data and then decompressing it retrieves data that may well be different from the original, but is "close enough" to be useful in some way.
The algorithm eliminates irrelevant information as well as redundant information, and permits only an approximate reconstruction of the original file. Like lossless compression, it rewrites the data in a more space-efficient way, but it goes further: less important details of the image are altered or removed so that higher compression ratios are achieved. Lossy compression is dangerously attractive because it can provide compression ratios of 100:1 to 200:1, depending on the type of information being compressed, but the cost is loss of data.

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application.

Examples of LOSSY METHODS are:

PCM
JPEG
MPEG

Numerical example

The previous sequence of numbers
128, 127, 126, 121, 124, 123, 120

can be rewritten as a start value and the number of steps of -1 that follow it:
128 - 6
Result after decompression:
128, 127, 126, 125, 124, 123, 122

Lossy method
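The lossy example above can be sketched as follows, assuming that "128 - 6" means a start value of 128 followed by six steps of -1. The function names ramp_encode and ramp_decode and the fixed step of -1 are assumptions made for illustration; they do not describe how PCM, JPEG, or MPEG work.

```python
def ramp_encode(pixels):
    """Keep only the first value and the number of steps that follow it."""
    return pixels[0], len(pixels) - 1


def ramp_decode(start, steps, step=-1):
    """Rebuild an evenly descending ramp (step of -1 is an assumption)."""
    return [start + i * step for i in range(steps + 1)]


original = [128, 127, 126, 121, 124, 123, 120]
start, steps = ramp_encode(original)
restored = ramp_decode(start, steps)

# Per-pixel reconstruction error: restored value minus original value.
errors = [r - o for o, r in zip(original, restored)]
print((start, steps))
print(restored)
print(errors)
```

Seven values shrink to two, but the fourth and seventh pixels come back changed. Whether that error is acceptable depends on the application, which is the trade-off lossy methods make.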