Slide1 : Data Compression © Prof. Aiman Hanna
Department of Computer Science Concordia University Montreal, Canada
D ata Compression : D ata Compression Why needed?
Size of applications is going from large to larger
MP3, MPEG, Tiff, Facsimile (fax), …etc.
Fax has about 4 million dots/page more than 1 minutes over 56Kbps
TV / Motion Pictures uses 30 pictures (frames) / second
200,000 pixels / frames
Color pictures require 3 bytes for each pixel (RGB)
Each frame has 200,000 * 24 = 4.8 Mbits
2-hour movie requires 216,000 pictures
total bits for such movie = 216,000 * 4.8 Mbits = 1.0368 x 1012
This is much higher than the capacity of DVDs
D ata Compression : D ata Compression Simple Compression example
Assume only uppercase characters are to be sent
ASCII can be used 8-bit * number of characters to sent
Alternatively, a different 5-bit code can be used
A: 00000
B: 00001
C: 00010
:
:
Y: 11000
Z: 11001
37.5% reduction is achieved
F requency-Dependent Codes : F requency-Dependent Codes Some characters are used more than others
ASCII assigns the same number of bits to all characters
Alternatively, assign shorter code for those who are used more frequently
Huffman Code & Arithmetic Compression are examples of frequency-dependent code
H uffman Code : H uffman Code Assign a percentage of usage to each character
Create a Huffman code based on that
Example
For illustration, assume only 5 characters are used
Letter Frequency
A 25%
B 15%
C 10%
D 20%
E 30%
H uffman Code : H uffman Code Huffman code for the previous example
Letter Code
A 01
B 110
C 111
D 10
E 00
Now, assume the sent code is 1101001010010
How the receiver can decode that sequence?
H uffman Code : H uffman Code Huffman uses no-prefix property
Codes obtained by creating Huffman trees and merging them Figure 5. 2 – Merging Huffman Trees
H uffman Code : H uffman Code Using the no-prefix property enables decoding Figure 5. 1 – Receiving & Interpreting Huffman-Coded Message
A rithmetic Compression : A rithmetic Compression Interprets a character string as a single real number
Define an association between a character string and a real number [0 .. 1]
Example
For illustration, assume only 5 characters are used
Letter Frequency Subinterval [p,q]
A 25% [0, 0.25]
B 15% [0.25, 0.40]
C 10% [0.4, 0.5]
D 20% [0.5, 0.7]
E 30% [0.7, 1]
A rithmetic Compression : A rithmetic Compression In a nutshell, the idea is:
Start with the entire range, that is [0..1]
Narrow this range down every time you move through the string.
This narrowing down operation depends on two factors:
The previous narrowed down range
The frequency of the character
Once reached the last character in the string, just pick up any value from the final range
A rithmetic Compression : A rithmetic Compression Example:
Letter Frequency Subinterval [p,q]
A 25% [0, 0.25]
B 15% [0.25, 0.40]
C 10% [0.4, 0.5]
D 20% [0.5, 0.7]
E 30% [0.7, 1]
What is the code for CABACE?
Start range is [0..1], distance is 1
C has subinterval [0.40 .. 0.50] we get 0.40 * 1 (which is 0.40) and 0.50% * 1 (which is 0.50) from the beginning of the previous range
New range now is [0.40 .. 0.50]; that is [0.4 + 0 .. 0.5 + 0]
A rithmetic Compression : A rithmetic Compression Example:
Letter Frequency Subinterval [p,q]
A 25% [0, 0.25]
B 15% [0.25, 0.40]
C 10% [0.4, 0.5]
D 20% [0.5, 0.7]
E 30% [0.7, 1]
What is the code for CABACE?
Start range now is [0.40..0.50] distance is 0.10
A has subinterval [0 .. 0.25] we get 0 * 0.10 (which is 0) and 0.25% * 0.10 (which is 0.025) from the beginning of the previous range
New range now is [0.40 .. 0.425]; that is [0.40 + 0 .. 0.40 + 0.025]
A rithmetic Compression : A rithmetic Compression Example (continue): What is the representation of CABACE?
Letter Frequency Subinterval [p,q]
A 25% [0, 0.25]
B 15% [0.25, 0.40]
C 10% [0.4, 0.5]
D 20% [0.5, 0.7]
E 30% [0.7, 1]
from previous page, the last range after CA are coded is [0.4 .. 0.425] a width/difference of 0.025 (that is 0.425 – 0.4)
B has subinterval [0.25 .. 0.40] we get 25% * 0.025 (which is 0.00625) and 0.4% * 0. 0.025 (which is 0.01) from the beginning of the previous range
New range now is [0.40625 .. 0.41]; that is [0.4 + 0.00625 .. 0.4 + 0.01]
A will move the range to [0.40625 .. 0.4071875]
C will move the range to [0.406625 .. 0.40671875]
E will finally move the range to [0.406690625 .. 0.40671875]
A final code can then be 0.40671
A rithmetic Compression : A rithmetic Compression Decoding: real value character string
What is the string for 0.40671?
R un-Length Encoding : R un-Length Encoding Run of the Same Bit
0 represents white spot, 1 black spot
Bits are grouped into long runs of 0s
Instead of transmitting the bits, transmit their run amount
Use 4-bit for run (value ranges from 0 to 15), however
The maximum length of a run is 14
Run of 15 is represented as 1111 0000
Run of 30 is represented as 1111 1111 0000
Run of 31 is represented as 1111 1111 0001
R un-Length Encoding : R un-Length Encoding Run of the Same Bit Figure 5.5 – Run-Length Encoding
R un-Length Encoding : R un-Length Encoding Runs with Different Characters
Needed if the transmitted stream is a character stream (not only 1s & 0s)
A string of “HHHHUFFFFFFFFFFFFFFGGG” would be coded as: 4H1U14F3G
R elative Encoding : R elative Encoding Also referred to as Differential Encoding
Huffman codes are good for data messages
Run Length encoding is good for fax and voice
None of them is that suitable for video
Pictures may have little difference in between
Send the full 1st frame, then send frames that have the differences
Run-length encoding can be used for those differential frames
R elative Encoding : R elative Encoding
Figure 5.6 – Relative Encoding
L empel-Ziv Compression : L empel-Ziv Compression Focuses on repetition of words or phrases
For example: the, that, ing, in, on, of, ...etc
Look for often-repeated strings and store (code) them just once
Reference those strings through their special code
I mage Compression : I mage Compression Image Representation
Images are made up of very small dots (pixels)
Is there really such thing as black & white photos?
I mage Compression : I mage Compression Different video colors may be obtained using combination of Red, Green & Blue (RGB)
8 bits can be used to represent each of the tree colors
A total of 24 bits 224 different combinations
Since human eye cannot distinguish these many colors, we think of it as true color
I mage Compression : I mage Compression An alterative to RGB is YIQ, which is based on RGB
YIQ uses 8-bit group for luminance (brightness), and two other 8-bit (each) for chrominance (color)
For example,
Y=0.30R+0.59G+.11B (luminance)
I=0.60R-0.28G-0.32B (chrominance)
Q=0.21R-0.52G+0.31B (chrominance)
Human eye is more sensitive to luminance than chrominance; that is some loss in colors through transmission may not be visually detectable
That is useful information when it comes to image compression
I mage Compression : I mage Compression Regardless of using RGB or YIQ, there is 3 8-bit groups per pixel
To fill a 640 x 480 computer screen, we need:
640 * 480 * 24 7,372,800 bits
Video usually uses 30 images/sec, and transfer may be done simultaneously to many users
This may easily average to Gbps rate and higher, which is far more than what current technology (as of 2006, when this was written) can handle
JPEG Compression : JPEG Compression Joint Photographic Experts Group
Formed by ISO, ITU & IEC
Previous methods were examples of lossless compression
JPEG however is lossy
Considering the optical system of a human, that may still be acceptable
JPEG Compression : JPEG Compression JPEG is good for grayscale or photo-colored image
JPEG compression consists of three phases: Discrete Cosine Transform (DCT), Quantization, and Encoding
DCT divides an image into a series of “blocks” (8x8 pixels each block). I.e. 640x480 pixels = 80x60 blocks Figure 5.8 – JPEG’s Three Phases
JPEG Compression : JPEG Compression For example, 800 x 800 image would be divided into 100 x 100 blocks – each block has 8 x 8 pixels Figure 5.9 –
800 x 800 VGA Screen Image Divided into 8 x 8 pixel blocks
JPEG Compression : JPEG Compression If grayscale, then each pixel is represented by an 8-bit number
Each 8 x 8 pixel block will be represented as 2-D array with 8 rows & 8 columns
If color, then each pixel is represented by a 24-bit number
Each 8 x 8 pixel block will be represented as three 2-D arrays with 8 rows & 8 columns each
JPEG Compression : JPEG Compression DCT Phase
Basically, it is a function that takes a 2-D array with 8 rows & 8 columns and produces another 2-D array
P[i][j] is the input array, T[i][j] is the output array
The values inside T[i][j] are called spatial frequencies Formula page 240
JPEG Compression : JPEG Compression DCT Phase Figure 5.10 – Discrete Cosine Transform Results on Two Different Arrays
JPEG Compression : JPEG Compression Quantization Phase
Provides a way of ignoring small difference that may not be perceptible
It produces another 2-D array, call it Q for example
Q[i][j] values are obtained by dividing T[i][j] values by some number and rounding to the nearest integer
The resulting array with have fewer distinct numbers, so it easier to compress
JPEG Compression : JPEG Compression Quantization Phase
Example: T values are divided by 10 then rounded Step 5-4, page 242 - Q Array Step 5-3, page 242 - T Array
JPEG Compression : JPEG Compression Quantization Phase
How can we obtain T (decompression) from Q then?! Step 5-5, page 242 – T Array After Decompression
JPEG Compression : JPEG Compression Quantization Phase
To preserve as much information as possible, define a quantization array, say U, and divide T values by the values of U Step 5-3, page 242 - T Array Step 5-6, page 243 - U Array
JPEG Compression : JPEG Compression Quantization Phase
Step 5-7, page 243 - Q Array Step 5-8, page 243 – T Array After Decompression
JPEG Compression : JPEG Compression Encoding Phase
So far, transformation & quantization were done, but nothing about compression!
This phase finally does the compression
The idea is to linearize the content of the Q array and compress it for transmission
Run-length coding can then be used Figure 5.11 – Order in which Array elements are transmitted
JPEG Compression : JPEG Compression JPEG can use Huffman encoding or Arithmetic encoding for the non-zero values
Since many 2-D must be transmitted for the image, and many of them may not have much differences, Differential encoding can also be applied
In general, JPEG compression ration is about 20:1; that is, the resulted file is 5% of the original
Better ratios are possible but loss of quality may become noticeable
JPEG 2000 is the newest JPEG coding
Mass details of JPEG can be found at www.JPEG.org
GIF Files : GIF Files Graphics Interchange Format
The number of colors that GIF works with is only 256 (28 in contrast to 224 for JPEG
Stores up to 256 colors in a table and attempt to cover the range of colors in an image as closely as possible
The resulting bit values are subjected to some form of Lempel-Ziv compression
Lossy if the number of colors in an image is more than 256; lossless otherwise
Best suited for graphics that contain relatively few colors and sharply defined boundaries between the colors, such as cartoons, charts, …etc.
Not that suitable for images with lots of variations between colors, such as full-color photographic-quality images
M ultimedia Compression : M ultimedia Compression Compression of video, motion pictures and audio
MPEG - Moving Pictures Expert Group
MP3 – File extension for MPEGaudio Layer 3
MPEG : MPEG Actually not a single standard:
MPEG-1: designed for video on CD-ROM
MPEG-2: designed for more demanding applications, such as multimedia entertainment and high-definition television (HDTV)
MPEG-4: designed for videoconferencing over low BW channels
MPEG-7 & MPEG-21 are also in progress
MPEG : MPEG Video/motion is produced by displaying still pictures at a rate of some frames/second
NTSC defined that as 30 frames/second
Lower rate than that may produce a motion, but a jerky one, as in older movies
MPEG compression may be obtained by using JPEG compression for each of these frames, however this is not sufficient
MPEG : MPEG On 640 x 480 VGA, one frame may contain 24 * 640 * 480 = 7,372,800 bits
On 20:1 JPEG compression, image is reduced to 368,640
30 images/second 30 * 368,640 = 11,059,200 bps
That is still too high, especially considering shared channels that may be used by multiple users accessing video
MPEG : MPEG No matter how much action is in a video, the difference between consecutive frames is usually quite small
MPEG compression takes advantage of this redundancy (temporal redundancy) in successive frames, however
What happen if hidden objects in the old frame start to appear?
What happens if the seen completely changes?
MPEG identifies 3 types of frames:
I Frame (Intrapicture Frame)
P Frame (Predicted Frame)
B Frame (Bidirectional Frame)
MPEG : MPEG P frames are coded using a method called motion-compensated prediction
This method divides the image into a collection of macroblocks of 16 x 16 pixels (totals to 256 pixels)
If each pixel has a color (1 luminescence & 2 chrominance), then three 2-D arrays are needed (each array of size 16 x 16) Figure 5.12 – Typical MPEG Frame Sequence (Logical Sequence)
MPEG : MPEG To speed things up, the two chrominance arrays are reduced from 16 x 16 to 8 x 8
Figure 5.13 – Reduction of 16 x 16 Chrominance Arrays to 8 x 8 Chrominance Arrays
MPEG : MPEG Prior to sending the P frame, an algorithm runs to find out what is the best matched macroblock in the I frame; this may not be at the same position in the I frame
The algorithm then calculates the differences between the matching macroblocks in the I & P frames
The algorithm also calculates a motion-vector
Both the differences and the motion-vectors are then transmitted
MPEG : MPEG The B frame is similar to P frame, except that the macroblocks are interpolated from matching macroblocks in a prior & future frames
Interpolation is a way of predicting a value based on two existing values
Figure 5.14 – Using Interpolation to Estimate a Value
MPEG : MPEG Notes:
MPEG has a high computation and algorithmic complexity
However, in most applications video is recorded just once and stored in some medium
As a result, time spend on encoding (compression) is not a concern here
Yet, time spent in decompression (decoding) is very significant since it is run-time
MP3 : MP3 MPEG Layer3 for audio compression
Pulse-Code Modulation (PCM) can be used to transform analog (audio) to digital
Human’s auditory system can only hear frequencies between 20Hz & 20KHz
According to Nyquist Theorem, a sampling frequency of about 40KHz would be sufficient to reconstruct the audio signal within a human’s hearing range
Consequently, common PCM techniques to produce a CD-quality sound use 16-bit sampling and 44.1KHz sampling frequency
1 second of PCM-coded music requires 16 * 44.1 * 1000 ≈ 700,000 bits
A 2-chaneel stereo would hence require about 1.4 Mb
2-minute recording would require about 168 Mb
MP3 : MP3 Psychoacoustic Model:
What can human auditory hear?
What can human auditory distinguish?
Auditory Masking:
If two sounds have similar frequencies, but one is weak and the other is high, it is possible that a human cannot hear the weak sound
MP3 fundamental idea is:
Capture an audio signal
Determine what cannot be heard and remove it, then
Digitize the rest
MP3 : MP3 Psychoacoustic Model:
What can human auditory hear?
What can human auditory distinguish?
Auditory Masking:
If two sounds have similar frequencies, but one is weak and the other is high, it is possible that a human cannot hear the weak sound
MP3 fundamental idea is:
Capture an audio signal
Determine what cannot be heard and remove it, then
Digitize the rest
MP3 : MP3 Figure 5.15 – MP3 Encoding