ELE 488 Fall 2006Image Processing and Transmission (11-28 -06): ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06)
Digital Video
Motion Pictures
Broadcast Television
Digital Video
11/28
Two Images: Two Images http://www.tomography.manchester.ac.uk/whattom.shtml
Motion Picture Television Digital Video: Motion Picture Television Digital Video Broadcast Television (analog)
why invent new technology?
movie at home, mass market
influence of movie on development
Key Steps
convert pictures to electric signal
send electric signal
convert electric signal to picture
Comparison with motion picture
High Definition Television - analog digital, compression
Video telephone - analog predecessor
Video conference - travel cost, people cost
Cable (narrowcast), satellite, interactive, ...
NTSC (National Television Systems Committee): NTSC (National Television Systems Committee) 525 lines
2 dots less than 1/2000 of distance from eye are not separated (merge into one)
Assume view at distance 4 times the screen height. No need to have more than 500 lines
NTSC set 525 lines (475 active)
Movies in 1940 has 4:3 aspect ratio (width to height)
25 or more pictures per second to see continuous motion
50 or more pictures per second to avoid flicker
movies use 24 frames/sec, each shown twice
30 frames/sec with 2:1 interlace (60 even-odd fields/sec)
Bandwidth of Broadcast Television: Bandwidth of Broadcast Television Without interlace (progressive scan), 60 frames/sec
500 lines alternating black and white gives 250 full cycles
each horizontal line has 250 x 4/3 ~ 350 full cycles
60 (frames/sec) x 500 (line) x 350 = 10 MHz (video ONLY)
With 2:1 interlace, 5 MHz for video
FCC assigns 6 MHz per broadcast channel
real usable bandwidth is less, MUCH less
actual resolvable lines per vertical height ~250
Color insertion - must compatible with B/W receiver
Change R-G-B to Y-Cb-Cr
Y is luminance (brightness), Cb and Cr are chrominances
B/W sets converts Y to picture, color sets converts Y-Cb-Cr to R-G-B then display
Digital Video: Digital Video What drives digital video?
Information technology:
electronics, communication, storage, new functionality, …
HDTV
R-G-B component video
640 x 480 (pixel) x 3 (color) x 8 (bits/color) x 30 = 221 Mb/sec
Y-Cb-Cr with subsampled Cb and Cr
640 x 480 (pixel) x 1.5 (color) x 8 (bits/color) x 30 = 110 Mb/sec
Compression - MPEG (motion picture expert group)
MPEG-1: CD-ROM, 1.5Mb/sec, 1.2Mb/sec for video, 352x240 (CIF), progressive scan, motion compensation
MPEG-2: extension of MPEG-1, interlace, HD
MPEG-4: object/region based
H.2xx
Video Coding: Video Coding Video consists of frames In(i,j)
Code each frame as a still picture – motion JPEG
Each frame is close to the previous frame
Code the difference FDn(i,j) = In(i,j) – In-1(i,j)
Differential coding (DPCM, predictive coding)
( In-1(i,j) is the predicted value of In(i,j) )
Need to code the first frame
Encoding Three Frame Types: Encoding Three Frame Types Differential encoding of video
I – Intra Frame, code by itself
P – Prediction Frame, code by referring to
previous I or P frame
B – Bi-direction Frame, code by referring to
forward AND backward I or P frames
Coding of I-frame – same as still image: Coding of I-frame – same as still image
I Frames: I Frames I frames are Intra-coded using the JPEG coded
I frames can be decoded without reference to other frames of the video.
Sometimes called anchor frames I frame: JPEG Frame 31 A group of pictures (GOP) begins with an I-frame and ends before the next I-frame
A typical GOP length is 15 frames
With only 1 I-frame per GOP (the first frame)
Coding P Frames: Coding P Frames Each frame is close to the previous frame
Code frame difference (differential coding – DPCM) Occlusion
parts of current frame is blocked in previous frame
need future frame to “predict” FDn(i,j) = In(i,j) – In+1(i,j) current frame In frame difference In - In-1
Coding P Frames: Coding P Frames Each frame is close to the previous frame
Code frame difference (differential coding – DPCM) current frame In frame difference In - In-1
Coding of P Frames: Coding of P Frames Video consists of frames In(i,j)
Code each frame as a still picture – motion JPEG
Each frame is close to the previous frame
Code the difference FDn(i,j) = In(i,j) – In-1(i,j)
Differential coding (predictive coding)
In-1(i,j) is the predicted value of In(i,j)
Observe:
Most part of frame is unchanged
Except for moving objects
Motion Compensated Coding MPEG
Motion Compensated Video Coding: Motion Compensated Video Coding Observe: Most of picture remains unchanged
But some objects have moved.
So code Displaced Frame Difference
Motion Compensated Coding
previous frame current frame
Displaced Frame Difference: Displaced Frame Difference
Displaced Frame Difference: Displaced Frame Difference
P Frames: P Frames I frame: JPEG P frame: motion compensated. macro-blocks and macro-block
motion vectors are indicated Frame 31 Frame 34 P frames are coded using two methods:
- block motion compensation + error coding
- jpeg (intra-coded), without referring to previous frames
P frames are also anchor frames Divide P-frame into Macro-blocks
MB ~16x16
Finding Motion Vectors: Finding Motion Vectors Matching a block from current frame with a displaced block in reference
frame using: (a) sum of squared difference (SSD), or
(b) sum of absolute difference (SAD) (almost always used)
The displacement giving best match is the motion vector of the block
Search methods:
Global search over the entire anchor frame
Restricted search over local neighborhood
Fast search – over a selected neighborhood,
anchor frame current frame
Illustration: P-frame Macro-Blocks: Illustration: P-frame Macro-Blocks Frame 34 P-frame
MPEG: I and P frames (anchor frames): MPEG: I and P frames (anchor frames)
Block Matching Motion Estimation: Block Matching Motion Estimation current frame Block for which motion vector to be determined a position for comparison previous frame another position Blocks of size MxN
Motion Compensated Encoding of P Frame: Motion Compensated Encoding of P Frame current frame previous frame Y
Coding of P frame: Coding of P frame reconstructed previous frame Encoder contains decoder
More Detail: More Detail
Need for Bi-directional Encoding: Need for Bi-directional Encoding
Bidirectional Encoding: Bidirectional Encoding
Frame Transmit Order vs Viewing Order: Frame Transmit Order vs Viewing Order View order Decode order
= transmit order
B-frames: B-frames B-frames are coded in the same way as P-frames except that for each macro-block, search for the best matching block in both the preceding and succeeding anchor frames. Use the encoding that requires the fewest bits. Called bidirectional encoding.
Block Matching Motion Estimation: Block Matching Motion Estimation current frame Block for which motion vector to be determined a position for comparison previous frame another position Blocks of size MxN
Complexity of Exhaustive Block-Matching: Complexity of Exhaustive Block-Matching Assumptions
Block size NxN and image size S=M1xM2
Search step size is 1 pixel ~ “integer-pixel accuracy”
Search range +/–R pixels both horizontally and vertically
Computation complexity
# Candidate matching blocks = (2R+1)2
# Operations for computing MAD for one block ~ O(N2)
# Operations for MV estimation per block ~ O((2R+1)2 N2)
# Blocks = S / N2
Total # operations for entire frame ~ O((2R+1)2 S)
i.e., overall computation load is independent of block size!
E.g., M=512, N=16, R=16, 30fps
=> On the order of 8.55 x 109 operations per second!
Was difficult for real time estimation, but possible with parallel hardware UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Exhaustive Search: Cons and Pros: Exhaustive Search: Cons and Pros Pros
Guaranteed optimality within search range and motion model
Cons
Motion vectors are integer valued
High computation complexity
On the order of [search-range-size * image-size] for 1-pixel step size
How to improve accuracy?
Half pixel – significantly improvement
Quarter pixel – some improvement
Requires interpolation
How to improve speed?
Fast search
Try to exclude unlikely candidates UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Half pixel resolution in matching: Half pixel resolution in matching B p
Fast Algorithm: 3-Step Search : Fast Algorithm: 3-Step Search Search candidates at 9 positions
Reduce step-size after each iteration
Start with step size approx. half of max. search range
Total number of computations: 9 + 82 = 25 (3-step) (2R+1)2 = 169 (full search) (Fig. from Ken Lam – HK Poly Univ. short course in summer’2001) UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Hierarchical Block Matching: Hierarchical Block Matching Problem with fast search at full resolution
Small mis-alignment may give large displacement error esp. for texture and edge blocks
Hierarchical (multi-resolution) block matching
Match with coarse resolution to narrow down search range
Match with high resolution to refine motion estimation (From Wang’s Preprint Fig.6.19) UMCP ENEE408G Slides (created by M.Wu & R.Liu © 2002)
Pixel Decimation: Pixel Decimation IEEE Trans. on Video Technology, April 1993, pp. 148- 157. a block in current frame part of a block in reference frame
Pixel Decimation: Pixel Decimation
Subsampled Motion Field: Subsampled Motion Field
Subsampled Motion Field: Subsampled Motion Field
What else can you do with MPEG video?: What else can you do with MPEG video? The MPEG encoder-decoder is asymmetric.
Encoder is much more complex than the decoder. Determining motion vectors is a major task
Decoding is easy and fast.
The encoding only has to be done once, the decoding will be done many times or at many locations.
Symmetric application?
Compression loses information. But
compressed video has information not readily available in original video