Presentation Transcript
Slide1: Segmenting Digital Video
Paul Browne
Centre for Digital Video Processing
Dublin City University
Slide2: Presentation
Introduction
Evaluation Baseline
Shot Boundary Detection Methods
Evaluation of Methods
Combining Shot Boundary Algorithms
Combining Results
Scene Segmentation
Luminance Scene Segmentation Results
Television News Segmentation
Television News Segmentation Results
Conclusions
Slide3: Introduction
Digital video is composed of:
Frames
Shots
Shot boundaries
Scenes
Audio
Navigating digital video.
Segmentation needed to replace
Slide4: Introduction
Shot segmentation problems and examples:
object motion: person moves into a camera shot ...
camera motion: panning, zooming ...
lighting changes: camera flash, lightning ...
some types of shot boundary: dissolves, fades ...
digital effects: swirls, morphing ...
To reduce false shot changes:
Threshold values - higher values
Empirical restrictions - example: shot must be greater than 100 frames ...
Slide5: Evaluation Baseline
Eight hours of continuous digital video.
Recorded 12th June 1998 from 1pm to 9pm.
Broken up into 24 x 20-minute video segments.
Manually evaluated for shot boundary changes.
Results written to a baseline log file.
For each segment:
Video file
audio file
baseline log file
Baseline segmented into 13 programs
Slide6: Shot Boundary Detection Methods
Colour Histogram
Colour percentages for a frame are stored.
Results compared with those of the adjacent frame.
Difference value calculated.
Difference above a certain value (threshold) is a shot change.
[Diagram: histogram values generated; compared with previous frame's histogram values; difference value above a certain value is a shot change]
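The colour histogram steps on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the bin count, the difference measure (half the sum of absolute bin differences) and the threshold of 0.5 are all assumptions, and a frame is modelled as a non-empty flat list of RGB tuples.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = Sequence[Pixel]        # a frame flattened to a list of pixels


def colour_histogram(frame: Frame, bins_per_channel: int = 4) -> List[float]:
    """Return the fraction of pixels that fall in each quantised colour bin."""
    counts = [0] * (bins_per_channel ** 3)
    for r, g, b in frame:
        # Map each 0-255 channel value to a bin index 0..bins_per_channel-1.
        rb = r * bins_per_channel // 256
        gb = g * bins_per_channel // 256
        bb = b * bins_per_channel // 256
        counts[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_difference(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the sum of absolute bin differences, so the result lies in [0, 1]."""
    return sum(abs(a - b) for a, b in zip(h1, h2)) / 2


def detect_shot_changes(frames: Sequence[Frame], threshold: float = 0.5) -> List[int]:
    """Return every index i where frame i differs from frame i-1 by more than threshold."""
    histograms = [colour_histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(histograms))
        if histogram_difference(histograms[i - 1], histograms[i]) > threshold
    ]


# Two all-red frames followed by two all-blue frames: one change at index 2.
red = [(255, 0, 0)] * 16
blue = [(0, 0, 255)] * 16
print(detect_shot_changes([red, red, blue, blue]))  # [2]
```

Raising the threshold removes false changes at the cost of missing real ones, which is the trade-off the earlier slide on reducing false shot changes describes.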
Slide7: Shot Boundary Detection Methods
Edge Detection
Frame turned into a grayscale image.
Edge detection algorithm is then applied to the image.
Difference value calculated for two adjacent frames.
Difference above a certain value (threshold) is a shot change.
[Diagram: edge values compared with previous frame's edge values to give a difference value]
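A rough sketch of the edge-based comparison follows. The slide does not name the edge detector or the difference measure, so this example assumes a simple neighbour-difference edge test, measures the fraction of pixels whose edge flag changes, and uses illustrative thresholds. The grayscale weights are the common 0.299/0.587/0.114 luma coefficients.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = Sequence[Sequence[Pixel]]   # rows of RGB pixels


def to_grayscale(image: Image) -> List[List[float]]:
    """Convert RGB pixels to luma values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in image]


def edge_map(gray: Sequence[Sequence[float]], edge_threshold: float = 50.0) -> List[List[bool]]:
    """Mark pixels whose step to the right or downward neighbour exceeds edge_threshold."""
    height, width = len(gray), len(gray[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(gray[y][x + 1] - gray[y][x]) if x + 1 < width else 0.0
            dy = abs(gray[y + 1][x] - gray[y][x]) if y + 1 < height else 0.0
            edges[y][x] = max(dx, dy) > edge_threshold
    return edges


def edge_difference(e1: Sequence[Sequence[bool]], e2: Sequence[Sequence[bool]]) -> float:
    """Fraction of pixel positions whose edge flag differs between the two maps."""
    total = len(e1) * len(e1[0])
    changed = sum(a != b for r1, r2 in zip(e1, e2) for a, b in zip(r1, r2))
    return changed / total


def is_shot_change(previous: Image, current: Image, threshold: float = 0.2) -> bool:
    """Compare the edge maps of two adjacent frames against a threshold."""
    difference = edge_difference(edge_map(to_grayscale(previous)),
                                 edge_map(to_grayscale(current)))
    return difference > threshold


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
left_right = [[BLACK, BLACK, WHITE, WHITE] for _ in range(4)]     # vertical edge
top_bottom = [[BLACK] * 4, [BLACK] * 4, [WHITE] * 4, [WHITE] * 4]  # horizontal edge
print(is_shot_change(left_right, left_right))  # False
print(is_shot_change(left_right, top_bottom))  # True
```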
Slide8: Shot Boundary Detection Methods
Macroblock
Works on compressed MPEG digital video.
Frame split into fixed regions called macroblocks.
Three types of macroblock:
I: encoded independently of other macroblocks
P: encodes not the region itself but the motion vector and error block relative to the previous frame
B: same as P, except that the motion vector and error block are encoded from the previous or next frame
Detecting shot changes:
Specific numbers of macroblock types will occur.
[Diagram: frame with macroblocks labelled I, P and B]
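The slide says only that specific numbers of macroblock types occur at a shot change. One plausible reading, assumed here and not stated in the slides, is that a predicted frame with an unusually high share of independently coded (I) macroblocks could not be predicted from its neighbours and so probably starts a new shot. The function names, the 0.6 threshold and the input format (one list of type letters per predicted frame, with fully intra-coded frames already excluded) are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def intra_fraction(macroblock_types: Sequence[str]) -> float:
    """Share of a frame's macroblocks that are coded independently ('I')."""
    counts = Counter(macroblock_types)
    return counts["I"] / len(macroblock_types)


def detect_shot_changes_by_macroblocks(
    frames: Sequence[Sequence[str]], threshold: float = 0.6
) -> List[int]:
    """Return indices of predicted frames whose 'I' share exceeds threshold.

    Assumption: frames holds only predicted frames; frames that are coded
    entirely independently by design must be filtered out by the caller.
    """
    return [i for i, types in enumerate(frames) if intra_fraction(types) > threshold]


frames = [
    list("PPPPPPBB"),   # well predicted from neighbouring frames
    list("IIIIIIPB"),   # mostly independent blocks: likely a new shot
    list("PPBBPPBB"),
]
print(detect_shot_changes_by_macroblocks(frames))  # [1]
```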
Slide9: Evaluation of Methods
Two evaluation measures are:
Recall: $\text{Recall} = \frac{\text{number of correct shots found}}{\text{actual number of shots}}$
Precision: $\text{Precision} = \frac{\text{number of correct shots found}}{\text{number of correct shots found} + \text{false shots}}$
There is a balance between these measures.
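As a worked example of the two measures, the sketch below scores a set of detected boundaries against a baseline. It assumes boundaries are identified by exact frame number (the slides do not say whether any tolerance was allowed), and the frame numbers are made up for illustration.

```python
from typing import Set, Tuple


def evaluate(detected: Set[int], baseline: Set[int]) -> Tuple[float, float]:
    """Return (precision, recall) for detected boundaries against the baseline."""
    correct = len(detected & baseline)   # correct shots found
    false = len(detected - baseline)     # false shots
    precision = correct / (correct + false) if detected else 0.0
    recall = correct / len(baseline) if baseline else 0.0
    return precision, recall


# Hypothetical frame numbers: 3 of 4 detections are correct, 3 of 5 real boundaries found.
detected = {10, 50, 90, 120}
baseline = {10, 50, 70, 90, 150}
print(evaluate(detected, baseline))  # (0.75, 0.6)
```

Lowering a detection threshold adds detections, which tends to raise recall and lower precision. That is the balance the slide refers to.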
Slide10: Evaluation of Methods
The following Venn diagram shows the overlap in correct shot boundaries detected by each of the methods.
[Venn diagram of Colour Histogram, Edge Detection and Macroblock; region counts in extraction order: 241, 419, 23, 52, 391, 281, 4449]
Slide11: Evaluation of Methods
Average Precision values over 8 hours (%)
Colour Histogram 90.4
Edge Detection 90.0
Macroblock 87.4
Average Recall values over 8 hours (%)
Colour Histogram 78.9
Edge Detection 70.2
Macroblock 75.3
Programs with lowest Recall values are:
Home & Away (Australian soap)
Cooking Program
Slide12: Combining Shot Boundary Algorithms
Points of note
Limit of 356 extra shots by combining
Highest Precision and Recall using Colour Histogram
Combining
Combining favors histogram
Problem
Additional false changes will also be introduced
Slide13: Combining Shot Boundary Algorithms
Logic of the combining method that selects a shot boundary:
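if difference value(s) above threshold value(s) then shot boundary
[Diagram: the difference values of the Colour Histogram, Edge Detection and Macroblock methods are compared with thresholds and joined by "or" to select a shot boundary; other labels: Method(s) difference value, Thresholds, Shot boundary, Low Histogram]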
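The combining rule can be sketched as a logical OR over per-method threshold tests. This covers only the rule as worded on the slide; the role of the "Low Histogram" label in the diagram is not recoverable and is not modelled. The method keys and threshold values are hypothetical.

```python
from typing import Dict


def is_shot_boundary(differences: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """True if any method's difference value exceeds that method's threshold."""
    return any(differences[method] > limit for method, limit in thresholds.items())


# Hypothetical per-method thresholds.
thresholds = {"colour_histogram": 0.5, "edge_detection": 0.4, "macroblock": 0.6}

# Only edge detection fires, which is enough under an OR rule.
print(is_shot_boundary(
    {"colour_histogram": 0.2, "edge_detection": 0.45, "macroblock": 0.1}, thresholds
))  # True

# No method exceeds its threshold.
print(is_shot_boundary(
    {"colour_histogram": 0.2, "edge_detection": 0.3, "macroblock": 0.1}, thresholds
))  # False
```

An OR rule can only add detections relative to any single method, so it can raise recall but also admits each method's false changes.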
Slide14: Combining Results
Colour histogram method (best performing method)
Precision average on 8 hours : 90.4 %
Recall average on 8 hours : 78.9 %
Combined method:
Precision decreased on average: 1% or 37 shots
Recall increased on average: 4% or 167 shots
Slide15: Scene Segmentation
Two Approaches
Luminance based segmentation
Television News Segmentation
Problems
Scene is a semantic concept
Computer needs wide domain knowledge
Typical scene will contain many large changes in light and colour over its duration
Slide16: Luminance Scene Segmentation
Method designed to detect location based scenes
Method operation:
Compare adjacent shots using existing shot boundary results
Look for large changes in light to detect scene changes
Those above threshold are selected as candidates
When all shots have been compared, apply a second, low threshold to all candidate scenes
Finally apply a minimum gap between scenes
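The method operation above can be sketched as follows, with several assumptions stated up front. Each shot is reduced to one average luminance value. The slides do not say how the second, low threshold is applied, so this sketch assumes a candidate boundary is kept only if the mean luminance of the candidate scenes on either side differs by more than the low threshold. The minimum gap is counted in shots. All parameter values are illustrative.

```python
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def scene_boundaries(
    shot_luminance: Sequence[float],
    high_threshold: float = 40.0,
    low_threshold: float = 15.0,
    min_gap: int = 3,
) -> List[int]:
    """Return shot indices at which a new scene is judged to start."""
    n = len(shot_luminance)

    # Compare adjacent shots; large luminance changes become candidates.
    candidates = [
        i for i in range(1, n)
        if abs(shot_luminance[i] - shot_luminance[i - 1]) > high_threshold
    ]

    # Assumed reading of the second threshold: compare the candidate scenes
    # on either side of each candidate boundary.
    edges = [0] + candidates + [n]
    confirmed = []
    for j in range(1, len(edges) - 1):
        before = _mean(shot_luminance[edges[j - 1]:edges[j]])
        after = _mean(shot_luminance[edges[j]:edges[j + 1]])
        if abs(after - before) > low_threshold:
            confirmed.append(edges[j])

    # Enforce a minimum gap (in shots) between accepted scene boundaries.
    kept: List[int] = []
    for boundary in confirmed:
        if not kept or boundary - kept[-1] >= min_gap:
            kept.append(boundary)
    return kept


luminance = [100, 102, 98, 30, 32, 31, 160, 158, 162, 40, 161, 159]
print(scene_boundaries(luminance))  # [3, 6, 9]
```

In the example, the boundary at shot 10 passes both thresholds but is dropped because it falls within the minimum gap of the boundary at shot 9.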
Slide17: Luminance Scene Segmentation Results
Results are reasonable for situation comedies
Algorithm will not segment action programs well
Nt: number of scenes
Nf: number of full scene boundaries detected
Ns: number of valid scene groups found
Ni: number of false scene groups
Baseline content         Nt  Nf  Ns  Ni
Keeping up Appearances   10   2  15   2
Shortland Street          9   3   6   3
Fair City                13   3  12   4
Shortland Street         17   5  13   6
Slide18: Luminance Scene Segmentation Results
False scene detection
Slide19: Television News Segmentation
News is highly structured content
There is a structure for News scenes that is generally followed
1. Scene begins in studio, introduced by newscaster
2. Move to location shot, view of newsperson or voice at scene
3. Video segment(s) of the main topic is shown
The algorithm is a three-step process to detect scenes:
1. Obtain shots with a length of 280 frames (11.2 seconds)
2. Generate colour histogram for candidate shots and their adjacent shot. Remove those with only a small difference value.
3. Generate a comparison of the 20th frame with the 100th frame in the shot. The comparison is an average of the pixel difference of the two frames.
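Steps 1 and 3 of the process above can be sketched as below; step 2 (the histogram comparison with the adjacent shot) is left out. The sketch assumes that "a length of 280 frames" means at least 280 frames, that frames are flat lists of grayscale pixel values, and that the pixel difference is an absolute difference. The slide does not say how the step 3 value is thresholded, so the sketch only computes it. The 25 frames per second rate follows from 280 frames equalling 11.2 seconds.

```python
from typing import List, Sequence

FRAME_RATE = 25  # frames per second: 280 / 25 = 11.2 seconds, as on the slide

GrayFrame = Sequence[int]  # grayscale pixel values, flattened


def long_shots(shot_lengths: Sequence[int], min_frames: int = 280) -> List[int]:
    """Step 1 (assumed reading): indices of shots lasting at least min_frames."""
    return [i for i, length in enumerate(shot_lengths) if length >= min_frames]


def mean_pixel_difference(frame_a: GrayFrame, frame_b: GrayFrame) -> float:
    """Average absolute difference between corresponding pixels of two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def within_shot_difference(shot: Sequence[GrayFrame]) -> float:
    """Step 3: compare the 20th frame with the 100th frame of the shot."""
    return mean_pixel_difference(shot[19], shot[99])  # 1-based 20th and 100th


print(280 / FRAME_RATE)                      # 11.2
print(long_shots([100, 300, 280, 279]))      # [1, 2]

static_shot = [[10] * 4 for _ in range(120)]      # nothing moves
moving_shot = [[k] * 4 for k in range(120)]       # brightness ramps each frame
print(within_shot_difference(static_shot))   # 0.0
print(within_shot_difference(moving_shot))   # 80.0
```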
Slide20: Television News Segmentation Results
Results for this algorithm are good compared to the luminance approach
Nt: number of news anchor persons
Ns: number of news anchors detected by the algorithm
Ni: number of false news anchors
Nd: number of missing news anchors
Program         Nt  Ns  Ni  Nd  Details
RTE1 9pm news   10   9   3   1  24th October 2000
RTE1 9pm news    6   6   0   3  15th September 2000
RTE1 9pm news   16  14   3   2  4th October 2000
RTE1 1pm news   11   9   2   2  29th November 2000
TV3 7pm news    16  13   3   6  4th December 2000
TV3 7pm news    13   9   4   4  5th December 2000
Slide21: Television News Segmentation Results
Correctly Identified scenes
Incorrectly Identified scene
Slide22: Conclusions
Separating the baseline into logical programs gives a better view of how the shot boundary detection method performs.
It is possible to improve the overall Recall performance of shot boundary methods by combining them.
Precision and Recall performance will depend on the threshold levels used
There is a trade-off between Precision and Recall
Future video indexing, retrieval and summarisation may require a higher performance than any single shot boundary method is able to deliver.
Scene segmentation is feasible on highly structured content like news and location based scenes
Automatic scene segmentation will introduce additional errors
Slide23: The End
For more information:
pbrowne@compapp.dcu.ie
Centre for Digital Video Processing
Dublin City University: http://lorca.compapp.dcu.ie/Video/