Presentation Transcript
Sound Detection
Derek Hoiem
Rahul Sukthankar (mentor)
August 24, 2004
Objective
Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds
Examples of sound classes:
Gunshots, screams, laughter, car horns, meows, dog barks, etc.
Applications
“Tell me if you hear a gunshot.” (monitoring)
“Get me video clips containing dogs barking.” (search and retrieval)
“What’s going on?” (scene understanding)
Why it's difficult
Sound classes have large within-class variation
Sounds are often ambiguous without context
Overlaid “noise” obscures the target sound
Sound or not?
Clips: Car horn, Laser gun, Dog bark
Which of these sounds are not from their named classes?
Previous work
Sound Classification (Wold 1996, Casey 2001, etc.)
Categorize short sound clips
Reasonable accuracy (5-20% error)
Sound Detection (Dufaux 2000, Piamsa-nga 1999)
Localize and recognize sound objects in long clips
Poor performance or assumption of unrealistic conditions (e.g., very quiet background)
Detection via Windowed Search
Break the long audio track into short, overlapping clips
Independently classify each clip as object or non-object
Return the locations of detected sound objects
[Figure: Long Track, Clip, Classifier]
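The windowed search above can be sketched as follows. This is a minimal illustration, not the talk's implementation: the window length `win`, the hop size `hop`, and the stand-in classifier `is_loud` are all hypothetical choices, since the slides do not give clip lengths or overlap.

```python
from typing import Callable, List, Sequence, Tuple

def sliding_windows(n_samples: int, win: int, hop: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping clips covering a track."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    # With hop < win, consecutive clips overlap by (win - hop) samples.
    last_start = max(n_samples - win, 0)
    return [(s, min(s + win, n_samples)) for s in range(0, last_start + 1, hop)]

def detect(track: Sequence[float], win: int, hop: int,
           is_object: Callable[[Sequence[float]], bool]) -> List[Tuple[int, int]]:
    """Classify each clip independently; return the locations of positive clips."""
    return [(s, e) for s, e in sliding_windows(len(track), win, hop)
            if is_object(track[s:e])]

def is_loud(clip: Sequence[float]) -> bool:
    """Hypothetical stand-in classifier: peak amplitude above 0.5."""
    return max(abs(x) for x in clip) > 0.5

# A quiet track with one loud burst at samples 1000-1199.
track = [0.0] * 1000 + [0.9] * 200 + [0.0] * 1000
print(detect(track, win=400, hop=200, is_object=is_loud))  # [(800, 1200), (1000, 1400)]
```

Because clips overlap, one sound object usually fires in several adjacent windows; merging those hits into a single detection is left out here. If `n_samples - win` is not a multiple of `hop`, the last few samples are not covered, so a real system would likely pad the track.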
Representation
[Figure: raw representation of meows and phone rings]
Classification Features
Diverse feature set:
Different sound classes are distinctive in different ways
Means and standard deviations of power at different frequencies
Bandwidth, peaks, loudness, etc.
138 features in all
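A few features of this kind can be sketched in pure Python. The slides do not list the 138 features, so the frame length, the number of frequency bands, and the exact definitions of bandwidth (power-weighted spread of DFT bin index) and loudness (RMS amplitude) below are assumptions for illustration only.

```python
import cmath
import math
from statistics import mean, pstdev
from typing import Dict, List, Sequence

def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for a sketch)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_powers(spec: Sequence[float], n_bands: int) -> List[float]:
    """Sum the spectrum into n_bands contiguous, roughly equal-width bands."""
    edges = [b * len(spec) // n_bands for b in range(n_bands + 1)]
    return [sum(spec[edges[b]:edges[b + 1]]) for b in range(n_bands)]

def spectral_bandwidth(spec: Sequence[float]) -> float:
    """Power-weighted standard deviation of bin index around the centroid."""
    total = sum(spec)
    if total == 0:
        return 0.0
    centroid = sum(k * p for k, p in enumerate(spec)) / total
    return math.sqrt(sum((k - centroid) ** 2 * p for k, p in enumerate(spec)) / total)

def clip_features(clip: Sequence[float], frame_len: int = 64,
                  n_bands: int = 4) -> Dict[str, float]:
    """Hypothetical feature vector: per-band power mean/std, bandwidth, loudness."""
    frames = [clip[i:i + frame_len]
              for i in range(0, len(clip) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("clip is shorter than one frame")
    specs = [power_spectrum(f) for f in frames]
    per_frame_bands = [band_powers(s, n_bands) for s in specs]
    feats: Dict[str, float] = {}
    for b in range(n_bands):
        series = [bands[b] for bands in per_frame_bands]
        feats[f"band{b}_power_mean"] = mean(series)   # average power in band b
        feats[f"band{b}_power_std"] = pstdev(series)  # variation of that power over time
    feats["bandwidth_mean"] = mean(spectral_bandwidth(s) for s in specs)
    feats["loudness_rms"] = math.sqrt(mean(x * x for x in clip))
    return feats

# A pure tone that falls exactly in DFT bin 8 of each 64-sample frame.
clip = [math.sin(2 * math.pi * 8 * t / 64) for t in range(128)]
print(round(clip_features(clip)["band1_power_mean"]))  # 1024
```

All the tone's power lands in band 1 (bins 8-15), so that band dominates while the bandwidth stays near zero; a broadband sound such as a gunshot would instead spread power across bands.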
Classification by Decision Trees
Try to find simple rules that discriminate object from non-object
Each decision is based on a threshold on a feature value
At each leaf node, assign a confidence based on the likelihood of the data under the object and non-object classes
[Figure: decision tree with decision nodes and leaf nodes]
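The smallest such tree, a single threshold on one feature (a "stump"), can be sketched as below. The slides' trees can be deeper; restricting to depth 1 is a simplification. The leaf confidence, half the log ratio of object to non-object weight in the leaf, and the split criterion (minimize the sum over leaves of sqrt(W_obj * W_non)) are borrowed from confidence-rated boosting as an assumption; the talk does not state its exact formulas.

```python
import math
from typing import Sequence, Tuple

EPS = 1e-6  # smoothing so an empty leaf does not produce log(0)

# (feature index, threshold, confidence if x[j] <= thr, confidence otherwise)
Stump = Tuple[int, float, float, float]

def leaf_confidence(w_obj: float, w_non: float) -> float:
    """Half the log ratio of object vs non-object weight reaching a leaf."""
    return 0.5 * math.log((w_obj + EPS) / (w_non + EPS))

def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Stump:
    """Try every (feature, threshold) split; labels are +1 (object) / -1."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            w_obj, w_non = [0.0, 0.0], [0.0, 0.0]  # index 0 = left leaf, 1 = right
            for row, label, wi in zip(X, y, w):
                side = 0 if row[j] <= thr else 1
                if label == 1:
                    w_obj[side] += wi
                else:
                    w_non[side] += wi
            # Lower is better; 0 when every leaf is pure.
            z = sum(math.sqrt(w_obj[s] * w_non[s]) for s in (0, 1))
            if best is None or z < best[0]:
                best = (z, (j, thr, leaf_confidence(w_obj[0], w_non[0]),
                            leaf_confidence(w_obj[1], w_non[1])))
    return best[1]

def stump_confidence(stump: Stump, x: Sequence[float]) -> float:
    """Positive means 'object', negative means 'non-object'; size is certainty."""
    j, thr, c_left, c_right = stump
    return c_left if x[j] <= thr else c_right

X = [[0.1], [0.2], [0.8], [0.9]]
y = [-1, -1, 1, 1]
stump = fit_stump(X, y, [0.25] * 4)
print(stump[:2])  # (0, 0.2)
```

A pure leaf gets a large-magnitude confidence, while a leaf holding both classes gets a confidence near zero, which is what lets a later combination step weigh each tree's vote.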
Boosted Trees
Problem: a single decision tree may not be an accurate classifier on its own
Solution: Use several trees, with each one focusing on the mistakes of previously learned trees
AdaBoost:
Initialize the training-example weights uniformly
Learn a decision tree classifier on the weighted data
Re-weight the data, giving more weight to incorrectly classified examples
Repeat the previous two steps for each new tree
Final classification is based on a linear combination of the confidences from all learned decision trees
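The boosting steps above can be sketched with depth-1 trees. The slides do not say which AdaBoost variant is used; this sketch assumes the confidence-rated ("Real") form, where each tree outputs a real-valued confidence, weights are updated as w_i * exp(-y_i * h(x_i)), and the final score is the plain sum of confidences. The stump learner is repeated here so the block runs on its own.

```python
import math
from typing import List, Sequence, Tuple

EPS = 1e-6
Stump = Tuple[int, float, float, float]  # (feature, threshold, left conf, right conf)

def _leaf(w_obj: float, w_non: float) -> float:
    return 0.5 * math.log((w_obj + EPS) / (w_non + EPS))

def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int],
              w: Sequence[float]) -> Stump:
    """Depth-1 tree minimizing the sum over leaves of sqrt(W_obj * W_non)."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            wo, wn = [0.0, 0.0], [0.0, 0.0]
            for row, label, wi in zip(X, y, w):
                side = 0 if row[j] <= thr else 1
                (wo if label == 1 else wn)[side] += wi
            z = math.sqrt(wo[0] * wn[0]) + math.sqrt(wo[1] * wn[1])
            if best is None or z < best[0]:
                best = (z, (j, thr, _leaf(wo[0], wn[0]), _leaf(wo[1], wn[1])))
    return best[1]

def score(stumps: Sequence[Stump], x: Sequence[float]) -> float:
    """Final score: sum of the confidences from all learned trees."""
    return sum(cl if x[j] <= thr else cr for j, thr, cl, cr in stumps)

def adaboost(X: Sequence[Sequence[float]], y: Sequence[int],
             rounds: int) -> List[Stump]:
    n = len(X)
    w = [1.0 / n] * n                      # step 1: uniform weights
    stumps: List[Stump] = []
    for _ in range(rounds):
        s = fit_stump(X, y, w)             # step 2: learn a tree on weighted data
        stumps.append(s)
        # Step 3: examples the new tree gets wrong (y_i * h(x_i) < 0) gain weight.
        w = [wi * math.exp(-yi * score([s], xi)) for wi, yi, xi in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return stumps

def classify(stumps: Sequence[Stump], x: Sequence[float]) -> int:
    return 1 if score(stumps, x) > 0 else -1

# Object only in the middle of the range: no single threshold separates it.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, 1, 1, -1]
print([classify(adaboost(X, y, 1), x) for x in X])  # [-1, 1, 1, 1]
print([classify(adaboost(X, y, 2), x) for x in X])  # [-1, 1, 1, -1]
```

One tree misclassifies the last example; the re-weighting concentrates on it, and the second tree fixes it, which is the "focus on previous mistakes" idea from the slide in miniature.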
Examples of Decision Trees
[Figure: Tree 1 nodes: "Low percentage of power in low frequencies in mid-time of sound", "Very high power amplitude range"; leaves: Meow, Gunshot. Tree 2, a more complex tree that focuses on examples misclassified by Tree 1, node: "High power amplitude range"; leaf: Gunshot.]
Cascade of Classifiers
Goal: eliminate most false positives in the early stages while causing few false negatives
Advantages:
Allows use of large set of negative training examples
Improves classification speed
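Danger: the cascade cannot recover from false negatives, since a clip rejected at any stage is never reconsidered
[Figure: Sound Clip → Stage 1 → Stage 2 → Stage 3; at each stage the clip either passes to the next stage or fails (is rejected). Pass labels in the figure: 5%, 2%, 0.005%]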
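The early-rejection logic of a cascade can be sketched as below. In the talk each stage is presumably a boosted classifier; here the stage scores `peak` and `rms` and their thresholds are hypothetical stand-ins chosen only to show the control flow.

```python
import math
from typing import Callable, List, Sequence, Tuple

# (score function, threshold); a clip passes a stage if score >= threshold
Stage = Tuple[Callable[[Sequence[float]], float], float]

def cascade_classify(stages: Sequence[Stage],
                     clip: Sequence[float]) -> Tuple[bool, int]:
    """Return (accepted, number of stages evaluated)."""
    for i, (stage_score, threshold) in enumerate(stages, start=1):
        if stage_score(clip) < threshold:
            return False, i        # fail: rejected for good, later stages never run
    return True, len(stages)       # passed every stage: report a detection

def peak(clip: Sequence[float]) -> float:
    return max(abs(x) for x in clip)

def rms(clip: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in clip) / len(clip))

# Hypothetical stages: a cheap peak check first, then a loudness check.
stages: List[Stage] = [(peak, 0.5), (rms, 0.3)]
print(cascade_classify(stages, [0.1] * 10))          # (False, 1)
print(cascade_classify(stages, [0.9] + [0.0] * 9))   # (False, 2)
print(cascade_classify(stages, [0.9] * 10))          # (True, 2)
```

Most clips exit at the first stage, which is where the speed-up comes from. Because a rejection is final, early-stage thresholds would typically be set permissively so that true objects rarely fail.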
Results: Classification Error
Results: ROC curves
Note: to approximate the negative error rate (the false-positive rate), divide FP by 25,000
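The note's conversion can be made concrete: for example, 50 false positives correspond to roughly 50 / 25,000 = 0.2% negative error. The sketch below builds ROC points from detector scores and applies that divisor; treating 25,000 as the approximate number of negative test clips is an assumption, and the scores are made up for illustration.

```python
from typing import List, Sequence, Tuple

APPROX_NEGATIVES = 25_000  # divisor given in the slide's note

def roc_points(pos_scores: Sequence[float],
               neg_scores: Sequence[float]) -> List[Tuple[int, float, float]]:
    """(FP count, approx. negative error rate, detection rate) per threshold."""
    points = []
    for thr in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tp = sum(s >= thr for s in pos_scores)
        fp = sum(s >= thr for s in neg_scores)
        points.append((fp, fp / APPROX_NEGATIVES, tp / len(pos_scores)))
    return points

# Made-up scores for two positive clips and one negative clip.
for point in roc_points([0.9, 0.8], [0.85]):
    print(point)
# (0, 0.0, 0.5)
# (1, 4e-05, 0.5)
# (1, 4e-05, 1.0)
```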
Results: Anecdotal
Sound classes shown: Gunshots, Female Laugh, Male Laugh, Swords, Scream