MLI NOD Status98
Jancis
Added: October 05, 2007
License: All Rights Reserved

Presentation Transcript

Slide1: NoD and Multilingual Status Report
  April 1998
  Carnegie Mellon University
  Howard D. Wactlar

MLI and NoD Tasks:
  Data collection & preparation - English, Serbo-Croatian, and German
  Multilingual speech recognition enhancements
  Video and audio segmentation
  Multilingual indexing, retrieval, search
  Summarization-on-demand
  Annotations
  User studies
  Additional languages and functionalities
  Demonstration as a network-based service

Accomplishments to Apr 98:
  We are achieving what we proposed and beyond
  Advances in capability (research => integrated function)
  Infrastructure evolution & growth
  Testbed activity and extension
  Related research and outreach

Slide4: Accomplishments to Apr 98 (cont’d)
  Serbo-Croatian demonstration system
  Automated and dynamic abstraction and summarization for improved navigation
  Topic detection and assignment for subject browsing
  Dynamically improved speech recognition for index generation
  Coherent story segmentation through corpus-specific, rule-based analysis
  more ...

Slide5: Accomplishments to Apr 98 (cont’d)
  Video-OCR for improved name/face identification
  Multi-level annotations to mark and share commentary
  Web interface enabling “slide show” viewing over slow links
  Database restructuring to enable size growth and function evolution
  Remote testbeds with access to daily updated news

Automated Abstraction and Summarization:
  Critical to efficient navigation of video
  Improved automatic title generation
  Dynamic “poster frame” icons - query based
  Skims smoothed through enhanced language models and rule-based scene selection

“Naïve” Poster Frame Result List (Uses First Shot Image):
  [figure]

Query-based Poster Frame Result List:
  [figure]

Query-based Poster Frame Selection Process:
  1. Decompose video segment into shots.
  2. Compute representative frame for each shot.
  3. Locate query scoring words (shown by arrows).
  4. Use frame from highest scoring shot.

Topic Detection and Tracking:
  Enhances browsing and discovery over directed search
  Different methods from several areas being evaluated
    Information retrieval
      - vector space methods
      - relevance feedback
    Speech recognition
      - hidden Markov models
    Statistics
      - k-nearest neighbors
      - exponential models

KNN-based Topic Detection:
  Build training index with pre-labeled topics
    - 45000 Broadcast News stories from 1995 and 1996
    - 3178 different news topics occurring > 10 times
  Search for top 10 related stories in training index
  Look up topics for related stories
  Re-weight topics by story relevance (select top 5)
  At 5 topics: Recall - .491, Relevance - .482

Speech Recognition for Index Generation:
  Integrate closed captioning with speech recognition generated transcription
  Improve accuracy by automatic daily expansion of language model from closed captioning, e.g. “Dodi Fayed”
  Participated (with Claritech) in TREC Spoken Document track
    large text retrieval evaluation benchmarks (NIST/DARPA)
    scored second due to OOV words (CIA, well-known, torched)

Segmentation - Creating the Video Paragraph:
  Break up a video stream into semantically coherent pieces
    corpus-specific analysis
    language model approaches
    video structure analysis

Segmentation - Commercial Detection:
  Look for several potential indicators in multiple passes
    detect lapses in cc capture greater than some threshold
    occurrence of black frames
    rate of scene change and motion

Ad Removal based on Black Frame and Scene Change Detection:
  [figure; labels: Truth, Hypothesis, Black frames, Scene change]

Segmentation - Language Models:
  Novel application to find shift in topic within a document
  Adaptive exponential language models improve as they see more material from current topic
    e.g., probable distance of “managed care” to “physicians”
  Static language models are pre-computed likelihood of short-range adjacency (e.g. trigrams)
  Compare predictive performance of the two models, i.e., the probability each assigns to the next observed words
  A segment boundary is likely to exist when the adaptive model shows a dip in performance relative to the short-range model

Slide18: A plot of the ratio of the two language models as a function of the relative position in a segment.
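
The language-model segmentation idea above can be illustrated with a small sketch. This is not the Informedia implementation: a smoothed unigram model stands in for the static trigram model, and a simple word cache interpolated with it stands in for the adaptive exponential model. The function names, the `window`, `lam`, and `span` parameters, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, deque

def log_ratio_trace(words, background, window=20, lam=0.5, alpha=1.0):
    """Per-word log(P_adaptive / P_static) for a token sequence.

    Static model (assumption): add-alpha smoothed unigram from `background`.
    Adaptive model (assumption): static model interpolated with a cache of
    the last `window` words, so it improves as a topic repeats its vocabulary.
    """
    counts = Counter(background)
    vocab = set(background) | set(words)
    total = len(background) + alpha * len(vocab)
    cache = deque(maxlen=window)
    trace = []
    for w in words:
        p_static = (counts[w] + alpha) / total
        p_cache = cache.count(w) / len(cache) if cache else 0.0
        p_adaptive = (1 - lam) * p_static + lam * p_cache
        trace.append(math.log(p_adaptive / p_static))
        cache.append(w)
    return trace

def likely_boundary(trace, span=3):
    """Index where the next `span` words have the lowest summed log ratio,
    i.e. where the adaptive model dips most relative to the static one."""
    candidates = range(span, len(trace) - span + 1)
    return min(candidates, key=lambda i: sum(trace[i:i + span]))

# Toy data: two "stories" with disjoint topical vocabulary.
background = ("the news today covered managed care and physicians while "
              "a storm brought rain and flood warnings to the coast").split()
story_a = "managed care physicians care managed physicians care managed".split()
story_b = "storm flood rain storm flood rain storm flood".split()
words = story_a + story_b

trace = log_ratio_trace(words, background)
boundary = likely_boundary(trace)
print(boundary, words[boundary])
```

In this toy run the ratio is positive while a story keeps reusing its own words and drops below zero on the first words of the new story, which is the dip the slide describes. A real system would smooth the trace and use a threshold rather than a single minimum.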

Video OCR:
  Image component crucial to news corpus
  Capture of text overlaid on the video image
  Detected, filtered, OCR’d, incorporated into content and indexed

Video OCR Block Diagram:
  [block diagram; labels: Video, Text Area Detection, Text Area Preprocessing, Commercial OCR, ASCII Text]

Slide21:
  [figure; labels: Video Frames (1/2 s intervals), Filtered Frames, AND-ed Frames]

Text Detection False Alarms:
  [figure; labels: Video Frame, Filtered and AND-ed Frame]

Text Detection Misses:
  [figure; labels: Video Frame, Filtered and AND-ed Frame]

Challenges for VOCR Preprocessing:
  The resolution of video text is very low (<10×10 ppc).
  Text detection and extraction are complicated by complex backgrounds.

VOCR Preprocessing Problems:
  [figure]

Video OCR - Results:
  Character recognition - 83%
  Word recognition - 70%
  Language model post processing will improve word recognition rate, but new names and places will not be in language model
  Important adjunct to Name-It: name/face correlation through co-occurrence matrices

Annotations:
  Annotation fields contain metadata automatically derived from the content (e.g. topics, chyron)
  Annotations are included in the index (searchable separately or combined with transcript)
  Personal annotations are typed or spoken comments that are established on a per-user basis
    bookmarking or commentary
    fully indexed and searchable with other data

Web Interface:
  Long-time concern about video fidelity on internet
  Compromise is slide show of high quality JPEG images and continuous audio
  Not all navigation tools translate directly
  Required substantive change in interface specification
  Browsing improved over full video interface
  User effectiveness versus full video to be explored

Infrastructure Evolution and Growth:
  Conversion of underlying database architecture (ONGOING)
    extends functionality - e.g. date filtering => “What’s new?” query
    improved interoperability - fully distributed, replicated function
    increased scale
    negative impact on query performance (improving)
  Summer-long ruggedization program for reliable processing and quality control
  900 hours on-line, terabyte data store
  12 Alphas for parallel processing (and experiments)

Testbeds:
  Corpus
    CNN data: 620 hours + 12 hrs/wk
    Early Prime, World View, Impact, Science & Technology Week, Earth Matters, Travel Guide, Your Health
  Distant high speed network access
    Informedia-Net attached to both vBNS and AAI nets
    enables attachment of clients to CMU servers from selected locations
    clients at DARPA, SPAWAR (forthcoming), NSA

Serbo-Croatian LVCSR on the Dictation and Broadcast News Domain:
  Informedia (English)
    CMU Informedia Group (Howard Wactlar, Alex Hauptmann, Ricky Houghton, et al.)
    CMU Sphinx Group
  Multilingual Speech Recognition
    CMU/UKA Interactive Systems Labs - JanusRTk (Alex Waibel, Michael Finke, Petra Geutner, Peter Scheytt)
  Translation/Cross Language Retrieval
    CMU Language Technologies Institute (Jaime Carbonell, Eric Nyberg, Bob Frederking, Paul Kennedy, et al.)

Slide33: Serbo-Croatian Broadcast News Recognition
  Initial database: Globalphone Serbo-Croatian (UKA)
  Broadcast news: Collected by satellite from Germany (UKA)
    15 hours transcribed
  Janus recognition toolkit: 15 languages
  Janus applied to Serbo-Croatian broadcast news
  Problem: Morphology, large number of inflections
  Competitive performance already: 26% WER

Vocabulary Growth Per Broadcast:
  [figure; label: Broadcast News System]

Serbo-Croatian BN Speech Performance:
  [figure; label: Broadcast News System]

Proposed National Research Data Testbed:
  Informedia dataset and infrastructure as a benchmarkable testbed for research in spoken language and visual documents
  Potential for establishing on-line public domain video archive
    e.g. all government produced video for training and public information
    fully indexed and searchable

Slide37: Project Genoa Contributions
  Code to extract video to place in a CIP
  Processing changes to index I-frames
  Code to run Web browser to play the MPEG segment
  Working towards a generic Web-based interface
  Other CMU: Meeting browser
  Full access to client but not full source code

Slide38:
  [network diagram; labels as extracted:]
  CMU Informedia Server, CMU Informedia Client (NOD), CrisisBrowse Client, SpIKE/Visage/NOD?, Netscape, CrisisBrowse Server, Mass Storage, CIP Server ?, Starlight, BWD, JTF Planner, JEDS
  Pseudo-TS/SCI, Secret, Unclassified
  DIA Wash, DC; Pittsburgh, PA; CIA Langley, VA; SAIC San Diego, CA; DARPA TIE Arlington, VA
  Internet, SIPRNET, DISN LES, DIAL-IN, Network Neighborhood, http ?
  mpeg jpeg txt html; DB?; Data Source; Picture

Slide39: Future Plans - Near Term
  Complete full-function Web interface
  Foreign language system unification
    S-C language models for improved query and selection
    S-C segmentation
  System completeness, robustness
  Should we pursue?
    Regular capture & processing
    Delivery to testbeds

Slide40: Future Plans - Long Term
  NSA’s formal evaluation will help guide modifications and new features
  Other languages - Korean? Chinese?
  Translation? Translation tools?
  Named entity extraction: people, places, faces
  Geospatial correlation and visualization
  More content and multiple sources
  Multidocument summarization
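
The KNN-based topic detection steps listed earlier in this transcript (build an index of pre-labeled stories, retrieve the top related stories, look up their topics, re-weight by story relevance and keep the top few) can be sketched as below. This is a minimal illustration under stated assumptions, not the system described in the slides: relevance is approximated with cosine similarity over raw word counts, and the function names, the tiny training set, and its topic labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_topics(story, training, k=10, top_n=5):
    """Assign topics to `story` from its k most similar labeled stories.

    `training` is a list of (text, [topic, ...]) pairs. Each neighbor's
    topics are weighted by that neighbor's similarity to the story.
    """
    query = Counter(story.lower().split())
    scored = [(cosine(query, Counter(text.lower().split())), topics)
              for text, topics in training]
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    weights = defaultdict(float)
    for similarity, topics in neighbors:
        for topic in topics:
            weights[topic] += similarity
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [topic for topic, weight in ranked[:top_n] if weight > 0]

# Hypothetical labeled stories standing in for the training index.
training = [
    ("flood waters rise after heavy rain", ["weather", "disaster"]),
    ("storm brings rain and flood warnings", ["weather"]),
    ("senate votes on health care bill", ["politics", "health"]),
    ("physicians debate managed care costs", ["health"]),
]
topics = knn_topics("heavy rain and flood expected tonight", training, k=2)
print(topics)
```

The slides use the top 10 related stories and keep the top 5 topics; those are the defaults here, with `k=2` used only because the toy index has four stories.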