Presentation Transcript
Added: October 05, 2007. License: All Rights Reserved.

Improved Chinese Broadcast News Transcription by Language Modeling with Temporally Consistent Training Corpora and Iterative Phrase Extraction
Pi-Chuan Chang, Shuo-Peng Liao, Lin-Shan Lee
National Taiwan University, Taiwan
Speaker: Pi-Chuan Chang (張碧娟)

Problems...
New words and phrases are coined every day, and the problem is especially serious in Chinese.
Each Chinese character is a morpheme with its own meaning.
Written Chinese sentences have no "blanks" to mark word boundaries.
How can new words or phrases be extracted in Chinese?
How does new phrase extraction improve statistical language modeling for speech recognition?

Our Ideas
Language modeling with temporally consistent (contemporary) data
  Contemporary new words and phrases can be extracted.
  Text news is easily obtained for the Broadcast News task.
Iterative new phrase extraction
  Enrich our lexicon with contemporary new words.
  Capture longer dependencies through the n-gram LM.

Iterative Chinese New Phrase Extraction
Given a word pair w1 w2...

Iterative Chinese New Phrase Extraction -- an example
A 3-character personal name...
[ Iteration 1 ] w1 w2 w3
[ Iteration 2 ] w2 w3 w1

Iterative Chinese New Phrase Extraction
New words and phrases of flexible length can be extracted.
For example: <<動態,<隨機,存取>>,記憶體> ( <<dynamic,<random,access>>,memory> )

Experiments
We use temporally consistent (contemporary) text news to update the lexicon and the LM.
Goals:
  Show the time effect between the training corpus and the testing speech data.
  Compare our new phrase extraction with other phrase extraction methods.

Experimental Environment
Test set: Chinese Broadcast News (September 2002)
Adaptation corpus: Yahoo! News, mid-August 2002 to mid-September

Experimental Environment (cont.)
ASR character accuracy is used as the performance metric.
Baseline language model: 60K-word trigram LM (BSL), estimated on a 40M-character corpus of selected news from 1997-1999

[Exp1] Preliminary Experiment on New Word Extraction
Temporal relationship
Results

[Exp2] Analysis of LM Enhancement w.r.t. the Degree of Temporal Consistency for Adaptation Corpus
3-day smoothing window
Gap

Conclusion
We propose an iterative Chinese new phrase extraction method in this paper.
We give a detailed analysis with respect to the degree of temporal consistency of the adaptation corpora, using the expanded lexicon and LM adaptation.
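
The iterative phrase extraction idea from the slides (start from a word pair, merge it, then let the merged unit pair up again in the next iteration) can be illustrated with a minimal Python sketch. The slides do not state the scoring criterion, so everything specific here is an assumption made for illustration: the pointwise mutual information score, the `min_count` and `min_pmi` thresholds, the greedy left-to-right merge, and the function names. It is not the authors' method, only one way such a loop might look.

```python
import math
from collections import Counter


def score_pairs(sentences, min_count):
    """Score adjacent token pairs by pointwise mutual information (assumed criterion)."""
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    total = sum(unigrams.values())
    scores = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab >= min_count:
            # PMI with simple relative-frequency estimates
            scores[(a, b)] = math.log(n_ab * total / (unigrams[a] * unigrams[b]))
    return scores


def merge_pairs(sentence, pairs):
    """Greedily merge selected adjacent pairs, scanning left to right."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in pairs:
            out.append(sentence[i] + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


def extract_phrases(sentences, iterations=3, min_count=2, min_pmi=1.0):
    """Repeat score-and-merge so that merged units can combine into longer phrases."""
    phrases = []
    for _ in range(iterations):
        scores = score_pairs(sentences, min_count)
        selected = {pair for pair, s in scores.items() if s >= min_pmi}
        if not selected:
            break
        joined = {a + b for a, b in selected}
        sentences = [merge_pairs(sent, selected) for sent in sentences]
        for sent in sentences:
            for tok in sent:
                # Record only units that actually appear after merging
                if tok in joined and tok not in phrases:
                    phrases.append(tok)
    return sentences, phrases


# Toy corpus, already segmented with an initial lexicon (hypothetical data)
corpus = [
    ["動態", "隨機", "存取", "記憶體", "價格", "上漲"],
    ["動態", "隨機", "存取", "記憶體", "需求", "增加"],
    ["新", "價格", "公布"],
    ["市場", "需求", "穩定"],
]
segmented, phrases = extract_phrases(corpus)
```

On this toy corpus the four-word term ends up as a single lexicon entry after two iterations. The order in which sub-units are grouped depends on the scores and the greedy scan, so the nesting need not match the bracketing shown in the slide example.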
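
The slides name ASR character accuracy as the performance metric without giving a formula. The sketch below assumes the common convention: accuracy is one minus the character-level edit distance (substitutions, deletions, insertions) divided by the number of reference characters. The function names and the example strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference character
                cur[j - 1] + 1,          # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def character_accuracy(ref, hyp):
    """Assumed definition: 1 - (S + D + I) / N over reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)


# Hypothetical reference and recognizer output: one substitution, one deletion
reference = "動態隨機存取記憶體"
hypothesis = "動態隨機存儲記憶"
accuracy = character_accuracy(reference, hypothesis)
```

Scoring at the character level avoids depending on any particular word segmentation, which matters when the lexicon itself changes between systems. Under this definition the value can fall below zero when the hypothesis contains many insertions.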