euro2003

Presentation Transcript

Improved Chinese Broadcast News Transcription by Language Modeling with Temporally Consistent Training Corpora and Iterative Phrase Extraction: 

Pi-Chuan Chang, Shuo-Peng Liao, Lin-Shan Lee
National Taiwan University, Taiwan
Speaker: Pi-Chuan Chang 張碧娟

Problems:

- New words and phrases are generated every day, a problem that is especially serious in Chinese:
  - Each Chinese character is a morpheme with its own meaning
  - Written Chinese sentences have no blanks to mark word boundaries
- How can new words or phrases be extracted in Chinese?
- How does new phrase extraction improve statistical language modeling for speech recognition?
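
Because written Chinese has no blanks, the lexicon decides how text is split into words. The following minimal sketch uses greedy forward maximum matching, a common lexicon-based segmenter assumed here only for illustration (the slides do not say which segmenter was used). It shows what happens when a new word is missing from the lexicon: the word falls apart into single characters. The helper name `fmm_segment` and the `max_len` limit are hypothetical.

```python
def fmm_segment(text, lexicon, max_len=4):
    """Greedy forward maximum matching (hypothetical helper).

    At each position, take the longest lexicon entry (up to max_len
    characters) that matches; if none matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


lexicon = {"動態", "隨機", "存取", "記憶體"}
print(fmm_segment("動態隨機存取記憶體", lexicon))
# ['動態', '隨機', '存取', '記憶體']
print(fmm_segment("動態隨機存取記憶體", lexicon - {"記憶體"}))
# ['動態', '隨機', '存取', '記', '憶', '體']
```

In the second call, "記憶體" is out of vocabulary, so it is split into three single characters; this is the kind of error that extracting contemporary new words is meant to avoid.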

Our Ideas:

- Language modeling with temporally consistent (contemporary) data
  - Contemporary new words and phrases can be extracted
  - Text news is easily obtained for the broadcast news task
- Iterative new phrase extraction
  - Enrich our lexicon with contemporary new words
  - Capture longer dependencies through the n-gram LM

Iterative Chinese New Phrase Extraction: 

- Given a word pair...
[Figure: word pair w1 w2]

Iterative Chinese New Phrase Extraction -- an example: 

- A 3-character personal name...
[Figure: characters w1 w2 w3, Iteration 1]

Iterative Chinese New Phrase Extraction -- an example: 

[Figure: characters w1 w2 w3, Iteration 2]

Iterative Chinese New Phrase Extraction: 

- New words and phrases of flexible length can be extracted
- For example: <<動態,<隨機,存取>>,記憶體> (<<dynamic,<random,access>>,memory>)
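
The slides do not give the scoring criterion, so the following is only a sketch under an explicit assumption. At each iteration, the adjacent token pair with the highest pointwise mutual information (PMI), among pairs seen at least `min_count` times, is merged into one nested token, and merged tokens can take part in later merges. On a small hypothetical corpus this reproduces a nested phrase of the form shown in the example. The PMI score, `min_count`, `threshold`, the toy corpus, and all function names are assumptions, not the authors' method.

```python
import math
from collections import Counter


def render(tok):
    """Show the nesting of a merged token, e.g. <<a,<b,c>>,d>."""
    if isinstance(tok, str):
        return tok
    return "<" + render(tok[0]) + "," + render(tok[1]) + ">"


def surface(tok):
    """Concatenate the characters of a (possibly nested) token."""
    return tok if isinstance(tok, str) else surface(tok[0]) + surface(tok[1])


def merge_pair(sent, pair):
    """Replace non-overlapping occurrences of `pair` (left to right) with one token."""
    out, i = [], 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
            out.append(pair)  # the merged token is the pair tuple itself
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out


def extract_phrases(corpus, min_count=2, threshold=0.0, max_iter=50):
    """Iteratively merge the adjacent pair with the highest PMI.

    ASSUMPTION: PMI with a minimum pair count stands in for the
    (unspecified) criterion used in the slides.
    """
    corpus = [list(s) for s in corpus]
    phrases = []
    for _ in range(max_iter):
        unigrams = Counter(t for s in corpus for t in s)
        bigrams = Counter(p for s in corpus for p in zip(s, s[1:]))
        total = sum(unigrams.values())
        scored = {
            p: math.log(c * total / (unigrams[p[0]] * unigrams[p[1]]))
            for p, c in bigrams.items()
            if c >= min_count
        }
        if not scored:
            break
        best = max(scored, key=scored.get)
        if scored[best] < threshold:
            break
        phrases.append(best)
        corpus = [merge_pair(s, best) for s in corpus]
    return phrases, corpus


# Hypothetical toy corpus (already segmented into words).
corpus = [
    ["動態", "隨機", "存取", "記憶體"],
    ["動態", "隨機", "存取", "記憶體"],
    ["記憶體", "價格"],
    ["記憶體", "市場"],
    ["記憶體", "需求"],
    ["動態", "影像"],
]
phrases, _ = extract_phrases(corpus)
print([render(p) for p in phrases])
# ['<隨機,存取>', '<動態,<隨機,存取>>', '<<動態,<隨機,存取>>,記憶體>']
```

Each merged unit can be added to the lexicon as a single entry; an n-gram model over such units spans more characters than one over the original words, which is one way to capture longer dependencies.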

Experiments:

- We use temporally consistent (contemporary) text news to update the lexicon and the LM
- Goal: show the effect of the time relationship between the training corpus and the test speech data
- Compare our new phrase extraction with other phrase extraction methods
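
The slides do not say how the LM is updated with the adaptation text. One common approach, assumed here purely for illustration, is linear interpolation of a background model with a model estimated on the contemporary news: P(w|h) = lam * P_adapt(w|h) + (1 - lam) * P_bg(w|h). The sketch uses unsmoothed maximum-likelihood bigrams for brevity, whereas the baseline in these slides is a trigram LM; the weight `lam`, the toy data, and the function names are hypothetical.

```python
from collections import Counter


def bigram_mle(corpus):
    """Unsmoothed maximum-likelihood bigram model P(w | v) from tokenized sentences."""
    pair_counts, ctx_counts = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + list(sent) + ["</s>"]
        for v, w in zip(toks, toks[1:]):
            pair_counts[(v, w)] += 1
            ctx_counts[v] += 1

    def prob(w, v):
        return pair_counts[(v, w)] / ctx_counts[v] if ctx_counts[v] else 0.0

    return prob


def interpolate(p_background, p_adapt, lam):
    """Linearly interpolated model: lam * P_adapt + (1 - lam) * P_background."""
    def prob(w, v):
        return lam * p_adapt(w, v) + (1.0 - lam) * p_background(w, v)
    return prob


# Hypothetical toy data: the new phrase occurs only in the contemporary text.
background = [["記憶體", "價格", "上漲"]]
adaptation = [["動態隨機存取記憶體", "價格", "上漲"]]
p = interpolate(bigram_mle(background), bigram_mle(adaptation), lam=0.25)
print(p("動態隨機存取記憶體", "<s>"))  # 0.25
print(p("上漲", "價格"))               # 1.0
```

The new phrase gets nonzero probability only through the adaptation component, which illustrates why temporally consistent text matters for both the lexicon and the LM.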

Experimental Environment: 

- Test set: Chinese broadcast news (September 2002)
- Adaptation corpus: Yahoo! News, mid-August to mid-September 2002

Experimental Environment (cont.):

- ASR character accuracy is used as the performance metric
- Baseline language model (BSL): 60K-word trigram LM
  - Estimated on a 40M-character corpus of selected news from 1997-1999
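
Character accuracy is usually computed from a minimum-edit-distance alignment of the reference and recognized character strings: with N reference characters, S substitutions, D deletions and I insertions, accuracy = (N - S - D - I) / N. The slides do not spell out the definition, so this standard form is assumed; the function names are hypothetical. Since the minimum total number of edits equals S + D + I, accuracy reduces to 1 - distance / N (and can go negative when there are many insertions).

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference character r
                cur[j - 1] + 1,          # insertion of hypothesis character h
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def char_accuracy(ref, hyp):
    """Character accuracy = (N - S - D - I) / N, with N = len(ref)."""
    return 1.0 - edit_distance(ref, hyp) / len(ref)


ref = "動態隨機存取記憶體"
hyp = "動態隨機存取記意體"  # one substitution: 憶 -> 意
print(round(char_accuracy(ref, hyp), 4))  # 0.8889
```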

[Exp1] Preliminary Experiment on New Word Extraction: 

[Figure: temporal relationship of the corpora and test data; results]

[Exp2] Analysis of LM Enhancement w.r.t. the Degree of Temporal Consistency for Adaptation Corpus: 

[Figure: LM enhancement against the temporal gap between adaptation corpus and test data, using a 3-day smoothing window]

Conclusion: 

- We propose an iterative Chinese new phrase extraction method in this paper
- Detailed analysis with respect to the degree of temporal consistency of the adaptation corpora, using the expanded lexicon and LM adaptation