Using the HTK speech recogniser to analyse prosody in a corpus of German spoken learners’ English
Toshifumi Oba, Eric Atwell
University of Leeds, School of Computing
tosh@comp.leeds.ac.uk eric@comp.leeds.ac.uk
Added: January 14, 2008

Outline:
Introduction
Intonation and Speech Recognition
Tendency of Speech Recognition Research
ISLE Speech Corpus
HTK Hidden Markov Model Toolkit
Prosodic Annotation
Human Evaluation of Intonation Abilities
Grouping of German Speakers by Intonation Ability
HTK speech recognition experiments
Conclusions
Q & A

Intonation and Speech Recognition:
Intonation is important in human communication.
  It conveys the meaning and attitude of the speaker.
Intonation is important for speech recognition.
  Acoustic models (duration, F0, intensity)
  Language models (identify the dialogue type)

Tendency of Speech Recognition Research:
Intonation << Pronunciation
Non-native speaker << Native speaker
→ Speech recognition research on non-native speakers’ intonation is unique.
Intonation also receives less attention than pronunciation in CALL.

Features of Various Speech Recognition Research:
(table not reproduced in the transcript)

Objectives:
Analysis of non-native speakers’ English intonation.
  Is HTK able to distinguish intonation?
  Is it possible to train distinct models for different intonation ability groups?
Prosodic annotation of written English text to produce ‘model’ intonation patterns.
Human evaluation to group German speakers by English intonation ability.

ISLE Speech Corpus (1):
Re-use of the speech corpus collected in the ISLE (Interactive Spoken Language Education) project.
  Leeds University, Universität Hamburg, Università di Milano-Bicocca, Entropic Ltd., Ernst Klett Verlag GmbH, and Dida*El S.R.L.
Time-aligned audio recordings of English spoken by 23 German and 23 Italian learners, plus 2 native English speakers.

ISLE Speech Corpus (2):
Speaker adaptation
  82 sentences edited from ‘The Ascent of Everest’
  e.g. ‘It is in fact a story of many years, in which men tried to climb that mountain.’
Typical EFL exercises
  Minimal pairs and polysyllabic words
  e.g. ‘I said bad not bed.’ ‘He's a photographer.’

ISLE Speech Corpus (3):
Annotated corpus
  Pronunciation errors at word and phone levels
  Stress errors at word level
In our research, prosodic annotation was added to a written transcription of the speech corpus.

HTK Hidden Markov Model Toolkit:
Developed at Cambridge University Engineering Department (CUED).
Free toolkit for building Hidden Markov Models (HMMs).
Module calls: available from both the command line and script files.
Used in speech recognition research and other pattern recognition research,
  e.g. handwriting recognition, facial recognition

Prosodic Annotation:
Purpose: Predict ‘model’ intonation patterns to be compared against German learners’ spoken English.
Instructions: ‘From text structure to prosodic structure’ (Knowles, 1996)
Environment: Windows Excel
Amount: First 27 sentences from ‘The Ascent of Everest’

Result of Prosodic Annotation (1):
27 sentences, consisting of 429 words, were divided into 84 tone groups (prosodic ‘phrases’).
→ 1 ‘low rise’, 3 ‘high rise’, 52 ‘fall-rise’ and 28 ‘fall’ patterns.
The first 10 sentences were modified according to native speakers’ recordings.
→ 15 ‘fall-rise’ and 10 ‘fall’ patterns
  1 ‘low rise’, 2 ‘high rise’ and 4 ‘fall-rise’ were deleted.

Result of Prosodic Annotation (2):
(A_01) This is the story <HR> of how two men <FR> reached the top of Everest <FR> on the twenty-ninth of May nineteen fifty-three <FR> and came back safely <HR> to their friends below <F>.
(A_02) Yet this will not be the whole story <F>.
(A_03) The ascent of Everest <FR> was not the work of one day <FR>, nor even of those few unforgettable weeks <FR> in which we prepared and climbed that summer <F>.

Human Evaluation of German Spoken Learners’ English Intonation Abilities:
Purpose: Group German speakers into ‘good’ and ‘poor’ intonation groups.
Evaluator I: Computational linguistics researcher
Evaluator II: English language teaching researcher
Quantity: First 10 utterances from each speaker.
An utterance was judged correct if all of its tone types matched the model pattern; otherwise it was judged incorrect.

Grouping of 23 German Speakers:
Grouping I: based on Evaluator I (computational linguistics researcher)
Grouping II: based on Evaluator II (English language teaching (ELT) researcher)
Grouping III: agreement of Evaluators I and II.
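The annotation format and the all-or-nothing judgement rule above can be sketched in Python. This is a minimal illustration, not the authors' tooling: the function names (`parse_tone_groups`, `utterance_correct`, `percent_agreement`) are hypothetical, the tags are assumed to be short upper-case codes in angle brackets as in the A_01 to A_03 examples, and the learner's tone sequence is assumed to come from a human evaluator.

```python
import re

# Assumption: each tone group ends with a tag such as <F>, <FR> or <HR>,
# as in the annotated sentences A_01 to A_03.
TAG = re.compile(r"<([A-Z]+)>")


def parse_tone_groups(annotated):
    """Split an annotated sentence into (text, tone_tag) pairs."""
    groups = []
    start = 0
    for match in TAG.finditer(annotated):
        text = annotated[start:match.start()].strip(" ,.")
        groups.append((text, match.group(1)))
        start = match.end()
    return groups


def utterance_correct(model_tones, observed_tones):
    """All-or-nothing rule: every tone type must match the model pattern."""
    return list(model_tones) == list(observed_tones)


def percent_agreement(judgements_a, judgements_b):
    """Share of utterances on which two evaluators gave the same judgement."""
    if not judgements_a or len(judgements_a) != len(judgements_b):
        raise ValueError("both evaluators must judge the same, non-empty set")
    same = sum(a == b for a, b in zip(judgements_a, judgements_b))
    return same / len(judgements_a)


a_02 = "Yet this will not be the whole story <F>."
a_03 = ("The ascent of Everest <FR> was not the work of one day <FR>, "
        "nor even of those few unforgettable weeks <FR> "
        "in which we prepared and climbed that summer <F>.")

model = [tone for _, tone in parse_tone_groups(a_03)]
print(model)  # ['FR', 'FR', 'FR', 'F']

# Hypothetical learner rendition: a fall where the model has a fall-rise.
print(utterance_correct(model, ["FR", "F", "FR", "F"]))  # False
```

Because a single mismatched tone group makes the whole utterance incorrect, longer sentences such as A_01 are harder to get right under this rule than one-group sentences such as A_02.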
23 speakers: 3 exceptionally poor pronunciation speakers; 8 ‘good’, 4 intermediate and 8 ‘poor’ intonation speakers

Result of Human Evaluation and Grouping:
The two evaluators agreed on about 63% of utterances (144 out of 230).
Evaluator II marked 109 errors, while Evaluator I marked 78 errors.
However, 7 ‘poor’ and 5 ‘good’ speakers were the same in Grouping I and Grouping II.
→ 2 speakers were added to the ‘good’ intonation group in Grouping III.

Conditions of HTK Speech Recognition Experiments:
Monophone and triphone HMMs were trained.
No language models were used.
A Perl script and a configuration file were used for module calls.
Number of training speakers: 6 speakers from the same intonation group.
Number of test speakers: 2 speakers from each group (1 for Grouping III).

Results of HTK experiments:
Recognition accuracy was generally higher when the test and training speakers’ intonation abilities were the same.
Improvement was higher with triphone HMMs.
Improvement was most significant in Experiment II.
One ‘poor’ intonation speaker showed negative improvement in all three experiments.
Another ‘poor’ speaker also showed negative improvement in Experiment I.

Average Recognition Accuracies of Good Intonation Speakers (parentheses show results for monophone HMMs):
(table not reproduced in the transcript)

Average Recognition Accuracies of Poor Intonation Speakers (parentheses show results for monophone HMMs):
(table not reproduced in the transcript)

Prosodic Keywords:
Tone type is decided by the last accented syllable (Knowles, 1996).
→ We called the word containing the last accented syllable of each tone group the ‘prosodic keyword’.
→ Recognition accuracy among ‘prosodic keywords’ was counted for the triphone cases of Experiment II.
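Counting recognition accuracy over prosodic keywords only can be sketched as follows. The slides do not say how the count was implemented, so everything here is an assumption for illustration: reference and recognised words are taken to be already aligned one-to-one (no insertions or deletions), and the keyword positions are supplied by hand because the last accented syllable is not marked in the transcript. `keyword_accuracy` is a hypothetical helper, not an HTK function.

```python
def keyword_accuracy(reference, recognised, keyword_positions):
    """Fraction of the selected reference words that were recognised correctly.

    reference, recognised: equal-length word lists, aligned one-to-one.
    keyword_positions: indices of the prosodic keywords in `reference`.
    """
    if len(reference) != len(recognised):
        raise ValueError("word lists must be aligned and of equal length")
    positions = list(keyword_positions)
    if not positions:
        raise ValueError("at least one keyword position is required")
    hits = sum(
        reference[i].lower() == recognised[i].lower()
        for i in positions
    )
    return hits / len(positions)


# Sentence A_02 has one tone group; "story" is ASSUMED to carry the last accent.
reference = "yet this will not be the whole story".split()
# Made-up recogniser output with one substitution ("the" -> "a").
recognised = "yet this will not be a whole story".split()

overall = keyword_accuracy(reference, recognised, range(len(reference)))
keywords = keyword_accuracy(reference, recognised, [7])
print(overall, keywords)  # 0.875 1.0
```

Passing every position gives the overall word accuracy under the same one-to-one alignment assumption, so the two figures are directly comparable.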
Improvement of recognition accuracy among prosodic keywords was higher than the overall improvement.
  Good test speakers: 26.00% (overall 19.20%)
  Poor test speakers: 24.50% (overall 15.50%)

Irrelevance of Pronunciation Abilities:
Good intonation speakers tended to have slightly better pronunciation ability than poor intonation speakers, although the 3 exceptionally poor pronunciation speakers had been excluded.
→ Additional experiments were run using the 2 ‘best’ and 2 ‘worst’ pronunciation speakers from the poor and good intonation groups, respectively.
→ Similar improvement was observed in this experiment too.

Conclusions:
Matching the test and training speakers’ intonation abilities brought about higher recognition accuracy.
  HTK was able to distinguish ‘good’ and ‘poor’ intonation.
Confirmed that German speakers’ weakness in English intonation was generally in ‘fall-rise’ patterns.
Human evaluation was successful enough.

Future Work:
Expand tone types (not only ‘fall-rise’ and ‘fall’ patterns).
Apply to other languages and to different native-speaker groups.
Use the results in practical language-teaching systems.