Prosody Models for Automatically Derived Focus Words in Narrative Text
Kevyn Collins-Thompson
Speech Seminar, June 13, 2003

Prosody Models for Automatically Derived Focus Words
Story text (‘When the farmer saw the dark clouds in the eastern sky…’) → automatically extract focus features → map word features to pitch and duration → ‘…the farmer saw…’

What are ‘focus’ words?
Focus words are words that a speaker gives special attention. The changes may be in pitch, timing, duration, or energy.
Why give a word special attention?
- Topic words that are important to the story
- Difficult, unexpected, or novel words
- Modifier words

Stories are centered around topic words
‘…When the farmer saw the dark clouds in the eastern sky he knew that rain was coming. The fields badly needed more rain. There had been no rain all summer. Last summer, his crops had plenty of water. But this summer had been very dry…’

We can try to find topic words automatically using relative likelihoods
For each word $W$ in the story, compare:
- $\log P_S(W)$, its log-probability in the story
- $\log P_B(W)$, its log-probability in a general ‘background’ English corpus
Choose the words for which $F(W) = \log P_S(W) - \log P_B(W) > T$, where $T = 5$ in my code.
Topic words are often uncommon words, but not always.

Word difficulty is estimated from a general English model
- Calculate $P(W)$ in the British National Corpus: 100 million tokens from various genres.
- Word difficulty $D(W)$ is estimated by $\log P(W)$.
- Custom language models can also be used to estimate difficulty, for example one built from the words that most 4th graders know.

Difficult or novel words become more familiar with time
‘…our food is a product of photosynthesis, the process that converts energy in sunlight to chemical forms of energy. Photosynthesis is carried out by many different organisms. The best known form of photosynthesis…’

Change in difficulty and novelty over time is modeled with an ‘S’ curve
[Figure: S-shaped curve; x-axis: repetitions in time, y-axis: novelty factor]

Modifiers gain importance as a focus word is repeated
‘…The fields badly needed more rain. There had been no rain all summer. Last summer, his crops had plenty of water. But this summer had been very dry…’
Derivation:
1. Take the first word of each noun phrase that contains a focus word.
2. Ignore very common words (‘a’, ‘the’, …).
Example: ‘He picked some plants…’

Word duration is modeled as a mixture of several factors
- Word stretch $T(W)$ combines focus $F(W)$ and difficulty $D(W)$.
- Individual phonemes are stretched: in infant-directed speech, vowels may be more extended, as may ‘close’ consonants such as M vs. N and R vs. L.
- Table-based, implemented as a customized Festival module in C.
Segment time:

Determining pitch contour for a single syllable
- Overall word pitch is highly correlated with focus properties: focus words have the highest pitch peaks.
- Several other factors to consider:
  - Stressed vs. unstressed syllable
  - First vs. last syllable
  - Sentence type: question, exclamation
Source: Fernald & Mazzie (1991), ‘Prosody and Focus in Speech to Infants and Adults’. Developmental Psychology 27(2), 209-221.

Final pitch contour is derived from syllable peaks
An F0 value $f(S)$ is calculated for each syllable $S$ in word $W$, where:
- $f_{\mathrm{BASE}}$ is the speaker's base F0 level (e.g. 90 Hz)
- $F(W)$ is the focus level of word $W$
- $R(S) = 1$ if $S$ is stressed, and 0 otherwise
- Focus stretch $\alpha = 15$, stress emphasis $\beta = 0.5$
The final F0 contour is piecewise linear. This is acceptable because it is perceptually close to a smooth contour.
Source: J. ’t Hart et al. (1990), ‘A Perceptual Study of Intonation’. Cambridge University Press, Cambridge, UK.

Sample pitch contour: ‘Is that a dog in the park?’
[Figure: F0 contour on a 100-200 Hz scale, with a peak labeled 245 Hz]
Focus level $F(W)$ of each word: Is (0.01), that (0.58), a (0.21), dog (4.50), in (0.12), the (1.26), park (4.15)

Phoneme duration trace: ‘in the park’
word stretch 0.87  seg name: ih
word stretch 0.87  seg name: n
word stretch 0.84  seg name: dh
word stretch 0.84  seg name: ax
word stretch 1.12  seg name: p
word stretch 1.12  seg name: aa
word stretch 1.12  seg name: r
Phoneme: r has avg stretch 2.07, and this stretch is 1.97, so final local stretch = 2.20837
word stretch 1.12  seg name: k

Time for a story!
Synthesis uses a basic diphone voice to highlight the duration and pitch changes.
Slide 16: default synthesis vs. focus-based prosody

How the project was implemented
- The story text is parsed with the Apple Pie Parser.
- Vocabulary patterns are analyzed with unigram language models in Perl.
- A Perl script creates a Scheme list of Word utterances, with features for topic words, difficulty, etc.
- Festival is invoked with a customized duration module and a Scheme intonation function.

Ideas for improvement
- More accurate modeling of dialogue and other word interactions
- Include variation in energy levels
- Customize the language profile for each listener

Questions?
The End
http://www.cs.cmu.edu/~kct/sounds/
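The topic-word test from the ‘relative likelihoods’ slide, together with the $\log P(W)$ difficulty estimate, can be sketched with maximum-likelihood unigram models. This is a minimal Python sketch, not the original Perl code. It assumes natural logarithms, a precomputed dictionary of background log-probabilities (standing in for the corpus counts), and a hypothetical back-off value `floor_logp` for words missing from the background model, since the slides do not say how unseen words were handled.

```python
import math
from collections import Counter

def unigram_logprobs(tokens):
    """Maximum-likelihood unigram log-probabilities log P(W) from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: math.log(c / total) for w, c in counts.items()}

def topic_words(story_tokens, background_logp, threshold=5.0, floor_logp=-20.0):
    """Return {word: F(W)} for story words with F(W) = log PS(W) - log PB(W) > threshold.

    floor_logp is a hypothetical back-off log-probability for words that never
    occur in the background model (an assumption, not from the slides).
    """
    story_logp = unigram_logprobs(story_tokens)
    scores = {}
    for w, lp_story in story_logp.items():
        lp_background = background_logp.get(w, floor_logp)
        f = lp_story - lp_background
        if f > threshold:
            scores[w] = f
    return scores

def difficulty(word, background_logp, floor_logp=-20.0):
    """D(W) estimated as log P(W) under the background model (lower = rarer)."""
    return background_logp.get(word, floor_logp)
```

Under the natural-log assumption, $T = 5$ means a word must be roughly $e^5 \approx 148$ times more frequent in the story than in the background corpus to be selected as a topic word.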
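The ‘S’ curve slide gives only its axes (repetitions in time vs. novelty factor), not the curve's equation. One plausible shape is a decreasing logistic function of the number of earlier repetitions. In the sketch below, the logistic form and the `midpoint` and `steepness` values are assumptions chosen for illustration, and `novelty_trace` is a hypothetical helper that applies the curve to each mention of a word, as in the photosynthesis passage.

```python
import math

def novelty_factor(repetitions, midpoint=3.0, steepness=1.5):
    """Decreasing logistic ('S'-shaped) novelty factor in (0, 1).

    repetitions: number of earlier occurrences of the word in the story.
    midpoint, steepness: hypothetical shape parameters, not taken from the slides.
    """
    return 1.0 / (1.0 + math.exp(steepness * (repetitions - midpoint)))

def novelty_trace(tokens, word):
    """Novelty factor at each occurrence of `word`, in story order."""
    seen = 0
    factors = []
    for tok in tokens:
        if tok.lower() == word:
            factors.append(novelty_factor(seen))
            seen += 1
    return factors

text = ("our food is a product of photosynthesis the process that converts "
        "energy in sunlight to chemical forms of energy photosynthesis is carried "
        "out by many different organisms the best known form of photosynthesis")
print([round(f, 3) for f in novelty_trace(text.split(), "photosynthesis")])
# prints [0.989, 0.953, 0.818]
```

A logistic curve keeps a new word near full novelty for its first few mentions and then lets it fall off quickly, which matches the slide's claim that difficult words become familiar with repetition.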
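The two-step modifier derivation from the ‘Modifiers gain importance’ slide can be sketched as a filter over noun-phrase chunks. The project parsed stories with the Apple Pie Parser; here the chunks are simply passed in as token lists chunked by hand. `COMMON_WORDS` is a hypothetical stop list (the slide names only ‘a’ and ‘the’), and skipping a phrase whose first word is itself a focus word is an assumption.

```python
# Hypothetical stop list: the slide gives only 'a' and 'the' as examples.
COMMON_WORDS = {"a", "an", "the"}

def modifier_words(noun_phrases, focus_words, common=COMMON_WORDS):
    """First word of each noun phrase that contains a focus word.

    noun_phrases: list of token lists (e.g. NP chunks from a parser).
    focus_words: set of lower-cased focus words.
    Very common words are ignored, as are phrases that start with the focus word itself.
    """
    modifiers = []
    for phrase in noun_phrases:
        if not phrase or not any(tok.lower() in focus_words for tok in phrase):
            continue
        first = phrase[0].lower()
        if first in common or first in focus_words:
            continue
        modifiers.append(phrase[0])
    return modifiers

# Noun phrases from the 'rain' passage, chunked by hand for illustration.
nps = [["The", "fields"], ["more", "rain"], ["no", "rain"], ["all", "summer"],
       ["Last", "summer"], ["his", "crops"], ["plenty", "of", "water"], ["this", "summer"]]
print(modifier_words(nps, {"rain", "summer"}))  # ['more', 'no', 'all', 'Last', 'this']
```

With ‘plants’ as a focus word, the phrase ‘some plants’ from the slide's example yields the modifier ‘some’.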
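The ‘Final pitch contour’ slide states that the F0 contour is piecewise linear between syllable peaks. Taking each syllable's peak time and $f(S)$ value as given, the contour at any time is a linear interpolation between the neighbouring peaks. The sketch below holds the first and last values flat outside the range of peaks, which is an assumption; the peak times and Hz values in the usage example are hypothetical.

```python
from bisect import bisect_right

def piecewise_linear_f0(targets, t):
    """F0 (Hz) at time t (seconds), linearly interpolated between syllable targets.

    targets: list of (time, f0) pairs sorted by time, one per syllable peak.
    Before the first or after the last target the contour is held flat.
    """
    if not targets:
        raise ValueError("need at least one (time, f0) target")
    times = [tt for tt, _ in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t)  # now times[i - 1] <= t < times[i]
    (t0, f0), (t1, f1) = targets[i - 1], targets[i]
    return f0 + (f1 - f0) * (t - t0) / (t1 - t0)

# Hypothetical syllable peaks (seconds, Hz).
peaks = [(0.0, 100.0), (0.5, 200.0), (1.0, 150.0)]
print([piecewise_linear_f0(peaks, t) for t in (0.25, 0.5, 0.75)])  # [150.0, 200.0, 175.0]
```

Using `bisect_right` finds the enclosing pair of peaks in logarithmic time, so sampling the contour every few milliseconds stays cheap even for long utterances.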