Presentation Transcript
How to Use Grammars in a More Flexible Way : How to Use Grammars in a More Flexible Way, Paolo Baggia, SpeechTEK WEST, Feb 22nd, 2007
Overview : Overview Inside the ASR
Language Constraints
Speech Grammars
SLMs
Pros and Cons
More flexible grammars
Garbage Techniques
Experimental Results
Applications
Final Remarks
Loquendo Today : Loquendo Today Global company formed in 2001 as a spin-off from the Telecom Italia R&D center, with over 30 years of experience in Speech Technologies
Complete set of Multilingual speech technologies on a wide spectrum of devices
Full support of international standards (VoiceXML, MRCP, VoIP)
Ready for challenging future scenarios: Multimodality, Security
A Telecom Italia Group company
HQ in Italy; offices in the US, Spain, Germany and France; and a worldwide network of partners
“Best Innovation in Speech Synthesis” Prize, AVIOS-SpeechTEK West 2006
“Best Innovation in Multi-Lingual Speech Synthesis” Prize, AVIOS-SpeechTEK West 2005
Inside the ASR engine : Inside the ASR engine
Acoustic Signal: O = o_1, o_2, …, o_n
Recognized Words: W = w_1, w_2, …, w_m
The recognizer combines a Likelihood term and a Prior term
Two Probabilities : Two Probabilities
P(O|W): Likelihood, given by the Acoustic Models (provided by the ASR engine)
P(W): Prior probability, given by the Language Constraints (provided by the Application Developer): a COSTLY ACTIVITY
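The transcript does not preserve the formula that ties the two probabilities together. What follows is the textbook Bayes decision rule for statistical recognizers, written with the symbols O and W used on these slides; it is a standard formulation, not a quotation from the original slide:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) does not depend on W, so it drops out of the maximization. Only the likelihood (acoustic models) and the prior (language constraints) remain.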
Language Constraints : Language Constraints On one side: Very constrained speech grammars
On the other side: Statistical Language Models
Proposal: a new technique in the middle, between Speech Grammars and SLMs
Speech Grammars vs. SLMs : Speech Grammars vs. SLMs
SPEECH GRAMMARS
Method: A compact and yet complete description of the user’s response
Constraints: Rules have to correctly represent sentence construction
Issues:
Standard syntax (W3C SRGS)
Standard results (Semantic Interpretation – W3C SISR)
Best performance for in-grammar utterances
Grammar should cover all possible responses
Cost of developing a grammar
Cost of fine-tuning grammars (need for tuning tools)
STATISTICAL LANGUAGE MODELS
Method: Assesses the probability of word occurrence in a sentence
Constraints: Probability of a word given the preceding 1, 2, …, n-1 words (n-gram)
Issues:
Easy to generate, but only with a specific corpus for each application
Difficult to assign probabilities to unforeseen events (smoothing techniques)
Very large corpora are needed
Time-consuming transcription of field data required to tune the SLMs
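As a rough illustration of the SLM side, here is a minimal bigram model with add-one (Laplace) smoothing. The three-sentence corpus is a toy built from the travel phrases used in this talk, and the function names `train_bigram` and `bigram_prob` are hypothetical. Add-one smoothing is only the simplest option; the talk does not say which smoothing technique a real system would use.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams):
    """P(word | prev) with add-one smoothing over the seen vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Toy corpus (an assumption for illustration, far too small for real use).
corpus = [
    "from rome to venice",
    "i need to go from rome to venice",
    "a ticket from venice to rome please",
]
unigrams, bigrams = train_bigram(corpus)

p_seen = bigram_prob("from", "rome", unigrams, bigrams)      # bigram seen in the corpus
p_unseen = bigram_prob("from", "please", unigrams, bigrams)  # bigram never seen
print(p_seen, p_unseen)
```

The unseen bigram still gets a small non-zero probability, which is the point of smoothing. With a corpus this small the estimates are dominated by the smoothing term, which hints at why very large corpora are needed.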
More Flexible Grammars : More Flexible Grammars How can we simplify grammar creation?
We can focus on modeling just the relevant content, and not the rest of the phrase. E.g.:
(I’d like to travel) from Rome to Venice (please)
(I need to go) from Rome to Venice (as soon as possible)
(Well…I’m going…er…sorry, a ticket) from Rome to Venice
Use a special grammar node, Garbage, to discard the rest of the sentence!
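The idea can be sketched at the text level, as an analogy only: a real garbage node works inside the recognizer on the acoustic signal, not on transcribed text. In this sketch `.*` stands in for the garbage node, and the names `CITIES` and `extract_trip` are hypothetical.

```python
import re

# Content vocabulary taken from the slide examples.
CITIES = ["Rome", "Venice"]
CITY = "|".join(map(re.escape, CITIES))

# ".*" plays the role of the garbage node: it absorbs whatever comes before
# or after the relevant part, without modeling it.
PATTERN = re.compile(
    rf".*\bfrom\s+(?P<origin>{CITY})\s+to\s+(?P<dest>{CITY})\b.*",
    re.IGNORECASE | re.DOTALL,
)

def extract_trip(utterance):
    """Return the origin and destination, or None if the relevant part is missing."""
    match = PATTERN.fullmatch(utterance)
    if match is None:
        return None
    return {"from": match.group("origin"), "to": match.group("dest")}

print(extract_trip("I'd like to travel from Rome to Venice please"))
print(extract_trip("Well... I'm going... er... sorry, a ticket from Rome to Venice"))
print(extract_trip("from Milan to Venice"))
```

Only the relevant part is modeled explicitly. The last call returns None because Milan is outside the content vocabulary, so the pattern has nothing to anchor on.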
Where to put the Garbage rule? : Where to put the Garbage rule?
Simple usage: Garbage as Prefix and Postfix around the Relevant part
More complex usages are possible (Garbage nodes interleaved with several Relevant parts). Still under evaluation!
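Example Grammar with Garbage Rule : Example Grammar with Garbage Rule
<!-- Tags were lost in this transcript. SRGS markup reconstructed; rule ids and Garbage placement are assumed. -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="trip" tag-format="semantics/1.0">
  <rule id="trip" scope="public">
    <ruleref special="GARBAGE"/>
    from <ruleref uri="#city"/>
    <tag>out.from = rules.latest();</tag>
    to <ruleref uri="#city"/>
    <tag>out.to = rules.latest();</tag>
    <ruleref special="GARBAGE"/>
  </rule>
  <rule id="city">
    <one-of>
      <item>Rome</item>
      <item>Venice</item>
    </one-of>
  </rule>
</grammar>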
Garbage Techniques : Garbage Techniques
General model
Can be trained at an acoustic level as an average model or as the average of the first N-best activated units
A calibration of the general model could be problematic
Filler words
A vocabulary composed of different words from the speech grammar is inserted as a garbage node
Efficiency and accuracy depend greatly on the number and choice of the added words
Phonetic models (Loquendo solution)
A node containing a chosen subset of the phonetic units of the acoustic model is selected and used as a garbage node
The “power” of the garbage could be easily modulated through the unit subset selection and the garbage node weights
Vocabulary size is usually smaller than the list of ‘filler-words’
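How a garbage node weight modulates the "power" of the garbage can be shown with a deliberately simplified toy. A real decoder compares whole paths during the search, not isolated segments. All scores below are made-up log-domain numbers for illustration, not measurements, and `label_segments` is a hypothetical name.

```python
def label_segments(segments, garbage_weight):
    """Label each segment with its best grammar word or with 'GARBAGE'.

    segments: list of (best_word, word_log_score, garbage_log_score).
    garbage_weight: log-domain bonus (positive) or penalty (negative)
    added to the garbage score before the comparison.
    """
    labels = []
    for word, word_score, garbage_score in segments:
        if garbage_score + garbage_weight > word_score:
            labels.append("GARBAGE")
        else:
            labels.append(word)
    return labels

# Invented scores for an utterance like "(er) from Rome to Venice (please)".
segments = [
    ("from",   -9.0, -6.0),  # hesitation, poorly matched by any grammar word
    ("from",   -3.0, -5.0),
    ("Rome",   -4.0, -5.5),
    ("to",     -2.5, -5.0),
    ("Venice", -4.5, -5.0),
    ("Venice", -8.0, -6.5),  # "please", poorly matched by any grammar word
]

balanced = label_segments(segments, garbage_weight=0.0)
too_strong = label_segments(segments, garbage_weight=1.0)
too_weak = label_segments(segments, garbage_weight=-4.0)
print(balanced)
print(too_strong)
print(too_weak)
```

With these invented numbers, a weight of 0 absorbs only the hesitation and the trailing "please". A weight of +1 also swallows "Venice", and a weight of -4 forces the out-of-grammar segments onto grammar words. This is the kind of trade-off that tuning the garbage node weights has to balance.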
Experimental Results : Experimental Results Garbage grammar accuracy should be analyzed in 2 ways
If the input speech matches the speech grammar
The presence of garbage nodes should not affect recognition
If the input speech is partially covered by the speech grammar
The garbage should cover several parts of the input speech
Laboratory test bed: Average % accuracy loss on Built-in Grammars: dates, pin codes, currency amounts, time expressions, ….
Identifying keywords: months of the year, a test on spontaneous date expressions. The phonetic garbage outperforms the filler words technique except in the ORACLE case.
Further Considerations : Further Considerations Garbage model is a flexible and powerful solution, but …
Difficult to use if the content grammar contains very similar words
The number of garbage nodes in a grammar should be limited (4 – 5 max)
Garbage nodes could benefit from grammar weights in some circumstances
For some tasks the best results are achieved when speech grammars fully cover vocal user formulations
Loquendo Phonetic Learner allows analysis of recognition log data to discover frequently used formulations covered by the garbage
When the complexity of a recognition task, in terms of user formulations’ prediction, is too high…
A Statistical Language Model is better, but an appropriate corpus is needed
A grammar with garbage rules could be used as a starting point for the acquisition of the corpus
First Real Applications : First Real Applications Quatro Rodas - Travel Guide Voice Portal (Brazil)
phone number: +55 2140034842
Bizvox's innovative solution allows customers to access information on 16,000 businesses (cost of hotels, restaurants and tourist attractions), by means of a single local call from anywhere in Brazil. The user also receives an SMS with the correct address of the desired location.
Automated Police Info Service - (Italy)
www.carabinieri.it phone number: +39 06 80985232
An automated service for the police force providing info on joining the force and moving up through the service, on upcoming exams, exam results etc. The caller can speak freely and naturally thanks to an extensive use of garbage nodes.
(experimental) Telecom Italia Directory Assistance
Final Remarks : Final Remarks Developing ASR applications is a costly activity, even though progress has been made on standard formats and tools
A technique based on GARBAGE nodes in grammars:
Greatly simplifies the grammar development
May be used to quickly prototype a system to be tuned in a second phase
Helps in noisy and chatty environments (garbage covers unwanted speech)
Promotes more flexible dialog development (from system-guided dialogs to more open prompts, shifts of context, etc.)
First feedback from real applications
Thank You! : Thank You! For testing this feature go to: “Silver Bullet” demo
In building 3, floor 4, “Unione Square” room #14
For more information please:
Visit Loquendo’s booth #314 and try the ASR
Keep an eye on: www.loquendo.com
Contact us:
paolo.baggia@loquendo.com
Live Demo – Corps of Carabinieri : Live Demo – Corps of Carabinieri
Corps of Carabinieri (armed force):
Go to www.carabinieri.it
Click on “Operatore Virtuale” (Virtual Operator)
Call +39 06 80985232
Information service about:
Applying for the tests: requirements, documents
Calendar of the tests
Results of the tests (20,000 people)