VoiceXMLApplications


VoiceXML Applications: 

Adam Hocek, ahocek@broadstrokesinc.com
Web Services Working Group, Ballston, VA, 15 April 2003
Copyright 2003 Broadstrokes, Inc. All rights reserved. www.broadstrokesinc.com

Slide2: 

Definitive VoiceXML, by Adam Hocek and David Cuddihy (Prentice Hall, Charles Goldfarb XML Series)
Code examples at the VoiceXML Forum: http://www.voicexml.org/edu/examples/definitive_vxml.html

What is VoiceXML?: 

• VoiceXML is an XML language for defining dialogs and voice interfaces
• Developed by Lucent, AT&T, IBM, and Motorola
• Version 1.0 released March 2000
• Version 2.0 working draft released October 2001
• Version 2.0 candidate recommendation released January 2003

Applications with VoiceXML: 

• IVR (bank, customer service, transportation, …)
• Voice mail
• PBX, softswitches
• Unified messaging
• Information retrieval
• Voice browsers
• Voice forms (information collection)
• Call trees

Why VoiceXML?: 

• Proliferation of cell phones
• Smaller devices
• Voice is a natural means of communication
• Its dynamic nature
• Simplified development cycle
• Market opportunities (legacy systems, call centers, …)

VoiceXML components: 

[Diagram. Labels: phone; Audio Input (voice, DTMF); Voice Recognition; DTMF Recognition; Audio Output; Text To Speech; audio player; Dialog Manager; VXML Parser; FIA with Initialize, Select, Collect, and Process phases.]

Traditional IVR architecture: 

Traditional IVR architecture

VoiceXML IVR architecture: 

VoiceXML IVR architecture

How do VoiceXML dialogs work?: 

Form element + Form Interpretation Algorithm

The form element: 

The form element

VoiceXML form items: 

[Diagram: a form contains form items and is processed by the Form Interpretation Algorithm.]
Form items: field, block, initial, subdialog, object, record, transfer
Additional elements: link, event handlers, filled, grammar
Form item variables: result, count, guard
Common attributes: name, expr, cond

Form Interpretation Algorithm: 

Interpreting a form item generally involves:
• Initializing form items;
• Selecting and playing one or more prompts;
• Collecting user input, either a response that fills in one or more fields or the throwing of an event (help, for instance); and
• Interpreting any <filled> actions that pertain to the newly filled-in fields (or form).
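The phases listed above can be sketched as a small loop. This is only an illustration under simplifying assumptions, not a conforming interpreter: `runForm`, `collect`, and `onFilled` are hypothetical names, events such as help or noinput are reduced to "no input returned", and the filled action runs once at the end rather than after each field.

```javascript
// Minimal sketch of the Form Interpretation Algorithm loop.
// Hypothetical shapes: each item is { name, prompt, expr?, cond? }.
function runForm(items, collect, onFilled) {
  // Initialize phase: form item variables start undefined (or at expr),
  // and each prompt counter starts at 0.
  const vars = {};
  const counts = {};
  for (const item of items) {
    vars[item.name] = item.expr;
    counts[item.name] = 0;
  }

  for (;;) {
    // Select phase: pick the first item whose variable is still undefined
    // and whose cond (if any) evaluates to true.
    const item = items.find(
      (it) => vars[it.name] === undefined && (it.cond ? it.cond(vars) : true)
    );
    if (!item) break; // no selectable item left, so the form is complete

    // Collect phase: play the prompt and gather input.
    // `collect` stands in for prompting plus recognition.
    counts[item.name] += 1;
    const input = collect(item.prompt, counts[item.name]);

    // Process phase: bind the input to the form item variable.
    // With no input the same item is selected again on the next pass.
    if (input !== undefined) {
      vars[item.name] = input;
    }
  }

  onFilled(vars); // simplification: run the filled action once, at the end
  return vars;
}

// Usage with canned answers instead of a real recognizer.
const cannedAnswers = ["red", "large"];
const order = runForm(
  [
    { name: "color", prompt: "Select a color" },
    { name: "size", prompt: "Select a size" },
  ],
  () => cannedAnswers.shift(),
  () => {}
);
```

Because an unfilled item is simply selected again, a `collect` function that never returns a value would loop forever; a real platform bounds this with event handlers and prompt counts.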

A simple form: 

<form>
  <block>To complete your order we need the following information.</block>
  <field name="color">
    <prompt>Select from one of the following colors <enumerate/></prompt>
    <option>red</option>
    <option>blue</option>
    <option>green</option>
  </field>
  <field name="size">
    <prompt>Select a size. You can select <enumerate/></prompt>
    <option>small</option>
    <option>medium</option>
    <option>large</option>
  </field>
  <filled>
    Thank you. Your order is being processed.
    <submit next="/cgi/details.cgi" namelist="color size"/>
  </filled>
</form>

Processing a form item: 

[Diagram: state of the "color" form item before and after the Form Interpretation Algorithm processes it.]
Before: variables result: undefined, count: 0, guard: false; common attributes name: color, expr: undefined, cond: true
After: variables result: red, count: 1, guard: true; common attributes name: color, expr: undefined, cond: true

VoiceXML event handlers: 

VoiceXML event handlers

SSML: 

Speech Synthesis Markup Language (SSML) elements include the following:
• emphasis - text spoken with emphasis
• prosody - allows for control of pitch, rate, duration, and volume
• sentence - identifies a sentence
• paragraph - identifies a paragraph
• say-as - uses a type construct to render text
• phoneme - specifies a phonetic pronunciation
• voice - specifies a voice characteristic
• mark - used for asynchronous notification
• break - a pause

Say-as element’s type attribute: 

Values of the say-as element's type attribute: acronym, address, number, currency, date, duration, measure, name, net:email, net:uri, number:ordinal, number:digits, telephone, time

An example using SSML: 

<?xml version="1.0" encoding="iso-8859-1"?>
<vxml version="2.0">
  <form id="audiotest">
    <block>
      Your <emphasis>total</emphasis> is
      <say-as class="currency">$299.95</say-as>
      <audio src="http://205.188.234.65:8006">
        I'm sorry. The audio stream is not available today.
      </audio>
    </block>
  </form>
</vxml>

Grammars: 

• A VoiceXML platform must support at least one of the grammar formats of the Speech Recognition Grammar Specification (SRGS)
• Grammars describe to the underlying ASR the active words or phrases that can be recognized
• For a matching utterance, a grammar returns a corresponding semantic interpretation
• Grammars can be very simple or complex; rules are used to describe the logic in grammars

SRGS: 

Speech Recognition Grammar Specification (SRGS) elements include the following:
• rule - a rule expansion declaration
• ruleref - a local or external rule reference
• item - defines an entity
• one-of - a set of alternatives
• tag - a string associated with a rule expansion
• grammar - root element

Rule Constructs: 

• Sequence, an expansion used to define an exact phrase
• Alternative set of expansions with optional weighting, to define choices
• Precedence, used to define grouping
• Optional rule expansions
• Repetition operators
• Recursion (implementation is optional for GRXML and ABNF)
• Tagging, an application aid for providing semantic interpretation
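To make a few of these constructs concrete, here is a rough sketch of a matcher for sequence, alternatives, optional expansions, and bounded repetition, with grouping expressed by nesting. The object shapes (`seq`, `oneOf`, `opt`, `repeat` with `range`) and the function names are invented for this illustration and are not part of any grammar format; weights, recursion, and tags are left out.

```javascript
// Remove duplicate positions.
function unique(positions) {
  return [...new Set(positions)];
}

// Return every token position at which `exp` can end when matching
// starts at tokens[pos]. An empty array means no match.
function ends(exp, tokens, pos) {
  if (typeof exp === "string") {
    // A single word must equal the next token.
    return tokens[pos] === exp ? [pos + 1] : [];
  }
  if (exp.seq) {
    // Sequence: each part starts where the previous part ended.
    return exp.seq.reduce(
      (starts, part) => unique(starts.flatMap((p) => ends(part, tokens, p))),
      [pos]
    );
  }
  if (exp.oneOf) {
    // Alternatives: any one choice may match.
    return unique(exp.oneOf.flatMap((alt) => ends(alt, tokens, pos)));
  }
  if (exp.opt) {
    // Optional: either skip the expansion or match it.
    return unique([pos, ...ends(exp.opt, tokens, pos)]);
  }
  if (exp.repeat) {
    // Repetition: match between min and max consecutive occurrences.
    const [min, max] = exp.range;
    let current = [pos];
    let results = min === 0 ? [pos] : [];
    for (let n = 1; n <= max && current.length > 0; n += 1) {
      current = unique(current.flatMap((p) => ends(exp.repeat, tokens, p)));
      if (n >= min) results = results.concat(current);
    }
    return unique(results);
  }
  throw new Error("unknown expansion");
}

// An utterance is accepted when some match consumes every token.
function accepts(exp, utterance) {
  const tokens = utterance.toLowerCase().split(/\s+/).filter(Boolean);
  return ends(exp, tokens, 0).includes(tokens.length);
}

// Example: "check [the] balance [in [my]] (savings | checking) [account]"
const balanceRule = {
  seq: [
    "check",
    { opt: "the" },
    "balance",
    { opt: { seq: ["in", { opt: "my" }] } },
    { oneOf: ["savings", "checking"] },
    { opt: "account" },
  ],
};
const matchedLong = accepts(balanceRule, "check the balance in my savings account");
const matchedShort = accepts(balanceRule, "check balance checking");
const rejected = accepts(balanceRule, "balance savings");
```

Tracking a set of possible end positions instead of a single one lets optional and repeated parts backtrack without extra bookkeeping.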

An example using GRXML: 

<form id="test">
  <field name="favcolor">
    <prompt>What is your favorite color?</prompt>
    <grammar xml:lang="en-US" version="1.0" root="example1">
      <rule id="example1" scope="public">
        <one-of>
          <item><tag>'red'</tag>red</item>
          <item><tag>'green'</tag>green</item>
          <item><tag>'blue'</tag>blue</item>
          <item><tag>'red'</tag>burgundy</item>
          <item><tag>'blue'</tag>indigo</item>
        </one-of>
      </rule>
    </grammar>
    <filled>
      <prompt>You said your favorite color is <value expr="favcolor"/>.</prompt>
    </filled>
  </field>
</form>

How grammars work: 

[Sequence diagram with three participants: User, VXML, ASR]
1. Initialization: load grammars (red, green, blue, burgundy, indigo)
2. Prompt user: “What is your favorite color?”
3. User says: “burgundy”
4. Recognition
5. Bind result red to form item
6. Prompt user: “You said your favorite color is red”
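The flow above, where the spoken word "burgundy" is bound to the form item as "red", can be mimicked with a lookup table. This is a sketch only: `colorGrammar` and `recognize` are hypothetical names, and a real ASR matches audio against the grammar rather than comparing strings.

```javascript
// Each recognizable phrase maps to the semantic tag from the example grammar.
const colorGrammar = new Map([
  ["red", "red"],
  ["green", "green"],
  ["blue", "blue"],
  ["burgundy", "red"],
  ["indigo", "blue"],
]);

// Stand-in for recognition: return the tag for a known phrase,
// or undefined when nothing in the grammar matches.
function recognize(utterance) {
  const phrase = utterance.trim().toLowerCase();
  return colorGrammar.has(phrase) ? colorGrammar.get(phrase) : undefined;
}

// Bind the result to the form item variable, then build the confirmation.
const formItems = { favcolor: undefined };
formItems.favcolor = recognize("burgundy");
const confirmation = `You said your favorite color is ${formItems.favcolor}.`;
```

The tag, not the spoken word, is what the dialog sees, which is why the confirmation prompt says "red".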

Advanced Grammars (1 of 2): 

<rule id="userAction">
  <one-of>
    <item>
      <ruleref uri="#transactionComplex"/>
      <ruleref uri="builtin:currency"/>
    </item>
    <item><ruleref uri="#transactionSimple"/></item>
  </one-of>
  <item repeat="1-2">
    <one-of>
      <item>in</item>
      <item>from</item>
      <item>into</item>
      <item>to</item>
    </one-of>
    <item repeat="0-1">my</item>
    <item><ruleref uri="#accountType"/></item>
    <item repeat="0-1">account</item>
  </item>
</rule>

Advanced Grammars (2 of 2): 

#JSGF V1.0;
grammar userAction;

public <userAction> = <NULL> (Please | Kindly | I'd like to | I want to)
    ((<transactionComplex> | <transactionSimple>){xa} | exit {exit});

<accountType> = savings | checking | money market;

<transactionSimple> =
    (/4/ check balance | /3/ check the balance | /2/ balance | /1/ inquire)
    ([in] | [in my]) <accountType> [account];

<transactionComplex> =
    (transfer <builtin:currency> from [my] <accountType> [account] (to | in | into) [my] <accountType> [account])
  | (transfer <builtin:currency> (to | into) [my] <accountType> [account] from [my] <accountType> [account])
  | (withdraw <builtin:currency> ([from] | [from my]) <accountType> [account], [and] deposit ([in] | [into]) [my] <accountType> [account])
  | (deposit <builtin:currency> ([in] | [into]) [my] <accountType> [account], [and] withdraw ([from] | [from my]) <accountType> [account]);

Grammar recognized results: 

• “Transfer $300 from my savings to checking.”
• “Transfer $20 from my savings account to my money market account.”
• “Check balance in savings account.”
• “Deposit $40 into checking.”

Call Control: 

VoiceXML 2.0 application limitations:
• Limited call control (<transfer>, <disconnect>)
• No outbound call capability
• Single-threaded event model
• No selective call answering

CCXML: 

The main features of the Call Control Markup Language (CCXML) are:
• Allows for outbound calls
• Support for multi-party calls
• Selective inbound call routing
• Asynchronous "external" event handling
• Conference objects for joining and unjoining participants
• Audio objects for splitting and mixing audio resources
• Control and connectivity to one or more VoiceXML interpreter instances
• VoiceXML control to start, kill, or suspend a process
• Supports multiple CCXML programs and interconnection through events
• Whisper transfer
• Supervised transfer
For details see: http://www.w3.org/TR/ccxml

Multimodal: 

• What is multimodal? Multi-interface support for applications
• What modes of operation? Sequential (input/output) and synchronous (input/output)

Slide30: 

Voice Server Plus
[Architecture diagram. Labels: Resource Manager, Session Manager, XML Interpreter, Global Event Manager, Adapter SPI, GRXML, SSML, WorkFlow, VoiceXML, PSTN, IP Network, Logging/Report, XML Files, Non-XML Files, Database.]

Slide31: 

VSP key features
• VoiceXML 2.0 compliance
• GRXML interpreter
• SSML interpreter
• ECMAScript interpreter
• Session support (SM): time-delayed and collaborative modes
• Basic telephony and device event support (GEM)
• Global events (GEM): extensible to JMS, SOAP, SIP, …
• Platform support (SPI): Nuance, SpeechWorks, Dialogic (PSTN)
• Data repository (RM): XML, non-XML (binary, audio)
• Data access (RM): XSLT, with considerations for XDNL and XQuery
• Multimodal capabilities
• OS support: Unix and Windows
• Logging and reporting capabilities: outputs XML
• Remote administration

Slide32: 

How VSP can simplify apps
• Take an existing Digital Talking Books application
• How the existing ECMAScript data structure is created
• How the data structure can be implemented using the VSP
• An alternative implementation

Slide33: 

ECMAScript snippet used

var aMenu = new Array;
aMenu[0] = new Object;
aMenu[0].text = "Introduction";
aMenu[0].audio = "../CONV0002.wav";
aMenu[1] = new Object;
aMenu[1].text = "Venice";
aMenu[1].audio = "../CONV0003.wav";
aMenu[1].items = new Array;
aMenu[1].items[0] = new Object;
aMenu[1].items[0].text = "The Elements of the City, A Description of its Sensory Riches";
aMenu[1].items[0].audio = "../CONV0004.wav";
aMenu[1].items[0].contenttext = "The Elements of the City, A Description of its Sensory Riches";
aMenu[1].items[0].contentaudio = "";
aMenu[1].items[0].helptext = "Would you like to hear about the Elements of the Venice? Press the Pound key ";
aMenu[1].items[0].helpaudio = "";
aMenu[1].items[0].morehelptext = "Press 1 to start over";
aMenu[1].items[0].morehelpaudio = "";
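For comparison, the same kind of nested menu can be written as a literal and walked by index, which is roughly the kind of lookup a level-by-level index scheme implies. This is an illustrative sketch in present-day JavaScript: `menu` and `findEntry` are hypothetical names, and only a subset of the fields from the snippet above is reproduced.

```javascript
// The menu from the snippet above, expressed as nested literals.
const menu = [
  { text: "Introduction", audio: "../CONV0002.wav" },
  {
    text: "Venice",
    audio: "../CONV0003.wav",
    items: [
      {
        text: "The Elements of the City, A Description of its Sensory Riches",
        audio: "../CONV0004.wav",
        morehelptext: "Press 1 to start over",
      },
    ],
  },
];

// Hypothetical helper: follow a list of indices, one per menu level,
// and return the entry found there (or undefined if the path is invalid).
function findEntry(root, path) {
  let level = root;
  let entry;
  for (const index of path) {
    entry = level && level[index];
    if (!entry) return undefined;
    level = entry.items; // descend into the sub-menu, if any
  }
  return entry;
}

const venice = findEntry(menu, [1]);
const firstVeniceItem = findEntry(menu, [1, 0]);
```

Keeping the data in one literal makes the hierarchy visible at a glance, at the cost of still embedding content in script, which is the problem the slides that follow address.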

Slide34: 

Using the RM

VoiceXML call to RM using <object>:
<object name="result" classid="method://file_lookup/find"
    codebase="http://www.broadstrokesinc.com/demo"
    data="rsm/italy.xml" type="xml">
  <param name="Depth" expr="Depth"/>
  <param name="LevelOneIndex" expr="LevelOneIndex"/>
  <param name="LevelTwoIndex" expr="LevelTwoIndex"/>
  <param name="LevelThreeIndex" expr="LevelThreeIndex"/>
</object>

Rendering result returned from RM:
<prompt>
  <audio expr="result.audio">
    <value expr="result.text" mode="tts"/>.
  </audio>
</prompt>

Slide35: 

An alternative implementation

Call to RM using XML:
<rm:load-doc docid="mydoc1" src="http://www.broadstrokesinc.com/apps/data/my.xml" type="xml"/>
<rm:get-next-node name="result" docid="mydoc1"/>

Some other common RM elements: get-current-node, get-prev-node, get-parent-node, get-child-node, get-all-nodes, get-registered-docids, remove-doc

Slide36: 

Q & A
For VSP product details see: http://www.broadstrokesinc.com/products.html
And for a product demo see: http://www.broadstrokesinc.com/vsp/demo
Also see:
• SRGS (Speech Recognition Grammar Specification) http://www.w3.org/TR/speech-grammar
• XDNL (XML Document Navigation Language) http://www.w3.org/TR/xdnl
• NLSML (Natural Language Semantics Markup Language) http://www.w3.org/TR/nl-spec
• EMMA (Extensible MultiModal Annotation Language) http://www.w3.org/TR/EMMAreqs