VoiceXML Applications:
Adam Hocek, ahocek@broadstrokesinc.com
Web Services Working Group, Ballston, VA, 15 April 2003
Copyright 2003 Broadstrokes, Inc. All rights reserved. www.broadstrokesinc.com

Slide2:
Definitive VoiceXML
Prentice Hall, Charles Goldfarb XML Series
Adam Hocek and David Cuddihy
VoiceXML Forum (code examples): http://www.voicexml.org/edu/examples/definitive_vxml.html

What is VoiceXML?:
• VoiceXML is an XML language for defining dialogs and voice interfaces
• Developed by Lucent, AT&T, IBM, and Motorola
• Version 1.0 released March 2000
• Version 2.0 working draft released October 2001
• Version 2.0 candidate recommendation January 2003

Applications with VoiceXML:
• IVR (bank, customer service, transportation, …)
• Voice mail
• PBX, softswitches
• Unified Messaging
• Information Retrieval
• Voice Browsers
• Voice Forms (information collection)
• Call Trees

Why VoiceXML?:
• Proliferation of cell phones
• Smaller devices
• Voice is a natural means of communication
• Its dynamic nature
• Simplified development cycle
• Market opportunities (legacy systems, call centers, …)

VoiceXML components:
[Diagram labels: phone; Audio Input; Audio Output; Voice Rec.; DTMF Rec.; Text To Speech; audio player; Dialog Manager; VXML Parser; FIA with Initialize, Select, Collect, and Process phases]

Traditional IVR architecture:
[Diagram]

VoiceXML IVR architecture:
[Diagram]

How do VoiceXML dialogs work?:
Form element + Form Interpretation Algorithm

The form element:
[Diagram]

VoiceXML form items:
Form Item: field, block, initial, subdialog, object, record, transfer
Add'l Elements: link, event handlers, filled, grammar
Variables: result, count, guard
Common Attr: name, expr, cond
Form items are processed by the Form Interpretation Algorithm.

Form Interpretation Algorithm:
Interpreting a form item generally involves:
• Initializing form items;
• Selecting and playing one or more prompts;
• Collecting a user input, either a response that fills in one or more fields, or the throwing of some event (help, for instance); and
• Interpreting any <filled> actions that pertain to the newly filled-in fields (or form).

A simple form:
    <form>
      <block>To complete your order we need the following information.</block>
      <field name="color">
        <prompt>Select from one of the following colors <enumerate/></prompt>
        <option>red</option>
        <option>blue</option>
        <option>green</option>
      </field>
      <field name="size">
        <prompt>Select from the size. You can select <enumerate/></prompt>
        <option>small</option>
        <option>medium</option>
        <option>large</option>
      </field>
      <filled>
        Thank you. Your order is being processed.
        <submit next="/cgi/details.cgi" namelist="color size"/>
      </filled>
    </form>

Processing a form item:
Form item state before the Form Interpretation Algorithm collects input:
    Variables: result: undefined, count: 0, guard: false
    Common Attr: name: color, expr: undefined, cond: true
Form item state after the user answers "red":
    Variables: result: red, count: 1, guard: true
    Common Attr: name: color, expr: undefined, cond: true

VoiceXML event handlers:
[Diagram]

SSML:
Speech Synthesis Markup Language elements include the following:
• emphasis - text spoken with emphasis
• prosody - allows for control of pitch, rate, duration, and volume
• sentence - identifies a sentence
• paragraph - identifies a paragraph
• say-as - uses a type construct to render text
• phoneme - specifies a phonetic pronunciation
• voice - specifies a voice characteristic
• mark - used for asynchronous notification
• break - a pause

Say-as element's type attribute:
acronym, address, number, currency, date, duration, measure, name, net:email, net:uri, number:ordinal, number:digits, telephone, time

An example using SSML:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <vxml version="2.0">
      <form id="audiotest">
        <block>
          Your <emphasis>total</emphasis> is
          <say-as type="currency">$299.95</say-as>
          <audio src="http://205.188.234.65:8006">
            I'm sorry. The audio stream is not available today.
          </audio>
        </block>
      </form>
    </vxml>

Grammars:
• A VoiceXML platform must support one of the grammar formats of the Speech Recognition Grammar Format
• Grammars describe to the underlying ASR the active words or phrases that can be recognized
• For a matching utterance, grammars return a corresponding semantic interpretation
• Grammars can be very simple or complex; rules are used to describe the logic in grammars

SRGS:
Speech Recognition Grammar Specification elements include the following:
• rule - a rule expansion declaration
• ruleref - a local or external rule reference
• item - defines an entity
• one-of - a set of alternatives
• tag - a string associated with a rule expansion
• grammar - root element

Rule Constructs:
• Sequence, an expansion used to define an exact phrase
• Alternative set of expansions with optional weighting, to define choices
• Precedence, used to define grouping
• Optional rule expansions
• Repetition operators
• Recursion (implementation is optional for GRXML and ABNF)
• Tagging, an application aid for providing semantic interpretation
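To make the rule constructs listed above more concrete, here is a minimal ECMAScript sketch of a toy matcher covering sequence, alternatives, optional expansions, and tagging. It is only an illustration under stated assumptions: the object shapes (`seq`, `oneOf`, `opt`, `tag`/`of`), the function names `expand` and `recognize`, and the color rule are all made up for this sketch and are not SRGS syntax or any ASR vendor's API. Weights, repetition, and recursion are left out.

```javascript
// Toy expansion matcher (hypothetical; not SRGS and not a real ASR API).
// An expansion is one of:
//   "word"                 a literal token
//   { seq: [e1, e2] }      sequence: every part must match, in order
//   { oneOf: [e1, e2] }    alternatives: any one part may match
//   { opt: e }             optional: e may be present or absent
//   { tag: "x", of: e }    tagging: attach "x" when e matches

// Returns every way `e` can match starting at token index `pos`.
function expand(e, tokens, pos) {
  if (typeof e === "string") {
    return tokens[pos] === e ? [{ pos: pos + 1, tags: [] }] : [];
  }
  if (e.seq) {
    let states = [{ pos: pos, tags: [] }];
    for (const part of e.seq) {
      const next = [];
      for (const s of states) {
        for (const r of expand(part, tokens, s.pos)) {
          next.push({ pos: r.pos, tags: s.tags.concat(r.tags) });
        }
      }
      states = next;
    }
    return states;
  }
  if (e.oneOf) {
    return e.oneOf.flatMap((alt) => expand(alt, tokens, pos));
  }
  if (e.opt) {
    // Either skip the optional part or consume it.
    return [{ pos: pos, tags: [] }].concat(expand(e.opt, tokens, pos));
  }
  if (e.tag !== undefined) {
    return expand(e.of, tokens, pos).map((r) => ({
      pos: r.pos,
      tags: r.tags.concat([e.tag]),
    }));
  }
  throw new Error("unknown expansion");
}

// Returns the tags of the first match that consumes the whole utterance,
// or null when the utterance is outside the grammar.
function recognize(rule, utterance) {
  const tokens = utterance.toLowerCase().split(/\s+/).filter(Boolean);
  const full = expand(rule, tokens, 0).find((r) => r.pos === tokens.length);
  return full ? full.tags : null;
}

// Made-up rule: [it is] (red | burgundy | green | blue | indigo) [please]
const favColor = {
  seq: [
    { opt: { seq: ["it", "is"] } },
    {
      oneOf: [
        { tag: "red", of: { oneOf: ["red", "burgundy"] } },
        { tag: "green", of: "green" },
        { tag: "blue", of: { oneOf: ["blue", "indigo"] } },
      ],
    },
    { opt: "please" },
  ],
};
```

With this rule, `recognize(favColor, "It is burgundy please")` yields the tag list `["red"]`, while an out-of-grammar utterance such as `"purple"` yields `null`. The tag is what the application would see, regardless of which synonym the caller actually spoke.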
An example using GRXML:
    <form id="test">
      <field name="favcolor">
        <prompt>What is your favorite color?</prompt>
        <grammar xml:lang="en-US" version="1.0" root="example1">
          <rule id="example1" scope="public">
            <one-of>
              <item><tag>'red'</tag>red</item>
              <item><tag>'green'</tag>green</item>
              <item><tag>'blue'</tag>blue</item>
              <item><tag>'red'</tag>burgundy</item>
              <item><tag>'blue'</tag>indigo</item>
            </one-of>
          </rule>
        </grammar>
        <filled>
          <prompt>You said your favorite color is <value expr="favcolor"/>.</prompt>
        </filled>
      </field>
    </form>

How grammars work:
[Sequence diagram between User, VXML, and ASR]
• Initialization: VXML loads the grammars into the ASR (red, green, blue, burgundy, indigo)
• VXML prompts the user: “What is your favorite color?”
• The user says “burgundy”
• Recognition: the ASR matches the utterance
• VXML binds the result red to the form item
• VXML prompts the user: “You said your favorite color is red”

Advanced Grammars (1 of 2):
    <rule id="userAction">
      <one-of>
        <item>
          <ruleref uri="#transactionComplex"/>
          <ruleref uri="builtin:currency"/>
        </item>
        <item><ruleref uri="#transactionSimple"/></item>
      </one-of>
      <item repeat="1-2">
        <one-of>
          <item>in</item>
          <item>from</item>
          <item>into</item>
          <item>to</item>
        </one-of>
        <item repeat="0-1">my</item>
        <item><ruleref uri="#accountType"/></item>
        <item repeat="0-1">account</item>
      </item>
    </rule>

Advanced Grammars (2 of 2):
    #JSGF V1.0;
    grammar userAction;
    public <userAction> = <NULL> (Please | Kindly | I'd like to | I want to)
        ((<transactionComplex> | <transactionSimple>){xa} | exit {exit});
    <accountType> = savings | checking | money market;
    <transactionSimple> =
        (/4/ check balance | /3/ check the balance | /2/ balance | /1/ inquire)
        ([in] | [in my]) <accountType> [account];
    <transactionComplex> =
        (transfer <builtin:currency> from [my] <accountType> [account]
            (to | in | into) [my] <accountType> [account])
      | (transfer <builtin:currency> (to | into) [my] <accountType> [account]
            from [my] <accountType> [account])
      | (withdraw <builtin:currency> ([from] | [from my]) <accountType> [account],
            [and] deposit ([in] | [into]) [my] <accountType> [account])
      | (deposit <builtin:currency> ([in] | [into]) [my] <accountType> [account],
            [and] withdraw ([from] | [from my]) <accountType> [account]);

Grammar recognized results:
• “Transfer $300 from my savings to checking.”
• “Transfer $20 from my savings account to my money market account.”
• “Check balance in savings account.”
• “Deposit $40 into checking.”

Call Control:
VoiceXML 2.0 application limitations:
• Limited call control (<transfer>, <disconnect>)
• No outbound call capability
• Single-threaded event model
• No selective call answering

CCXML:
The main features of the Call Control Markup Language are:
• Allows for outbound calls
• Support for multi-party calls
• Selective inbound call routing
• Asynchronous "external" event handling
• Conference objects for joining and unjoining participants
• Audio objects for splitting and mixing audio resources
• Control and connectivity to one or more VoiceXML interpreter instances
• VoiceXML control to start, kill, or suspend a process
• Supports multiple CCXML programs and interconnection through events
• Whisper transfer
• Supervised transfer
For details see: http://www.w3.org/TR/ccxml

Multimodal:
What is multimodal?
• Multi-interface support for applications
What modes of operation?
• Sequential (input/output)
• Synchronous (input/output)

Slide30:
Voice Server Plus
[Architecture diagram labels: Resource Manager; Session Manager; XML Interpreter; Global Event Manager; Adapter SPI; GRXML; SSML; WorkFlow; VoiceXML; PSTN; IP Network; Logging/Report; XML Files; Non-XML Files; Database]

Slide31:
VSP key features
• VoiceXML 2.0 compliance
• GRXML Interpreter
• SSML Interpreter
• ECMAScript Interpreter
• Session support (SM): time-delayed and collaborative modes
• Basic telephony and device event support (GEM)
• Global events (GEM): extensible to JMS, SOAP, SIP, …
• Platform support (SPI): Nuance, SpeechWorks, Dialogic (PSTN)
• Data repository (RM): XML, non-XML (binary, audio)
• Data access (RM): XSLT and considerations for XDNL, XQuery
• Multimodal capabilities
• OS: Unix and Windows
• Logging and reporting capabilities: outputs XML
• Remote Administration

Slide32:
How VSP can simplify apps
• Take the existing Digital Talking Books application
• How the existing ECMAScript data structure is created
• How the data structure can be implemented using the VSP
• An alternative implementation

Slide33:
ECMAScript snippet used
    var aMenu = new Array;
    aMenu[0] = new Object;
    aMenu[0].text = "Introduction";
    aMenu[0].audio = "../CONV0002.wav";
    aMenu[1] = new Object;
    aMenu[1].text = "Venice";
    aMenu[1].audio = "../CONV0003.wav";
    aMenu[1].items = new Array;
    aMenu[1].items[0] = new Object;
    aMenu[1].items[0].text = "The Elements of the City, A Description of its Sensory Riches";
    aMenu[1].items[0].audio = "../CONV0004.wav";
    aMenu[1].items[0].contenttext = "The Elements of the City, A Description of its Sensory Riches";
    aMenu[1].items[0].contentaudio = "";
    aMenu[1].items[0].helptext = "Would you like to hear about the Elements of the Venice? Press the Pound key ";
    aMenu[1].items[0].helpaudio = "";
    aMenu[1].items[0].morehelptext = "Press 1 to start over";
    aMenu[1].items[0].morehelpaudio = "";

Slide34:
Using the RM
VoiceXML call to RM using <object>:
    <object name="result" classid="method://file_lookup/find"
            codebase="http://www.broadstrokesinc.com/demo"
            data="rsm/italy.xml" type="xml">
      <param name="Depth" expr="Depth"/>
      <param name="LevelOneIndex" expr="LevelOneIndex"/>
      <param name="LevelTwoIndex" expr="LevelTwoIndex"/>
      <param name="LevelThreeIndex" expr="LevelThreeIndex"/>
    </object>
Rendering result returned from RM:
    <prompt>
      <audio expr="result.audio">
        <value expr="result.text" mode="tts"/>.
      </audio>
    </prompt>

Slide35:
An alternative implementation
Call to RM using XML:
    <rm:load-doc docid="mydoc1"
                 src="http://www.broadstrokesinc.com/apps/data/my.xml"
                 type="xml"/>
    <rm:get-next-node name="result" docid="mydoc1"/>
Some other common RM elements:
• get-current-node
• get-prev-node
• get-parent-node
• get-child-node
• get-all-nodes
• get-registered-docids
• remove-doc

Slide36:
Q & A
For VSP product details see: http://www.broadstrokesinc.com/products.html
And for a product demo see: http://www.broadstrokesinc.com/vsp/demo
Also see:
• SRGS (Speech Recognition Grammar Specification) http://www.w3.org/TR/speech-grammar
• XDNL (XML Document Navigation Language) http://www.w3.org/TR/xdnl
• NLSML (Natural Language Semantics Markup Language) http://www.w3.org/TR/nl-spec
• EMMA (Extensible MultiModal Annotation Language) http://www.w3.org/TR/EMMAreqs
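Returning to the Digital Talking Books menu on Slide33 and the Depth and Level index parameters passed on Slide34, the following ECMAScript sketch shows one way an index-based lookup over that nested structure could work. It is an illustration only: the slides do not describe how the RM's `file_lookup/find` method is implemented, so the helper name `findNode`, its signature, and the choice to pass the indices as an array are assumptions made for this example. The menu is abbreviated to the `text` and `audio` properties.

```javascript
// Same shape as the aMenu structure on Slide33, written with literals
// and trimmed to the text and audio properties.
const aMenu = [
  { text: "Introduction", audio: "../CONV0002.wav" },
  {
    text: "Venice",
    audio: "../CONV0003.wav",
    items: [
      {
        text: "The Elements of the City, A Description of its Sensory Riches",
        audio: "../CONV0004.wav",
      },
    ],
  },
];

// Hypothetical helper (not part of VSP or the RM): walk `depth` levels of
// the menu, using one index per level, and return the node reached.
// Returns null when the path does not exist.
function findNode(menu, depth, indices) {
  let level = menu;
  let node = null;
  for (let i = 0; i < depth; i++) {
    if (!Array.isArray(level)) return null; // no sub-menu at this level
    node = level[indices[i]];
    if (node === undefined) return null; // index out of range or missing
    level = node.items;
  }
  return node;
}
```

For example, `findNode(aMenu, 2, [1, 0]).audio` is `"../CONV0004.wav"`, and `findNode(aMenu, 2, [0, 0])` is `null` because "Introduction" has no sub-items. The returned node exposes the same `text` and `audio` properties that the Slide34 prompt reads from `result`.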