Tutorial: Developing and Deploying Multimodal Applications
James A. Larson Larson Technical Services jim @ larson-tech.com
SpeechTEK West February 23, 2007
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Capturing Input from the User
Medium / Input Device / Mode:
- Acoustic: microphone → speech
- Tactile: keypad, keyboard, pen, joystick, mouse → key, ink, GUI
- Visual: scanner, still camera, video camera → photograph, gaze tracking, gesture recognition
- Electronic: RFID, GPS → biometric, digital data
Multimodal input combines several of these.
Presenting Output to the User
Medium / Output Device / Mode:
- Acoustic: speaker → speech
- Visual: display → text, photograph, movie
- Tactile: joystick → pressure
Multimedia output combines several of these.
Multimodal and Multimedia Application Benefits
Provide a natural user interface by using multiple channels for user interactions
Simplify interaction with small devices with limited keyboard and display, especially on portable devices
Leverage advantages of different modes in different contexts
Decrease error rates and time required to perform tasks
Increase accessibility of applications for users with special needs
Enable new kinds of applications
Exercise 1
What new multimodal applications would be useful for your work?
What new multimodal applications would be entertaining to you, your family, or friends?
Voice as a 'Third Hand'
Game Commander 3
http://www.gamecommander.com/
Voice-Enabled Games
Scansoft's VoCon Games Speech SDK
http://www.scansoft.com/games/
PlayStation® 2
Nintendo® GameCube™
http://www.omnipage.com/games/poweredby/
Education
Tucker Maxon School of Oral Education
http://www.tmos.org/
Education
Reading Tutor Project
http://cslr.colorado.edu/beginweb/reading/reading.html
Multimodal Applications Developed by PSU and OHSU Students
Hands-busy
- Troubleshooting a car's motor
- Repairing a leaky faucet
- Tuning musical instruments
- Construction
- Complex origami artifact
- Project book for children
- Cooking: talking recipe book
Entertainment
- Child's fairy tale book
- Audio-controlled jukebox
- Games (Battleship, Go)
Multimodal Applications Developed by PSU and OHSU Students (continued)
Data collection
- Buy a car
- Collect health data
- Buy movie tickets
- Order meals from a restaurant
- Conduct banking business
- Locate a business
- Order a computer
- Choose homeless pets from an animal shelter
Authoring
- Photo album tour
Education
- Flash cards: addition tables
Download Opera and the speech plug-in, then go to www.larson-tech.com/mm-Projects/Demos.htm
New Application Classes
Active listening
- Verbal VCR controls: start, stop, fast forward, rewind, etc.
Virtual assistants
- Listen for requests and immediately perform them
- Violin tuner, TV controller, environmental controller, family-activity coordinator
Synthetic experiences
- Synthetic interviews
- Speech-enabled games
- Education and training
- Authoring content
Two General Uses of Multiple Modes of Input
Redundancy: one mode acts as a backup for another mode
- In noisy environments, use keypad instead of speech input.
- In cold environments, use speech instead of keypad.
Complementary: one mode supplements another mode
- Voice as a third hand
- 'Move that (point) to there (point)' (late fusion)
- Lip reading = video + speech (early fusion)
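The late-fusion case can be sketched as combining a recognized utterance with separately captured pointing events (a hypothetical illustration; the function and variable names are mine, not part of any standard):

```python
# Hypothetical sketch of late fusion: a recognized utterance containing
# deictic words ('that', 'there') is combined with pointing events that
# were captured separately by the GUI.
def fuse(utterance_words, pointing_events):
    """Replace each deictic word with the next pointed-at target."""
    points = iter(pointing_events)
    fused = []
    for word in utterance_words:
        if word in ("that", "there"):
            fused.append(next(points))  # bind the gesture to the deictic word
        else:
            fused.append(word)
    return " ".join(fused)

words = "move that to there".split()
points = ["chair", "corner"]
print(fuse(words, points))  # move chair to corner
```

Early fusion, by contrast, combines the raw signals (e.g., audio and lip video) before recognition, so it cannot be modeled this simply.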
Potential Problems with Multimodal Applications
Voice may make an application 'noisy.'
- Privacy and security concerns
- Noise pollution
Sometimes speech and handwriting recognition systems fail.
False expectations of users wanting to use natural language. Full natural language processing requires:
- Knowledge of the outside world
- History of the user-computer interaction
- Sophisticated understanding of language structure
Full natural language processing is possible only on Star Trek, and is incorrectly called 'NLP.'
'Natural language-like' processing simulates natural language for a small domain, a short history, and specialized language structures.
Adding a New Mode to an Application
Only if…
- The new mode enables new features not previously possible.
- The new mode dramatically improves usability.
Always…
- Redesign the application to take advantage of the new mode.
- Provide backup for the new mode.
- Test, test, and test some more.
Exercise 2
Where will multimodal applications be used?
A. At home
B. At work
C. 'On the road'
D. Other?
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
The Playbill—Who's Who on the Team
Users—Their lives will be improved by using the multimodal application
Interaction designer—Designs the dialog—when and how the user and system interchange requests and information
Multimodal programmer—Implements the multimodal user interface
Voice talent—Records spoken prompts and messages
Grammar writer—Specifies words and phrases the user may speak in response to a prompt
TTS specialist—Specifies verbal and audio sounds and inflections
Quality assurance specialist—Performs tests to validate the application is both useful and usable
Customer—Pays the bills
Program manager—Organizes the work and makes sure it is completed according to schedule and under budget
Development Process
Investigation Stage → Design Stage → Development Stage → Testing Stage → Sustaining Stage
Each stage involves users
Iterative refinement
Development Process (Investigation Stage): Identify the Application
- Conduct ethnography studies
- Identify candidate applications
- Conduct focus groups
- Select the application
Exercise 3
What will be the 'killer' consumer multimodal applications?
Development Process (Design Stage): Specify the Application
- Construct the conceptual model
- Construct scenarios
- Specify performance and preference requirements
Specify Performance and Preference Requirements
Performance: Is the application useful? Measure what the users actually accomplished. Validate that the users achieved success.
Preference: Is the application enjoyable? Measure users' likes and dislikes. Validate that the users enjoyed the application and will use it again.
Performance Metrics
[Table of performance metrics not reproduced]
Exercise 4
Specify performance metrics for the multimodal email application
Preference Metrics
[Table of preference metrics not reproduced]
Exercise 5
Specify preference metrics for the multimodal email application
Preference Metrics (Open-ended Questions)
What did you like the best about this voice-enabled application? (Do not change these features.)
What did you like the least about this voice-enabled application? (Consider changing these features.)
What new features would you like to have added? (Consider adding these features in this or a later release.)
What features do you think you will never use? (Consider deleting these features.)
Do you have any other comments and suggestions? (Pay attention to these responses. Callers frequently suggest very useful ideas.)
Development Process (Development Stage): Develop the Application
- Specify the persona
- Specify the modes and modalities
- Specify the dialog script
UI Design Guidelines
Guidelines for Voice User Interfaces
- Bruce Balentine and David P. Morgan. How to Build a Speech Recognition Application, Second Edition. http://www.eiginc.com
Guidelines for Graphical User Interfaces
- Research-Based Web Design and Usability Guidelines. U.S. Department of Health and Human Services. http://www.usability.gov/pdfs/guidelines.html
Guidelines for Multimodal User Interfaces
- Common Sense Guidelines for Developing Multimodal User Interfaces. W3C Working Group Note, 19 April 2006. http://www.w3.org/2002/mmi/Group/2006/Guidelines/
Common-sense Suggestions 1. Satisfy Real-World Constraints
Task-oriented Guidelines
1.1. Guideline: For each task, use the easiest mode available on the device.
Physical Guidelines
1.2. Guideline: If the user’s hands are busy, then use speech.
1.3. Guideline: If the user’s eyes are busy, then use speech.
1.4. Guideline: If the user may be walking, use speech for input.
Environmental Guidelines
1.5. Guideline: If the user may be in a noisy environment, then use a pen, keys or mouse.
1.6. Guideline: If the user’s manual dexterity may be impaired, then use speech.
Exercise 6
What input mode(s) should be used for each of the following tasks?
A. Selecting objects
B. Entering text
C. Entering symbols
D. Entering sketches or illustrations
Common-sense Suggestions 2. Communicate Clearly, Concisely, and Consistently with Users
Consistency Guidelines
2.1. Phrase all prompts consistently.
2.2. Enable the user to speak keyword utterances rather than natural language sentences.
2.3. Switch presentation modes only when the information is not easily presented in the current mode.
2.4. Make commands consistent.
2.5. Make the focus consistent across modes.
Organizational Guidelines
2.6. Use audio to indicate the verbal structure.
2.7. Use pauses to divide information into natural 'chunks.'
2.8. Use animation and sound to show transitions.
2.9. Use voice navigation to reduce the number of screens.
2.10. Synchronize multiple modalities appropriately.
2.11. Keep the user interface as simple as possible.
Common-sense Suggestions 3. Help Users Recover Quickly and Efficiently from Errors
Conversational Guidelines
3.1. Users tend to use the same mode that was used to prompt them.
3.2. If privacy is not a concern, use speech as output to provide commentary or help.
3.3. Use directed user interfaces, unless the user is always knowledgeable and experienced in the domain.
3.4 Always provide context-sensitive help for every field and command.
Common-sense Suggestions 3. Help Users Recover Quickly and Efficiently from Errors (Continued)
Reliability Guidelines
Operational status
3.5. The user should always be able to determine easily whether the device is listening.
3.6. For devices with batteries, users should always be able to determine easily how much longer the device will be operational.
3.7. Support at least two input modes so one input mode can be used when the other cannot.
Visual feedback
3.8. Present words recognized by the speech recognition system on the display, so the user can verify they are correct.
3.9. Display the n-best list to enable easy correction of speech recognition errors.
3.10. Try to keep response times under 5 seconds. Inform the user of longer response times.
Common-sense Suggestions 4. Make Users Comfortable
Listening mode
4.1. Speak after pressing a speak key, which automatically releases after the user finishes speaking.
System Status
4.2. Always present the current system status to the user.
Human-memory Constraints
4.3. Use the screen to ease stress on the user’s short-term memory.
Common-sense Suggestions 4. Make Users Comfortable (Continued)
Social Guidelines
4.4. If the user may need privacy, use a display rather than render speech.
4.5. If the user may need privacy, use a pen or keys.
4.6. If the device may be used during a business meeting, then use a pen or keys (with the keyboard sounds turned off).
Advertising Guidelines
4.7. Use animation and sound to attract the user's attention.
4.8. Use landmarks to help the user know where he or she is.
Common-sense Suggestions 4. Make Users Comfortable (continued)
Ambience
4.9 Use audio and graphic design to set the mood and convey emotion in games and entertainment applications.
Accessibility
4.10 For each traditional output technique, provide an alternative output technique.
4.11. Enable users to adjust the output presentation.
Books
Ramon Lopez-Cozar Delgado and Masahiro Araki. Spoken, Multilingual and Multimodal Dialog Systems—Development and Assessment. West Sussex, England: Wiley, 2005.
Julie A. Jacko and Andrew Sears (Editors). The Human-Computer Interaction Handbook—Fundamentals, Evolving Technologies, and Emerging Applications. Mahwah, New Jersey: Lawrence Erlbaum Associates, 2003.
Development Process (Testing Stage): Test the Application
- Component test
- Usability test
- Stress test
- Field test
Testing Resources
Jeffrey Rubin. Handbook of Usability Testing. New York: Wiley Technical Communication Library, 1994.
Peter and David Leppik. Gourmet Customer Service. Eden Prairie, MN: VocalLabs, 2005.
[email protected]
Development Process (Sustaining Stage): Deploy and Monitor the Application
- User surveys
- Usage reports from log files
- User feedback and comments
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
W3C Multimodal Interaction Framework
- Recognition Grammar
- Semantic Interpretation
- Extensible MultiModal Annotation (EMMA)
- Speech Synthesis
- Interaction Managers
A general description of speech application components and how they relate
W3C Multimodal Interaction Framework
[Diagram: the user's input and output flow through an Interaction Manager, which connects to Application Functions and Telephony Properties]
W3C Multimodal Interaction Framework
[Diagram: user input (speech, ink) flows through ASR, Semantic Interpretation, and Information Integration into the Interaction Manager; output flows through Language Generation and Media Planning to TTS, audio, and the display; the Interaction Manager also connects to Application Functions and Telephony Functions]
W3C Multimodal Interaction Framework
[Diagram: the same framework, highlighting the grammar component]
SRGS: Describes what the user may say at each point in the dialog
Speech Recognition Engines
[Tables of speech recognition engines not reproduced]
Switch vocabularies
Grammars
Describe what the user may say or handwrite at a point in the dialog
Enable the recognition engine to work faster and more accurately
Two types of grammars:
- Structured grammars
- Statistical grammars (N-grams)
Structured Grammars
Specify the words that a user may speak or write
Two representation formats:
1. Augmented BNF (ABNF) format
Production rules:
Single_digit ::= zero | one | two | … | nine
Zero_thru_ten ::= Single_digit | ten
2. XML format
Can be processed by an XML validator
Example XML Grammar

<grammar mode = 'voice' type = 'application/srgs+xml' root = 'zero_to_ten'>
  <rule id = 'zero_to_ten'>
    <one-of>
      <ruleref uri = '#single_digit'/>
      <item> ten </item>
    </one-of>
  </rule>
  <rule id = 'single_digit'>
    <one-of>
      <item> zero </item>
      <item> one </item>
      <item> two </item>
      <item> three </item>
      <item> four </item>
      <item> five </item>
      <item> six </item>
      <item> seven </item>
      <item> eight </item>
      <item> nine </item>
    </one-of>
  </rule>
</grammar>
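The root rule accepts either a single digit word or 'ten', so the grammar covers exactly eleven words. A toy matcher in Python (an illustration only, not an SRGS processor) makes the accepted set explicit:

```python
# Toy matcher for the zero_to_ten grammar above: the root rule accepts
# either a word from the single_digit rule or the word 'ten'.
SINGLE_DIGIT = ["zero", "one", "two", "three", "four",
                "five", "six", "seven", "eight", "nine"]

def matches_zero_to_ten(word):
    """True when the word is accepted by the 'zero_to_ten' root rule."""
    return word in SINGLE_DIGIT or word == "ten"

print(matches_zero_to_ten("seven"))   # True
print(matches_zero_to_ten("eleven"))  # False
```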
Exercise 7
Write a grammar that recognizes the digits zero through nineteen.
(Hint: Modify the grammar on the previous page.)
Reusing Existing Grammars

<grammar type = 'application/srgs+xml' root = 'size' src = 'http://www.example.com/size.grxml'/>
Exercise 8
Write a grammar for positive responses to a yes/no question (i.e., 'yes,' 'sure,' 'affirmative,' and so forth)
When Is a Grammar Too Large?
[Chart plotting word coverage against response not reproduced]
W3C Multimodal Interaction Framework
[Diagram: the same framework, highlighting semantic interpretation]
SISR: A procedural, JavaScript-like language for interpreting the text strings returned by the speech recognition engine
Semantic Interpretation
Semantic scripts employ ECMAScript
Advantages:
- Translate aliases to vocabulary words
- Perform calculations
- Produce a rich structure rather than a text string
Semantic Interpretation
[Diagram: without semantic interpretation, the Recognizer (with its Grammar) passes recognized text such as 'Large white t-shirt' or 'Big white t-shirt' directly to the Conversation Manager]
Semantic Interpretation
[Diagram: Recognizer → Grammar with semantic interpretation scripts → Semantic Interpretation Processor → Conversation Manager. The utterance 'Big white t-shirt' produces the structure { size: large, color: white }]

<rule id = 'action'>
  <one-of>
    <item> small <tag> out.size = 'small'; </tag> </item>
    <item> medium <tag> out.size = 'medium'; </tag> </item>
    <item> large <tag> out.size = 'large'; </tag> </item>
    <item> big <tag> out.size = 'large'; </tag> </item>
  </one-of>
  <one-of>
    <item> green <tag> out.color = 'green'; </tag> </item>
    <item> blue <tag> out.color = 'blue'; </tag> </item>
    <item> white <tag> out.color = 'white'; </tag> </item>
  </one-of>
</rule>
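To make the data flow concrete, here is a small Python sketch (not part of the W3C tooling; the tables and function name are mine) that mimics what a semantic interpretation processor does with the rule above: match each recognized word against the grammar alternatives and execute the associated tag assignment to build a structured result.

```python
# Hypothetical sketch of SISR-style tag execution: each grammar
# alternative maps a spoken word to an assignment on the result object,
# so the alias 'big' normalizes to the vocabulary value 'large'.
SIZE_TAGS = {"small": "small", "medium": "medium", "large": "large", "big": "large"}
COLOR_TAGS = {"green": "green", "blue": "blue", "white": "white"}

def interpret(utterance):
    """Return a structured result (like SISR's 'out') for a size/color utterance."""
    out = {}
    for word in utterance.lower().split():
        if word in SIZE_TAGS:
            out["size"] = SIZE_TAGS[word]    # e.g. <tag> out.size = 'large'; </tag>
        elif word in COLOR_TAGS:
            out["color"] = COLOR_TAGS[word]  # e.g. <tag> out.color = 'white'; </tag>
    return out

print(interpret("Big white t-shirt"))  # {'size': 'large', 'color': 'white'}
```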
Exercise 9
Modify this rule to return only 'yes':

<grammar type = 'application/srgs+xml' root = 'yes' mode = 'voice'>
  <rule id = 'yes'>
    <one-of>
      <item> yes </item>
      <item> sure </item>
      <item> affirmative </item>
      …
    </one-of>
  </rule>
</grammar>
W3C Multimodal Interaction Framework
[Diagram: the same framework, highlighting input interpretation]
EMMA: A language for representing the semantic content from speech recognizers, handwriting recognizers, and other input devices
EMMA
Extensible MultiModal Annotation markup language
A canonical structure for semantic interpretations of a variety of inputs, including:
- Speech
- Natural language text
- GUI
- Ink
EMMA
[Diagram: speech passes through Speech Recognition (using a grammar plus semantic interpretation instructions) and keyboard input passes through Keyboard Interpretation (using interpretation instructions); each produces EMMA, and Merging/Unification combines them into a single EMMA result for applications]
EMMA
The speech recognizer produces an interpretation with 'hook' placeholders for slots to be filled by another mode:

<interpretation mode = 'speech'>
  <travel>
    <to hook='ink'/>
    <from hook='ink'/>
    <day> Tuesday </day>
  </travel>
</interpretation>
EMMA
The ink interpretation supplies the values for the hooked slots:

<interpretation mode = 'ink'>
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
  </travel>
</interpretation>
EMMA
Merging/unification fills the hooks in the speech interpretation with the values from the ink interpretation:

<interpretation mode = 'interp1'>
  <travel>
    <to> Las Vegas </to>
    <from> Portland </from>
    <day> Tuesday </day>
  </travel>
</interpretation>
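The merge step can be sketched in a few lines of Python (a hypothetical illustration, not W3C code; the sentinel and function names are mine): walk the speech interpretation and, wherever a slot carries a hook for another mode, substitute the value from that mode's interpretation.

```python
# Hypothetical sketch of EMMA-style unification: slots in the speech
# interpretation marked with a hook are filled from the ink interpretation.
HOOK = object()  # sentinel marking a slot to be filled by another mode

def unify(speech, ink):
    """Merge two mode interpretations into one unified interpretation."""
    unified = {}
    for slot, value in speech.items():
        unified[slot] = ink[slot] if value is HOOK else value
    return unified

speech = {"to": HOOK, "from": HOOK, "day": "Tuesday"}
ink = {"to": "Las Vegas", "from": "Portland"}
print(unify(speech, ink))  # {'to': 'Las Vegas', 'from': 'Portland', 'day': 'Tuesday'}
```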
Exercise 10
Given the following two EMMA specifications, what is the unified EMMA specification?

<interpretation mode = 'speech'>
  <moneyTransfer>
    <sourceAcct hook='ink'/>
    <targetAcct hook='ink'/>
    <amount> 300 </amount>
  </moneyTransfer>
</interpretation>

<interpretation mode = 'ink'>
  <moneyTransfer>
    <sourceAcct> savings </sourceAcct>
    <targetAcct> checking </targetAcct>
  </moneyTransfer>
</interpretation>

Unified EMMA specification:

<interpretation mode = 'intp1'>
  <moneyTransfer>
    <sourceAcct> ______ </sourceAcct>
    <targetAcct> ______ </targetAcct>
    <amount> ______ </amount>
  </moneyTransfer>
</interpretation>
W3C Multimodal Interaction Framework
[Diagram: the same framework, highlighting speech output]
SSML: A language for rendering text as synthesized speech
Speech Synthesis Markup Language
The synthesis pipeline: Structure Analysis → Text Normalization → Text-to-Phoneme Conversion → Prosody Analysis → Waveform Production
- Structure Analysis. Markup support: paragraph, sentence. Non-markup behavior: infer structure by automated text analysis.
- Text Normalization. Markup support: sayas for dates, times, etc. Non-markup behavior: automatically identify and convert constructs.
- Text-to-Phoneme Conversion. Markup support: phoneme, sayas. Non-markup behavior: look up in pronunciation dictionary.
- Prosody Analysis. Markup support: emphasis, break, prosody. Non-markup behavior: automatically generate prosody through analysis of document structure and sentence syntax.
- Waveform Production.
Speech Synthesis Markup Language Examples

<phoneme alphabet='ipa' ph='wɪnɛfɛks'> WinFX </phoneme> is a great platform

<prosody pitch = 'x-low'> Who's been sleeping in my bed? </prosody> said papa bear.
<prosody pitch = 'medium'> Who's been sleeping in my bed? </prosody> said momma bear.
<prosody pitch = 'x-high'> Who's been sleeping in my bed? </prosody> said baby bear.
Popular Strategy
- Develop dialogs using SSML
- Usability test the dialogs
- Extract prompts
- Hire voice talent to record the prompts
- Replace <prompt> with <audio>
W3C Multimodal Interaction Framework
[Diagram: the same framework, highlighting the Interaction Manager]
VoiceXML: A language for controlling the exchange of information and commands between the user and the system
Developing and Deploying Multimodal Applications
What applications should be multimodal?
What is the multimodal application development process?
What standard languages can be used to develop multimodal applications?
What standard platforms are available for multimodal applications?
Speech APIs and SDKs
JSAPI—Java Speech Application Program Interface
http://java.sun.com/products/java-media/speech/
http://developer.mozilla.org/en/docs/JSAPI_Reference
Nuance Mobile Speech Platform
http://www.nuance.com/speechplatform/components.asp
VSAPI—Voice Signal API
http://www.voicesignal.com/news/articles/2006-06-21-SymbianOne.htm
SALT
http://www.saltforum.org/
Interaction Manager Approaches
[Diagram: three approaches]
- X+V: Interaction Manager in XHTML, with VoiceXML 2.0 modules
- Object-oriented: Interaction Manager in C#, with SAPI 5.3
- W3C: Interaction Manager in SCXML, coordinating XHTML, VoiceXML 3.0, and InkML
SAPI 5.3 & Windows Vista™ Speech Synthesis
(Object-oriented approach: Interaction Manager in C#, with SAPI 5.3)
W3C Speech Synthesis Markup Language 1.0:

<speak> <phoneme alphabet='ipa' ph='wɪnɛfɛks'> WinFX </phoneme> is a great platform </speak>

Microsoft proprietary PromptBuilder:

myPrompt.AppendTextWithPronunciation("WinFX", "wɪnɛfɛks");
myPrompt.AppendText("is a great platform.");
SAPI 5.3 & Windows Vista™ Speech Recognition
W3C Speech Recognition Grammar Specification 1.0:

<grammar type='application/srgs+xml' root='city' mode='voice'>
  <rule id = 'city'>
    <one-of>
      <item> New York City </item>
      <item> New York </item>
      <item> Boston </item>
    </one-of>
  </rule>
</grammar>

Microsoft proprietary GrammarBuilder:

Choices cityChoices = new Choices();
cityChoices.AddPhrase("New York City");
cityChoices.AddPhrase("New York");
cityChoices.AddPhrase("Boston");
Grammar cityGrammar = new Grammar(new GrammarBuilder(cityChoices));
SAPI 5.3 & Windows Vista™ Semantic Interpretation
Augment the SRGS grammar with JScript® for semantic interpretation:

<grammar type='application/srgs+xml' root='city' mode='voice'>
  <rule id = 'city'>
    <one-of>
      <item> New York City <tag> city = 'JFK' </tag> </item>
      <item> New York <tag> city = 'JFK' </tag> </item>
      <item> Portland <tag> city = 'PDX' </tag> </item>
    </one-of>
  </rule>
</grammar>

User-specified 'shortcuts': the recognizer replaces a 'shortcut word' with an expanded string.
User says: my address
System: 1033 Smith Street, Apt. 7C, Bloggsville 00000
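Shortcut expansion of this kind is essentially a lookup table applied to the recognized text. A minimal Python sketch (the table entry is the slide's example; the table and function names are mine, not SAPI calls):

```python
# Hypothetical sketch of shortcut expansion: recognized shortcut phrases
# are replaced with the user's stored expanded strings; everything else
# passes through unchanged.
SHORTCUTS = {
    "my address": "1033 Smith Street, Apt. 7C, Bloggsville 00000",
}

def expand_shortcuts(recognized_text):
    """Replace a known shortcut phrase with its expansion."""
    return SHORTCUTS.get(recognized_text.lower(), recognized_text)

print(expand_shortcuts("my address"))
# 1033 Smith Street, Apt. 7C, Bloggsville 00000
```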
SAPI 5.3 & Windows Vista™ Dialog
- Import the System.Speech.Recognition namespace
- Instantiate a SpeechRecognizer object
- Build a grammar
- Attach an event handler
- Load the grammar into the recognizer
When the recognizer hears something that fits the grammar, the SpeechRecognized event handler is invoked; it accesses the Result object and works with the recognized text.
SAPI 5.3 & Windows Vista™ Dialog

using System;
using System.Windows.Forms;
using System.ComponentModel;
using System.Collections.Generic;
using System.Speech.Recognition;

namespace Reco_Sample_1
{
    public partial class Form1 : Form
    {
        // Create a recognizer
        SpeechRecognizer _recognizer = new SpeechRecognizer();

        public Form1() { InitializeComponent(); }

        private void Form1_Load(object sender, EventArgs e)
        {
            // Create a pizza grammar
            Choices pizzaChoices = new Choices();
            pizzaChoices.AddPhrase("I'd like a cheese pizza");
            pizzaChoices.AddPhrase("I'd like a pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a large pepperoni pizza");
            pizzaChoices.AddPhrase("I'd like a small thin crust vegetarian pizza");

            Grammar pizzaGrammar =
                new Grammar(new GrammarBuilder(pizzaChoices));

            // Attach an event handler
            pizzaGrammar.SpeechRecognized +=
                new EventHandler<RecognitionEventArgs>(
                    PizzaGrammar_SpeechRecognized);

            _recognizer.LoadGrammar(pizzaGrammar);
        }

        void PizzaGrammar_SpeechRecognized(
            object sender, RecognitionEventArgs e)
        {
            MessageBox.Show(e.Result.Text);
        }
    }
}
SAPI 5.3 & Windows Vista™ References
Speech API Overview
http://msdn2.microsoft.com/en-us/library/ms720151.aspx#API_Speech_Recognition
Microsoft Speech API (SAPI) 5.3
http://msdn2.microsoft.com/en-us/library/ms723627.aspx
'Exploring New Speech Recognition And Synthesis APIs In Windows Vista' by Robert Brown
http://msdn.microsoft.com/msdnmag/issues/06/01/speechinWindowsVista/default.aspx#Resources
Interaction Manager Approaches
[Diagram: the same three approaches, highlighting X+V: Interaction Manager in XHTML, with VoiceXML 2.0 modules]
Step 1: Start with Standard VoiceXML and Standard XHTML
VoiceXML (city.grxml uses the W3C grammar language):

<form id='topform'>
  <field name='city'>
    <prompt>Say a name</prompt>
    <grammar src='city.grxml'/>
  </field>
</form>

XHTML:

<form>
  Result: <input type='text' name='in1'/>
</form>
Step 2: Combine

<html xmlns='http://www.w3.org/1999/xhtml'>
  <head>
    <form id='topform'>
      <field name='city'>
        <prompt>Say a name</prompt>
        <grammar src='city.grxml'/>
      </field>
    </form>
  </head>
  <body>
    <form>
      Result: <input type='text' name='in1'/>
    </form>
  </body>
</html>
Step 3: Insert vxml Namespace: Step 3: Insert vxml Namespace <html xmlns='http://www.w3.org/1999/xhtml'
xmlns:vxml='http://www.w3.org/2001/vxml'>
<head> <vxml:form id='topform'> <vxml:field name='city'> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src='city.grxml'/> </vxml:field> </vxml:form> </head>
<body> <form> Result: <input type='text' name='in1'/> </form> </body>
</html>
Step 4: Insert event: Step 4: Insert event <html xmlns='http://www.w3.org/1999/xhtml' xmlns:vxml='http://www.w3.org/2001/vxml' xmlns:ev='http://www.w3.org/2001/xml-events'>
<head> <vxml:form id='topform'> <vxml:field name='city'> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src='city.grxml'/> </vxml:field> </vxml:form> </head>
<body> <form ev:event='load' ev:handler='#topform'> Result: <input type='text' name='in1'/> </form> </body>
</html>
Step 5: Insert <sync>: Step 5: Insert <sync> <html xmlns='http://www.w3.org/1999/xhtml' xmlns:vxml='http://www.w3.org/2001/vxml' xmlns:ev='http://www.w3.org/2001/xml-events' xmlns:xv='http://www.w3.org/2002/xhtml+voice'>
<head> <xv:sync xv:input='in1' xv:field='#result'/> <vxml:form id='topform'> <vxml:field name='city' xv:id='result'> <vxml:prompt>Say a name</vxml:prompt> <vxml:grammar src='city.grxml'/> </vxml:field> </vxml:form> </head>
<body> <form ev:event='load' ev:handler='#topform'> Result: <input type='text' name='in1'/> </form> </body>
</html>
XHTML plus Voice (X+V) References: XHTML plus Voice (X+V) References Available on
ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera Software Multimodal Browser for Sharp Zaurus http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera 9 for Windows http://www.opera.com/
Programmers Guide
ftp://ftp.software.ibm.com/software/pervasive/info/multimodal/XHTML_voice_programmers_guide.pdf
For a variety of small illustrative applications
http://www.larson-tech.com/MM-Projects/Demos.htm
Exercise 11: Exercise 11 Specify the X+V notation for integrating the following VoiceXML and XHTML code by completing the code on the next page
VoiceXML
<form id='stateForm'> <field name='state'> <prompt>Say a state name</prompt> <grammar src='state.grxml'/> </field> </form>
XHTML
<form> Result: <input type='text' name='in1'/> </form>
Exercise 11 (continued): Exercise 11 (continued) <html xmlns='http://www.w3.org/1999/xhtml' xmlns:vxml='http://www.w3.org/2001/vxml' xmlns:ev='http://www.w3.org/2001/xml-events' xmlns:xv='http://www.w3.org/2002/xhtml+voice'>
<head> <xv:sync xv:input='_______' xv:field='________'/> <vxml:form id='________'> <vxml:field name='state' xv:id='________'> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src='state.grxml'/> </vxml:field> </vxml:form> </head>
<body> <form ev:event='load' ev:handler='#________'> Result: <input type='text' name='_______'/> </form> </body>
</html>
Interaction Manager Approaches: Interaction Manager Approaches [Diagram of three approaches: X+V — Interaction Manager (XHTML) coordinating VoiceXML 2.0 modules; W3C — Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML; Object-oriented — Interaction Manager (C#) driving SAPI 5.3]
MMI Architecture—4 Basic Components: MMI Architecture—4 Basic Components Runtime Framework or Browser— initializes application and interprets the markup
Interaction Manager—coordinates modality components and provides application flow
Modality Components—provide modality capabilities such as speech, pen, keyboard, mouse
Data Model—handles shared data
[Diagram: Interaction Manager (SCXML) coordinating XHTML, VoiceXML 3.0, and InkML components over a shared Data Model]
Multimodal Architecture and Interfaces: Multimodal Architecture and Interfaces A loosely-coupled, event-based architecture for integrating multiple modalities into applications
All communication is event-based
Based on a set of standard life-cycle events
Components can also expose other events as required
Encapsulation protects component data
Encapsulation enhances extensibility to new modalities
Can be used outside a Web environment
[Diagram: XHTML, VoiceXML 3.0, and InkML components connected to an Interaction Manager (SCXML) and a Data Model]
Specify Interaction Manager Using Harel State Charts: Specify Interaction Manager Using Harel State Charts Extension of state transition systems
States
Transitions
Nested state-transition systems
Parallel state-transition systems
History
[State chart: PrepareState → StartState → WaitState → EndState, driven by prepareResponse (success), startResponse, and doneSuccess events; prepareResponse (fail), startFail, and doneFail lead to FailState]
Example State Transition System : Example State Transition System State Chart XML (SCXML)
…
<state id='PrepareState'>
<send event='prepare' contentURL='hello.vxml'/>
<transition event='prepareResponse' cond="status=='success'" target='StartState'/>
<transition event='prepareResponse' cond="status=='failure'" target='FailState'/>
</state>
… [State chart: PrepareState → StartState → WaitState → EndState; failure responses (prepareResponse (fail), startFail, doneFail) lead to FailState]
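Filling in the remaining states from the diagram, the whole machine might be sketched as follows. This is a sketch based on the 2006 SCXML working draft: the contentURL attribute and the status condition follow the slide's notation, and the exact event and attribute names should be checked against the draft.

```xml
<scxml xmlns='http://www.w3.org/2005/07/scxml' initialstate='PrepareState'>
  <state id='PrepareState'>
    <send event='prepare' contentURL='hello.vxml'/>
    <transition event='prepareResponse' cond="status=='success'" target='StartState'/>
    <transition event='prepareResponse' cond="status=='failure'" target='FailState'/>
  </state>
  <state id='StartState'>
    <send event='start'/>
    <transition event='startResponse' cond="status=='success'" target='WaitState'/>
    <transition event='startResponse' cond="status=='failure'" target='FailState'/>
  </state>
  <state id='WaitState'>
    <!-- the modality runs here; data events may be exchanged -->
    <transition event='doneSuccess' target='EndState'/>
    <transition event='doneFail' target='FailState'/>
  </state>
  <final id='EndState'/>
  <final id='FailState'/>
</scxml>
```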
Example State Chart with Parallel States: Example State Chart with Parallel States [State chart with two parallel regions: a Voice region (PrepareVoice → StartVoice → WaitVoice → EndVoice, with FailVoice on prepare, start, or done failures) and a GUI region (PrepareGUI → StartGUI → WaitGUI → EndGUI, with FailGUI on prepare, start, or done failures)]
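In SCXML the two regions run under a <parallel> element. The following is a structural sketch only: state ids follow the diagram, and the per-state transitions are elided with comments.

```xml
<parallel id='VoiceAndGui'>
  <state id='VoiceRegion' initialstate='PrepareVoice'>
    <state id='PrepareVoice'> <!-- prepare the voice modality, as in the single-region chart --> </state>
    <state id='StartVoice'/>
    <state id='WaitVoice'/>
    <state id='EndVoice'/>
    <state id='FailVoice'/>
  </state>
  <state id='GuiRegion' initialstate='PrepareGUI'>
    <state id='PrepareGUI'> <!-- prepare the GUI modality --> </state>
    <state id='StartGUI'/>
    <state id='WaitGUI'/>
    <state id='EndGUI'/>
    <state id='FailGUI'/>
  </state>
</parallel>
```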
The Life Cycle Events: The Life Cycle Events
More Life Cycle Events: More Life Cycle Events [Sequence diagrams: the GUI and VUI each send newContextRequest to the Interaction Manager and receive newContextResponse; data events flow in both directions between the Interaction Manager and the GUI and VUI; the GUI sends done to the Interaction Manager; the Interaction Manager sends clearContext to the GUI and VUI]
Synchronization Using the Lifecycle Data Event: Synchronization Using the Lifecycle Data Event Intent-based events
Capture the underlying intent rather than the physical manifestation of user-interaction events
Independent of the physical characteristics of particular devices
Data/reset
Reset one or more field values to null
Data/focus
Focus on another field
Data/change
Field value has changed
[Diagram: data events flow between the Interaction Manager and the GUI and VUI]
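The working draft does not fix a wire format for these events, so the following is purely illustrative: a data/change event from the VUI telling the Interaction Manager that the city field now holds a recognized value. All element and attribute names here are invented for the sketch.

```xml
<data source='vui' target='interactionManager' type='change'>
  <field name='city' value='Portland'/>
</data>
```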
Lifecycle Events between Interaction Manager and Modality: Lifecycle Events between Interaction Manager and Modality [Sequence over the state chart: the Interaction Manager sends prepare and receives prepare response (success or failure); sends start and receives start response (success or failure); exchanges data events while in WaitState; receives done on success. Failure responses move the machine to FailState]
MMI Architecture Principles: MMI Architecture Principles Runtime Framework communicates with Modality Components through asynchronous events
Modality Components don’t communicate directly with each other, but indirectly through the Runtime Framework
Components must implement basic life cycle events, may expose other events
Modality components can be nested (e.g. a Voice Dialog component like a VoiceXML <form>)
Components need not be markup-based
EMMA communicates users’ inputs to the Interaction Manager
Modalities: Modalities GUI Modality (XHTML)
Adapter converts lifecycle events to XHTML events
XHTML events are converted to lifecycle events
Voice Modality (VoiceXML 3.0)
Lifecycle events are embedded into VoiceXML 3.0
[Diagram: Interaction Manager (SCXML) with XHTML and VoiceXML 3.0 modalities and a shared Data Model]
Exercise 12: Exercise 12 What should VoiceXML do when it receives each of the following events?
Reset
Change
Focus
Modalities: Modalities VoiceXML 3.0 will support lifecycle events.
<form> <catch name='change'> <assign name='city' value='data'/> </catch>
…
<field name='city'> <prompt> Blah </prompt> <grammar src='city.grxml'/> <filled> <send event='data.change' data='city'/> </filled> </field>
</form> [Diagram: Interaction Manager (SCXML) with XHTML and VoiceXML 3.0 modalities and a shared Data Model]
Exercise 13: Exercise 13 What should HTML do when it receives each of the following events?
Reset
Change
Focus
Modalities: Modalities XHTML is extended to support lifecycle events sent to a modality.
<head> … <ev:listener ev:event='onChange' ev:observer='app1' ev:handler='#onChangeHandler'/> … <script> function onChangeHandler() { post('data', data='city') } </script> </head>
…
<body id='app1'> <input type='text' id='city' value=''/> </body>
… [Diagram: Interaction Manager (SCXML) with XHTML and VoiceXML 3.0 modalities and a shared Data Model]
Modalities: Modalities XHTML is extended to support lifecycle events sent to the interaction manager
<head> … <handler type='text/javascript' ev:event='data'> if (event == 'change') { document.app1.city.value = data.city } </handler> … </head>
…
<body id='app1'> <input type='text' id='city' value=''/>
</body> … [Diagram: Interaction Manager (SCXML) with XHTML and VoiceXML 3.0 modalities and a shared Data Model]
References: References SCXML
Second working draft available at http://www.w3.org/TR/2006/WD-scxml-20060124/
Open Source available from http://jakarta.apache.org/commons/sandbox/scxml/
Multimodal Architecture and Interfaces
Working draft available at http://www.w3.org/TR/2006/WD-mmi-arch-20060414/
Voice Modality
First working draft VoiceXML 3.0 scheduled for November 2007
XHTML
Full recommendation
Adapters must be hand-coded
Other modalities
TBD
Comparison: Comparison
Standard Languages — Object-oriented: SRGS, SISR, SSML; X+V: VoiceXML, SRGS, SSML, SISR, XHTML; W3C: SCXML, SRGS, VoiceXML, SSML, SISR, XHTML, EMMA, CCXML
Interaction Manager — Object-oriented: C#; X+V: XHTML; W3C: SCXML
Modes — Object-oriented: GUI, Speech; X+V: GUI, Speech; W3C: GUI, Speech, Ink, …
Availability: Availability SAPI 5.3
Microsoft Windows Vista®
X+V
ACCESS Systems’ NetFront Multimodal Browser for PocketPC 2003
http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera Software Multimodal Browser for Sharp Zaurus http://www-306.ibm.com/software/pervasive/multimodal/?Open&ca=daw-prod-mmb
Opera 9 for Windows http://www.opera.com/
W3C
First working draft of VoiceXML 3.0 not yet available
Working drafts of SCXML are available; some open-source implementations are available
Proprietary APIs
Available from vendor
Discussion Question: Discussion Question Should a developer insert SALT tags or X+V modules into an existing Web page without redesigning the Web page?
Conclusion: Conclusion Multimodal applications offer benefits over today’s traditional GUIs.
Only use multimodal if there is a clear benefit.
Standard languages are available today to develop multimodal applications.
Don’t reinvent the wheel.
Creativity and lots of usability testing are necessary to create world-class multimodal applications.
Web Resources: Web Resources http://www.w3.org/voice
Specification of grammar, semantic interpretation, and speech synthesis languages
http://www.w3.org/2002/mmi
Specification of EMMA and InkML languages
http://www.microsoft.com (and query SALT)
SALT specification and download instructions for adding SALT to Internet Explorer
http://www-306.ibm.com/software/pervasive/multimodal/
X+V specification; download Opera and ACCESS browsers
http://www.larson-tech.com/SALT/ReadMeFirst.html
Student projects using SALT to develop multimodal applications
http://www.larson-tech.com/MMGuide.html or http://www.w3.org/2002/mmi/Group/2006/Guidelines/
User interface guidelines for multimodal applications
Status of W3C Multimodal Interface Languages: Status of W3C Multimodal Interface Languages [Chart placing each language on the W3C maturity track — Requirements, Working Draft, Last Call Working Draft, Candidate Recommendation, Proposed Recommendation, Recommendation — for VoiceXML 2.0, Speech Recognition Grammar Format (SRGS) 1.0, Speech Synthesis Markup Language (SSML) 1.0, Extended Multimodal Interaction (EMMA) 1.0, Semantic Interpretation of Speech Recognition (SISR) 1.0, State Chart XML (SCXML) 1.0, InkML 1.0, and VoiceXML 2.1]
Questions: Questions
?
Answer to Exercise 5: Answer to Exercise 5
Answer to Exercise 7 Write a grammar for zero to nineteen: Answer to Exercise 7 Write a grammar for zero to nineteen <grammar type='application/srgs+xml' root='zero_to_19' mode='voice'> <rule id='zero_to_19'> <one-of> <ruleref uri='#single_digit'/>
<ruleref uri='#teens'/>
</one-of> </rule>
<rule id='single_digit'> <one-of> <item> zero </item> <item> one </item> <item> two </item> <item> three </item> <item> four </item> <item> five </item> <item> six </item> <item> seven </item> <item> eight </item> <item> nine </item> </one-of> </rule>
<rule id='teens'> <one-of> <item> ten </item>
<item> eleven </item> <item> twelve </item> <item> thirteen </item> <item> fourteen </item> <item> fifteen </item> <item> sixteen </item> <item> seventeen </item> <item> eighteen </item> <item> nineteen </item> </one-of> </rule>
</grammar>
Answer to Exercise 8: Answer to Exercise 8 <grammar type='application/srgs+xml' root='yes' mode='voice'>
<rule id='yes'> <one-of> <item> yes </item> <item> sure </item> <item> affirmative </item>
…
</one-of> </rule>
</grammar>
Answer to Exercise 9: Answer to Exercise 9 <grammar type='application/srgs+xml' root='yes' mode='voice'>
<rule id='yes'> <one-of> <item> yes </item> <item> sure <tag> out = 'yes' </tag> </item> <item> affirmative <tag> out = 'yes' </tag> </item> …
</one-of> </rule>
</grammar>
Answer to Exercise 10: Answer to Exercise 10 <interpretation mode='speech'> <moneyTransfer> <sourceAcct hook='ink'/> <targetAcct hook='ink'/> <amount> 300 </amount> </moneyTransfer> </interpretation> <interpretation mode='ink'> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking </targetAcct> </moneyTransfer> </interpretation>
Given the following two EMMA specifications,
what is the unified EMMA specification? <interpretation mode='intp1'> <moneyTransfer> <sourceAcct> savings </sourceAcct> <targetAcct> checking </targetAcct> <amount> 300 </amount> </moneyTransfer> </interpretation>
Answer to Exercise 11: Answer to Exercise 11 <html xmlns='http://www.w3.org/1999/xhtml' xmlns:vxml='http://www.w3.org/2001/vxml' xmlns:ev='http://www.w3.org/2001/xml-events' xmlns:xv='http://www.w3.org/2002/xhtml+voice'>
<head> <xv:sync xv:input='in4' xv:field='#answer'/> <vxml:form id='stateForm'> <vxml:field name='state' xv:id='answer'> <vxml:prompt>Say a state name</vxml:prompt> <vxml:grammar src='state.grxml'/> </vxml:field> </vxml:form> </head>
<body> <form ev:event='load' ev:handler='#stateForm'> Result: <input type='text' name='in4'/> </form> </body>
</html>
Exercise 12: Exercise 12 What should VoiceXML do when it receives each of the following events?
Reset
Reset the value
Change
Change the value
Focus
Prompt for the value now in focus
Exercise 13: Exercise 13 What should HTML do when it receives each of the following events?
Reset
Reset the value
Author decides if cursor should be moved to the reset value
Change
Change the value
Author decides if cursor should be moved to the changed value
Focus
Move the cursor to the item in focus