td06030

Presentation Transcript

6th MCM of COST 290 Action, BERN, Switzerland, May 9-10, 2006 : 

Speech quality and word-level intelligibility in wireless: the critical impact of lost packets. TD(06)030. Algimantas Kajackas, Aurimas Anskaitis, Darius Guršnys. Vilnius Gediminas Technical University, Telecommunications Engineering Department. E-mail: tmc@el.vtu.lt

Outline: 

Introduction
Phone speech quality
Intelligibility of speech
Models of voice packet loss in GSM
Experiment. Substitution of frames and word intelligibility
Conclusions and future work

Introduction: 

Packet loss is a key factor determining the quality of many multimedia applications. The quality of these applications degrades as packet loss and delay in the network increase. Research shows that sparse lost packets worsen speech quality only slightly. Voice quality deteriorates much more when packet losses are bursty: in such conditions parts of words, or even complete words, may be lost.

Phone speech quality: 

The first method for subjective speech quality evaluation described in the ITU P.800 recommendation is the MOS (Mean Opinion Score). Algorithms such as PSQM and PESQ provide an objective, MOS-equivalent score for a voice call.
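As a minimal sketch of what a MOS is numerically: it is the arithmetic mean of listeners' ratings. The example assumes the five-point listening-quality scale (5 = excellent down to 1 = bad); the function name and the sample ratings are hypothetical and are not taken from the presentation.

```python
from statistics import mean

# Assumed five-point opinion scale: 5 excellent, 4 good, 3 fair, 2 poor, 1 bad.
VALID_RATINGS = {1, 2, 3, 4, 5}


def mean_opinion_score(ratings):
    """Return the arithmetic mean of listener ratings on the 1..5 scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


# Hypothetical ratings from six listeners for one speech sample.
print(mean_opinion_score([4, 3, 4, 5, 3, 4]))
```

Because the score is an average, two samples with very different loss patterns can end up with similar values.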

Speech material for quality tests: 

According to ITU P.800, the speech material for quality tests should consist of simple, meaningful, short sentences, chosen at random as being easy to understand (from current non-technical literature or newspapers, for example). Each sentence, when spoken, should fit into a time-slot of 2–3 seconds. Examples of sentences (Table B.1/P.800): "There was nothing to be seen"; "I want a minute with the inspector".

Quality of the speech: 

The MOS method and the objective MOS-equivalent methods are designed for quality evaluation experiments that analyze new coding or transmission methods and devices and the resulting voice quality impairments. The following features of speech can be assessed: overall quality and sound quality; difficulty in talking or listening and dialogue capability; echo performance; influence of background noise.

Packet loss & speech quality : 

Many papers have reported MOS quality evaluations of voice and speech-like signals under different coding and transmission conditions. Some work has addressed the effect of packet loss on speech quality. The ETSI TIPHON project defines the following quality grades for voice packet loss in IP telephony: below 0.5% for class 1 (gold), from 0.5% to 1% for class 2 (silver), and from 1% to 2% for class 3 (bronze).
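The TIPHON thresholds quoted above can be written as a small lookup. This is only an illustrative sketch: the function name is hypothetical, and since the slide does not say to which class a loss of exactly 1% belongs, the code assumes that the upper bound of each range is inclusive.

```python
def tiphon_loss_class(loss_percent):
    """Map a voice packet loss rate (in percent) to the TIPHON class named on the slide.

    Assumption: upper bounds are inclusive, so exactly 1% is class 2 and
    exactly 2% is class 3. Returns None above 2%, where the slide lists no class.
    """
    if loss_percent < 0:
        raise ValueError("loss rate cannot be negative")
    if loss_percent < 0.5:
        return "class 1 (gold)"
    if loss_percent <= 1.0:
        return "class 2 (silver)"
    if loss_percent <= 2.0:
        return "class 3 (bronze)"
    return None


for rate in (0.2, 0.7, 1.5, 3.0):
    print(rate, tiphon_loss_class(rate))
```

The classes depend only on the average loss rate, not on how the losses are distributed in time.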

Packet loss & speech quality: 

Voice quality deteriorates much more when loss patterns are bursty. Parts of words or even complete words may be lost. The duration of such erasures ranges from a few tens of milliseconds to a few seconds.

MOS. In summary 1: 

The ITU standards were designed for the evaluation of overall speech quality. MOS evaluations are sufficient when the purpose is to investigate coding quality, to compare codecs, or to assess the average speech quality in a mobile network. Averaging is well suited when the measurement conditions are stable (stationary).

MOS. In summary 2: 

MOS was not designed to measure extreme voice degradations in which parts of words or even complete words are lost. The MOS methodology is not sufficient for analysing a human-machine interface, or speech quality when lost packets are dense.

Intelligibility of the speech: 

Speech intelligibility is a quantitative measure of the degree to which spoken speech can be understood. It depends on a large variety of factors. For example, distortions in the 250-800 Hz frequency band are less important for speech intelligibility than distortions in the 1000-1200 Hz band.

Factors affecting Intelligibility: 

The acoustical factors affecting intelligibility in a room are: background noise, distance between speaker and listener, reverberation time and level, early reflections, and echo interference.

Packet loss & Intelligibility: 

The sounds of speech are represented linguistically by phonemes. In terms of information theory, the question is then whether parts of the transmitted information, that is, phonemes, are lost. Diminished intelligibility is associated with a loss of information that is coded in a number of highly dependent elements, and many factors influence it.

Some Properties of Speech: Vowels and Consonants: 

Vowels: quasi-periodic; relatively high signal power. Examples: “ee” in “key”, “o” in “spot”, “oo” in “blue”, “e” in “again”.
Consonants: non-periodic (random); relatively low signal power. Examples: “s” in “spot”, “k” in “key”.
Source: D. H. Crawford. Digital Signal Processing: An Introduction and Some Examples of its Everyday Use.

Consonants & speech intelligibility: 

Consonants are the most important speech sounds (Voiers, 1977): they contain most of the important clues for the identification of words. Miller and Licklider (1950) reported that a monosyllabic word is likely to be perceived incorrectly if either its initial or its final consonant is missing.

Speech Quality & Intelligibility: 

The statistical intelligibility measurement process uses trained talkers who speak standardized word lists through the communication system to trained listeners. The word lists are crafted to evaluate specific aspects of speech transmission. The ability of the listeners to identify individual words or word pairs indicates the quality of the transmission.

Word lists for testing of Intelligibility: 

A number of specialized and standardized word lists are in common use for testing various aspects of speech communication:
the Modified Rhyme Test (MRT),
the Diagnostic Rhyme Test (DRT),
the set of twenty Phonetically Balanced (PB) word lists.

The Stimulus Words of the MRT : 

The Stimulus Words of the MRT

Duration of the phonemes. The Statistics: 

Duration of the initial vowel of the target word, V2: 181-211 ms.
Mean duration of the vowel in the nonsense context syllable, V1: 110-151 ms.
Duration of a stressed vowel: 114 ms.
Duration of an unstressed vowel: 55 ms.
The consonant preceding a target word, C2: 69-76 ms, or only 40 ms.
Transitions from vowel to consonant T0, closure duration: 76 ms.

Working hypothesis: 

Words in the MRT differ by only one consonant. It is therefore reasonable to guess that intelligibility tests will fail if the beginning of the word signal, i.e. about 60 ms or 3 packets, is lost. Research was performed to verify this hypothesis. Typical GSM lost packet patterns were determined, and there is evidence that even one lost packet may change the meaning of a word.
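The arithmetic behind "about 60 ms or 3 packets" can be sketched as follows, assuming 20 ms of speech per packet (the value implied by that figure). The function name is hypothetical, and the count assumes the segment starts on a frame boundary; a segment that straddles boundaries may touch one more frame.

```python
import math

FRAME_MS = 20  # assumed speech duration carried by one voice packet


def frames_spanned(duration_ms, frame_ms=FRAME_MS):
    """Number of whole frames needed to cover a segment that starts on a frame boundary."""
    return math.ceil(duration_ms / frame_ms)


# Durations quoted on the "Duration of the phonemes" slide, in milliseconds.
print(frames_spanned(60))   # 3: the word beginning named in the hypothesis
print(frames_spanned(40))   # 2: the shortest quoted consonant
print(frames_spanned(76))   # 4: the longest quoted consonant
print(frames_spanned(55))   # 3: an unstressed vowel
```

A consonant of 40 ms fits in two frames, so a short burst of lost frames can already remove most of it.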

Models of Voice Packets Loss in GSM: 

The rate of lost packets also depends on the channel codes and, consequently, on the parity bits. Experimental investigations of GSM frame-level error traces have been carried out at the reliable link layer (Radio Link Protocol, RLP). Models of lost packet series in GSM networks were developed from these RLP experimental results [1, 2].
[1] Ji P., Liu B., Towsley D., Ge Z., Kurose J. Modeling frame-level errors in GSM wireless channels. GLOBECOM 2002, vol. 3.
[2] Konrad A., Zhao B. Y., Joseph A. D., Ludwig R. A Markov-based Channel Model Algorithm for Wireless Networks. Wireless Networks, vol. 9, no. 3, 2003/05.
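To illustrate how a Markov chain can produce bursty frame losses, here is a minimal two-state (Gilbert-type) sketch. It is not the model of [1] or [2]; the function name, the transition probabilities and the seed are assumptions chosen only for illustration.

```python
import random


def gilbert_loss_trace(n_frames, p_good_to_bad, p_bad_to_good, seed=0):
    """Generate a 0/1 frame trace (1 = lost) from a two-state Markov chain.

    In the 'good' state frames are received, in the 'bad' state they are lost.
    A small p_bad_to_good keeps the chain in the bad state longer, i.e. longer bursts.
    """
    rng = random.Random(seed)  # local generator so the trace is reproducible
    lost = False               # start in the good state
    trace = []
    for _ in range(n_frames):
        if lost:
            if rng.random() < p_bad_to_good:
                lost = False
        elif rng.random() < p_good_to_bad:
            lost = True
        trace.append(1 if lost else 0)
    return trace


# Hypothetical parameters: rare entry into the bad state, slow recovery.
trace = gilbert_loss_trace(200, p_good_to_bad=0.03, p_bad_to_good=0.3, seed=1)
print("".join(str(x) for x in trace))
print("loss rate:", sum(trace) / len(trace))
```

With the same average loss rate, a lower `p_bad_to_good` concentrates the losses into fewer, longer series.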

Experimental test-bed : 

Experimental test-bed

The test signal: 

The test signal can be expressed as (equation not preserved in the transcript), where i = 1, 2, …, 11 and T = 20 ms.

Experimentally collected frame erasure trace 1: 

Experimentally collected frame erasure trace 1

Experimentally collected frame erasure trace 2: 

Experimentally collected frame erasure trace 2

Experimentally collected frame erasure trace 3: 

Experimentally collected frame erasure trace 3

Rates of series of lost frames 1: 

Rates of series of lost frames 1

Rates of Series of Lost Frames 2  : 

Rates of Series of Lost Frames 2  

Rates of series of lost frames 3: 

Rates of series of lost frames 3

Lost voice packets in GSM. Summary: 

In the GSM voice channel, series of up to 12 or more voice packets may be lost. This translates to 20-240 ms of deteriorated speech in the time domain. Question: may the meaning of a word become ambiguous because of such an erasure?
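The rates of series of lost frames can be derived from an erasure trace by measuring run lengths. The sketch below assumes a trace stored as a list of 0/1 flags (1 = lost frame) and 20 ms per frame; the function names and the example trace are hypothetical and are not the measured data.

```python
from collections import Counter
from itertools import groupby

FRAME_MS = 20  # assumed speech duration per voice frame


def loss_series_lengths(trace):
    """Lengths of consecutive runs of lost frames (1 = lost) in a 0/1 trace."""
    return [sum(1 for _ in run) for lost, run in groupby(trace) if lost]


def loss_series_rates(trace):
    """Share of loss series having each length, as {length: fraction}."""
    lengths = loss_series_lengths(trace)
    if not lengths:
        return {}
    counts = Counter(lengths)
    return {n: counts[n] / len(lengths) for n in sorted(counts)}


# Hypothetical erasure trace, not a measured one.
trace = [0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
print(loss_series_lengths(trace))                  # [1, 3, 1, 3]
print(loss_series_rates(trace))                    # {1: 0.5, 3: 0.5}
print(max(loss_series_lengths(trace)) * FRAME_MS)  # 60 (ms of erased speech)
print(12 * FRAME_MS)                               # 240 (ms for a series of 12 frames)
```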

Experiment. Substitution of frames & word intelligibility: 

Experiment. Substitution of frames & word intelligibility

Conclusion and future work: 

The MOS method hardly differentiates the consequences of different lost packets for speech quality. Different parts of the word signal have unequal value for word intelligibility. Research has revealed that even one coded packet in speech transmission systems may be crucial for word understanding.