Home

Sample PESQ User Guide - Spirent Knowledge Base

1. eese rennen 39 Figure 22 Excitation of reference and degraded signals essere 40 Figure 23 Discontinuous transmission events eee en rennen 41 Figure 24 Example speech and voicing probability pitch and formant estimates 43 Figure 25 PESQ narrowband input filter characteristic eese 45 Figure 26 PESQ wideband input filter characteristic eese eene 45 Figure 27 Evaluation of quality with background noise eee nennen 55 Figure 28 Mapping between PESQ score and subjective condition MOS sess 57 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 8 of 73 Psytechnics PESO User Guide Release 2 1 Tables Table 1 Listening quality scale iii 20 Table 2 Typical PESQ scores for a range of conditions erre 23 Table 3 Signal level measures calculated separately for reference and degraded signals 34 Table 4 Level measures of the system under test eese eene 34 Table 5 Average and worst case correlation coefficient for 38 subjective tests known during PESQ development sub divided by test type esee ener cnn enne nnne 59 Table 6 Error distribution across all 38 known subjective tests eene 59 Table 7 Correlation coefficient 8 unknown subjective tests PESQ only esses 60 Table 8 Error distr
2. 44 0 1473 261 880 E mail info psytechnics com Web http www psytechnics com Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 3 of 73 Psytechnics PESO User Guide Release 2 1 This page has been left intentionally blank This page has been left intentionally blank Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 4 of 73 Psytechnics PESO User Guide Release 2 1 Contents T Introductio acetate sa ia tirs Pet Rui pda art DU lota EIU HIE obes Enie EnS Sorani 11 LE Aboutthis document eiie eet be tei ne Per I e e s 11 1 2 A guide to this document eee aee aee ettet fete de ie itae dece 12 1 37 Slee al notice 55 o D A teet tee PE ERI RERO e P Lee eere creen vetat EE esten 13 Uode M O O 15 2 PESQ as simple measurement device eee ee ee eee eee ee eee seen seen setate seta ases tense tese etse seen ases eaae 16 2 1 Overview PESO a dt bi 16 PAM E OO Beet 17 DD Spe ch signals tute ON 17 2 2 2 Sapling tate teo De eA de e E HERREN dde 17 2 2 3 Modelspecitication soria 17 2 3 Operations performed by PESO ii oe e et le eb deteriore E Oe to 18 244 Quality SCOFes iode iege e HEUTE Ie odore SEE SENT eee fes Ubros Cete Espere Ett 20 2 41 BESQ SCOtGe indc Moe eate deett etr ds 20 2 4 2 PESQ to MOS mappings eer ii re EUH pe top eh ke coget auf ella pegue eo ope 20 24 3 PESOTEO uet eee is 20 2 44 Relationship between PESQ score and PESQ LQ eese 21 24S PBOZ EE
3. 1 5 Standard deviation of delay is computed in units of seconds from the frame by frame delay used in PESQ File duration problem PESQ has been validated in the ITU T for use with signals up to 30 seconds Due to the precision available to the floating point arithmetic in PESQ once the signals being processed reach a certain length errors will start to be introduced in the signal energy calculation From our analysis it was found that signals with more than about 1 million samples will start to cause problems 60 seconds of 16kHz mono signal contains 960 000 samples and this would be a sensible threshold at which to apply a warning If the signal is at SKHz then potentially twice the length could be used However since P 862 has only been validated up to 30 seconds two separate warnings should be displayed one if the reference signal length exceeds 35 seconds and a second if the number of samples in the reference or degraded exceeds 960 000 Potential level alignment problems This issue has two effects depending on whether an utterance has been deleted or added to the degraded and whether a large amount of silence padding has been added to the degraded When an utterance has been deleted from or a large amount of silence padding has been added to the degraded signal the signal will be level shifted to a value above the optimum When an utterance has been added to the degraded signal the signal will be level shifted to a value below
4. Psytechnics PESO User Guide Release 2 1 the ITU T as giving the most relevant comparison between objective models and subjective scores and 1t was used in the calibration of PESQ 8 2 Correlation coefficient The closeness of the fit between PESQ and the subjective scores may be measured by calculating the correlation coefficient Normally this is performed on condition averaged scores after mapping the objective to the subjective scores in other words with data of the form plotted in Figure 28 b The correlation coefficient is calculated with Pearson s formula EXG xbi El x x 4 5 In this formula x is the condition MOS for condition i and x is the average of x A xy y is the r mapped condition averaged PESQ quality score for condition i and y is the average of y A yy For the data shown in Figure 28 b the correlation coefficient 720 988 Correlation coefficients for a number of subjective tests are given in the next section 8 3 Residual errors The mapping removes any systematic offset between the PESQ scores and the subjective MOS minimising the mean square of the residual errors X Yi Various measures may be applied to the residual errors to given an alternative view of the closeness of PESQ scores to subjective MOS Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 56 of 73 Psytechnics PESO User Guide Release 2 1 9 Performance of PESQ 9 1 Narrowband measurements PESQ was compa
5. masking From the positive and negative errors two disturbance parameters are calculated They are calculated as non linear averages over specific areas of the error surface These disturbance parameters are e the absolute symmetric disturbance a measure of absolute audible error e the additive asymmetric disturbance a measure of audible errors that are significantly louder than the reference This analysis gives two error parameters that summarise the amount of each type of audible error Finally the error parameters are converted to a quality score which is a linear combination of the average symmetric disturbance value and the average asymmetric disturbance value Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 19 of 73 Psytechnics PESO User Guide Release 2 1 2 4 Quality scores This release of PESQ returns three quality scores e PESQ score is calculated according to P 862 e PESQ LQ gives a quality score on a MOS like scale e P 862 1 is the ITU T standard mapping for PESQ to MOS like scale e PESQ Ie is the impairment factor Je which is an input to the E model The PESQ LQ P 862 1 and PESQ Ie scores are derived from the PESQ score using simple formulae The PESQ LQ mapping was developed by Psytechnics the P 862 1 mapping is defined as an ITU T recommendation relating directly to PESQ the PESQ Ie mapping is defined in ITU T Recommendation P 834 PESQ score PESQ LQ and P 862 1 are output in the file pesqlog txt and
6. 30 of 73 Psytechnics PESO User Guide Release 2 1 3 2 8 Error surface The error surface is the degraded sensation surface minus the reference sensation surface This means that errors that have added to the signal for example noise have positive values while parts of the signal that have been attenuated or muted have negative values The amplitude of errors is related to how audible and annoying they will be Examples of errors that may occur are listed here e Front end clipping causes large but short negative errors at the start of speech bursts e Muting can be seen as prolonged negative errors during speech where the degraded sensation surface falls close to zero e Addition of background noise shows up as positive error and is most obvious in silent periods e Coding distortion generally causes low level errors throughout speech bursts although this is very codec dependent e Bit or frame errors tend to cause localised distortion which may be positive or negative This effect is dependent on the codec and any error concealment algorithm used An example error surface is shown in Figure 12 3 2 9 Frame by frame delay statistics PESO Tools Only PESQ Tools provides statistics for the frame by frame delay values described in section 3 2 2 These statistics are e mean delay e maximum delay e minimum delay e standard deviation of delay e delay histogram The histogram comprises ten uniformly spaced bins wh
7. 73 Psytechnics PESO User Guide Release 2 1 6 Overview of subjective testing This section describes the subjective testing methods used to obtain the opinion scores that PESQ is calibrated to predict It is beyond the scope of this document to provide a full guide on designing and conducting subjective tests For more information you should consult the references listed in section 10 This gives the ITU T recommendations concerning subjective testing However it should be noted that there are certain differences between these recommendations and the methods in current use in the standards bodies such as ETSI What we describe here is focused towards the subjective methods used to gather data for calibrating PESQ based on best practice in standards related work 6 1 Listening and conversational testing Subjective testing aims to obtain a key benchmark of network performance based on the customers perception of speech quality Examples of the behaviours considered include low bit rate coding transcoding multiple coding stages and channel errors due to mobile or packet based transmission There are two distinct classes of telephony subjective test listening and conversational In listening tests subjects hear various distorted recordings and vote on their opinion of the quality after hearing each one Because there is no two way element of communication listening tests cannot fully model the effect of listening level talker ech
8. Guide is divided into four main divisions e The main User Guide page 15 e A section of background and advanced material page 47 e Supplementary sections including references and a glossary page 61 e Guidelines on how to use the sample information in this document to create end user documentation for different types of products that include PESQ or PESQ Tools page 65 The main User Guide contains three sections e Section 2 covers the use of PESQ as a simple measurement device which returns only a quality score e Section 3 covers use as an advanced speech quality analyser with a full set of features and outputs for use by trained individuals This section includes descriptions of the diagnostic outputs provided by the PESQ Tools option e Section 4 has material specific to the use of Psytechnics PESQ for evaluating Head and Torso HATS measurements or wideband telephony This is an extension to the P 862 standard The following sections cover background and advanced material that should be read for specific purposes e Section 5 contains instructions for creating speech signals for testing e Section 6 summarises techniques used for designing and conducting subjective listening tests the quality benchmark that PESQ is designed to model e Section 7 provides guidance on testing the performance of systems in the presence of background noise e Section 8 outlines the methods use to compare objective and subjective scores
9. are listed in the User Guide section 3 2 8 Results option Error surface User Guide section 3 2 8 The error surface is not provided explicitly by PESQ but is simply calculated as the degraded surface minus the reference surface You are encouraged to show the error surface along side the sensation surfaces See section A 4 7 for further information Results option Frame by frame delay statistics User Guide section 3 2 9 PESQ Tools only If PESQ Tools is available you may wish to provide the frame by frame delay statistics for profile 1 in addition to profiles 2 and 3 Results option Utterance by utterance delay User Guide section 3 2 10 PESQ Tools only You may wish to offer a view of the utterance by utterance delay using the utterance delay start end and confidence information Results option Utterance by utterance level User Guide section 3 2 11 PESQ Tools only You may wish to offer a view of the utterance by utterance level using the utterance level start end and confidence information This can be useful in diagnosing some advanced network processes such as adaptive level control Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 69 of 73 Psytechnics PESO User Guide Release 2 1 A 4 12 Results option Signal level and gain measures User Guide section 3 2 12 PESQ Tools only The levels of speech and noise and the gain of the system are interesting for many applications and you are encouraged to offe
10. are quoted to two decimal places 2 4 1 PESO score PESQ returns a quality score known as PESQ score which conforms to ITU T P 862 PESQ score lies on a scale from 0 5 to 4 5 though in most cases it is between 1 and 4 5 PESQ score correlates with subjective quality 2 4 2 PESQ to MOS mappings It has been found that PESQ score is consistently higher than subjective MOS for poor quality conditions In order to deliver an objective MOS score which is more closely aligned with subjective MOS a simple mapping can be applied This mapping aligns the PESQ output scale to the subjective test scale obtained from ITU T P 800 listening quality tests This is reproduced in Table 1 along with the prompt that is given to subjects Listening quality scores lie between 1 and 5 PESQ LQ score lies between 1 0 and 4 5 This is because 4 5 is usually the maximum obtained in a subjective test Table 1 Listening quality scale Quality of the speech Excellent Good Fair Poor Bad N Aj The score gives a measure of customers perception of quality The highest score 4 5 means that no distortion is measured As the amount of distortion increases the quality falls For more information on how to compare PESQ scores to subjective test data see section 8 2 4 3 PESQ LQ Psytechnics have analysed this using a very large number of subjective tests To make it easier to compare PESQ score with MOS a second q
11. as codecs This release of Psytechnics PESQ provides a fully conformant implementation of PESQ as defined in ITU T P 862 Additionally it provides extensions to allow PESQ to be used with wideband telephony or head and torso simulator HATS measurements The ITU T selection process that resulted in the standardisation of PESQ involved a wide range of conditions with demanding correlation requirements set to ensure that it has good performance in assessing conventional fixed and mobile networks and packet based transmission systems Figure 1 Using PESQ Reference signal PESQ quality Distorting SERIE system PESQ Other distortion measures Test degraded signal PESQ takes into account the following sources of signal degradation coding distortions errors packet loss delay and variable delay and filtering in analog network components PESQ does not take into account the subjective effect of level changes in the network echo and the effect of round trip delay on conversation Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 16 of 73 Psytechnics PESO User Guide Release 2 1 2 2 Inputs 2 2 1 Speech signals PESQ requires two inputs the original the unprocessed test signal and the degraded version that has been passed through the distorting system In addition the model needs to know the sampling rate of these files which may be either 8kHz or 16kHz The test signal should be speech like This is important because
12. cnnn no En enne n naar nr entes tente tenentes enne ens 59 9 2 Widebarid Measurements iii arder EENE ER ON e tb eere DA Cas 60 Supplementary Information 0000 scsscssovssensssossscssossssencsossssdonsesnsdsssensessssasendessesesdonsesenssesssensesecsessosonasss 61 10 References IPR E M 62 10 1 Objective speech quality assessment ssseessssesseseseeeene eene eene nennt nente sternit enne 62 10 2 Subjective testing o eee erteilen tet sett heel a PU ERE 62 10 3 Statistics ion ote SOOO eie ne EET eue breeds aite is 63 Ber rim seca secendds 64 Guidelities i c rt pesela ene penhora iov eset i a laa Ut equi cue Osa EVE E CF UR VERDE YE MEI Ue pese aspas ada 65 A Guidelines for the use of the sample user guide by the licensee e eeeeeeee eere eere 66 AS Introductionis Lee D RN ee PEE RR REIR E ER ERR Ue RE RE 66 AD PESO TOGS 1 i ree deer tee e Pe Fee bi e ede diee dee e Colinas olde Lotes iacet 66 A 3 Inputs and Outputs for basic use of PESQ sessssseesseseeeeeeeeere nennen ener enne 67 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 6 of 73 Psytechnics PESO User Guide Release 2 1 A 3 1 T putoption Speech mala a ai iran 67 A 3 2 Input option Sampling Rate oononncononeoconoccnrccnncnoreconncninc conectando nba cren antenne nns there 67 A 3 3 Input option Model specification eese nennen nennen en
13. due not to the quality of the conditions but to their presentation order Language dependence Subjects are normally native speakers of the language used in a test If language dependence is to be evaluated subjectively it is necessary to use a pool of subjects of different nationalities or to conduct tests in several countries On the available evidence it appears that PESQ performs well for subjective tests conducted in several languages or language groups Number and population of subjects A telephony listening test normally uses at least 16 subjects 24 is a common number of subjects in standards work They should be untrained and should not have participated in another test within the last year Typically subjects are selected at random from an adult population and should ideally cover a representative range of ages and be approximately equally split between the sexes Averaging across the votes of this population aims to control possible preference effects for example due to gender or age Balance The conditions in the test should cover a broad range of quality Although MNRU references help to ensure this the other conditions should be chosen to include several different levels of audible distortion Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 52 of 73 Psytechnics PESO User Guide Release 2 1 6 3 Processing of speech material SAO E E dev do Although the methods used to process material for a subjective test are beyond the s
14. e Some results on the performance of PESQ calculated according to these methods are presented in section 9 The supplementary material includes e References for further reading in section 10 e A glossary of technical terms in section 11 The guidelines includes e Classification of different PESQ usage profiles e Notes on how to use the sample documentation for different application profiles Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 12 of 73 Psytechnics PESO User Guide Release 2 1 1 3 Legal notice Performance results reported in this documentation represent actual results obtained by Psytechnics Psytechnics does not warrant that the indicated results will be obtained in every test scenario All warranties with respect to PESQ remain as stated in the applicable licence agreement between Psytechnics and the Licensee Nothing in this document is to be interpreted as varying the terms of the licence agreement either expressly or by implication Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 13 of 73 Psytechnics PESO User Guide Release 2 1 This page has been left intentionally blank This page has been left intentionally blank Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 14 of 73 Psytechnics PESO User Guide Release 2 1 User Guide Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 15 of 73 Psytechnics PESO User Guide Release 2 1 2 PESO as simple measurement device 2 1 Overview of PESO Modern communications n
15. muting algorithms and discontinuous transmission These outputs are generated by comparing the degraded signal to the reference signal Muting of a signal typically occurs when an error concealment algorithm at a receiver has insufficient information to replace missing or corrupted data The muting estimate is provided in terms of the proportion of signal frames that have been muted by the system under test Discontinuous transmission DTX schemes aim to increase transmission efficiency by ceasing transmission during periods of talker inactivity Applications of DTX include increasing battery life reducing interference or freeing transmission capacity Temporal clipping occurs when the voice activity detection VAD algorithm in a DTX system misclassifies part of a speech utterance as noise and replaces it with comfort noise at the receiver Front end clipping refers to the case where the start of an utterance has been clipped back end clipping refers the case where the end of an utterance has been clipped Hangover is a term applied to the period after the end of an utterance when a discontinuous transmission scheme continues to transmit as normal rather than generating comfort noise These different events are shown diagrammatically in Figure 23 Figure 23 Discontinuous transmission events actual talker activity speech no speech discontinuous transmission state transmission comfort noise front end back end hangover cli
16. provided in addition to the basic information described in A 3 and will typically apply to profiles 2 and 3 It should be included in your user documentation when your product offers the corresponding input or output to the user Unless otherwise stated you may provide the following information in profiles 2 and 3 The PESQ Tools option greatly extends that range of diagnostic outputs provided by PESQ The use of PESQ Tools is therefore highly recommended for Profile 2 Input option Model specification User Guide section 3 1 You may include the further discussion of changes in release 1 4 that are presented in section 3 2 10f the User Guide Results option Frame by frame Delay User Guide section 3 2 2 We recommend that you present a graph showing the frame by frame delay for example as shown in section 3 2 2 of the User Guide Alternatively you may plot a histogram of the frame by frame delay The words time offset may be used in your documentation instead of or in addition to delay Results option Bark scale transfer function User Guide section 3 2 3 You should display the Bark scale transfer function estimate for products in profile 3 and you may wish to offer it for profile 2 Results option Perceptual parameters User Guide section 3 2 4 The symmetric and asymmetric disturbance values may be presented graphically frame by frame Alternatively you may also present the average symmetric and asymmetri
17. specification Release 1 4 introduced a small modification to the perceptual model that may lead to small changes in PESQ score This improves the performance of PESQ for cases where the reference signal is very quiet during silent periods If it is essential to obtain scores that exactly match those obtained by previous releases of PESQ the version 1 0 model can be selected using the appropriate switch We recommend that the release 1 4 method is used by default See section 3 2 1 for more details Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 17 of 73 Psytechnics PESO User Guide Release 2 1 2 3 Operations performed by PESQ The processing carried out by PESQ is illustrated in Figure 2 Figure 2 Processing performed in PESQ Reference Signal Speech Quality Prediction Degraded Signal Re align bad intervals The model includes the following stages Level alignment In order to compare the signals the reference speech signal and the degraded signal should be at the same constant power level This is necessary because the reference signal does not have to be to be at a defined level and because the gain of the system under test is unknown before testing PESO assumes that the subjective listening level is a constant 79dB SPL at the ear reference point ITU T P 830 section 8 1 2 A gain is applied to both the reference and degraded signals to bring them to this level Input filtering PESQ models the receive path of t
18. such technologies as codecs are designed to transmit speech Simple synthetic signals such as sine waves or white noise may not give results that relate to customers perception of the system s speech quality The reference signal should be filtered with an appropriate send filter before injection in a network under test This will usually be a modified IRS Send filter ITU T P 48 This filtered reference signal should then be used as an input to the PESQ algorithm We recommend that you use the Psytechnics artificial speech like test signal ASTS which is available as an optional addition to PESQ This reproduces the key temporal spectral and sequence properties of speech with less redundancy than natural speech allowing greater confidence with shorter measurements If you intend to use natural recorded speech you should first read Section 5 Care should be taken to ensure that the signal has been filtered and is at the correct level before entering the network so that it 1s representative of signals transmitted from a telephone handset 2 2 2 Sampling rate PESQ is able to process input material at SKHz or 16kHz sample rates The 8kHz version of PESQ is faster and requires less memory than the 16kHz version Both input files must be at the same sample rate In certain applications the sampling rate for the PESQ application may be fixed If the sampling rate can be changed 1t is essential that the correct value is specified 2 2 3 Model
19. that include PESQ or PESQ Tools The guidelines begin by defining three PESQ usage profiles The guidelines then follow the structure of the user guide to describe which inputs and outputs are recommended for use in the different profiles Cross references to the appropriate sections of the sample PESQ User Guide are provided Provided that a Licensee continues to pay PESQ royalties to Psytechnics the Licensee may include the text of the User Guide in their own User Guides as appropriate The text should be modified as described in these guidelines to suit the requirements of the Licensee s product A 1 Introduction PESQ can be used in different types of application and by people with different requirements and levels of knowledge For simplicity we have identified three usage profiles In creating your own user guide you should base it on the profile that your own application most closely matches The usage profiles are e Profile 1 PESQ as a simple measurement device where there may be a choice to process speech sampled at either SKHz or 16kHz This returns only quality scores e Profile 2 PESQ as an advanced speech quality analyser with a full set of features and outputs for use by trained individuals e Profile 3 Use of Psytechnics PESQ in Head and Torso HATS measurements and in wideband telephony This is an extension to the P 862 standard The concept of profiles is provided for guidance and it is left to licensees to ch
20. the clean speech These permutations are shown in Figure 27 Figure 27 Evaluation of quality with background noise A B C Reference Reference Reference Noise Noise This makes it possible to investigate the effects of the noise alone B the performance of the transmission system alone C or the performance of the system while transmitting noisy speech D A provides a simple check of the baseline quality and may be omitted with PESQ Comparison of E with B and D provides another way for establishing the effect of the system on the noise 7 2 Subjective testing with background noise It is not possible for this document to fully describe the methods used in subjective testing with noisy speech We can only summarise some of the available techniques and outline a typical test design Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 55 of 73 Psytechnics PESO User Guide Release 2 1 The choice of subjective opinion scale and voting method is critical This usually affects the results of a test because as noted above the quality prompt can influence the votes given to different conditions even changing their ordering The ACR listening quality method may be used for background noise testing In this case the noise is one type of degradation that the subjects vote on The listening quality method appears to be more sensitive to noise compared to coding distortions than listening effort In an ACR test the effect
21. to the reference to model a telephone handset and a wideband filter is applied to the degraded file as the HATS recording will automatically include the handset path The wideband filter used for HATS measurements has a lower gain than the filter used in the wideband model but its frequency response otherwise has the same shape Figure 26 PESQ wideband input filter characteristic 20 T T T T T 0 1000 2000 3000 4000 5000 6000 7000 8000 Frequency Hz Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 45 of 73 Psytechnics PESO User Guide Release 2 1 This page has been left intentionally blank This page has been left intentionally blank Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 46 of 73 Psytechnics PESO User Guide Release 2 1 Background and Advanced Information Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 47 of 73 Psytechnics PESO User Guide Release 2 1 5 Notes on speech signals This section provides background material on speech signals and essential information on creating and using your own test files 5 1 1 Properties of test signals Networks may treat speech and silence differently and often behave in a way that is dependent on the signals passing through them In designing a test signal it is essential to consider the following factors e Temporal structure speech and silent periods e Level and frequency content e Source material natural or artificial speech e Duration of an indi
22. unknown subjective tests PESQ only Note test 3 was excluded as the data for this test was per file only Absolute error range 0 25 0 5 0 75 1 0 1 25 errors in range PESQ 72 3 91 1 97 8 100 0 100 0 9 2 Wideband measurements Wideband PESQ is a Psytechnics extension Results of tests on wideband PESQ were reported in Rix Proposed modification to Draft P 862 see reference in Section 10 The results below summarise the correlation between measurements using PESQ and subjective tests In all cases the subjects listened binaurally through wideband headphones The performance of wideband PESQ was assessed against four subjective experiments 1 Narrowband and wideband MNRU conditions and CELP codecs 2 Narrowband 8kHz sample rate conditions only MNRU CELP codecs and three packet loss conditions for each CELP codec 3 The same structure as experiment 2 but with all of the conditions wideband 4 Four different families of codecs at between 8 and 64 kbit s along with MNRU references at three different sample rates 8kHz 11 025kHz and 16kHz The results are summarised in Table 9 which presents the correlation of wideband PESQ with subjective MOS for each of the four wideband speech experiments For all of these experiments wideband PESQ shows high correlation with subjective quality It should be noted that wideband PESQ has not been validated with any conditions containing additive background noise Tab
23. 22 2 4 6 Relationship between raw PESQ score and P 862 1 sess 22 24 7 Iypiealquality ScOtes nere e CR RR P re eR ens 23 2 4 8 PESQ le mapping etri erect et ee iti Le tese ee UR Ei eb ib otto tent 23 249 gt PESQ Usage warnings ii io cere eere eee E erbe eee olet ode ete Ee cet Babee e etes 24 E A O dada sent aantaze 25 A Input options en nein utr eed e E de bee Uh get tete det tne d Savana ge tiene ees edge e eee Sondas 25 SAEC C RO 26 3 201 PESO Scores 27 3 22 Frame by strame delay iii mil dedi inl ae delete cette 27 3 2 3 Bark scale transfer function ini I ce daa 28 3 24 Perceptual parameters eet e e ene eri ee bo Lee trial adan leethadeen 28 3 25 Erame by framie SCOtE Uttar ede Dee Sade tbe Cope tre ep Pres Pee coh E Pee buena repe 29 3 2 00 Signal waveforms nonien ete e Pe e eoe ee tute pae ge cute a ceri se taste eter edle 29 3 2 1 Sensation Surfaces aa said aa 30 3 2 8 SO Ret pee e ON RON 31 3 2 9 Frame by frame delay statistics eee sessi nennen nennen nennen enne 31 3 2 10 Utterance by utterance delay measures eese ener rennen 32 3 2 11 Utterance by utterance level cecilia ttt reete P cere beate e darte eU 33 3 2 12 Signal level and gati 1o eet e e de te sette ria aaa 34 3 2 13 Bark signal Spectra serie ee iere re oeil dai m End 35 3 2 I4 Linear special 36 3 2 15 Transfer function estimation iii eee e d
24. 4 8 PESQ Ie mapping The PESQ Ie score is the impairment factor Je which is an input to ITU T G 107 E model The PESQ Ie score uses a scale from 0 to 140 and is calculated from the PESQ score using the relationship shown by the following graph Figure 5 Mapping between PESQ score and PESQ Ie PESQ score The mapping from PESQ score to PESQ Ie is defined in ITU T Recommendation P 834 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 23 of 73 Psytechnics PESO User Guide Release 2 1 2 4 9 PESQ Usage warnings These warnings are designed to indicate when the scores returned by P 862 maybe unreliable Psytechnics has implemented these warnings in the pesqmain c main module This section of the code can be used in its original form for standalone executables or as an example of how to generate the warning for a PESQ library build Possible time alignment failure warning There are certain situations for example when the degraded file does not originate from the reference and therefore contains different speech or just circuit noise where the time alignment in PESQ will incorrectly estimate the delay between the reference and degraded signals When this happens it is possible that PESQ will returns scores that are inappropriately high The following test is used to assess whether the reference and degraded files may not be related Delay confidence 0 3 and standard deviation of delay gt 0 05 and raw PESQ score gt
25. PESQ quality score for each condition for a subjective test on mobile codecs This clearly shows that there is a simple relationship between PESQ score and MOS Figure 28 Mapping between PESQ score and subjective condition MOS a Raw condition average scores b Mapped condition average scores 4 5 Condition average PESQ score Mapped condition average PESQ score 1 1 5 2 2 5 3 3 5 4 4 5 1 1 5 2 2 5 3 3 5 4 4 5 Condition MOS Condition MOS This relationship between PESQ score and MOS is modelled using a monotonic cubic polynomial The solid line in Figure 28 a shows this polynomial function The polynomial can then be applied to map the PESQ scores for each condition onto the same scale as MOS in this test Figure 28 b shows the same subjective condition MOS plotted against the mapped PESQ scores illustrating how the mapping works All of this analysis is normally performed with condition averages of both objective and subjective scores The mapping should be constrained to be monotonic across the range of the data otherwise it will not preserve the ordering of the objective scores A different mapping is required for each subjective test to take account of the differences outlined above Psytechnics recommend this method of using a monotonic cubic polynomial optimised for minimum mean squared error to map between subjective and objective scores This method has been accepted in Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 57 of 73
26. Page 26 of 73 Psytechnics PESO User Guide Release 2 1 3 2 1 PESQ Score In release 1 4 a small modification was introduced to the PESQ perceptual model which affects PESQ score in some cases The new model gives identical scores to the old model in most circumstances where natural speech recordings are used However the new model gives higher and more accurate scores in cases where the reference signal is very quiet during silent periods for example if it includes digital silence In these cases the difference in PESQ score between the two models has been found to be as large as 0 25 Psytechnics recommend that for normal network measurement purposes the new model introduced in PESQ release 1 4 should be used However there may be circumstances for which the results obtained with previous versions of PESQ must be reproduced exactly For these cases the old backwards compatible model may be used by making the appropriate switch The default option is the PESQ release 1 4 model 3 2 2 Frame by frame delay An overview of the PESQ time alignment operation is given in section 2 3 It generates two sets of results the utterance by utterance and frame by frame delay values Frame by frame delay is the delay measure used in calculating the PESQ quality score Utterances are broken up into frames of 32 ms duration Frames use a window function that gives greater weight to the central 16ms of each frame and there is an overlap between suc
27. Q should be the original file before any processing was applied and with no background noise added See section 7 for more information on using PESQ to test quality in the presence of noise It s important to note that real speech signals that are passed through networks are not usually completely digitally silent during pauses between speech utterances PESQ is able to detect the effect of small amounts of added noise if the reference signal is very quiet in silent periods This means that a measurement of a system that adds noise such as an analogue connection using a reference signal that includes digital silence may give a slightly lower quality score than a measurement of the same system using a noisy reference In effect the noisy reference masks the noise added by the system 5 1 8 Degraded signal The degraded signal is the distorted version of the reference signal measured at the output of the system under test As little further degradation as possible should be introduced before this signal is input to PESQ as the model would not be able to separate this from the distortion introduced by the system Ideally the degraded signal should be recorded at 16kHz sample rate though for certain applications the use of 8kHz sample rate might be unavoidable The signal should be stored with at least 16 bits of precision at a level that avoids amplitude clipping and unnecessary quantisation Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 49 of
28. SQOO 210 EDO0155 0 1 Page 56 of 73 Psytechnics PESO User Guide Release 2 1 8 Comparison between objective and subjective results 8 1 Mapping PESQ scores to subjective MOS Scores given to identical conditions in two subjective tests will not generally be equal It is necessary to take account of this fact in comparing subjective and objective scores Subjective votes are affected by such factors as the balance of the other conditions in a test or the individual preferences of each subject Since one subjective test cannot be directly compared with another it is impossible for an objective model such as PESQ to give exactly the same scores as every subjective test However the difference between two sets of scores for the same conditions is usually no more than a smooth curve plus small ideally random errors This curve can be thought of as a function that can approximately map one set of scores on to the other To preserve order this mapping should be monotonic one to one This section illustrates how PESQ scores may be mapped to subjective MOS using this method The techniques outlined in this section apply equally to PESQ score and PESQ LQ PESQ LQ is generally closer to listening quality MOS than PESQ score but the comparison between either value and MOS is affected by the same variability in subjective votes and hence MOS that is outlined above Figure 28 a plots the subjective condition MOS against the condition averaged
29. The LP spectrogram is more specialised and therefore more suitable to profiles 2 and 3 The format that you use for example an image where colour is related to loudness or as a 3 D surface is left to you A 4 17 Results option LP excitation User Guide section 3 2 17 PESQ Tools only The theory behind LP analysis requires an advanced understanding of digital signal processing theory The LP excitation is therefore recommended for products where the user is likely to be interested in the properties of the speech signal Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 70 of 73 Psytechnics PESO User Guide Release 2 1 A 4 18 Results option Speech activity related outputs User Guide section 0 PESQ Tools only These outputs will be of use to anyone interested in diagnosing the effects of error concealment algorithms or discontinuous transmission systems The clipping statistics can be shown directly whereas it is recommended that the clipping flags be plotted alongside the reference and degraded signals A 4 19 Results option Speech diagnostic outputs User Guide section 3 2 19 PESQ Tools only These outputs will be of use in products where the user may be interested in the properties of the speech signal for example a tool to aid the development of speech coding algorithms It is recommended that the pitch information and formants be plotted on a time frequency axis pair The power output and speech and voicing probabilities ca
30. a Ee Re nane ERR RUPEE o bear auae 50 6 1 Listening and conversational testing esses ener nren trennen nennen enne 50 6 2 Design of a subjective testiin mennen epe eee teet educere te M id pedore 51 6 2 1 OpiMON SCale iii eh tree pee d ai 51 6 2 2 Conditions iot eie edes oreet o etim au meti cade ve aa 51 6 2 37 Other Tacita E EE 32 6 3 Processing of speech material cra estet esee le Hee pice 53 6 4 Analysis Of results i4 secet t erret dr Res cesa Dal sb te ee keen 54 6 4 1 Condition mean opinion score nono nono nono none cone nenen eren nsn nee nne nete enne 54 6 4 2 Other MOS Measures oi eere tede A eed eee iaa 54 6 4 3 Purther statistical analysis iicet ieee eel Retest dicts 54 6 4 4 Further reading ica eel hee nya the Ge ee oe tien raed 54 TS RE 55 7 1 Background noise testing with PESQ ener nre nennen nennen 55 7 2 Subjective testing with background noise ooooocnoconoccnonononcnonnnonnonnnnnnncnn nooo nc nn enne nennen 55 8 Comparison between objective and subjective results 4 ecce e ecce eere ee eee eee eene eee enae 57 8 1 Mapping PESQ scores to subjective MOS ccoo nono cnn cnn ncnnncnn nec nennen netten 57 8 2 Correlati n coefficient uiia e te ee RR b EE ERR da Ie EE AT HARE ORE andes da cada 58 8 9 JResid alertorszc ie tote dt eR Itt nett Atem nte Autos is 58 9 Performance of PESO a inicias 59 9 1 Narrowband measurements err no
31. ame score should only be taken as a rough guide to the location and relative magnitude of distortions it is not meaningful to talk about subjective quality on such short time intervals Figure 9 Frame by frame score Score 3 2 6 Signal waveforms The signal waveforms plot the amplitude of each signal over time as shown in the example in Figure 10 Figure 10 Signal waveforms 6000 4000 2000 Amplitude Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 29 of 73 Psytechnics PESO User Guide Release 2 1 3 2 7 Sensation surfaces The sensation surfaces show the perceived loudness on the Sone scale of the signals in time and frequency The frequency scale is a modified Bark scale and time interval between successive samples is 16ms The sensation surfaces are very useful clearly showing the content of the signals The sensation surfaces are available both pre equalisation before either transfer function equalisation or equalisation for time varying gain have been applied and post equalisation The error surface and the PESQ disturbance parameters are calculated post equalisation has been applied An example sensation surface is shown in Figure 11 Band Band Figure 11 Degraded sensation surface 40 35 30 25 20 15 10 5 1 2 3 4 5 6 7 Time s Figure 12 Error surface 40 35 30 25 20 15 10 j 1 2 3 4 5 6 7 i Time s N A Issue 2 1 PRO PESQ00 210 ED0155 0 1 Page
32. ance Note 2 The labelling of speech clipping and noise is dependent on the voice activity decision and other classifiers different classifiers may give different results Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 42 of 73 3 2 19 Psytechnics PESO User Guide Release 2 1 Speech outputs PESO Tools Only PESQ Tools provides a number of speech diagnostics for both the reference and degraded signals These outputs relate to the production of the speech signal and are complementary to the excitation signal discussed in section 3 2 17 The outputs are calculated using overlapping 32ms Hann windows and are updated every 16ms The following outputs are provided vocal pitch in Hertz frequency of first four formants f f4 in Hertz power of 32ms window absolute value not dBov probability of voicing probability of speech calculated from the reference signal The formants are only calculated during periods of speech activity while the pitch is only calculated during periods of voiced speech Figure 24 Example speech and voicing probability pitch and formant estimates Speech Voicing Est probability 4000 Formants 2000 Frequency Hz Frequency Hz Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 43 of 73 Psytechnics PESO User Guide Release 2 1 4 Extensions to P 862 4 1 Choice of model The models that PESQ can implement are e PESQ release 1 4
33. aw PESQ score to P 862 1 MOS ooooonoconocccocccocncnoncooncnnncon conan cono rennen 22 Figure 5 Mapping between PESQ score and PESQ Ie seen nennen 23 Figure 6 Frame by frame delay esee eene ener nennen 27 Elgure 7 Transfer function 2 A ated teens ms 28 Figure 8 Frame by frame disturbance eese enne ener en rennen nre 28 Figure 9 Frame by fr me Score eo ac e eR ERU ERR bee b nete edes 29 Figure 10 Signal Waveforms 2 2 eei Ri deti be ett tpe e Le toe ea i He Mee liga E lisis 29 Figure 11 Degraded sensation surface re ener enne nennen tnter nennen 30 Figure 12 Error Surface e Ree A ee etre Do Ue e Ee eet eres 30 Figure 13 Utterance by utterance delay ssseeeeeeeeeeeee eene nennen nennen eren 32 Figure 14 Utterance by utterance level eese eree enne con nennen 33 Figure 15 Speech spectrum of reference and degraded signals eee 35 Figure 16 Linear spectrum of reference and degraded signals eee 36 Figure 17 Transfer function estimates ener AAE nennen tenerent neret 37 Figure 18 Colierenc TUNCUON comica ii dece hen Ue HO Sore aa 37 Figure 19 Impulse response estimate rear nnne entre nne tenete 38 Figure 20 Linear spectrogram of degraded signal eese 39 Figure 21 LPC spectrogram of degraded signal
34. c disturbance as single values for each condition Results option Frame by frame quality score User Guide section 3 2 5 As an alternative to the symmetric and asymmetric disturbance values you may wish to present the simpler frame by frame quality score In this case you must include comments on the limitations of this output as given in section 3 2 5 of the User Guide Results option Signal waveforms User Guide section 3 2 6 The signal waveforms show the amplitude and timing of the signals You may choose whether or not to display them You are encouraged where possible to provide an option to play back the original and degraded files Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 66 of 73 A 4 7 A 4 8 A 4 9 A 4 10 A 4 11 Psytechnics PESQ User Guide Release 2 1 Results option Sensation surfaces User Guide section 3 2 7 You are encouraged to show the sensation surfaces and error surface see section A 4 8 It is often useful to present either one or both of the sensation surfaces or the reference signal waveform alongside the error surface so that the location of error events may be easily seen The format that you use for example an image where colour is related to loudness or as a 3 D surface is left to you You may also wish to include in your documentation some sample sensation and error surfaces so that users can learn how to interpret the images Examples of different types of distortion
35. cessive frames of 50 Effectively therefore each frame is 16ms long this can be thought of as sampling the values every 16ms PESQ calculates the delay in each frame based on the nearest utterance Because it models the processing used in PESQ the frame by frame delay is the best way of tracking how delay varies during the signal Delay changes are most likely to be caused by jitter buffer adaptation in VoIP telephony edge devices This adaptations occur when there is a large change in the jitter on an IP network As jitter on the VoIP network increases the delay measured by PESQ Tools will typically increase as the jitter buffer grows in size As the jitter decreases the delay measured will typically decrease as the jitter buffer decreases in size Figure 6 plots the frame by frame delay for the same condition as shown in Figure 13 Figure 6 Frame by frame delay Delay ms Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 27 of 73 Psytechnics PESO User Guide Release 2 1 3 2 3 Bark scale transfer function The system s transfer function in dB is estimated for each of the 42 perceptual frequency bands at 8kHz sample rate 49 bands at 16kHz sample rate A typical transfer function is shown in Figure 7 Note that the transfer function is calculated after level alignment has been performed Constant gain in the system under test will therefore not be shown in the transfer function estimate The overall dB gai
36. cope of this document examples are given here of the processing stages for two types of condition Simulated condition 1 9 Doe ml gx cU RS d Record original speech material using high quality microphone in quiet conditions Send filtering e g modified IRS and level alignment e g to 26dBov Add environmental noise at appropriate level if required Downsample to 8kHz the sample rate at which the codec simulations operate Apply coder Channel error insertion Apply decoder If multiple transcodings are simulated a filter and an arbitrary delay may be inserted to make the transcodings asynchronous then the coder error decoder stages are repeated Upsample to 16kHz for presentation in subjective test checking for clipping 10 Verify that active speech level lies within bounds Measured condition Record original speech material using high quality microphone in quiet conditions Send filter and level align to calibrated level for measurement system Set up connection Play out original signal at 16kHz sample rate Record degraded output of system at 16kHz sample rate Adjust level to calibrated active speech level e g 26dBov Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 53 of 73 Psytechnics PESO User Guide Release 2 1 6 4 Analysis of results 6 4 1 Condition mean opinion score The key measure of quality is the average of votes across all subjects and all files given to each condition Thi
37. d codecs IEEE International Conference on Acoustics Speech and Signal Processing ICASSP May 2001 Rix A W Proposed Annex B to Recommendation P 862 Application of PESQ to speech quality assessment of wideband telephone networks and speech codecs ITU T Study Group 12 Contribution COM12 36 August 2001 Psytechnics website http www psytechnics com 10 2 Subjective testing Methods for subjective determination of transmission quality ITU T Recommendation P 800 1996 Modulated noise reference unit MNRU ITU T Recommendation P 810 1996 Subjective performance assessment of telephone band and wideband digital codecs ITU T Recommendation P 830 1996 Issue 2 1 PRO PESQOO 210 EDOISS 0 1 Page 62 of 73 Psytechnics PESO User Guide Release 2 1 10 3 Statistics Kreyszig E Advanced engineering mathematics McGraw Hill ge edition 1998 Peebles P Probability random variables and random signal principles McGraw Hill 3 edition 1993 Dunn O J Applied statistics analysis of variance and regression Wiley 2 edition 1987 Snedecor G W and Cochran W G Statistical methods lowa State University Press ds edition 1989 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 63 of 73 Psytechnics 11 Glossary ACR ASL CCR DCR DTX HATS IRS ITU T LPC LQ MIRS MNB MNL MNRU MOS PESQ PSQM RMS SNR VAD PESQ User Guide Release 2 1 Absolute Category Rating a method for subj
38. de Release 2 1 7 Noise testing For certain types of network especially mobile it may be important to evaluate the quality of transmission when the original signal is corrupted by background noise For example a low bit rate coder optimised to transmit speech may produce strange sounding distortions when noise is present PESQ may be used for testing transmission quality in the presence of noise as described in this section We would like to emphasise that conducting subjective tests with background noise conditions is more difficult than conventional subjective testing and should be approached with caution 7 1 Background noise testing with PESQ Five different tests can be made with PESQ to evaluate the effect of noise on the quality of a given codec or system 1 No noise or coding This gives the baseline quality with no distortion PESQ scores are normally 4 5 in this case 2 Noise only no coding This gives the effect on quality of the noise alone and is important because the presence of the noise itself may be the largest factor Coding only no noise This gives the quality of the system with clean speech 4 Noise added before coding at input to system This gives the quality of the system when transmitting noisy speech 5 Noise added after coding at output of system This separates the effect of the noise from the effect of noise on the system In all cases the reference signal supplied to PESQ should be
39. degraded signals This is given in dBov By comparing the level of each utterance in the reference and degraded signals it is clear if the gain is changing during the measurement Gain variation can appear as a consequence of any of the following e automatic level control ALC e dynamic noise reduction e strong filtering e g in an analog connection An example plot showing the effect of ALC is shown in Figure 14 Figure 14 Utterance by utterance level Utterance level dBov Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 33 of 73 Psytechnics PESO User Guide Release 2 1 3 2 12 Signal level and gain PESO Tools Only PESQ Tools provides various measures of level amplitude which are calculated separately for the reference and degraded signals The measures and a description of each are shown in Table 3 From these values some additional quantities are derived which are shown in Table 4 Note 1 For computing most PESQ Tools parameters such as level spectrum transfer function and speech parameters the degraded signal is aligned in time with the reference signal The computation is based on the parts of the two signals that overlap This means that some measures of the degraded signal will give slightly different results from measures computed without this time alignment However this process makes it much more convenient to compare the reference and degraded signals Note 2 Level measures are com
40. e E 70 A 4 15 Results option Transfer function estimation sese rennen ren enne 70 A 4 16 Results option Signal spectrograms ener ener 70 A 4 17 Results option LP excitation eere nennen ener nennen tenete tne aaa 70 A 4 18 Results option Speech activity related outputs oooncnnccnoncconnconnnoncnonnnonononnncnnn conc conc eene 71 A 4 19 Results option Speech diagnostic outputs ooooonccnoncconcconncnnnnnonncnnnnnn nono nc nn nono nono aconncnnncnnncnnn ninos 71 Ad Extensions to P 862 5 ete ed teile Let ete a illa eA tee Ho bates bas Ert HR obe Lodge 72 A 6 Notes on speech signals A A ie ee tete date e deste ARES 72 A 7 Overview of subjective testing essei dee pecie vo desi pe erepto rre Doa e Lc a cannes 72 AS NOISE TeStin oce edet e RO e dee ied ele aide ee eee 72 A 9 Comparison between objective and subjective results essere 72 A TO Pertormance of PESQ 552 cen irn tenere ee ach ie a nena 72 ASAT References ot eot ete nee tin ie Do OE RA o CO des 72 LUEGO 73 A 13 Document details etg ree de tet rie eU e EH ED HESSE IRR Rr edd 73 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 7 of 73 Psytechnics PESO User Guide Release 2 1 Figures Figure 1 Usma PESO cuida ie 16 Figure 2 Processing performed in PESQ nono a e e ia eene iie 18 Figure 3 Mapping from raw PESQ score to PESQ LQ eese nennen nennen 21 Figure 4 Mapping from r
41. e tee rere Sterne eei ln 37 3 2 16 Signal spectrOgrams inen eben te e Ht pbi SEE Ee tee Saba pub bead seb eee don 39 92 TT EPG excltauon oret EM Rui tede e eS 40 3 2 18 Speech activity related outputs ener rennen nnne trenes 41 3 2 19 Speech outputs 4 2 eue RU eerte I tete ede ste pe deer ne eger ede 43 4 Extensions to ESO dinastia 44 4L Choice of model tetitas della 44 4 2 PESQ input filters rios 45 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 5 of 73 Psytechnics PESO User Guide Release 2 1 Background and Advanced Information cssscsssscssssscssscssssscsssssssssscssssssssscsssscssssssssesssesssssssessses 47 5 Not s on speech signals rr eoee eter h ersten onore Foo Feet eee eo et ha sebis toues ossos nb e rabo po eee aora eoa eene ea eR Penna 48 SLL Properties of test signals terree eere e e Eee e Eae t etel 48 O12 Temporal Structure citaciones 48 5 1 3 Leveland frequency content 1 cesses ttd tenet de ede e dede to Saab crede 48 5 L4 Source material uoce Heber step deste eese seeded aTe 48 5 1 5 Duration of an individual recording eese eene 49 3 10 Multiple measurements ses 2 er terret rte eoe cbr t preteen eroe 49 JT Reference signal dices e eet es te eu ete ed Me itg 49 3 188 Degraded signal tete te ere tee epe eo e nd te etr t eset 49 6 OVerview OF subjective testing eee orici tn rear begun eYao Pa obe Toon ee sesse nai Ur ko e
42. each condition The results are calculated per condition unless otherwise stated Table 5 Average and worst case correlation coefficient for 38 subjective tests known during PESQ development sub divided by test type No tests Type Corr coeff PESQ PSQM PSQM MNB 19 Mobile average 0 962 0 924 0 935 0 884 network worst case 0 905 0 843 0 859 0 731 9 Fixed average 0 942 0 881 0 897 0 801 network worst case 0 902 0 657 0 652 0 596 10 VoIP average 0 918 0 674 0 726 0 690 multi type worst case 0 810 0 260 0 469 0 363 Table 6 Error distribution across all 38 known subjective tests Absolute error range 0 25 lt 0 5 lt 0 75 lt 1 0 1 25 errors in range 74 7 93 9 99 3 99 9 100 0 PESQ errors in range 54 6 82 3 92 1 96 7 98 7 PSQM errors in range 59 6 84 5 93 7 97 2 98 9 PSOM errors in range 46 1 74 5 89 4 96 1 98 9 MNB Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 59 of 73 Psytechnics PESO User Guide Release 2 1 Table 7 Correlation coefficient 8 unknown subjective tests PESQ only Test Type Corr 1 Mobile real network measurements 0 979 2 Mobile simulations 0 943 3 Mobile real networks per file only 0 927 4 Fixed simulations 4 32 kbit s codecs 0 992 5 Fixed simulations 4 32 kbit s codecs 0 974 6 VoIP simulations 0 971 7 Multiple network types simulations 0 881 8 VoIP frame erasure concealment simulations 0 785 Table 8 Error distribution 7
43. ech quality assessment of narrow band telephone networks and speech codecs ITU T Recommendation P 862 Geneva February 2001 Rix A W Hollier M P and Gray P Predicting speech quality of telecommunications systems in a quality differentiated market 6 IEE Conference in Telecommunications ICT 98 IEE conference publication 451 156 160 1998 Rix A W Bourret A and Hollier M P Modelling human perception BT Technology Journal 17 1 24 34 January 1999 Rix A W Reynolds R and Hollier M P Perceptual measurement of end to end speech quality over audio and packet based networks 706 Audio Engineering Society Convention pre print no 4873 May 1999 Rix A W and Hollier M P Perceptual speech quality assessment from narrowband telephony to wideband audio 707 Audio Engineering Society Convention pre print no 5018 September 1999 Rix A W Reynolds R and Hollier M P Robust perceptual assessment of end to end audio quality IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 39 42 October 1999 Rix A W Proposed modification to Draft P 862 to allow PESQ to be used for quality assessment of wideband speech ITU T Study Group 12 Delayed Contribution COM12 D007 February 2001 Rix A W Beerends J G Hollier M P and Hekstra A P Perceptual evaluation of Speech Quality PESQ a new method for speech quality assessment of telephone networks an
44. ective rating of quality in tests Active Speech Level Comparison Category Rating a method for subjective rating of quality in tests Degradation Category Rating a method for subjective rating of quality in tests Discontinuous transmission Head And Torso Simulator Intermediate Reference System International Telecommunications Union Telecommunication Standardisation Sector Linear predictive coding Listening Quality Modified IRS Measuring Normalising Blocks an earlier model for assessing speech quality of codecs Mean Noise Level Modulated Noise Reference Unit Mean Opinion Score Perceptual Evaluation of Speech Quality An algorithm described in ITU T recommendation P 862 Perceptual Speech Quality Measure an earlier model for assessing speech quality of codecs Root Mean Square Signal to Noise Ratio Voice activity detector End of sample userguide Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 64 of 73 Psytechnics PESO User Guide Release 2 1 Guidelines Not to be included in end user documentation for licensee use only Issue 2 1 PRO PESQOO 210 EDOISS 0 1 Page 65 of 73 Psytechnics PESO User Guide Release 2 1 A Guidelines for the use of the sample user guide by the licensee These guidelines should be read by all Licensees of the Psytechnics distribution of PESQ Perceptual Evaluation of Speech Quality They contain notes on creating end user documentation for products
45. etworks include elements that cannot reliably be assessed by such conventional engineering metrics as signal to noise ratio Examples of such elements include lossy coding error prone channels and voice activity detection One way to measure customers perception of the quality of these systems is to conduct a subjective test involving panels of human subjects However these tests are expensive and unsuitable for such applications as real time monitoring PESQ provides an objective measure that predicts the results of subjective listening tests on telephony systems To measure speech quality PESQ uses a sensory model to compare the original unprocessed signal with the degraded version at the output of the communications system This process is shown in Figure 1 and is explained in more detail in the next section The result of comparing the reference and degraded signals is a quality score This score is analogous to the subjective Mean Opinion Score MOS measured using panel tests according to ITU T P 800 The PESQ scores are calibrated using a large database of subjective tests Optionally PESQ can be used to provide other diagnostic information if required PESQ incorporates many new developments that distinguish it from earlier models for assessing codecs for example PSQM and MNB ITU T P 861 These innovations allow PESQ to be used with confidence to assess end to end speech quality as well as the effect of individual elements such
46. gn of a subjective test should attempt to control these to prevent them from influencing the condition opinion score The following are the most important of these variables Talker dependence Because different people s speech may be distorted in different ways it is usual to pass speech from four different talkers two adult male two adult female through each condition Subjects hear each condition four times at different stages in the test with speech from each of the four talkers Material dependence Different sections of speech may be distorted in different ways For example a frame erasure event may be less audible if it coincides with an unvoiced part of speech as opposed to a voiced part of speech Recently practice in subjective testing has moved to control this effect by using partially or fully factorial designs which evaluate three or more different recordings from each talker for a given condition Different groups of subjects hear a different combination of source speech material and condition This appears to give more consistent results than using only one recording from each talker for a given condition Order dependence A subject s vote for a given condition will depend to some extent on the last few conditions heard This effect may be partially controlled by scrambling the order Ideally a different order should be used for every subject otherwise there is danger that the subjective results could show a bias that is
47. he modulus of the gain in the averaging process rather than the complex value e The spectral difference transfer function estimate is derived from the ratio of the power of the output signal to the power of the input signal in each frequency band An example of linear phaseless and spectral difference transfer function estimates is shown in Figure 17 e The coherence spectrum provides an indication of the linearity of the system under test in each frequency band as shown in Figure 18 e The time domain transfer function is an estimate of the impulse response of the system under test It is derived by taking the inverse Fourier transform of the linear transfer function described above but using the complex value rather than the modulus of the mean gain for each band An example of a time domain transfer function estimate is shown in Figure 19 Figure 16 Figure 19 all relate to the same test condition Figure 17 Transfer function estimates T T T Linear Spectral difference Phaseless Transfer function gain dB bef no i i a 3000 4000 5000 6000 7000 8000 Frequency Hz 0 1000 2000 Figure 18 Coherence function Linear coherence 0 1000 2000 3000 4000 5000 6000 7000 8000 Frequency Hz Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 37 of 73 Psytechnics PESO User Guide Release 2 1 Figure 19 Impulse response estimate Linear impulse response Time
48. he telephone handset using an input filter This takes account of the effect of the electrical and acoustic components of the handset The filter used is similar to the IRS receive characteristic ITU T P 48 Time alignment The system under test may include a delay which may be variable In order to compare the reference and degraded signals they need to be lined up with each other PESQ applies voice activity detection to the signals to identify those parts of the signal that are speech ignoring noise Time alignment is then done in three stages e First PESQ aligns the overall speech signals utterances An utterance is a continuous speech burst identified by the voice activity detector that does not contain pauses longer than a pre determined threshold 200ms This process detects delay over major sections of the degraded signal compared to the reference signal e Second PESQ aligns overlapping sections of the speech frames This process detects delay that is variable over the length of an utterance as this can be significant in packet based networks e The third stage does not occur immediately after the second stage but is performed after the auditory transform has been calculated The third stage realigns bad intervals sections of the speech with very large disturbance and improves the model s accuracy with a small number of files where delay changes are not correctly identified by the initial time alignment process I
49. ibution 7 unknown subjective tests PESQ only sene 60 Table 9 Overall correlation of wideband PESQ with subjective test results sese 60 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 9 of 73 Psytechnics PESO User Guide Release 2 1 This page has been left intentionally blank This page has been left intentionally blank Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 10 of 73 Psytechnics PESO User Guide Release 2 1 1 Introduction 1 1 About this document This document is an overview and user manual for the Psytechnics distribution of PESQ Perceptual Evaluation of Speech Quality Licensees of Psytechnics Ltd may customise it for their own end users in accordance with their licences and the guidelines provided at the end of this document The guidelines should be read by all Licensees of the Psytechnics distribution of PESQ Perceptual Evaluation of Speech Quality They contain notes on creating end user documentation for products that include PESQ or PESQ Tools The following documentation is also available on PESQ and PESQ Tools e PESQ and PESQ Tools Code documentation Contains detailed documentation of the PESQ and PESQ Tools code and API It is intended for use by engineers integrating PESQ and PESQ Tools into an end user product Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 11 of 73 Psytechnics PESO User Guide Release 2 1 1 2 A guide to this document This sample User
50. ich are optimised to best reflect the spread of the data Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 31 of 73 3 2 10 Psytechnics PESO User Guide Release 2 1 Utterance by utterance delay measures PESO Tools Only An overview of the PESQ time alignment operation is given in section 2 3 It generates two sets of results the utterance by utterance and frame by frame delay values In order to deal with variable delay PESQ sub divides the signal into a number of utterances Each utterance is time aligned separately The calculation returns for each utterance e the estimated delay in samples e adelay confidence between O no confidence and 1 full confidence e the utterance start sample index e the utterance end sample index These quantities enable the variation of delay throughout the recording to be plotted An example is shown in Figure 13 The utterance by utterance results are a preliminary set of values and the frame by frame delay values section 3 2 2 are the values actually used in calculating the quality score Figure 13 Utterance by utterance delay Delay ms Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 32 of 73 3 2 11 Psytechnics PESO User Guide Release 2 1 Utterance by utterance level PESO Tools Only To analyse the effect of time varying processes such as automatic level control PESQ Tools includes measurements of the active speech level of each speech utterance in the reference and
51. le 9 Overall correlation of wideband PESQ with subjective test results Experiment P905 1 P905 24 P905 2b AES107 Per condition correlation coefficient between wideband 0 952 0 981 0 977 0 949 PESQ and subjective MOS per condition after third order mapping Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 60 of 73 Psytechnics PESO User Guide Release 2 1 Supplementary Information Issue 2 1 PRO PESQOO 210 EDO0155 0 1 Page 61 of 73 Psytechnics PESO User Guide Release 2 1 10 References 10 1 Objective speech quality assessment Wang S Sekey A and Gersho A An objective measure for predicting subjective quality of speech coders IEEE Journal on Selected Areas in Communications 10 5 819 829 1992 Hollier M P Hawksford M O and Guard D R Characterisation of communications systems using a speech like test stimulus Journal of the Audio Engineering Society 41 12 1008 1021 1993 Beerends J G and Stemerdink J A A perceptual speech quality measure based on a psychoacoustic sound representation Journal of the Audio Engineering Society 42 3 115 123 1994 Hollier M P Hawksford M O and Guard D R Error activity and error entropy as a measure of psychoacoustic significance in the perceptual domain IEE Proceedings Vision Image and Signal Processing 141 3 203 208 1994 Perceptual evaluation of speech quality PESQ an objective method for end to end spe
52. model narrowband handset on reference and degraded signals e Backwards compatible PESQ version 1 0 model narrowband handset on reference and degraded signals e HATS ear recording on degraded signal unprocessed wideband reference signal e Wideband model headphone listening The default is the PESQ release 1 4 model Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 44 of 73 Psytechnics PESO User Guide Release 2 1 4 2 PESQ input filters Depending on the choice of model section 4 1 PESQ determines internally which input filter to apply In the standard narrowband PESQ measurements an input filter is applied to both the reference and degraded files before time alignment and psychoacoustic processing The filter used which is similar to the modified IRS receive filter specified in P 830 is shown in Figure 25 This is an approximation to the filter characteristic of a telephone handset Figure 25 PESQ narrowband input filter characteristic 20 0 1000 2000 3000 4000 Frequency Hz For wideband measurements a filter with a flat response above 100Hz and a gentle roll off below this point is used This models the attenuation of the headphones and ear at low frequencies The response of the 16kHz implementation is shown in Figure 26 The 8kHz implementation has the same gain within 0 1 dB in the 1Hz 4kHz range For HATS measurements using a telephone handset the standard narrowband input filter is applied
53. ms Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 36 of 73 3 2 16 Psytechnics PESO User Guide Release 2 1 Signal spectrograms PESO Tools Only A spectrogram is a two dimensional output that comprises a time sequence of frequency spectra PESQ Tools provides two spectrograms for the reference and the degraded signals The linear spectrogram is a sequence of Fourier transform spectra which are calculated every 16ms using overlapping 32ms Hann windows An example linear spectrogram is shown in Figure 20 The linear predictive coding LPC spectrogram is a sequence of spectra derived by calculating the Fourier transform of 16 order LPC coefficients The LPC coefficients are generated from the input signals every 16ms using a Hamming Window An example LPC spectrogram is shown in Figure 21 Figure 20 Linear spectrogram of degraded signal 0 3500 40 3000 20 2500 30 2000 40 1500 50 1000 60 500 70 20 1 2 3 4 5 6 7 d Time s Frequency Hz Figure 21 LPC spectrogram of degraded signal 3500 3000 2500 2000 1500 1000 500 0 0 1 2 3 4 5 6 7 Time s Frequency Hz Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 39 of 73 3 2 17 Psytechnics PESO User Guide Release 2 1 LPC excitation PESO Tools Only PESQ Tools generates a time domain excitation signal for both the reference and degraded input The excitation of a speech signal is the residual signal generated by filtering it with a time varying linea
54. n of the system can be found using the signal level measures section 3 2 9 Figure 7 Transfer function Gain dB 500 2000 Frequency Hz 3 2 4 Perceptual parameters PESQ computes two parameters that describe the amount and distribution of audible errors e Symmetric disturbance e Asymmetric disturbance These values are returned both frame by frame and as averages Both types of disturbance range between 0 no distortion and 45 maximum An example plot of the frame by frame disturbance parameters is shown in Figure 8 Note that PESQ usually ignores the silent periods at the start and end of any signal which is why both disturbance values go to zero at the end of this example Figure 8 Frame by frame disturbance Symmetric Asymmetric Disturbance Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 26 of 73 Psytechnics PESO User Guide Release 2 1 3 2 5 Frame by frame score Frame by frame quality score is calculated from the frame by frame symmetric and asymmetric disturbance values to provide a simpler way to interpret distortions An example of frame by frame score is shown in Figure 9 corresponding to the same condition as Figure 8 Note that the PESQ score is not a simple average of the frame by frame score A complex non linear averaging process is applied separately to obtain the average symmetric and asymmetric errors and the PESQ score is derived from these Frame by fr
55. n be computed as follows 4 999 0 999 y 0999 14 g 195660 The graph of the P 862 1 function is presented in Figure 4 For more information on this mapping please see ITU T recommendation P 862 1 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 22 of 73 Psytechnics PESO User Guide Release 2 1 2 4 7 Typical quality scores Based on simulations and real measurements Table 2 presents the results of a number of typical networks and codecs with no errors or packet loss In addition it gives the scores that can be expected in some mobile network conditions where errors are significant Table 2 Typical PESO scores for a range of conditions Network condition Typical PESQ Typical PESQ score LQ score Clean ISDN network 4 3 4 4 Analogue network G 711 4 1 4 2 G 728 codec 16kbit s 3 8 3 9 G 729 codec 8kbit s 3 6 3 7 G 723 1 codec 6 3kbit s 3 5 3 4 GSM EFR codec 12 2kbit s 3 9 4 0 GSM FR codec 13kbit s 3 5 3 5 GSM EFR mobile network in 3 6 to 3 1 3 6 to 2 9 typical operating range GSM EFR mobile network in 2 2 1 6 very poor conditions Note Results can be affected by a number of factors for example the test signal used We averaged the scores from measurements with different speech material in four languages Each measurement was 8s long and used clean speech The speech signals at the input to the network were MIRS send filtered and were at an active speech level of 26 dBov 2
56. n be plotted alongside the time frequency representation using the same time axis Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 71 of 73 Psytechnics PESO User Guide Release 2 1 A 5 Extensions to P 862 User Guide section 4 You are recommended to include the information in section 4 of the User Guide when your users can use the additional facilities in PESQ for Head and Torso Simulator HATS ear measurements and wideband telephony measurements for example in profile 3 Please note that relatively little testing has been done on the performance of PESQ with these alternative models performance results from 4 wideband telephony experiments are given in the User Guide HATS and wideband applications should therefore be approached with care and after appropriate training A 6 Notes on speech signals User Guide section 5 You are recommended to include the information in section 5 of the User Guide as part of your user documentation for products designed for profile 2 or profile 3 unless the users are only going to be applying a test signal that was prepared by you according to the guidelines given in this section A 7 Overview of subjective testing User Guide section 6 You are recommended to include the information in section 6 of the User Guide as part of your user documentation for products designed for profile 2 or profile 3 A 8 Noise testing User Guide section 7 You are recommended to include the information in
57. nce and degraded signals are as follows speech spectrum speech active periods only noise spectrum silent periods only e average spectrum of the whole signal The level of each frequency band is in dBov Examples of the speech and noise spectrum for reference and degraded signals are shown in Figure 16 Figure 16 Linear spectrum of reference and degraded signals Reference speech spectrum Reference noise spectrum Degraded speech spectrum Degraded noise spectrum En o o Power spectral density dBov ER N o i i i i l i 0 1000 2000 3000 4000 5000 6000 7000 8000 Frequency Hz EN p o Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 36 of 73 3 2 15 Psytechnics PESO User Guide Release 2 1 Transfer function estimation PESO Tools Only Five different transfer function estimates are provided four long term spectra and one time domain signal The frequency scale of the spectra is linear the values provided for each frequency band in the spectra represent gain and are given in dB e The linear transfer function is an estimate of the transfer function between the input and output of the system under test The value provided for each frequency band is the modulus of the mean complex gain for that band The complex gain values used in the averaging process are calculated every 16ms using a Fourier transform e The phaseless transfer function is similar to the linear transfer function but uses t
58. ne 67 A 3A Level aligninent zoe pe a eine etc ode e e ep dre re tees 67 A 3 Imea hlgnment 5 RTS 67 3 6 Results quality Scores uni le te Ped Eta tenes 67 A 4 Advanced use including PESQ Tools sese ennt ennt een 68 A 4 1 Input option Model specification essent nennen nne rennen 68 A 4 2 Results option Frame by frame Delay seen 68 A 4 3 Results option Bark scale transfer fUNctiON ooooonnccnocononaciononononononancnnnnnnnconnncnnnnnnn nro corn rennen 68 A 4 4 Results option Perceptual parameters sees net nennen 68 A 4 5 Results option Frame by frame quality score ooooconoccnocococcconncnonananonanonanconnncnnnnonn crac eene 68 A 4 6 Results option Signal waveforms sese nennen nennen nentes 68 A 4 7 Results option Sensation surfaces sera eene nennen nenne nnne enne 69 A48 Results option Error surface sesine acrea aet etg eee EL bete REL iore eet 69 A 4 9 Results option Frame by frame delay statistics esee 69 A 4 10 Results option Utterance by utterance delay esee rene 69 A 4 11 Results option Utterance by utterance level eene 69 A 4 12 Results option Signal level and gain measures esee 70 A 4 13 Results option Bark signal spectra er ene nennen nnnm nennen 70 A 4 14 Results option Linear spectantia rni hneda a a e a e
59. nus MNL of reference signal May differ from the system gain if noise is added or suppressed Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 34 of 73 3 2 13 Psytechnics PESO User Guide Release 2 1 Bark signal spectra PESO Tools Only The Bark signal spectra are calculated using the Bark frequency scale These measures can be used to compare the spectrum of different signals and compare speech and noise The spectra returned for the reference and degraded signals are as follows speech spectrum speech active periods only e noise spectrum silent periods only average spectrum of the whole signal The level of each frequency band is in dBov The centre frequency in Hz of each Bark band is also returned and this can be used to plot the data on a linear frequency scale as shown in Figure 15 Figure 15 Speech spectrum of reference and degraded signals 30 E E E Reference 40 E E Degraded 60 70 80 90 Spectral level in each Bark band dBov 100 2000 2500 3000 3500 4000 Frequency Hz 0 500 1000 1500 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 35 of 73 3 2 14 Psytechnics PESO User Guide Release 2 1 Linear spectra PESO Tools Only The linear signal spectra are calculated using a linear frequency scale These measures can be used to compare the spectrum of different signals and compare speech and noise The spectra returned for the refere
60. o delay or handset sidetone In conversational tests pairs of subjects hold a conversation over a test network connection before voting on its quality These measurements take account of the whole link including handsets and sidetone echo level and delay impairment Conversational tests are generally more expensive than listening tests and a single conversational test is only able to investigate a small number of conditions PESQ on its own is a listening model so PESQ quality scores do not normally take account of the conversational factors level talker echo delay and sidetone However information on level and delay may be gained from the PESQ level and delay values if the measurement setup is appropriately calibrated Other techniques can be used to estimate level echo and delay Sidetone can often be assumed constant based on typical equipment used in a given country Conversational factors may be important in some circumstances In particular if a network introduces significant level changes attenuation or gain or if it has audible talker echo or large delays it may be appropriate to consider measurements of these factors as well as PESQ scores For example voice over IP transmission equipment may often improve listening quality by increasing buffer length introducing greater delay This causes greater conversational impairment and since the network is most likely to be used for two way communication this change in delay should al
61. obtained from PESQ or PESQ Tools Plots of some outputs are provided giving example results from interesting network conditions Notation A number of outputs are returned using the dBov scale This is defined such that a square wave of amplitude equal to the maximum possible value of a 16 bit PCM signal has a level of OdBov A difference between two dBov quantities has the units dB In some cases a value in dBov or dB cannot be computed for example if the degraded file contains digital silence In these cases a value of 999 0 is returned The following outputs are available in PESQ and PESQ Tools e Frame by frame delay section 3 2 2 e Transfer function and signal spectra section 3 2 3 e Perceptual parameters section 3 2 4 e Frame by frame score section 3 2 5 e Signal waveforms section 3 2 6 e Sensation surfaces section 3 2 7 e Error surface section 3 2 8 The following outputs are only available in PESQ Tools e Frame by frame delay statistics section 3 2 9 e Utterance by utterance delay measures section 3 2 9 e Signal level and gain section 3 2 9 e Utterance by utterance level section 3 2 11 e Bark signal spectra section 3 2 13 e Linear spectra section 3 2 14 e Transfer function estimation section 3 2 15 e Signal spectrograms section 3 2 16 e LPC excitation section 3 2 17 e Speech activity related outputs section 0 e Speech outputs section 3 2 19 Issue 2 I PRO PESQOO 210 EDO0155 0 1
62. of noise on a transmission system may be tested by including the following conditions in the test analogous to those described in the previous section l clean speech unprocessed 2 noisy speech unprocessed 3 clean speech coded 4 noisy speech coded These conditions allow tests of the subjectivity of several factors the noise alone B compared to A the system with clean speech C compared to A and the system with noisy speech D compared to B Several different types or levels of noise can be assessed in a test although of course there is only need for a single set of clean speech conditions This type of test normally uses the MNRU as a reference with clean speech only DCR and CCR methods may also be used In this case the reference signal that subjects hear may be the noise free unprocessed speech standard methods or the noisy unprocessed speech the so called modified methods These methods allow a comparison similar to that possible with the ACR methods although ACR requires a shorter listening time for each condition The background noise tests used in PESQ calibration were all conducted with the ACR methods In this case the reference signal presented to PESQ is the clean unprocessed speech If the results of a subjective test including environmental noise are to be compared with PESQ scores it is strongly recommended that either the ACR listening quality or the ACR listening effort method is used Issue 2 I PRO PE
63. on Model specification User Guide section 2 2 3 You may offer a switch to select the PESQ version 1 operation mode model 1 However we recommend that the default processing should be the new process in PESQ release 1 4 model 0 and we request that you make model 0 the default Level alignment User Guide section 2 3 Level alignment is integral to PESQ and there must be no option to change this Time alignment User Guide section 2 3 You must not offer any option to alter the way in which time alignment is performed Although a small improvement in processing speed may be gained by preventing PESQ from testing for delay changes during speech this could cause the PESQ scores to be significantly in error if delay changes do actually occur If processing speed is a major problem even after fully optimising your code you should contact Psytechnics Results quality scores User Guide section 2 4 You are encouraged to offer PESQ score PESQ LQ and PESQ Ie as the outputs of the model You may use the descriptions in section 2 4 of the User Guide including the formula for PESQ LQ and the reference to ITU T Recommendation P 834 in your documentation Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 67 of 73 Psytechnics PESO User Guide Release 2 1 A 4 Advanced use including PESQ Tools A 4 1 A 4 2 A 4 3 A 4 4 A 4 5 A 4 6 User Guide section 3 The information referred to in this section will be
64. oose the appropriate profile A 2 PESQ Tools The PESQ Tools option greatly extends that range of diagnostic outputs provided by PESQ The use of PESQ Tools is therefore highly recommended for Profile 2 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 66 of 73 Psytechnics PESO User Guide Release 2 1 A 3 Inputs and Outputs for basic use of PESQ A 3 1 A 3 2 A 3 3 A 3 4 A 3 5 A 3 6 User Guide section 2 You are recommended to include the information referred to in this section as part of your user documentation for all profiles 1 3 For a device according to profile 1 you may choose to provide only this information and omit the background and advanced use information referred to in the following sections Input option Speech signals User Guide section 2 2 1 Implementations may vary on how test signals are stored and passed to PESQ Your documentation should explain the format s that you choose to offer in your product Input option Sampling Rate User Guide section 2 2 2 Implementations may vary in how the sample rate is specified and whether there is any default value For measurement devices that only operate at one sample rate there is no need to offer this option or to provide information on the choice of sample rate If the model detects sample rate by other means for example from a wav format file header the documentation should still discuss the issues related to choice of sample rate Input opti
65. ould be made of a given condition to allow time varying quality or material dependence to be assessed The PESQ scores for all recordings on a given condition can be averaged to give a view of the overall quality and the individual scores show quality variation over the condition due to material or time dependence If artificial speech is being used the measurement should be at least 28s long It is recommended that this be split into three or four files If natural recorded speech is used as a test signal 32s should be regarded as a minimum 8s for each of four talkers and if possible up to two minutes 16 recordings of 8s duration should be used 5 1 7 Reference signal The reference provides PESQ with information on how the original unprocessed signal should sound The file must contain samples at 8kHz or 16kHz sample rate This data should normally be stored as 16 bit integers The reference should be distortion free so that PESQ can assess the quality of the system under test The reference can often be exactly the same file that is passed through the distorting system Certain types of pre processing make little difference in practice to PESQ scores especially filtering with the modified IRS send characteristic or level adjustment as long as quantisation errors remain small Various types of noise may be added to evaluate the system s performance at transmitting noisy speech In these situations the reference that is used with PES
66. pping clipping Statistics are provided for the following clipping events e All types of clipping e All types of clipping excluding front end e Front end clipping only e Back end clipping only Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 41 of 73 Psytechnics PESO User Guide Release 2 1 The statistics are e The proportion of speech subject to clipping as a value between O and 1 e The number of clipping events e The total duration of clipping events in seconds e The mean duration of clipping events in seconds In addition the total duration of speech the total duration of hangover and the number of instances of hangover are also returned PESQ Tools also provides an output that divides the input signal into 1ms frames and sets various classification flags for each 1ms frame according to any speech activity events The following flags may be set e Reference signal is active e Reference signal is active at the P 56 criterion e Degraded signal is active at the P 56 criterion e Clipping has been detected reference is active but degraded is not e Clipping classified as front end e Clipping classified as back end e Hang over period degraded is active but reference is not e Comfort noise period neither reference nor degraded is active Note 1 A signal is defined to be active according to the P 56 criterion if its level in the frame is greater than ASL 15 9dB where ASL is the active speech level of that utter
67. psutechnics olce amp video quality assessment A Sample PESQ User Guide Psytechnics PESO User Guide Release 2 1 This page has been left intentionally blank This page has been left intentionally blank Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 2 of 73 Psytechnics PESO User Guide Release 2 1 Important Information Document issue This is Issue 2 1 of the PESQ and PESQ Tools sample user guide for Psytechnics Release 2 1 of PESQ Intellectual property rights Software included in this product is protected by copyright and by European US and other patents and is provided under licence from Psytechnics Limited Warranty Psytechnics Limited warrants that it has used reasonable commercial efforts prior to packaging and dispatch to make certain that the media on which the software is delivered is error free In the event that the Licensee discovers any material errors and notifies Psytechnics Limited of the same within 90 days warranty period of receiving the software Psytechnics Limited will at its option either replace the software or fix any material errors provided any non compliance has not been caused by any modification variation or addition by the Licensee In no circumstance will the existence of any errors constitute a breach of the Licence Agreement In addition Psytechnics Limited warrants that it has used reasonable commercial efforts in the production and dispatch of Documentation and or Manuals
68. puted using a voice activity decision based on the reference signal VAD This can produce different results from a VAD applied only to the degraded signal for example if the addition of noise alters the classification of speech and noise Voice activity decision is sometimes ambiguous so you may encounter unexpected results with the MNL of the reference and degraded signals if the reference signal is hard to classify Table 3 Signal level measures calculated separately for reference and degraded signals Measure Units Meaning Typical value Typical range Active speech level dBov Power RMS level during speech 26 35 15 ASL active periods Mean noise level dBov Power RMS level during silent 70 clean 80 15 MNL periods only speech RMS mean level dBov Power RMS level of the entire 30 40 15 signal Estimated signal to dB The relative loudness of speech to 45 clean 10 60 noise ratio SNR noise i e ASL MNL speech DC offset PCM The DC offset of the input signal 0 32 32 units Table 4 Level measures of the system under test Measure Units Meaning Typical value Typical range Insertion gain dB Power gain of the system under test 0 digital 20 6 Calculated as ASL of degraded Jo malos signal minus ASL of reference 8 signal Noise gain dB Gain calculated for noise in silent 0 20 20 periods Calculated as MNL of degraded signal mi
69. r predictive coding LPC filter In PESQ Tools the excitation of an input signal is produced by dividing the input signal into segments calculating a set of 16 order LPC coefficients for each segment and then filtering each signal segment with the corresponding coefficients The excitation signal is a valuable tool in speech analysis because it approximates the speech at the point of excitation 1 e before the signal spectrum is modified by the effects of the vocal tract and lip radiation Voiced sounds are generated from pulses produced by the periodic opening and closing of the vocal cords The time between two pulses is the pitch period for that section of speech Unvoiced sounds are generated by forcing air through a constriction in the vocal tract for example that created by placing the upper teeth in the lower lip and is typically noise like in nature An example plot of a sequence of voiced sounds followed by an unvoiced sound is shown in Figure 22 In this example the voiced part runs from about 1 1 1 65s the unvoiced sound from 1 65s onwards Figure 22 Excitation of reference and degraded signals Reference excitation Degraded excitation 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 40 of 73 3 2 18 Psytechnics PESO User Guide Release 2 1 Speech activity related outputs PESO Tools Only PESQ Tools provides a number of diagnostic outputs that relate to the use of
70. r them to users for profiles 2 and 3 You should edit the description of the outputs to reflect how you calculate them and present them to the user Note for developers the level outputs are returned from PESQ using the Lv Params array your own implementation should calculate SNR using the simple formula given in section 3 2 12 of the User Guide A 4 13 Results option Bark signal spectra User Guide section 3 2 13 PESQ Tools only For profile 3 and possibly also for profile 2 you may wish to show the signal speech and noise spectra that are calculated for both reference and degraded signals PESQ Tools returns these measures on a perceptual Bark frequency scale and a linear frequency scale see A 4 14 A 4 14 Results option Linear spectra User Guide section 3 2 14 PESQ Tools only For profile 3 and possibly also for profile 2 you may wish to supplement the Bark signal spectra see A 4 13 with the equivalent linear frequency spectra A 4 15 Results option Transfer function estimation User Guide section 3 2 15 PESQ Tools only You may display the four linear frequency transfer function estimates TFE and a time domain TFE for products in profile 3 and you may wish to offer it for profile 2 These complement the Bark scale TFE discussed in section A 4 3 A 4 16 Results option Signal spectrograms User Guide section 3 2 16 PESQ Tools only You may display the linear signal spectrogram for products in all profiles
71. red to PSQM and MNB the previous standards using methodology similar to that of the ITU T competition that resulted in recommendation P 862 See Rix Beerends Hollier and Hekstra reference in Section 10 The test used correlation coefficient and residual error distribution to quantify the performance of models at predicting subjective MOS These metrics are calculated for each subjective test separately after mapping the objective scores to the subjective scores for that test in a minimum squared error sense using monotonic third order polynomial regression This mapping ensures that the comparison is made in the MOS domain whilst allowing for normal variations in subjective voting between tests Tests are grouped according to whether conditions were predominantly from mobile fixed voice over IP VoIP and multiple type networks Table 5 and Table 6 show correlation and residual error distribution for PESQ PSQM and MNB for 38 subjective tests that were available to the developers of PESQ These included a wide range of simulated and real network measurements Table 7 and Table 8 present the results for PESQ only of an independent evaluation that was conducted after development was complete All of this data relates to subjective listening tests carried out on the Absolute Category Rating ACR listening quality opinion scale Test material consists of natural speech recordings of 8 12s in duration with four talkers two male two female for
72. relating to the software In the event the Licensee discovers a material error and notifies Psytechnics Limited of the same within 90 days warranty period of receiving the Documentation and or Manual Psytechnics Limited will at its option either replace the Documentation and or Manual or correct the material error The Licensee acknowledges that any and all copyright trademark and other intellectual property rights subsisting in or used in connection with the software including any Documentation and or Manual relating thereto are and shall remain the property of Psytechnics Limited and the Licensee shall not during or after expiry or termination of this Agreement in anyway question or dispute the ownership of the Documentation and or Manuals relating to the software Copyright Under the copyright laws this publication may not be reproduced or transmitted in any form electronic or mechanical including photocopying recording storing in an information retrieval system or translating in whole or in part without the prior written consent of Psytechnics Limited O Copyright 2001 2002 Psytechnics Limited All rights reserved Trademarks PESQ PESQ Tools Psytechnics are trademarks of Psytechnics Limited Product and company names mentioned herein are trademarks or trade names of their respective companies Contact Psytechnics Limited Fraser House 23 Museum Street Ipswich IP1 1HN United Kingdom Tel 44 0 1473 261 800 Fax
73. s is known as the condition mean opinion score often abbreviated to MOS and is the figure most commonly used to describe a condition 6 4 2 Other MOS measures It is also possible to average votes to obtain an MOS for each file file MOS and or each talker talker MOS in a given condition Though less commonly used than condition MOS these scores given an indication of quality dependence on material or talker 6 4 3 Further statistical analysis Many statistical techniques may be applied to analyse the distribution of votes and investigate the influence of factors such as talker or subject For example the following methods may be useful e Confidence interval provides an estimate of the range in which the true mean may lie given the distribution of observations votes e T tests allow the votes from two different conditions to be compared to assess whether there is evidence that any differences between them are significant or merely stem from randomness in the voting process e ANOVA analysis of variance is a technique for testing and ideally eliminating the influence of many factors that cannot be fully controlled for example talker dependence listening order and individual subjects 6 4 4 Further reading More information on these and other statistical methods may be obtained by following the references listed in section 10 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 54 of 73 Psytechnics PESO User Gui
74. section 7 of the User Guide as part of your user documentation for products designed for profile 2 or profile 3 A 9 Comparison between objective and subjective results User Guide section 8 You are recommended to include the information in section 8 as part of your user documentation for products designed for profile 2 or profile 3 A 10Performance of PESQ User Guide section 9 You are recommended to include the information in section 9 1 as part of your user documentation for all profiles Section 9 2 wideband telephony model is appropriate only to profile 3 A 11 References User Guide section 10 You are recommended to include the information in this section as part of your user documentation for products designed for profile 3 You may also wish to include some of the references in documentation for products in profile 1 or profile 2 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 72 of 73 Psytechnics PESO User Guide Release 2 1 A 12 Glossary User Guide section 11 You are recommended to include some or all of these terms as appropriate in a glossary in your own user documentation A 13 Document details Your documentation should show your own company details As specified in your license agreement you must personalise your implementation of PESQ and all accompanying documentation with your own identity End of guidelines Issue 2 I PRO_PESQ00_210_ED0155_0 1 Page 73 of 73
75. so be considered before conclusions on overall quality are made In the remainder of this document we consider only subjective listening tests Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 50 of 73 Psytechnics PESO User Guide Release 2 1 6 2 Design of a subjective test Subjective perception of quality depends on a large number of factors In designing a subjective test 1t 1s essential to control many extraneous variables by choosing appropriate values or averaging over a typical population distribution These variables are examined in this section 6 2 1 Opinion scales The most common technique in listening testing for telephony is known as the Absolute Category Rating ACR method In this type of test subjects hear only the processed conditions After hearing each recording the subjects are prompted to vote PESQ produces listening quality scores that are analogous to the ACR listening quality opinion scales The votes given by subjects for each file are then averaged to give a file mean opinion score MOS The average of all votes given to all files for a given network condition is known as the condition MOS There are some alternative test structures in use for specific applications These include Degradation Category Rating DCR and Comparison Category Rating CCR methods Because these methods use a different quality question they will not normally give the same results as an ACR test Indeed there is evidence to suggest tha
76. speech recordings should contain a representative and balanced range of speech sounds If different recordings are to be concatenated the joins must be made in silent periods to avoid discontinuities Signals that are not speech like should not be used with PESQ for several reasons They may cause the network to behave in an unrepresentative way they cannot fully test the quality of speech codecs and they do not reproduce the temporal structure of speech that may be exploited by elements such as voice activity detectors Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 46 of 73 Psytechnics PESO User Guide Release 2 1 5 1 5 Duration of an individual recording PESQ is optimised for recordings of 8s in duration containing at least 4s of active speech As a guide the minimum length for a measurement to give a representative PESQ score is about 6s containing at least 3s of active speech Recordings of 16s or longer in duration should be split into shorter sections and each processed separately through PESQ The reference and degraded signals do not have to be of exactly the same length PESQ aligns and processes only the sections for which data is available from both reference and degraded signals If a measurement introduces significant unknown delay it is a good idea to extend the recording at both the start and end to ensure that the entire test signal is captured 5 1 6 Multiple measurements Whenever possible more than one measurement sh
77. ssue 2 I PRO PESQOO 210 EDO0155 0 1 Page 16 of 73 Psytechnics PESO User Guide Release 2 1 Auditory transform In order to compare the reference and degraded signals taking account of how a listener would have heard them each is passed through an auditory transform that mimics certain key properties of human hearing This gives a representation in time and frequency of the perceived loudness of the signal known as the sensation surface Equalisation Part of the auditory transformation equalises certain processes that have little subjective effect First the transfer function of the system is estimated and is used to equalise the reference to the degraded in the auditory transform domain This takes account of filtering in analogue components of the network such as telephone handsets Second the frame by frame amplitude gain of the system is estimated and used to equalise the auditory transform of degraded file to the reference In both cases the equalisation is partial large amounts of filtering or gain variation are not cancelled and therefore result in errors being measured Disturbance processing The difference between the sensation surfaces for the reference and degraded files is known as the error surface this shows any audible differences introduced by the system under test The error surface is analysed by a process that takes account of the effect that small distortions in a signal are inaudible in the presence of loud signals
78. t asking a different quality question may result in different conclusions being reached when comparing one type of communications technology with another Where subjective test results are to be compared with PESQ scores we strongly recommend that the ACR listening quality method is used 6 2 2 Conditions A typical listening test allows up to about 50 network conditions to be evaluated assuming that an Absolute Category Rating ACR method is used with speech material from four talkers see below for more details At least six of these conditions should normally be given over to MNRU references P 810 that cover the full range of quality It is also a good idea to include standard network conditions such as G 711 so that quality scores can be compared against them At the start of each test all subjects hear the same set of 6 8 preliminary conditions covering a range of distortion types and vote on their quality using the same procedure for voting as the main set of conditions The votes for the preliminaries are discarded they serve as an anchor to ensure that all subjects start the test with the same idea of what the range of quality and the types of distortions will be Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 51 of 73 Psytechnics PESO User Guide Release 2 1 6 2 3 Other factors A test aims to obtain a measure of the subjective quality of a number of network conditions However there are usually many other variables The desi
79. the optimum These will have an effect on the amount of disturbance measured in the degraded signal and will therefore effect the PESQ score This issue be addressed by displaying a warning if the degraded and reference signals vary in length or active speech level by more than 20 Speech activity warning P 862 states that the speech activity in a test signal to be used with PESQ should be between 40 and 8096 A low speech activity could cause the PESQ score to be inaccurate Although the typical speech activity for a test signal can vary depending on the language used in the signal A warning should be shown if PESQ detects that the speech activity in the reference or degraded is below 35 or above 85 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 24 of 73 Psytechnics PESO User Guide Release 2 1 3 Advanced use Before reading this section all users should read the material of Section 2 Advanced applications may offer a full set of features and outputs for use by trained individuals This section includes descriptions of the diagnostic features available in the PESQ Tools option 3 1 Input options The input options are e Sampling rate section 2 2 2 e Choice of version 1 4 or 1 0 models sections 2 2 3 and 3 2 1 Psytechnics recommends the version 1 4 model Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 25 of 73 Psytechnics PESO User Guide Release 2 1 3 2 Outputs The following section describes the outputs that may be
80. uality value PESQ LQ has been introduced PESQ LQ scores are closer to the listening quality subjective opinion scale which is standard in the industry and is defined in ITU T P 800 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 20 of 73 Psytechnics PESO User Guide Release 2 1 2 4 4 Relationship between PESQ score and PESQ LQ The function which is used to calculate PESQ LQ is shown in Figure 3 Figure 3 Mapping from raw PESO score to PESQ LQ PESQ LQ 1 1 5 2 2 5 3 3 5 4 4 5 P 862 PESQ The mapping from PESQ score to PESQ LQ can be computed as follows if pesq_score lt 1 7 then pesq_lq 1 0 else pesq_lq 0 157268 pesq score 1 386609 pesq_score 2 504699 pesq score 2 023345 Issue 2 I PRO PESQOO 210 EDO0155 0 1 Page 21 of 73 Psytechnics PESO User Guide Release 2 1 2 4 5 P 862 1 The ITU has standardised a universal PESQ to MOS mapping This was created from a shared pool of subjective test results covering wireless VoIP fixed and codec only conditions including Japanese British English American English French German Italian Swedish Dutch and Finnish 2 4 6 Relationship between raw PESQ score and P 862 1 This mapping is continuous from PESQ 0 5 to 4 5 and MOS 1 to 4 55 It takes the form of a logistic with 4 parameters and is shown below Figure 4 Mapping from raw PESO score to P 862 1 MOS MATP EG 1 mapping FEB 1 mapped MOS The mapping from PESQ score to P 862 1 MOS ca
81. vidual recording e Requirement for multiple measurements of the same condition 5 1 2 Temporal structure Test signals should include speech bursts separated by silent periods to be representative of natural pauses in speech Speech bursts should normally be 1 3 seconds in duration To test certain types of voice activity detector silent periods should be at least 300ms in duration As a guide speech should be active for between 40 and 80 of the time 5 1 3 Level and frequency content A key factor in speech quality is the level the signal power usually quoted in dB In digital speech files a typical level is 26dBov Signals injected into the network should normally be at the appropriate calibrated level which may vary depending on the national standards and the impedance of the circuit As telephone handsets and analog networks both introduce filtering it is important that the test signals have a representative frequency content In other words they must be pre filtered in an appropriate way For fixed network measurements the modified IRS send filter is normally applied to the speech before injection into the network ITU T P 830 This attenuates strongly below 300Hz and also provides a small boost of about 10dB per decade within the passband Level is measured after the filtering has been applied 5 1 4 Source material Natural recorded speech or the artificial speech supplied with PESQ may be used as test signals Natural

Sample PESQ User Guide - Spirent Knowledge Base

Contents

Download Pdf Manuals

Related Search

Related Contents