
The CSLU Labeling Guide - Center for Spoken Language


Contents

1. 1.3 Levels of Transcription. Following is a description of the three types of transcriptions distributed by the CSLU. 1.3.1 Non-Time-Aligned Orthographic. Non-time-aligned word-level transcription is designed to indicate the content of an utterance without reference to time. It is represented in a standard orthography or romanization and distinguishes between speech and non-speech information. Conventions for non-time-aligned transcriptions are described in Chapter 2. CSLU corpora that have used non-time-aligned labeling in part or all of the corpus are Spelled and Spoken Names, Stories, Words, Numbers and Phrases, Names, Numbers, 22 Language, Cellular Speech, and Alphadigit. Non-time-aligned word transcriptions are generally created in a text editor, so the text files look much like the text of this document but without punctuation, capitalization, or indentation. A non-time-aligned transcription of the phrase "Oh my dream house" might look like the following: "<pau> ohh my dream house". The non-speech label indicating a pause is abbreviated and appears in pointy brackets. The exclamation oh is transcribed ohh to disambiguate it from the letter o and the number oh. 1.3.2 Time-Aligned Orthographic. An example of a time-aligned word transcription appears in Figure 1.1. Figure 1.1: Orthographic and phonetic time alignment of the alpha-digit sequence r one five T
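Because a non-time-aligned line mixes ordinary words with bracketed non-speech labels, a script consuming these files has to separate the two. The following is a minimal sketch, not CSLU tooling; the function name `split_transcription` is hypothetical, and it assumes labels contain no spaces, so it does not handle forms like `<nitl: ...>`.

```python
import re

# A bracketed non-speech label such as <pau>, or a run of characters that
# contains no whitespace and no pointy brackets (an ordinary word).
TOKEN_RE = re.compile(r"<[^<>\s]+>|[^\s<>]+")


def split_transcription(line: str) -> tuple[list[str], list[str]]:
    """Split one non-time-aligned line into (words, non_speech_labels)."""
    words: list[str] = []
    labels: list[str] = []
    for token in TOKEN_RE.findall(line):
        if token.startswith("<"):
            labels.append(token[1:-1])  # drop the pointy brackets
        else:
            words.append(token)
    return words, labels


words, labels = split_transcription("<pau> ohh my dream house")
print(words)   # ['ohh', 'my', 'dream', 'house']
print(labels)  # ['pau']
```

A label written directly against a word, such as `my<laugh>`, is also split correctly, because the word pattern stops at the opening bracket.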
2. (Hindi vowel examples, partially preserved: sher "lion", heh "is", menaa "sparrow", rath "chariot", muuk "quiet", ruk "stop", motaa "fat", aur "and", pataa "address". The descriptions include mid high mid front short, centralized I, mid high front long, mid low front short, mid low front long, mid central, mid low central short, high back rounded long, high back rounded, mid high back short, mid high back rounded, mid low mid back, low front, and low front long.) Table 6.14: Hindi Diphthongs (Worldbet): aI (ay), as in bhaiyaa; aU (aw), as in laut. 6.5.2 Notes on Hindi Vowels: Some vowel length distinctions were not made in Hindi in the first release of the multi-language database: a was labeled as short and u was always labeled long. We also added a symbol to the Hindi label set. 6.5.3 Hindi Consonants: Hindi has labial (lab), dental (dent), alveolar (alv), retroflex (ret), palatal (pal), velar (vel), and glottal (gl) consonants (Table 6.15). Table 6.15: Hindi Consonants, with rows for voiceless unaspirated stops, voiced unaspirated stops, voiceless aspirated stops, voiced aspirated stops, voiceless unaspirated affricates, voiced unaspirated affricates, voiceless aspirated affricates, voiced aspirated affricates, voiceless, voiced, and unaspirated and aspirated taps. Table 6.16: Hindi Consonant Examples (Worldbet): pal "moment", phal "fruits", bal "strength", b
3. Hieronymus because of its multilingual extensibility. Phonetic training is an important asset because it enables the listener to hear accurately those deviations from the expected pronunciation that are common in fluent speech. The ability to read and interpret the acoustic cues provided in the spectrogram and waveform is important, as are tools for viewing and segmenting the speech signal. The Toolkit has a good acoustic display thanks to recent improvements made by Tim Carmel and Johan Wouters, and the tool is free to universities and corporate sponsors. Knowledge of the language being transcribed is strongly recommended: labeler reliability studies have shown that agreement differs greatly when labelers do not know the language they are labeling. 5.9 Segmentation and Label Selection. Placing exact boundaries is difficult. In continuous speech, many boundaries that listeners perceive intuitively do not exist when the speech is examined acoustically: words overlap and phones become coarticulated. Because of this, the boundaries we assign when we label speech are sometimes artificial. Ambiguous cases are specified by rule to achieve consistency. It is our contention that the ear is not the most reliable source for determining where to place boundaries. Labelers are encouraged to listen to the labeled segments in context, as coarticulation alters the percept of a sound heard in isolation. For the first pass, the labele
4. (German vowel descriptions, partially preserved, beginning with the example Stühle: high front long rounded; mid high mid front short rounded; mid high front rounded long; mid low front short; mid low front long; mid low front rounded short; mid high front long; low front; low front long; mid low mid back; mid low central; mid high back rounded long; mid high back short; high back rounded long; central short; mid low central; mid low front long.) German has bilabial, labiodental, alveolar, palatal, velar, glottal, and uvular consonants; Table 6.11 gives a list of the consonant labels used in German. Table 6.9: German Diphthongs (Worldbet), with examples Tier, Tür, Gehör, er, Gewehr, leite, Laute, Jahr, Leute, Tor, and Ruhr. Table 6.10: German Consonants, with rows for voiceless stops, voiced stops, voiceless affricates, voiced affricates, voiceless sibilants, voiced sibilants, voiceless fricatives, voiced fricatives, trill, and nasals. Table 6.11: German Consonant Examples and Descriptions: passe, voiceless bilabial stop / voiceless bilabial stop closure; Bass, voiced bilabial stop / voiced bilabial stop closure; Tasse, voiceless alveolar stop / voiceless alveolar stop closure; das, voiced alveolar stop / voiced alveolar stop closure; Kasse, voiceless velar stop / voiceless velar stop closure; Gasse, voiced velar stop / voiced velar stop closure; fasse, voiceless labiodental fricative; Wasser, voiced labiodental fric
5. 6.2 English. 6.2.1 Vowels. Table 6.1: English Vowel Chart (not including retroflexes), with keywords beet, bit, bet, bat, roses, suit, above, to, go, pot, boot, book, above, caught, father, bird, and butter. 6.2.2 Diphthongs. 6.2.3 Notes on English Vowels: Rhotacized vowels are included in Table 6.2. (Vowel descriptions, partially preserved: high front long; mid high mid front short; mid low front short; mid low front long; centralized I; fronted u; mid central short; voiceless &; British; high back rounded; mid high back rounded short; mid low central; mid low mid back rounded; low back; rhotacized mid central short; rhotacized mid central.) The carat ^ is normally found in stressed syllables; it is longer and slightly lower than the reduced vowel &. The syllabic retroflexes 3r and &r have a distinction similar to that between ^ and &: 3r is normally found in stressed syllables and &r in unstressed ones, and 3r is longer, more tense, and slightly lower than the reduced vowel &r. Speakers of American English often do not round the high back vowel u; the allophone label u i will be used in these cases. Table 6.3: English Diphthong Examples and Descriptions (Worldbet). Note that i&, e&, u&, and 5 are most commonly found in British pronunciation. 6.2.4 English Consonants: English has bilabial, labiodental, alveolar, alveo-palatal, velar, and glottal consonants, abbreviated
6. average length of a word-initial closure in running speech. Likewise, when a word, phrase, or utterance ends in a voiceless stop that is not released, the stop closure label should extend 100 ms after the energy in the waveform dies out. This value was chosen as an average length of word-final closures in running speech. Voiced closures: The voicing in a voiced closure is normally visible in the waveform; set the left boundary at the point of most radical change in the waveform. If the voicing in a phonemically voiced closure is not evident, the closure should be labeled with the devoicing diacritic, and the label should extend 50 ms word-initially and 100 ms word-finally. Again, this length is arbitrary but was chosen as an average length for consistency. A voiced stop which follows a nasal often has no visible closure, either because the closure is very short or because the voicing makes it indistinguishable from the preceding nasal. As the closure interval is too difficult to isolate, it should not be marked: when the place of articulation is the same for the nasal and the closure, part of the nasal acts as the perceived closure. The velum is closed just before the burst to allow pressure to build up for the burst; this build-up interval can be very short and need not be labeled. For example, a dc preceded by an n is usually imperceptible and does not require a label. Sometimes
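The fixed closure lengths above (50 ms word-initially, 100 ms word-finally) lend themselves to a small helper when a labeling script has to place a closure with no usable acoustic edge. This is a sketch with a hypothetical name, `default_closure`; the choice of anchor points (the burst onset for initial closures, the point where energy dies out for final ones) and the clamp at time 0 are assumptions of the sketch, not rules stated in the guide. Applying the devoicing diacritic to the label is left to the caller.

```python
# Fixed closure lengths from the conventions above, in milliseconds.
WORD_INITIAL_MS = 50   # devoiced voiced closure, word-initial
WORD_FINAL_MS = 100    # devoiced voiced closure or unreleased voiceless stop, word-final


def default_closure(anchor_ms: int, position: str) -> tuple[int, int]:
    """Return (start_ms, end_ms) for a closure with no usable acoustic edge.

    Assumed anchors (not spelled out in the guide):
      * word-initial: anchor_ms is the burst onset; the closure ends there.
      * word-final: anchor_ms is where waveform energy dies out; the closure starts there.
    """
    if position == "initial":
        # Clamp at 0 so a closure near the start of the file cannot go negative.
        return max(0, anchor_ms - WORD_INITIAL_MS), anchor_ms
    if position == "final":
        return anchor_ms, anchor_ms + WORD_FINAL_MS
    raise ValueError(f"position must be 'initial' or 'final', not {position!r}")


print(default_closure(1250, "initial"))  # (1200, 1250)
print(default_closure(2000, "final"))    # (2000, 2100)
```

Working in integer milliseconds keeps the boundaries exact, which avoids floating-point drift when boxes are later compared for overlap.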
7. (Spanish consonant examples: dentro "inside", coma "comma", goma "glue", chica "little girl", llama "he calls", sabio "wise", favor "favor", plaza "plaza", cada "each", sol "sun", estás "you are", desde "since", dos "two", ella "she", jota "letter j", lago "lake", perro "dog", pero "but", matar "to kill", nadar "to swim", niño "child", cinco "five", lana "wool", hueso "bone", tortilla "tortilla" (two entries), llorar "to cry". Descriptions: voiceless bilabial plosive, voiceless bilabial closure, voiced bilabial plosive, voiced bilabial closure, voiceless dental plosive, voiceless dental closure, voiced dental plosive, voiced dental closure, voiceless velar plosive, voiceless velar closure, voiced velar plosive, voiced velar closure, voiceless palatal affricate, voiceless palatal affricate closure, voiced palatal affricate, voiced palatal affricate closure, voiced bilabial fricative, voiceless labiodental fricative, voiceless interdental fricative, voiced dental fricative, voiceless dental sibilant, aspiration replacing s, voiced dental sibilant, voiceless palatalized s, voiceless palatal sibilant, voiceless velar fricative, voiced velar fricative, alveolar trill, alveolar retroflex flap, bilabial nasal, dental nasal, palatal nasal, velar nasal, dental lateral, labiovelar glide, palatal lateral approximant, palatal glide.) Chapter 7 Diacritics: t h (diacritic h), aspirated; cen
8. <fp> <laugh> <ln> <long> <ls> <nitl> <ns> <pau> <pron> <sneeze> <sniff> tc uu VS sp <yawn> <whisper>. Further labels and descriptions from the table: BXN, EXN, inbreath, outbreath, BXS, EXS, sneeze, sniff, tc, uu, LVS, spelling, BA, EA, abbreviation, whisper, beginread, endread. Table 3.1: Non-speech labels for word-level labeling (non-time-aligned and time-aligned). Label descriptions, with connection rules: grammatically altered word due to mixing of two languages; heavily aspirated p, t, or k, or a puff at the end of a word that is not a breath (always connect); cut-off speech (always connect); a beep sound (connected or not); temp signal blip (connected or not); background noise (connected or not); begin simultaneous background noise; end simultaneous background noise; breathing noise (never connect); inhalation; exhalation; background speech (connect or not); begin simultaneous background speech; end simultaneous background speech; a burp (never connect); a cough (never connect); a clear throat (never connect); generic filled pause; false start (connect or not); glottalization; laughter (connect or not); line noise (connect or not); elongated word (always connect); lip smack (never connect); not in the language (connect or not); non-speech (connect or not); pause or silence (never connect); an odd pronunciation (always connect); sneeze (never connect); sniffing sound (never connect); tongue click (neve
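The connection rules in Table 3.1 (always connect, never connect, connect or not) can be checked mechanically. This is a minimal sketch under two assumptions: that a "connected" label is written directly against the word with no intervening space (as in dish<laugh>), and that only the labels whose names and rules are both legible in the table are checked. The names `CONNECT_RULES` and `connection_errors` are hypothetical.

```python
import re

ALWAYS, NEVER, EITHER = "always", "never", "either"

# Only labels whose names and rules are both legible in the table are listed;
# anything else is left unchecked rather than guessed.
CONNECT_RULES: dict[str, str] = {
    "long": ALWAYS,    # elongated word
    "pron": ALWAYS,    # odd pronunciation
    "pau": NEVER,      # pause or silence
    "ls": NEVER,       # lip smack
    "sneeze": NEVER,
    "sniff": NEVER,
    "laugh": EITHER,
    "ln": EITHER,      # line noise
    "nitl": EITHER,    # not in the language
    "ns": EITHER,      # non-speech
}

LABEL_RE = re.compile(r"<([a-z]+)>")


def connection_errors(line: str) -> list[str]:
    """Report labels that violate the always/never-connect rules.

    A label counts as connected when the character right before "<" is part
    of a word, i.e. neither whitespace nor the ">" closing another label.
    """
    errors = []
    for m in LABEL_RE.finditer(line):
        label = m.group(1)
        prev = line[m.start() - 1] if m.start() > 0 else " "
        connected = not prev.isspace() and prev != ">"
        rule = CONNECT_RULES.get(label)
        if rule == ALWAYS and not connected:
            errors.append(f"<{label}> must be connected to a word")
        elif rule == NEVER and connected:
            errors.append(f"<{label}> must not be connected to a word")
    return errors


print(connection_errors("my dream<pau> house"))  # ['<pau> must not be connected to a word']
```

Labels marked "connect or not" are accepted either way, so they never produce an error.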
9. right corner of the letter. The o has both a hook and a dot underneath it; this marking is different from the one in the word tra e. In the Vietnamese version of cha m, the carat should be above the letter a, not to the left of it. In the Vietnamese version of nhie, both marks should be on top of the e. In the Vietnamese version of ngie u, the marking should look like the upper half of a question mark and be connected to the top center of the letter. The actual diacritic does closely resemble this glottal stop symbol. This is a different marking from the one in du o c. Table 4.2: Special characters, Czech (columns: Czech, OGI, description): c esky, palatalization hook; ted, palatalization hook; tat ka, palatalization hook; ot zka, acute accent; u w, circle on top (krouz ek). Table 4.3: Special characters, French: écoute "listen" (e coute), acute accent; lève "rise" (le ve), grave accent; français "French" (franc ais), cedilla; tête "head" (te te), circumflex; ciguë "hemlock" (cigue), diaeresis. Table 4.4: Special characters, German (columns: German, OGI, type): ae, a umlaut; o umlaut; u umlaut; beta. Table 4.5 (caption lost): acute accent (five entries), o umlaut, u umlaut, double acute accent (two entries). Table 4.6: Japa
10. a glottal stop replaces a t, it will be transcribed th_. 7.8 Lateralization: Often, in the context of a lateral segment, vowels will become lateralized. At times no distinct l is produced, and the only artifact of the phoneme l is lateralization on the vowel. The l diacritic is used on vowels that have become lateralized when a distinct l cannot be seen in the spectrogram. This allows a phonemic-level transcription to be recovered without requiring lexical knowledge. If an l is visible (usually a segment with slightly lower amplitude and formant frequencies similar to those of the neighboring vowels), the lateralization diacritic should not be used, as the lateralization is then a predictable coarticulatory effect. When the l diacritic is used with the base label l, it signifies lateral release. This usually looks like a vertical bar on the spectrogram and sounds like a clicking noise. It is a common articulatory effect caused by tongue movement during the release of the lateral phone l. 7.9 The Lengthening Diacritic: Length thresholds are difficult to set because speaking rates vary, so we quantify length relative to each speaker. If a given phone is significantly longer than similar phones in similar contexts by the same speaker, the _ diacritic should be considered. The diacritic _ indicates relative lengthening. It is frequently used for vowels that are prolonged for emphasis or for a filled pause. The elongation symbol is often combined with the filled pause diacritic
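The lengthening test compares a phone with the same speaker's other tokens of that phone in similar contexts. The guide also fixes an absolute rule for vowels longer than 300 ms. This sketch combines the two. The function name and the `ratio` threshold for "significantly longer" are hypothetical, since the guide gives no number for the relative test. Each token is compared with the mean of the other tokens, so one very long token does not inflate its own baseline.

```python
from statistics import mean

VOWEL_LIMIT_MS = 300  # from the guide: vowels longer than 300 ms get the diacritic


def lengthening_flags(durations_ms: list[float], is_vowel: bool, ratio: float = 1.5) -> list[bool]:
    """Flag tokens of one phone (same speaker, similar contexts) that may need
    the lengthening diacritic.

    `ratio` is a hypothetical cut-off for "significantly longer".
    """
    flags = []
    for i, d in enumerate(durations_ms):
        others = durations_ms[:i] + durations_ms[i + 1:]
        # Leave-one-out baseline: the token is not part of its own comparison.
        relatively_long = bool(others) and d > ratio * mean(others)
        absolutely_long = is_vowel and d > VOWEL_LIMIT_MS
        flags.append(relatively_long or absolutely_long)
    return flags


print(lengthening_flags([100, 110, 90, 250], is_vowel=True))  # [False, False, False, True]
```

The flags only nominate candidates. The guide says the diacritic "should be considered", so the final decision stays with the labeler.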
11. and chC closure, voiceless labiodental fricative, voiceless aspirated dental fricative, voiceless retroflex fricative, voiceless palatal fricative, voiceless glottal fricative, bilabial nasal, alveolar nasal, velar nasal, alveodental liquid, advanced r, labiovelar glide, palatal glide. Because tones cannot be detected without voicing, they are rarely labeled in these cases. The allophone jw has not been included due to its low frequency. Mandarin Chinese has a very strict syllable structure: although there can be a great variety of vowel combinations in syllables (some monophthongs, some diphthongs, and some triphthongs), the consonant structure is comparatively rigid. The syllable pattern is (C)V(V)(V or N): initial consonants are optional, and only nasals (usually n or N) are permitted word-finally. The label tshr was labeled tsR in the first release of the multi-language corpus. The label chC was labeled cCh in the first release of the multi-language corpus. The label x was labeled as x in the first release of the multi-language corpus. 6.7.4 Mandarin Tones: There are four tones in Mandarin. They are labeled on the vowel by number, as the vowel is a reliable marker of the syllable nucleus. Tones are marked on all vowels except whispered vowels and filled pauses. Tone 1: high level tone (i 1, "cloth"); Tone 2: high rising tone (i 2, "to suspect"); Tone 3: falling-rising tone (i 3, "chai
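The Mandarin syllable constraint can be checked on a C/V/N skeleton. This sketch reads the garbled pattern "C V V V or N" together with the surrounding prose as (C)V(V)(V or N): an optional initial consonant, up to three vowels, and a final slot that may hold a nasal instead of a third vowel. That reading is an interpretation, not a quotation. The vowel inventory must be supplied by the caller, and the function names are hypothetical.

```python
import re

# (C)V(V)(V or N): optional initial consonant, a vowel, an optional second
# vowel, and an optional final slot holding either a third vowel or a nasal.
SYLLABLE_RE = re.compile(r"C?VV?[VN]?")

NASALS = {"n", "N"}  # the only consonants allowed at the end of a syllable


def skeleton(labels: list[str], vowels: set[str]) -> str:
    """Map one syllable's labels to a C/V/N string.

    A nasal counts as the final N only in last position; anywhere else it is C.
    """
    out = []
    for i, lab in enumerate(labels):
        if lab in vowels:
            out.append("V")
        elif lab in NASALS and i == len(labels) - 1:
            out.append("N")
        else:
            out.append("C")
    return "".join(out)


def is_valid_syllable(labels: list[str], vowels: set[str]) -> bool:
    return SYLLABLE_RE.fullmatch(skeleton(labels, vowels)) is not None


print(is_valid_syllable(["k", "u", "a", "N"], {"u", "a"}))  # True  (CVVN)
print(is_valid_syllable(["k", "a", "t"], {"a"}))            # False (non-nasal coda)
```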
12. boundary where the waveform shows change. 5.9.1 Stops, Affricates, and Trills: Stops, affricates, and trills usually show clear changes in the waveform, especially when they follow a pause or a closure. When labelers see movement in the waveform begin, they should set the left boundary of the burst label. When using the OGI speech tools, one need only set the left boundary, as the right boundary of each label is automatically aligned to the label that follows when the file is saved using the write align command. Manually extending the right boundary leads to overlapping boundaries, which by convention are not allowed in transcriptions. Stop bursts that are heavily aspirated are the easiest to mark. Stop bursts with no aspiration may be more easily heard than seen: to identify unaspirated stop boundaries, look for a single pulse in the waveform that is much lower in amplitude than the following vowel. In fast speech, plosives are often released gently, with little or no pressure build-up. If a stop burst does not appear readily in the waveform or spectrogram, it is possible that one did not occur; do not allow phonemic expectation to alter interpretation of what is actually in the signal. Some strategies for identifying stop bursts: 1. Set a finer resolution on the waveform. A resolution of 0.001 seconds per pixel is normally used when transcribing, but 0.00025 will make segmentation sim
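The save-time behavior described above sets only left boundaries and lets each label's right boundary snap to the next label's left boundary. As a result, the saved boxes can never overlap. This is a minimal sketch of that idea, not the OGI tools' code. The name `segments_from_left_boundaries` is hypothetical, and ending the last label at a caller-supplied `end` time is an assumption. The example values come from the time-aligned fragment quoted in item 26.

```python
def segments_from_left_boundaries(labels: list[str], lefts: list[int], end: int) -> list[tuple[int, int, str]]:
    """Turn labels plus their left boundaries into (start, end, label) triples.

    Each label's right boundary is the next label's left boundary; the last
    label ends at `end`. Every box must have positive length, so the result
    can never contain overlapping or backwards boxes.
    """
    if len(labels) != len(lefts):
        raise ValueError("need exactly one left boundary per label")
    rights = list(lefts[1:]) + [end]
    for start, stop, label in zip(lefts, rights, labels):
        if stop <= start:
            raise ValueError(f"label {label!r} would have zero or negative length")
    return list(zip(lefts, rights, labels))


print(segments_from_left_boundaries(["9r", "pau", "w"], [649, 844, 895], 994))
# [(649, 844, '9r'), (844, 895, 'pau'), (895, 994, 'w')]
```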
13. is for utterances that are basically unintelligible. It should be used rarely. Here are some examples to clarify: Person says "uhm i like coffee", transcribed "uhm i like coffee" (the word uhm appears in the table). Person says "er well i don't know", transcribed "er <sp> i don't know" (er does not appear in the table but can be sounded out). Person says "[some odd sound] i mean catfish", transcribed "<fp> i mean catfish" (there is no adequate word in the table and the sound cannot be sounded out). 3.11 glot: This label is only to be used in time-aligned transcriptions. It should appear in the comments box when excessive glottalization occurs in the speech. If the glottal pulses appear to separate noticeably relative to the rest of the speech, the glottalization label should be used. Glottalization should be included as part of the word in the regular word box, and the label glot should appear in the comments window spanning the period of glottalization. As a general rule, glottalization is not noted in non-time-aligned transcriptions, because transcribers do not have access to the spectrogram during transcription. If glottalization can be detected from the waveform alone, the transcriber is encouraged to use the label. 3.12 laugh or <laugh>: A laugh. If the speaker laughs while saying a word, connect the label to the word: my<laugh> favorite<laugh> dish<laugh> is<laugh> ribs<laugh> Th
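The three hesitation examples in item 13 amount to a three-way decision. If the sound is listed in the table, write it as is. If it is not listed but can be sounded out, write the sounded-out form. Otherwise, fall back to the generic filled-pause label. The sketch below encodes that order of preference. The function name is hypothetical, and the table passed in is an illustrative stand-in, since the guide's full table is not reproduced here.

```python
from typing import Optional


def transcribe_hesitation(sound: str, table: set[str], sounded_out: Optional[str] = None) -> str:
    """Choose how to write a hesitation sound, following the three examples in item 13."""
    if sound in table:
        return sound          # e.g. "uhm": listed in the table, write it as is
    if sounded_out:
        return sounded_out    # e.g. "er": not listed, but can be sounded out
    return "<fp>"             # neither: use the generic filled-pause label


FILLER_TABLE = {"uhm"}  # illustrative stand-in for the guide's table
print(transcribe_hesitation("uhm", FILLER_TABLE))          # uhm
print(transcribe_hesitation("odd sound", FILLER_TABLE))    # <fp>
```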
14. like this: A_ _fp. Based on the labeling we have done, we have determined that vowels longer than 300 ms should receive this diacritic. 7.10 Nasal Release: The nasal release diacritic n may only be applied to a base label that is a nasal. The nasal release is spectrally similar to the lateral release, and it is also caused by movement of the tongue as the nasal is released. Nasals are often released with a trailing vowel; that vowel should be transcribed with a label describing its quality. The word ten, pronounced tc th E n A, is a common example. 7.11 Nasalization: A dedicated diacritic denotes nasalization; it is normally used on vowels or diphthongs. Nasalized vowels and diphthongs are spectrally distinguished by a split or separation in the first formant. This diacritic is used only when nasalization is not predictable: when nasalization can be predicted by phonological rule (i.e., in the context of a neighboring nasal), it is not labeled. Nasal deletion is a common reduction in fast speech. Sometimes the only acoustic clue that a nasal phoneme occurred is nasalization on the vowel, as in some pronunciations of mountain. If acoustic or auditory evidence of nasality remains but no distinct nasal is evident in the signal, the nasalization diacritic should be used on the vowel. Although this is a predictable environment for a nasal vowel, the nasaliza
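Section 7.11 reduces to a small decision rule. Nasalization is labeled when there is evidence of it and it is not predictable from an adjacent nasal segment. When the nasal phoneme itself has been deleted (the "mountain" case), the vowel is labeled anyway because it carries the only trace of the nasal. The sketch below states that rule explicitly. The function and parameter names are hypothetical, and deciding whether a nasal phoneme was deleted is assumed to be done by the labeler.

```python
def needs_nasalization_diacritic(
    nasality_evident: bool,
    adjacent_nasal_segment: bool,
    nasal_phoneme_deleted: bool,
) -> bool:
    """Decide whether a vowel or diphthong gets the nasalization diacritic.

    nasality_evident: acoustic or auditory evidence of nasality (e.g. a split first formant).
    adjacent_nasal_segment: a distinct nasal segment is labeled next to the vowel.
    nasal_phoneme_deleted: a nasal phoneme is expected but no distinct nasal is
        visible in the signal (nasal deletion, as in some pronunciations of "mountain").
    """
    if not nasality_evident:
        return False
    if nasal_phoneme_deleted:
        return True  # the vowel carries the only trace of the deleted nasal
    return not adjacent_nasal_segment  # nasalization next to a nasal is predictable
```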
15. phone in the middle of the word letter is a tap. Trill: produced by multiple rapid taps of the tongue. The Spanish and French r's are often trilled. Approximant or semivowel: a sound produced when one articulator is close to another but not close enough to produce a turbulent airstream. The words world and yes begin with approximants. A.3.3 Additional Features: Voiced: sounds produced when the vocal cords are vibrating. Voiceless: sounds produced when the vocal cords are apart. Aspiration: audible release of air during production of a phone. A.4 Vowels: The tables in Chapter 6 describe each vowel using the terms high, mid, low, front, central, and back. These words describe where the vowel is produced in the mouth, and the following chart illustrates the location denoted by each term. Front, central, and back describe the part of the tongue that is highest during production, while high, mid, and low describe the height of the tongue. To say that a vowel is high front, for example, means that the front blade of the tongue is high in the mouth when the phone is produced. A mid back vowel is one where the back body of the tongue is raised to a mid height (i.e., raised slightly). Table A.1: Vowel Chart (rows: high, mid, low). This table is a blank vowel chart which could be filled with any of the similar charts of Chapter 6. Its form mimics the cardinal vowel triangle, which graphically displays vow
17. process generally involves moving the file to another directory where it will undergo no more processing. Discarded files are not included in the release of speech data files to researchers. 2.5 Time-Aligned Conventions. 2.5.1 Word Box Conventions: Overlapping boundaries: Lola boxes which have overlapping boundaries are illegal. Geminates will be split where the waveform shows change; if the waveform or spectrogram shows no particular change, the geminate will be split in half. Extension of labels: labels should only be extended from left to right. If a box is originally created at 30 ms, the left boundary will always remain at 30 ms unless the transcriber physically moves it. Thus, if one drags the right boundary past the left boundary and sets the right boundary at 10 ms, the result will be a backwards label box and a resulting error in the lola file. Closures: a closure is the articulation which begins the phonemes p, t, k, b, d, g, ch, and jh. When a word begins with any of these sounds, look for spectral evidence signaling the beginning of the closure. For the voiced closures of b, d, g, and jh, F1 normally appears in the spectrogram preceding the burst. For voiceless closures there is sometimes a small lip smack which signals the closure of the articulators. If there is no spectral evidence signaling the beginning of the closure, set the word label bound
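Two of the word-box conventions in item 17 are easy to enforce in software: geminates are split at the visible change (or in half when there is none), and lola boxes may neither overlap nor run backwards. This is a sketch of both checks with hypothetical names. The tuple layout `(left, right, label)` is an assumption of the sketch, not the lola file format.

```python
from typing import Optional


def split_geminate(start: float, end: float, change_point: Optional[float] = None):
    """Split a geminate box at the visible waveform change, or in half if there is none."""
    cut = change_point if change_point is not None else (start + end) / 2
    if not start < cut < end:
        raise ValueError("split point must fall strictly inside the box")
    return (start, cut), (cut, end)


def box_errors(boxes: list[tuple[float, float, str]]) -> list[str]:
    """Report backwards/empty boxes and overlapping neighbours in (left, right, label) boxes."""
    errors = []
    ordered = sorted(boxes)  # sort by left boundary
    for left, right, label in ordered:
        if right <= left:
            errors.append(f"backwards or empty box: {label} ({left}-{right})")
    for (_l1, r1, a), (l2, _r2, b) in zip(ordered, ordered[1:]):
        if l2 < r1:
            errors.append(f"overlapping boxes: {a} and {b}")
    return errors


print(box_errors([(30, 10, "word")]))  # ['backwards or empty box: word (30-10)']
```

The backwards example mirrors the case in the text: a box created at 30 ms whose right boundary is dragged to 10 ms.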
18. pron on words that are so heavily accented as to inhibit intelligibility. To avoid overuse of this label, pron should not be used for either subtle or predictable pronunciation variations. A subtle variation might be a non-native English speaker who tends not to aspirate the voiceless plosives p, t, k syllable-initially in English. A predictable pronunciation is one that follows from a phonological rule, like mystery becoming mystry. There are some tricky cases where large numbers of people pronounce a word in a way that others often consider wrong. One example is Latin et cetera, pronounced by many as E k s E t 3r. Another is escape, pronounced e k s k ei p rather than e s k ei p. Another is espresso, pronounced E k s p r E s o rather than E s p r E s o. Because it is debatable whether these so-called mispronunciations are based on phonological rules followed by certain speakers and not others, or whether they are idiolectal (I tend toward the former analysis), the safe thing to do is to attach the pron tag to such words of questionable pronunciation. There are no word-level transcription conventions to mark dialectal variation specifically. If dialect information is necessary, a verification pass over the corpus designed to note dialect should be considered as a separate process. 3.20 sneeze or
19. pronounces a word in a foreign language, the label nitl should be used. 3.16.1 nitl at the Time-Aligned Level: If the transcriber cannot decipher individual words, the label ot should span the entire part of the waveform containing foreign speech. If the transcriber understands the foreign language, or can at least guess at what is being spoken, the time-aligned transcription should be aligned to the waveform just as words in the appropriate language are aligned. In other words, when the transcriber can transcribe the individual words, no special conventions are used at the time-aligned level for foreign speech. 3.16.2 <nitl> at the Non-Time-Aligned Level: For non-time-aligned transcriptions, however, speech not in the language can be handled more specifically. A transcription of a person speaking Spanish when another language was called for might look like this: <nitl: mi casa es su casa>, where the label nitl is followed by a colon, then a space, and then the words as closely as they can be deciphered, all within the pointy brackets. Note: this is the only case in non-time-aligned transcriptions where speech by the caller can be transcribed within pointy brackets. See also the section on using the colon within pointy brackets to further define non-speech symbols. 3.17 ns or <ns>: The label ns is used for sounds that are made by the speaker's mouth but that are not sp
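The non-time-aligned nitl form has a fixed shape: the label, a colon, a space, then the deciphered words, all inside pointy brackets. The sketch below builds and extracts that form. The helper names are hypothetical. Rejecting pointy brackets inside the words is an assumption of the sketch, made so that the label stays unambiguous.

```python
import re

# <nitl: words>: the label, a colon, a space, then the deciphered words.
NITL_RE = re.compile(r"<nitl: ([^<>]*)>")


def format_nitl(words: str) -> str:
    """Wrap deciphered foreign-language words in a non-time-aligned nitl label."""
    if "<" in words or ">" in words:
        raise ValueError("words inside a nitl label may not contain pointy brackets")
    return f"<nitl: {words}>"


def extract_nitl(line: str) -> list[str]:
    """Return the foreign-language word strings found in a transcription line."""
    return NITL_RE.findall(line)


print(format_nitl("mi casa es su casa"))  # <nitl: mi casa es su casa>
```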
20. starts and cut-off speech are the same, although the information yielded by the various transcriptions differs in a significant way. Cut-off speech at the beginning of a file indicates barge-in, that is, interruption of the prompt by the speaker. Barge-in can be a significant and non-trivial problem for speech researchers. False starts within the file indicate interesting information about the speech at the higher discourse level: the type or frequency of false starts might be of interest to linguists or others who want to study disfluencies in continuous speech. Cut-off speech at the end of a file indicates either (a) that utterance detection was not working properly, or (b) that the speaker chose to talk longer than the allowable time. Both of these issues are significant. The first matters if the system designers were indeed employing a technique to detect when a person has stopped talking: the fact that the file was cut off is a good indication that the technique could use some tuning. The second could be significant for the designers of the interface, who, in order to ensure the interest and cooperation of the caller, seek to develop friendly and natural interfaces. If each person who calls gets cut off by a loud beep when answering certain questions, developers might want to consider changing the recording parameters so that the calling experience is more satisfying for the participant. This will make the caller less likel
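The discussion in item 20 ties the meaning of cut-off speech to its position in the file: at the start it indicates barge-in, in the middle a false start, and at the end an utterance-detection or time-limit problem. This sketch makes that mapping explicit for analysis scripts. The function name and category strings are hypothetical. A one-word file is reported as barge-in even though that word is also the last one.

```python
def classify_cutoffs(n_words: int, cutoff_positions: list[int]) -> dict[int, str]:
    """Interpret each cut-off word by where it falls in the file."""
    result = {}
    for i in cutoff_positions:
        if not 0 <= i < n_words:
            raise IndexError(f"word position {i} is outside a {n_words}-word file")
        if i == 0:
            result[i] = "barge-in"
        elif i == n_words - 1:
            result[i] = "cut off at end (utterance detection or time limit)"
        else:
            result[i] = "false start"
    return result


print(classify_cutoffs(5, [0, 2]))  # {0: 'barge-in', 2: 'false start'}
```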
21. tricky areas. For example, there are some words we have identified as spoken only: no written character exists for these words, as they occur only in the spoken language. These words will be identified in the transcription with the tag <so> (for spoken only) connected to the end of the word. In the sentence leih 5 liu<so> 1 jo 2 heui 3 bin 1 ("where did you go?"), liu<so> is considered spoken only: it is colloquial and would be replaced with a word such as jau 2 in formal or written language. Other spoken-only words include je, le, gak, and ha. These words are not generally found in dictionaries or formal treatments of the language; however, because they are used so often, they are starting to be printed in some newspapers and magazines. Preserving a more traditional view of the language, we attach the <so> tag to the romanized word. The only other case where the mapping between the romanization and the original Chinese character will not be straightforward is when two different Chinese characters are pronounced the same. These words will need to be disambiguated by context. For example, the syllable pronounced leih 5 can be translated in the following ways: 1. Lee: leih 5 siu 2 je 2, "Miss Lee"; 2. you: leih 5 hou 2 ma 3, "how are you?"; 3. care: mh 4 leih 5, "don't care"; 4. sole: yat 1 faai 3 leih 5, "a piece of shoe's inner sole". Now that the romanization transcriptions are nearly complete, we are working
22. with those label lists, with a few exceptions. All deviations from standard Worldbet are annotated; the differences are minimal (i.e., an occasional added symbol). 5.6 Label Assignments: To the unaccustomed user, Worldbet symbols may seem awkward. Because of a desire to create a label set that corresponded to IPA labels, and a need to use only ASCII symbols, specific label assignments were at times somewhat creative. Its author intended to remain as close as possible to the IPA by using symbols related to the IPA in some visible way; admittedly, this relationship sometimes holds only in the abstract. It is hoped that the unfamiliarity of this label set will not be a hindrance to its use, as it is a valuable tool when phonetic transcriptions of multiple languages are needed. 5.7 Diacritics: Whereas in some transcription sets, such as the IPA, diacritics appear as superscript or subscript characters, Worldbet diacritics are made distinct from base symbols by a linking symbol, the underscore. So a glottalized vowel might be transcribed A_. For a detailed description of diacritics, see Chapter 7. 5.8 Necessary Tools: Phonetic labeling requires a number of skills and tools, discussed here. One must choose a transcription set suitable for the task; many philosophies are embodied in existing label sets. CSLU uses the Worldbet transcription set developed by James
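Because Worldbet links diacritics to a base symbol with an underscore, a label can be split into its base symbol and diacritics with a plain string split. This sketch assumes, as 5.7 describes, that the underscore only ever acts as that linking symbol. The helper names are hypothetical, and `A_fp` is used only as an illustrative label built from a base vowel plus the filled-pause diacritic mentioned elsewhere in the guide.

```python
def split_label(label: str) -> tuple[str, list[str]]:
    """Split a Worldbet-style label into its base symbol and diacritics.

    Every piece after the first underscore-separated piece is treated as a diacritic.
    """
    base, *diacritics = label.split("_")
    return base, diacritics


def join_label(base: str, diacritics: list[str]) -> str:
    """Inverse of split_label: relink the base symbol and diacritics with underscores."""
    return "_".join([base, *diacritics])


print(split_label("A_fp"))  # ('A', ['fp'])
print(split_label("E"))     # ('E', [])
```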
23. yet begun to transcribe Farsi. However, Farsi is written with the Arabic script, so the romanization method will likely be similar to the one used for Arabic. 4.5.6 French: Standard French is written with the Roman alphabet. French has five diacritics that are not found on most standard American keyboards: the acute and grave accents, the cedilla, the circumflex, and the diaeresis. Numbers will be transcribed using dashes, as is done in standard French orthography. This avoids ambiguities such as the following: vingt-deux (22) versus vingt deux (20 2). The French often fill thinking space with sh, so the label sh has been adopted as a filled pause label. See Table 4.1 and Table 4.3. 4.5.7 German: Standard German is written with the Roman alphabet. There are two non-ASCII characters for which we have devised a conversion scheme: the umlaut and the beta (see Table 4.4). Capitalization: unlike English, all proper names and all nouns are capitalized in German transcriptions, as is done in standard German orthography. It is important to preserve German rules of capitalization because they can in some cases disambiguate words. German words that are sentence-initial are only capitalized if they are nouns or proper names. 4.5.8 Hindi: Hindi has been transcribed orthographically. The Hindi script, for which I have not input LaTeX fonts, has been romanized. This information is currently stored off-line in my trusty file cabinet under Languag
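The dash convention for French numbers matters because a reader of the transcript splits on spaces. vingt-deux stays one token worth 22, while vingt deux becomes two numbers, 20 and 2. The sketch below shows the difference with a deliberately tiny lexicon. The lexicon and function are hypothetical. Summing dash-joined parts covers additive compounds like vingt-deux but not multiplicative ones like quatre-vingts.

```python
# Tiny illustrative lexicon; a real number reader needs the full French system.
WORD_VALUES = {"vingt": 20, "deux": 2}


def read_numbers(text: str) -> list[int]:
    """Read space-separated French number words, where a dash joins the parts of one number."""
    return [sum(WORD_VALUES[part] for part in token.split("-")) for token in text.split()]


print(read_numbers("vingt-deux"))  # [22]
print(read_numbers("vingt deux"))  # [20, 2]
```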
6.3.1 French Vowels

Table 6.6: French Vowel Chart

6.3.2 Notes on French Vowels

• y is a rounded i. Some speakers place y so high that it is slightly fricated; this is still labeled y.
• 7 is a rounded e (IPA ø).
• 8 is a rounded E (IPA œ).
• Many French speakers do not distinguish between 7 and 8. Those who do contrast jeune (Z 8 n, 'young') with jeûne (Z 7 n, 'fasts').
• The nasalized vowels based on A, >, E, and 8 have the same placement as their non-nasalized counterparts A, >, E, and 8.

6.3.3 French Consonants

6.3.4 Notes on French Consonants

• K, the voiced uvular fricative, is commonly devoiced, especially word-finally (pasteur). When the phoneme K is devoiced, it is labeled K_0.
• X, the voiceless uvular fricative, was introduced for foreign words and should not be used when K is devoiced.

6.4 German

6.4.1 German Vowels

Table 6.7 shows the German vowel chart. Tables 6.8 and 6.9 list the vowel and diphthong labels that are used.

Table 6.7: German Vowel Chart

Table 6.8: German Vowels, Examples and Descriptions

6.4.2 German Consonants

Example words: bieten, Güte, bitten, Mutter, Goethe, Betten, bäte, Götter, beten, Ratte, raten, Rotte, rot, Kutte, Rute, Gesetz, besser
       649  A
 649   844  9r
 844   895  pau
 895   994  w
 994  1089
1089  1174  n
1174  1300  f
1300  1506  aI

Our level of labeling

The aim of phonetic and phonemic transcriptions is to represent the phonetic content of an utterance at a given level of detail. The phonetic transcription conventions used for labeling at CSLU are found in Chapter 5 and the following chapters. We label with Worldbet. The result is a transcription containing mostly phonemic base labels and limited phonetic detail, which is made explicit by the use of diacritics. Phonetic phenomena that we transcribe include excessive nasalization, glottalization, frication on a stop, centralization, lateralization, rounding, and palatalization. See the chapter on phonetic diacritics for a description of all salient phonetic features captured in our transcriptions.

Other levels of labeling

Phonemic transcriptions capture only those distinctions that are contrastive in a given language. Phonetic transcriptions, by contrast, explicitly mark allophonic variation. Consider the three contrastive voiceless plosives of English. A purely phonemic transcription would posit only the three voiceless stops p, t, and k. It would not differentiate the phonetic variation that exists in certain contexts, such as the k in kite as opposed to the k in queue. Because many phonetic realizations are predictable by phonological rule, they do not always need to be transcribed explicitly. The TIMIT label set, which is used to transcr
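The alignment listing above reads most naturally as start time, end time, and label, with times apparently in milliseconds. That layout is an inference from the fragment, not a documented CSLU file format. Under that assumption, a short parser can recover segment durations and check that each segment begins where the previous one ends. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    start_ms: float
    end_ms: float
    label: str

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def parse_segments(text: str) -> List[Segment]:
    """Parse 'start end label' lines and check that segments are contiguous."""
    segments: List[Segment] = []
    for line in text.strip().splitlines():
        start, end, label = line.split(maxsplit=2)
        seg = Segment(float(start), float(end), label)
        if seg.end_ms <= seg.start_ms:
            raise ValueError(f"segment ends before it starts: {line!r}")
        # Assumed: labeled segments tile the signal (pauses get their own label).
        if segments and segments[-1].end_ms != seg.start_ms:
            raise ValueError(f"gap or overlap before: {line!r}")
        segments.append(seg)
    return segments


sample = """649 844 9r
844 895 pau
895 994 w"""

for seg in parse_segments(sample):
    print(seg.label, seg.duration_ms)
```

Run on the three complete rows above, this reports durations of 195, 51, and 99 ms for 9r, pau, and w.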
Table 6.17: Hindi Consonant Examples (continued)

Example words: hat ('move'), man ('mine'), naam ('name'), gunii ('talented'), lihng ('gender'), lataa ('tendril'), savera ('morning'), per ('tree'), parhaaii ('studies'), yaad (person's name)

Descriptions: voiced alveo-palatal fricative, voiceless palatal fricative, voiceless glottal fricative, bilabial nasal, dental nasal, retroflex nasal, palatalized nasal, velar nasal, alveo-palatal lateral, r flap (not retroflexed), retroflex plosive, aspirated retroflex plosive, palatal glide, bilabial glide

6.5.4 Notes on Hindi Consonants

• All flapping is allophonic in Hindi and will be transcribed using the flapping diacritic. Flapping commonly occurs with b, d, g, n, and rr.
• The Worldbet palatal nasal has not been included as a base symbol because it is not phonemic. When a palatalized n occurs, it will be labeled n_j.

6.6 Japanese

6.6.1 Japanese Vowels

Table 6.18: Japanese Vowels

Table 6.19: Japanese Vowel Examples and Descriptions
ichi ('one'): high front unrounded
iie ('no'): high front unrounded, long
koe ('voice'): mid front
sensei ('teacher'): mid front, long
uta ('song'): high back unrounded
futsuu ('ordinary'): high back unrounded, long
igo ('Igo', a game): mid back rounded
tookyoo ('Tokyo'): mid back rounded, long
san ('three'): low ce
English.
• Foreign words revisited: a discussion of how to handle foreign or accented words, and how to handle newly created words that combine two or more languages (the <alt> tag).
• Filled pauses: a discussion of the filled pauses developed for languages other than English.
• Special conventions in each language: a subsection is devoted to each language. It describes the special conventions used to transcribe that language, including its romanization and diacritic markings.

4.2 Non-speech Events

The non-speech labels in Chapter 3, Table 3.1, are used in transcriptions of all languages unless otherwise noted. For rules of use, see Chapter 3. Two labels do not appear in Table 3.1 because they are not used in English: <alt> and <so>.

<alt> has been added to transcribe words that have been grammatically altered so that they incorporate elements from more than one language. It is described in Section 4.3 and following. The <alt> tag could in principle be used in English, but we have not yet encountered a case that required it. The label <so> was developed specifically for Cantonese and is described in Section 4.5.2.

Multi-language labelers should note that the non-speech labels described in Chapter 3 are to be considered a part of the transcr
4. Connecting non-speech labels to words signifies that the non-speech event happens at the same time the word was spoken. Tags should always be attached to the end of the word they modify; the order of tags is not important.
5. Non-speech labels should not be connected to one another unless they are also connected to a word. Non-speech labels in parentheses should never be connected to anything.

2.3.3 Ordering of Symbols

A great deal of information is captured in word transcriptions at the non-time-aligned level. In the absence of time markers, we show simultaneous events by connecting non-speech labels to the word spoken in the foreground, and there are over 20 different types of non-speech events. With all this information, transcriptions must be structured in order to be parsable.

With this need for structure in mind, let's revisit the word<> format. The word is the only required element. Neither the asterisk nor the square brackets are required, but if the brackets are present, the asterisk is required. Non-speech information in pointy brackets always appears last in the word.

Attaching a non-speech label says nothing about where in the word the event happened, whether closer to the beginning or the end. It indicates only simultaneity, which is what allows the non-speech label always to appear at the end of the word. The following are all valid transcriptions:
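These ordering rules make a word token machine-checkable: tags come last, and square brackets require an asterisk. The sketch below assumes a token grammar inferred from this chapter: an optional asterisk, the spoken letters, an optional [supplied] part, an optional asterisk, then zero or more <tag> labels. This chapter does not pin down the exact position of the asterisk, so the pattern accepts it at either end of the word. All names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical token grammar: [*] spoken [ [supplied] ] spoken [*] <tag>...
WORD_RE = re.compile(
    r"(?P<lead>\*?)"
    r"(?P<before>[^\[\]<>*()\s]*)"
    r"(?:\[(?P<supplied>[^\[\]]*)\])?"
    r"(?P<after>[^\[\]<>*()\s]*)"
    r"(?P<trail>\*?)"
    r"(?P<tags>(?:<[^<>\s]+>)*)"
)


@dataclass
class WordToken:
    spoken: str              # letters actually heard
    supplied: Optional[str]  # text in square brackets, if any
    cut_off: bool            # True if an asterisk is present
    tags: List[str]          # attached non-speech labels, in written order


def parse_word(token: str) -> WordToken:
    m = WORD_RE.fullmatch(token)
    if not token or m is None:
        # Also rejects tokens where something follows the tags, e.g. "house<bn>*".
        raise ValueError(f"malformed word token: {token!r}")
    cut_off = bool(m["lead"] or m["trail"])
    if m["supplied"] is not None and not cut_off:
        raise ValueError(f"square brackets require an asterisk: {token!r}")
    return WordToken(
        spoken=m["before"] + m["after"],
        supplied=m["supplied"],
        cut_off=cut_off,
        tags=re.findall(r"<([^<>\s]+)>", m["tags"]),
    )
```

With this grammar, house<bn><ln> parses to the word house with tags bn and ln. A bracketed token with no asterisk is rejected, as is any token with material after its tags.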
The CSLU Labeling Guide

T. Lander
Center for Spoken Language Understanding
Oregon Graduate Institute
May 15, 1997

Acknowledgements

The authors of this book would like to thank the following people, without whose support this work could not have been completed:

• Dr. Ron Cole, Director of CSLU, for his vision of phonetically labeled speech and the support that enabled the research.
• Dr. Beatrice Oshika, for linguistic expertise in the development of initial label sets, for editorial comments on this document, and for ongoing support and training of labelers.
• Dr. Etienne Barnard, for input as head of the Language Identification Project.
• Dr. Jim Hieronymus, for the multi-language transcription set Worldbet.
• Vince Weatherill, for input as Center Manager and initial documentation of conventions.
• Dr. Mark Fanty, John Pochmara, and Johan Schalkwyk, for the development of the OGI Speech Tools.
• Mike Noel, for input as Corpus Development Manager.
• The labelers, including Takayuki Arai (Japanese), Li Jiang (Mandarin), Troy Bailey (Spanish, English), Anne Johansen (Spanish, German), Kay Berkling (German), Dana Mitchell (English), Jim Brennan (English), Victoria Noel (English), Marlyse Cathery (French, English), Katsutoshi Ohtsuki (Japanese), David Cole (English), Kal Shobaki (English), Terri Durham (English), Angie Fujioka (English), Vince Weatherill (English), Alexandra Guerra (Spanish), Amie Wilson (English), Zhihong Hu (Mandarin), Yonghong Yan (Mandari
a list of non-speech labels. Table 3.2 shows a list of filled-pause labels. Table 3.3 shows a list of other words common to the spoken language.

3.1 Overview

A description of each non-speech label follows. Keep in mind that these labels are used in two types of transcriptions: non-time-aligned and time-aligned. At the non-time-aligned level, the labels appear in pointy brackets (e.g. <bn>). At the time-aligned level, they are preceded by a period (e.g. .bn). The terms word box and comments box refer to time-aligned labeling. Conventions specific to one type of transcription are noted; otherwise, assume the convention applies to both.

NOTE: Most of the non-speech labels refer to a set of sounds. For example, .bn could be a door slamming, music, or general environmental noise, and .ln could be clicking noise, static, or a buzz from the phone line. There is no mechanism in place for further defining these labels. More detailed transcription conventions could be developed if it were important to define each non-speech event in a file specifically.

3.2 Asterisk

The asterisk is used to denote both cut-off speech and false starts at the word level. Its use is complex and has obvious implications for the ordering of symbols, so it is described in detail in Chapter 2: Section 2.3.6 for cut-off speech and Section 2.3.7 for false starts.

3.3 .blip or <blip>

This label is used when the sig
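The same label is written in two forms depending on the transcription level: in pointy brackets (<bn>) when non-time-aligned, and with a leading period (.bn) when time-aligned. Converting between the two is purely mechanical. The helpers below are a hypothetical sketch, not part of the OGI Speech Tools.

```python
def to_time_aligned(tag: str) -> str:
    """Convert a non-time-aligned tag such as '<bn>' to the time-aligned form '.bn'."""
    if len(tag) < 3 or tag[0] != "<" or tag[-1] != ">":
        raise ValueError(f"not a pointy-bracket tag: {tag!r}")
    return "." + tag[1:-1]


def to_non_time_aligned(label: str) -> str:
    """Convert a time-aligned label such as '.bn' back to the form '<bn>'."""
    if len(label) < 2 or label[0] != ".":
        raise ValueError(f"not a period-prefixed label: {label!r}")
    return "<" + label[1:] + ">"
```

For instance, <blip> becomes .blip, and converting back restores <blip>.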
adually until it is indistinguishable from environmental or line noise. Set the right boundary where the first formant in the spectrogram dies. This should coincide with a point of radical change in the waveform.

5.9.8 Geminates

A geminate is two identical sequential phonemes. In the phrase nine nouthetic counselors there are two geminates. The first is the n at the end of nine and the start of nouthetic. The second is the velar closure and optional burst(s) between nouthetic and counselors. The burst is optional because in this context the first plosive is often not released, and the geminate is realized in the length of the closure. Geminates are generally one and a half times longer than single phones and are usually marked by a spectral discontinuity such as lower amplitude.

The following conventions are used to label geminates. Splitting geminates is useful in speech recognition systems that have no facility for between-word phonological rules. Geminates can always be merged automatically if there is no need to represent them explicitly.

Lowered Amplitude: Boundaries between geminates may be signaled by lower amplitude. In this case the phonemes are continuous, but the amplitude decreases when articulation of the second phone begins; there is no intervening pause. Place the boundary between the two phones where the amplitude drops most radically. Label both segments with the same phoneme label if there are no other acoustic differences between the two segmen
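The lowered-amplitude rule places the boundary where the amplitude drops most radically. The guide's last resort, when there is no discontinuity, is to split the geminate in half. Both rules can be approximated numerically. The sketch below assumes a per-frame amplitude envelope covering the geminate in equal-length frames. The envelope and the min_drop threshold are hypothetical inputs; the guide itself sets boundaries by eye.

```python
from typing import Sequence


def geminate_boundary(start_ms: float, end_ms: float,
                      envelope: Sequence[float], min_drop: float) -> float:
    """Pick a boundary inside a geminate segment.

    envelope holds one amplitude value per equal-length frame spanning
    [start_ms, end_ms). The boundary goes at the start of the frame that
    follows the largest frame-to-frame drop. If no drop reaches min_drop,
    the segment is simply split in half.
    """
    midpoint = (start_ms + end_ms) / 2
    if len(envelope) < 2:
        return midpoint
    frame_ms = (end_ms - start_ms) / len(envelope)
    drops = [envelope[i] - envelope[i + 1] for i in range(len(envelope) - 1)]
    best = max(range(len(drops)), key=lambda i: drops[i])
    if drops[best] < min_drop:
        return midpoint
    return start_ms + (best + 1) * frame_ms
```

With six 10 ms frames and a sharp drop after the second frame, the boundary lands at 20 ms. A flat envelope falls back to the 30 ms midpoint.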
an dialect, and bs indicates background speech that can be heard during the entire file.

Square brackets: Square brackets contain the transcriber's best guess when speech has been cut off or when there has been a false start. There should be no acoustic evidence in the signal for text in square brackets. When transcribing text in square brackets, you are using your knowledge of the language to supply information that is not actually in the signal. Note the following example:

the colors of the flag are red white and b[lue]

Here the speaker was cut off while uttering the word blue, but as speakers of the language we can confidently supply the information missing from the signal. If no confident guess can be made about the cut-off speech, no guess should be made. See 2.3.6.

my name is sp

2.3.2 Non-speech Labels: Quick Summary

1. Any word or string of characters not appearing within pointy brackets or parentheses is assumed to be a word spoken by the caller. This will be referred to as a foreground event.
2. Words spoken by the caller should never appear in pointy brackets or parentheses. Inferred speech (speech that was intended but not actually uttered) may appear in square brackets. See Section 3.
3. Putting a non-speech tag in parentheses means that the non-speech event happens during the entire file. Parenthesized tags must appear at the beginning of the transcription.
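The summary rules above can be checked line by line. Parenthesized whole-file tags must come first. Standalone tags in pointy brackets are non-speech events, and they may not be chained together without a word. Everything else is a foreground word. The sketch below is hypothetical and assumes tokens are separated by whitespace. It leaves tags attached to words (e.g. house<bn>) inside the word token.

```python
from typing import List, Tuple


def split_transcription(line: str) -> Tuple[List[str], List[str], List[str]]:
    """Separate whole-file tags, foreground words, and standalone tags.

    Returns (file_tags, words, events).
    """
    file_tags: List[str] = []
    words: List[str] = []
    events: List[str] = []
    for token in line.split():
        if token.startswith("(") and token.endswith(")"):
            # Whole-file tags must precede everything else on the line.
            if words or events:
                raise ValueError(f"parenthesized tag not at start: {token!r}")
            file_tags.append(token[1:-1])
        elif token.startswith("<") and token.endswith(">"):
            # Tags may only be chained when attached to a word.
            if token.count("<") > 1:
                raise ValueError(f"tags connected without a word: {token!r}")
            events.append(token[1:-1])
        else:
            words.append(token)  # foreground event, possibly with attached tags
    return file_tags, words, events
```

For (pron) (bs) i like vegemite, this returns the file tags pron and bs and the three foreground words. A parenthesized tag that appears after a word raises an error.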
and the phone is said to be terminated. If other formants continue in an obvious way, the devoicing diacritic should be used on the end of the phone, for example n_0.

In some languages, a nasal may be indicated by heavy nasalization of the preceding vowel, and it may not be possible to isolate the nasal. This happens frequently in languages like English, where nasalization is not phonemic. Be careful not to give a separate label to a nonexistent nasal. If there is no indication of a nasal in the waveform but you hear nasalization, use the nasalization diacritic on the vowel. Apply it to the whole vowel, even if the nasalization does not carry throughout the entire segment. See the section on the nasalization diacritic for more information on labeling nasals.

5.9.5 Liquids

The onset of a liquid is marked by the disappearance of f3 after a vowel, or by the appearance of f1 and/or f2 after a nasal or obstruent. There is a corresponding change visible in the waveform. After a vowel, the onset of a liquid may be as gradual as the onset of a semivowel. In that case, use the guidelines for separating vowels from other vowels or semivowels to determine an appropriate boundary point.

5.9.6 Vowels and Approximants

The onset of a vowel or approximant following a nasal or obstruent is marked by two cues. In the spectrogram, formants appear or darken; in the waveform, amplitude and/or periodicity increase. As always, the point of most radical change in the waveform, when it can b
ary 50 ms before the burst. This length was chosen as the average length of word-initial stop closures in English. For words ending with closures, extend the boundary of the word to 100 ms after the energy of the preceding phone stops. 100 ms was chosen as the average length of word-final closures in English.

• Unreleased stops: When a stop comes at the end of a word, such as cat or pot, some speakers release the plosive with a burst that is visible in the spectrogram, and some do not release the closure. If the final burst is imperceptible, set the right boundary of the word 100 ms after the energy in the preceding phone stops. If the burst is perceptible, extend the right boundary of the word to the end of the burst.
• Cut-off speech: For time-aligned labeling, transcribe the part of the word that was spoken, followed by an asterisk, in the word box. Rather than supplying the intended utterance within square brackets, write the complete word in the comments window. If the entire word is not known, write the label incomplete in the comments file. Likewise, if the utterance is ambiguous or unintelligible, use the label uu in the word box and write the label incomplete in the comments file. See 2.3.6.

Note: There must be evidence in the waveform that a cut-off has occurred. Do not use the asterisk unless there is a signal in the waveform, such as a high energy level at the end of the waveform. Do not use the cut-off diacritic in c
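The closure rules above reduce to fixed offsets: 50 ms before the burst for a word-initial closure, and 100 ms after the preceding phone's energy stops for a word-final closure whose burst is imperceptible. A perceptible final burst instead ends the word at the end of the burst. The sketch below encodes those offsets. The function names, and clamping the left boundary at the start of the file, are hypothetical choices.

```python
from typing import Optional

# Average English closure lengths used by the word-boundary rules.
INITIAL_CLOSURE_MS = 50
FINAL_CLOSURE_MS = 100


def word_start_before_closure(burst_onset_ms: float) -> float:
    """Left word boundary for a word that begins with a stop closure.

    Clamped at 0 (an assumption) so the boundary never precedes the file.
    """
    return max(0.0, burst_onset_ms - INITIAL_CLOSURE_MS)


def word_end_after_final_stop(energy_end_ms: float,
                              burst_end_ms: Optional[float] = None) -> float:
    """Right word boundary for a word ending in a stop.

    If a final burst is perceptible, pass its end time and the word ends
    there. Otherwise the word ends 100 ms after the preceding phone's energy.
    """
    if burst_end_ms is not None:
        return burst_end_ms
    return energy_end_ms + FINAL_CLOSURE_MS
```

For example, a word-initial burst at 320 ms gives a left boundary at 270 ms. An unreleased final stop after energy ends at 1200 ms gives a right boundary at 1300 ms.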
as follows in Table 6.4: bilabial, labiodental, interdental, alveolar, postalveolar, velar, and glottal. See Table 6.5 for a list of the labels with word examples and descriptions.

Table 6.4: English Consonants (rows include stops, voiceless and voiced affricates such as tS and dZ, voiceless and voiced fricatives, and taps)

6.2.5 Notes on English Consonants

• Aspiration is overtly transcribed on all voiceless stops in English. This contrasts with the relatively unaspirated stops of languages like German, where the stop label would be p.

Table 6.5: English Consonant Examples and Descriptions
pan: voiceless bilabial stop
pan: voiceless bilabial stop closure
tan: voiceless alveolar stop
tan: voiceless alveolar stop closure
can: voiceless velar stop
can: voiceless velar stop closure
ban: voiced bilabial stop
ban: voiced bilabial stop closure
dan: voiced alveolar stop
dan: voiced alveolar stop closure
gander: voiced velar stop
gander: voiced velar stop closure
me: bilabial nasal
knee: alveolar nasal
sing: velar nasal
rider: alveolar flap
writer: alveolar flap
banter: alveolar nasal flap
Further examples: fine, thigh, sign, assure, hope, vine, thy, resign, azure, ahead, church, church, judge, judge, limb, right, yet, when, bottom, button, bottle
ases where you know the person wanted to say something more, but the system stopped recording before he or she was able to utter anything. The speaker is considered cut off only when he or she has actually begun to say something, not merely when the transcriber thinks the speaker might be about to say something.

2.5.2 Comments Box Conventions

• Actual vs. Citation Pronunciations: The comments window at the time-aligned word level is used to note actual pronunciations (i.e. partial pronunciations or slang), the mode in which the caller is speaking, and important details that clarify the transcription. Each comment should be entered as one word in the comments window, for example glot, whisper, outbreath. Most files will contain one or more comments. For some time-aligned word labeling, no comments window will exist; whether it is present depends on the needs of the particular corpus being labeled. If the comments window is not present, disregard the conventions pertaining to it and follow only those associated with the word box.
• Backslashes: Every comment that is not a transcription of something the speaker actually said should be preceded by a backslash. Here "what the speaker actually said" is opposed to the dictionary form of the word appearing in the word box. For example, the comments glot, whisper, outbreath, and extraneousnoise should all be preceded by backslashes.
• Two Types of Lola Boxes: There are two types of
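Two comments-box rules are mechanical. Each comment is written as a single word, as extraneousnoise shows. Any comment that is not a transcription of what the speaker actually said gets a leading backslash. A minimal sketch follows. The function name and the is_pronunciation flag are hypothetical, and deciding which kind of comment applies is still the labeler's judgment.

```python
def format_comment(text: str, is_pronunciation: bool) -> str:
    """Format one comments-box entry.

    Internal whitespace is removed so the comment is a single word.
    Entries that are not a transcription of what the speaker said
    (e.g. glot, whisper) get a leading backslash.
    """
    word = "".join(text.split())
    if not word:
        raise ValueError("empty comment")
    return word if is_pronunciation else "\\" + word
```

So "extraneous noise" becomes \extraneousnoise, while a transcribed actual pronunciation is left without a backslash.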
ative
Satin: voiceless alveolar sibilant
Satz: voiced alveolar sibilant
Schatz: voiceless alveo-palatal sibilant
Genie: voiced alveo-palatal sibilant
Reich: voiceless palatal fricative
Rauch: voiceless velar fricative
hasse: voiceless glottal fricative
Pfennig: voiceless labiodental affricate; voiceless labiodental affricate closure
Zeit: voiceless alveolar affricate; voiceless alveolar affricate closure
Deutsch: voiceless alveo-palatal affricate; voiceless alveo-palatal affricate closure
Dschungel: voiced alveo-palatal affricate; voiced alveo-palatal affricate closure
brauchen: alveolar retroflexed tap
brauchen: uvular fricative
rasse: alveolar trill
Narren: uvular trill
Masse: bilabial nasal
nasse: alveolar nasal
hangen: velar nasal
Kognak: palatalized nasal
lasse: alveolar lateral approximant
haben: syllabic bilabial nasal
hatten: syllabic alveolar nasal
Haken: syllabic velar nasal
Kessel: syllabic alveolar lateral
Jacke: palatal approximant

6.4.3 Notes on German Consonants

• The alveolar tap rr was added to the Worldbet set.

6.5 Hindi

6.5.1 Hindi Vowels

Table 6.12 is a general chart of Hindi vowels. Tables 6.13 and 6.14 list the labels used in Hindi.

Table 6.12: Hindi Vowels (front, central, and back; high, mid, and low)

Table 6.13: Hindi Vowel Examples and Descriptions
miit ('beloved'): high front, long
mit ana ('wipe out')
ause fp

Table 4.1: Filled pauses for multi-language labeling (columns: OGI orthography, Worldbet, language). Entries cover Japanese (e.g. ano, Worldbet ano), Czech, Polish, German, Portuguese, Spanish, Swedish, Hungarian, French, and Mandarin.

4.5 Special Conventions in Each Language

4.5.1 Arabic

One of the problems in romanizing Arabic for transcription at the orthographic level is that there are different varieties of written Arabic and no clear standard written version. As of 4/97, we have not begun transcriptions in Arabic.

4.5.2 Cantonese

Cantonese is nearly completely transcribed. In keeping with our goal of providing ASCII-based transcriptions, we used an existing romanization as our model. This Yale romanization is treated in Cantonese: A Comprehensive Grammar by Stephen Matthews and Virginia Yip (1994). One modification we made to the romanization is the way we mark tone: the tone is transcribed with a number and is separated from the word by a hyphen.

baak-3 luhk-6 sahp-6 yaht-6

There are 7 basic tones in Cantonese: 1 high rising, 2 high level, 3 high falling, 4 mid level, 5 low rising, 6 mid level, 7 low falling.

In most cases there is a straightforward mapping between the romanization and the Cantonese character. However, there are a few
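The tone convention above (syllable, hyphen, tone number 1 to 7) is regular enough to validate and split automatically. The sketch below assumes each token is a lowercase romanized syllable immediately followed by its tone. The pattern name and function are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical pattern: lowercase Yale syllable, hyphen, tone digit 1-7.
SYLLABLE_TONE = re.compile(r"([a-z]+)-([1-7])")


def split_tone(token: str) -> Tuple[str, int]:
    """Split a token such as 'baak-3' into ('baak', 3)."""
    m = SYLLABLE_TONE.fullmatch(token)
    if m is None:
        raise ValueError(f"expected syllable-tone, got {token!r}")
    return m.group(1), int(m.group(2))
```

Applied to the example above, [split_tone(t) for t in "baak-3 luhk-6 sahp-6 yaht-6".split()] gives the four syllables with tones 3, 6, 6, and 6. A tone outside 1 to 7 is rejected.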
boundary of a fricative label at the point of most radical change in the waveform. Sometimes there is little or no clear visible indication of the fricative in the waveform, but you can hear and see indications of a low-amplitude fricative in the spectrogram. In that case, set the boundary where there is an increase of energy in the spectrogram. If the fricative is released very strongly, like a burst, label the closure as an epenthetic closure and give the burst the appropriate fricative label.

5.9.4 Nasals

The onset of a nasal is nearly always easy to determine: the waveform rises or drops into a highly periodic, low-amplitude signal. The nasal usually carries the same formants as the preceding vowel or other phone, but is lighter in color or intensity. The change in the waveform is very easy to spot, because the decreased amplitude shortens the height of the signal; that is, the absolute value of the wave has decreased.

At the end of an utterance or phrase, a nasal may trail off gradually until it is indistinguishable from environmental or line noise. Set the right boundary where the first formant in the spectrogram dies; this should coincide with a point of radical change in the waveform. We use the disappearance of the first formant to determine the right boundary of a phone because it is the best clue to voicing. When the first formant is no longer visible, we know that voicing has ceased
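Labelers place this boundary by eye. As a rough aid, suppose one had per-frame energy in a low-frequency band standing in for the first formant. That input is hypothetical, and this guide specifies neither the band limits nor a threshold. Under that assumption, the "first formant dies" rule could be approximated as the end of the last frame whose energy stays at or above a threshold.

```python
from typing import Optional, Sequence


def f1_offset_ms(f1_energy_db: Sequence[float], frame_ms: float,
                 threshold_db: float) -> Optional[float]:
    """Approximate where the first formant dies.

    Returns the end time of the last frame whose F1-band energy is at or
    above threshold_db, measured from the start of the analysis window,
    or None if no frame qualifies.
    """
    last = None
    for i, energy in enumerate(f1_energy_db):
        if energy >= threshold_db:
            last = i
    if last is None:
        return None
    return (last + 1) * frame_ms
```

The result is only a candidate. The guide still expects it to coincide with a radical change in the waveform before the boundary is accepted.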
ce of the phone, but the quality of the phone may remain constant. When this occurs, divide the geminate in half and give each segment the same base label. Two labels are given in order to retain a phonemic representation of the utterance.

Last Resort: If there is simply no spectral discontinuity and the geminate does not appear markedly longer than a single phone, the geminate should still be split in half, with the same label given to each half.

5.10 Non-speech Events

The acoustic signal does not contain only speech events. Sounds such as telephone line noise, laughter, and inhalation do not convey speech information per se. However, they are present in the signal, and recognition systems need to learn to distinguish between speech and non-speech information. Perceptible non-speech phenomena are therefore labeled. See Chapter 2 for non-speech symbols and usage.

In phonetic transcriptions, non-speech labels are used conceptually in exactly the same way as in non-time-aligned word transcriptions. The only differences are that they are time-aligned to the waveform, and they are preceded by a period rather than enclosed in pointy brackets.

5.10.1 Pause, Closure and Epenthetic Closures

Technically speaking, there is no silence in the signal, because line noise or some other type of noise is always present. The label pau (pause) is used for periods of time in which the speak
ct words which differ in only one meaningful sound. The words bat and pat illustrate that b and p are distinct phonemes in English; sought and fought are another example. Notice that what matters is pronunciation, not spelling.

p is a phoneme in English that has a number of phonetic (allophonic) realizations. The p in the word perfect is aspirated and spectrally averages 50 to 60 ms in length. The p in the word spectacle is generally unaspirated, and its average length is shorter than that of its aspirated counterpart. Both phones are bilabial plosives, and both have the same percept for native speakers of English. Because the percept is the same for the aspirated and unaspirated bilabial stops, there is no meaningful difference between these sounds: they are allophones. Another allophonic form of p is unreleased. Many speakers do not release word-final p, as in the word stop. This contrasts with the word petunia, where word-initial p is always released.

A.3 Consonants

A.3.1 Place of Articulation

The following terms are used to describe the place of articulation of consonants. For more complete coverage of linguistic terminology, see the references.

Bilabial: Produced by bringing the two lips together.
Labiodental: Produced by moving the lower lip to the upper front teeth.
Interdental: Produced with the tip of the tongue between the upper and l
d in a spectrogram by a drop in both F1 and F2. Velar stops are a common place for labialization, as in English quick or the Cantonese name Kwai. Look for formants that emerge from the stop aspiration at an extremely low level. The phoneme s is another common target of labialization: its frication energy drops to 2000 Hz or lower, causing s to look like S.

7.15 Unrounding

The diacritic 1 indicates unrounding on a normally rounded segment. Many American back vowels become unrounded before high front vowels. The high back vowel u in particular is often unrounded in dialects of American English.

7.16 Palatalization

The diacritic _j denotes palatalization. It signifies that the place of articulation of the phone is approaching the palatal place of articulation. When applied to a base label, it does not necessarily mean that the phone is produced at the hard palate.

Palatalization of velar stops, especially before high front vowels, is a very common assimilation process in the world's languages, as is palatalization of t before r in English. When a t is palatalized, it is usually more heavily aspirated and sounds more like tS. The change is most obvious in the waveform, where the aspiration is quite intense and looks very similar to that of S or tS. Palatalization often occurs in the transition from high front vowels to back vowels. It happens mainly with velar stops, and it destroys the velar pinch that w
d non-time-aligned word levels. Time-aligned and non-time-aligned transcriptions differ in two ways. First, time-aligned transcriptions contain boundary markers showing where segments begin and end, while non-time-aligned transcriptions contain no reference to time. Second, time-aligned non-speech labels are preceded by dots (.ns), whereas in non-time-aligned transcriptions they appear in pointy brackets (<bn>).

2.3 Non-time-aligned Conventions

This section covers conventions for non-time-aligned word-level transcription. For time-aligned word-level conventions, see Section 2.5.

2.3.1 Enclosing Brackets <>

We want our transcriptions to be computer-parsable, which means the structure needs to be predictable. There are two possible structures in word-level transcriptions:

1. word<>
2. (tag)

This syntax will become clearer as you read.

Pointy brackets: Non-speech labels are usually enclosed in pointy brackets (see Chapter 3, Table 3.1). Nothing should appear between pointy brackets except these non-speech tags.

Parentheses: Enclose a non-speech tag in parentheses when the non-speech event happens during the entire file, and put tags in parentheses at the beginning of the file. Note the following example, appropriate for an Australian speaking in a crowded room:

(pron) (bs) i like vegemite

The pron tag indicates non-standard pronunciation, in this case the Australi
dard orthography. If the transcriber had thought instead that the word was buy, the transcription would have been b[uy], again reflecting the standard orthography characteristic of word-level transcriptions.

In transcription 2, the last sound in the word name was cut off. Again, this was a familiar word to the transcriber, who was therefore able to supply the final two letters of the word name.

In examples 3 and 4 there is no information in brackets, but for different reasons. Take example 3 first. Transcription 3 signifies that only part of the m was cut off; part of the m could still be heard by the transcriber. In this respect 3 is most like 1. The only difference is that in 1 the sound m could not be heard at all, whereas in 3 part of the m could be heard.

In example 4, nothing appears in brackets. This is clearly a case of a cut-off word, as the asterisk is the first character in the file. Because the utterance was an unfamiliar name, the transcriber was unable to supply the rest of the word. The label sp was added to show that it was an unfamiliar word whose spelling could not be ascertained.

Placement of Asterisk: For placement of the asterisk, see 2.3.3.

In summary:
• When a word is cut off but the transcriber understands what the speaker was intending to say, the part that was actually uttered is transcribed outside brackets, and the part that the transcriber supplies appears
e. See Table 4.10.

4.5.19 Swahili

Conventions for Swahili are in the process of being developed. Stay tuned.

4.5.20 Swedish

Swedish is written with the Roman alphabet. Table 4.11 displays the conventions used when transcribing the umlauts. The fourth character, u-umlaut, is used for German loan words.
• The hyphen is used in Swedish transcriptions for hyphenated names such as maja-lena or lars-erik.
• The apostrophe is used in Swedish to distinguish words like ide' (idea, i d e &) from ide (den, i d &). At least in this case, the word with the apostrophe is pronounced differently from the word without it:
Bjoernen ligger i sitt ide. Han foar en ide'.
The bear lies in his den. He gets an idea.

4.5.21 Tamil

4.5.22 Vietnamese

Although written with the Roman alphabet, Vietnamese has a number of characters not found on a standard keyboard. Table 4.12 displays our ASCII solution to that potential problem. Symbols following a letter usually apply to the previous letter. In the transcriptions, symbols should appear in their original order, from left to right and from top to bottom. Symbols above the letter appear first, followed by symbols below the letter; leftmost symbols appear before those to their right.

Known problems with Table 4.12:
• In the Vietnamese version of the first word, du o c, the hook (glottal stop) symbol is misleading. This marking actually looks like a comma or backward c connected to the top
49. e determined is the place to set the left boundary. After a liquid or strong approximant, the onset of a vowel is marked by the appearance of F3 and/or F4 and F5 in the spectrogram (cues from F4 and F5 will generally not exist in telephone speech, due to the limited bandwidth). The waveform will show some change, but there is no typical change to look for. Be sure to use cues from both the spectrogram and the waveform, setting the left boundary where the changes coincide; where they disagree, give preference to the changes in the waveform. After a semivowel or vowel, it can be practically impossible to determine the exact onset of a vowel. To be consistent, we have chosen to place the boundary in the middle of the transition period. If the formants never level off on either the semivowel or the vowel, divide the segment in half. 5.9.7 Devoiced Vowels: Not all vowels in rapid speech are voiced. Devoiced vowels tend to occur after a voiceless obstruent and tend to be shorter than voiced vowels. If a devoiced vowel is suspected, look for the occurrence of F2 and F3 in the spectrogram, disappearance of the first formant, and perhaps increased periodicity in the waveform. Set the left boundary at the onset of these changes and use the devoicing diacritic. This diacritic (_0) will be used only on the part of the vowel that is devoiced. At the end of an utterance or phrase, the final vowel or approximant may trail off gr
50. e detailed transcription, such as a phonetic transcription. 2.4.2 Capitalization: There is no capitalization in English orthographic transcriptions. 2.4.3 Punctuation: The only punctuation mark allowed in English is the apostrophe. The apostrophe is used in the following three instances: • To indicate possession: she has susan's slide rule • For contractions: don't take that it's mine • For spelling: my name's got two r's in it. The apostrophe is not used to indicate the omission of letters, as in walkin (walking) or bout (about); see Table 2.3. Periods, commas, dashes, hyphens, semicolons, and other punctuation marks are omitted. Square and pointy brackets are used in transcriptions for non-speech events or for predictable speech that was cut off (see 2.3.1). 2.4.4 Transcribing Spelled Letters: When a caller spells a name, recites the alphabet, or says an abbreviation (see below), transcribe the letters spoken with spaces to separate them. See Chapter 4 for capitalization rules in other languages (note Section 4.5.7). A spelled name: h a r r y. The alphabet: a b c d e f g h i j k l m n o p q r s t u v w x y z. An abbreviation or acronym: o h s u (the hospital), c d player. Note the following transcription for double letters: my name is terri spelled t e and then two r's i. Transcriptions used to contain dashes to indicate that the speaker did n
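The capitalization, punctuation, and spelled-letter rules above are mechanical enough to check in software. Below is a minimal Python sketch, not a CSLU tool. The names `normalize_english` and `spell_out` are hypothetical. It assumes that omitted punctuation (periods, commas, hyphens and the like) is replaced by a space rather than deleted, and that the markup characters used elsewhere in this guide (pointy brackets, square brackets, and the asterisk) must survive. Whether a word was spelled or spoken remains the transcriber's judgement.

```python
import re

# Assumption: besides letters, digits, and whitespace we keep the apostrophe
# (the only punctuation allowed in English) and the markup characters used for
# non-speech labels and cut-offs: < > [ ] and *.
_DISALLOWED = re.compile(r"[^a-z0-9'\s<>\[\]*]")


def normalize_english(text: str) -> str:
    """Hypothetical helper: apply the capitalization and punctuation rules."""
    text = text.lower()                # no capitalization in English transcriptions
    text = _DISALLOWED.sub(" ", text)  # assumption: dropped punctuation becomes a space
    return " ".join(text.split())      # collapse runs of whitespace


def spell_out(letters: str) -> str:
    """Hypothetical helper: write spelled letters separated by spaces."""
    return " ".join(ch for ch in letters.lower() if ch.isalpha())


print(normalize_english("Don't take that, it's mine!"))  # don't take that it's mine
print(spell_out("OHSU"))                                  # o h s u
```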
51. e disadvantage lies in lack of consistency. Some base labels are allophones, not phonemes, so simply stripping off the diacritics does not necessarily leave one with a purely phonemic transcription. There are ways to get around the inconsistencies, and a smart diacritic stripper is the first step. Such a tool is under development and is anticipated to be a part of the OGI Speech Tools in the near future. Chapter 2: Word Level Conventions. 2.1 Purpose: Word-level transcriptions provide quick lexical access to an utterance as well as limited paraspeech data. The purpose of word-level transcriptions is to transcribe the words spoken as they would appear in a standard dictionary (for exceptions, see Section 2.4.1). Little information regarding dialect or idiolect is encoded in the transcription. Words that are mispronounced can be tagged with pron. Entire files can be flagged with pron if a strong accent is detected. Apart from this, we have no convention for specifying the actual pronunciations of words. If more detailed information is needed, one could consider creating phonetic transcriptions. In addition to the words themselves, word-level transcriptions record non-speech sounds such as background noise (bn), line noise (ln), and breath noise (br). The non-speech labels are discussed in Chapter 3. 2.2 Overview: Chapter 2 describes conventions used to transcribe speech at the time-aligned an
52. eak frication followed by strong frication. This sequence mimics the closure and aspiration of the regular stop. 7.6 Glottal Onset: The glottal onset and glottalization diacritics are in complementary distribution. The glottal onset diacritic is used to mark the few glottal pulses occurring in word- or utterance-initial vowels in American English. Glottal onset is a very short catch occurring in the pharynx as the vocal folds are beginning to vibrate. It is acoustically identical to a glottalized vowel, but it occurs only in the word- or utterance-initial environment and is of limited duration (one to at most three glottal pulses). The glottalization diacritic may appear anywhere the glottal onset diacritic does not appear, or it may appear in word- or utterance-initial position if the glottalization is of long duration. 7.7 Glottalization: The glottalization diacritic is used to label glottalization. Any voiced phone can be glottalized, but vowels are the most commonly glottalized phones. Glottalized vowels are characterized by a marked slowing of vocal fold vibration and differ significantly in appearance from non-glottalized vowels. Utterance-final vowels are often glottalized as the vocal folds cease moving. Vowel geminates are also often glottalized in the period of transition between the two phonemes. When the spacing of vocal striations becomes significantly larger than in the immediate spectral context, the glottalization diacritic should be used. 7.7.1 Marking glottali
53. eech and for which we don't currently have a specific label. Hiccups, yawns, and grunts are examples. 3.18 pau or <pau>: A period of relative quietness in which the speaker stops to think or hesitates before saying a word. The expression "relative quietness" is used here because there is no actual silence in the speech signal, due to line and environmental noise. For consistency and ease of transcription, pau should not be marked unless there is a period of at least 1 second (1000 ms) in which no other event worth transcribing occurs. If there is a 500 ms breath and a 500 ms pause, the pause should be ignored. 3.19 pron or <pron>: The label pron is used for odd pronunciations or pronunciations that differ from the standard dialect. One native speaker of American English said "nane" when he clearly intended us to understand "nine". Another, non-native, speaker consistently pronounced the letter name z as "zed" (the British pronunciation). Using pron allows us to transcribe only words that would appear in a standard dictionary, while indicating that the token varies in some significant way from the most common or a predictable pronunciation. pron can be used for dialectal variations such as zed above. Our English transcriptions assume a standard American English dialect. Each language transcribed has a defined dialect and is transcribed from the perspective of that dialect. pron may also be used for speech with a heavy foreign accent. Only use
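The 1-second rule for pau can be stated precisely: a gap qualifies only if no other transcribed event falls inside it. Below is a minimal Python sketch under stated assumptions. The word-level events are taken to be already available as (start, end, label) tuples in milliseconds, which is a hypothetical representation and not a CSLU file API. The function name `pause_gaps` is also hypothetical. The sketch reports only gaps between events and ignores leading and trailing silence.

```python
from typing import List, Tuple

# Hypothetical representation of a time-aligned word file: (start_ms, end_ms, label).
Segment = Tuple[int, int, str]


def pause_gaps(events: List[Segment], min_pause_ms: int = 1000) -> List[Tuple[int, int]]:
    """Return untranscribed stretches long enough to be marked pau.

    A gap only counts if no other transcribed event (breath, noise, word)
    falls inside it, so a 500 ms breath followed by 500 ms of quiet does not
    qualify. Leading and trailing silence are not considered here.
    """
    gaps: List[Tuple[int, int]] = []
    ordered = sorted(events)
    if not ordered:
        return gaps
    covered_until = ordered[0][1]  # latest end time seen so far (handles overlaps)
    for start, end, _label in ordered[1:]:
        if start - covered_until >= min_pause_ms:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)
    return gaps


print(pause_gaps([(0, 400, "i"), (400, 900, "br"), (1400, 1800, "fly")]))  # []
print(pause_gaps([(0, 400, "i"), (1500, 1900, "fly")]))                     # [(400, 1500)]
```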
54. el place of articulation.
Bibliography
[1] George D. Allen. The phonascii system. Journal of the International Phonetic Association, 18:9-25, 1988.
[2] A. Evison and A. P. Cowie. Concise English-Chinese Chinese-English Dictionary. Oxford University Press, Oxford, England, 1986.
[3] International Phonetic Association. Report on the 1989 Kiel convention. Journal of the International Phonetic Association, 19:67-80, 1989.
[4] W. J. Barry and A. J. Fourcin. Levels of labelling. Manuscript, 1990.
[5] Ronald Cole, Beatrice T. Oshika, Mike Noel, Terri Lander, and Mark Fanty. Labeler agreement in phonetic labeling of continuous speech. In ICSLP Conference Proceedings, August 1994.
[6] Bernard Comrie, editor. The World's Major Languages. Oxford University Press, Oxford, first edition, 1990.
[7] CSLU. OGI Speech Tools user's manual. Technical report, Center for Spoken Language Understanding, Oregon Graduate Institute, 1993.
[8] Victoria Fromkin and Robert Rodman. An Introduction to Language. Holt, Rinehart and Winston, Inc., New York, fourth edition, 1988.
[9] J. Hieronymus, M. Alexander, C. Bennett, I. Cohen, D. Davies, J. Dalby, J. Laver, W. Barry, A. Fourcin, and J. Wells. Speech segmentation criteria for the SCRIBE project. Manuscript, 1990.
[10] James L. Hieronymus. ASCII phonetic symbols for the world's languages: Worldbet. Technical report, Bell Labs, 1993.
[11] Peter Ladefoged. A C
55. er has stopped speaking, or for periods in which there is very little or no apparent energy in the signal. There is no length restriction on the label pau. The only acoustic difference between a voiceless closure and a pause is the period leading into and coming out of the closure: due to coarticulation, the formants will reveal the position of the articulators if they have moved into a certain position for the closure. Apart from this, voiceless closure segments could be thought of as pauses as well, since they span a period of little or no activity in the spectrogram. The closure labels are used to mark a specific type of articulatory closure (t closure tc, p closure pc, etc.; see Chapter 6 for the phonological labels). Closures occur before stops and affricates. Unlike pau, closures do have a typical length, which varies with the speaker and the rate of speaking. A pause is a discourse unit: it breaks up the sentence and indicates a boundary between sub-sentence sets of words. Epenthetic closures are always caused by movement of the articulators from one position to another. They are often referred to as insertions, since a closure is inserted into a word. Some general guidelines to follow for English (other languages may have other telltale clues that can be used): • Pause: Label a segment as a pause (pau) if you can hear a comma, such as when a person lists a series of items or seems to be collecting their tho
56. es Hindi Phonology. 4.5.9 Hungarian: For special characters, see Table 4.5. In Hungarian, the hyphen will be used for the "whether" or "if" condition, as in Nem tudom hogy lesz-e buli (I don't know whether there will be a party). Here is an example utterance, written first in standard Hungarian orthography and then using our transcription conventions. The sentence means: "It's very simple. I cross the main road, pass the river and turn right. There is a new supermarket opening nearby." The sentence "nagyon egyszer tmegyek a f e tvonalon s a foly ut n roeg toen lefordulok a koezeluenkben nyilik egy j b c" would be transcribed "nagyon egyszerue a tmegyek a foe u tvonalon e s a folyo uta n roegtoen lefordulok a koezeluenkben nyi lik egy u j a be ce". 4.5.10 Indonesian: Standard written Indonesian uses a Roman script. Hyphens are allowed in Indonesian. Nothing else particularly striking (i.e., different from English) happens in Indonesian, as I recall. 4.5.11 Italian: Italian has not yet been transcribed. However, it uses a Roman alphabet with a few additional diacritics not found on a standard keyboard. These issues will be dealt with when transcriptions begin. 4.5.12 Japanese: A modification of the Hepburn romanization is used to transcribe at the word level. Table 4.6 outlines the romanization scheme; unfortunately, it does not yet contain the actua
57. es for us, because foreign place names are often spelled and/or spoken close to the way they are spoken in the language of origin. For a speaker of Vietnamese, is a place name such as New York a foreign word? Or is Paris a foreign word to a Hungarian? To determine whether place names or other words are foreign (and so require the <nitl> tag), we rely on the pronunciation of the word: • If the talker pronounces the proper name the way most talkers of the same native language pronounce it, the word is not considered foreign and the <nitl> tag is not used. • If the speaker attempts to pronounce the name as it is pronounced in the language of origin (assuming that differs from the way s/he would normally say the word in his/her own language), the <nitl> tag should be used. • If the place name or other word is pronounced identically in each language, the <nitl> tag is not needed. For example, in Japanese McDonald's is pronounced m akodonarud o, but an American pronunciation would be m k d A n l d z. The former should be transcribed in the Japanese romanization as macodonarudo, but the American-sounding pronunciation would be transcribed mcdonald's <nitl>. Some languages have developed ways of dealing with foreign loan words orthographically; loan words in Japanese are written in Katakana script. For Japanese, it would be more natural to romanize McDonald's as macodonarudo than to write the word with s
58. ghlight the information used by people to understand speech. We label what we consider to be the core information, the contrastive sounds of the language, and in addition we mark salient features of the speech that are visually striking, such as glottalization or excessive nasalization. For efficiency, we label only segments that are: • visually/acoustically distinct • distinct enough, both visually and perceptually, to allow consistent and accurate transcriptions by multiple labelers • worth the time it takes to label. 5.2 Overview: This chapter is a general discussion of the transcription process at the phonetic level. Instructions are given to train one to label at the broad phonetic level. Included is information concerning segmentation issues and non-speech labeling conventions. 5.3 Alignment: Phonetic transcriptions are time-aligned with the waveform to a certain degree of accuracy. In ambiguous cases, the boundaries are set according to established rules. Boundary placement is discussed in detail in this chapter, although we know that a strictly time-aligned transcription is impossible due to coarticulation. Phones are not always signalled by discrete, non-overlapping regions in the waveform. In continuous speech, coarticulation, deletion, and elision cause phone boundaries to overlap. Therefore, because true boundaries do not actually exist in many cases, care must be taken to follow convention in order to ensure consistenc
59. glish (basic word); english (part of sound e cut off); english (part of sound sh cut off); eng lish (eng cut off); eng lish (lish cut off); english<ln> (simultaneous line noise); eng lish ln (ln simultaneous with eng, lish cut off); eng lish<ln> (ln simultaneous with lish, eng cut off); english<ln><bn> (two simultaneous events). 2.3.4 Simultaneous Sounds: It is difficult to show overlapping phenomena in transcriptions. The following describes the convention used to distinguish between sequential and simultaneous sounds at the non-time-aligned level; see Section 2.5.2 for conventions at the time-aligned level. 1. Put a space before and after the pointy brackets of non-speech labels when the sounds do not occur simultaneously. For example, if you hear a noise between the words to and fly, you would transcribe it: i like to <bn> fly kites 2. If noise is heard while the talker is saying a word, connect the non-speech label to the word that was affected: i like to fly<bn> kites Here background noise could be heard while the speaker was saying the word fly. 3. If background noise or other non-speech phenomena can be heard throughout the entire file, transcribers can indicate this by typing the appropriate non-speech label at the beginning and at the end of the file. Consider the following example, in w
60. haiyaa (brother), pataa (address), path (path), din (day), dhan (money), t uut aa (broken), th andaa (cold), d aal (branch), dh er (lots), kaam (work), khaanaa (food), gam (sorrow), ghar (home), cammac (spoon), chat (roof), jaan (life), jhaarna (waterfall), fatnaa (tear), siimaa (limit), zamiin (land), sher (lion). voiceless labial plosive; voiceless labial aspirated plosive; voiceless labial closure; voiced labial plosive; voiced aspirated labial plosive; voiced labial closure; voiceless dental plosive; voiceless aspirated dental plosive; t and t H closure; voiced dental plosive; voiced aspirated dental plosive; d and d H closure; voiceless retroflex plosive; voiceless aspirated retroflex plosive; voiceless dental plosive; voiced retroflex plosive; voiced aspirated retroflex plosive; voiced dental closure; voiceless velar plosive; voiceless aspirated velar plosive; voiceless velar closure; voiced velar plosive; voiced aspirated velar plosive; voiced velar closure; glottal plosive; voiceless alveo-palatal affricate; voiceless aspirated alveo-palatal affricate; t and tSH closure; voiced alveo-palatal affricate; voiced aspirated alveo-palatal affricate; dZ and dZH closure; voiceless labial fricative; voiced labial fricative; voiceless alveolar fricative; voiced alveolar fricative; voiceless alveo-palatal fricative. Table 6 1
61. he text file corresponding to the time-aligned word-level transcription in Figure 1.1 appears below. The first two lines of the file contain the header, and each remaining line contains a spoken word preceded by its start and end times in milliseconds:
    MillisecondsPerFrame: 1.000000
    END OF HEADER
    481 846 r
    846 894 pau
    894 1176 one
    1176 1507 five
Time-aligned word transcriptions are also represented in a standard orthography or romanization, and speech and non-speech phenomena are distinguished. The transcriptions are aligned to a waveform by placing boundaries to mark the beginning and ending of words. In addition to the boundaries, this level of transcription includes commentary on salient speech and non-speech characteristics such as glottalization, inhalation, and exhalation. The conventions used to produce time-aligned orthographies are in Chapter 2. CSLU corpora that have used time-aligned word transcriptions in part or all of the corpus are the English Census Corpus and the Stories Corpus. 1.3.3 Phonetic: An example of a time-aligned phonetic transcription appears in Figure 1.1. Following is the text file containing the time-aligned phonetic transcription. The header occupies the first two lines, and each following line contains the start and end times in milliseconds and the phonetic label for the marked segment:
    MillisecondsPerFrame: 1.000000
    END OF HEADER
    480
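Because the file layout above is simple (a header closed by END OF HEADER, then one "start end label" line per segment), it is easy to read programmatically. Below is a minimal Python sketch of a reader, assuming exactly that layout. `parse_label_file` is a hypothetical name and not part of the OGI Speech Tools. Two further assumptions: times are scaled by MillisecondsPerFrame, defaulting to 1.0 if the header omits it, and the header key may appear with or without a trailing colon.

```python
import re
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_ms, end_ms, label)


def parse_label_file(text: str) -> Tuple[float, List[Segment]]:
    """Hypothetical reader for the time-aligned file layout shown above.

    The header ends at the line "END OF HEADER"; each later line holds a start
    time, an end time, and a label. Times are multiplied by MillisecondsPerFrame
    (assumed to default to 1.0 if that header line is absent).
    """
    lines = [line.strip() for line in text.splitlines()]
    if "END OF HEADER" not in lines:
        raise ValueError("missing END OF HEADER line")
    body_start = lines.index("END OF HEADER") + 1

    ms_per_frame = 1.0
    for line in lines[: body_start - 1]:
        # Assumption: accept the key with or without a trailing colon.
        match = re.match(r"MillisecondsPerFrame:?\s+([0-9.]+)", line)
        if match:
            ms_per_frame = float(match.group(1))

    segments: List[Segment] = []
    for line in lines[body_start:]:
        if not line:
            continue  # tolerate blank lines
        fields = line.split(maxsplit=2)
        if len(fields) != 3:
            raise ValueError(f"expected 'start end label', got {line!r}")
        start, end, label = fields
        segments.append((float(start) * ms_per_frame, float(end) * ms_per_frame, label))
    return ms_per_frame, segments


sample = """MillisecondsPerFrame: 1.000000
END OF HEADER
481 846 r
846 894 pau
894 1176 one
1176 1507 five
"""
ms, words = parse_label_file(sample)
print(" ".join(label for _, _, label in words))  # r pau one five
```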
62. hich this sentence is assumed to be the entire file: <bs> i was born in san paulo and i wanted to learn to ski from the time i was old enough to speak <bs> This convention alleviates the need to put a non-speech label in every pause or silence, although it introduces ambiguity in the case where noise occurred only at the beginning and end of the file. 2.3.5 A Note About Connecting Non-speech Labels: The non-time-aligned non-speech labels can be divided into three categories: • Those that must be connected to a word: <long> <sp> <asp> <alt> <pron> • Those that must not be connected to a word (they cannot indicate simultaneity): <br> <burp> <cough> <ct> <ls> <pau> <sneeze> <sniff> <tc> • Those that can be used either way: <beep> <blip> <bn> <bs> <fp> <laugh> <ln> <nitl> <ns> <uu> <yawn> <vs> 2.3.6 Cut-Off Speech (Cut-offs, Non-Time-Aligned): The asterisk is used in the word file at both the time-aligned and non-time-aligned levels when speech is cut off. By "cut off" we mean that the person was either already speaking when the system began recording or still speaking when the system stopped recording. Note: There must be evidence in the waveform that a cut-off has occurred. Do not use the asterisk when you think the person wanted to say something more but did not actually say anything. When a person is c
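The three connection categories above lend themselves to a mechanical check. Below is a minimal Python sketch. It assumes a label counts as "connected" when it is written directly against a word with no intervening space, as in fly<bn>. The function name and the wording of its messages are hypothetical. It checks only the connection rules, not whether a given label is the right one for the sound heard.

```python
import re
from typing import List

# Categories copied from the list above (non-time-aligned level).
MUST_CONNECT = {"long", "sp", "asp", "alt", "pron"}
MUST_NOT_CONNECT = {"br", "burp", "cough", "ct", "ls", "pau", "sneeze", "sniff", "tc"}
EITHER_WAY = {"beep", "blip", "bn", "bs", "fp", "laugh", "ln", "nitl", "ns", "uu", "yawn", "vs"}
KNOWN = MUST_CONNECT | MUST_NOT_CONNECT | EITHER_WAY

LABEL = re.compile(r"<([a-z]+)>")


def check_connections(transcription: str) -> List[str]:
    """Hypothetical checker: report labels that break the connection rules.

    Assumption: a label is "connected" when it shares a space-delimited token
    with word text, e.g. fly<bn>; a token made only of labels is free-standing.
    """
    problems: List[str] = []
    for token in transcription.split():
        attached = LABEL.sub("", token) != ""  # word text left once labels are removed
        for label in LABEL.findall(token):
            if label not in KNOWN:
                problems.append(f"unknown label <{label}>")
            elif label in MUST_CONNECT and not attached:
                problems.append(f"<{label}> must be connected to a word")
            elif label in MUST_NOT_CONNECT and attached:
                problems.append(f"<{label}> must not be connected to a word")
    return problems


print(check_connections("i like to fly<bn> kites"))  # []
print(check_connections("nine <pron>"))              # ['<pron> must be connected to a word']
```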
63. ibe English is an example of a phonemic transcription set that captures mainly language-specific distinctions. TIMIT has a few labels that are not phonemic, such as the reduced vowel label ix. OGIbet is based on TIMIT and was formerly used at the CSLU for broad phonetic transcriptions. OGIbet is mainly phonemic, but through diacritics and some phonetic base symbols the transcriber is able to capture more phonetic features of the speech signal. If one wishes to capture fine phonetic detail, the IPA conventions are a candidate. The IPA has long been recognized as the standard for phonetic transcription. It is intended to be a set of symbols for representing all the possible sounds of the world's languages, and the choice of symbols is usually guided by the principles of phonological contrast. If one is producing multi-language transcriptions, it is not sufficient to have symbol sets that capture only the distinctions existing within a given language. Worldbet, developed by Dr. James Hieronymus, was adopted at CSLU because of the need to capture sounds in many different languages. Worldbet contains labels that are defined in the same way across all languages. Like the IPA, Worldbet is robust enough to capture all contrastive sounds in the world's languages as well as salient phonetic distinctions. A correspondence can be made among a standard phonological transcription such as TIMIT, a detai
64. ifies aspiration in the phone. There are certain predictable contexts in which English voiceless plosives do not contain aspiration, such as st clusters. Lightly aspirated or unaspirated stops are not marked explicitly in transcriptions, because they can be predicted by phonological rule. 7.3 Centralization: The centralized diacritic is used for vowels that have become centralized or reduced but still perceptually contain elements of the original vowel quality. If the placement of a vowel has moved slightly toward the center, this diacritic is used. In fast speech, the vowel I often appears as a very short and somewhat reduced vowel, but F2 is still too high for it to be considered A or &. It should be transcribed I_x. 7.4 The Flapping Diacritic: The flapping diacritic is used for flapped consonants. When stops are flapped, they are extremely short, lack closure, and have visible formants. In the spectrogram they often look like a small dip between vowels and have a slightly lower amplitude than the surrounding segments. Flapping is very common with alveolar stops in American English. Usually a flapped segment is not much longer than 30 ms, although the actual length depends on the rate of the speaker. The alveolar nasal in English is also commonly flapped. 7.5 Fricated Stops: The diacritic _F is used to indicate that a stop burst has been heavily fricated. Fricated stops are plosives having no true closure but a sequence of w
65. in square brackets (see Examples 1 and 2 above). • If only part of a single sound is cut off, nothing should appear in brackets. Instead, the asterisk appears at the beginning of the word (or at the end, if the sound is cut off at the end of a word) with no intervening space, to indicate that part of the initial or final sound was cut off (Example 3 above). • If instead only part of an unfamiliar word can be heard because of a cut-off, the part that can be heard is "sounded out", so to speak, and the asterisk is added to the word. 2.3.7 False Starts: The asterisk is also used to label false starts. A false start is a type of disfluency in which the speaker begins to say something and then stops in the middle of the utterance. Consider the following examples: 4. i'd like to thi nk read about it for a bit 5. when i go to flori da i mean tennessee i'll look you up 6. trimson cri ed i mean crimson tide 7. well i don't br even want to talk about it. Notice that in the first three examples the transcriber felt confident enough of the context to hazard a guess at what the person intended to say. The first two examples are straightforward, the third is what is known as a spoonerism, and in the fourth the false start br was not traceable to a word, so no information appears in brackets. 2.3.8 Why Distinguish between Cut-off and False Start? Basically, the conventions for false
66. in the background. If a person is speaking in the background, whether on the television set, on the radio, or in person, the speech is labeled bs. bs is not used for side comments made by the caller; the caller's speech is always considered a foreground event. bs is not used for babies crying either; that would be labeled bn. The use of bs is similar to that of the label bn, and it likewise has a corresponding label for use in the comments window at the time-aligned level when both background speech and a foreground event can be heard. See 2.5.2 for details concerning extraneous speech in the comments window. One special use of the label bs is the following: i have lived in the u s for uhm martha how long have we lived here bs ten years ten years In this case the caller has asked his wife in the background how long they have lived in the U.S. The wife's answer is audible in the background, and then the caller repeats her answer into the telephone. This type of construction is only allowable with the <bs> label. 3.7 burp or <burp>: A burp. 3.8 cough or <cough>: A cough. 3.9 ct or <ct>: A clear throat. 3.10 fp or <fp>: uh, uhm, hmm, and others (see Table 3.2). Sounds uttered to fill silence while a speaker is thinking are called filled pauses. The label fp is meant for filled pauses that do not appear in Table 3.2 (uhm, uh, hmm, mm, etc.). fp
67. ing transcribed are answers to questions, the comments BA and EA should be used. These are abbreviations of the actual comments beginanswer and endanswer, adopted for ease of typing. Align only the left boundary of these boxes. This convention has only been used for labeling the Census Corpus. It identifies the part of the response that is intended as the answer to the question, as opposed to other speech that may be in the call. If the utterance contains only an exact answer with no extra speech, there is no need to use the comments BA and EA. • Breath: Labelers should use br in the word box when there is a breath. In the comments box, the breath should be distinguished further as inbreath or outbreath. These labels should span the duration of the breath and may be abbreviated ib and ob. • Abbreviations: Write the label abbreviation for abbreviated forms of words. • Whispering: For whispered speech by the caller, create a box in the comments file that spans the phenomenon and label it whisper. If someone in the background is whispering, this should be labeled following the conventions for extraneous speech. The label whisper should only be used in reference to the primary speaker (see Section 2.5.2). • Misspellings: If you are unsure of the spelling of a word at the time-aligned level, use the label spelling in the comments box. Chapter 3: Description of Non-Speech Labels. Table 3.1 shows
68. iption of languages other than English, even though they are not treated specifically in this chapter. 4.3 Foreign Words: Most people participating in our multi-language project speak English as well as at least one other language, and live in the United States. As a result, the speech we record is often sprinkled with English words. Especially in countries like India, where English is one of the official languages, English heavily influences the speech of the people. Foreign words are marked with the tag <nitl>. An important question is how to classify words as foreign (not in the language). We consider a word foreign if it has not been completely absorbed into the language. For example, buffet, although originally French, has been absorbed into the English language, and a tag such as <nitl> would not be necessary (see Section 3.16). But perhaps some would consider program less French than logiciel. It is not always easy to decide whether a word is foreign, but the following are some guidelines for transcribing. 4.3.1 Foreign words spoken with a foreign accent: Foreign words pronounced with the phonology of the language of origin require the <nitl> marker. For example, a Mandarin speaker says the name McDonald's with American pronunciation. Transcription: mcdonald's <nitl>. Section 3.16 describes the use of <nitl> in more detail. 4.3.2 Foreign words/names spoken without an accent: Place names present difficulti
69. is is an authentic example. 3.13 ln or <ln>: Line noise refers to clicks, buzzes, or periods of static caused by the telephone line. If line noise occurs throughout the call and can be heard when the caller is not speaking, ln should be labeled where pau would normally be used. You may have a call in which the label pau is not used but is instead replaced with line noise. 3.14 long: This label is used only at the non-time-aligned level, for elongated or drawn-out words. If a word is noticeably drawn out, connect this label to the word that is elongated. Do not use long for pronunciations in which the speaker has paused between syllables, as in na<pau>cy, but only when sounds in the word are drawn out. When a speaker pauses between syllables of a word, there is no special convention to mark this; the word should be transcribed as it normally would be (i.e., nancy). 3.15 ls or <ls>: This phenomenon often occurs before a breath or an utterance. If the speaker makes a smacking noise with the tongue and lips, segment it and label it ls. 3.16 nitl or <nitl>: The label nitl is used for speech in a language other than the one the caller was asked to speak. This is referred to as foreign speech. Foreign speech is particularly common in multi-language transcriptions when the person is spelling words or giving an address. When a person clearly
70. l Japanese characters. It should still be readable to someone familiar with Japanese. Note some other salient characteristics: • Long consonants are transcribed as double consonants, e.g., kitte (stamps). • Long vowels are transcribed with colons, e.g., kacho: (manager), rather than kacho with a line over the o as in the Hepburn romanization. • Long e will be transcribed as e: or ei, depending on pronunciation. The most common pronunciation is kire: (pretty), but in some cases speakers say kirei, manifesting a strong diphthong. When this occurs, the ei sequence is used to reflect the sound change. • The sequence hu is transcribed as fu, as in fuji (Mount Fuji). • The sequence ti is transcribed as chi when it is palatalized, as in ichi (one), and as tii when it appears in borrowed words such as lemon tii (lemon tea). Likewise, the sequence di is transcribed ji when it is palatalized, as in jikan (time), and as dii in borrowed words such as birudiingu (building). The double ii is used in transcribing tii and dii regardless of the actual length of the vowel, because in this borrowed phonological context vowel length is in free variation and is not considered phonemically significant. • Word Boundaries: 1. Particles: Particles are treated as distinct words separated by spaces, as in watashi wa gakuse: desu (I am a student). 2. Clas
71. label boxes made in the comment window: one kind spans the phenomenon it represents, and the other simply marks the beginning or end of the event. The former should have manually extended right boundaries that span the length of the phenomenon. For the latter, a label box is created and the right boundary is not extended. These latter comments usually mark the beginning and end of a long segment, e.g., read speech or extraneous noise. • Manually Extended Boundaries: When the label is of the type that spans a phenomenon, rather than simply identifying its beginning and end, the right boundary should be manually extended to its proper time-aligned position. When saving the comments file in AutoLyre, use write lola rather than write aligned lola, in order to preserve the position of the labels that were not extended. Specific Comment Box Labels: • Actual pronunciations: Word comments help explain the difference between what the speaker actually said and the citation form of the word. For example, if the speaker said what's dat, write dat in the comment box aligned with that in the word box. Similarly, if about is pronounced bout, write bout. This information is redundant with the phonetic transcription and need not be noted unless the pronunciation is very noticeable. If the comment is a transcription of an actual pronunciation, no backslash should precede it. • Read Speech: Judgements are entered in the comment bo
72. lecting distinct phonetic variations. In order to arrive at a purely phonemic-level transcription, the diacritics must be removed and adjacent base labels collapsed. Following is a description of all of the diacritics used in English labeling. Note that this is only a subset of the Worldbet diacritic inventory (the complete set is not reproduced here); the diacritics appearing in this chapter are those that were most commonly needed when labeling English. This section will eventually be expanded to include additional multi-language diacritics.
7.2 Aspiration
The diacritic _h indicates excessive aspiration on a phone. Aspiration may be evident in relaxed speech on a vowel when the vocal folds are still vibrating but breath increases. If a phone becomes devoiced, the devoicing diacritic _0 is used; aspiration is then assumed and need not be explicitly marked. In order to use the aspiration diacritic on a vowel, the formants of the vowel must remain strong. Aspiration following a vowel that does not contain strong formants should be labeled br, although it may retain some vowel-like quality when heard in isolation. Predictable aspiration is contained within the base label, i.e. th in English contrasts with t in German, as English stops are in general more aspirated than German plosives. If an English alveolar plosive were unusually heavily aspirated, the transcription would be th_h. The latter transcription convention is rarely used because the base label already spec
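The reduction from a phonetic to a purely phonemic transcription described above can be sketched as follows. This is a minimal illustration that assumes each label is a base symbol optionally followed by underscore-separated diacritics, and that "collapsing" means merging neighbouring labels that become identical once their diacritics are removed; `to_phonemic` is a hypothetical helper.

```python
def to_phonemic(labels: list[str]) -> list[str]:
    """Strip Worldbet diacritics and merge adjacent identical base labels."""
    result: list[str] = []
    for label in labels:
        base = label.split("_", 1)[0]  # everything before the first underscore
        if result and result[-1] == base:
            continue  # collapse into the preceding identical base label
        result.append(base)
    return result


print(to_phonemic(["tc", "th_h", "A", "s_v", "s"]))  # ['tc', 'th', 'A', 's']
```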
73. led phonetic transcription such as the IPA and the multi-language-motivated Worldbet. CSLU corpora that utilize time-aligned phonetic transcriptions in part or all of the corpus are the Spelled and Spoken Names Corpus, OGI Multi-Language, Stories, Names, and Numbers.
1.3.4 IPA versus Worldbet
Both the IPA and Worldbet are functional and useful for multi-language transcription. The IPA is the commonly recognized standard for phonetic transcription. Worldbet is relatively new and less well known, and it is based on different principles than the IPA. Many of the differences between Worldbet and the IPA are purely academic and do not affect the training of spoken language systems at all. Still, I will discuss the major differences here for those who may be interested.
• The IPA uses special non-ASCII symbols. Diacritics appear as subscripts or superscripts. Special fonts must be installed in order to produce and view the symbols. IPA fonts are widely available, and good fonts can be obtained free of charge through SIL (Summer Institute of Linguistics); see their web page http://www.sil.org/computing/sil_computing.html#silsoftware.
• Worldbet uses an ASCII-based character set and can be typed on a standard keyboard. There is no need for special tools to view or produce symbols. Diacritics are separated from the base label by an underscore.
• IPA symbol choice is based mainly on phonological contrast. For example, the dental place of a
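Because Worldbet separates diacritics from the base label with an underscore, a label can be split into its parts with ordinary string operations. A sketch, assuming each diacritic is introduced by its own underscore (as in s_v or th_h); `WorldbetLabel` and `parse_label` are illustrative names only.

```python
from typing import NamedTuple


class WorldbetLabel(NamedTuple):
    base: str                   # e.g. "s"
    diacritics: tuple[str, ...]  # e.g. ("v",) for voicing


def parse_label(label: str) -> WorldbetLabel:
    """Split a label such as 's_v' into its base symbol and diacritics."""
    base, *marks = label.split("_")
    if not base:
        raise ValueError(f"label {label!r} has no base symbol")
    return WorldbetLabel(base, tuple(marks))


print(parse_label("s_v"))  # WorldbetLabel(base='s', diacritics=('v',))
```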
74. <sneeze>
This label indicates a sneeze.
3.21 sniff or <sniff>
sniff is to span the period in which the speaker sniffs.
3.22 tc or <tc>
The label tc signifies a clicking noise made with the tongue.
3.23 <sp>
At times a caller may utter an unfamiliar word. Rather than place the non-descriptive label uu in the transcription, labelers are encouraged to sound out the word and produce the most likely transcription. <sp> should be attached to the end of all such words to indicate the transcriber's uncertainty about the correct spelling of the word (do not insert a space between the word and <sp>). For example, an unfamiliar street name might be transcribed toogali<sp>. <sp> is commonly used with filled pauses that are not specifically listed in Table 3.2, for example: i was driving urpl<sp> i mean flying to new york. At the time-aligned level, the label spelling would be used in the comments box (see Section 2.5.2). Obviously, if the word can be found in a dictionary, the correctly spelled word should appear in the transcription without the <sp> label.
3.24 uu or <uu>
Unintelligible speech is a category of sounds that cannot be mapped logically to any known utterance. If the labeler does not understand what the speaker has said but is sure that it is speech of some sort, the label uu should be used. This label should actually be used rarely, because usually a guess can be made as
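The rule that <sp> must be attached directly to the word can be checked automatically. This hypothetical checker, `detached_sp_tags`, assumes transcriptions are plain text lines and reports the character offset of every <sp> that is preceded by whitespace or starts the line.

```python
import re

# "<sp>" at the start of the line or after whitespace is not attached to a word.
DETACHED_SP = re.compile(r"(^|\s)<sp>")


def detached_sp_tags(line: str) -> list[int]:
    """Return the offsets of <sp> tags that are not attached to a word."""
    return [m.start() + len(m.group(1)) for m in DETACHED_SP.finditer(line)]


print(detached_sp_tags("toogali<sp>"))   # []
print(detached_sp_tags("toogali <sp>"))  # [8]
```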
75. ly happens at the end of a question or in the pronunciation of some function words, like the word de.
4.5.15 Polish
Although written with the Roman alphabet, Polish has a number of characters not found on a standard keyboard. Table 4.8 displays our ASCII solution to this potential problem.
4.5.16 Portuguese
Although written with the Roman alphabet, Portuguese has a number of characters not found on a standard keyboard. Table 4.9 displays our ASCII equivalents of the agudo accent, cedilha, circumflexo, and til.
In the phonetic labeling of the OGI Multi-Language Corpus, tones were labeled phonetically, as they sounded, not necessarily phonemically. Phonetic labeling of tones will continue in this manner to allow comparison of expected pronunciations in the word-level transcriptions with actual pronunciations in the phonetic transcriptions. In the phonetic-level labeling of the OGI Multi-Language Corpus, the reduced tone was labeled as tone one.
4.5.17 Russian
Russian transcriptions have not begun. Many Russian characters correspond closely to letters of the Roman alphabet, but the remaining characters will need to be romanized. We will probably do this by employing the capital letters which most closely resemble the Russian characters.
4.5.18 Spanish
Spanish is written in a Roman alphabet. There are two characters not available on a standard keyboard: one is the accent and the other is the tild
76. mil, and English. Each language has a unique set of labels; only labels for the languages that have been labeled at CSLU appear in this document. Each label set is composed of mainly phonemic base symbols and diacritics that capture phonetic detail. See Chapter 7 for a description of each diacritic and Section 5.10 for the non-speech labels used in phonetic labeling. The aim of Worldbet is to enable consistent transcription across multiple languages, so that a single symbol is similarly defined across all languages. The label sets are designed to be extensible to all languages, regardless of language family. Chapter 6 discusses each of the six languages consecutively, with the following subsections:
• Vowels: a vowel chart patterned after the IPA vowel triangle, and a table containing all vowel labels with word examples.
• Diphthongs: a chart of the diphthong labels used in the language.
• Notes on Vowels (optional): additional information about labeling issues regarding vowels and diphthongs.
• Consonants: a consonant table which contains only the phonetic labels, arranged by manner and place of articulation, followed by an additional table devoted to providing word examples for label use.
• Notes on Consonants (optional): additional information about language-specific labeling issues.
For IPA correspondences of CSLU Worldbet symbols, see
77. n and Neena Jain (Hindi)
Preface
The CSLU labeling document was created as a reference manual for the corpus development staff at the Center. As such, it is a living document that continues to evolve as languages are added to the database and new problems and solutions are encountered. Corpus development at CSLU has several goals: to provide training data for speech recognition, to supply training data for automatic language identification, and to offer a body of data to the research community to enable analysis of language at all levels. As our speech data and transcriptions are released to the research community, we have documented our transcription conventions to make the transcriptions useful to others.
Chapter 1 Introduction and Overview
1.1 Purpose
The CSLU Labeling Guide is intended to accompany the data distributed by CSLU. It describes the conventions used to transcribe those data.
1.2 Overview
Speech data at CSLU are transcribed at two levels: orthographic and broad phonetic. We produce non-time-aligned orthographic transcriptions to provide quick access to the content of an utterance. Some orthographic transcriptions contain markers for word boundaries to support access and retrieval at the lexical level. Time-aligned phonetic transcriptions give a more detailed representation of the utterance to enable phonetic and phonemic analysis.
78. n continuous speech: la cosa bonita for las cosas bonitas. This phonological process is common in many varieties of Spanish, including Andalusian, Caribbean, Pacific Coastal Spanish, and various Central and South American varieties.
• T is phonemic in certain dialects of Castellano, including Madrid and other central and northern locations in Spain.
• Z and dZ are dialectal variants of the unmarked j pronunciation of orthographic ll; dZ tends to be word-initial for dialects that have it.
• The label s_j is for the palatalized s in certain dialects of Spanish, especially in Madrid.
• To label r vs. r: if the waveform has more than one obvious burst, it is labeled r. Trills occur word-initially at times, but they are relatively rare in continuous speech. Sometimes the r segment will contain a small closure and then a short, vowel-like burst segment; the closure is included in the segment when it occurs.
Courtesy of Barrutia and Schwegler, Fonética y fonología españolas, 1994.
Table 6.29 Spanish Consonants (table contents not recoverable from the source)
Table 6.30 Spanish Consonant Examples and Description (Worldbet column not recoverable): poco (little), boca (mouth), tengo (I have)
79. nal goes completely silent for a period of time due to a bad phone line connection. It is only used when parts of words are imperceptible, and the label should always be connected to a word. It is not considered significant if a blip occurs during a pause.
3.4 bn or <bn>
This label indicates the presence of background noise, which is a broad category encompassing noise produced in the background of the call. Some common examples are typing, music, papers shuffling, babies crying, etc. At the time-aligned level, the label bn is used in the word box, and there is a corresponding label to be used in the comments window if the background noise coincides with the foreground noise such that both can be heard. See Section 2.5.2 for coverage of extraneous noise in the comments box.
3.5 br or <br>
Breath noise, either exhalation or inhalation, occurring any place an actual breath is perceived. Breaths often occur between pauses, but speakers frequently exhale word-finally or sentence-finally. Aspiration released at the end of a word which is not a part of the phone should be labeled br. At the time-aligned level, when br is labeled, a corresponding label must be placed in the comments window showing whether it is an inbreath or an outbreath. See page 17 for more details on labeling in the comments window.
3.6 bs or <bs>
The label bs is to be used in the word box when speech can be heard
80. nese Romanization (table residue: ki, si, chi, tii, ji, dii, ni; rows k, s, t, d, n, h, m, y, r, w)
Table 4.7 Special symbols, Mandarin (columns: Pinyin orthography, OGI orthography, type). Recoverable row: de, de, no tone (reduced tone).
Table 4.8 Special symbols, Polish (columns: Polish orthography, OGI orthography, Worldbet label; contents not recoverable).
Table 4.9 Special characters, Portuguese (columns: Portuguese orthography, OGI orthography, type): aeróbica (aerobics), agudo accent; força (strength), cedilha; pôr (to put), circumflexo; pão (loaf), til. The OGI orthography column is not recoverable.
Table 4.10 Special characters, Spanish (columns: Spanish orthography, OGI orthography, type): está, acute accent.
Table 4.11 Special characters, Swedish (columns: Swedish orthography, OGI orthography, type): umlaut and ring characters.
Table 4.12 Special characters, Vietnamese (columns: Vietnamese orthography, OGI orthography; contents not recoverable).
Chapter 5 Phonetic Level Labeling
5.1 General Comments
"It is vain to do with more what can be done with fewer." (William of Occam)
The speech signal is packed with acoustic information, yet our ears sort the information into intelligible sounds and words even in the presence of distortions and disfluencies. Not all of the information in the signal is important; in fact, much is filtered out by the human perceptual system. The labeling we do should hi
81. nglish 52 time aligned word purpose 1 time alignment degree of accuracy 37 TIMIT 4 tR hindi 59 tr hindi 59 transcription techniques 7 word techniques 7 word level 7 tre hindi 59 trill description of 80 98 phonetic segmentation 39 tS english 52 german 56 hindi 59 japanese 63 spanish 69 ts german 56 mandarin 65 tSc english 52 german 56 hindi 59 japanese 63 spanish 69 tsc german 56 mandarin 65 tSH hindi 59 tsh mandarin 65 tshr mandarin 65 tsr mandarin 65 tsrc mandarin 65 U english 50 german 54 hindi 57 u english 50 hindi 57 mandarin 64 spanish 67 u german 54 hindi 57 u& english 51 uax german 55 unintelligible speech 27 uvular definition of 80 ux english 50 V spanish 69 v english 52 german 56 hindi 59 velar definition of 80 voiced closure following nasal 41 voiced stop following nasal 41 voicelessness description of 80 voicing description of 80 vowel back description of 81 central description of 81 devoiced 45 front description of 81 height description of 81 phonetic segmentation 44 vowel terms 81 w english 52 hindi 60 japanese 63 mandarin 65 spanish 69 word transcription non time aligned 7 word alignment 17 word level transcription 7 word transcription cough 24 ct 24 laugh 25 sneeze 27 sniff 27 tc 27 vs 27 asterisk defined 2
82. nscription 24 br 23 cough 24 ct 24 fp 24 laugh 25 An 25 Je 25 nitl usage word transcription 26 pau 46 sneeze 27 sniff 27 tc 27 uu 27 vs 27 & english 50 german 54 hindi 57 japanese 61 mandarin 64 spanish 67 &_0 english 50 &_0 japanese 61 &r english 50 mandarin 64 japanese 61 english 50 german 54 hindi 57 3r english 50 4 japanese 61 Ar mandarin 64 Tax german 55 9r english 52 2 mandarin 64 4 japanese 61 T german 54 8 german 54 A english 50 mandarin 64 a german 54 hindi 57 japanese 61 spanish 67 german 54 hindi 57 japanese 61 aax german 55 ae english 50 affricate description of 80 phonetic segmentation 39 aI english 51 japanese 61 ai german 55 hindi 57 mandarin 64 spanish 67 alveo palatal description of 80 alveolar definition of 80 approximant description of 80 approximate phonetic segmentation 44 aspiration description of 80 aU english 51 german 55 hindi 57 mandarin 64 au spanish 67 AutoLyre window display 37 ax german 54 english 52 german 56 hindi 59 japanese 63 spanish 69 backwards label 18 base symbol 38 bc english 52 german 56 hindi 59 japanese 63 spanish 69 bH hindi 59 bilabial definition of 80 boundary setting right boundary 39 breath noise 23 burst invisibl
83. ntral vowel; apaato (apartment): low central long vowel; mid central long or short; a common voiceless allophone of &; mid central stressed.
Table 6.20 Japanese Diphthongs (Worldbet): aI (ay), hai (yes), a>I; oI (oy), o>I; eI (ey), e>I.
6.6.2 Notes on Japanese Vowels
• Vowel length is phonemic in Japanese. In previous releases of the multi-language corpus, long vowels were transcribed with double vowel labels rather than with the vowel label plus colon.
• In word-final position, a nasalized vowel duplicated from the preceding vowel may be used instead of the velar nasal N (see Table 6.21).
• The symbols 4 and 4 were labeled as u and u in previous releases of the multi-language database.
6.6.3 Japanese Consonants
Japanese has bilabial, dental, alveolar, palatal, and velar consonants. These are abbreviated in the consonant chart as follows: bilab (bilabial), dent (dental), alv (alveolar), pal (palatal), vel (velar).
Table 6.21 Japanese Consonants (rows: voiceless plosives, voiced plosives, voiceless affricates, voiced affricates, voiceless fricatives, voiced fricatives, nasals, lateral, glides; cell contents not recoverable)
6.6.4 Notes on Japanese Consonants
• Length is phonemic for Japanese consonants. Long consonants will be indicated by placing a colon after both the closure and the release label, as in k i tc: t: aa.
• The labels ts, dz, tS, and dZ are allophones of the phoneme
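Transcriptions from earlier releases, which used doubled vowel labels, could be brought in line with the current colon convention roughly as follows. The vowel set here is a placeholder assumption (the full Japanese vowel inventory from the vowel table should be substituted), and `doubled_vowels_to_colon` is a hypothetical name.

```python
# Placeholder vowel inventory (assumption): replace with the full Japanese set.
VOWELS = {"i", "e", "a", "o", "u"}


def doubled_vowels_to_colon(labels: list[str]) -> list[str]:
    """Merge two identical adjacent vowel labels into one label plus colon."""
    out: list[str] = []
    for label in labels:
        if out and label in VOWELS and out[-1] == label:
            out[-1] = label + ":"  # e.g. "a", "a" -> "a:"
        else:
            out.append(label)
    return out


print(doubled_vowels_to_colon(["k", "a", "a", "d", "o"]))  # ['k', 'a:', 'd', 'o']
```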
84. on mapping back to the Cantonese character in order to disambiguate words, words like leih 5.
4.5.3 Czech
Written Czech uses a Roman alphabet. However, there are three accent or diacritical markings: the hook, the acute accent, and the kroužek (the little circle on top of the letter u). In our transcriptions we have converted these diacritical markings to characters that can be typed on a standard keyboard; see Table 4.2 for character conversions. The first three words in the table are examples of the palatalization marker, or little hook (háček). The little hook appears on z, s, c, r, n, e, d, and t. In standard Czech, the palatalized versions of lowercase d and t are normally marked with an apostrophe; the little hook only appears on the capitalized versions. However, we will transcribe lowercase palatalized t and d with the little hook, as shown in the table, because there is no real difference between the apostrophe and the little hook in written Czech; the distinction is purely conventional.
Vowel length is marked on Czech vowels by placing an acute accent over the vowel; a, e, i, o, u, and y have long and short versions (see Table 4.2). The kroužek, or little circle on the u, is a historical convention for vowels that used to be long; u with a circle on top will be transcribed as u.
4.5.4 English
See Chapter 2 for transcription conventions for English.
4.5.5 Farsi
We have not
85. ot pause between the letters, but currently that is not done. Without access to a spectrogram, transcribers are not able to make the distinction consistently.
2.4.5 Suprasegmentals
Currently there are no specific notations for suprasegmentals. Pauses are marked, but by convention pau is a period of relative inactivity in the waveform at least 1000 ms long, rather than a discourse unit in the stricter sense. See Section 3.18.
2.4.6 Transcribing Numbers
Write numbers as words: 1959 is transcribed nineteen fifty nine. Dashes, as in twenty-nine, are not necessary in English transcriptions.
2.4.7 The Letter, Number, and Exclamation o
The number zero, when spoken as o, is transcribed as oh. When the speaker is spelling the letter o, write simply o. When oh is said as an exclamation, it is transcribed ohh.
2.4.8 Okay
The confirmation okay can typically be written okay or OK in standard English orthography. Because we do not use capital letters in transcriptions, we transcribe the word as okay.
2.4.9 Discarding Files
The criteria for discarding files are task dependent. In general, we discard files in which the caller has hung up without speaking, in which only line noise can be heard, or in which there is no useful speech. Be sure to check the criteria for discarding for your particular application. The procedure for discarding files varies with the task as well; different scripts may run slightly differently. The
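As an illustration of the number convention (digits spelled out, no dashes), the sketch below produces one common spoken reading of a four-digit year. Which reading a speaker uses is an assumption here; transcribers should always write what was actually said. Years ending in 01 to 09 use oh, matching the convention for zero spoken as o. The function names are hypothetical.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def under_hundred(n: int) -> str:
    """Spell 0-99 without dashes, e.g. 29 -> 'twenty nine'."""
    if not 0 <= n < 100:
        raise ValueError(n)
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def year_words(year: int) -> str:
    """One common spoken reading of a four-digit year."""
    if not 1000 <= year <= 9999:
        raise ValueError(year)
    high, low = divmod(year, 100)
    if low == 0 and high % 10 == 0:
        return f"{ONES[high // 10]} thousand"          # 2000 -> two thousand
    if low == 0:
        return f"{under_hundred(high)} hundred"        # 1900 -> nineteen hundred
    if low < 10:
        return f"{under_hundred(high)} oh {ONES[low]}"  # 1905 -> nineteen oh five
    return f"{under_hundred(high)} {under_hundred(low)}"


print(year_words(1959))  # nineteen fifty nine
```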
86. ould normally be seen between F2 and F3. In the sentence "We used to stand in queues," there will likely be palatalization of the k in queues as the mouth makes the transition from a high front vowel to a back vowel. If there is no visible velar pinch in the spectrogram and the k sounds palatalized, possibly approximating the sequence kh j, use the _j diacritic.
7.17 Cut-Off Speech
Due to the nature of CSLU's telephone speech data collections, a person's speech will often be abruptly cut short. This has led to the development of a cut-off diacritic, used when the signal ends abruptly. If a phoneme occurring at the end or beginning of a file has been cut short, the diacritic x should be used to distinguish it from spectral segments manifesting complete articulations.
Appendices
Appendix A Terminology
A.1 Description
Appendix A defines linguistic terms used throughout this document. These definitions will be especially useful when examining the charts in Chapter 6. We have not deviated from standard linguistic theory in our use of these terms.
A.2 Phonemes Versus Allophones
A phoneme is a distinctive sound in a given language which acts to contrast words. An allophone is a predictable phonetic variant of a phoneme. Example: b is a phoneme in English. To determine if a given phone is a phoneme, see if a minimal pair can be found. A minimal pair (or set) is two distin
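The minimal-pair test described in A.2 can be expressed over sequences of phone labels. This simplified sketch (`is_minimal_pair` is a hypothetical name) only detects substitution pairs of equal length, such as bit and pit, and assumes plain phonemic labels without closures or diacritics.

```python
def is_minimal_pair(word_a: list[str], word_b: list[str]) -> bool:
    """True if two equal-length phone sequences differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1


# bit vs. pit: only the initial b/p differs, so b and p contrast.
print(is_minimal_pair(["b", "I", "t"], ["p", "I", "t"]))  # True
```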
87. ourse in Phonetics. Harcourt Brace Jovanovich, third edition, 1993.
[12] Terri Lander, Ron Cole, Beatrice Oshika, and Mike Noel. Multi-language speech database: Creation and phonetic labeling agreement. In Eurospeech '95 Conference Proceedings, September 1995.
[13] Terri Lander, Beatrice Oshika, Ron Cole, and Mark Fanty. Multi-language speech database: Creation and phonetic labeling agreement. In JCPAS Conference Proceedings, August 1994.
[14] Ian Maddieson and Kristin Precoda. UPSID and PHONEME. Manuscript, 1992.
[15] M.I.T., Massachusetts Institute of Technology. 6.67s Speech Spectrogram Reading. July 1985.
[16] Yeshwant Muthusamy, Kay Berkling, Takayuki Arai, Ronald Cole, and Etienne Barnard. A comparison of approaches to automatic language identification. In Eurospeech, September 1993.
[17] John J. Ohala and Brian W. Eukel. Explaining the intrinsic pitch of vowels. In R. Channon and L. Shockey, editors, In Honor of Ilse Lehiste. Foris, Dordrecht, 1987.
[18] S. Seneff and V. W. Zue. Transcription and alignment of the TIMIT database. TIMIT CD-ROM Documentation, 1988.
[19] Timothy J. Vance. An Introduction to Japanese Phonology. State University of New York Press, Albany, New York, 1987.
Index
long 25 > english 50 german 54 hindi 57 mandarin 64 >Y german 55 >i english 51 15 21 usage word transcription 12 bn 22 br 23 burp usage word tra
88. ower teeth. These are also called dental.
Alveolar: produced with the tip or blade of the tongue raised to the alveolar ridge.
Retroflex: produced with the tip of the tongue and the back of the alveolar ridge.
Palatal Alveolar: produced with the blade of the tongue and the back of the alveolar ridge.
Palatal: produced with the front of the tongue and the hard palate.
Velar: produced by raising the back of the tongue to the velum.
Uvular: produced with the back of the tongue at the uvula.
Glottal: produced at the glottis.
A.3.2 Manner of Articulation
The following terms are used to describe the manner of articulation.
Stop: airflow is completely cut off by the closure of the articulators; pressure builds up and is released. Stops occur in the initial sounds of the words buy, toy, and dog.
Nasal: nasals are produced when the soft palate is lowered and air is allowed to flow through the nasal passage. Examples are the final sounds in ram and cocoon.
Fricative: fricatives involve partial closure such that the air flowing between the articulators is turbulent. The words float, veil, sing, and zip begin with fricatives. The first two, f and v, are weak fricatives, and s and z are strong (strident) fricatives.
Affricate: a combination of a stop followed by a fricative. The words church and judge begin with affricates.
Tap or flap: produced by a single tap of the tongue against the alveolar ridge. In the American dialect, the
89. pler in this case. If a stop has been released, there will be an irregular portion of the waveform that distinguishes the stop burst from both the preceding closure and the following phone. Segment the distinct portion and label it as the burst.
2. Look closely at the formants which follow the stop closure: is there a period where the formants are level and then begin to move into position for the vowel? Segment the portion of the spectrogram where the formants are level and label it as the burst.
3. If there is no acoustic information signaling the burst, assume one did not occur.
5.9.2 Closures
In general, the left boundary of a closure is placed where the energy for the preceding phone stops. The cessation of energy is obvious in most cases when the closure follows anything but a pause. Following are conventions used to label closures.
Voiceless closures
• Voiceless stop closures which occur at the end of a pause and the beginning of an utterance often have spectral evidence to signal the beginning of the closure. Speakers may make a small amount of noise when moving their articulators into the position of the closure; when this occurs, a small pulse is seen in the waveform or the spectrogram, and the closure label should be placed at that point.
• If there is no acoustic evidence to signal the beginning of the closure, it should still be labeled. The boundary should be set so that the closure is 50 ms long, to ensure that labels are consistent. This length was chosen as an
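The default closure length above amounts to simple arithmetic on label times. A sketch, assuming times in milliseconds, that the closure ends where the burst begins, and that the closure should not start before the preceding label's boundary; `default_closure_start` is a hypothetical helper.

```python
DEFAULT_CLOSURE_MS = 50.0  # default closure length from the convention above


def default_closure_start(release_ms: float, previous_boundary_ms: float = 0.0) -> float:
    """Left boundary for a closure that has no acoustic onset cue.

    The closure is given the default 50 ms length ending at the release,
    but is clipped so it never starts before the preceding boundary.
    """
    return max(previous_boundary_ms, release_ms - DEFAULT_CLOSURE_MS)


print(default_closure_start(1230.0))          # 1180.0
print(default_closure_start(1230.0, 1200.0))  # 1200.0
```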
90. r Tone 4: high falling tone, i 4 (meaning). For more information about tones and their phonemic values, see the bibliography.
Tone 3 may be shortened in rapid speech; it can change tone in a word before a syllable with Tone 2, either intra-word or inter-word. The tone is labeled according to which phonemic tone it is closer to, Tone 1 or Tone 3. At this time the reduced tone (the light tone) is labeled as Tone 1. This includes most of the Pinyin de and le formations.
6.8 Spanish
6.8.1 Spanish Vowels
Table 6.27 Spanish Vowel Examples and Description (Worldbet column not recoverable): niño (child), high front tense vowel; simple (simple), high reduced vowel allophone; bebé (baby), mid front tense vowel; pero (but), mid front lax vowel allophone; mid central reduced vowel allophone; neutral vowel allophone; duda (doubt), high back tense vowel; boda (marriage), mid back tense vowel; papá (papa), low central vowel.
Table 6.28 Spanish Diphthong Examples and Description: estoy, autobús, caray.
6.8.2 Spanish Consonants
Spanish has consonants at seven places of articulation. They are abbreviated in the Spanish consonant chart as follows: bilab (bilabial), ld (labiodental), inter (interdental), dent (dental), pal (palatal), vel (velar), and gl (glottal).
6.8.3 Notes on Spanish Consonants
• The label hs represents the syllable-final replacement of s by aspiration i
91. r connect; unintelligible speech: connect or not; voice squeak: connect or not; unknown spelling: always connect; a yawn: connect or not; begin answer (Census Corpus); end answer (Census Corpus); abbreviation; whispered speech: usually connect; beginning of read speech; end of read speech.
Table 3.2 Filled pause labels for English word-level labeling (filled pause label, English translation if any): hmm (bilabial, aspirated beginning); uh (centralized vowel, no nasal); uhm (centralized vowel, bilabial nasal); mm (bilabial nasal); hum.
Table 3.3 Other miscellaneous words in English word-level labeling (labels not recoverable; the surviving descriptions note an alveolar-nasal onset and whether the intonation is usually rising or usually falling).
Chapter 4 Orthographic Conventions for Multiple Languages
4.1 Overview
Chapter 4 discusses special orthographic transcription conventions developed for languages other than English. A special note: transcription conventions for all languages will follow those found in Chapters 2 and 3. Exceptions include conventions classified as English-specific or those superseded by something found in Chapter 4. The outline of the chapter is as follows:
• Non-speech events: a discussion of non-speech labels developed especially for languages other than
92. r should listen to a phone in a context of at least one phone on each side of the phone in question. Use available acoustic information in the waveform to decide where to place the boundary. For high-frequency, low-amplitude phones, the spectrogram may be used, as these sounds are difficult to identify in the waveform. The practice of listening to a segment for a given sound and then extending the boundary until the sound is no longer heard is not generally recommended for determining the placement of boundaries, although it may be necessary as a last resort in the absence of spectral cues. Boundaries should be set, whenever possible, where the waveform shows change. Changes in the waveform are generally more reliable than changes in the spectrogram; this is especially true for transitions from low-amplitude to high-amplitude sounds. The spectrogram is computed by averaging the energy in a window of a given size and displaying the averaged values. While in most cases this increases the accuracy of the spectrogram, when the energy level goes from near zero (as in a stop closure) to very high (as in the stop burst), ghost images appear in the spectrogram. These ghost bursts show up on the spectrogram perhaps 10 or 20 milliseconds before the burst appears in the waveform. They are caused by the averaging of the abnormally low values with the very high values. Ghost bursts that appear in the spectrogram should not be used for segmentation; rather, the labeler should set the
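Why ghost bursts appear before the true onset can be seen with a toy energy computation. This sketch uses a plain rectangular moving average of squared samples, a simplification of a real spectrogram (which uses overlapping, tapered analysis frames); the function names are illustrative. With a 21-sample window, the averaged energy becomes non-zero 10 samples (half a window) before the burst.

```python
from typing import Optional


def windowed_energy(samples: list[float], window: int) -> list[float]:
    """Mean squared amplitude in a centred rectangular window at each sample."""
    half = window // 2
    energy = []
    for centre in range(len(samples)):
        lo = max(0, centre - half)
        hi = min(len(samples), centre + half + 1)
        segment = samples[lo:hi]
        energy.append(sum(x * x for x in segment) / len(segment))
    return energy


def first_index_above(values: list[float], threshold: float) -> Optional[int]:
    """Index of the first value exceeding threshold, or None."""
    for i, v in enumerate(values):
        if v > threshold:
            return i
    return None


# Silent "closure" (100 samples) followed by a loud "burst" (50 samples).
signal = [0.0] * 100 + [1.0] * 50
true_onset = first_index_above([x * x for x in signal], 0.0)
smeared_onset = first_index_above(windowed_energy(signal, window=21), 0.0)
print(true_onset, smeared_onset)  # 100 90
```

The smear scales with the window length, which is why the waveform, not the averaged display, is the better guide for this boundary.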
93. rticulation on the Spanish stop could either be transcribed with a diacritic or not transcribed at all. It is not important to mark the dental place if one considers phonological contrast alone, because Spanish does not contrast dental and alveolar stops; the label t suffices to mark both the dental and the alveolar t without conflict.
• Worldbet symbol choice is based not only on phonological contrast but also on descriptive principles. A Worldbet base label explicitly contains information about aspiration or place of articulation in situations where the IPA might not transcribe this explicitly. Thus Worldbet uses th in English (aspirated alveolar plosive), t[ in Spanish (dental unaspirated plosive), and t in French (unaspirated alveolar plosive), where the IPA would generally use t for all three cases.
• One can transcribe either phonemically or phonetically with the IPA, but there is no mechanism for transcribing both levels at the same time.
• Worldbet attempts to transcribe both phonetic and phonemic levels of labeling in a single tier. Base labels are usually phonemes, with diacritics showing phonetic detail. If a phonemically voiceless alveolar fricative s becomes completely voiced during articulation, the segment would be transcribed s_v in Worldbet but [z] in the IPA. The Worldbet scheme has the advantage of retaining any length distinctions left on the voiced s, but th
94. sifiers. Classifiers will be separated from the word they modify by dashes. Note the following examples: ichi-ji desu (one o'clock), ip-pun (one minute), yok-ka (fourth day), yo:-ka (eighth day).
3. Numbers. Numbers will be written as one word, such as ju:ni (twelve) and hyakurokuju: (one hundred and sixty).
4.5.13 Korean
Korean transcriptions have not yet begun. The issue of what romanization to use will need to be addressed when transcriptions begin, but we will likely use some form of Hangul romanization such as that used in the dictionary entitled English-Korean Practical Conversation Dictionary, printed by Hollym in 1984.
4.5.14 Mandarin Chinese
Standard Pinyin is used to transcribe Mandarin, with two modifications: (1) tone; (2) see Table 4.7. Pinyin can be found in many places, Chinese-English dictionaries for one; its use is quite widespread. The four Mandarin tones are identified by the numbers one through four. In orthographic transcriptions, these numbers are connected to the word by a dash. The high level tone is 1, the rising tone is 2, the falling-rising tone is 3, and the falling tone is 4. A reduced tone sometimes occurs in fast speech. This is usually found in a tone-one word whose pitch is drastically reduced or shortened and which does not reach the target of tone one in actual pronunciation. Because the reduced tone is not phonemic, it is not labeled at the word level. The so-called reduced tone normal
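The tone-number convention can be sketched as a small formatting function. It assumes the dash and tone digit are appended to each Pinyin syllable-word as described above, and that a reduced tone is passed as None so that no tone is written at the word level; `attach_tone` is a hypothetical name.

```python
from typing import Optional


def attach_tone(syllable: str, tone: Optional[int]) -> str:
    """Join a Pinyin syllable and its tone number with a dash.

    tone=None stands for the reduced tone, which is not labeled at the
    word level, so the syllable is returned unchanged.
    """
    if tone is None:
        return syllable
    if tone not in (1, 2, 3, 4):
        raise ValueError(f"Mandarin tone must be 1-4, got {tone}")
    return f"{syllable}-{tone}"


print(attach_tone("ma", 3))    # ma-3
print(attach_tone("de", None))  # de
```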
95. stops are not released in running speech (sometimes they are), it can be confusing to label them. The following are the conventions for labeling stop closures between words.
Unreleased closure segmentation (context: example, labels)
stop + stop: top dog, tc th A pc dc d A gc g
stop + stop: neat tie, n i tc tc th aI
stop + D: at the, tc D &
stop + fricative: bit far, bc b I tc f A 9r
stop + vowel: bit off, bc b I tc A f
Released closure segmentation (context: example, labels)
stop + stop: top dog, tc th A pc ph dc d A gc g
stop + stop: neat tie, n i tc th tc th aI
stop + D: at the, tc th D &
stop + fricative: bit far, bc b I tc th f A 9r
stop + vowel: bit off, bc b I tc th A f
5.9.3 Fricatives
Of the fricatives, sibilants are the easiest to isolate because of the high energy they produce: the onset can be determined by a sudden heavy increase of random energy in the spectrogram. Other types of fricatives are also marked by random energy in the spectrogram, but the amplitude and visibility may be very low in the waveform. Set the left
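The closure examples above follow a regular pattern for word-final stops: the closure label is always written, and a release label is added only when the stop is released. A sketch of that pattern for English, assuming (as in the examples) that a released voiceless stop takes the aspirated label (ph, th, kh) and a released voiced stop keeps its plain label; `stop_labels` is hypothetical.

```python
VOICELESS_STOPS = {"p", "t", "k"}
VOICED_STOPS = {"b", "d", "g"}


def stop_labels(stop: str, released: bool) -> list[str]:
    """Closure (+ optional release) labels for an English word-final stop."""
    if stop not in VOICELESS_STOPS | VOICED_STOPS:
        raise ValueError(f"not a plain English stop: {stop!r}")
    closure = stop + "c"  # e.g. "p" -> "pc"
    if not released:
        return [closure]
    # Assumption from the examples: voiceless releases are aspirated.
    release = stop + "h" if stop in VOICELESS_STOPS else stop
    return [closure, release]


print(stop_labels("p", released=False))  # ['pc']
print(stop_labels("p", released=True))   # ['pc', 'ph']
```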
96. tandard American spelling. However, in French it seems more natural to write On se email rather than On se i-mel. Transcribers should do what seems most natural for the specific utterance and language. Specific conventions for each language will be developed on a case-by-case basis and described in this chapter.
4.3.3 Foreign words modified grammatically
At times speakers of one language alter the form of a word to fit the grammar of another language. For example, in Bantu languages the plural prefix is ba- and the singular prefix is mu-. When discussing the fate of a single bartender, one Swahili speaker referred to the bartender as a mutender. He altered the English word bartender to fit the grammar of Swahili; a native speaker of Swahili would understand this to mean a solitary person who tends a bar. Utterances requiring <alt> might raise eyebrows, but the meaning is usually clear to a speaker of both languages. Words of one language that are altered to fit a grammatical pattern in another language should appear with the <alt> tag attached. The example above would be transcribed as mutender<alt>, or possibly mutenda<alt> to better approximate the actual pronunciation. Another example comes from Japanese: one speaker joined the English verb lecture and the Japanese verb suru, saying lecturesuru to signify 'to do lecturing'. This would be transcribed lecturesuru<alt>. It is up
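Because tags such as <alt> are attached directly to the end of the word, a transcription token can be split back into its word and its tags. The helper below is a hypothetical sketch, not part of any CSLU software; it assumes tags never contain angle brackets and always follow the word.

```python
import re

# Hypothetical token format: an orthographic word followed by zero or more
# attached tags such as <alt>, <sp> or <nitl>.
TOKEN = re.compile(r"(?P<word>[^<>]*)(?P<tags>(?:<[^<>]+>)*)")
TAG = re.compile(r"<([^<>]+)>")


def split_tags(token: str) -> tuple[str, list[str]]:
    """Return (word, tags) for a token like 'mutender<alt>'."""
    match = TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f"malformed token: {token!r}")
    return match.group("word"), TAG.findall(match.group("tags"))
```

A bare label such as <pau> comes back with an empty word, which distinguishes non-speech labels from tagged words.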
97. the words in Table 2.1 or words of the same form (i.e., of becomes a in kinda, sorta, lota; going to becomes gonna; what are you becomes whacha; want to becomes wanna; got to becomes gotta), transcribe them as they appear in the list above. Apostrophes are not used to indicate the omission of one or more letters (see Section 2.4.3). The words in Table 2.1 are given special consideration due to their frequency, their widespread acceptance in informal speech, and their significant acoustic variation from the dictionary pronunciation.
Another set of exceptions are the filled pauses, which are listed and defined in Table 2.2 and also described in Chapter 3. Table 2.3 contains some example transcriptions for potentially confusing cases.
Table 2.3 Example transcriptions (actual utterance → transcription)
it's bout time → it's about time
whacha doin → whacha doing
walkin → walking
nuts n bolts → nuts and bolts
nuts an bolts → nuts and bolts
In summary, if the word is not of the type listed in Table 2.1 or 2.2 but is clearly an intelligible word, it should be transcribed using standard spelling, as shown in Table 2.3. If the pronunciation is noticeably different from the normal or standard dictionary pronunciation and is not simply a result of natural sound change in the language, consider using the <pron> tag (Chapter 3). Finer detail, if desired, should be captured in a mor
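The citation-form rule and its exceptions amount to a simple lookup: Table 2.1 forms are kept as spoken, and other reduced forms are written in standard spelling. The sketch below is illustrative only; the reduction dictionary holds just the Table 2.3 examples, and context-dependent reductions such as n or an for "and" are deliberately left out because a word-by-word lookup would misread the article "an".

```python
# Table 2.1 exceptions (plus sorta, named in the text as the same form):
# transcribed exactly as spoken.
TABLE_2_1 = {"gonna", "whacha", "wanna", "gotta", "kinda", "sorta", "lota", "nother"}

# Reduced forms from Table 2.3 mapped to their citation forms.
CITATION_FORM = {"bout": "about", "doin": "doing", "walkin": "walking"}


def transcribe_word(heard: str) -> str:
    """Keep Table 2.1 exceptions; otherwise return the citation form if known."""
    if heard in TABLE_2_1:
        return heard
    return CITATION_FORM.get(heard, heard)
```

Applied word by word, "whacha doin" becomes "whacha doing", matching Table 2.3.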
98. thetic closure, phonetic labeling 47, 94
F: japanese 63
f: english 52, german 56, hindi 59, mandarin 65, spanish 69
filled pause 24
flap, description of 80
foreign speech, non-time-aligned word 26
fricative: description of 80; phonetic segmentation 43; spectral cues 43
G: spanish 69
8: english 52, german 56, hindi 59, japanese 63, spanish 69
gc: english 52, german 56, hindi 59, japanese 63, spanish 69
geminate 45
gH: hindi 59
ghost bursts 39
glot 25
glottal, definition of 80
h: english 52, german 56, hindi 60, japanese 63, mandarin 65
h_v: english 52
hs; i4; i; &; lax; If
spanish 69
english 50, german 54, hindi 57, mandarin 64, spanish 67
japanese 61, spanish 67
japanese 61
english 50, german 54, hindi 57, japanese 61, mandarin 64
english 51, german 55, mandarin 64
informal speech 10
interdental, definition of 80
IPA 3
iU: english 51
Ix: english 50, hindi 57
j: english 52, german 56, hindi 60, japanese 63, mandarin 65, spanish 69
german 56, hindi 59, japanese 63, mandarin 65, spanish 69
kc: english 52, german 56, hindi 59, japanese 63, mandarin 65, spanish 69
kH: hindi 59
kh: english 52, german 56, mandarin 65, spanish 69
english 52, german 56, hindi 60, mandarin 65, spanish 69
japanese 63
english 52, german 56
labiodental, definition of 80
line noise 25
lip smack 25
liquid
99. tion diacritic should be used so that a phonemic-level transcription can be reproduced without lexical knowledge.
7.12 Rhotacization
The diacritic _r is used to indicate r-coloring, or rhotacization. Many vowels become retroflexed when they are followed by r, as in beard, bared, bard, bored, poor, tire and hour. In these examples rhotacization does not usually appear at the beginning of the vowel, where F3 is at the level appropriate for that vowel; when retroflexion begins, F3 suddenly dips to a lower level. The _r diacritic should only be applied to the portion of the vowel in which F3 has reached a level indicating retroflexion for that speaker (about 2,000 Hz for the average male voice).
There are two retroflex base labels in the English set, 3r and &r. Traditionally these syllabic retroflex vowels have been distinguished from other retroflexed vowels because in most cases the retroflexion affects the entire vowel. Occasionally vowels with a different placement than 3r or &r will behave like syllabic retroflexes, in that the entire vowel is retroflexed, but this is relatively rare; if it occurs, the _r diacritic may span the entire phone.
tr clusters often have retroflexion in the aspiration, notable mainly because of third-formant movement going into the vowel. Another common coarticulation effect in tr clusters is palatalization. Segment and listen for retroflexion on the t, and check the position of F3 at the onset of the vo
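The F3 rule can be sketched as a split of one vowel segment at the first frame where F3 reaches the retroflexion level. This is a hypothetical helper, not the guide's procedure; it assumes F3 has already been estimated per frame, that once F3 dips it stays low for the rest of the vowel (as in beard), and it uses the 2,000 Hz average-male figure as a default that should be adjusted per speaker.

```python
def split_rhotacized(base, times, f3, threshold_hz=2000.0):
    """Split a vowel at the first frame whose F3 is at or below threshold_hz.

    times: frame times in seconds; f3: F3 estimates in Hz, one per frame.
    Returns (label, start, end) tuples; the retroflexed part carries _r.
    """
    if not times or len(times) != len(f3):
        raise ValueError("times and f3 must be non-empty and equal in length")
    for i, value in enumerate(f3):
        if value <= threshold_hz:
            if i == 0:
                # Retroflexed from the start: _r spans the entire phone.
                return [(base + "_r", times[0], times[-1])]
            return [(base, times[0], times[i]), (base + "_r", times[i], times[-1])]
    # F3 never reaches the retroflexion level: no _r.
    return [(base, times[0], times[-1])]
```

Segment boundaries here fall on frame times, so precision is limited by the frame rate of the F3 track.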
100. to the transcriber to decide how to spell these altered words. It is preferable that the transcription stay close enough to the spelling of the original borrowed word in its original language, if known, that the origin of the alteration is clear. Transcribers should do what seems most natural for the specific utterance and language.
Note: <alt> should only be used on a word that has changed form. If a French person were to say Ici on ne parle pas, on se email, then email, although used like a French verb, is itself unaltered: it is not conjugated or inflected with French markings and would not require the <alt> tag. It might, however, require <nitl>.
4.4 Filled Pauses
Filled pauses are treated like actual words, so they do not appear within pointy brackets. Table 4.1 lists the allowable filled pauses for each language. The English filled pause labels in Table 3.2 are also available to transcribers in each language. If a filled pause is uttered that is not in either Table 4.1 or Table 3.2, the transcriber has the following options:
- If it occurs commonly, it should be added to this table.
- If it is not common but can be sounded out, attach <sp> to the end (e.g., erg<sp>); there is no need to create a new label for these rare cases.
- If the utterance is too rare to require a label assignment and you are unable to sound it out, transcribe the filled p
101. to the utterance, and in that case the transcriber should just do the best he or she can to transcribe what is heard. See the description of the cut-off label (*) for another use of the uu label.
3.25 <vs> or vs
The label vs is to be used for high-pitched squeaks produced during speech. Voice squeaks are spectrally distinct, having formants which slope upwards and disappear. They generally occur word-initially or word-finally, when the speaker's voice cracks. There is generally a large enough gap between the voice squeak and the rest of the word to treat the voice squeak as a separate entity. Because voice squeak detection relies on the spectrogram, the label is not required at the non-time-aligned level; however, if the transcriber can identify a voice squeak at the non-time-aligned level using only the waveform, he or she is encouraged to use the label.
The label whisper is used for whispered speech. In non-time-aligned transcriptions, connect the tag to the end of each whispered word; in time-aligned transcriptions, make a box in the comments window spanning the period of the whisper. In non-time-aligned labeling this tag should always be connected to a word, unless there is a period of whispering that is unintelligible; in that case the tag <whisper> can appear by itself.
<beep> <blip> <bn> <br> <bs> <burp> <cough> <ct>
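At the non-time-aligned level the whisper rule is mechanical: every whispered word gets the tag attached, and unintelligible whispering is the bare tag. A minimal sketch with a hypothetical function name, where an empty word list stands for a stretch of unintelligible whispering:

```python
def tag_whispered(words: list[str]) -> list[str]:
    """Attach <whisper> to each whispered word (non-time-aligned level).

    An empty list stands for unintelligible whispering, which is
    transcribed as the bare tag.
    """
    if not words:
        return ["<whisper>"]
    return [word + "<whisper>" for word in words]
```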
102. tral, back
Table 6.24 Mandarin Correspondence Chart: Pinyin and OGI. Worldbet descriptions: high front rounded; high front tense; high front unrounded lax vowel; very high front, almost fricated; mid front lax vowel; mid reduced vowel; high back vowel; mid back unrounded retroflexed vowel; mid back unrounded vowel; mid central retroflex vowel; mid low back vowel; low central vowel; low front vowel.
Table 6.25 Mandarin Diphthongs (Worldbet → OGI): ai → ay; aU → aw; ei → ey; oU → ow.
6.7.2 Mandarin Consonants
Following is the correspondence chart between the Pinyin, Worldbet and OGI symbols.
6.7.3 Notes on Mandarin Consonants
The diacritic _h attached to a label normally represents aspiration. However, in Mandarin it has been attached to vowels when the speaker is whispering (i.e., voicelessness).
Table 6.26 Correspondence Chart for Pinyin and Worldbet Consonants. Descriptions: voiceless bilabial plosive; voiceless aspirated bilabial plosive; closure of p or pH; voiceless dental plosive; voiceless aspirated dental plosive; closure of t or th; voiceless velar plosive; voiceless aspirated velar plosive; closure of k or kH; voiceless affricate; voiceless aspirated affricate; closure of ts and tsH; voiceless retroflex affricate; voiceless aspirated retroflex affricate; closure of tsr and tsR; voiceless palatal affricate; voiceless aspirated palatal affricate c
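Table 6.25 is a one-to-one mapping, so converting Mandarin diphthong labels from Worldbet to OGI is a dictionary lookup. The sketch below is illustrative (hypothetical names); it passes non-diphthong labels through unchanged and, following the note above, treats _h on a vowel as whispering. The vowel inventory is supplied by the caller because the vowel symbols of Table 6.24 are not assumed here.

```python
# Table 6.25: Mandarin diphthongs, Worldbet -> OGI.
WORLDBET_TO_OGI_DIPHTHONG = {"ai": "ay", "aU": "aw", "ei": "ey", "oU": "ow"}


def diphthongs_to_ogi(labels: list[str]) -> list[str]:
    """Map diphthong labels; every other label passes through unchanged."""
    return [WORLDBET_TO_OGI_DIPHTHONG.get(label, label) for label in labels]


def is_whispered_vowel(label: str, vowels: set[str]) -> bool:
    """In these Mandarin transcriptions, _h on a vowel marks whispering."""
    base, *diacritics = label.split("_")
    return base in vowels and "h" in diacritics
```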
103. tralized, dental, flapped consonant, fricated stop, glottal onset, glottalized, lateral release, lengthened, nasal release, nasalized, not in the language, palatalized, retroflexion, less rounded, more rounded, syllabicity, voiced, voiceless, waveform cut off, filled pause, line noise, corruption, background noise
Table 7.1 Mapping between IPA, Worldbet and OGIbet Diacritics
7.1 Overview
Diacritics are used to show finer detail than the base symbol is designed to give. With few exceptions (mainly vowels and syllabics), base labels represent single phonemes, while diacritics provide additional phonetic detail. A diacritic is separated from the base label by an underscore. The number of diacritics used depends on what is needed to describe the phone accurately, and there is no particular ordering for diacritics. If there is noticeable spectral variation within the same basic phone, it may be divided into multiple segments, all of which contain the same base label but different diacritics. Vowels that become glottalized should be segmented into at least two parts: the first segment with the base label for the vowel, and the second with the vowel label plus the glottalization diacritic. A single phoneme can be segmented into more than two segments; for example, a vowel may become glottalized for a period, then heavily aspirated, and finally devoiced. Thus a single phoneme may be represented in the transcription with any number of base labels, each ref
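Since diacritics are joined to the base label by underscores and carry no fixed order, a label can be parsed into a base and a list of diacritics, and two labels compared as the same phone regardless of diacritic order. A minimal sketch with hypothetical helper names, assuming neither base labels nor diacritics themselves contain an underscore:

```python
def parse_label(label: str) -> tuple[str, list[str]]:
    """Split 'base_d1_d2' into ('base', ['d1', 'd2'])."""
    base, *diacritics = label.split("_")
    return base, diacritics


def same_phone_label(a: str, b: str) -> bool:
    """True when both labels share a base and the same diacritics, in any order."""
    base_a, diac_a = parse_label(a)
    base_b, diac_b = parse_label(b)
    return base_a == base_b and sorted(diac_a) == sorted(diac_b)
```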
104. ts
Glottalization: Glottalization is another acoustic cue signaling the boundary between geminates. For example, in the phrase we each have more, the two i phonemes might be separated by a period of glottalized i. Segment the first period of non-glottalized vowel (in we) as i, the glottalized portion as i_?, and the final non-glottalized portion (in each) as i. This is consistent with the conventions for labeling glottalization. When converting to the phonemic level, to avoid confusion about how many phonemes there are in the string (in this case, how many i phonemes), any glottalized vowel can be merged with a preceding or following non-glottalized vowel, if one exists.
Allophones: Geminates can be realized as two allophones of the same phoneme. The two segments will be distinct acoustically and visually. For example, in the Spanish sentence Ojalá que Pepe esté en la casa ('I hope Pepe is in the house'), the first of the geminates would usually be realized as e and the second as E. If labels for these allophones do not exist in the language, divide the vowel into two segments marked with the same label, placing the boundary where vowel quality changes. When more descriptive labels exist for the language, use them.
Lengthening: If there is no other spectral cue, geminate phones are signaled by lengthening. Usually geminates will at least be longer than an average occurren
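The phonemic-level merge can be sketched as a pass over the label sequence: a glottalized vowel is absorbed when a plain vowel with the same base sits next to it, and otherwise is kept as its base. This is an illustrative helper, not the guide's converter; it assumes the glottalization diacritic is written _? and handles only that diacritic.

```python
def merge_glottalized(labels: list[str]) -> list[str]:
    """Drop a glottalized vowel X_? when a neighbouring plain X exists
    (it merges into that neighbour); otherwise keep it as plain X.
    Labels without the _? diacritic pass through unchanged.
    """
    merged = []
    for i, label in enumerate(labels):
        base, _, diacritic = label.partition("_")
        if diacritic != "?":
            merged.append(label)
            continue
        prev_label = labels[i - 1] if i > 0 else None
        next_label = labels[i + 1] if i + 1 < len(labels) else None
        if base in (prev_label, next_label):
            continue  # merged into the adjacent non-glottalized vowel
        merged.append(base)
    return merged
```

For we each, the sequence w i i_? i tS comes out as w i i tS: two i phonemes, as intended.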
105. ughts before going on.
- Closures: Use the closure symbols in conjunction with stop or affricate labels. Closures precede all plosives and affricates and are formed by complete closure of the articulators. Usually this closure is accompanied by a build-up of pressure, seen in the release of the stop or affricate burst.
- Epenthetic closures: Use the label when there is no actual pause but there is a period of closure in the spectrogram that cannot be associated with a phonemic stop or affricate. The segment should be 30 milliseconds or longer. It often occurs between a nasal and a fricative. Periods of silence between words are usually labeled as pauses, not epenthetic closures. The epenthetic closure label is also used on the closure preceding fricatives that have been realized as stops (see Section 5.9.3).
Chapter 6 Phonetic Labels
6.1 Introduction
The tables in this chapter display the speech labels used in the phonemic/phonetic transcriptions of six languages: English, German, Hindi, Japanese, Mandarin Chinese and Spanish. With the exception of Hindi, these are part of the OGI multi-language (10-language) corpus. The current data collection has expanded the original 10-language corpus to 22 languages, including Arabic, Cantonese, Czech, Farsi, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, Swahili, Swedish, Vietnamese, Ta
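The epenthetic-closure criteria combine a duration floor with two exclusions. A minimal decision sketch with a hypothetical function name; the two boolean inputs are judgments the labeler makes from the spectrogram, and the special case of a fricative realized as a stop would be passed with has_phonemic_stop=False.

```python
MIN_EPENTHETIC_S = 0.030  # "30 milliseconds or longer"


def is_epenthetic_closure(duration_s: float, is_pause: bool, has_phonemic_stop: bool) -> bool:
    """Decide whether a closure-like stretch qualifies as an epenthetic closure.

    is_pause: the silence is a pause between words (label it as a pause).
    has_phonemic_stop: the closure belongs to a phonemic stop or affricate.
    """
    if is_pause or has_phonemic_stop:
        return False
    return duration_s >= MIN_EPENTHETIC_S
```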
106. ut off in the middle of a word, the waveform will only contain the part of the word being uttered when recording either stopped or started. Often the transcriber can supply the whole word from contextual clues, but at other times the word cannot be ascertained. Transcriptions will differ depending on the transcriber's ability to supply the missing part of the word. If part of the word can be ascertained, it should appear in square brackets to distinguish it from the non-speech labels, which appear in pointy brackets (< >). Consider the following valid transcriptions:
1) *[m]y name is jerry
2) jerry is my na[me]*
3) *my name is jerry
4) isideck<sp> is my mother's name
Notice in example 1 that the asterisk is placed at the beginning of the first word in the transcription to indicate that the word at the beginning of the file was cut off. In example 2 the asterisk has been placed at the end of the last word in the file to indicate that the speech has been cut off at the end of the word. Example 3 is like example 1 in that the beginning of the file has been cut off. Note also the bracketed information in examples 1 and 2. In example 1, [m] signifies that the m sound was cut off but the entire word could be supplied by the transcriber. Note also that even though the sound the transcriber heard was ay (in Worldbet, aI), the transcription is *[m]y, which reflects the stan
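The asterisk and square-bracket notation can be decoded mechanically: the asterisk position tells where the cut occurred, removing the brackets yields the full word, and dropping the bracketed text leaves the letters actually present in the file. A hypothetical sketch; note that the "heard" part is orthographic, not phonetic (for *[m]y it returns y, although the sound heard was aI).

```python
import re


def parse_cutoff(token: str) -> dict:
    """Decode a token such as '*[m]y' or 'na[me]*'."""
    cut = None
    if token.startswith("*"):
        cut, token = "start", token[1:]
    elif token.endswith("*"):
        cut, token = "end", token[:-1]
    word = token.replace("[", "").replace("]", "")  # full word as supplied
    heard = re.sub(r"\[[^\]]*\]", "", token)  # letters outside the brackets
    return {"word": word, "heard": heard, "cut": cut}
```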
107. voiceless labiodental fricative; voiceless dental fricative; voiceless alveolar sibilant; voiceless alveo-palatal sibilant; voiceless glottal fricative; voiced labiodental fricative; voiced dental fricative; voiced alveolar sibilant; voiced alveo-palatal sibilant; voiced glottal fricative; voiceless alveo-palatal affricate; closure of tS; voiced alveo-palatal affricate; closure of dZ; alveolar lateral; retroflex approximant; palatal glide; bilabial glide; syllabic m; syllabic n; syllabic N; syllabic l
t or k. If the phonetic realization of a phoneme is very short (i.e., flapped), the flap diacritic will be used. This occurs frequently with b, g, n, t and d. The alveolar flap, although a single phonetic percept, has a dual underlying phonemic representation and is transcribed as either th_ or d_. This serves to distinguish the phones in the words writer and rider. Although it is not clear whether these phones are acoustically distinct, they are transcribed separately to facilitate mapping from phonetic-level transcriptions to the word level.
Spectrally, some fricatives seem to manifest closure and burst segments. These apparent closure-like segments have been analyzed either as periods of decreased frication with low amplitude, in which case the segments are included in the label box for the fricative, or as epenthetic closures, in which case they are labeled as such.
6.3 French
108. wel. If the level of F3 is low, consider retroflexion. If F3 is still quite high but aspiration is heavier than normal on the t burst, consider palatalization. Examples taken from
7.13 Voicing and Voicelessness
The diacritics _v and _0 are used to label voicing of normally voiceless segments and absence of voicing on normally voiced segments, respectively. In a normally voiceless consonant, if there is voicing throughout the entire segment, use the voicing diacritic _v. Similarly, if a phonemically voiced consonant is completely devoiced, use the voiceless diacritic _0. These two diacritics are not used for partially devoiced or partially voiced consonants, because the threshold would be too subjective and labeling would lack consistency.
Because vowels have higher amplitude, carry stress, and are generally more clearly articulated than consonants, they should be segmented more precisely. If a portion of a vowel is devoiced and a portion is voiced, the devoiced portion should be segmented separately and given the devoicing diacritic. Note that z must be completely devoiced for the devoicing diacritic to be used, whereas the devoiced portion of a vowel is segmented separately regardless. When _0 is used, aspiration is generally assumed and need not be explicitly marked.
7.14 Labialization
The diacritic _w is used for rounding of both consonants and vowels. Lip rounding may be indicate
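For consonants the voicing rule is all-or-nothing, which the sketch below makes explicit. The voiced_fraction input (the share of the segment showing voicing, 0.0 to 1.0) is a hypothetical measurement, not something the guide defines; any partial value yields no diacritic, as the text requires. Vowels are handled differently (the devoiced part is segmented off) and are not covered here.

```python
def consonant_voicing_diacritic(phonemically_voiced: bool, voiced_fraction: float) -> str:
    """Return '_v', '_0' or '' for a consonant segment."""
    if not 0.0 <= voiced_fraction <= 1.0:
        raise ValueError("voiced_fraction must be between 0.0 and 1.0")
    if not phonemically_voiced and voiced_fraction == 1.0:
        return "_v"  # voiceless consonant voiced throughout
    if phonemically_voiced and voiced_fraction == 0.0:
        return "_0"  # voiced consonant completely devoiced
    return ""  # partial voicing or devoicing is not marked
```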
109. x if the speech is extraneous or read speech.
Spontaneous speech is the default; boundary boxes don't need to be set for this speech. We consider read speech to differ from extemporaneous speech. Thus, if the caller is reading, make a boundary box where the read speech begins and label it beginread. Make a box at the boundary where read speech ends and type endread.
Extraneous speech and noise: Often noise can be heard in the background of a call. The labels bn and bs used in the word box are sufficient to cover isolated occurrences of background speech and noise when the extraneous information occurs during a pause or a segment of relative silence. However, when an event is segmented and extraneous noise or speech is heard in addition to a more prominent phenomenon to be labeled in the word box, the following labels should be used in the comments window. To mark the beginning of the extraneous speech event, the abbreviated label is BXS; to mark its end, use EXS. These are abbreviations for begin extraneous speech and end extraneous speech. Likewise, for extraneous noise the labels are BXN and EXN, which stand for begin extraneous noise and end extraneous noise. Note that these symbols only need to be used when the noise or speech in the background overlaps foreground events, most commonly speech by the caller.
Answers: When the utterances be
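Because every BXS needs an EXS and every BXN an EXN, a simple consistency check over the comments-window labels can catch unmatched markers. The checker below is a hypothetical sketch; it counts speech and noise markers separately, on the assumption that background speech and background noise may overlap each other.

```python
PAIRS = {"BXS": "EXS", "BXN": "EXN"}
OPENER_OF = {end: begin for begin, end in PAIRS.items()}


def unmatched_markers(comments: list[str]) -> list[str]:
    """Return begin/end markers in a comments sequence that lack a partner."""
    open_count = {begin: 0 for begin in PAIRS}
    problems = []
    for label in comments:
        if label in PAIRS:
            open_count[label] += 1
        elif label in OPENER_OF:
            begin = OPENER_OF[label]
            if open_count[begin] == 0:
                problems.append(label)  # end marker with no open begin
            else:
                open_count[begin] -= 1
    for begin, count in open_count.items():
        problems.extend([begin] * count)  # begin markers never closed
    return problems
```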
110. y
5.4 Level of Labeling
What we call phonetic labeling for the sake of simplicity would be more accurately termed broad phonetic labeling with a phonemic basis. For each language a set of phonemes (distinctive speech sounds within the language) is chosen. These label sets contain all of the phonemes in each language, in addition to spectrally distinct and frequently occurring allophones. With a few exceptions, the allophones are labeled with diacritics.
5.5 Label Set
In the past, data was transcribed phonetically using OGIbet, a broad phonetic transcription set based on TIMIT. In 1993 a different transcription set, Worldbet, was adopted for phonetic transcriptions. This label set, developed by Dr. Jim Hieronymus, was chosen because it can more consistently handle transcriptions of non-European languages without multiply defining a given symbol. Whereas the OGIbet label r was used to identify the English retroflex, the Spanish alveolar flap and other sounds, in Worldbet r signifies only one type of sound, an alveolar trill; the retroflex approximant is transcribed 9r. For this reason, and because Worldbet is made up of ASCII characters, which are easier to process, Worldbet was determined to be more suitable and extensible for the multi-language labeling effort at the CSLU (see 1.3.3). The content of the label sets in the following chapter is consistent
111. y to get frustrated and hang up before the call is completed, representing a loss of valuable data for the researcher.
2.4 Miscellaneous Details
This section describes conventions to be followed at the time-aligned and non-time-aligned levels.
Table 2.1 List of Exceptions (transcription: example utterance)
gonna: i'm gonna go
whacha: whacha doin
wanna: i wanna see
gotta: i gotta go now
kinda: yeah kinda
lota: there's a lota money
nother: that's a whole nother story
Table 2.2 Filled Pauses (transcription, meaning): thinking word; thinking word; thinking word; thinking word; mm hmm, yes; hmm mm; mm mm, no; nuh uh; uh huh; huh uh; uh uh
2.4.1 Exceptions to dictionary form rule
Word transcriptions at the time-aligned and non-time-aligned levels should contain the words the speaker said. The words should appear in citation form, as they would appear in a dictionary. We know that in continuous speech people often delete or transpose syllables, and that coarticulation can alter the percept of a word. To simplify the orthographic transcription process and to minimize redundancy with phonetic transcriptions, we have opted to ignore most such articulations, especially those which follow the natural phonological processes of the language. The words in Table 2.1 are exceptions to the citation-form rule, because their dictionary entry form does not seem to be consistent. If callers utter
112. z. ts and dz are used before u; tS and dZ are used before i. We added the symbol r to the Worldbet label set.
Table 6.22 Japanese Consonant Examples and Descriptions
Examples: papa 'papa', abu 'horsefly', uta 'song', eda 'branch', aka 'red', igo 'Igo (game)', itsu 'when', ichi 'one', izu 'Izu (place name)', iji 'taste', fūfu 'couple', hito 'human', asa 'morning', aza 'bruise', ishi 'stone', aho 'fool', ama 'mom', ana 'hole', awa 'bubble', ayu 'sweetfish'
Worldbet descriptions: voiceless bilabial plosive; voiceless bilabial closure; voiced bilabial plosive; voiced bilabial closure; voiceless dental plosive; voiceless dental closure; voiced dental plosive; voiced dental closure; voiceless velar plosive; voiceless velar closure; voiced velar plosive; voiced velar closure; voiceless dental affricate; closure of ts; voiceless palatal affricate; closure of tS; voiced dental affricate; closure of dz; voiced palatal affricate; closure of dZ; voiceless labial fricative; voiceless velar fricative; voiceless dental sibilant; voiced dental sibilant; voiceless palatal fricative; voiced palatal fricative; voiceless glottal fricative; bilabial nasal; dental nasal; uvular nasal; voiced alveolar lateral flap; bilabial glide; palatal glide
6.7 Mandarin
6.7.1 Mandarin Vowels
Table 6.23 Mandarin Vowels: front, cen
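The affricate note describes a vowel-conditioned choice: the dental affricates before u and the palatal ones before i. A minimal sketch with a hypothetical function name; it covers only those two contexts (it raises for any other vowel rather than guessing) and assumes the following vowel is given as u or i.

```python
def japanese_affricate(voiced: bool, next_vowel: str) -> str:
    """Pick ts/dz before u and tS/dZ before i."""
    if next_vowel == "u":
        return "dz" if voiced else "ts"
    if next_vowel == "i":
        return "dZ" if voiced else "tS"
    raise ValueError(f"context not covered by this note: {next_vowel!r}")
```

This matches the examples itsu (ts), ichi (tS), izu (dz) and iji (dZ).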
113. zation on diphthongs: It can be difficult to determine what base label to use if only the initial part of a diphthong is glottalized. It does not seem reasonable to mark a segment aI unless the glottalized portion sounds like the full diphthong. Often the glottalized portion at the beginning of a diphthong sounds like a single vowel rather than a diphthong. When the initial portion of a diphthong is glottalized, if there is movement in the formants of the segment and the quality of the vowel sounds like the entire diphthong, use the entire diphthong label (for example, aI). However, if the glottalized portion of the diphthong only bears the quality of the initial sound in the diphthong, use the following conventions (diphthong → base vowel):
aU → @
aI → A
iU → i
>i → >
ei → ei
oU → oU
Notice that for ei and oU the entire diphthong label is used regardless of whether or not the offglide is heard. This is done because there is no specific label for either e or o in English. Also, the initial glottalized portion of the diphthong aU becomes @, while that of aI becomes A. The conventions are established this way for optimal descriptiveness, @ being the most common initial sound in aU and A being the most common initial sound in aI.
7.7.2 Glottal t
A special use of the glottalization diacritic is with the phoneme t. In American English, t is often replaced by a glottal stop, as in the word button. When
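The diphthong convention is a two-way decision: keep the full diphthong label when the glottalized portion sounds like the whole diphthong, otherwise fall back to the listed base vowel. A sketch of that decision with hypothetical names; the aU → @ and iU → i entries are reconstructions of symbols that are not fully legible in the extracted table and should be checked against the original. The function returns only the base label; the glottalization diacritic is added separately.

```python
# Glottalized diphthong onset -> base vowel. The aU and iU entries are
# reconstructed from a partly illegible table (assumption).
GLOTTALIZED_ONSET_BASE = {
    "aU": "@",
    "aI": "A",
    "iU": "i",
    ">i": ">",
    "ei": "ei",
    "oU": "oU",
}


def glottalized_onset_label(diphthong: str, sounds_like_full_diphthong: bool) -> str:
    """Base label for the glottalized initial portion of a diphthong."""
    if sounds_like_full_diphthong:
        return diphthong
    return GLOTTALIZED_ONSET_BASE[diphthong]
```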
