PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 11 July 2020

Index

Introductory Note

References such as “178–9” indicate (not necessarily continuous) discussion of a topic across a range of pages. Wherever possible in the case of topics with many references, these have either been divided into sub-topics or only the most significant discussions of the topic are listed. Because the entire work is about ‘corpus phonology’, the use of this term (and certain others which occur constantly throughout the book) as an entry point has been minimized. Information will be found under the corresponding detailed topics.

@ana attribute 186–7
Abstract Corpus Model (ACM) 310
abstract representations 87, 262, 406
abstraction 48, 63, 65, 118, 175
accented syllables 74, 80–1, 85–6, 481
accents 69, 74, 226, 295, 299, 367, 553
final 558
phrase 78, 482, 566, 573
access control 440, 456, 461
accuracy 24–5, 51, 55–6, 59, 93, 103, 105
acoustic measurements 272, 280
acoustic parameters 211, 567
acoustic signals 57, 195, 197, 203, 403, 558
acoustics, speech 272, 285
acquisition of Taiwan Southern Min 577, 579, 581, 583, 585, 587
actual forms 269, 281, 283, 391–2, 397
actual pronunciation 55, 503, 581
adjectives 182–3, 219, 227–30, 237, 239, 242–4, 246
admin interface 463
adult phonological systems 268, 277, 284, 396
Advanced Research Projects Agency (ARPA) 65, 92
aerodynamic modelling 1, 13
Africa 455, 494–7
agile corpus creation 14, 25, 29, 107, 405
agreement 38, 100–1, 106, 119, 146, 433, 515–16
degree of 96
inter-annotator 501, 515–16
intra-annotator 25, 515–16
perfect 515–16
scores 95, 101–2
Aix-Marsec corpus 19
algorithms 84, 231–2, 278, 280–1, 283, 391, 394
alignment 94–5, 181–2, 319, 326, 390–1, 555, 557
visual 181
allophonic variants 56
alphanumeric symbols 66, 254
alveolars 62, 210, 398, 540
American English 54, 60, 78, 287–8, 565, 567, 571
General 92, 562
AMP systems 440–2
analogue recordings 134, 518, 520
annotated corpora 169, 174, 189, 299, 321, 430
annotated data 140–1, 185, 321, 436
annotation(s) 1–4, 15–19, 23–6, 46–89, 169–71, 305–19, 327–35
contour-based transcription systems and the encoding of intonational phenomena 70–7
data 306, 313, 316, 423, 434
documents 306, 310, 313–14, 316–17, 319
elements 422, 424, 426–7, 432, 435–6
encoding segmental phonetic and phonological information 52–67
errors 23, 25
graphs 47, 171, 176, 403, 406
labels 200, 307, 407
layers 166, 169, 172
levels 153, 172, 174–6, 184, 187, 317, 326
linguistic 15, 17, 58, 168, 171, 173–4, 176
morphosyntactic 175, 185, 188, 558–60
most commonly used systems 58–67
multimedia 3, 305, 319; see also ELAN
overlapping 318–19
phonetic, see phonetic annotation
processes 25, 67, 313, 423, 431, 557, 560
programs 44, 356–7
prosodic 4, 362, 367, 515, 558, 560, 575
schemes/schemas 25, 170–3, 175–6, 183, 322–3, 325–32, 336–7
sets 426, 505
software-assisted methods 277–8
suprasegmental phenomena and segmentation 68–70
TAICORP 584–7
target-based transcription systems and intonational phenomena 78–87
time-aligned 17, 147, 288, 310, 332
transcription procedures and levels of representation 52–7
transcription systems and suprasegmental information 67–87
VALIBEL 557–61
video 420–36
annotators 24, 48, 306, 316, 366, 423, 515–16
human 100, 102, 317
anonymization 369
ANVIL 5, 171, 383, 406, 409, 420–36
advanced annotation concepts 426–7
analysis 432–3
corpus management 430–1
descriptive statistics 432
fundamental annotation concepts 421–6
inter-coder agreement 432–3
interface matters 427–30
interoperability 433–6
project tool 430–1, 436
tracks 434–5
transition diagrams 432–3, 436
apical pronunciation 216, 235
apical variants 215, 220–1
apicalization 214–39, 508
monomorphemic words 216, 218, 222–4, 227–38
apicals 3, 214–18, 220–6, 229, 236, 238
frequency/incidence 215, 220, 223, 235
application engines 438–9, 468–70, 472
archival visual material 40
archiving 1–2, 42, 44, 133–5, 137–41, 143–9, 164
curation and preservation 142–4
data centres 138, 140, 144–5
digital corpus archives and dissemination 135–42
MPI archive 146–7
traditional archives of corpus primary data 134
ARPA (Advanced Research Projects Agency) 65, 92
ARPABET 61, 65–7, 92
articulation 51, 62, 197, 210, 272, 283, 396–7
double 51
articulatory data 195–6, 198
articulatory descriptions 268, 273, 278
artificial neural networks (ANNs) 97
aspiration 56–7, 63, 93, 490, 513
assimilation 55–6, 60, 218, 220, 225, 489, 497
assp tools 324
audio files 306–7, 310, 317, 461, 498, 502, 507
audio formats 42, 444
audio recorders 43–4
audio recordings 5, 159, 174, 305, 409, 416, 527
audio signals 49–50, 98, 100, 103, 198, 200, 202
Australian English 8, 562, 565–7, 571–2, 574–5
intonation 565–6
prosodic features 8, 575
Australian Map Task corpus 8, 562, 562–75
intonational categories and discourse structure 571–5
prosodic structure, intonation, and discourse segmentation 565–71
Australian National Database of Spoken Language (ANDOSL) 8, 562, 565, 571
authenticity 22, 135, 139, 143, 486
authority domains 137, 139
automatic phonetic transcription 2, 107, 365–6, 507
automatic phonological transcription 89–109
optimization 105–6
automatic segmentation 318, 325, 363, 368, 584–5
automatic speech recognition (ASR) 47, 90, 93, 97–8, 100, 102–3, 105–7
automatic transcription 68, 91–4, 107
autosegmental-metrical framework 70, 78, 80, 481, 566
backends 442–3, 461, 463, 466
basic utterance terminators 586
Belgium 8, 455, 492, 552–4, 561
bilinguals 54, 404, 453, 455
blind transcription(s) 388–9
Boolean logic 375, 424, 435
boundaries 17, 75, 246–7, 353, 355–8, 367, 570–1
compound 215, 226, 237, 239
derivational 226, 237, 239
intonational 568, 570, 575
morpheme 236, 394
phrase 78, 85, 295, 567
syllable 255, 326, 356, 391, 416, 513, 540
word 79, 218, 242, 326, 356, 394, 497
boundary tones 70, 78, 482, 512, 514, 566, 574–5
branching, conditional 372, 376, 379
Break Indices 78–9, 512, 566–7, 570
British English 20, 74, 82, 475, 478, 483, 520
Standard Southern 83, 477, 571
British Isles 5, 35, 80, 475–8, 483, 562
British National Corpus (BNC) 18–19, 21, 23, 58–9, 141, 523
British tradition 70, 74, 476, 481–2
broad phonemic transcription 51, 55
broad phonetic transcription 55, 92, 102, 255, 364–5, 503, 513
browsers 162, 410, 438–40, 442–3, 445, 468
built-in functions, Praat 372, 374–5, 379
C-PROM corpus 212, 560
cameras 41–2, 44–5, 500
Canada 279, 455, 496
canonical transcriptions (CAN-PTs) 98–9, 102–7, 364–5, 368
capital letters 66, 82, 395, 513, 536
carriers 134–5, 141, 148
casual speech 33, 219, 237, 239
Catalan 73–4, 207, 391
centroid vectors 127
CGUs (Common Ground Units) 8, 574–5
CHAT 4, 181, 186, 380, 382–4, 400, 404
conventions 59, 181
checkbox fields 455–6
checkboxes 440, 462
child language 273, 277, 280, 284–5, 577
acquisition 9, 284, 576
corpora 381, 577
data 276, 401
Child Language Data Exchange System, see CHILDES
child phonological development 265–85, 382
child phonological systems 277
child speech 61, 278, 578, 580, 583
CHILDES (Child Language Data Exchange System) 3–4, 275–6, 278, 288, 380–1, 578, 580
citation forms 30, 35, 38, 56
citation-phonemic representation 54–6, 61
CLAN 4, 380, 385, 387–8, 393, 400–1, 409
Phon-CLAN compatibility 400–1
CLARIN research infrastructure 145, 165
CLAWS 4 tagger 524
clitic groups 79, 390
CLPF corpus 273, 275, 399–400
cluster analysis 2, 121, 130
concepts and hypothesis generation 118–28
motivation 111–18
cluster structure 114, 117, 124
clusters 115–16, 123–4, 127, 130, 293–4, 305, 392
coda, see coda clusters
consonant 229, 293–4, 582
CMDI (Component Metadata Infrastructure) 2, 160–3, 165
CNGT 157–8
coda clusters 293–5
reduction 293, 295, 299–300
coda position 56, 582–3
coding 253–7, 275–6, 423, 428–9, 436, 491–2, 536–7
processes 5, 420, 423, 584–5
schemes 254, 313, 423–4, 428, 430, 436
spatial 5, 420, 427
systems 254, 256, 491, 495, 522
Cohen’s kappa 94, 433
collections 14–15, 22–3, 39, 112, 133, 136–8, 553–4
virtual 133, 136, 142, 145
collectors 18–21, 24, 29, 32, 34, 39, 41
combined CAN-DD transcription 104
combined KB-DD transcription 104
commands 343, 345, 347, 355, 357, 370–2, 375
fixed 375
Praat scripting 371
query 375, 377, 379
selection-dependent 372, 375–6, 379
common data services 138–9
common formats 324, 384, 444, 490
Common Ground Units, see CGUs
common nouns 186, 219, 222, 226–7, 230, 232–8
communication metadata 412–13
communicative events 28–9, 33–5, 39, 313, 411, 556–7
communicative situations 16, 21–2, 200, 289
comparability 488, 490, 537
competence 130, 193, 240, 261, 300, 510–11
compilation 1–2, 5–6, 8, 18–19, 21–3, 25, 392
compilers 2, 14–16, 18, 20, 159, 437, 440
complex words 215, 219, 227, 237, 239
Component Metadata Infrastructure, see CMDI
components 160–2, 164, 170, 172, 202–3, 392, 442
menu 444, 446, 449
software 140, 142, 320
viewer 307–8
compound boundaries 215, 226, 237, 239
compound data structures 372, 377–8
compression 82, 143, 483
lossy 143
computer animation 36, 420, 427
computer programs, see software
concatenation 30, 54–5, 143, 344
concordances 220, 316, 414–15, 504
conditional branching 372, 376, 379
configurations 69–70, 86, 238, 272, 446, 459
tonal 80, 87
confirmatory analysis 112
connected speech 55–6, 60, 64, 99, 102, 197, 538
consensus transcription 94, 106
consent 18–19, 147
declarations of 18–19
consistency 60, 79, 93, 147, 313, 316, 422–3
consonant clusters 229, 293–4, 582
consonantal errors 291
consonantal intervals 512–14
consonants 204–5, 241–2, 246–7, 293–4, 392–3, 395–8, 582–3
actual 283, 398
coronal, see coronals
deletion 55, 245–7, 541
floating 250–1
harmony 283, 397–8
labial 393, 396
latent 55, 204, 242–4, 246, 250, 259, 262
metathesis 268, 283, 392, 397–8
non-coronal 222–3, 227–8, 230–5, 238
non-pulmonic 61–2
constraints 171, 173–4, 202, 217–19, 222, 311, 313–16
extralinguistic 215, 221
lexical 215, 217, 219
markedness 9, 576
morphological 3, 218–19
phonotactic 80, 97, 238
predefined 314–15
structural 217, 239, 315
types 312
construction of corpora 9, 152, 241, 410
containers 305, 407–8, 426
content construction kits (CCKs) 440–3
content management systems (CMS) 440–2, 455
contexts 27–34, 203–6, 242–6, 258–60, 265–9, 272–5, 491–3
and contextual variation 29–32
cultural 34, 36, 38–9
grammatical 247, 260
liaison 55, 205–6, 243–4, 247, 251, 254, 261
prosodic 30, 368
segmental 395, 491, 540
situational 18, 21, 555
contextual factors 28, 30–2
contextual variation 2, 27, 29
continuation rises 208–9
contours 71, 76, 84, 197, 208–9, 573
intonation 69–70, 72, 75, 81, 83–4, 571, 575
rising–falling 84, 208–9
control 2, 27–8, 31, 34–5, 37, 428, 536
access 440, 456, 461
experimental 417, 479–80, 483
structures 372, 376–7
controlled database 8, 552
controlled vocabularies 150, 154, 308, 311–13, 315, 424
conventions 78–9, 407, 410, 520, 534, 537, 539
CHAT 59, 181
conversations 253–4, 257–8, 288–9, 407, 489, 500–1, 542–4
free 35, 255, 480, 487, 550
informal 16, 54, 253, 493, 495, 497, 500
semi-directed 253, 255, 487
spontaneous 57, 550
conversions 44, 147, 333–5, 400, 481, 503
Copenhagen 535–6, 538, 541, 543–4
copying, digital 134–5
COREIL corpus 199, 289
coronal consonants, see coronals
corpora 13–25, 27–9, 162–5, 193–201, 211–15, 287–90, 552–5
advantages and limitations 262–4
Aix-Marsec 19
annotated 169, 174, 189, 299, 321, 430
Australian Map Task 8, 562, 562–75
British National 18, 58, 141
C-PROM 212, 560
construction of 9, 152, 241, 410
COREIL 199, 289
DECTE 7, 517, 519, 525, 533
demo 439, 447, 459
Dutch Spoken 141, 151
EXMARaLDA 416–19
GARS 58, 490
Kiel 196, 565
LANCHART 7, 534–5, 537–9, 541–3, 545
learner, see learner corpora
legacy 402, 517
map task 565, 567
multi-purpose 6, 92, 498, 506
NECTE 2, 7, 113–20, 157, 517–21, 523, 525
Norwegian Speech 490, 499, 501, 503, 505, 507
oral 50, 53, 59–60, 194, 253, 288, 438
Orléans 251–2
PAIDUS 416–17
PFC 3, 52, 198, 211, 253, 256–7, 469
PhonBank 4, 380–1
phonological 5, 14, 16–19, 22–4, 26, 285, 287
raw data selection 22–3
‘ready-made’ 39
Rehbein-SKOBI 416, 418
representativeness and size 19–22
second language 286–7
Spoken Dutch 90, 102, 105, 364
subcorpora 7, 29, 212, 413, 447, 509–11, 560
Switchboard 90, 92, 101, 211, 565, 568
TAICORP 8–9, 576, 581, 584–7
TalkBank 4, 380
transcribed 271, 278, 284
variation and phonology 240–64
corpus analysis 112, 128, 131, 293–4, 296, 298, 405
corpus annotation, see annotation
corpus building 265, 275–6, 305, 362
corpus compilers, see compilers
corpus creation 2, 14, 18, 24–5, 107, 405, 409
agile 14, 25, 29, 107, 405
process 14, 19, 23–5, 510
corpus design 1, 13–26, 107, 325, 405, 417, 469
corpus linguistics 13, 18, 22, 110–11, 131–2, 194, 198
corpus phonology, see Introductory Note
corpus queries 25, 413
corpus storage 2, 14, 18
corpus window 346, 358–60
correction 156, 246, 364, 370, 405, 587
manual 331, 365–6
correspondence 432, 568, 570, 572, 574
coughing 181–2, 190, 407, 502, 504, 556, 587
cross-level links 426–7
cross-sectional studies 270, 281, 387
cross-word assimilation 90, 102
cross-word phenomena 100, 103
cultural contexts 34, 36, 38–9
curation 5, 134, 138–40, 142, 402
DAMSL 175, 568, 572–3, 575
dialogue acts 568, 572
Danish 7, 109, 499, 535–6, 538–41, 562
Eastern 542–3
data analysis 119–20, 265, 286, 338, 388–9
data categories 169, 173–5, 187
data centres 138, 140, 144–5
data collection 1, 6–8, 22–3, 25–45, 555, 561, 576–8
context and contextual variation 29–32
continuum 27–9, 31, 34, 38
elicitation of wordlists 35
equipment and recording technique 41–5
ethical considerations 41
highly controlled techniques 34–5
observer’s paradox 2, 27, 32–4, 479
partnering other data collection activities 40–1
reading tasks 34–5
TAICORP 578–9
‘uncontrolled’ data 38–41
use of nonverbal stimuli 35–8
data compilations 265, 280, 282–4, 381, 385, 388–93, 398–9
data-driven transcriptions (DD-PTs) 103–4
data exchange 87, 409
ELAN 319
data files 306, 423–4, 467, 470
data formats 2, 18–19, 23, 25–6, 165, 167, 169
data mining 111, 130–1, 400
data models 26, 402–3, 406–8, 419
EXMARaLDA 406–8
Data Seal of Approval (DSA) 146–7
database systems 4, 276, 279, 321–2, 336, 471, 554
database templates 322–5, 329, 331
databases 197, 291, 321–3, 335–6, 467–8, 548–52, 560–1
controlled 8, 552
GTRP 548–9
IViE 480, 483
kielread 339–40
MySQL 440–2, 504
VALIBEL 8, 490, 552–61
datasets 47–8, 151, 196, 274, 276, 279, 520
decision trees 99, 104–5, 107
declarations, of consent 18–19
declarative questions 569
decomposition 69–70, 327
DECTE (Diachronic Electronic Corpus of Tyneside English) 7, 517–33; see also NECTE
interviews 527, 529, 532
deletions 95, 99–100, 103, 105, 293, 399, 538
demo corpus 404, 439, 447, 459
dependency 296, 425, 434, 437
derivational boundaries 226, 237, 239
derivational suffixes 219, 226, 237
derivations 56–7, 219–20, 222–7, 229–30, 232–4, 237, 243
tokens of 220, 226, 234
descriptions 8, 93–4, 153–4, 185–7, 215, 245–6, 476–7
articulatory 268, 273, 278
metadata 136–8, 142–3, 146, 161
picture 199, 289, 417
structuralist 70, 277
descriptive features 281, 390, 396, 398
descriptors 152, 169, 171, 173–4
development of tools 38, 46–7, 212–13
developmental patterns 270, 381, 399
challenges in characterization 277
devoicing 93, 105, 293–4, 558
Diachronic Electronic Corpus of Tyneside English, see DECTE
diacritics 56, 61, 64, 278, 281, 390, 396
dialectology 5–6, 9, 117–18, 402, 498
dialects 82, 418–19, 476–7, 480–4, 499, 547–9, 551
Dutch 7, 546, 550, 565
traditional 7, 542, 546
dialogic interaction 567, 572
dialogue acts 8, 175, 567–8, 570–5
DAMSL 568, 572
dialogue(s) 34–8, 40, 177, 180, 565, 567–8, 574–5
telephone 102–3, 211
dictionaries 168, 177, 321–2, 335, 339–40, 384, 389
pronunciation 89, 211, 339
digital copying 134–5
Digital Repository Audit Method Based on Risk Assessment (DRAMBORA) 146
dimensionality 122–4
diphthongs 51, 522, 540, 583
direct inspection 111–13, 117, 532
directories 131, 138, 333, 344, 359, 365, 469
user-specified 366, 372, 375–7
disclosure 274
discourse 6, 8, 211–12, 563, 567–8, 571, 573–5
structure 565, 567, 571
discourse segmentation 564–5, 567, 571
discretization 48, 64, 69
dissemination 1–2, 133–5, 137, 141–5, 147–9, 187, 532–3
channels 140, 148
strategies 7, 517–18
distance 95–6, 123–4, 314–16, 502
measures 94–5
relative 117, 124
distinctive features 202, 242
distributional patterns 22, 381, 542
document clustering 131
Document Type Definitions (DTDs) 24, 525, 527
documentation 16, 18–19, 23–4, 26, 41–2, 170, 173
language 4, 28, 31, 33, 39–40, 42, 305
dorsals 218, 540
downloads/downloading 42, 445, 456, 461, 466, 516
DRAMBORA (Digital Repository Audit Method Based on Risk Assessment) 146
dropdown menus 466, 471
DTDs (Document Type Definitions) 24, 525, 527
Dublin Core 150–4, 158
duration 75, 211–13, 363, 368–9, 511, 565, 567
minimal 211, 317
of pauses 212–13, 570
Dutch 101, 103, 109, 270, 291, 298, 548–50; see also Meertens Institute
dialects 7, 546, 550, 565
Dutch Spoken Corpus 141, 151
dyadic interviews 518
dynamic programming (DP) 95, 391
EAF files 313–15, 446
East Flanders 549
East Norwegian 6, 214–16, 219, 228–9, 498
East Oslo 499–500
Eastern Jutland 541–2
ecological validity 479–80, 483
edges 69–70, 78, 176, 345
phrasal 69, 77, 80, 566
editors 59, 156, 308, 370–1, 461, 472
education 217, 470, 494, 499–500, 509, 554
higher 478, 499–500
levels 23, 215, 447, 453, 461, 553, 555
ELAN 3, 5, 42, 44, 305–20, 403–4, 433–4
data files 421, 434
data model 310–13
EAF files 313–15, 446
functionality overview 306–9
media handling frameworks 309–10
multiple file (local corpus) operations 316–17
multiple file search 315–16
search 314–15
tier-based operations 317–19
electronic texts 15, 110–11, 156, 179
elicitation 28–9, 34–7, 173, 198–9, 247, 277, 479
tools 27, 29, 38
elicited speech 199, 266, 554
elongated phonemes 512–13
EMU 4, 47, 166, 280, 321–41
annotation 332–6
corpus preparation 324–31
database creation 331–2
database framework 322–3
further analyses and processing using EMU/R 338–41
Labeller 325, 332–3
queries 336, 336–9
Query Language (EQL) 336–7, 341
EMU/R 332, 338–41
encoding 50–1, 56–7, 68, 422, 426–7, 429, 435–6
standards 135, 142
English 5–8, 63–4, 82–4, 108–9, 288–90, 293–300, 509–11
American, see American English
Australian, see Australian English
British, see British English
intonation 81, 475–6, 482
New Zealand 562, 565
Scottish 477
Welsh 476–7
epenthesis 229, 244, 254, 293–4
epenthetic liaison 254–5
errors 24–5, 94, 98–9, 105, 240, 297–8, 405
annotation 23, 25
consonantal 291
performance 249–50
transcription 59, 200
European French 258, 497
evaluation 2, 9, 14, 25, 52, 273, 292
EXAKT 413–15
Excel 338–40
EXMARaLDA 4–5, 171, 179, 357, 402–19
corpora 416–19
corpus-driven workflows 405
corpus manager 411–13
data model 406–8
data visualization 404
design considerations 403–5
development 403, 419
EXAKT 413–15
interoperability and sustainability 403–4
language acquisition studies 416
metadata 404–5
Partitur-Editor 408–12
tools 290, 404, 408–15, 436
transcription 405, 411
experimental control 417, 479–80, 483
experimental data 194, 196, 198–201, 205, 207
experiments 27–8, 94, 97, 100, 106, 176, 180
speech 27–8
exploitation software 140, 162–3
exploratory analysis 112, 221
export 316, 319, 382, 384, 430, 434–6, 554
formats 153, 175
functions 319, 508
Extensible Markup Language, see XML
extensions 19, 153–4, 310, 317, 446, 458, 463
file 331, 446–7, 462
extralinguistic constraints 215, 221
factor groups 221–4, 227, 230–1, 234–6
factor weights 231–2
factors 221–3, 230–5, 237, 239, 271, 295, 481–2
contextual 28, 30–2
frequency 9, 258, 261
knockout 223, 227, 230
lexical 205–6, 217, 239
nonlinguistic 300, 510
structural 214, 222, 225, 236, 238
FAND 548–9
feature sets 390–1, 396
feature structures 179, 183–7
feature-value pairs 183–4
features
descriptive 281, 390, 396, 398
linguistic 20, 22
phonological 22, 268, 278, 296, 390, 392, 419
females 163, 404, 453–4, 500, 503, 510, 548
field recorders 43–4
fields 39–40, 160, 266–7, 447, 449–51, 453–7, 459
checkbox 455–6
text 353, 355, 373, 440, 451, 453, 466
user 451, 454, 463
file extensions 331, 446–7, 462
file formats 310, 333, 400
file locations 323, 331
filenames 372, 374–6, 458–9, 466, 468
files 315–16, 322–5, 371–2, 375–7, 442–6, 455–8, 468–70
anvil 423, 430
data 306, 423–4, 467, 470
multiple file operations 316–17, 320, 456
NECTE 520, 525
signal 323–5, 331
text 17, 363, 366, 371, 376, 577–80, 584
TextGrid 355, 365–6, 375, 545, 557
token 221–2
transcription 447, 456, 463, 522, 549
video 6, 42, 44, 306–7, 310, 498, 504
filled pauses 78, 90, 292
films 36–40, 134, 501, 504
filters 31, 99, 144, 183, 413, 415, 466
final accents 558
final consonants 204–5, 242–4, 254
final position 217, 494
first language acquisition 3, 577
first languages 288–91, 494
fixed commands 375
Flanders 547–50
French 547
Flexicontent 443–6, 451, 453, 455–9, 466
categories 447, 452
general configuration 446
installation 443–4
launching 446, 449
floating consonants 250–1
floating segments 251–2, 263
fluency 287, 291–3, 295, 299–300
non-native 291, 293
folders 316–17, 442, 444, 455
fonts 279, 345, 409
for-loops 372, 376–7
forced recognition 99–102, 104, 107
foreign languages 6, 61, 291, 497, 509–10
formats 42–4, 134–5, 146–7, 153, 169–70, 174–5, 399–400
audio 42, 444
common 324, 384, 444, 490
data 2, 18–19, 23, 25–6, 165, 167, 169
export 153, 175
file 310, 333, 400
metadata 150–65
for phonological corpora 166–90
proprietary 186, 189, 279
standardized 2, 18, 23, 166–7, 187
fortuitous interruptions 33, 41
forward linking 250
forward-looking functions 568–9
France 53, 196, 455, 487, 489, 492
free conversation 35, 255, 480, 550
free speech 289, 292, 294, 296, 510
French 59–61, 63–4, 207–9, 252–4, 288–9, 491–4, 552–7
Belgian, see Belgium
European 258, 497
pauses in 211–13
PFC, see PFC
phonology 245, 247, 487, 489, 491, 493, 495
Reference 243, 245
southern 493–4
French Flanders 547
French Phonology and Morphology (FPM) 241–3, 245–6, 249–50, 262
frequency 20, 22, 259–60, 298, 300, 507, 542–4
fundamental 48, 84, 196, 324, 333, 343, 368–9
word 118, 205, 585–6
frequency factors 9, 258, 261
frontends 442, 450, 461, 463–6
function words 294, 336–7, 582
functionalities 280, 283, 316, 319, 395, 401, 403
functions 48–9, 328–30, 339, 374–7, 386–8, 391, 394
built-in 372, 374–5, 379
export 319, 508
forward-looking 568–9
pragmatic 69, 583
string 372, 374, 379
fundamental frequency 48, 84, 196, 324, 333, 343, 368–9
gaps 29, 32, 251, 261, 309, 312, 319
accidental 215, 227
GARS corpus 58, 490
gatekeeper functions 137–8
General American English 92, 562
geometrical interpretation 121–2
Germany 6, 180, 509
Northern 417
glottal stops 250–1, 255, 582
Goeman–Taeldeman–Van Reenen Database 546–50
GoldVarb software 214, 221, 224, 231–2, 236, 239
grammar 16, 31, 206, 245, 268, 343, 400
grammatical categories 186, 205
grammatical contexts 247, 260
grammatical gender 185
grammatical variables 7, 539, 542, 544
granularity 163, 171, 174, 181, 568, 571, 573
degree of 52, 58
graphical user interfaces, see GUIs
GTRP database 548–9
GUIs (graphical user interfaces) 322, 324, 361, 371–2, 375, 388, 390–1
Praat 362, 371–2, 375, 379
Query 336–8
handedness 422, 424, 432, 435
harmony
consonant 283, 397–8
vowel 283, 392, 488
HCRC 8, 28, 37, 562, 567–70, 573, 575
headers 151, 178–9, 313, 332, 525, 580
headphones 43
hesitations 47, 49, 52–3, 59, 250, 255, 501–2
HIAT, conventions 416–17
hidden Markov models (HMMs) 97, 106
hierarchical analysis 124, 126
hierarchical queries 336–7
hierarchical relations 333, 335–7
hierarchy 325–7, 332–3, 335, 356, 426, 566
prosodic 327–9, 566
higher education 478, 499–500
higher-level prosodic structure 567–8
HMMs (hidden Markov models) 97, 106
horizontal tracks 307, 422
hosting providers 441–2
human annotators 100, 102, 317
human coders 278, 422–3
human transcribers 92–3, 100, 102, 271, 363
human transcription(s) 93–4, 96, 273
human–computer interaction 355, 420
hyphenation 156–7
hypothesis generation 99, 111, 117–18, 124
hypothesis verification 100, 102
identifiers 151, 153–4, 324, 392
images 35, 144, 306, 410, 431, 438, 440
immigrants 477–8, 500
import 319, 339, 404, 409, 421, 459, 467
impressionistic judgements 271–2
inconsistencies 24–5, 74–5, 84, 423
index cards 110, 522–3
indexed variables 377, 379
indexes 137, 374, 377, 470, 472, 560, 567
inflectional suffixes 252
informal conversations 16, 54, 253, 493, 495, 497, 500
input probability 231–2, 234, 236
insertions 95, 99–100, 103, 205, 242, 250, 328
inspection
direct 111–13, 117, 532
visual 56, 115–16, 481
installation 333, 438, 441–2, 445
intensity 75, 77, 306, 350, 363, 368, 422
intention 28, 32, 37, 39, 41, 519, 521
inter-annotator agreement 501, 515–16
interaction 27, 30–1, 196, 206–7, 224, 269, 554–5
dialogic 567, 572
interdependencies 227, 230, 237
interfaces 384, 395, 398, 408, 411–12, 463, 549–50
admin 463
interjections 336–7, 502, 559
interlanguage 298, 300
intermediate phrases 482, 566–7, 570
International Organization for Standardization, see ISO
International Phonetic Alphabet, see IPA
international standards 1, 7, 176, 519
internet 141, 164, 276, 313, 379, 437, 550–1
interoperability 2, 152, 163–7, 170, 174–5, 310, 330
ANVIL 433–6
EXMARaLDA 403–4
Phon 384–5
interpretability 118, 139, 142, 147, 209
interpretation 87, 118–19, 157, 271–3, 399, 479, 489–90
geometrical 121–2
linguistic 15, 68, 75, 80
interpreters 40, 370, 374, 378
interrogatives 69, 416, 480
interrupted phrases 296, 512
interruptions 47, 81, 501, 555, 560, 586
fortuitous 33, 41
intertwined phenomena 68–9
interval labels 374, 377
interval tiers 334, 351–2, 358, 373–5
intervals 47–8, 352–3, 355–8, 363–6, 368–9, 539, 544–5
adjacent 355–6
consonantal 512–14
vocalic 513–14
intervening vowels 283, 397–8
interview situations 40–1, 506, 510
interviewers 195, 289, 503, 511–13, 532
interviews 500–1, 503, 510–12, 517–18, 520, 527, 548
dyadic 518
sociolinguistic 538, 544, 553
without nonverbal stimuli 40
intonation 68–72, 289–90, 295–6, 475–6, 478–9, 481, 564–5
analysis 70
contours 69–70, 72, 75, 81, 83–4, 571, 575
English 81, 475–6, 482
non-native 297, 514
patterns 30, 36, 199
phrases 78, 80, 295–6, 476, 479, 514, 566–7
rising 556, 572
systems 31, 33, 76
intonational boundaries 568, 570, 575
intonational events 67, 76, 80–1, 85, 481
intonational patterns 199, 207, 564
intonational phrases 74, 476, 479, 482, 566–7, 571
intonational phrasing 295–6, 300
intonational tunes 571–2, 574–5
intonational variation 5, 80, 475, 477, 564–5
intra-annotator agreement 25, 515–16
INTSINT 70, 76, 86–7, 368
symbols 84–6
invariants 210–11
IPA (International Phonetic Alphabet) 56, 61–8, 70, 74–5, 87, 108–9, 203
symbols 62, 65–6, 71, 74, 109, 390, 396
transcription(s) 73, 193, 278, 388, 393
ISO (International Organization for Standardization) 146, 168, 173–6, 184, 187, 189
ISO-TEI 176, 184–6
iterative structures 376–7; see also loops
IViE (Intonational Variation in English) corpus 5–6, 21–2, 70, 80–2, 84, 198–9, 475–85
applications 482–3
design 476–82
Japanese learners 288–90
Java Media Framework (JMF) 309–10, 421
Java programming language 306, 309
Javascript 467, 470–2
Joomla CMS 443–4, 450–1, 461, 463
installation 442
Journal of Quantitative Linguistics 131
Jutland
Eastern 541–2
Western 542
kappa 106, 433, 515–16
Cohen’s 94, 433
keywords 129–30, 376, 413
Kiel Corpus of Speech 196, 565
kielread database 339–40
knockout factors 223, 227, 230
knowledge-based transcriptions (KB-PTs) 103–4
label types 327–8, 330–1, 336
labels 47–51, 115, 339–40, 363–4, 375–8, 453, 560
annotation 200, 307, 407
interval 374, 377
textual 305, 310
labial consonants 393, 396
labials 210, 218, 391, 393, 396, 540, 583
laboratory phonology 13, 204, 239
LAMUS (Language Archive Upload and Management System) 138, 146–7
LANCHART corpus 7, 534–45
aim and basic terminology 534–7
comparability and discourse context analysis 537–8
phonetic annotation 538–41
sample studies 542–5
language acquisition 1, 13, 46, 265, 267, 380–1
first 3, 577; see also child language, acquisition
Language Archive Upload and Management System, see LAMUS
language documentation 4, 28, 31, 33, 39–40, 42, 305
language learners 54, 272, 278, 290–3, 298
second 3, 6, 286–8, 290, 299–300, 509
language model (LM) 97, 103, 317
language resources 76, 151, 157, 163, 173–4
management from ISO perspective 174–6
language teaching 16, 291–2, 298–300
language variation 1, 9, 13, 417
latent consonants 55, 204, 242–4, 246, 250, 259, 262
layouts 324, 409, 411, 447, 450, 463
LeaP corpus
annotation 512–16
data format and availability 516
size and content 509–12
learnability 266, 428
learner corpora, phonological 287–90, 300
legacy corpora 402, 517
lemma 327–9, 524, 531
tier 512, 515
letters 108, 353, 469, 538
capital 66, 82, 395, 513, 536
level tones 296–7
lexica 22, 54, 90, 97–8, 137, 203, 577
lexical constraints 215, 217, 219
lexical factors 205–6, 217, 239
lexical items 54–5, 57, 188, 205, 219, 225, 261
lexical representations 69, 205, 243
lexical tones 69, 72, 582, 584
lexicalized pronunciation 226, 237, 239
liaison 54–5, 60, 204–6, 241–63, 489, 491–2, 495–6
contexts 55, 205–6, 243–4, 247, 251, 254, 261
data, grammar, and register 245–7
epenthetic 254–5
optional 243, 248–9, 259
realized 248, 258–9
variable 246, 249, 254, 261
lig-derivations 222, 237
linear algebra 120, 128–9
linguistic annotations 15, 17, 58, 168, 171, 173–4, 176
linguistic features 20, 22
linguistic interpretation 15, 68, 75, 80
linguistic structures 20, 22, 74, 147, 176, 479, 551
linguistic theory 131, 152, 201, 240, 263
linguistic type 154, 310–11, 314, 316, 434
linguistic variables 504, 537
linking 182, 220, 224–7, 229–30, 234, 237, 239
forward 250
linking phonemes 219, 222–7, 234, 237
listeners 13, 89, 94, 100, 102, 497, 503
locations 69–70, 81, 360, 412–13, 477–8, 547–8, 555
file 323, 331
geographical 535–6
log likelihood 231–2, 234
login 441–3, 463
modules 463–4
long vowels 224–30, 237–9, 341
longitudinal studies 270, 277, 281, 387, 535, 577
loops 372, 376–7, 379
lossless compression 143
lossy compression 143
loudness 47, 69, 75, 483
machine-readable extension 61, 65
MacOSX 440–1
macro-style scripts 371–2
Magnetic Resonance Imaging (MRI) 197
maintenance 136, 140, 168, 173
MAMP 441–2
MAND 549
Mandarin 562, 577–8, 580, 584
manual annotations 24, 367, 432, 512–13, 515–16
manual correction 331, 365–6
manual transcription(s) 91, 104–5, 107, 378, 584
manually annotated tiers 512, 515–16
many-to-many relations 327, 331
map task corpora 565, 567
mark-up 31, 169, 420, 537, 539, 558
markedness constraints 9, 576
Max Planck Institute for Psycholinguistics 3, 38, 140–1, 306, 416
measurements 38, 93–4, 118–20, 292, 299, 419, 515–16
acoustic 272, 280
repeated 24, 93
media files 141, 147, 306, 309, 313–14, 317–18, 387
Meertens Institute 7–8, 546–51
Goeman–Taeldeman–Van Reenen Database 546–50
soundbites 550–1
melodic variation 51
Memorial University of Newfoundland 285, 401
menus 343–4, 346, 348–9, 370, 375–6, 379, 464–5
dropdown 466, 471
fixed 370–1
metadata 45–7, 163–5, 177–9, 328–9, 411–12, 525, 554–5
communication 412–13
definition and reasons for use 150–2
descriptions 136–8, 142–3, 146, 161
EXMARaLDA 404–5
formats 150–65
modellers 160, 162
outlook 164–5
practical aspects of design and creation 162–4
records 134, 147, 150–2, 154, 163–4
schemas 2, 150, 163
sets 2, 150, 152–63
metathesis 247, 293–4
consonant 268, 283, 392, 397–8
methodologies 2, 193–213, 285–7, 292, 300, 487, 571–2
metrical structures 38, 69–70, 206
metrically strong syllables 70, 482
microphones 15, 42–5, 178–9, 500, 578
minimal duration 211, 317
minimal pairs 215, 288, 488, 540
MLUs 393–4
modelling 13, 170, 176, 194, 206–7, 210
aerodynamic 1, 13
prosodic 212, 483
statistical 46, 193
models 174, 206–7, 211, 231–3, 272, 481–2, 566
data, see data models
language, see language model
statistical 207, 213
theoretical 245, 266, 272, 495
Momel-INTSINT 68, 70, 84–6
monologues 197, 199–200, 288–9, 550, 570
monomorphemic words 216, 218, 222–4, 227–38
monosyllabic prepositions 259
monosyllables 493–5
morpheme boundaries 236, 394
morphemes 183, 215, 239, 276, 327–9, 385, 393–5
morphological boundaries 216, 218, 221, 223, 237
morphological constraints 3, 218–19
morphologically complex environments 224–7
morphology 222, 230, 232, 241, 498, 506, 547–8
morphosyntactic annotation 175, 185, 188, 558–60
motivation 111, 361, 403, 475, 478, 510–11
MPEG 42, 44, 142–4, 147
MRI (Magnetic Resonance Imaging) 197
Multext project 91, 185
multi-purpose corpora 6, 92, 498, 506
multimedia annotation 3, 305, 319; see also ELAN
multiple file operations 320, 456
multiple file search, ELAN 315–16
multivariate analysis 221, 230, 232, 236, 519
MySQL database 440–2, 504
narrow phonetic transcription 51, 56–7, 87, 92, 288, 548, 578
nasal vowels 243, 262, 557
nasals 514, 582–3
native speakers 6, 35–6, 83–4, 288, 292, 295–6, 509–11
natural speech 22, 32, 211, 480, 502, 506, 508
NECTE (Newcastle Electronic Corpus of Tyneside English) 2, 7, 113–20, 157, 517–21, 523, 525; see also DECTE
corpus construction 519–33
files 520, 525
Netherlands 3, 7, 38, 279, 306, 546–8, 550
New Zealand English 562, 565
Newcastle Electronic Corpus of Tyneside English, see NECTE
Newcastle University 517, 533
noise 24, 28, 31, 43, 90, 107, 327
level 317–18
non-coronal consonants 222–3, 227–8, 230–5, 238
non-liaison 255, 259
non-native fluency 291, 293
non-native intonation 297, 514
non-native speakers 292, 294–7, 510–12
non-native speech 23, 59, 292, 294, 296
non-pulmonic consonants 61–2
non-silence segments 317–18
non-temporal elements 426
nonlinguistic factors 300, 510
nonlinguistic information 15, 68–9, 300
nonverbal stimuli 34–6, 40
Northern Germany 417
Norwegian 3, 501–2, 507
Norwegian Speech Corpora 490, 499, 501, 503, 505, 507
NoTa-Oslo corpus 6, 214–15, 217, 220–1, 498–502, 504, 506–8
documented areas of use and future possibilities 507–8
limitations 506–7
nouns 182–4, 227, 229, 235, 238–9, 558–9, 585–6
singular 244–6, 254
nuclear tones 74, 296–7
nuclei 296–7, 391, 540
syllable 75, 77, 368
object lists 343–4, 348, 350, 375
object types 346, 350
Object window 342–5, 347, 349, 370, 375
objectives 48, 50, 57, 67, 69, 80, 202
observer’s paradox 2, 27, 32–4, 479
one-level binomial analysis 232
one-to-many relations 326–7
open-source software 279–80, 284, 385, 401, 437
open sourcing 275, 278
open standards 276, 278, 285, 403
operating systems 279, 309, 342, 385, 404, 419
opposition 64, 75, 199, 202, 488, 493
Optimality Theory (OT) 9, 207, 244, 496, 576
optional liaisons 243, 248–9, 259
oral corpora 50, 53, 59–60, 194, 253, 288, 438
Orléans corpus 251–2
orthographic representation 54–5, 61, 98–9, 541
orthographic tier 24, 78–9, 81, 364
orthographic transcription(s) 52–4, 56–8, 60–1, 98–100, 288–9, 362–5, 557–8
Praat 362–4
orthography 53, 63, 65, 98–9, 578, 580, 584–5; see also spelling
standard 54, 243, 490
Oslo 3, 215–18, 226–7, 229, 235–6, 238, 498–500
Norwegian 214–39
University of 6, 216–17, 487, 498, 502
output 97, 99, 364, 368, 439–40, 519, 522
signals 195, 197
overlapping annotations 318–19
overlapping speech 182, 289, 555, 557
overrepresentation 22, 238
overuse 288, 296–7
PAIDUS corpus 416–17
pairs 31, 62, 124, 424, 563
minimal 215, 288, 488, 540
paragoge 293–5
parameters 198–9, 259, 261, 317, 415, 449–50, 460–1
acoustic 211, 567
search 394, 398–9
parent tier 305, 311–12, 319, 434
parents 312–13, 329, 478, 499, 501, 555, 577–8
part-of-speech tagging 414–15, 519, 521–4, 531, 555, 559–60, 584–5
partial words 90, 108
participles 541–2
particles 582–3
Partitur-Editor 408–12
passwords 441–2
pathological speech 65, 82, 87
patterning, phonological 274, 282, 390
patterns 202, 220, 270, 282–3, 316–17, 397–9, 542–4
detection 282–3, 392
distributional 22, 381, 542
intonational 199, 207, 564
phonological 202, 266, 269–73, 393, 397, 401
sound 193, 201–3, 241, 268, 277, 577, 584
stress 281, 395
pauses 85, 209, 211–12, 255, 288–9, 292, 366–7
duration 212–13, 570
filled 78, 90, 292
in French 211–13
length 557, 567
silent 212, 292, 570
unfilled 292, 512
perceived pitch 76–7
perception 20, 48, 50, 68–9, 75, 143, 291–2
perfect agreement 515–16
performance, errors 249–50
persistent identifiers (PIDs) 135–7, 142, 146–7, 173
PFC (Phonologie du français contemporain) 6, 14, 253, 486–97, 505, 508, 553
corpus 3, 52, 198, 211, 253, 257, 469
results 256–62
database 256, 258–60, 494, 497
goals and methodology 487–91
methodology 253–62
results 496–7
from schwa to prosody 491–6
Phon 4, 278–84, 380–1, 383–96, 398–401, 404, 409; see also PhonBank
automatized data annotation systems 390–2
CLAN compatibility 400–1
corpus elaboration within 385–93
corpus organization and transcription 387–90
data compilation 385, 392–3
data structure 284, 387
design 279, 281
interoperability 384–5
media linkage and segmentation 387–8
phonetic transcription and validation 388–9
project management 387
queries and reporting 393–400
transcription 384
word grouping 390
PhonBank 3–4, 47, 266, 269, 277–85, 380–401, 587; see also Phon
corpus 4, 380
data sharing 382–5
database 275, 382–3
phone level 97, 364–5
phone recognition 97–9, 102, 107
phone sequences 97, 396
phonemes 17, 55–7, 61, 66–7, 290–1, 329–30, 539–40
elongated 512–13
linking 219, 222–7, 234, 237
substitutions 95, 294
phonemic representations 54–5, 58, 61, 63–4, 210, 538
phonemic transcription(s) 24, 48, 54, 61, 64, 67, 203
broad 51, 55
phonetic alphabet, see IPA
phonetic annotation 50, 343, 364, 541, 545
LANCHART corpus 538–41
Praat 364–7
phonetic realizations 62, 65, 67, 70, 81–2, 199, 297
phonetic representation 61, 63–4, 522
phonetic segments 64, 113–14, 116, 119–21, 127–8, 521, 538
phonetic symbols 54, 56, 58, 61, 265, 268, 281
phonetic tier 82, 84, 482
phonetic transcription(s) 108–9, 268–9, 271, 277–9, 384, 518–20, 579–81
acoustic 57
automatic 2, 107, 365–6, 507
broad 55, 92, 102, 255, 364–5, 503, 513
canonical 364–5, 368
narrow 51, 56–7, 87, 92, 288, 548, 578
tools 108–9
phonetic usage 113–14, 117, 123, 127
phonetic variables 7, 113–14, 538–9, 544
phonetic variation 51, 119, 124, 128, 213, 366, 538–9
Phonex 393, 396, 399
phonological analysis 4, 7–8, 31–9, 250, 279, 506–7, 583
phonological annotations 16, 24, 89–90, 106, 279–80, 290
phonological categories 82, 84, 202, 208–9, 213, 297
phonological corpora 1–3, 13–14, 16–20, 22–4, 173–7, 285–7, 297–301
definition 14–18
formats for 166–90
phonological development 3–4, 380–5, 395, 401
child 265–85, 382
phonological features 22, 268, 278, 296, 390, 392, 419
phonological learner corpora 287–90, 300
phonological patterning 274, 282, 390
phonological patterns 202, 266, 269–73, 393, 397, 401
phonological representations 55, 67, 70, 89, 204, 211, 225
phonological segments 267, 356
phonological structures 16, 30, 268, 300, 481
phonological systems 80, 86, 269–70, 272, 278, 280, 282
adult 268, 277, 284, 396
child 277
phonological tier 82, 84, 482
phonological transcription(s) 90–5, 97–9, 102, 104–7, 289
automatic 89–109
phonological words 226, 233, 237
Phonologie du français contemporain, see PFC
phonology, see Introductory Note and detailed entries
phonotactic constraints 80, 97, 238
phonotactics 98, 201
phrasal edges 69, 77, 80, 566
phrase accents 78, 482, 566, 573
phrase boundaries 78, 85, 295, 567
phrases 30–3, 35, 182–3, 326–30, 336–7, 389–90, 515–16
intermediate 482, 566–7, 570
interrupted 296, 512
intonation(al) 78, 80, 295–6, 476, 479, 514, 566–7
prosodic 47, 70, 207
phrasing 3, 68–70, 80–1, 87, 194, 290, 482
intonational, see intonation phrases
prosodic 78, 80, 86, 204, 206, 211
picture descriptions 199, 289, 417
picture window 342–6, 375
pitch 69–70, 72–3, 75, 82–3, 85–6, 343–5, 349–50
accents 78–82, 84, 482–3, 512, 514, 571, 573
contours 343, 351, 368, 411, 483
curves 344–6, 349
movements 69, 72, 74, 81–2, 84, 296–7, 476
objects 344–7, 369
perceived 76–7
range 17, 484, 570
span 573
plosives 45, 56–7, 62, 368, 413, 514
plugins 400, 442, 445, 457, 459
plurals 108, 184, 229, 244, 258, 263
post-alveolar segments 31
Praat 333–4, 342–63, 370–1, 377–9, 403–4, 411–12, 490
analysis of sounds and corresponding TextGrids 368–70
annotation with TextGrid window 350–8
built-in functions 372, 374–5, 379
compound data structures 372, 377–8
control structures 372, 376–7
corpus window 346, 358–60
elements of scripting language 370–9
example applications 362–70
fixed and selection-dependent commands 375
history mechanism and macro-style scripts 371–3
interaction with user 373
motivation for use 361–2
orthographic transcription 362–4
phonetic annotation 364–7
picture window 342–6, 375
procedures 378
prosodic annotation 367–8
reading and saving files 375–6
scripting 4, 360–79, 558
sound window 347–50
variables 373–4
windows 342–7
pragmatic functions 69, 583
pragmatics 5, 22, 118, 201–2, 402, 417
predefined constraints 314–15
prefixes 147, 224–5, 245, 263, 317, 559
prepositions 183, 206, 259, 559
prescriptive tradition 243, 245, 256–7, 486
preservation 18, 140, 142, 165, 196, 517
long-term 135, 138–9, 147–8
preverbal messages 291–2
primary data 2, 16, 21, 25, 27–8, 137, 305–6
primary stress 71–2, 538–9
primary tracks 425, 429, 434
primitives 51, 70, 80
probabilities 97–8, 111, 128, 206, 215, 221, 432
proficiency levels 291–2
programs, see software
pronouns 296, 357, 493, 503, 559
pronunciation 225, 327, 475–8, 488, 520, 540–2, 557
actual 55, 503, 581
dictionaries 89, 211, 339
lexicalized 226, 237, 239
standard 327, 418, 540–1
variants 64, 90, 97, 100, 107, 366
proper names 219, 222, 238–9, 415, 559
proper nouns 227–8, 230, 232–6
proprietary formats 186, 189, 279
prosodic annotation 4, 362, 515, 558, 560, 575
Praat 367–8
prosodic boundaries 77, 84, 367–8, 571
prosodic context 30, 368
prosodic features 8, 24, 538, 574–5
prosodic hierarchy 327–9, 566
prosodic modelling 212, 483
prosodic phrases 47, 70, 207
prosodic phrasing 78, 80, 86, 204, 211
above word level 206–7
prosodic structure 207, 565–8
prosodic tiers 82, 254, 368
prosodic transcription(s) 15, 87, 197, 290
prosody 3, 8, 197, 495–7, 562–3, 565–8, 575
acquisition 6, 509
Prosograms 48, 51, 70, 75–7, 87, 368, 558–9
punctuation 59, 490, 556
quasi-intonation phrases 512
queries 281–2, 314–15, 320–2, 325, 339, 392–7, 413
hierarchical 336–7
syntactical 337–8
query commands 375, 377, 379
Query GUI 336–8
query results 321, 336–8, 413–15
query strings 336–7
query tools 322, 336–7, 402
questionnaires 199, 239, 299, 548–9
questions
declarative 569
tag 296, 569
Quicktime 309, 504
radio buttons 373, 440, 453, 455
range identification 187
raw data 15–20, 22–3, 39, 200, 274, 321, 324
reading passage style 292–4, 296–7
reading tasks 28, 30, 34–5, 52, 199, 253, 363
‘ready-made’ corpora 39
real time 7, 36, 534–5, 545
changes 534–5, 537–8
realization 30–1, 63, 84, 204, 206, 481–2, 539–41
rate of 261
realized liaisons 248, 258–9
recognition, forced 99–102, 104, 107
recognizers 100, 317, 320
recorders, field 43–4
recording sessions 43, 52, 270, 277, 281, 399, 577–9
reference elements 425–6
Reference French 243, 245
reference semantics for linguistic annotation 173–4
reference tracks 424–6, 429–30
reference transcriptions 94, 100–5
references 128, 136, 138, 173, 310, 470, 525–7
regional varieties 61, 87, 492, 496
regionalization 542–3
register 244–5, 249, 251, 256, 495, 497
formal 252–3
registered users 146, 453, 456, 460–1, 463
Rehbein-SKOBI corpus 416, 418
relations 43–4, 135–6, 173–4, 181, 326, 329–36, 411–13
many-to-many 327, 331
one-to-many 326–7
reliability 25, 93, 384–6, 390, 423, 428, 515–16
repetitions 35, 52, 59, 195, 292, 325, 586
representations
abstract 87, 262, 406
citation-phonemic 54–6, 61
lexical 69, 205, 243
phonemic 54–5, 58, 61, 63–4, 210, 538
phonetic 61, 63–4, 522
phonological 55, 67, 70, 89, 204, 211, 225
symbolic 48, 55, 57, 67, 70, 77, 482
representative samples 14, 21, 28, 39, 200, 561
representativeness 19–22, 24, 47, 499, 554
reproducibility 515–16
research assistants 500, 503, 576, 578
research infrastructures 145, 160, 187, 537
research questions 27–9, 34–6, 111–13, 118–19, 126–8, 200, 364–5
resources 132–3, 142–8, 150–2, 160, 166, 401, 483
retellings 289, 294, 296–7, 510–12
return values 374–5, 377–9
rising intonation 556, 572
rising–falling contours 84, 208–9
row vectors 121, 123–4, 127
run-time 370, 373
SALT 383
SAMPA 55, 61, 65–7, 92, 513, 516, 538
samples/sampling 112, 123, 476–8, 480, 520, 535–6, 584–5
representative 14, 21, 28, 39, 200, 561
simple random 20
speech 47, 54, 57, 78, 80, 271, 387–8
stratified random 20
sampling frames 20, 499
sandhi phenomena 54, 90, 100, 241
schwa 105, 113, 254, 256, 487, 489, 491–6
deletion 54, 60, 493
Scottish English 477
screenshots 395–6, 412, 560
scripting 4, 335, 342
languages 361, 370–1
Praat 4, 360–79, 558
search 392–3, 395–6, 504–5, 507–8, 538–9, 549–50, 557–8
interfaces 392, 397–8
multiple file search 315–16
parameters 394, 398–9
results 147, 316, 394, 398, 413–14, 505
strings 220, 315, 470
tools 19, 25, 147
web 6, 129–30, 498
second-generation immigrants 478, 500
second language acquisition (SLA) 3, 5–6, 9, 65, 89, 286–301, 381
corpus-based research on 290–7
phonological corpora in second language teaching and learning 297–9
second language corpora 286–7
second language learners 3, 6, 286–8, 290, 299–300, 509
secondary stress 71–2, 74, 539
segmental contexts 395, 491, 540
segmental information 50, 57–8, 67, 70
segmental level 17, 50–1, 495, 538
segmental strings 281, 481
segmental transcription 50, 52, 55, 57, 67, 75, 87
segmentation 47–51, 63, 68–9, 74–5, 157, 317–19, 407
automatic 318, 325, 363, 368, 584–5
discourse 564–5, 567, 571
processes 48, 310, 407
semi-automatic 317–18, 496
segments 63–4, 120–1, 127, 317, 339–40, 363–4, 515–16
floating 251–2, 263
non-silence 317–18
phonological 267, 356
sound 49–50, 61, 64–7
speech 49, 268, 271, 388, 398
strings of 63–4
selection-dependent commands 372, 375–6, 379
semantics 6, 111, 118–19, 152, 166–7, 173, 336–7
semi-automatic segmentation 496
semi-directed conversations 253, 255, 487
semitones 76–7, 297
serialization 175–6, 182
sharing 2, 4, 18, 273–6, 283–4, 381–2, 561
systems 438, 441, 444, 466
short vowels 218, 222–4, 226–8, 230–5
signal definition 331, 334
signal files 323–5, 331
silence 53, 211, 317–19, 363, 556, 578
silent pauses 212, 292, 570
simple random sampling 20
simple signal file format (SSFF) 324, 332
singleton tracks 425–6, 434
singular nouns 244–6, 254
Sinitic languages 578, 580
situational contexts 18, 21, 555
social class 21, 252, 535–6
sociolinguistic interviews 538, 544, 553
sociolinguistic surveys 7, 517
sociolinguistics 5–6, 9, 117–18, 240, 249, 476, 483
software 108–9, 130, 341–3, 384–5, 403, 437–8, 583–5
components 140, 142, 320
exploitation 140, 162–3
open-source 279–80, 284, 385, 401, 437
sonorants, syllabic 228–9
sonority 229
sound
changes 57, 236, 238, 538, 545, 582, 584
objects 344, 347, 350, 358, 372
patterns 193, 201–3, 241, 268, 277, 577, 584
sound player plugin 444
sound segments 49–50, 61, 64–7
soundbites 8, 546, 550–1
source code 321, 370, 379, 385
southern French 493–4
space 38, 43, 122–4, 427, 430, 519, 523
vowel 322, 325, 339–41
span 187–8, 392, 425–6, 436
pitch 573
relationships 426, 429
spatial coding 5, 420, 427
specifications 167, 169, 173, 178–9, 187, 375, 378
spectra 29, 57, 175, 210, 244, 343–4, 347
spectrograms 48, 55–7, 331, 333, 349, 351–3, 355–6
speech
acoustics 272, 285
banks 8, 552–5
communities 33, 534, 536, 543
experiments 27–8
rates 211, 253, 272, 292–3, 327, 368, 559
samples 47, 54, 57, 78, 80, 271, 387–8
segments 49, 268, 271, 388, 398
styles 90, 98, 198–9, 206, 210–11, 476
transcription(s) 177, 179–81, 189, 425
spelling 23, 246, 248, 255, 520, 555–6, 585; see also orthography
standard 60, 254, 555
split-fall–rises 572–3
spoken corpora, see speech corpora
Spoken Dutch Corpus 90, 102, 105, 364
spoken language 4, 6, 177, 179, 181, 288–9, 509
corpora, see speech corpora
spontaneous conversations 57, 550
spreadsheets 220, 393–4, 400, 560
SQL 438, 467
queries 456–9, 466
SSFF (simple signal file format) 324, 332
stable vowels 493, 495
standard orthography 54, 243, 490
standard pronunciation 327, 418, 540–1
standard spelling 60, 254, 555
standardization 18, 23, 26, 167–9, 175–6, 187, 543
bodies 167–8, 189
processes 167, 173
standardized formats 2, 18, 23, 166–7, 187
standards 1, 26, 52, 93, 166–70, 189, 276
encoding 135, 142
international 1, 7, 176, 519
statistical analysis 4–5, 111, 212, 230, 321, 342, 394
statistical corpus exploitation 110–32
cluster analysis 111–28
literature review 128–32
statistical methods 1–2, 111–12, 128, 131, 213
statistical modelling 46, 193
statistical models 207, 213
statistical tools 217, 239, 338
statistics 1, 111, 128–9, 316, 400, 433, 505
stems 219, 225–6, 237, 310
stimuli 27, 29, 35–8, 198–9
nonverbal 34–6, 40
synthesized 210
storage 18, 23, 26, 140, 147, 322, 329
file 381, 469
stored objects 135–6
stratified random sampling 20
stress 72, 74, 278, 281, 327, 329, 539
patterns 281, 395
secondary 71–2, 74, 539
stressed syllables 216, 218, 397
string functions 372, 374, 379
string variables 373, 376
strings 99–100, 220, 281, 372–8, 390–2, 413, 423–4
search 220, 315, 470
segmental 281, 481
structural constraints 217, 239, 315
structural factors 214, 222, 225, 236, 238
structural features 22, 215
structuralist descriptions 70, 277
structures
feature 179, 183–7
linguistic 20, 22, 74, 147, 176, 479, 551
metrical 38, 69–70, 206
phonological 16, 30, 268, 300, 481
prosodic 207, 565–8
subclusters 115–16, 124, 126, 128
subcorpora 7, 29, 212, 413, 447, 509–11, 560
subdivision 311–12, 314, 390, 421, 434, 436
relationships 426, 429
symbolic 311–13
tracks 426, 429
subjectivity 91, 93, 561
substitutions 86, 95, 99–100, 103, 294
suffixes 225, 237, 317, 374, 541–2
derivational 219, 226, 237
inflectional 252
suprasegmental features 77, 417
suprasegmental information 2, 47, 51, 66–9, 71, 82, 86
surveys 249, 253, 256, 487–9, 491, 495–7, 503
sociolinguistic 7, 517
sustainability 2, 14, 18, 158, 164, 524, 533
EXMARaLDA 403–4
Switchboard corpus 90, 92, 101, 211, 565, 568
Switzerland 455, 492
syllabic sonorants 228–9
syllabic status 218–19, 221, 223–4, 228, 230, 235, 238
syllabification 51, 281, 294, 335, 391–2
syllable boundaries 255, 326, 356, 391, 416, 513, 540
syllable nuclei 75, 77, 368
syllable segmentation 326, 558
syllables 84–5, 326–8, 340, 479–83, 515–16, 558–9, 582–4
accented 74, 80–1, 85–6, 481
metrically strong 70, 482
stressed 216, 218, 397
unstressed 281–2, 299, 397, 483, 539
word-initial 493, 495–6
symbolic association 311–13
symbolic representations 48, 55, 57, 67, 70, 77, 482
symbolic subdivision 311–13
symbolic transcription 67, 75–7, 420
symbols 61–7, 70–2, 79–80, 84–6, 92, 119, 184
phonetic 54, 56, 58, 61, 265, 268, 281
transcription 63, 80, 289, 512, 556
syntactical queries 337–8
syntax 6, 9, 118, 371, 375, 497–8, 506–7
synthesized stimuli 210
systematic studies 21, 193, 266, 269, 475, 488
systematic variation 113, 119, 124, 126
tab-delimited text 314, 316
tag questions 296, 569
taggers 47, 61, 502–3, 524
tagging 47, 58, 152, 501–3, 532, 559, 584
part-of-speech 414–15, 519, 521–4, 531, 555, 559–60, 584–5
tags 186, 188, 431, 524–7, 532, 559, 568
tagsets 172, 179, 185–7, 524
TAICORP (Taiwanese Child Language Corpus) 8–9, 576–87
data collection 578–9
phonological information 582–4
POS and discourse annotations 584–7
text files 579–82
text format 580–2
transcription 579–80
Taiwan 8, 576–7, 579, 581, 583, 585, 587
Taiwan Southern Min 8, 576–87
Taiwanese Child Language Corpus, see TAICORP
target forms 269, 278, 283, 391, 397–8
target languages 8, 20, 35–6, 271, 291, 509–11
TAUS corpus 214, 217, 220–2, 236, 498, 502–3, 506–7
documented areas of use and future possibilities 507–8
limitations 506–7
revitalized 502–3, 507
TEI (Text Encoding Initiative) 154, 159, 168, 176–7, 187–90, 525, 527
annotating corpora with core mechanisms 183–8
documents 177–9, 181, 187
Guidelines 19, 175–7, 182, 187, 525
TEI-Based Transcription 180–3
telephone dialogues 102–3, 211
templates 36, 119, 306, 313, 331, 459, 461
database 322–5, 329, 331
temporal relationships 422, 428, 435
text boxes 308, 462
Text Encoding Initiative, see TEI
text fields 353, 355, 373, 440, 451, 453, 466
text files 17, 363, 366, 371, 376, 577–80, 584
textbooks 54, 110, 120, 130–1, 216
TextGrid 343–5, 350–2, 355, 357, 360, 371–5, 559–60
files 355, 365–6, 375, 545, 557
objects 344, 355, 360, 372, 375, 377
texts, electronic 15, 110–11, 156, 179
textual label 305, 310
theoretical models 245, 266, 272, 495
tier-based operations 317–19
tiers 305–8, 310–20, 350–2, 354–8, 365, 418–19, 434
interval 334, 351–2, 358, 373–5
manually annotated 512, 515–16
phonetic 82, 84, 482
phonological 82, 84, 482
prosodic 82, 254, 368
segments 512–13, 515
transcription 66, 71, 78, 81, 254, 492
time
information 16, 326, 331, 334, 339, 424
intervals 311, 316, 332, 422, 425–7
time-aligned annotations 17, 147, 288, 310, 332
time-aligned phonological annotations 14, 16–17, 23–4
time-aligned tracks 422, 436
time subdivision 311–13, 434
timestamps 409, 426–7, 434
TLS sub-corpus 518–19
ToBI (Tones and Break Indices transcription) 68, 70, 74, 78, 80–1, 566–7, 571
token file 221–2
token frequencies 577, 583–4
tonal configurations 80, 87
tonal inventories 207–9
tone inventory 202, 296–7, 300
tone tier 512–15, 580
tones 70–1, 76, 78–9, 295–7, 327–30, 566, 580–2
boundary 70, 78, 482, 512, 514, 566, 574–5
level 296–7
lexical 69, 72, 582, 584
nuclear 74, 296–7
Tones and Break Indices transcription, see ToBI
tools 1–5, 23–4, 108–9, 332–3, 401, 419–21, 434–7
annotation 305, 421, 427, 436, 557
development of 46–7, 212–13
EXMARaLDA 290, 404, 408–15, 436
statistical 217, 239, 338
TQE (Transcription Quality Evaluation) 109
tracks 42, 44, 331, 333, 374, 377, 422–35
horizontal 307, 422
primary 425, 429, 434
reference 424–6, 429–30
singleton 425–6, 434
span 425–6, 430
subdivision 426, 429
time-aligned 422, 436
transcribed corpora 271, 278, 284
transcribers 56–61, 72, 74–6, 90–1, 100–1, 107, 271–2
human 92–3, 100, 102, 271, 363
Transcription Quality Evaluation (TQE) 109
transcription symbols 63, 80, 289, 512, 556
transcription tiers 66, 71, 78, 81, 254, 492
transcription(s) 51–3, 55–61, 63–9, 78–82, 90–7, 103–8, 554–8
automatic 68, 91–4, 107
blind 388–9
canonical 98–9, 102, 104–7, 364–5, 368
data-driven 103–4
errors 59, 200
EXMARaLDA 405, 411
files 447, 456, 463, 522, 549
human 93–4, 96, 273
IPA 73, 193, 278, 388, 393
knowledge-based 103–4
manual 91, 104–5, 107, 378, 584
Phon 384
processes 384, 409, 536, 556, 560
reference 94, 100–5
segmental 50, 52, 55, 57, 67, 75, 87
speech 177, 179–81, 189, 425
symbolic 67, 75–7, 420
systems 64, 67, 69, 78, 80, 87, 556
TAICORP 579–80
VALIBEL 555–7
transition diagrams 432–3, 436
translation 180, 308, 327–9, 406, 505, 548–9, 556
transliteration 327, 424
truncation 82, 242, 483
tunes 446, 566, 571–3, 575
intonational 571–2, 574–5
Tyneside 7, 476, 517, 520, 533
underselection 415
unfilled pauses 292, 512
universal innate patterns 9, 576
University of Oslo 6, 216–17, 487, 498, 502
unregistered users 461, 463
unstressed syllables 281–2, 299, 397, 483, 539
uploading 146, 160, 441–2, 444, 446–7, 456, 462–3
usage 13, 16, 22–3, 114, 150, 152, 245–6
phonetic 113–14, 117, 123, 127
user fields 451, 454, 463
user interfaces 314–15, 355, 409, 423–4, 427–8, 439; see also GUIs
user-specified directories 366, 372, 375–7
users 136–9, 144–5, 341–5, 363–6, 386–92, 399, 413–15
registered 146, 453, 456, 460–1, 463
unregistered 461, 463
VALIBEL
annotation 557–61
from corpora to database 552–5
database 8, 490, 552–61
future developments 561
transcription 555–7
validity 93–4, 112, 134, 138, 202–3, 415, 423
ecological 479–80, 483
variability 20–1, 205–6, 209–11, 245, 248, 264, 271
of phones 210–11
variable liaison 246, 249, 254, 261
variable rule analysis 220–1, 224, 232, 234
variables 113, 117–19, 372–5, 377–9, 504, 506–7, 538–45
dependent 221
grammatical 7, 539, 542, 544
independent 221–2
indexed 377, 379
linguistic 504, 537
phonetic 7, 113–14, 538–9, 544
phonetic segment 113, 120
Praat 373–4
string 373, 376
variants 87, 90, 100, 102, 104, 107, 542–5
apical 215, 220–1
pronunciation 64, 90, 97, 100, 107, 366
variation 229, 231, 240–1, 245–7, 257–9, 261, 540
contextual 2, 27, 29
intonational 5, 80, 475, 477, 564–5
linguistic 221, 236, 553
melodic 51
phonetic 51, 119, 124, 128, 213, 366, 538–9
systematic 113, 119, 124, 126
verifications 196, 272, 283–5, 389, 392
video 41–2, 44, 310, 420–1, 423–4, 430, 432–3
files 6, 42, 44, 306–7, 310, 498, 504
frames 358, 427, 436
video annotation 420–36
advanced annotation concepts 426–7
fundamental annotation concepts 421–6
video camcorders 43–5
video recordings 15, 138, 289–90, 408–9, 414, 420, 430
viewer components 307–8
virtual collections 133, 136, 142, 145
visual inspection 56, 115–16, 481
visualization 307, 310, 388, 428, 432, 434, 436
vocabulary 198, 288, 308, 313–14, 408, 501, 553
vocal tract 197, 202, 272
vocalic intervals 513–14
Voice Onset Time (VOT) 272
VOT (Voice Onset Time) 272
vowel formants 38, 299, 537
vowel harmony 283, 392, 488
vowel space 322, 325, 339–41
vowels 50–1, 61–2, 210–11, 225–6, 395–6, 493, 513–14
intervening 283, 397–8
nasal 243, 262, 557
stable 493, 495
WAV files 44, 48, 306–7, 324, 372, 374–7, 444
waveforms 56, 332–3, 343–5, 349, 351, 353, 360
viewer 306–7, 317
WaveSurfer 332–3, 383
web 5, 13, 15, 128, 437, 441, 447
applications 160, 164, 439
search 6, 129–30, 498
servers 438–40, 453, 467–8, 472
web-based archiving and sharing 437–72
architecture 438–41
filling in corpus data 461–6
Flexicontent, see Flexicontent
future development 471–2
importing existing data 466–7
and server-independent architecture 467–71
system installation 441–61
weights 119, 123, 201, 231, 233, 235
factor 231–2, 234
Welsh English 476–7
WERs (word error rates) 105–6
West Oslo 499–500
Western Jutland 542
word boundaries 79, 218, 242, 326, 356, 394, 497
word classes 219, 221–2, 227–30, 234–6, 296
word error rates, see WERs
word frequency 118, 205, 585–6
word groups 72, 74, 390–1, 396
word-initial syllables 493, 495–6
word tokens 226, 534, 578
wordlists 193, 198, 253–4, 316, 339–40, 409, 487–9
elicitation 35
words 30–5, 96–100, 225–7, 326–30, 336–40, 393–7, 539–41
complex 215, 219, 227, 237, 239
function 294, 336–7, 582
monomorphemic 216, 218, 222–4, 227–8, 230, 232–5, 237–8
orthographic 63, 78
partial 90, 108
phonological 226, 233, 237
workflows 44, 134, 321–2, 386, 388, 390, 405
World Wide Web, see web
X-bar syntax 246–7
X-SAMPA 61, 65–7, 92
XLR connections 43–4
XML 23, 181–2, 185, 306, 403–4, 438, 525
declarations 525–6
documents 181–2, 525
files 137, 151, 182, 525, 527, 532–3