PRINTED FROM OXFORD HANDBOOKS ONLINE (www.oxfordhandbooks.com). © Oxford University Press, 2018. All Rights Reserved. Under the terms of the licence agreement, an individual user may print out a PDF of a single chapter of a title in Oxford Handbooks Online for personal use (for details see Privacy Policy and Legal Notice).

date: 20 February 2020

(p. 915) Index

Notes: f denotes illustration, b box and t table

The adjective ‘human’ has been omitted from the start of subject headings. However, it is used within topics such as ‘human/non-human primates’.

The structure of this book is described on pp. 10–12

A
accent languages 20
accents 902
attitudes to 668–9
cue to person perception 667–82
development 673–4
effects on social evaluation 669–70
evolution 672–3
foreign 670, 673
foreign accent syndrome 675
neural signature for group membership 676
neuroscience of 675
own-accent bias 670
regional 548, 673
shibboleth 668
sociophonetic studies 780–1
television productions 674
accuracy of personality traits from voices 593–4
‘acousmatic’ voices 177
acoustic cues 369–70
acoustic information
asymmetric sampling in time (AST) hypothesis 146–8
psychoacoustic model of voice 179t
acoustic modulations, slow 148–9
acoustic parameters 64–5
acoustic patterning of emotion vocalizations 61–92
acoustic parameters of voice production 64–5
Geneva Minimalistic Acoustic Parameter Set (GeMAPS) 64–5b
acoustic similarities across species 365–7
Acoustic Voice Quality Index (AVQI) 172–3
acoustics, physical and perceptual information 586–7
aeroacoustics of phonation 175, 176
affect
centrality 288
expressed through voice, stronger observable neural response in amygdala 483
low-level descriptors (LLDs) 720
non-verbal expression 447–8, 468
affect bursts 61–2
affective information, neural processing 431–55
affective prosody, defined 431
affective semantics 299–300
affective states, animal calls 397–401
affective voices
neural processing 446–9
prioritized processing 449–51
age
acoustic cues and correlates 38, 566–70
age-related hearing impairment 157–8
changes with ageing 595, 781–3
speech processing, role of prosody 156–8 see also voice gender and age
aggression, vs fear 365
agnosia
phonagnosia 39, 48, 53, 539, 546, 553, 855–92
prosopagnosia 22, 38
alarm/distress signals 293–7, 310, 364, 366, 373–4
across species 373
mobbing 374, 395
(p. 916)
alcohol dependency 659
Alzheimer’s disease 894
phonagnosia 875
amphibians, vocal recognition skills 16–18
amygdala 473–94, 482f
emotion vocalizations 473–94
affect expressed 483
faces and voices, neuroimaging 651–3
fear conditioning 476–8
functional neuroimaging (fMRI) 481–3
laterobasal (LB) neurons 415–17, 439
PET study 481
schizophrenia and 809
selectivity for FM rate 417
single-neuron recordings 487
subnuclei 487–8
superficial complex (SF) and latero-basal complex (LB) 440f
analysis of handbook structure 10–12
analysis–synthesis methods 761–4
anger
cortico–subcortical network 25
emotional processing 483
model, standardized path coefficients 81f
animal calls/signals 293–300, 364–5
‘acoustic startle reflex’ 294
affective states 397–401
alarm/distress signals 293–7, 310, 364, 366, 373
bats, call convergence 404
call usage 405–6
duration, increased fundamental frequency, alarm and aggressive 399
frequency sweeps 294
group-dependent vocalizations 672
induce affective states 399–400
learnability and expressivity 297
modality 295
monkeys, species-specific calls 415f
motivatedness 293
production constraints 295
repetition rate 370
swallows, infant calls 403
systematicity 296
transparency 295
vocalization processing (list) 369
voice perception, similarities in bioacoustics 365–7
anterior forebrain pathway (AFP) 317–18f, 319–21
dopamine as a regulator 324
anterior superior temporal (AST) cortex 420–1
audio-motor integration 422–4f
homology between human and non-human primates 423
non-verbal voice information precursor signal 423
selective for phonemes, words, and short phrases 420–1 see also superior temporal cortex (STC)
anterolateral belt 414
anxiety 475
anxious voice 20
aphasia, conduction, receptive or expressive 210
apperceptive voice-identity processing 860–1t, 864, 869 see also phonagnosia
appetitive learning, dopamine 323
appraisal processing with emotional cues 483
approach avoidance 592–3, 809
articulation 780–2
articulation rate 396t
articulatory execution 744–5
network 747
synthesis 760
artificial neural networks (ANNs) 731–4
associative voice-identity processing 860–1t, 864, 869 see also phonagnosia
asymmetric sampling in time (AST) hypothesis 146, 147–8
attack speed 396t
attractiveness of voice 587–8, 594–5, 607–16
cognitive neuroscience 614
evolution and 608
fundamental frequency and 612–13
interdisciplinary approach 607
sexual selection 608–12
femininity 610–12
masculinity 608–10
social success 613–14
voice morphing 587–8, 685–706
attribution bias 804, 816f
audio words 724–5
(p. 917)
audio-linguistic signal processing (ALSP) model
dysprosodia 816–17
feedforward and top-down mechanisms 818–19
audiovisual integration (AVI)
high-level social information 552
in infancy
emotion 242–4
grand-averaged responses to the audiovisual pairs 244–5
identity 240–1
McGurk effect 243
speech 243–5
audiovisual speaker-identity perception 553
audition and vocal production, integrative status 22
auditory cortex 413–23
acoustic cues underlying prosodic modulations 147
activation of superior temporal sulcus (STS) areas 39–40
auditory dorsal pathway 421, 423
bilateral activation, vocal/non-vocal 40
core and belt regions 413, 420
evolution 423
lateral belt neurons for species-specific calls in macaques 415f
right and left posterior auditory-related cortex (PARC) 147f, 148
age-related 156–7
fast temporal transitions 156
auditory dorsal stream
non-human primates 413, 421, 423
sensorimotor functions 413, 421, 423
auditory evoked potentials, recorded during stimulation with brief vocal sound stimuli 41f
auditory feature network 743–4f, 745t, 746–7
auditory input processing 743–4f, 745t
auditory priming 451
auditory processing, dual-pathway system 413–28
auditory stimuli (Appendix) 234
auditory system, general perceptual borders 370
auditory-acoustic elements
exclusive combinations 26
uniqueness of vocal pattern 26
auditory-acoustic parameters 19f
auditory-feature sensitivity of voice-region neurons 343
auditory–vocal functionality, dynamic development 26
autistic spectrum disorders (ASD) 239–40, 657–8
atypicalities in language development 239–40, 547
children at risk 51, 240, 245
diminished reward salience 817
disrupted parental behaviour 265
dysprosodia 815
emotionally congruent face–voice combinations 546–7, 657–8
gelotophobia 509
pervasive developmental disorder 658
superior temporal sulcus/gyrus atypicality 240
temporal voice areas (TVAs) 44
voice-sensitive activity 51
autobiographical memory, parietal or temporal lobe damage 25
automatic access systems, forensic aspects 784–5
automatic speaker recognition (ASR) 720
systems 787
automatic speech recognition (ASR) 707, 709–13
automatic transcription 711–13
defined 709
distinctive aspects 710
feature extraction 710–11
front-end transcription 710–11
normalization 713
normalization methodologies 707
automatic transcription 711
B
babies (incl. newborns) see infants
baboons
chacma baboons, dominance behaviour 403
infant contact calls and distress screams 403
bag-of-audio-words method 725
bag-of-features processing chain 727
(p. 918)
bag-of-words method 725, 727f
suprasegmental features 720
band-passed (BP) noise
LB neurons 415, 417
selectivity of neurons in lateral auditory belt cortex 416f
basal ganglia 442–3, 444f
dysfunction
Parkinson’s disease 25, 810
personal identity and other vocally transmitted information 25
Parkinson’s disease 25, 810
schizophrenia and 810
basal ganglia nuclei
connection with frontal lobes 24
sites for vocalization in primates 24
bats
aggressive and appeasing vocal sequences, basolateral amygdala 371
call convergence 404
echo-delay evaluation 371
vocal recognition skills 16
BBC, personality profiles 669, 672
behavioural evidence, heterospecific emotional processing 371, 376–9
behavioural paradigms, hallucinations 834–5
behavioural studies
aggressive behaviour 365, 399, 403
behavioural biometric features 778, 779–81
bodies and voices 654f
brain lesions, voice identity processing 875
cerebral representation of voice identity 571–3
differences in voice perception
temporal voice areas (TVAs) 43–4
voice gender and age 42, 570
dominance behaviour 403
EEG, and brain imaging evidence of voice processing 196t
emotional processing 371, 376–7
evolution, across vertebrates 5–7
faces and voices 645–6
heterospecific vocalization processing 372–4
infant voice processing 196t
best bandwidth (BBW) 416f
binary decision trees
with leaf nodes 764
results of context clustering 772f
bioacoustics, similarities across species 365–7
biological meanings (basic) 370, 371
biological voice 370–2
biomechanics 175, 781
biometric features 778–81
behavioural biometric features 778, 779–81
physiological biometric features 778, 779
variability 788
within and between speakers 781–3
biometric speaker recognition 784–9
automatic systems 787–8
automatic/acoustic/aural methods compared 788–9
defined 777
future visions 789
naive and expert systems 785–6
biosemantics 298–9
bipolar disorder 658, 801
social cognitive disabilities 807
bipolar rating scales 668
birdsong development
anterior forebrain pathway (AFP) 317–18f, 319–21
exposure to live tutor (various spp.) 313–14
impact of social factors 314
zebra finch 314–16
instructive and selective models of learning 310–11
new thinking on neural basis 318
reward value of song, social motivation system 321–5
socially guided learning (SGL) model 311
song motor pathway (SMP) (vocal generator) 317–19
song system (network of discrete brain nuclei) 317, 322f
blood oxygen level dependent (BOLD) signal, neural activity 486, 488
bodies and voices 653–9
behavioural effects 654f
electrophysiology and neuroimaging 654–5
emotional cues from voice, face and body expression 659
Boltzmann machines 733
bouba-kiki effect 285, 287
brain
core and extended cerebral networks for voice processing 37–60
(p. 919)
networks and hubs, neurophysiological location 744f, 745t, 749f
primate brain 337–62
right temporal areas, perception of speech 146
brain activation, attentive listening 221f
brain cerebral networks 37–60, 489, 905–9
brain hub 744f, 749f
brain lesions 144, 210, 432–4, 478–80
amygdala and hippocampus 433–4, 479
phonagnosia as a selective deficit in recognizing speaker identity 539
recognition of affective meaning 433–4
right and left lateralized 432
in regard to emotional processing 479
STS/STG, predictors of aprosodic (affective) deficits 144
Wernicke-Lichtheim-Geschwind model 210
brain plasticity 21–2, 226
brain responses to human, dog, and non-vocal sounds 379f
brain streams 743–5, 747
speech perception 744
visual perception 746
brain systems, speech production and perception 21–4f
brainstem, peripheral processes of audition and vocal production 26
Broca’s area 214, 413
schizophrenia 657
syntactic processing and 220–1
Brodmann’s area 210, 225, 237, 424, 441, 508
Bruce-and-Young-type model of face perception
face processing 539–40, 553, 857f
face ‘structural encoding’ stage 46–7
person recognition 553
budgerigars, convergence 404
C
California mice, aggressive behaviour 399
call duration
increased fundamental frequency, alarm and aggressive 399
repetition rate 370
call series 370
call usage
modified, environmental noise 406 see also mobbing
Capgras syndrome 22
caregiving, valence 364
cat, vocalizations 377
cepstral analysis 761
cepstral coefficients, mel-frequency 723
cepstral peak prominence 172–3
smoothed (CPPs) 172–3
cepstral spectral index of dysphonia (CSID) 173
cerebellum
activity to infant cries 254–6t
damage, jerky, arrhythmic vocal signal (scanning or ataxic speech) 25–6
vocal gestures 25
cerebral networks 37–60, 489, 905–9
key cognitive operations in voice processing 907f
cerebral processing of voice 37–60, 203, 907f
extended 47–9
in human newborn 49–50
interplay between core and extended networks 49–50
phylogeny and ontogeny 50f
cerebral representation of voice identity 570–4
behavioural evidence 571–3
neurophysiological evidence 573–4
Chandler, Raymond, on voice quality 168b
chickadees, convergence in D-notes 404
Chinese language 769–70
chromatic scales 396
clinical disorders 797–914
acquired and developmental phonagnosia 855–92
autistic spectrum disorders (ASD) 239–40, 657–8
bipolar disorder 658, 801
dementias 893–914
hallucinations 831–54
Parkinson’s disease 895–6
schizophrenia 799–830
cochlea
development 193
tonotopic organization 370
code-excited linear prediction 761
cognitive model
attractiveness of voice 614
processing semantics and syntax of sentences 751
(p. 920)
cognitive psychology 671
combinatorial network 743–4f, 745t, 751
communication
alarm/distress signals 293–7, 310, 364, 366, 373
animal signals 293–300, 364–5
conspecific communication processing 371–2, 376–8
dual-stream models of auditory vocal communication 413–28
emotion vocalizations 819–20
functional deployability 297
heterospecific contexts 375, 377–9
in human/non-human primates 279–308
model of auditory vocal communication 413–28
music, emotional communication 395
in non-human animals 393–412
primordial communication 15–36
signals
face–voice integration 337
voice and face 337
social functions, identification 402–5
computational paralinguistics 713f, 719
computer graphics, face morphing 695
computers
algorithms 543
ANNs 732
automatic access systems 784
automatic speech recognition (ASR) 707
machine learning 733
neurocomputational models of mental lexicon 751
observed F0s 770f
speaker verification or authentication 784
speech-based personal assistants 716
TTS conversion systems 758
voice averageness 615
voice emotion 69
concatenations
cascade and parallel 762
cost, and selection cost 762
concept-to-speech conversion 759–62
conceptual network 743–4f, 745t
conditioned stimulus (CS) 475
connectivity analysis 485, 879
consensus auditory–perceptual evaluation of voice, or CAPE-V 169
consistency of personality traits from voices 591–2
conspecific vs heterospecific emotional processing 376–8
core cerebral networks for voice processing 37–60
core-voice system 858, 859, 876, 880, 882–5
cortical entrainment 149
temporal envelope 146
cortical regions
speech and language processing, models 211t
vocal emotional processing 371
cotton-top tamarin, aggressive behaviour 399, 403
court of law
voice testimony 627–8
voice-recognition evidence 628–9
covered voice 126
cowbirds, reinforcement 404
criminology see forensics
critical bands (CBs), processing of spectrally rich sounds 371
cross-species approach to traits 310, 354, 376
crossmodal bias 285
crossmodal connections
ERPs 646
web diagram 286f
cues
acoustic cues, similarities in neural processing 369–71
acoustic cues underlying prosodic modulations 147
emotional cues from voice, face and body expression 659
salience 813–14f
size assessment, formant dispersion 375
to syntax, prosodic information 150–1
culture and languages
cross-cultural development of voice-sensitive activation 240–1
(p. 921)
emotion vocalizations 82–3
sociocultural background 548
cultural context of phonation 176
D
Darwin, Charles
on infant cues and survival 251
on vocal emotional expression 61, 364, 394, 459, 473, 495, 819
decoding 727–30
frame-level 728–34
segment-level 734–6 see also machine-based decoding
deep brain stimulation (DBS), Parkinson’s disease 25
deep learning 733
deep neural network (DNN)
bandpass behaviour 737
DNN-based speech synthesis 763, 764, 773
deer, familiarity recognition 16
delusion, impairment of familiarity sense 22
dementia 893–914
hallucinations 906, 907f
paralinguistic information processing 904–5
sarcasm 905, 908f
voice identity processing 904
voice processing 895–6, 906, 907–8f
neuroanatomical bases 904
depression 658
postnatal 239, 265
developmental phonagnosia 39, 856–7, 877–87
diaphragm 118–19
diffusion tensor imaging (DTI) 814
discrimination and selection network 745, 746
discrimination tasks
defined 542
paradigm 520
disturbed voice, GRBAS scale 9, 169
ditropic sentences 20
dogs
dog barks and growls 374, 377
familiarity recognition 16
family dog 375
lateralization bias for learned meaningfulness 381
near-primary auditory cortex 370, 380–1
reward, basic meaning, fMRI study 380f
temporal voice areas (TVAs) 52
vocal brain evolution 51–3
voice-related brain area 191
dolphins, imitation 404
dominance behaviour, chacma baboons 403
dominance judgements 590
dopamine
appetitive learning 323
mesolimbic dopamine pathway 317
reward and 323
songbirds 323–4
dopaminergic projections 318, 321, 323–4, 326
dopaminergic transmission 810, 843
dopaminergically based reward-processing neural network 809
dorsal processing streams 211–12, 215, 413, 421–3f
dorsolateral prefrontal cortex (DLPFC) 421
dual processing streams, visual analogy 210f
dual-pathway system, higher auditory processing 413–28
dual-stream models
dorsal processing streams 211–12, 215, 413, 421–3f
ventral processing stream 211t, 342–3, 353, 413–14, 421–3
dyslexia 526–7, 547
dysphonia, cepstral spectral index of dysphonia (CSID) 173
dysphonia severity index (DSI) 173
dysprosodia
audio-linguistic signal processing model 816
defined 25
schizophrenia and 803, 804, 808
auditory sensory deficits 810–11
premorbid indicator 806
E
earwitness testimony 617–44
adapting rules evolved for eyewitnesses 628
identification 639
eavesdropping
food location 373
predators and prey 372–3
electroencephalography (EEG) 475, 549, 646–8
(p. 922)
electroglottography 174
electrophysiological studies
EEG and MEG 646–8
schizophrenia 811
and time course of processing vocal emotion expressions 459–72
elephants 399
familiarity recognition 16
reaction to human speech, imposed threat 372
emotion vocalizations 19, 65–85, 429–512
acoustic patterning 61–92
GEMEP and VENEC 73–4t, 83
amygdala processing 473–94
appraisal check results 67t
clinical research 84
communicative function 819–20
cues, from voice, face and body expression 659
culture and languages 82–3
empirical studies 67–75
evolutionary origin 61
expression 26, 83
first 21
non-verbal communication in speech 736f, 900
path analysis 80–1f
predictions
acoustic parameter changes 68t
physiological changes 66
signals, structural features 396t
social communication, non-human animals 393–412
transmission and perception 75–7
GEMEP and VENEC 78, 83
recognition 78, 79t
transform functions 77
tripartite emotion expression and perception (TEEP) model 63f
voice production 65–75, 474–6
voice samples 69–70
emotional contagion 400, 438
emotional processing
adaptation effects 486–7
affective prosody 484
behavioural studies on lateralization 371
conspecific vs heterospecific 376–8
cortical regions 371
heterospecific, human behavioural evidence 371, 376–7
implications of attention 483–4
individual differences 484–5
emotional prosody 63, 144–6, 371, 900–1
angry vs neutral 483
infants 238–9
audiovisual emotion integration 242–4
schizophrenia and 656
emotions
‘big six’ 719
primary, vertebrates 364, 365
salience and intensity 485
speech prosody 288
stimuli, fMRI study 489–90
encoding 65, 69, 71, 76, 78, 82–3, 219
voice representation prototype 26, 47, 519, 544, 565f, 570–7, 595, 607, 614–16, 633, 673, 700, 858 see also emotion vocalizations
end-to-end learning 736–7
entertainment industry, language, ‘same’ vs ‘other’ 674
envelope
F0-adaptive spectrum envelope 687–8
sharpness 150
errors
carousel 733
false positive/negative 784
ethnolinguistic identity theory 671
‘ethotransmitter’ 17
Euclidean distance 724, 726, 804
event-related potentials (ERPs) 217, 218, 226, 239, 242–3, 463, 475, 550, 594, 646
deviance detection, mismatch negativity 811–12
evolution
attractiveness of voice 608
comparative perspective 277–428
continuity of social expressions 4, 5f
divergence of humans and songbirds, vocal learning, social reward 309
emotion vocalizations 61
language 18–19
larynx and vocalization 21
laughter 497f
(p. 923)
perceptual and cognitive apparatus 297–9
primordial vocalization 15–17, 364–5
vocal behaviours across vertebrates 5–7
‘vocal brain’ 51–3
vocalization processing 17, 52–3
voice averageness 615–16
expectation maximization (EM) 724
expertise, biometric speaker recognition 785–6
extended cerebral networks for voice processing 37–60
extrapolation of personality traits from voices 589
F
face attractiveness 617
face inversion effect, ‘Thatcher illusion’ 38
‘face localizer’ protocol 39
face perception 37–8, 414
Bruce-and-Young-type model, face processing 539–40, 553, 857f
familiar face recognition 543
‘structural encoding’ stage 46–7
face stimuli, fusiform face area (FFA) 39
face–voice integration
behavioural studies 617
communication signals, primates 337
individual difference
infants 241–2
optimal 552
pairings of unfamiliar adults, in infants 241
perception of speaker identity 552
processing, analogy 67, 414
faces and bodies 645–66
faces and voices
behavioural effects 645–6
electrophysiological studies 646–8f
neuroimaging 648–53
amygdala 651–3
connectivity of RPTS 650–1
right posterior temporal sulcus 648–9f, 650
roles in judgements 588
summary traits 589
facial overshadowing effect 636–7
factorial–dimensional analysis 134
false acceptance/rejection 784
falsetto-sounding male 177
familiarity agnosias 22
familiarity recognition 540–1f, 898–9
change blindness 23
impairment, delusion 22
in non-human animals 16
special status 23
as a value 22
fear conditioning 475–6
feature learning 736
feature scaling (unity-based normalization) 727
feature selection 710, 724
feedforward neural network 732f
fetus
auditory development steps 192t
earliest voice perception 193–4
recognition of maternal voice 17, 49, 673
fine-grained acoustic speech events 149, 461, 464, 632
first impression 588–90, 592, 594, 669, 702
fishes
central pattern generator 17
cichlid familiarity recognition 15–16
midshipman fish, hums 15, 21
periaqueductal grey (PAG) 25, 363
vocalizations in batrachoidid fish 5f
fluency 616
forebrain, anterior pathway (AFP) 317–18f, 319–21, 324
foreign accent syndrome 675
foreign language
second-language learning 156
language familiarity effect 524
speaker identification 524–6
forensics
automatic access systems 784–5
biometric speaker recognition 784–9
determining identity 617–44
INTERPOL 788
speaker comparison
analysis 786
defined 778
speech analysis, automatic/acoustic/aural methods compared 788–9
vocal disguise 782–3
voice identity 617–44
voice recognition 627–9, 634–7
(p. 924)
formant-based speech synthesis 757, 760
formants 127–8
defined 7, 117, 757
dispersion 789
cue for size assessment 375
formant or voice source 132
frequencies, function 368, 786
laryngeal ventricle 130
linear prediction (LP) 757
singer’s formant cluster 129–30
tuning 131–3
frequency-modulated (FM) signals 417–18f
mismatch negativity (MMN) 809, 811
frogs see amphibians
frontal operculum 144, 438
frontal regions
Broca’s area 214, 220–1, 413, 657
connection with nuclei of basal ganglia 24
damage, timing features in vocal production 25
frontotemporal dementia 894, 905
frontotemporal lobar degeneration (FTLD) 875
frontotemporal sensitivity to voice (FTPV) 41f, 50f
functional near-infrared spectroscopy (fNIRS) 195–9, 237–40, 245, 481–3
functional neuroimaging (fMRI) 648–55
amygdala 481–3
bodies and voices 654–5
electro/magnetoencephalography (EEG/MEG) studies 574, 575
emotional stimuli 489–90
infant cries 255–6t
responses during voice imagery 879
reward, basic meaning 380f
functionals 723–5
fundamental frequency (F0) 7, 612, 786
defined 786
generation process model 767–8f, 770f
modelling contours 765–70f
fusiform face area (FFA) 39
functional significance 46
voice familiarity/recognition tasks 549
G
gate neurons 733
Gaussian mixture model (GMM) 765
gelotophobia 508–9
general perceptual borders of vocalizations 370–1
generation and decoding of voices 683–794
generation process model 757, 769
observed F0s 767–8f, 770f
Geneva minimalistic acoustic parameter set (GeMAPS) 64, 65b, 715, 724
German language 589
gibbons, acoustic structure of gibbon duets 672
Glasgow Voice Memory Test (GVMT) 44
glottal adduction 120–1, 125, 136, 139
flow phonation 123f
glottal asymmetry 175
glottal configuration, aspects of phonation 175
glottal pulses, spectrogram and amplitude waveform 6f
glottis
pulsating waveform 122f see also subglottal pressure
glottograms 124f
goats
familiarity recognition 16
group-dependent vocalizations 672
goldfinches, call convergence 404
graphics processing units (GPUs) 732
gray mouse lemur 399
GRBAS scale, disturbed voice 9, 169
great apes, sexual dimorphism 403
H
hallucinations 831–54, 903
auditory verbal hallucinations (AVH) 831
behavioural paradigms 834–5
defined 831
dementia 906, 907f
findings from structural and functional studies 836–45
brain functional characteristics 840–1
MRI studies 837–40
post-mortem studies 836
state effects 840–2
trait effects 842–5
schizophrenia patients 833–4
theoretical models 832–3
harmonic source, spectral shape 179t
harmonic spectrum, quasi-harmonic partials 126
(p. 925)
harmonic-to-noise ratio (HNR) 366, 370
trustworthiness 590
health, voice averageness 615–16
hearing impairment
adventitious 22
age-related 156–8
atrophy of PARC 157
paediatric 102
Heschl’s gyrus/sulcus 147, 147f, 156, 378, 420–1, 836, 858, 882, 885
heterospecific emotional processing, behavioural evidence 371, 376–9
hidden Markov models (HMM) 711–12f, 728–30
likelihood ratios 788
personalized voice synthesis 542
speech synthesis 758–9, 763–5, 769–71f
F0 contour generation 768–73
HMM-based generation process model 769–70
variations in pronunciation 763
hierarchical models
three steps model of neural activation 480
voice-identity processing 341–6, 549
higher auditory processing, dual-pathway system 413–28
hippocampus
basal ganglia (BG) 443–4f
medial frontal cortex (MFC) and 442, 445, 505–6
brain lesions 433–4, 479
horses, familiarity recognition 16
Huntington’s disease 895
hyperparameter optimization 735
I
iconicity 287, 466
identity see voice identity
indigo buntings, imitation 404
individual differences, voice recognition 546–8
infancy/vocalizations
accents 674–5
acoustics 252–3
adult neural responses 251–76
atypical development, voice-sensitive activation 239–40
audiovisual emotion integration 242–4
audiovisual identity integration 240–1
audiovisual speech integration 243–5
avatars 753
birth and early developmental changes 194–6
cries
functional neuroimaging (fMRI) 255–6t
heterospecifics 375
neural reactivity to 254, 255–6t, 375
demands of parenting 262f
effects of gender and parental status 263
individual differences 263–4
influence of hormonal levels on the parental brain 266
postnatal depression (PND) 239, 265
disrupted parent–infant interactions, neural correlates 265–6
early discrimination of prosodic patterns 199
emotion-modulated activation 238–40
face–voice pairings of unfamiliar adults 241
language-learning mechanisms 215–16
major developmental steps 201–2
maternal voice recognition 18, 235–6
mismatch negativity (MMN) protocol 49
in newborns 49
monkey faces producing different vocal expressions 243
natural learning 753
negative information 199
neural responses in adult listeners 251–76
‘parental brain’ networks 257–8
processing emotional prosody 198–201
sensitivity to 253–5
social perception 235–50
speech
acquisition 752
‘categorical perception’, ‘magnet effect’ 215
segmentation 215
temporal dynamics of ‘parental brain’ 258–9f
types 253
voice and face integration 241–2
voice perception 191–203
auditory pathway development 192–8
cerebral voice processing in newborn 49–50
extraction of social meaning 198–201
IDS vs speech overheard 94
lateralization of activation in hemispheres 197f
(p. 926)
native and non-native speech responses 198
neuroimaging and electroencephalography 195–8
recognition of the maternal voice 17
voice processing 236–8
behavioural, EEG, and brain imaging evidence 196t
infant-directed speech (IDS) 93–116
acoustic properties 96–8
caregivers’ intentions 95–6
caregivers’ use of 107–8
dog puppies 375–6
effect of developmental status 101–2
effect on language acquisition 103–4
IDS vs ADS 94–5
learning 104–7
roles 94–5
segmental properties 98–101
speech quality 94
suprasegmental and segmental dimensions 94
inferior frontal cortex (IFC) 371, 436–43, 446, 448–9, 451, 549, 651
large-scale neural network 446
musical syntax and prosodic perception 808
voice-identity processing 549
inferior parietal cortex (IPC) 594
information
non-verbal, paraverbal 4
socially relevant 3
inharmonic source, spectral shape 179t
intensity-independent spectral filtering and integration 371
interceptive eavesdropping, sympatric species 373
intercostal muscles 118
INTERPOL, forensic speech analysis 788
interpolation of voice recordings 566, 702
intonation 134
inverse document frequency (IDF) 726
‘inverse effectiveness principle’ 655
J
Japan Society of Logopedics and Phoniatrics 169
Japanese language 768
K
Kallmann syndrome 177
‘Kindchenschema’ (Lorenz) 253
Klattalk 760
knowledge acquisition 750–1, 751–4
L
language familiarity effect 515–16, 521t
discriminating speakers of other languages 520–1
psychological models 529–30
speaker discrimination 521t
speaker identification 516–19, 523t
theories and models 528–30
language processing
asymmetry 212
cortical speech and 211t
current models 212–13
ventral and dorsal streams 211t, 421–3, 744, 747
visual analogy 210f see also speech processing models
language structure
learnability and expressivity 290–1
motivatedness 285, 289
recursive structure 209–12
systematicity 289, 290f, 291–2
language-learning mechanisms, perceptual and cognitive demands on 215–16
language(s)
acquisition
major developmental steps 201–2
role of prosody 155–6
attitude, bilingual children 673
creating a ‘perfect’ 292
discrimination paradigm 520
emotion vocalizations 82–3
evolution 18–19
experience, and prosody 151–5
functional deployability 280–1
information, formant frequencies 7
and its neural substrate 209–12
models, ASR 710
native language
voice discrimination 521
voice recognition 515
phonemes 298
prosodic structure and patterns 143
(p. 927)
prosody see speech prosody
recognizing speakers across languages 515–38
‘same’ vs ‘other’, entertainment industry 674
second-language learning, language familiarity effect 524
specific language impairment (SLI) 526
standard form 670
uses of vocal material 20–2
words and meanings, motivated/systematic 281–3
laryngeal ventricle 130
larynx 6f, 7
evolution 21
frog 21
function, phonation 7
height 135–6
register 7
lateral belt 347, 415
lateral belt neurons 415f, 418f
lateral magnocellular nucleus of anterior neostriatum (LMAN) 317–18, 324
lateralization bias, for learned meaningfulness, dogs 381
lateralization studies, vocal emotional processing 371, 378–9
laughter 495–512
acoustic analysis 500f–1, 501–2t
bonding and group coherence 496
corrective instrument 496
discrimination 502–3f
emotional dimensions 503
evolution 497f
gelotophobia 508–9
non-human primates 495
perception, audiovisual laughter 504
perception of self 507–8
phonation 499
place of articulation 498–9
semiotics 497–9
supraglottal features 499t
taunting laughter 496, 497
learning
action-based 311
algorithms
hidden Markov models (HMM) 728
self-organizing 751
appetitive, dopamine 323
deep 733
end-to-end learning 736–7
languages 156, 215–16, 524 see also birdsong
lesions see brain lesions
Lewy body dementia 895
lexemes 752
lexical linguistic contrasts, signalling 20
lexical network 743–4f, 745t
lie detection 84
likelihood ratios, hidden Markov models 788
limbic system 25, 264, 380, 443, 475
abnormalities, emotion, and prosody 809–10, 817
birdsong 316, 325
cataloguing functional differences 479
demarcating functional boundaries 479
emotional and attitudinal content of voice 24, 443, 907
lesion studies 435, 479, 816f
PTSD 475
rats 371, 398
similarities across species 363–4
six limbic areas 321–2f
song learning 317, 325 see also amygdala
line spectrum pair 758
linear prediction (LP), formants 757
linear prediction vocoder 761
linguistic processing
hypothesis 529
model 531
linguistic prosody 901
listeners, neural responses to infant vocalizations 251–76
long short-term memory cell 733f
long short-term memory recurrent neural networks (LSTM-RNNs) 732–4, 737
long-term average spectrum (LTAS) 7
low-level descriptors (LLDs) 720, 723–5f, 727–9, 733
functionals 723–5
pre-processing 727
M
macaque monkeys see rhesus macaque monkeys
McGurk effect
audiovisual integration in infancy 243
illusion 635
machine learning, computers 733
machine-based decoding
generation and decoding of voices 683–794
paralinguistic vocal features 719–42
voices and human speech 707–18
magnetoencephalography (MEG) 574, 646–8
‘maluma’ and ‘takete’ shapes 285f
mammals
cortical–subcortical network 16
familiarity recognition 16
neural substrates in vocalization processing 369
polyvagal hypothesis 16
social decision-making, networks regulating 322f
vocalizations 363–8
encoding of valence 367f
voice perception 368–81
maps problem, and mapping problem 145
maps and streams, brain systems 743–5
marmoset monkeys 310
calling learning 404
masking, singing voice 135–6
mate preferences, laughter and 496
maternal voice recognition 18
maximum likelihood (ML) estimate 730
medial frontal cortex (MFC) 442–4
mel-frequency cepstral coefficients (MFCC) 711, 714, 724, 764, 787
mid ectosylvian gyrus (mESG), near-primary auditory cortex 381
midbrain, periaqueductal grey, respiratory control 25
middle superior temporal gyri (mSTG) in heterospecific sound processing 378
midshipman fish, hums 15, 21
mismatch negativity (MMN) 484, 809, 811
protocol, newborn infants 49
MITalk system 758
mobbing 374, 395
model neurons 747–8
monkeys, species other than rhesus macaque
alarm signals 293–7, 310
familiarity recognition 16
marmoset monkeys 310, 404
squirrel monkeys 401, 404
vervet monkeys 293–7, 310 see also rhesus macaque monkeys
morphemes 760
Morton’s motivation-structural rules 399
encoding emotion 374, 376–7
fear versus threat contexts 398–9
motivatedness
modality and transparency 284
and word-meaning associations 291 see also Morton’s motivation-structural rules
motor plan network 744, 745t, 751
mouse, courtship 399
multi-space probability distribution (MSD) 764
MSD-HMMs 769
multidimensional scaling (MDS) 804
multimodality, non-verbal voice parameters 613
multisensory pathways
convergence for voice and face 346–52
neurons in voice-sensitive brain regions 337–62
‘unity assumption’ 552
multivoxel pattern analysis (MVPA) 488–9
music
emotional communication 395
evolution 62
information retrieval 725
symphony orchestra, long-term average spectra 130f
to speech, transfer effects 217–22, 226
voice and 488–90 see also singing voice
music training
language processing and 220
speech segmentation 215–19
speech-learning mechanisms 217–19
vs non-musicians, ERPs recorded at parietal sites 225f
and word learning 209–34
mutism, damage to midbrain periaqueductal grey matter 25
N
N-gram models 711
nasalization 131
natural language processing (NLP) 720, 725
network modules, hubs, and streams 744f, 749f
neural activation
blood oxygen level dependent (BOLD) signal 486, 488
hierarchical structure 480
patterns 747 see also neural responses
neural connectivity, neuroimaging 480–1
neural entrainment 149
neural integration, voice processing 527–8
neural maps 747, 748–9f
neural network
artificial (ANNs) 731–3
convolutional (CNN) 733
feedforward 732
large-scale (AC, amygdala and IFC) 446
local neural subnetworks 439–42
decoding affective meaning 440f
recent evidence 439–42
modules 747, 750
processing affective voices 434–6
recurrent 728, 731, 734
so-far neglected regions 442–4 see also basal ganglia; hippocampus; medial frontal cortex (MFC)
neural plasticity 21–2
neural responses
accent bias 676
amygdala 439, 481, 483
appraisal processing with emotional cues 483
crossmodal influences 350
event-related 371
face vs non-face categories of stimuli 338
‘inverse effectiveness principle’ 655
music vs voice 488
not conspecific-specific 380
participant bias 484
species-specific voice characteristics 340
of STS 480
to angry and neutral prosody 483–4
to infant vocalizations in adult listeners 251–76
measures of parenting 263–4, 267
temporal dynamics 259–60
to songs 314–15 see also neural activation
neural streams see brain streams
neural substrates, in vocalization processing 369
neurobiological model
neurocomputational models of voice/speech perception 747–51
knowledge acquisition 750–1
mental lexicon 751
model neurons 747–8
neural maps 748–9
neural network module 750
neurodegenerative syndromes
sarcasm processing 906
voice processing in dementias 906, 907–8f
neuronal microcolumns 148
neuronal oscillations
asymmetric spontaneous 212
early stage of speech processing 149
mechanism for sensory integration 148
in schizophrenia 818
slow, decoding and integration of prosodic features 148–50
theta frequency band 226
neuronal voice coding
neurophysiological location of network modules and hubs 745t
neuropsychological tests, dementia 897t
neuroticism, personality characteristics 484
newborn see infancy
non-human primates
anterior superior temporal (AST) cortex, homology 423
auditory dorsal stream 413, 421, 423
communication 279–308
communication signals, face–voice integration 337
dominance, chacma baboons 403
emotional and social communication 393–412
familiarity recognition 16
great apes, sexual dimorphism 403
laughter 495
model for stages in voice-and-face identity processing 351–4
neurons in voice-sensitive brain regions 337–62
vocal recognition skills 16 see also monkeys
non-linear phenomena (NLPs) 366, 368, 373, 731
non-linguistic voice perception 746
non-verbal communication 4, 713
affect 447–8, 468
emotion vocalizations 736f, 900
non-verbal voice parameters 613
non-vocal sounds, human and dog 379f
voice information precursor signal 423
norm-based coding, voice identity 543–4, 574, 595, 700
northern fur seals, mother/pup recognition 4 years on 403
O
octave intervals 127
onomatopoeia 286
ontogenetic development of voice perception 191–276
openSMILE 724
openXBOW 726
orbitofrontal cortex (OFC) 221f, 347, 809, 905, 907f
affective stimuli 236
amygdala 487
early reactivity in infancy 261
emotional processing of speech 199–200, 245, 380, 461
rapid orientating responses 260–2
organ of Corti 193
output prosody 901
own vocalizations 899
P
paralinguistic information 561, 715–16, 746, 900
computational 713f, 719
processing 904–5
vocal features, machine-based decoding 719–42
parametric representation, F0 180, 685–92, 766
parametric synthesis 760, 773
PARCOR (PARtial auto-CORrelation) 758
parenting 262–5
disrupted parent–infant interactions, neural correlates 265–6
maternal voice recognition 18
postnatal depression (PND) 239, 265
parietal or temporal lobe damage 25
Parkinson’s disease 895–6
basal ganglia dysfunction 25, 810
deep brain stimulation (DBS) 25
neuroimaging, prosodic abilities 809
parrots 310
partials, harmonic 121–3, 126
penguins, vocal recognition skills 16
perceptual bias 285, 299
perceptual linear prediction (PLP) coefficients 711
periaqueductal grey (PAG)
peripheral processes of audition and vocal production 26
respiratory control 25
role in fish 25, 363
role in rats 26, 261
personality profiles, BBC 669, 672
personality traits from voices 20, 484, 585–606
accuracy 593–4
‘big five’ 714, 719
changes with ageing 595–6
consistency 591–2
cross-species approach 310, 354, 376
extrapolation 589, 701f
neural correlates of personality processing 594–5
real world applications 596–8
Realistic Accuracy Model (RAM) 593
summary traits derived from voice 589
trustworthiness and dominance 590, 591f
voice identity, and social context 513–682
voice preferences 586–7
personalized voice synthesis, hidden Markov models 542
pervasive developmental disorder see autism spectrum disorders (ASD)
phobias 475
phonaestheme cluster 289
phonagnosia 39, 48, 53, 539, 546, 553
acquired 855–7f, 870–3, 886–7
brain lesion–behaviour relation 875–7
case reports 869–70, 873–6
lesion studies 865–8t
selective impairments 873–7
apperceptive voice identity processing 864–6, 870–3
defined 855
right or left parietal lobe 873
associative 855–6
defined 539, 855
developmental phonagnosia 546, 856, 877–87
case reports 877–83
identifying cases 883–5
performance on voice recognition 547f
in relation to current voice identity processing models 884–7 see also voice identity processing
phonation
cultural context 176
larynx function 7
respiratory aspects 118–21
phonatory pressedness 124
phonemes 709, 752
speech recognition 763
phonetic familiarity hypothesis/model 529–30
phonetic variability 541–2, 553, 676
phonological network 743–4f, 745t
phonology 209
physical voice 174–5
physiopsychoacoustic Model 180–1
pitch 396t, 590
anxiety detection 631
audio-sensory perceptual impairment 814
contrasts, basal ganglia nuclei 24
deficits in schizophrenia 811f
fundamental frequency (F0) 9
near-primary auditory cortex 370, 380–1
newborn perception 198–9
perfect 127
speech prosody 288
tonotopic representations 370
valence encoding 366–7f
vibrato 126
pitch-synchronous overlap add (PSOLA) method, voice morphing 588, 758, 761
pitch-synchronous speech analysis 764
planum temporale (PT) 147f, 148, 156, 195, 433f, 485, 675, 836, 842, 882
politicians, voice preferences 587
polyvagal hypothesis, mammals 16
post-traumatic stress disorder (PTSD) 475
posterior auditory-related cortex (PARC) 147f, 148, 156–8
posterior parietal cortex (PPC) 424, 449, 451
posterior superior temporal cortex (PST) 421, 439
postnatal depression (PND) 239, 265
predators and prey 372
prefrontal cortex 363
preparatory network 745t
presbycusis 157
pressed phonation 124–5
primates see non-human primates
primates, see also monkeys
primordial communication 15–36
processing streams see dorsal –; dual –; ventral –
progressive supranuclear palsy 905
pronunciation 730, 760
variations, hidden Markov models 763
prosopagnosia 22, 38
prosodic information
cues to syntax 150–1
interacts with neuronal processing of speech 150
processing in infants 198–9
prosody 900
abnormal processing 806
decision tree model 816f
long-term antipsychotic drug exposure and 807
‘sensory’ aprosodia 808
stroke patients 808
affective, dysfunction 815
affective appraisal, cognitive model 813f
defined 143, 780, 801
diversity of languages 151
expressive 802–3, 815
generation of prosodic features 765–6
language experience and 151–5
measuring 802–34
neural network modelling
decision tree model 816f
schizophrenia and healthy subjects 812
neuroimaging 814
receptive 802, 815
role in natural language acquisition 155–6
in schizophrenia and bipolar disorder 803f, 806
speech recognition 764, 786
structural device 150 see also dysprosodia; speech prosody
prosopagnosia 38
psychiatric diseases 656–9
autistic spectrum disorder 657–8
perception of laughter in 508–9
psychoacoustic model of voice 179t
psychological models, language familiarity effect 529–30
psychopath 20
psychophysiological interaction (PPI) 485
puberty changes 569, 782
pygmy marmosets
babbling 404
call convergence 404
trill vocalizations 404
R
rats
aggressive behaviour 399
emotional vocalizations 19
familiarity recognition 16
periaqueductal grey (PAG) 26, 261
Realistic Accuracy Model (RAM) 593
recurrent neural network (RNN) 728, 731, 734
vanishing gradient problem 733
register
singing voice 125–6, 396t
falsetto, modal, vocal fry 125–6
repetitive transcranial magnetic stimulation (rTMS) 45
respiratory control, underlying vocalization 25
resting expiratory level (REL) 119
restricted Boltzmann machines 733
reticular formation 363
reward, basic meaning 380f
reward circuitry 265, 267, 381
rhesus macaque monkeys
pitch-selective neurons 370
species-specific calls 415f
speech processing
lateral belt neurons for species-specific calls 415f
model of auditory vocal communication 413–28
temporal lobe voice-sensitive regions 51–2, 338–42
anterior voice-sensitive clusters 345, 348, 352f
vocal brain evolution 51–3
vocal emotional categories, human recognition 376
right posterior temporal sulcus 648–50
connectivity 650–1
right temporal TMS, impairs voice detection 46f
rodents
aggressive behaviour 399
California mice, aggressive behaviour 399
emotional vocalizations 19
familiarity recognition 16
periaqueductal grey (PAG) 26, 261
role of subcortical region 261–3
rolandic operculum 676
S
sampling frequency, sampling period (ASR) 709
sarcasm 20, 804–5, 902–3
dementia patients 905
schizophrenia 656–7, 800–4
auditory sensory deficits 810–11
basal ganglia deficits 810
crossmodal interactions 656
dysprosodia 803, 804, 809, 814
audio-linguistic signal processing model 816, 818–19
expressive and receptive prosodic abilities 805
hallucinations 833–4
misperception and misinterpretation of emotional intent 803
pitch deficits 811f
prosodic dysfunction as premorbid indicator 806
social cognitive disabilities 807
science of voice perception 3–14
Scott, Jimmy 177
sea lions and seals, familiarity recognition 16, 403
second-language learning 156
language familiarity effect 524
segment-level decoding 734–6
segmental features 765
selection cost, and concatenation cost 762
selection-based speech synthesis 762, 766
selective learning, sparrow 311
self perception
laughter 507–8
own vocalizations 899
self-organizing learning algorithms 751
semantic hub 743–4f, 745t
semantic task, musicians vs non-musicians, ERPs recorded at parietal sites 225f
semantics 209, 299
affective 299–300
biosemantics 298–9
sensorimotor hub 744–5t
sexual dimorphism
mammals 403, 570
puberty changes 569, 782
sexual selection 368, 608, 613, 615
attractiveness of voice 608–12
sheep, familiarity recognition 16
shibboleth 668
Siberian hamsters, aggressive behaviour 399
sigmoid function 731
signal
coding systems 707
singing, song learning, social reward 309–36
singing voice 117–42
emotional colouring 133–5
formant cluster 129–30
intensity and masking 135–6
larynx height 135–6
partials 121
phonatory pressedness 124
placement 135
register 125–6
styles 137–9
subglottal pressure 118–20, 123–4f
support defined 121
synthesis and naturalness 136–7
time difference between vowel onset and piano chord 133f
vibrato 126
vibrato onset, rate, and extent 134
vocal tract shape for indicated vowels 128f
voice source 121
vowel onset, lead and lag 133
vowels sung at high pitches 132f
social anxiety 475
schadenfreude 462, 496, 497, 501–2t, 503–4f
social class, groups and 671
social cognition 45, 260, 505
social communication
functions, voice identification 402–5
non-human animals 393–412
rhesus macaque monkeys 413–28
social cues 540
social decision-making, networks regulating 322f
social expressions, evolutionary continuity 4, 5f
social expressions across species (heterospecifics) 4
social feedback in development of vocal forms, humans and songbirds 309
social identity theory (SIT) 671–2
social influence, on vocal structure 404–6
social interaction
disorders and 805
imitative behaviour 672
laughter 495, 497
song 311, 314, 322–3
social meaning, infants, from perceived vocalisations 198–201
social motivational learning, conclusions 325–7
social perception, infants 235–50
social reward
linking with vocal learning 309–36
socially guided learning (SGL) model 311
song learning 309–36
songbirds 311–24
social signal processing (SSP) 708, 713f
social signals
analysis 715–16
cues 540
social success, attractiveness of voice 613–14
social system network 316
social voice space
principal component analysis solutions and main correlates 591f
prototypical and caricatured voices 596f
valence 590
socially guided learning (SGL) model 311
zebra finch 314–16
socially relevant information 3
sociocultural background 548
sociolinguistics, language variation 669
sociophonetic studies, learning and accents 780–1
soft margin 735
somatosensory network 745t, 746
song learning, social reward 309–36 see also birdsong
songbirds
dopamine 323 see also birdsong
sound pressure level (SPL) 123f, 134, 138
sound-symbolic phenomena 299
sound-symbolism and crossmodality 288
source-filter modelling 363, 722, 757–8, 761, 773
sparrows
plastic stage of song learning 311–12
tuition 314, 404
white-crowned 311–14, 401, 405
speaker diarization 715
speaker familiarity, identity processing 542–4, 633–4
speaker identification from voice 539–60, 631–3
brain correlates of identity processing 548–50
language familiarity effect 516–19, 521t
other languages voice line-up 519–20
speaker recognition
across languages 515–38
training 522–3
unresolved questions 532
characteristics 19f
role of language in development 526–8
role of stimulus duration 541f
verification or authentication 784, 785
voice biometrics 777–99
speaker-state network 746, 751
species, voice perception 363–91
acoustic similarities across species 365–7
ancient neural pathways 369
heterospecific emotional processing 371–4, 376–7
heterospecific neural processing of acoustic cues 369–71
species-specific vocalizations, selectivity for 417–19f
spectral envelope 711, 764
spectral filtering and integration, timbre 370
spectral mismatch (concatenation cost) 762
spectral perceptual borders 371
spectrograms, male/female voices 8f, 701
speech
music training, and word learning 222–4
temporal fine structure 146
tonal and quantitative languages 216
transfer effects, growing evidence for 217 see also voice and speech
‘speech chain’ 176
speech conversion, text-to-speech (TTS) 597, 758–62
speech corpora
and method of concatenation 763
parallel 764–5
speech generation process model (Voder) 757
speech intervals, analysis 787
speech processing
brain systems 24f
right temporal areas 146
elderly persons, role of prosody 156–8
neural integration 527–8
speech processing models
dual-stream models
auditory vocal communication 413–28
dorsal processing stream 211–12, 215, 413, 421–3f
language processing 211–14
memory, unification, and control (MUC) model 214
ventral processing stream 211t, 342–3, 353, 413–14, 421–3
dynamic and cognitive models 211t
speech prosody 143–66
abstract structures (pitch, intensity, and duration) 143
emotions 288
pitch 288
suprasegmental 146
units and computational functions 146
speech recognition, phonemes 763
speech segmentation
infancy 215
music training 215–18
speech comprehension 218
voice-activity detection 720
speech signals
analysis-synthesis methods 761–4
ASR system 707–11, 713
speech sound generation 760–2
articulatory synthesis 760
formant synthesis 760
parametric synthesis 760
waveform concatenation methods 760
speech spectra 760, 762
speech stream, in cocktail-party paradigm 149
speech synthesis 757–76
deep neural network (DNN) based 763–4
hidden Markov model-based 758–9, 763–5, 769–73
prosodic features 765–6
selection-based 762, 766
voice conversion 764–5
waveform concatenation methods 763
speech and voice 37–9
speech-based emotion recognition 725
speech-based personal assistants 716
speech-learning mechanisms
perceptual and cognitive demands on 215–16
phonological categorization task 223f
squirrel monkeys
aversive vocalizations 401
chuck-calls 404
starlings, song learning 314
startle response 399
state network 746, 751
step function 731
stimuli, mere exposure 616
Stockholm Voice Evaluation System 170
STRAIGHT analysis 764
stress, see also word stress
stress detection, lie detector 84–5
stroke
affective voice processing 432
autobiographical memory 25
foreign accent syndrome 675
parietal or temporal lobe damage 25
prosodic abnormalities 808
right hemisphere, Capgras syndrome 22
voice-identity processing 865–6t
structural features of emotional signals 396t
subcortical regions 363, 371, 476, 486
subglottal pressure
control 119–20f
effect on voice source 118–19, 123
flow glottograms 124f
recoil forces 119
sound pressure level 123f
subthalamic nucleus (STN) 444f
superior temporal cortex (STC)
affective prosody 448
auditory spatial localization 449
decoding the affective meaning from voices 436f
non-verbal expressions 448
posterior 421
feature-sensitivity 439 see also planum temporale
superior temporal sulcus/gyrus (STS/STG) 147f
activation patterns 221f, 237
Brodmann area 22, 237
auditory cortex activation 39
autism (ASD), high-functioning adults 240
connections 489
development 156
endogenous theta oscillations 157
indistinguishable for English words and time-reversed versions 39
lesions, predictors of aprosodic (affective) deficits 144
major fibre bundle 210
posterior superior temporal gyrus (pSTG) 147f
vocal and non-vocal differentiation 41
voice identity and vocal expressions, meta-analysis 549, 879
voice patches 42
voice perception 191
voice-selective areas 548 see also temporal operculum; temporal voice area (TVA)
support vector machine (SVM) 734–6
LIBSVM 736
suprasegmental features
bag-of-words method 720
information 789
vector representation 723
suprasegmental modulations, mediating prosodic functions 151
suprasegmental speech patterns 82
suprasegmental speech prosody 143–66
swallows, infant calls 403
swear words 288, 289, 292
sympatric species, interceptive eavesdropping 373
synaesthesia 287, 288
syntax, defined 209
synthetic speech see speech synthesis
systematicity 289, 291–2
T
talker identification see speaker identification
task-related neuroimaging studies, dual stream models 211–13
television productions, accents 674
tempo 396t
temporal envelope, cortical entrainment 146
temporal fine structure 146
temporal operculum, bilateral macroanatomical portions 147f
temporal voice areas (TVAs) 39–47, 191
acoustical cues, individual human voices 574
autism spectrum disorders 44
behavioural differences 43–4
causal link with voice perception 45–6
development, infant brain 50–2
fMRI-localized 41, 42
functional role 45–7
gender differences 42
immediate recall performance at GVMT 44
inferior prefrontal cortex bilaterally and amygdala 42
inter-individual variability 42–4
link with fusiform face area (FFA) 549
middle, TVAm 49
negative bias (happy/angry contrast), infants 199–200
part of extended ‘vocal brain’ 42, 43f
position 40
processing
acoustic signal properties 549
strong-intensity emotional prosody 198
species-specificity, human vs animal vocalizations 51–3
timbre analysis 7, 45
voice patches 42, 45, 49
voice-identity processing 549
term frequency inverse document frequency (TF-IDF) 726
terminal analogue synthesis 760–2
text-to-speech (TTS) aids 597
TTS conversion systems 758–62
thalamus, relaying information to vocalizing structures 25
Thatcher illusion 38
The Gambia, cross-cultural development of voice-sensitive activation 240–1
threshold of hearing (TOH) 708f
timbre 7, 396t
analysis 7, 45
spectral filtering and integration 370
twang 138
time course
processing vocal emotion expressions 459–72
of stimuli 646, 842
time domain–pitch synchronous overlap add (TD–PSOLA) 758, 761
changing fundamental frequencies 761f
time-varying source characteristics 179t
ToBI system 766
tonotopic organization
cochlea and auditory pathway 370, 614
maps 809
tracheal pull 120
transcranial alternating current stimulation (tACS) 817
transcranial magnetic stimulation (TMS) 45, 46f, 438, 699
repetitive (rTMS) 45
transfer effects 217–27
of adaptation 576
bottom-up accounts 226
evidence for 217–20
music to speech 217–22, 226
interpretations 219–20
shared neural networks 220–2
transparency, defined 286
‘tri-phone models’ 763
tripartite emotion expression and perception (TEEP) model 62–3f
Turnbull direction, voice-recognition evidence 629
U
unfamiliar voice 241, 563
ungulates, sexual dimorphism 403
universals, calls of other species 397
V
valence 24, 70, 72, 77–8, 80, 134, 144, 238, 242
social voice space 590
valence encoding
in mammals 367f
pitch 366
vector quantization 724–5f
ventral processing stream 211t, 342–3, 353, 413–14, 421–3
ventral tegmental area 257f, 258, 317, 399, 809
verbal hallucinations 831
vertebrates
vocalizations 363–8
voice perception 368–81
vervet monkeys, alarm signals 293–7, 310
vibrato 126, 134
vision, and voice 903
visual input processing 745t, 746
Viterbi algorithm 712, 762
vocal attractiveness see attractiveness of voice
‘vocal brain’, development and evolution 26, 49–53
vocal communication see communication
vocal expression
19 emotions across cultures (VENEC) database 83
acoustic low-level descriptors 721–3
key features 26
vocal flexibility 310
vocal folds 64, 117–18
closure 364
fundamental frequency (F0) values 7, 64–5
puberty changes 569, 782
structure 175
vocal learning, social reward 309–36
defined 309
humans and songbirds, evolutionary divergence 309
vocal patterns 5, 15–18, 20, 26–7, 71, 902
Vocal Profiles Analysis Scheme (VPAS) 170
vocal signals see animal calls/signals
vocal tract 6f
analogue 760
shape
for indicated vowels 128f
smile 460
structure in vertebrates 363–4
transfer function 179t, 762
Vocaloid system 137
Voder (voice operating demonstrator) 757
voice 167–89
covered 126
defined 9, 176
extraction of information 37–8
formant frequency 127, 132, 135–7, 139
fundamental frequency (F0) 125–32f, 143–4
and music 488–90
nature of 176–8
alternative approach 178–81
integral approach to study 181
psychoacoustic model 179t
source 121–3
two sides of (production, perception) 4, 8
and vision 903 see also vocal
voice activity detection 720
voice attractiveness see attractiveness of voice
voice averageness 615–16
voice biometrics see biometrics
voice cells
auditory response characteristics 344f
compared with face cells 341
voice characteristics, emotion, gender and age 629–31
voice cognition 38, 47
development before birth 38
voice conversion 686, 699, 759, 764–6, 768, 773
voice definition 9, 176
voice detection gating stage 46–7
voice disguise 637–8, 781–3
voice emotion
computers 69 see also emotion vocalizations
voice and face
communication signals 337
integration, in infancy 241–2
multisensory influences and pathways of convergence 346–52
voices as ‘auditory faces’ 38–9
voice features see vocal expression
voice gender and age
age, acoustic cues and correlates 38, 566–70
differences in voice perception, behavioural level 42, 570
differences in voice recognition 548, 570
evidence in court of law 629–31
femininity 610–11
masculinity 608–10
perceptually relevant cues 570
Voice Handicap Index 173
voice identity
cerebral representation 570–1
outstanding questions 576–7
defined 542
familiarity and specificity 562–5
unfamiliar voice discrimination 563
forensics 617–44
facial overshadowing effect 636–7
redundancy of information 635
testimony 627–8
identifiers 407, 672
missed identification or false elimination 783
network, voice recognition tasks 549
personality and social context 513–682
social functions of communication 402–5 see also speaker identification
voice identity processing
acquired and developmental phonagnosia 855–92
apperceptive and associative 860–1t, 864, 869
brain lesion–behaviour relation 875
case report evidence for selective 874–5
dementia 904
electroencephalography 549
IFC 549
impairments see phonagnosia
meta-analysis of neuroimaging studies 549, 879
model 856–9
norm-based coding 543–4, 574, 595, 700
STS/STG 549
summary 887
tests 859–60t see also voice processing
voice measurement 168–76
current approach 168–70
dysphonia severity index (DSI) 173
evaluation systems 168–70
multidimensional voice program (MDVP) 173
physical voice 174–5
summary 176