Schizophrenia is associated with deficits in the ability to decode emotion based upon modulation of intonation (affective prosody) (1, 2). Such deficits, along with disturbances in facial affect recognition, contribute substantially to impaired social and global outcome in schizophrenia (3, 4). Traditionally, such deficits have been attributed to generalized neurocognitive dysfunction, particularly involving processes such as executive function and working memory (5), as well as limbic dysfunction (6). More recently, however, specific contributions of auditory processing deficits have been noted as well, with deficits in simple tone matching correlating with deficits in prosodic identification (4). These findings predict both sensory-level and cognitive-level contributions to impaired prosodic processing. This study investigates the neural substrates of impaired auditory emotion detection with a combined behavioral and structural imaging approach.
Neuroanatomical abnormalities in schizophrenia have been extensively documented and shown to involve both white and gray matter structures (e.g., references 7, 8). In order to further evaluate the neural substrates of impaired prosodic processing in schizophrenia, voxelwise correlation analyses were performed relative to magnetic resonance diffusion tensor imaging studies. Diffusion tensor imaging is sensitive to white matter disturbances in schizophrenia and has previously been used to evaluate structure-function correlations within frontal cortical regions (7–9). To our knowledge, this is the first study to evaluate structure-function relationships within auditory sensory brain regions in schizophrenia with diffusion tensor imaging.
Diffusion tensor imaging studies are typically analyzed with fractional anisotropy, a parameter that reflects the diffusion of water parallel to the long axis of structural boundaries, such as axonal membranes or myelin, relative to diffusion perpendicular to those boundaries
(9) . Reduced fractional anisotropy in schizophrenia is thought to reflect either axonal or myelin-related pathology
(9), both of which have been documented in schizophrenia
(10) . Reduction in fractional anisotropy may also reflect disorganization of fiber bundles (“disconnectivity”), although in areas of crossing fibers, such disconnectivity may lead to paradoxical increases in fractional anisotropy
(11) .
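As an illustration of the parameter described above, the following is a minimal sketch of the standard fractional anisotropy formula computed from the three eigenvalues of a diffusion tensor. It is not the custom software used in this study; the function name and the example eigenvalues are assumptions chosen only for illustration.

```python
import math


def fractional_anisotropy(l1, l2, l3):
    """Fractional anisotropy from three diffusion-tensor eigenvalues.

    Standard textbook definition (assumed here, not taken from the study's
    custom software): sqrt(3/2) * ||lambda - mean|| / ||lambda||.
    Returns 0.0 for an all-zero tensor to avoid division by zero.
    """
    mean = (l1 + l2 + l3) / 3.0
    num = (l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0.0:
        return 0.0
    return math.sqrt(1.5 * num / den)


# Hypothetical eigenvalues (mm^2/s): diffusion is fastest along one axis,
# as expected inside a coherent fiber bundle.
fa_fiber = fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3)

# Equal diffusion in all directions gives no anisotropy.
fa_isotropic = fractional_anisotropy(1.0e-3, 1.0e-3, 1.0e-3)
```

Values range from 0 (diffusion equal in all directions) to 1 (diffusion restricted to a single axis), which is why a loss of axonal or myelin boundaries would be expected to lower the value.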
Regardless of underlying etiology, fractional anisotropy has proved effective for evaluating structure-function relationships. For example, increased impulsivity has been found to correlate selectively with reduced fractional anisotropy in inferior frontal white matter
(12,
13), whereas impairments in executive processing have been found to correlate selectively with reduced fractional anisotropy in anterior prefrontal regions
(14,
15) . Impairments in visual processing have been found to correlate selectively with reduced fractional anisotropy in optic radiations
(16) .
In this study, we analyzed relationships between regional fractional anisotropy in schizophrenia patients and healthy comparison subjects and performance on two separate tasks: the Distorted Tunes Task
(17), which measures the ability to detect incorrect notes within common melodies, and the Voice Emotion Identification Task (VOICEID)
(2), which measures the ability to decode emotions based upon tone of voice. The Distorted Tunes Task was originally developed to assess genetic contributions to musical pitch perception abilities and shows high heritability within families
(17) . The VOICEID task has been used by ourselves
(4) and others
(2,
3), and it is highly sensitive to affective prosodic perception deficits in schizophrenia.
In the brain, auditory projection paths begin at the level of the medial geniculate nucleus and project to the primary auditory cortex (Heschl’s gyrus, A1) through superolaterally projecting thalamocortical (acoustic) radiations. From the auditory cortex, fibers project to higher brain regions along both ventral and dorsal divisions of the arcuate fasciculus
(18) . The ventral stream is primarily involved in acoustic feature analysis
(19), whereas the dorsal stream is thought to process spatial and spectral motion
(20), including speech
(19) .
Affective prosodic comprehension can be conceptualized as involving a “three-stage processing chain” that begins with sensation (stage 1) in the primary auditory cortex and continues with integration (stage 2) within ventral aspects of the temporal cortex and the superior temporal sulcus. There, aspects of the acoustical information are tagged as affective. Processing proceeds finally to the cognitive stage (stage 3) in inferior frontal regions, where this information is evaluated both semantically and contextually (reviewed in reference
21 ).
The present study used two specific tasks—the Distorted Tunes Task and the VOICEID—to evaluate functioning of sensation and integration phases of affective prosodic comprehension. We have previously demonstrated that individuals with schizophrenia show deficits in auditory sensory-level performance, as reflected in impaired tone-matching ability
(22), as well as reduced auditory event-related potential generation
(23) . For this study, we hypothesized that joint impairments in Distorted Tunes Task and VOICEID performance in patients would correlate significantly with reduced fractional anisotropy primarily in basic auditory brain regions (i.e., acoustic radiations) that subserve sensation and that VOICEID impairments would show additional correlations in ventral and dorsal stream projection regions that subserve integration and cognitive evaluation.
As a control condition, we evaluated correlations between fractional anisotropy and performance levels on the Wisconsin Card Sorting Test, a widely used, visually based test of executive/prefrontal performance that would not be expected to show correlations with auditory sensory regions. In a prior study, an increased perseverative error rate on the Wisconsin Card Sorting Test was found to correlate with reduced fractional anisotropy in specific regions of the cingulum
(14) . Here, patterns of voxelwise correlations to the Wisconsin Card Sorting Test were compared to patterns observed for the Distorted Tunes Task/VOICEID.
To characterize the further extent of prosodic processing dysfunction in schizophrenia, a follow-up study examined the integrity of nonaffective prosodic processing, such as the ability to distinguish statements from questions (declarative versus interrogative prosody) or to distinguish between alternative meanings based upon which syllables within a sentence are emphasized (stress prosody). We hypothesized that impairments in pitch perception, which were statistically associated with impaired affective prosodic performance (i.e., emotion recognition deficits) in a prior study
(4), might also be correlated with impairments in the ability to make even nonaffective discriminations, such as the ability to differentiate statements and questions (i.e., declarative versus interrogative intent), based upon tone of voice. Furthermore, such findings would provide convergent evidence for the contribution of sensorial abnormalities to social communicatory disturbances in schizophrenia.
Methods
Experiment 1
Participants
Nineteen patients (one woman) meeting DSM-IV criteria for either schizophrenia (N=17) or schizoaffective disorder (N=2) participated in this study. Diagnoses were based upon the Structured Clinical Interview for DSM-IV (SCID), with all available clinical information used. Twelve patients were receiving only second-generation antipsychotics (primarily risperidone or olanzapine), two patients were receiving clozapine, two patients were receiving only traditional antipsychotics (haloperidol), and three patients were receiving combination treatment. The mean chlorpromazine equivalency dose was 1298.3 mg/day (SD=780.4). The mean illness duration was 15.7 years (SD=8.7). The clinical ratings of patients followed the methods described previously
(4) . The patients had a mean rating of 35.5 (SD=7.1) on the Brief Psychiatric Rating Scale and a mean rating of 32.5 (SD=12.8) on the Schedule for the Assessment of Negative Symptoms.
The healthy comparison group consisted of 19 (six women) staff volunteers or individuals who responded to local advertisements. The comparison subjects had a mean age of 36 years (SD=9). The healthy subjects and the patients differed significantly in mean IQ (112, SD=9, versus 95, SD=16, respectively; t=3.9, df=32, p<0.001) and grades achieved (16 years, SD=2, versus 11, SD=3; t=6.05, df=36, p<0.001).
The local institutional review boards approved all experimental procedures, and all subjects provided written informed consent after study procedures were explained fully. The participants received $10/hour for participation.
All subjects except one patient were right-handed, as assessed by methods described previously
(4) . Within this group, 12 of 19 patients and seven of 19 healthy comparison subjects had been in a prior study
(4) .
Behavioral measures
Behavioral measures included the VOICEID
(2), which consists of 21 spoken sentences each conveying one of six different emotions (happiness, anger, fear, sadness, surprise, or shame), the Distorted Tunes Task
(17), which consists of 26 popular tunes of which 17 are rendered melodically incorrect by changing the pitch of specific notes within the tune, and the Wisconsin Card Sorting Test
(24) . For the VOICEID and the Distorted Tunes Task, the primary dependent measure was percentage of stimuli identified correctly. For the Wisconsin Card Sorting Test, which was administered to patients only, the perseverative error rate was used as a primary dependent measure. All behavioral tasks were presented through a stereo player at a comfortable hearing level. For all subjects, behavioral task data were collected within 6 months of magnetic resonance imaging (MRI) acquisition.
MRI
Scanning was performed on a 1.5-T Siemens Vision system (Erlangen, Germany) at the Nathan Kline Institute Center for Advanced Brain Imaging. Three main sequences were acquired: a magnetization prepared rapidly acquired gradient echo scan (TR/TE=11.4/4.9 msec, matrix=256×256, field of view=300 mm, number of excitations=1, 1.17-mm slice thickness, 172 slices, no gap), a turbo spin echo scan (TR=5000 msec, TE=22/90 msec, matrix=256×256, field of view=224 mm, number of excitations=1, 5-mm slice thickness, 26 slices, no gap), and a diffusion tensor imaging sequence. The diffusion tensor imaging sequence has been described elsewhere
(25) (TR/TE=6000/100 msec, matrix=128×128 [interpolated to 256×256], field of view=240 mm, 5-mm slice thickness, 20 slices, no gap) and employs a double echo pulse to minimize eddy current effects
(26). The sequence entailed four acquisitions of six diffusion-weighted images (b=1000 s/mm²) for 20 slices. In addition, two acquisitions without diffusion weighting (b=0 s/mm²) were acquired.
Fractional anisotropy was calculated with custom software. The b=0 images were corrected for susceptibility-induced distortion and were transformed into Montreal Neurological Institute space with methods described elsewhere (13, 27). Images were matched to a template in Montreal Neurological Institute space, and the final voxel size was 2×2×2 mm³. A white matter mask was computed from the mean spatially normalized patient fractional anisotropy image with a nonparametric image segmentation algorithm (28) and was applied to all of the standardized images. This approach limited the voxels to white matter and resulted in fewer statistical comparisons, thereby lowering the probability of false positive tests.
After transformation into Montreal Neurological Institute space, the images were masked such that only voxels with data present for all subjects were included in the analyses. This ensured that missing data, which would have zero values, would not drive correlations.
Statistical analysis
Between-group comparisons of prosodic (VOICEID) and pitch (Distorted Tunes Task) performance were performed with repeated-measures analysis of variance. Spearman correlation coefficients were used to measure the relationships between task performances within groups.
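For readers unfamiliar with rank-based correlation, the following is a minimal sketch of a Spearman coefficient computed as the Pearson correlation of average ranks. The function names and the example scores are hypothetical; the statistical software actually used for this step is not restated here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical percent-correct scores for five subjects on two tasks.
tunes = [62.0, 70.0, 81.0, 85.0, 92.0]
voice = [40.0, 38.0, 55.0, 61.0, 75.0]
r_s = spearman(tunes, voice)
```

Because only the ordering of scores matters, the coefficient is insensitive to ceiling-compressed or otherwise non-normal score distributions.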
For neuroimaging data, a voxelwise correlation approach was used similar to that of Baudewig et al. (29), with thresholds as described previously (13, 15). This approach protects against false positive correlations by retaining only clusters of voxels significant at p≤0.05 that are grown from a seed voxel with a significance value of p<0.005. To supplement these criteria, only clusters with more than 11 contiguous voxels were considered significant. To assess areas of shared correlation across tasks, maps of each task-fractional anisotropy correlation cluster were overlaid, and a new map representing overlap regions (identically thresholded for each task) was generated.
Our voxelwise correlation analysis was two-tailed; nevertheless, we focused a priori only on correlations in which worse performance correlated with fractional anisotropy reductions.
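The cluster criteria described above can be sketched as follows. This is an illustrative reading of the thresholds, not the authors' implementation: the function name, the dictionary-of-voxels input, and the use of face-adjacent (6-neighbor) contiguity are all assumptions.

```python
from collections import deque

# Assumption: voxels are "contiguous" when they share a face (6-connectivity).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def significant_clusters(pvals, seed_p=0.005, grow_p=0.05, min_voxels=11):
    """Return clusters of voxels with p <= grow_p that contain at least one
    seed voxel with p < seed_p and have more than min_voxels members.

    pvals: dict mapping (x, y, z) voxel coordinates to a p value.
    """
    candidates = {v for v, p in pvals.items() if p <= grow_p}
    seen = set()
    clusters = []
    for start in sorted(candidates):
        if start in seen:
            continue
        # Breadth-first growth through contiguous candidate voxels.
        cluster = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBORS:
                nb = (x + dx, y + dy, z + dz)
                if nb in candidates and nb not in seen:
                    seen.add(nb)
                    cluster.add(nb)
                    queue.append(nb)
        has_seed = any(pvals[v] < seed_p for v in cluster)
        if has_seed and len(cluster) > min_voxels:
            clusters.append(cluster)
    return clusters


# Hypothetical p-value map with three separate rows of voxels.
pmap = {(i, 0, 0): 0.04 for i in range(12)}
pmap[(5, 0, 0)] = 0.001                               # row 1: seed present, 12 voxels
pmap.update({(i, 5, 0): 0.04 for i in range(12)})     # row 2: large but no seed
pmap.update({(i, 9, 0): 0.001 for i in range(5)})     # row 3: seeds but too small
kept = significant_clusters(pmap)
```

In this toy map only the first row survives, because it is the only cluster that satisfies both the seed requirement and the minimum-size requirement.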
Experiment 2
Participants
The participants consisted of 24 patients (three women) meeting DSM-IV criteria for either schizophrenia (N=21) or schizoaffective disorder (N=3) and 17 healthy volunteers (three women). Healthy subjects and patients were of similar age (mean=37.8 years, SD=10.2, versus mean=32.5, SD=10.6, respectively) but differed in verbal IQ (mean=109.7, SD=10.4, versus mean=94.1, SD=7.5; t=4.4, df=33, p<0.001) and education (mean=16, SD=1, versus mean=11, SD=2; t=–7.5, df=35, p<0.001). The patients were receiving typical and/or atypical antipsychotic medication (chlorpromazine dose: mean=1373 mg/day, SD=829).
Behavioral measures
In addition to the VOICEID
(2), the Distorted Tunes Task
(17), and the Wisconsin Card Sorting Test
(24), nonaffective prosody was assessed with Weintraub’s Sentence Discrimination and Semantic Comprehension Tasks
(30) : Twenty-five pairs of semantically neutral sentences, such as “Jack climbed the mountain,” were repeated after a brief delay. Seventeen of the pairs differed because of either stress (where stressed emphasis shifted between the subject and object of the sentence) or declarative/interrogative differences. Eight pairs were identical. The subjects were asked whether the sentences were said in the same or a different manner. The score reflected percent correct. Additionally, scores were broken down into percent correct for stress and declarative/interrogative distinctions.
The Semantic Comprehension Task consisted solely of 16 utterances, expressing either declarative (eight utterances) or interrogative (eight utterances) intent. The subjects were asked whether the speaker posed a question or a statement. The score reflected percent correct.
Data analysis
Between-groups effects across all auditory measures were assessed with multivariate analysis of variance, with post hoc contrasts for specific measures (t tests). In the event of ceiling or floor effects, Mann-Whitney nonparametric measures were employed. Nonparametric signal detection analyses used A′
(31) and B′′
(32) as measures of sensitivity and bias, respectively.
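As a rough illustration of nonparametric sensitivity and bias indices, the sketch below uses Grier's commonly cited formulas for A′ and B′′. Whether references 31 and 32 use exactly these variants is an assumption, and the hit and false alarm counts are hypothetical. Here a "hit" is a different pair judged as different, and a "false alarm" is an identical pair judged as different.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' (Grier's formula, assumed variant).

    0.5 indicates chance discrimination; 1.0 indicates perfect discrimination.
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1.0 + h - f)) / (4.0 * h * (1.0 - f))
    # Below-chance performance mirrors the formula around 0.5.
    return 0.5 - ((f - h) * (1.0 + f - h)) / (4.0 * f * (1.0 - h))


def b_double_prime(hit_rate, fa_rate):
    """Nonparametric response bias B'' (Grier's formula, assumed variant).

    Returns 0.0 when the index is undefined (both rates at 0 or 1).
    """
    h, f = hit_rate, fa_rate
    hv, fv = h * (1.0 - h), f * (1.0 - f)
    if hv + fv == 0.0:
        return 0.0
    sign = 1.0 if h >= f else -1.0
    return sign * (hv - fv) / (hv + fv)


# Hypothetical subject: 15 of 17 different pairs detected,
# 1 of 8 identical pairs wrongly called different.
sensitivity = a_prime(15 / 17, 1 / 8)
bias = b_double_prime(15 / 17, 1 / 8)
```

Separating the two indices makes it possible to tell whether a low percent-correct score reflects poor discrimination or merely a tendency to favor one response.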
Correlation matrices between nonaffective and affective prosody, as well as pitch perception, were calculated within the patient group only and submitted to principal components analysis. Principal components analysis, factor selection, and rotation were conducted on eigenvalues≥1 (see reference
4 ). All statistical tests were two-tailed, with alpha≤0.05, and computed with JMP software (SAS Institute, Cary, N.C.).
Results
Experiment 1
Behavioral results
The patients showed significantly impaired performance across both tests (F=56.6, df=1, 36, p<0.001) (Figure 1). The group-by-task interaction was nonsignificant (F=2.5, df=1, 36, p<0.13). Distorted Tunes Task and VOICEID scores were significantly correlated both across groups (r_s=0.55, N=38, p<0.001) and within the patient group alone (r_s=0.54, N=19, p<0.02) but were not significantly correlated for comparison subjects (r_s=0.25, N=19, p<0.30). For patients, poorer Wisconsin Card Sorting Test performance (increased perseverative errors) correlated with poorer VOICEID (r_s=–0.55, N=17, p<0.02) but not Distorted Tunes Task (r_s=–0.23, N=17, p<0.34) performance. Additionally, among patients, medication dosage did not correlate with neuropsychological measures (all p>0.22).
Structural correlations within patients
VOICEID
As predicted, impaired VOICEID performance was significantly correlated with fractional anisotropy in regions lying between the auditory thalamus (medial geniculate nucleus) and the primary auditory cortex (acoustic radiations, Figure 2, top). These regions are known to contain auditory radiations from the medial geniculate nucleus to A1 (Figure 2, bottom). In addition to these regions, significant correlation clusters were observed bilaterally along the ventral and dorsal auditory pathway in temporal and frontal white matter (Figure 3 A). Other areas in which correlation clusters were observed include the corpus callosum splenium and body, as well as the posterior commissure and the right cingulum (Figure 3 D). Clusters were also observed adjacent to both left and right amygdala medial laterally (Figure 3, left). Additional areas of significant correlation included white matter in Brodmann’s regions 44, 45, and 46 and the orbitofrontal cortex (Figure 3). R² values for each region are shown in Table 1.
Distorted Tunes Task
Also as predicted, the pattern of correlations of fractional anisotropy with the Distorted Tunes Task closely resembled the pattern of correlations with VOICEID (
Figure 3 B) (see
Table 1 ). Regions of overlap included primary auditory radiations, dorsal and ventral stream auditory projections (
Figure 3 ), and the amygdala (
Figure 4, left). However, no correlations were observed in Brodmann’s regions 44, 45, or 46 or in the orbitofrontal cortex.
Wisconsin Card Sorting Test
As opposed to the Distorted Tunes Task and the VOICEID, no significant correlations of fractional anisotropy with the Wisconsin Card Sorting Test were observed in regions of the acoustic radiations or along either dorsal or ventral auditory radiations (
Figure 3 C) (see
Table 1 ). Moreover, there were no significant areas of overlap between Wisconsin Card Sorting Test and VOICEID correlations (
Figure 3 A,
Figure 3 C). Significant correlation clusters were observed between the Wisconsin Card Sorting Test perseverative error scores and fractional anisotropy in white matter in the regions of the right anterior cingulate gyrus (
Figure 3 F). Even in frontal regions, however, little overlap was observed between correlation clusters for the VOICEID and the Wisconsin Card Sorting Test. Finally, in contrast to the VOICEID and the Distorted Tunes Task, no correlations in the vicinity of the amygdala were observed (
Figure 3 G).
Experiment 2
The patients performed significantly worse than the comparison subjects across all prosodic measures (
Table 2 ) with no significant group-by-task interaction (p>0.50). On the Sentence Discrimination Task, the patients showed significant decrements in performance on interrogative/declarative items as well as stress items (all p<0.01).
Within nonaffective prosody measures, the patients were significantly less sensitive (A′) than comparison subjects on both the Sentence Discrimination Task (mean=0.86, SD=0.19, versus mean=0.98, SD=0.32, respectively) and the Semantic Comprehension Task (mean=0.83, SD=0.17, versus mean=0.98, SD=0.03) in detecting differing prosody or interrogative intent, respectively (p<0.001). However, there were no significant differences in terms of bias (B′′) in the Sentence Discrimination Task (mean=0.54, SD=0.14, versus mean=0.68, SD=0.11) (p>0.50) or in the Semantic Comprehension Task (mean=0.37, SD=0.60, versus mean=0.72, SD=0.67) (p>0.08).
An examination of the interrelationship among prosody measures, pitch perception, and executive processing using principal components analysis (
Figure 4, left) yielded only two criteria-meeting components, which when rotated revealed that the Distorted Tunes Task and the Sentence Discrimination Task loaded exclusively onto the first component (0.77 and 0.82, respectively) and the Wisconsin Card Sorting Test onto the second (0.95). The Semantic Comprehension Task, however, loaded significantly on both components (component 1=0.59, component 2=0.61). Correlations between affective prosody scores and their nonaffective counterparts were highly significant (Semantic Comprehension Task by VOICEID [r=0.64, N=24, p<0.001], Sentence Discrimination Task by VOICEDIS [r_s=0.61, N=21, p<0.004]) (Figure 4).
Finally, patient performance on nonaffective prosody measures did not significantly correlate with illness duration or medication dosage (all p>0.20).
Discussion
Emotion recognition deficits are associated with poor social and global functional outcome in schizophrenia
(3,
4,
33), yet neural correlates have been investigated to only a limited degree. This study used a combined behavioral and voxelwise investigation to localize areas of potential relevance to impaired auditory prosodic processing. Significant correlations were observed between prosodic processing deficits and regions (e.g., prefrontal, periamygdalar) that are classically associated with neurocognitive dysfunction in schizophrenia
(6). However, prominent correlations were also observed with reduced fractional anisotropy in regions such as the primary auditory radiations and the dorsal and ventral auditory streams, suggesting that impairments in voice emotion recognition arise from sensory-level disturbance in schizophrenia as well. Thus, these findings support our prior observations of processing deficits within, rather than across, sensory modalities
(4) and suggest that functional and structural deficits within early sensory regions contribute to the overall pattern of cognitive dysfunction in schizophrenia.
In this study, structure-function relationships were assessed with voxelwise diffusion tensor imaging analysis. In contrast with volumetric approaches targeting gray matter regions, diffusion tensor imaging provides a measure of integrity of white matter tracts in the brain, which in turn may serve as a measure of dysconnectivity or dysmyelination within specific brain pathways
(7,
8) . Fractional anisotropy reductions in schizophrenia have been observed across brain regions, consistent with underlying reductions in oligodendrocytic markers
(10) . Furthermore, regionally specific correlations have already been observed for several well-validated measures
(12,
13) . Within this study, for example, reduced Wisconsin Card Sorting Test performance in schizophrenia correlated with reduced fractional anisotropy within the cingulate fasciculus, consistent with prior investigations in the field
(14), as well as functional brain imaging studies
(34) . In contrast, no significant correlations were observed in auditory regions. Using the present approach, we have also previously demonstrated significant associations between verbal declarative memory, attention, and fractional anisotropy in task-relevant regions, attesting to the regional specificity of the current analysis approach
(15) .
The primary finding of this study was that reduced performance on the Distorted Tunes Task and the VOICEID correlated independently with reduced fractional anisotropy in brain regions containing primary auditory radiations from the medial geniculate nucleus of the thalamus to Heschl’s gyrus and subsequent dorsal and ventral stream auditory projections. Additional areas of commonality included the genu and splenium of corpus callosum and the middle cingulate gyrus, consistent with lesion and neuroimaging studies implicating these regions in musical pitch and affective prosodic processing
(19,
35,
36) . Correlation clusters lateral to the amygdala were also observed in both tasks. As such, these findings indicate significant contributions of low-level auditory processing deficits to higher-order failures of neurocognition in schizophrenia.
In addition to areas of commonality, we also observed differences in correlation patterns between the pitch and prosodic tasks, particularly in the frontal cortex. Here, prosodic correlations extended more anteriorly to Brodmann’s areas 44, 45, and 46, which are implicated particularly in speech perception
(19) (
Figure 2, bottom). Other areas involved in the affective evaluation of speech, such as the prefrontal and orbitofrontal cortex
(21), also showed significant prosody-fractional anisotropy but not pitch-fractional anisotropy correlations (
Figure 2, bottom;
Figure 3 B). The somewhat greater severity of prosodic versus pitch identification deficits observed in our patient group may reflect the greater extent of brain involvement engaged by the prosodic identification task.
The finding of sensory-level correlations in patients with schizophrenia is consistent with well-replicated deficits in auditory processing that have been demonstrated with both electrophysiological (e.g., reference 37) and behavioral (e.g., reference 22) approaches. Structural imaging studies of the primary auditory cortex are conflicting, with some (38) but not all (39) studies finding reduced volume of superior temporal auditory regions. However, postmortem changes analogous to those observed in the prefrontal cortex have been demonstrated in the auditory cortex as well (40), supporting auditory cortical involvement in the pathophysiology of schizophrenia.
In our second experiment, large effect size deficits (d>1.1) were observed in nonaffective prosodic perception and were correlated with deficits in affective prosodic perception. Thus, for instance, the subjects had difficulty in differentiating statements and questions based upon tone of voice, just as they had difficulty in differentiating sad from happy utterances. Both types of deficits are related to impaired pitch perception abilities, suggesting significant audiosensory antecedents. These findings indicate that dysprosodia, rather than being associated purely with emotional perceptual disturbances in schizophrenia, affects broader aspects of cognitive and social communicatory functioning.
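To make the effect size statement concrete, the sketch below computes a pooled-standard-deviation Cohen's d from the Semantic Comprehension Task sensitivity (A′) values reported in the Results (patients: mean=0.83, SD=0.17, N=24; comparison subjects: mean=0.98, SD=0.03, N=17). The use of the pooled-SD formula and the choice of this particular measure are assumptions; the computation behind the reported d>1.1 is not described in the text.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d with a pooled standard deviation (assumed formula)."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Comparison subjects versus patients, Semantic Comprehension Task A'.
d = cohens_d(0.98, 0.03, 17, 0.83, 0.17, 24)
```

Under these assumptions the result is approximately 1.14, in line with the large effect size described above.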