During human interaction, individuals communicate information not only verbally but also through tone of voice. Individuals modulate their voices differently during different emotional states: for example, speaking quietly and with little animation when sad, speaking loudly and with great animation when happy, and shouting when angry (1, 2). Detection of this nonverbal information allows listeners to adjust their behavior accordingly and thus to perform adequately in social situations. Consequently, impairments in auditory emotion recognition (AER) and social cognition contribute strongly to poor psychosocial outcome in schizophrenia (3–5). The pattern and basis of AER deficits are an area of active research, as is the relationship between AER deficits in schizophrenia and more global forms of cognitive dysfunction. In the visual modality, the ability to recognize emotion from faces has been operationalized using well-validated face recognition tests, such as the Penn Emotion Recognition Task (ER-40) (6) and the Ekman 60 Faces Test (7). In the auditory modality, however, batteries for assessment of emotion recognition remain relatively underdeveloped, limiting opportunities for clinical assessment and research.
To date, multiple batteries have been used for studying AER impairments in schizophrenia, with no standardization of the psychoacoustic properties of stimuli across studies. Potentially as a result of this variation, different patterns of AER deficits have been reported (
8–
10), with some studies suggesting a generalized pattern of deficit (
11) and others hemisphere- (
12), emotion- (
13), or valence-specific (
14) patterns. We sought in this study to validate the use of a novel, psychoacoustically well-characterized auditory battery in a large group of schizophrenia patients as a means of investigating both the pattern and the underlying basis of social cognition impairments in schizophrenia.
In auditory communication, overlapping but distinct patterns of psychoacoustic features contribute to communication of discrete auditory emotional percepts (see reference
2 for a comprehensive review). For example, specific pitch-related vocal features such as mean pitch, pitch variability, and pitch contour are critical for communicating emotions such as happiness, sadness, and fear, whereas specific intensity-related aspects (i.e., loudness-related), such as mean voice intensity, intensity variability, and voice quality as reflected in the amount of high-frequency energy over 500 Hz, are particularly crucial for communicating the percept of anger.
Furthermore, several emotions may be communicated in more than one fashion (
15). For example, anger may be communicated by increased voice intensity (i.e., shouting, or “hot” anger) or by reduction in the mean base pitch of the voice without increasing intensity (irritation, or “cold” anger) (
1). Similarly, conveyance of strong happiness (“elation”) may depend on somewhat different cues than conveyance of weaker forms of happiness (
1) (for examples, see the data supplement that accompanies the online edition of this article). To date, auditory emotion batteries used in schizophrenia research have not distinguished the precise tonal features used by different actors in portraying specific emotions, potentially contributing to a heterogeneity of findings across studies.
In recent years, we (
16,
17) and others (
18,
19) have documented deficits in basic pitch perception (i.e., tone matching) in schizophrenia related to structural (
20) and functional (
21,
22) impairments at the level of the primary auditory cortex. In a recent emotion study (
23), we evaluated the performance of schizophrenia patients on a novel emotion recognition battery using psychoacoustically characterized stimuli and observed deficits in pitch-based but not intensity-based emotion processing. A similar pattern of deficit was observed more recently for emotion conveyed by frequency-modulated tones (
24). In the present study, we extend these findings to a larger patient sample and validate a brief version of the test for widespread use.
Along with the specific sensory-level contributions to AER, other potential contributors include emotion-level dysfunction and general cognitive impairments. In this study, we evaluated visual emotion recognition using the ER-40 test (
25), which includes emotions similar to those presented in our auditory emotion battery. We also included the processing speed index (PSI) from the WAIS-III, which contains tests that are thought to be particularly sensitive to generalized cognitive impairment in schizophrenia, such as the digit symbol subtest (
26,
27).
We hypothesized that patients would show emotion recognition deficits for stimuli in which emotion was conveyed primarily by pitch-based but not intensity-based measures and that correlations between basic pitch processing and AER would remain significant even after covariation for deficits in nonauditory emotion recognition and general cognitive function. We also evaluated the replicability of the findings across two separate performance sites and thus applicability to general schizophrenia and neuropsychiatric populations.
Method
Participants
The primary sample included 92 chronically ill patients with schizophrenia or schizoaffective disorder (Table 1). Patients were drawn from chronic inpatient units and residential care facilities associated with the Nathan S. Kline Institute in New York. All were receiving conventional or atypical antipsychotics at the time of testing. Diagnoses were determined by the Structured Clinical Interview for DSM-IV. Comparison subjects (N=73) were volunteers who were on the staff or who responded to local advertisements. The groups did not differ on mean parental Hollingshead socioeconomic status (28), which reflects level of education and employment on a scale of 0–100. As expected, the patient group had a lower socioeconomic status on average than both the comparison group (p<0.001) and their parents (p<0.0001).
Clinical assessments included ratings on the Positive and Negative Syndrome Scale (PANSS) (
29) and the problem-solving subscale of the Independent Living Scales (
30). All study procedures received approval from the Institutional Review Board at the Nathan S. Kline Institute. All participants had the procedure explained to them verbally before giving written informed consent.
In the replication sample (
Table 2), participants were drawn from psychiatric populations and both age-matched (N=31) and more general normative populations (N=188) associated with the University of Pennsylvania in Philadelphia. As no significant differences were observed between the two comparison groups on any of the dependent measures, the two groups were combined in statistical analyses.
Procedure
For the full version of the task, stimuli consisted of 88 audio recordings of native British English speakers conveying one of five emotions (anger, disgust, fear, happiness, and sadness) or no emotion, as previously described (2, 23). Acoustic features for these stimuli were measured using the Praat speech analysis software program (www.fon.hum.uva.nl/praat). (Sample stimuli are included in the online data supplement.)
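As an illustration of the kind of acoustic summary measures referred to above, the following is a minimal sketch and not the analysis pipeline used in the study. It assumes that a pitch (F0) contour and a waveform have already been extracted (for example, exported from Praat); the function names, the convention that a value of 0 marks an unvoiced frame, and the dB reference level are all assumptions made for this example.

```python
import math
import statistics


def summarize_pitch(f0_hz):
    """Summarize a pitch (F0) contour given as one value in Hz per frame.

    Assumption of this sketch: frames with a value of 0 are unvoiced
    and are ignored when computing the summary statistics.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean_pitch_hz": statistics.mean(voiced),
        "pitch_sd_hz": statistics.stdev(voiced),  # pitch variability
    }


def mean_intensity_db(samples, reference=1.0):
    """Root-mean-square level of a waveform in dB relative to `reference`."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        raise ValueError("silent waveform has no finite dB level")
    return 20.0 * math.log10(rms / reference)
```

Measures such as pitch contour shape or high-frequency energy would require additional spectral analysis and are deliberately left out of this sketch.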
For the brief version, a subset of 32 stimuli was selected incorporating all emotions except disgust, which was eliminated to decrease the number of choices and therefore the number of stimuli. Pitch-based stimuli (N=17) were selected based on a previous study showing that these stimuli were well recognized as expressing the intended emotion (i.e., >60% correct scores) (23). In addition, physical characteristics of the stimuli such as base pitch and pitch variability were close to the mean value for the intended emotion (Figure 1).
Stimuli selected as intensity based were confined to portrayals of anger and happiness (N=9) and differed from pitch-based stimuli of the same emotion on physical intensity measures, such as overall intensity or high-frequency energy. Intensity-based anger portrayals would all be recognized as loud based on overall intensity (
Figure 1) and thus represent “hot” anger as opposed to “cold” pitch-based anger. Pitch-based and intensity-based happiness differed in sound quality (high-frequency energy), with pitch-based stimuli showing features most characteristic of “happiness” and intensity-based happiness showing characteristics of “elation” as described by Banse and Scherer (
1).
Participants were tested on either the full battery with the original 88 items (67 schizophrenia patients and 32 comparison subjects) or the 32-item brief version (40 schizophrenia patients and 53 comparison subjects). Stimuli representing different emotions were intermixed and presented in a consistent order across subjects. After each stimulus, subjects were asked to identify the emotion (six possible alternatives in the full version; five in the brief version) as well as the intensity of portrayal on a scale of 1 to 10. A limited number of participants received both the full and abbreviated versions of the task (15 schizophrenia patients and 12 comparison subjects). In this subgroup, no significant group-by-version interaction was observed. For subsequent analyses, these participants were counted only once, with data from the full battery being used for statistical analysis.
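The scoring implied by this procedure can be sketched as follows. This is an illustrative example only: the trial record layout (the keys "feature", "intended", and "response") and the function names are hypothetical and are not taken from the study materials. The chance level follows directly from the number of response alternatives (six in the full version, five in the brief version).

```python
from collections import defaultdict


def chance_level(n_alternatives):
    """Expected proportion correct when guessing among equally likely choices."""
    return 1.0 / n_alternatives


def accuracy_by_feature(trials):
    """Return the proportion of correct responses per stimulus feature class.

    Each trial is a dict with hypothetical keys:
      "feature"  - "pitch" or "intensity"
      "intended" - the emotion the actor portrayed
      "response" - the emotion the participant chose
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for trial in trials:
        total[trial["feature"]] += 1
        if trial["response"] == trial["intended"]:
            correct[trial["feature"]] += 1
    return {feature: correct[feature] / total[feature] for feature in total}
```

Per-participant scores of this kind, split by feature class, are the sort of values that would enter a group-by-feature analysis.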
Tone matching was assessed using pairs of 100-msec tones in series, with a 500-msec intertone interval. Within each pair, tones (50% each) either were identical or differed in frequency by a specified amount in each block (2.5%, 5%, 10%, 20%, or 50%). Participants indicated by keypress whether the pitch was the same or different. Three base frequencies (500, 1000, and 2000 Hz) were used within each block to avoid learning effects. In all, the test consisted of five sets of 26 pairs of tones.
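The structure of the tone matching test described above can be sketched in code. This is a minimal illustration under stated assumptions, not the software used for testing: the text does not specify how base frequencies were assigned to trials or whether the differing tone was higher or lower, so the sketch draws the base frequency at random and always makes the second tone higher. All names are hypothetical.

```python
import random

TONE_DURATION_MS = 100
INTERTONE_INTERVAL_MS = 500
BASE_FREQUENCIES_HZ = (500, 1000, 2000)
BLOCK_DIFFERENCES = (0.025, 0.05, 0.10, 0.20, 0.50)  # one difference per block
PAIRS_PER_BLOCK = 26


def make_block(difference, rng):
    """Build one block: half identical pairs, half differing by `difference`."""
    half = PAIRS_PER_BLOCK // 2
    kinds = ["same"] * half + ["different"] * half
    rng.shuffle(kinds)
    trials = []
    for kind in kinds:
        base = rng.choice(BASE_FREQUENCIES_HZ)
        # Assumption: the second tone is higher; the direction is not stated.
        second = base if kind == "same" else base * (1.0 + difference)
        trials.append({"first_hz": base, "second_hz": second, "answer": kind})
    return trials


def make_test(seed=0):
    """Return five blocks of 26 tone pairs, one block per frequency difference."""
    rng = random.Random(seed)
    return [make_block(difference, rng) for difference in BLOCK_DIFFERENCES]
```

Fixing the seed makes the trial order reproducible across participants, which is one simple way to keep presentation consistent.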
Recognition of visual emotion was assessed using the ER-40 (
31–
33). Global cognitive functioning was assessed using the WAIS-III PSI, which includes the widely studied digit symbol coding subtest (
27) and the symbol search subtest.
Data Analysis
The accuracy of AER was assessed using repeated-measures analysis of variance with either emotion (happiness, sadness, anger, fear, disgust, no emotion) or feature (pitch, intensity) as within-subject factors and group (schizophrenia group, comparison group) as a between-subject factor. The relationship between emotion identification and specific predictors was assessed using analysis of covariance (ANCOVA) with tone matching, ER-40, and PSI included as potential covariates. Task version (full or brief) was also included in the analysis to remove variance associated with this factor.
Structural equation modeling was used to further investigate the pattern of correlation observed with regression analysis and to query both directionality of relationship and covariance among measures within the context of the overall correlation pattern. Alternate models were accepted if they led to a significant reduction in variance as measured using the chi-square goodness-of-fit parameter. Effect size measures were interpreted according to the convention established by Cohen (
34). All statistical tests were two-tailed, with alpha set at 0.05, and were computed using SPSS, version 18.0 (SPSS, Chicago).
Results
Full Version
On the full battery, patients showed a highly significant impairment of large effect size across stimuli (F=25.4, df=1, 97, p<0.00001; d=1.1) (Figure 2). In contrast, the group-by-emotion interaction (F=2.09, df=5, 93, p=0.07) fell short of statistical significance, suggesting statistically similar deficits across emotions. Despite their impairments, patients performed well above chance levels for all emotions.
When stimuli were divided according to underlying feature (pitch or intensity), a highly significant group effect was again observed (F=14.4, df=1, 97, p=0.0003), as well as a highly significant group-by-feature interaction (F=7.79, df=1, 97, p=0.006). Follow-up t tests showed a highly significant difference in detection of pitch-based emotion (t=4.51, df=97, p<0.0001) but no significant difference in detection of intensity-based emotion (t=1.44, df=97, p=0.15) (
Figure 3). Mean performance levels across groups were not significantly different for pitch versus intensity stimuli, suggesting that group interactions were not due to floor or ceiling effects but that comparison subjects were better able to discriminate emotions when emotion-relevant pitch information was present, whereas patients were not.
Brief Version
A second group received the 32-item brief version of the task. In the brief version, the main effect of group (F=18.3, df=1, 91, p<0.0001) and the group-by-feature interaction (F=9.61, df=1, 91, p=0.003) were again significant, with significant between-group differences for pitch-based (t=5.32, df=91, p<0.0001, d=1.1) but not intensity-based (t=1.59, df=91, p=0.11, d=0.33) stimuli (
Figure 3). In the brief battery, as in the full battery, there was no significant group-by-emotion interaction.
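For readers who wish to relate the reported t statistics to the reported effect sizes, the standard conversion for two independent groups is d = t × sqrt(1/n1 + 1/n2). The following sketch applies it to the brief-version group sizes given in the Procedure section (40 patients, 53 comparison subjects); the function names are illustrative, and the labels follow Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large).

```python
import math


def cohens_d_from_t(t, n1, n2):
    """Convert an independent-samples t statistic to Cohen's d."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


def cohen_label(d):
    """Label an effect size using Cohen's conventional benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


pitch_d = cohens_d_from_t(5.32, 40, 53)      # pitch-based stimuli
intensity_d = cohens_d_from_t(1.59, 40, 53)  # intensity-based stimuli
```

Under these assumptions the conversion gives values of about 1.11 and 0.33, in line with the effect sizes reported above for pitch-based and intensity-based stimuli.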
Relative Contributions of Sensory and General Cognitive Dysfunction
In addition to deficits in AER, patients showed highly significant deficits in the tone matching test, the ER-40, and the PSI (
Figure 4). As predicted, the correlation between tone matching and AER was highly significant both across groups (r=0.56, N=164, p<0.0001) (
Figure 5A) and within patients (r=0.47, N=91, p<0.0001) and comparison subjects (r=0.34, N=73, p=0.004) independently.
To evaluate the relative contribution of these measures, an ANCOVA was conducted incorporating group as a between-subject factor and the tone matching test, ER-40, and PSI as potential covariates. Both tone matching performance (F=8.72, df=1, 117, p=0.004) and PSI (F=12.9, df=1, 117, p<0.0001) correlated significantly with AER performance, whereas the correlation with ER-40 was nonsignificant. After effects of tone matching performance and PSI were accounted for, the main effect of group was no longer significant.
Finally, inclusion of these factors into a path analysis yielded a strong model confirming both tone matching performance and PSI as mediators of the group effect on AER and showing an interrelationship between tone matching performance and PSI. In the path analysis, a significant relationship between auditory and visual emotion recognition was observed, with AER predicting performance on ER-40 (
Figure 5B).
Validation of Pitch Versus Intensity Dichotomy
Tone matching measures were also used to validate the psychoacoustic dichotomization of stimuli into pitch based versus intensity based. An ANCOVA conducted across groups with tone matching performance as a covariate showed not only a significant effect of tone matching (F=18.4, df=1, 159, p<0.0001) but also a significant tone matching-by-feature interaction (F=4.25, df=1, 159, p=0.041) reflecting a significantly stronger relationship between tone matching performance and accuracy in identifying pitch-based stimuli (F=30.8, df=1, 161, p<0.0001) than between tone matching performance and accuracy in identifying intensity-based stimuli (F=8.43, df=1, 161, p=0.002). When analyses were restricted to happy stimuli alone, an even stronger dissociation was observed, with a significant tone matching-by-feature interaction (F=10.2, df=1, 157, p=0.002) and a significant relationship between tone matching performance and performance for pitch-based (F=19.6, df=1, 159, p<0.0001) but not intensity-based stimuli. Within patients alone, significant correlations were observed between tone matching performance and ability to detect pitch-based happiness (r=0.38, df=90, p<0.0001) and anger (r=0.30, df=65, p=0.017), but not intensity-based emotions.
“Cold” Versus “Hot” Anger
Pitch versus intensity analyses were also conducted separately for both anger and happiness, both of which may be conveyed by either pitch or intensity modulation (
Figure 1). Patients showed significant deficits in detection of anger conveyed by pitch modulation (“cold anger,” irritation) (t=2.51, p=0.014), but not by intensity (“hot” anger), although the group-by-feature interaction only approached significance (F=3.38, p=0.07). Similarly, patients showed significant deficits in detection of happiness conveyed primarily by pitch (t=2.57, p=0.011) but not intensity (“elation”) modulation (see Table S1 in the online data supplement).
Auditory Versus Visual Emotion Recognition
On the ER-40 (see Table S2 in the online data supplement), patients showed significant impairments in detection of sadness (p=0.003), fear (p<0.001), and no emotion (p=0.003), with deficits in detecting happiness (p=0.07) and anger (p=0.06) approaching significance. When correlations between AER and ER-40 were conducted for individual emotions within patients (see Table S3 in the online data supplement), the strongest correlations were found within emotion (mean r=0.33, p<0.01), with lower correlations across emotion (mean r=0.12, n.s.).
Correlation With Symptoms and Outcome
Deficits in AER correlated significantly with the cognitive factor of the PANSS (r=–0.33, p=0.003) but not with other PANSS factors. Deficits in emotion processing also correlated with standardized scores on the problem-solving subscale of the Independent Living Scales (r=0.26, p=0.017). Correlations with medication dosage, as assessed using chlorpromazine equivalents, were nonsignificant across all emotions.
Replication Sample
In the replication sample (Table 2), as in the primary group, there was a highly significant main effect of group (F=42.4, df=1, 253, p<0.0001; d=1.49), along with a significant group-by-feature interaction (F=6.35, df=1, 253, p=0.012). In addition, tone matching performance significantly predicted AER performance over and above the effect of group (F=24.2, df=1, 249, p<0.0001). In contrast, as in the primary sample, the group-by-emotion interaction was not significant (see Table S4 in the online data supplement). The reliability of the measures across samples based on intraclass correlation was 0.97 for patients and 0.96 for comparison subjects.
Discussion
Impairments in social cognition are among the greatest contributors to social disability in schizophrenia (
25,
32,
35,
36). Operationally, these deficits are defined based on inability to infer emotion from both facial expression and auditory perception. Although well-validated batteries have been developed to assess visual aspects of social cognition (
31,
37), auditory batteries remain highly variable, with limited standardization across studies (
9). Moreover, the relative contributions of specific sensory features and more generalized cognitive performance remain largely unknown.
We assessed AER deficits in two independent samples of patients and comparison subjects using a novel, well-characterized battery in which the physical features of the stimuli were analyzed and in which stimuli were divided a priori according to physical stimulus features that contribute most strongly to the emotional percept. In addition to strongly confirming the AER deficit in schizophrenia that we observed previously (
23), this study provides the first demonstration of a specific sensory contribution to impaired AER that remains significant even when more general emotional and cognitive deficits are considered. Finally, we provide both a general and a brief AER battery for study across neuropsychiatric disorders.
In the battery, angry and happy stimuli were divided a priori into pitch- versus intensity-based exemplars based on physical stimulus features. As we have previously observed both with these stimuli (
23) and with synthesized frequency-modulated tones designed to reproduce the key physical characteristics of emotional prosody (
24), patients show greater deficit in emotion recognition when emotional information is conveyed by modulations in pitch rather than intensity. Significant group-by-stimulus feature interactions were found for both the full and brief versions of the battery and in both the primary and replication samples. The battery thus provides a replicable method both for characterizing sensory contributions to AER impairments in schizophrenia and for comparing specific patterns of dysfunction across neuropsychiatric illnesses.
In addition to differential analysis of deficits by pitch-based versus intensity-based characterization, we analyzed AER relative to tone matching performance, which provides an objective index of auditory sensory processing ability, and relative to both face emotion recognition (ER-40) and WAIS-III PSI, which provide measures of visual emotion and general cognitive dysfunction in schizophrenia, respectively (
27,
38). The relative contributions of these measures to AER deficits were assessed using both multivariate regression and path analysis.
All three sets of measures (tone matching, ER-40, PSI) showed highly significant independent correlations to AER function across groups, with no further difference observed in AER function between schizophrenia patients and comparison subjects once these factors were taken into account. Approximately equal contributions were found for tone matching and PSI (
Figure 5B), with AER deficits in turn predicting impairments in ER-40. In addition, when correlations were analyzed between auditory and visual emotion recognition batteries, correlations were strongest within rather than across emotions, suggesting some shared emotional processing disturbance in addition to contributions of specific sensory deficits. Similar findings were obtained in the replication sample, in which group membership, tone matching, and ER-40 performance all contributed significantly and independently to AER performance.
Finally, deficits in AER also correlated with score on the problem-solving subscale of the Independent Living Scales, a proxy measure for functional capacity (39, 40). Remediation of deficits in basic auditory processing has recently been found to improve global cognitive performance as well, as measured using the Measurement and Treatment Research to Improve Cognition in Schizophrenia (MATRICS) Consensus Cognitive Battery (41). Our results suggest that sensory-based remediation, along with specific emotion-based remediation, may be most useful for addressing social cognitive impairments in schizophrenia.
Based on our findings in this study, we propose that greater attention should be given to the physical characteristics of stimuli used for assessment of social cognition deficits not only in schizophrenia but also across neuropsychiatric disorders. Thus, for example, autism spectrum disorders are associated with AER deficits as indicated by performance on batteries similar to those used in schizophrenia (
42). However, the specific pattern of deficit may differ from that in schizophrenia. Autism spectrum patients are reported to show most pronounced deficits in vocal perception of anger, fear, and disgust, with relatively spared perception of sadness (
42). Our study suggests that dissociation across emotion in schizophrenia is not observed once the physical nature of the stimuli is considered. Comparison across populations, however, would be facilitated by use of a consistent battery with well-described physical features, such as the one we used in this study, in order to allow identification of the relative determinants of social cognition deficits across conditions.
Although our battery represents a significant advance over previous batteries, some limitations remain. First, actors were not coached to emphasize specific features when portraying an emotion, so critical stimulus parameters had to be deduced post hoc. Batteries in which actors purposely try to convey emotion by modulation of specific tonal or intensity-based features would give us even more ability to evaluate the differential mechanisms of emotion recognition dysfunction across diagnostic groups. Second, we used primarily a chronic, medicated patient population. Studies with prodromal or first-episode patients are needed to further delineate the temporal course of emotion recognition function relative to more basic impairments in tone matching ability. Third, although the pattern of results in this study is similar to what we observed previously with this battery (23), formal psychometric properties of the battery, such as test-retest reliability and sensitivity to change following intervention, remain to be determined. Fourth, the actors included in this battery spoke with a British accent, which may have influenced the results. Studies using actors speaking in a local accent would be desirable. Finally, other components of interpersonal interaction may also communicate emotion, such as body movement, context, and the verbal content of language. These were not tested in the present study.
In summary, deficits in social cognition are now well recognized in schizophrenia, although underlying mechanisms are yet to be determined. This study highlights substantial deficits in the ability of schizophrenia patients to decode specific stimulus features, such as pitch modulations, in interpreting emotion, leading to overall impairments in auditory emotion recognition. These deficits correlated with more basic impairments in sensory processing even when general cognitive and nonauditory emotion deficits were taken into account. These findings highlight the importance of sensory impairments, along with more general cognitive measures, as a basis for social disability in schizophrenia. In the short term, such deficits must be considered during interactions with patients, and both clinicians and caregivers should be aware that patients may simply be unable to perceive the acoustic features in speech that permit normal social interaction. In the long term, such deficits represent appropriate targets for both behavioral and pharmacological intervention.