Multiple studies support the view that neurocognitive alterations underlying schizophrenia are best explored by examining patient groups who share specific symptom presentations
(1–4). The current study used this research strategy by investigating the perceptual/cognitive mechanism of auditory hallucinations.
Auditory hallucinations are a common but by no means universal symptom occurring in schizophrenia. Andreasen and Flaum
(5) described two Iowa samples of schizophrenic patients diagnosed through use of DSM-III-R criteria and found base rates for auditory hallucinations of 70% and 56%. These rates fell in the same range as those demonstrated in an earlier cross-cultural study of schizophrenia conducted by the World Health Organization
(6).
The content of auditory hallucinations in schizophrenia generally consists of speech or “voices.” For some, speech hallucinations emerge only during an acute psychotic episode, whereas for others, this symptom persists for years. A subgroup of patients experiences speech hallucinations as distressing and behaviorally disruptive, especially when these hallucinations have a recent onset or express negative content
(7, 8).
The neurocognitive basis of speech hallucinations remains poorly understood. A commonly accepted view is that speech hallucinations are verbal thoughts misidentified as deriving from external sources
(9, 10). One study comparing hallucinating and nonhallucinating patients found frontal underactivation in the former group when subjects imagined hearing speech spoken by others
(11). The authors suggested that these brain activation failures “might contribute to a less secure appreciation” of the actual source of verbal thoughts. However, there are no studies linking frontal underactivation to difficulties in identifying the source of thoughts or images. A single photon emission computerized tomography (SPECT) study found greater blood flow in Broca’s area among schizophrenic patients during active illness with speech hallucinations than during a later period of time when these symptoms had remitted
(12). Because Broca’s area activation has been associated with production of inner speech or verbal thoughts, these data support the hypothesis that speech hallucinations are related to these experiences. A study by Silbersweig et al.
(13) used the superior temporal resolution of H₂[¹⁵O] positron emission tomography (PET) to assess regional blood flow specifically when speech hallucinations occurred. This methodology did not demonstrate activation of Broca’s area during speech hallucinations, thereby challenging the view that these hallucinations are misidentified verbal thoughts.
An alternative hypothesis is that speech hallucinations derive from abnormalities in speech perception systems. The two distinguishing characteristics of speech hallucinations—that the content consists of spoken speech and is attributed to nonself sources—are then accounted for by the fact that these systems ordinarily produce percepts of external speech. In support of this view are perceptual studies demonstrating that patients with speech hallucinations are especially prone to experience meaningless sounds as meaningful speech and to misperceive speech with reduced phonetic clarity
(14, 15). A study of temporal region activation that used functional magnetic resonance imaging suggested that speech hallucinations compete with external speech for neurophysiological resources
(16). The study by Silbersweig et al. cited earlier identified activation of left temporoparietal association areas coincident with speech hallucinations
(13), and a SPECT study found left superior temporal activation to be associated with speech hallucinations
(17). Insofar as dominant hemisphere temporal and parietal areas participate in decoding spoken speech, these studies suggest that speech hallucinations arise from neurocircuitry underlying speech processing
(18).
Working memory has been defined as a brain system that temporarily stores and processes information to direct ongoing cognitive processes
(19). Neural network computer models of speech perception use a specialized working memory
(20–22). These models are based on the observation that ordinary speech produced at normal rates has significant acoustic ambiguity because of blurring of phonetic information, the absence of pauses between words, and background sounds
(23, 24). Syntactic and semantic expectations, generated by prior word sequences processed in a specialized working memory, are used by human listeners normally to “fill in the blanks” when perceiving acoustically ambiguous speech
(24–28). For instance, when listening to the two words in sequence, “John chases . . . ,” we learn to expect that a noun representing a moveable entity (e.g., “cat” or “Jane”) will follow, rather than a stationary entity (e.g., “hat”) or a verb. Sequential linguistic expectations of this sort constrain dramatically the range of alternative word interpretations when processing continuous speech with reduced clarity. As a result, streams of speech sounds are more readily translated into correct sequences of word percepts.
A speech perception neural network model reported by Hoffman and McGlashan also simulated a postnatal neurodevelopmental process believed to occur in mammalian brain systems
(22). Some evidence suggests a process of “neural Darwinism” in which “weaker” synaptic connections between neurons are selectively eliminated in biological neural networks
(29). When applied to the working memory component of speech perception simulations, this modification sharpened serial linguistic expectations. As a result, the network’s ability to disregard irrelevant inputs and to “fill in the gaps” when processing ambiguous speech was enhanced
(22). These attentional capacities arose not from a peripheral input “filter” but from the network’s ability to weigh the salience of new input information relative to what was expected linguistically.
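The selective elimination of weaker connections can be pictured with a very small sketch. The rule used here, zeroing any connection whose absolute weight falls below a threshold, is an assumption made for illustration; the text above does not specify the exact pruning rule used in the simulations, and the function names are hypothetical.

```python
def prune_weak_connections(weights, threshold):
    """Return a copy of `weights` with weak connections removed.

    ASSUMPTION: "weak" means absolute weight below `threshold`.
    `weights` is a list of rows, each a list of floats.
    """
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def connectivity(weights):
    """Fraction of connections that remain nonzero after pruning."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w != 0.0) / len(flat)


# Toy working-memory weight matrix (values are illustrative only).
toy = [[0.9, -0.05], [0.2, -0.7]]
pruned = prune_weak_connections(toy, threshold=0.1)
```

Raising `threshold` lowers `connectivity(pruned)`, which loosely corresponds to pruning more aggressively in the simulations.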
When reductions in network connectivity went beyond a certain threshold, serial linguistic expectations became exaggerated and disordered. As a result, word percepts emerged in the absence of any “phonetic input,” thus simulating hallucinations. We hypothesized that hallucinated voices reported by schizophrenic patients arise from a similar mechanism. Speech perception networks that spontaneously “hallucinated” also demonstrated impairments when processing simulated “narrative speech”
(21, 22) because of misdirected and exaggerated working memory linguistic expectations, which derailed perceptual processing. Consequently, fewer words were correctly detected and more perceptual errors were produced.
A speech tracking task was developed to test this model in humans. Subjects shadowed (repeated while simultaneously listening to) narrative speech accompanied by multispeaker phonetic noise, or “babble.” As predicted, schizophrenic patients reporting voices demonstrated narrative speech perception impairments relative to nonhallucinating schizophrenic and normal subjects
(21).
The study reported in this article expanded these findings by studying a larger patient group through use of the speech tracking task. Our neural network simulation predicted that speech hallucinations should arise from a verbal working memory system that generates disordered and inappropriate serial linguistic expectations during speech processing. Consequently, a verbal memory task that requires use of appropriate semantic/syntactic expectations was also given to subjects. Other assessments of verbal working memory have been used to study schizophrenic patients
(30). However, these studies have not specifically assessed memory processes that are guided by serial linguistic expectations. Finally, an auditory continuous performance task using nonspeech tones was administered
(31). This version of the auditory continuous performance task has been shown to be sensitive to schizophrenic cognitive disturbances
(31) and does not use working memory insofar as responses were not dependent on previous input sequences. This study design permitted us to ask two questions:
1. Do speech processing impairments of the sort predicted by the neural network model differentiate hallucinating and nonhallucinating patients?
2. Are speech processing differences between hallucinating patients and nonhallucinating patients still detectable when one controls for effects of nonlanguage cognitive impairment?
Affirmative answers to these questions would support the hypothesis that reports of voices by schizophrenic patients are not only bizarre descriptions of self-experience but indicators of altered neurocircuitry dedicated to speech processing.
METHOD
Subjects
All patients admitted to two acute inpatient psychiatric units over an 18-month period and meeting inclusion criteria were recruited to participate in the study. Patients were carefully observed and assessed by research and clinical staff. Twenty-eight patients reporting speech hallucinations within 1 week before testing and 26 patients not reporting speech hallucinations over this time were studied. None of the patients or normal subjects was in our prior study of speech hallucinations
(21). Each patient received a DSM-IV diagnosis of a schizophrenia spectrum disorder (schizophrenia, schizoaffective disorder, or schizophreniform disorder) on the basis of the Comprehensive Assessment of Symptoms and History
(32). The Comprehensive Assessment of Symptoms and History also provided symptom ratings for delusions, hallucinations, positive thought disorder, and negative symptoms. The criterion for inclusion in the hallucinatory group was a score of 2 or greater on the hallucinations subscale of the Comprehensive Assessment of Symptoms and History (“mild but definitely present”) based on a report of voices within 1 week before testing. The presence of speech hallucinations was confirmed by reports from clinical staff and chart review. Patients with a score of 1 (“uncertain”) for hallucinations were excluded from the study. Six patients in the nonhallucinatory group had a history of hearing voices an average of 18 months before testing. These individuals produced results very similar to those of other nonhallucinating patients and were retained in that group. Thus, “hallucinating” and “nonhallucinating” refer to the presence or absence of hallucinated voices during the psychotic episode that led to the index hospitalization rather than the time of testing per se.
Twenty-six normal subjects were also studied. These subjects were recruited primarily from temporary employment agencies. Normal subjects were excluded if they had a history of psychiatric treatment or were receiving medication that might impair concentration.
The following exclusion criteria were used for all groups: 1) history of a neurological disorder, active substance abuse disorder, or ECT within the previous 6 months; 2) estimated IQ less than 80; 3) history of hearing impairment; 4) history of developmental articulation or language disorder; 5) first language not American English; and 6) age less than 18 years. Patients were not recruited into the study until their symptoms had improved to a degree that permitted them to engage actively in the experimental tasks. All subjects gave written informed consent to participate in the study after procedures were fully explained. Each was paid a small sum. Success in enrolling patients in the study who met inclusion/exclusion criteria was greater than 90%.
Given that the “late disappearance” half-life of neuroleptics is 30–40 hours
(33), quantification of neuroleptic exposure was averaged over the 5-day period before testing. Neuroleptic medication dose was converted to chlorpromazine equivalents per day by using conversion formulae provided by Davis
(34); 75 mg of clozapine and 1.5 mg of risperidone were assumed to be equivalent to 100 mg of chlorpromazine.
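The dose conversion and 5-day averaging can be sketched as follows. Only the two ratios stated above are encoded; the Davis conversion formulae for other neuroleptics are not reproduced here, and the function names and input layout are hypothetical.

```python
# Milligrams of each drug assumed equivalent to 100 mg of chlorpromazine.
# Only the ratios given in the text are included.
MG_PER_100_CPZ = {
    "chlorpromazine": 100.0,
    "clozapine": 75.0,
    "risperidone": 1.5,
}


def cpz_equivalent(drug, dose_mg):
    """Convert one dose to chlorpromazine-equivalent milligrams."""
    return dose_mg * 100.0 / MG_PER_100_CPZ[drug]


def mean_daily_cpz(daily_doses, window_days=5):
    """Average chlorpromazine equivalents per day over the last `window_days`.

    `daily_doses` is a list (oldest day first) of per-day lists of
    (drug, dose_mg) tuples. ASSUMPTION: days with no medication are
    represented by an empty list and count as zero exposure.
    """
    recent = daily_doses[-window_days:]
    totals = [sum(cpz_equivalent(d, mg) for d, mg in day) for day in recent]
    return sum(totals) / len(totals)
```

For example, 300 mg of clozapine per day for 5 days corresponds to 400 mg of chlorpromazine equivalents per day under these ratios.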
Assessment Instruments
Subjects first underwent audiometric screening. Any subject not able to detect tones at 500 Hz, 1000 Hz, or 2000 Hz at 25 dB in either ear was excluded from the study. Five hallucinating patients, two nonhallucinating patients, and one normal subject were excluded from the study on the basis of this criterion.
Masked speech tracking task stimuli were created in collaboration with Haskins Speech Laboratory. Target stimuli consisted of narratives from fiction and popular magazines ranging in length from 90 to 135 words with high levels of familiarity according to a frequency study of the American English lexicon
(35). Narrative texts were read by one of two speakers (one male and one female) at a mean rate of 2.20 words/second (SD=0.21) and were digitally recorded. Phonetic “babble” was created by mixing speech of six additional male and six female adults reading emotionally neutral texts. This produced a steady stream of unintelligible speech sounds from which no words could be reliably discerned. Phonetic babble is an optimal mask for experimentally reducing clarity of speech sounds
(23). Speech stimuli were combined with babble at a low level (four narratives), at a moderate level (three narratives), and at a high level (three narratives) and were digitally stored on audiotape. Final versions of babble-masked stimuli were reproduced binaurally by using a digital audiotape recorder. A more detailed description of this instrument has been reported elsewhere
(21).
Subjects were instructed to “shadow” narrative passages, i.e., to verbally repeat what they heard as they simultaneously listened to spoken speech. Two practice segments in the low noise condition were presented. Passages were then presented in two blocks of four texts generated by the same speaker, starting with the low noise condition and building up to the high noise condition. This allowed the listener to establish familiarity with the voice to be shadowed before more difficult, higher noise conditions. Order of presentation of male and female blocks was randomly varied from subject to subject. Verbal responses were recorded by means of a lapel‐clip microphone and audiocassette recorder. Hallucinating patients were asked if they heard speech hallucinations during masked speech tracking task performance. Only one patient answered affirmatively; many of these patients had stopped reporting speech hallucinations by the day of testing.
Scoring Speech Tracking Performance
Verbal productions during “tracking” were transcribed off-line and underwent a word-by-word content analysis. The total number of words correctly reproduced from the text yielded a word detection rate. Words substituted into texts by subjects were classified into two groups: “motivated” and “unmotivated.” Motivated substitutions corresponded to words that were plausibly derived from semantic or syntactic expectations generated by prior target text as determined by the scorer. These substitutions were viewed as normal responses to reduced phonetic clarity. Word substitutions not meeting this criterion were identified as unmotivated. Articulation errors (corresponding to nonwords nearly identical to a target word but with mispronunciation of one to two phonemes) and bizarre nonwords (nonwords that were not articulation errors) were also scored. Articulation errors were commonly produced by normal speakers and were not considered to be pathological. Scoring was conducted blind to group membership or other identifying information. Interjudge reliability was estimated by using 10 protocols (four hallucinating patients, three nonhallucinating patients, and three normal subjects). The intraclass correlation coefficient demonstrated excellent reliability (RI≥0.90) for word detection rate, motivated word substitutions, unmotivated word substitutions, and bizarre nonwords and good reliability for articulation errors (RI=0.64). For this report, data analysis was limited to word detection rate and a single combined score reflecting more serious perceptual errors (unmotivated word substitutions plus bizarre nonwords).
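The two scores retained for analysis can be written out as a short sketch. The classification of each substitution still depends on a human scorer; the code only combines tallies. The class and field names are hypothetical, and expressing the word detection rate as a percentage of target words is an assumption based on how the results are reported.

```python
from dataclasses import dataclass


@dataclass
class TrackingTally:
    """Hypothetical per-narrative tallies produced by a human scorer."""
    target_words: int
    correct_words: int
    motivated_substitutions: int
    unmotivated_substitutions: int
    articulation_errors: int
    bizarre_nonwords: int


def word_detection_rate(t):
    """Percent of target-text words reproduced correctly."""
    return 100.0 * t.correct_words / t.target_words


def serious_error_score(t):
    """Combined score: unmotivated substitutions plus bizarre nonwords.

    Motivated substitutions and articulation errors are deliberately
    excluded, since they are treated as normal responses.
    """
    return t.unmotivated_substitutions + t.bizarre_nonwords


example = TrackingTally(120, 90, 5, 3, 2, 1)
```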
Other Tasks
To test for nonlanguage cognitive impairment, an auditory continuous performance task developed by Mirsky et al.
(31) was also administered to subjects. The task used three tones (640, 1000, and 1600 Hz) administered binaurally with signal intensity of approximately 90 dB. The subject was instructed to push a button when the highest pitched tone was heard. Two variables were computed—the number of target tones identified correctly and the number of “false alarm” responses.
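As a minimal sketch of how the two continuous performance task variables could be tallied, assume each trial is recorded as the tone frequency presented and whether the button was pressed. The trial format and function name are hypothetical.

```python
TARGET_HZ = 1600  # highest of the three tones (640, 1000, and 1600 Hz)


def score_cpt(tones_hz, pressed):
    """Return (hits, false_alarms) for parallel lists of trials.

    A hit is a button press on a target tone; a false alarm is a
    button press on either nontarget tone.
    """
    hits = sum(1 for t, p in zip(tones_hz, pressed) if p and t == TARGET_HZ)
    false_alarms = sum(1 for t, p in zip(tones_hz, pressed) if p and t != TARGET_HZ)
    return hits, false_alarms
```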
To assess effectiveness of serial linguistic expectations, the Benton-Hamsher sentence repetition test was administered
(36). Subjects were asked to repeat grammatical sentences ranging from three to 18 words in length immediately after listening to them on headphones. Longer sentences force subjects to rely on grammatical structure to achieve full recall. Scoring was based on the longest sentence accurately repeated, i.e., all content words correctly reproduced.
Statistical Analyses
Masked speech tracking task word detection rate demonstrated significant deviations from a normal distribution with a large positive kurtosis (mean=1.99) and negative skew (mean=–1.11) for patients. These data were normalized by subtracting each score from 100 (thereby calculating the “miss rate”), adding one to permit logarithmic transformation of perfect scores (i.e., miss rate of zero), and calculating the logarithm, yielding an adjusted mean kurtosis of 0.05 and an adjusted mean skew of 0.00. The same transformation applied to the continuous performance task hit rate reduced negative skewness from –1.26 to –0.16. Error scores for the three masked speech tracking task noise conditions and the continuous performance task false-positive score demonstrated large positive skewness (mean=2.68) that was partially normalized by square root transformation (adjusted mean skew=0.91).
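The two transformations can be sketched as below. The base of the logarithm is not stated, so base 10 is assumed here; the choice of base rescales the values but does not change skewness or kurtosis. Function names are hypothetical.

```python
import math


def log_miss_rate(percent_detected):
    """Log-transformed miss rate for a percent-correct score.

    Steps: miss rate = 100 - score; add 1 so a perfect score
    (miss rate 0) has a defined logarithm; take the log.
    ASSUMPTION: base-10 logarithm.
    """
    miss_rate = 100.0 - percent_detected
    return math.log10(miss_rate + 1.0)


def sqrt_error_score(error_count):
    """Square root transformation applied to positively skewed error counts."""
    return math.sqrt(error_count)
```

A perfect score of 100 maps to 0, and lower detection rates map to larger values, so the direction of the scale is reversed relative to the raw percentages.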
Repeated measures analyses of variance (ANOVAs) were used to calculate overall effects. For variable sets showing a significant group effect, one-way ANOVAs were calculated for individual variables with pairwise comparisons of groups that used the Tukey B test with alpha set at 0.05.
Hallucinating patients were found to have a lower mean education level than nonhallucinating patients. In order to make these two groups comparable in terms of this variable, data from hallucinating patients with less than a 12th-grade education level and nonhallucinating patients with greater than a 17th-grade education level were not included in the analyses described later. After the two schizophrenic groups were combined, Pearson correlations between the four symptom measures (hallucinations, negative symptoms, delusions, thought disorder) and the six masked speech tracking task variables and the sentence repetition variable were calculated. A Bonferroni correction was used, and consequently, a minimum individual analysis p value of 0.002 (0.05 divided by 28) was required.
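The Bonferroni threshold follows from the number of correlations tested: four symptom measures crossed with seven task variables. The arithmetic can be checked with a short sketch; the variable labels are illustrative names, not identifiers from the study.

```python
symptom_measures = ["hallucinations", "negative symptoms", "delusions", "thought disorder"]

# Six masked speech tracking variables (two scores x three noise levels)
# plus the sentence repetition score.
task_variables = [
    f"{score}_{noise}"
    for score in ("word_detection", "errors")
    for noise in ("low", "moderate", "high")
] + ["sentence_repetition"]

n_tests = len(symptom_measures) * len(task_variables)
alpha = 0.05
per_test_threshold = alpha / n_tests  # about 0.0018, reported as 0.002
```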
RESULTS
Table 1 summarizes subject characteristics. Groups did not differ statistically with regard to age (F=0.89, df=2, 68), gender (χ²=0.79, df=2), education level (F=0.48), or parental socioeconomic class (F=0.33). Racial distribution showed borderline significant group differences, with the normal group reflecting a somewhat larger number of African Americans (χ²=9.16, df=4, p<0.06). Hallucinating and nonhallucinating patients did not differ significantly with respect to age at first hospitalization, number of prior hospitalizations, neuroleptic exposure, thought disorder, delusions, and negative symptoms.
A repeated measures ANOVA applied to percent of words detected correctly in the three noise conditions revealed a significant overall effect for group (F=17.0, df=2, 68, p<0.0005), noise level (F=586, df=2, 137, p<0.0005), and group-by-noise level interaction (F=10.0, df=4, 137, p<0.0005). Follow-up one-way ANOVAs revealed that in the low and moderate noise conditions, hallucinating patients performed less well than nonhallucinating patients (table 2). In all three conditions, hallucinating patients performed less well than normal subjects. In the low noise condition, nonhallucinating patients also were impaired relative to normal subjects.
A repeated measures ANOVA applied to the rate of speech perception errors in the three noise conditions revealed a significant group effect (F=12.4, df=2, 68, p<0.0005), a significant noise effect (F=50.5, df=2, 137, p<0.0005), and a nonsignificant group-by-noise level interaction (F=1.08). Follow-up one-way ANOVAs revealed that hallucinating patients demonstrated greater numbers of perceptual errors than both nonhallucinating patients and normal subjects in the low and moderate noise conditions. In the high noise condition, hallucinating patients produced a greater number of errors than normal subjects but did not differ from nonhallucinating patients in terms of this variable.
Sentence repetition demonstrated group differences; hallucinating patients were significantly more impaired than normal subjects and nonhallucinating patients (table 2). Normal subjects and nonhallucinating patients did not significantly differ from each other.
Analysis of continuous performance task measures revealed that both hallucinating and nonhallucinating patients were impaired relative to the normal group but did not differ from each other (table 2).
A discriminant analysis for classifying patients as hallucinating or nonhallucinating was calculated by using word detection and error rate in the moderate noise condition. Sixteen of 21 nonhallucinating patients and 21 of 24 hallucinating patients were successfully classified, yielding an overall success rate of 82.2% (χ²=12.1, df=2, p<0.002). The only correlation exceeding the criterion p value after Bonferroni correction was between hallucination severity and word detection in the moderate noise condition (r=–0.47, df=43, p<0.001).
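The reported classification rate can be reproduced from the counts given above. The per-group rates are derived here for illustration and are not reported in the text; treating the hallucinating group as the "positive" class is a framing assumed for this sketch.

```python
# Counts reported for the discriminant analysis.
correct_nonhallucinating, total_nonhallucinating = 16, 21
correct_hallucinating, total_hallucinating = 21, 24

total_correct = correct_nonhallucinating + correct_hallucinating
total_patients = total_nonhallucinating + total_hallucinating

overall_pct = 100.0 * total_correct / total_patients
# Derived per-group rates (not stated in the source).
hallucinating_pct = 100.0 * correct_hallucinating / total_hallucinating
nonhallucinating_pct = 100.0 * correct_nonhallucinating / total_nonhallucinating

# Degrees of freedom for a Pearson correlation over all patients: n - 2.
correlation_df = total_patients - 2
```

The overall figure rounds to 82.2%, and `correlation_df` matches the df=43 reported for the correlation between hallucination severity and word detection.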
DISCUSSION
As predicted by our “hallucinogenic” speech perception neural network
(22), patients reporting hallucinations were more impaired than both nonhallucinating patients and normal subjects during speech tracking and when performing a sentence repetition task that relies on serial linguistic expectations. These impairments appeared not to reflect differences in the level of other positive or negative symptoms, neuroleptic exposure, education, or other demographic characteristics because the two patient groups did not differ with respect to these variables. A discriminant analysis using two speech tracking variables in the moderate noise condition classified the two patient groups with high (greater than 80%) accuracy. These findings suggest that “voices” demarcate a relatively discrete subgroup of schizophrenic patients characterized by speech processing alterations. Although the auditory continuous performance task demonstrated statistically robust impairments among schizophrenic patients relative to normal subjects, this task failed to differentiate hallucinating and nonhallucinating patients. These findings support the hypothesis that speech processing alterations demonstrated by hallucinating patients are not due to a general, nonlanguage cognitive impairment such as attentional failure.
It is possible that other nonlanguage cognitive impairments might have contributed to speech processing impairments detected in hallucinating patients. For instance, schizophrenic patients can demonstrate excessive distractibility
(38, 39), which could impair task performance. Such alterations could have produced masked speech tracking task performance impairments, given that speech stimuli were accompanied by superimposed phonetic noise. However, hallucinating patients also performed worse on the sentence repetition task, which does not incorporate a distraction condition. Therefore, it is difficult to attribute speech processing performance impairment to an external distraction effect per se. Another possibility is that patients were distracted by their own hallucinations. As indicated earlier, only one patient reported actually hearing speech hallucinations during the task. However, several patients occasionally produced unusual or bizarre strings of perceptual errors. The patients themselves experienced these spurious percepts as deriving from external speech stimuli heard on the headphones, rather than from hallucinations. Perhaps these patients were, in fact, “hearing” and reporting speech hallucinations that they did not acknowledge. Although this possibility cannot be ruled out, distraction by internally generated hallucinations per se is not suggested by parallel continuous performance task hit rate data. Consistent with an earlier study reported by Mirsky et al.
(31), we found the continuous performance task hit rate to be a psychometrically robust variable that clearly differentiated schizophrenic patients overall from normal subjects. There is no reason to believe that speech hallucinations would be less distracting for auditory nonspeech stimuli than for speech stimuli. If internal distraction due to speech hallucinations was a significant cause of performance impairment, the continuous performance task hit rate should differentiate hallucinating and nonhallucinating patients to the same degree demonstrated by masked speech tracking task word detection—which was not the case.
Our findings are consistent with the hypothesis that speech hallucinations arise directly from speech processing neurocircuitry. However, caution must be exercised in making inferences regarding the mechanism of this symptom complex. For instance, it is possible that a brain region necessary for processing narrative speech neighbors but is distinct from a region responsible for speech hallucinations. If schizophrenia alters function in both regions simultaneously, speech tracking alterations and speech hallucinations could co-occur without a causal mechanism leading from the former to the latter. Other causes of speech hallucinations such as phencyclidine or bipolar disorder may not be associated with speech tracking impairments and hence derive from different mechanisms. We are now conducting a study using transcranial magnetic stimulation that assesses whether speech perception regions of the cerebral cortex are involved directly in the generation of speech hallucinations in schizophrenic patients
(40).
Finally, longitudinal assessments should determine the degree to which speech processing impairments improve when speech hallucinations improve over time. Moreover, the degree to which these impairments are also expressed by nonschizophrenic patients experiencing speech hallucinations should be assessed. Nonetheless, this work indicates the usefulness of neural network models in directing studies of schizophrenia and suggests that future studies of neuropsychiatric illness will benefit from computer simulations that generate models of brain dysfunction.