Computer models are central to scientific disciplines ranging from meteorology to physical chemistry. Their usefulness lies in simulating complex, interactive systems. A good model does not recreate “reality” in its entirety; if that were the case, the best model would be the real-life system itself. Instead, model construction proceeds by incorporating certain properties of the system in a much simplified form which, when simulated by computer, exhibits characteristic properties or behaviors that have been previously unexplained. For instance, a computer simulation of Jupiter's atmosphere produced a stable “storm” resembling the planet's Red Spot after its rapid spin rate and liquid composition had been accounted for (1). The simulation was useful insofar as essential physical dynamics of this previously unexplained phenomenon were captured.
OBSERVATIONS USED IN CONSTRUCTING THE MODEL
Key observations used in constructing the simulation of a speech perception neural network were as follows.
1. Cortical development during adolescence is characterized by substantial reductions of synapses. Studying normal postmortem tissue obtained from the middle frontal cortex, Huttenlocher (4) found that synaptic density peaked during childhood, with a subsequent decline of 30%–40% during adolescence to reach adult levels, which remained relatively stable. Since large-scale neuronal loss does not occur during this developmental period, synaptic elimination must reflect curtailed connectivity between neurons. Less than 1% of afferents to any cortical area derive from the thalamus, the primary source of noncortical input (5). Large-scale reductions of cortical synapses must therefore reflect reductions in corticocortical connections rather than thalamocortical afferents. Investigating consequences of reduced connectivity within neural systems was therefore a primary goal of our model.
2. Recent studies suggest that excessive synaptic pruning is associated with schizophrenia. Many workers have hypothesized that schizophrenia is a neurodevelopmental disorder (6–11). The characteristic age at onset of this disorder (late adolescence and young adulthood) and the prominence of synaptic pruning during normal adolescence suggest that schizophrenia could arise from a pathological extension of this “late” developmental process (6, 8). This hypothesis has been supported by phosphorus-31 magnetic resonance spectroscopy studies of neural membrane phospholipid turnover (12–14) and postmortem studies of neuropil volume (15) and dendritic spine numbers (16, 17) that compared the frontal cortex in schizophrenic brains and normal control brains. In addition, reductions in synapse-associated phosphoproteins (synapsin and synaptophysin) in the medial temporal cortex of schizophrenic patients have been reported (18, 19).
3. Hallucinated speech or “voices” commonly occur in schizophrenia. Our strategy was not to simulate the entire syndrome of schizophrenia but to explore a single characteristic symptom, auditory hallucinations, which are reported to occur in approximately 50%–80% of patients (20). One clue to their origin is that these hallucinations most commonly consist of spoken speech or “voices” (21), a phenomenological feature suggesting that hallucinated speech involves neural systems dedicated to auditory speech perception. This view is reinforced by positron emission tomography and functional magnetic resonance imaging evidence of activation of the auditory/linguistic association cortex when voices occur (22, 23).
Certain aspects of the speech perception system were therefore simulated to determine whether pruning “corticocortical connections” could simulate voices. Our criterion for identifying this “symptom” was production of “percepts” by the speech perception network in the absence of any phonetic input, thereby simulating hallucination.
4. Working memory underlies normal speech perception. The neural network simulation was guided by the observation that ordinary speech, when produced at normal rates, has substantial acoustic ambiguity because of blurring of phonetic information and background sounds (24–27). Consequently, perception of a word embedded in narrative speech depends not only on acoustic input corresponding to the word itself but also on previously perceived words and intrinsic knowledge of how words are sequenced into larger message units (28, 29). The use of linguistic expectations to “disambiguate” ongoing speech inputs reflects a specialized working memory capacity that was incorporated into the neural network.
Many studies have demonstrated working memory impairments in schizophrenia (30–32). Weinberger et al. (30) have implicated pathology involving interactions between frontal and medial temporal areas that are known to underlie human working memory (33). We therefore targeted the working memory component of our neural network to explore effects of reduced corticocortical connectivity.
METHOD
Our simulation of sequential word perception was based on models developed by Elman (29, 34). We have described a preliminary study that used this model to explore hallucinated speech (35). Compared with our first study, the simulation we report here made use of a more complex learning paradigm and a smaller input layer designed to force the network to rely further on working memory.
Network Architecture and Language Training
The network, which consisted of 148 “neuronal elements” divided into a four-layered system (figure 1), was designed to translate “phonetic” inputs into words. Actual acoustic data were not used. Instead, our simplifying assumption was that the phonetic representation of each word corresponded to a unique pattern of activation where roughly 25% of the neurons in the initial or input layer were “turned on.” The vocabulary of the network consisted of 30 words, including 14 nouns (woman, Jane, boy, girl, Bill, man, cop, Sam, omen, warning, story, dog, God, ball), 11 verbs (chase, kiss, love, fear, tell, run, kick, give, frightens, think, miss), four adjectives (young, old, large, small), and one other word (won't). A large array of grammatical structures was permitted by this word set.
Each of 40 hidden-layer neurons received a weighted sum of inputs from each of the 25 input neurons and 40 temporary storage neurons:
$$\mathrm{input}(x) = \sum_{y=1}^{25} w_{yx}\, I(y) + \sum_{y=1}^{40} w_{yx}\, S(y)$$
where input(x) is the summed input received by neuron x in the hidden layer, I(y) is the activation of neuron y in the input layer, S(y) is the activation of neuron y in the temporary storage layer, and $w_{yx}$ are the corresponding projection weights (which can be positive or negative). The activation of each hidden-layer neuron was then computed by using a sigmoidal function ranging from 0 to 1, which acted on the summed input. The output layer consisted of 43 neurons. Output-layer neurons received inputs exclusively from the hidden layer (figure 1) and had the same sigmoidal activation function as hidden-layer neurons. Besides being assigned a phonetic code, each word was also assigned a pattern within the output layer, where between three and six of these neurons were turned on. These neurons coded for semantic and syntactic features. For instance, the word “cop” was represented by activation of output neurons that individually coded for NOUN, ANIMATE, and HUMAN, as well as a particular neuron that referred to “cop” itself. Examples of output codes for individual words are provided in figure 2.
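The weighted-sum and sigmoid computation described above can be illustrated with a minimal sketch of one time step through the four layers. This is not the authors' implementation: the weights are random rather than trained, the function names (`step`, `random_weights`) and the example phonetic code are hypothetical, no bias terms are included, and the temporary storage layer is assumed to hold a copy of the previous hidden-layer activations, as in Elman-style networks.

```python
import math
import random

N_INPUT, N_HIDDEN, N_STORAGE, N_OUTPUT = 25, 40, 40, 43  # 148 units in total


def sigmoid(z):
    """Logistic function mapping a summed input to an activation in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def random_weights(n_from, n_to, rng, scale=0.5):
    """w[y][x] is the projection weight from source neuron y to target neuron x."""
    return [[rng.uniform(-scale, scale) for _ in range(n_to)] for _ in range(n_from)]


def step(inp, storage, w_ih, w_sh, w_ho):
    """One time step: returns (output activations, new storage activations)."""
    hidden = []
    for x in range(N_HIDDEN):
        # Summed input from the input layer and the temporary storage layer.
        summed = sum(w_ih[y][x] * inp[y] for y in range(N_INPUT))
        summed += sum(w_sh[y][x] * storage[y] for y in range(N_STORAGE))
        hidden.append(sigmoid(summed))
    # Output neurons receive input exclusively from the hidden layer.
    output = [
        sigmoid(sum(w_ho[y][x] * hidden[y] for y in range(N_HIDDEN)))
        for x in range(N_OUTPUT)
    ]
    # Assumption: the storage layer keeps a copy of the hidden layer (Elman-style).
    return output, list(hidden)


rng = random.Random(0)
w_ih = random_weights(N_INPUT, N_HIDDEN, rng)
w_sh = random_weights(N_STORAGE, N_HIDDEN, rng)
w_ho = random_weights(N_HIDDEN, N_OUTPUT, rng)

storage = [0.0] * N_STORAGE
word = [1.0 if i < 6 else 0.0 for i in range(N_INPUT)]  # hypothetical phonetic code
output, storage = step(word, storage, w_ih, w_sh, w_ho)
```

Calling `step` repeatedly with successive word codes carries the storage activations forward, which is how earlier words can influence the percept of later ones in this kind of architecture.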
When the network produced an output layer activation pattern, an algorithm decided which word was the best fit for that particular pattern; the best fit became the “detected word.” When the output activation pattern demonstrated no clear-cut best fit, the network was assessed as not perceiving any word.
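The best-fit decision can be sketched as follows. The text does not specify the algorithm, so the rule used here is purely an assumption for illustration: the nearest word code by Euclidean distance is accepted only if it is close enough (`max_dist`) and clearly better than the runner-up (`margin`). Both thresholds, the function name `detect_word`, and the toy four-unit codes are hypothetical.

```python
import math


def detect_word(output, codes, max_dist=0.5, margin=0.1):
    """Return the best-fitting word, or None when no clear-cut best fit exists."""
    ranked = sorted(
        (math.dist(output, target), word) for word, target in codes.items()
    )
    best_dist, best_word = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else float("inf")
    if best_dist <= max_dist and runner_up - best_dist >= margin:
        return best_word
    return None  # the network is assessed as not perceiving any word


# Toy 4-unit output codes (the network described here used 43 output units).
codes = {
    "cop": [1, 1, 0, 0],
    "dog": [1, 0, 1, 0],
    "chase": [0, 0, 0, 1],
}

clear = detect_word([0.9, 0.8, 0.1, 0.0], codes)      # close to the "cop" code
ambiguous = detect_word([0.9, 0.5, 0.5, 0.0], codes)  # equally far from two codes
silent = detect_word([0.1, 0.1, 0.1, 0.1], codes)     # far from every code
```

Under this assumed rule, a "hallucination" would correspond to `detect_word` returning a word at a time step when the input was null.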
Network training utilized 60 repetitions of a set of 256 different grammatical sentences. Connection weights between neuronal layers were adjusted by using an “on-line” variant of back-propagation learning (36), which progressively minimized the error of activation patterns produced by the output layer in response to inputs whose phonetic information was partially degraded. During the course of training, the network acquired the ability to use linguistic expectations (stored as activation patterns resonating between the hidden and temporary storage layers) to guide detection of words.
Assessment of Network Performance
After the network was trained, it was retested with a set of 23 sentences not used in training but incorporating the same vocabulary. During testing, each test sentence was separated from the next by a pause consisting of five null inputs (all input neurons set to 0). The percentage of words successfully detected by the network was counted, as well as the total number of misidentifications (when the network confused one word for another). Hallucinations were scored when output-layer activation patterns yielded word percepts during pauses when phonetic inputs were absent. Assessment of network performance was undertaken with full phonetic information for each word and then repeated with degraded phonetic information. The latter condition was created by randomly selecting two input neurons ordinarily turned on for each word and resetting them to 0. This manipulation forced the network to rely more on working memory and linguistic expectations based on previous inputs to “fill in the blanks” and produce the correct word percept.
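The degraded-input condition and the pauses between sentences can be sketched as below. This is a minimal illustration under stated assumptions: inputs are represented as binary lists, and the function name `degrade`, the constants, and the example word code are hypothetical.

```python
import random


def degrade(pattern, n_off=2, rng=random):
    """Return a copy of a binary input pattern with n_off active units reset to 0."""
    active = [i for i, value in enumerate(pattern) if value == 1]
    degraded = list(pattern)
    for i in rng.sample(active, min(n_off, len(active))):
        degraded[i] = 0
    return degraded


N_INPUT = 25
NULL_INPUT = [0] * N_INPUT   # one time step with all input neurons set to 0
PAUSE = [NULL_INPUT] * 5     # five null inputs separating test sentences

rng = random.Random(1)
word = [1 if i in (0, 3, 7, 12, 18, 21) else 0 for i in range(N_INPUT)]  # hypothetical code
noisy = degrade(word, rng=rng)
```

The original pattern is left unchanged, so the same word code can be presented in both the full and the degraded condition.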
As an example of network performance, suppose that the input consisted of “phonemes” presented in a sequence corresponding to the following words: cop-chase-old-man-#-#-#-#-#-Jane-kiss-girl, where # denotes null inputs corresponding to pauses. Assume that the output of the network was cop-chase-█-dog-█-█-█-fear-█-Jane-kiss-girl, where █ denotes the absence of any output produced by the network. The number of words correctly identified would be five of seven; the word “man” would be scored as misidentified, and “fear” would be scored as a hallucination.
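The scoring rules in this example can be expressed as a short sketch. The function name `score` and the representation of an absent percept as `None` are assumptions made for illustration; the input and output sequences are those of the example above.

```python
def score(inputs, outputs, pause="#", blank=None):
    """Count detections, misidentifications, and hallucinations for aligned sequences."""
    words = correct = misidentified = hallucinated = 0
    for target, percept in zip(inputs, outputs):
        if target == pause:
            # Any word percept during a null input counts as a hallucination.
            if percept is not blank:
                hallucinated += 1
            continue
        words += 1
        if percept == target:
            correct += 1
        elif percept is not blank:
            # A different word was perceived in place of the one presented.
            misidentified += 1
    return {"words": words, "correct": correct,
            "misidentified": misidentified, "hallucinated": hallucinated}


inputs = "cop chase old man # # # # # Jane kiss girl".split()
outputs = ["cop", "chase", None, "dog", None, None, None, "fear", None,
           "Jane", "kiss", "girl"]
result = score(inputs, outputs)
```

For these sequences the sketch counts five of seven words as correctly identified, one misidentification (“man” perceived as “dog”), and one hallucination (“fear”), matching the scoring given in the text.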
Neuroanatomic Manipulations
Two neuroanatomic manipulations of working memory networks were simulated: synaptic pruning and excitotoxic cell death. The pruning procedure was guided by the concept of neurodevelopmental “Darwinism,” where neurons compete with each other for anatomic access to other neurons, with elimination of less robust interneuron connections (37–39). In mathematical terms, if the absolute value of a connection weight linking the temporary storage and hidden layers was below a certain threshold, it was “clamped” at 0. Excitotoxic cell death was simulated by presenting the network with the standard set of test sentences. Hidden-layer neurons were ranked according to the summed activation that they received. Neurons with the highest “rank” were then “eliminated” by clamping their activation levels at 0. Two other simulations of cell loss were also explored: 1) random elimination of neurons and 2) elimination of neurons that were the least activated during testing. Each of these three methods of cell elimination was applied separately to the hidden and temporary storage layers.
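Both manipulations reduce to simple clamping operations, sketched below under stated assumptions. The function names, the toy weights, and the toy activation totals are hypothetical; only the rules themselves (clamp weak connections at 0, clamp the most activated neurons at 0) come from the description above.

```python
def prune(weights, threshold):
    """Clamp to 0 every connection whose absolute weight is below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]


def most_activated(summed_activation, n_remove):
    """Indices of the n_remove neurons that received the highest summed activation."""
    order = sorted(range(len(summed_activation)),
                   key=lambda i: summed_activation[i], reverse=True)
    return set(order[:n_remove])


def eliminate(activations, dead):
    """Clamp the activation of eliminated neurons at 0."""
    return [0.0 if i in dead else a for i, a in enumerate(activations)]


weights = [[0.8, -0.05, 0.3], [-0.02, -0.9, 0.1]]  # toy storage-to-hidden weights
pruned = prune(weights, threshold=0.2)

summed = [4.2, 9.1, 0.7, 6.5]                      # toy per-neuron totals over a test run
dead = most_activated(summed, n_remove=2)
hidden = eliminate([0.6, 0.9, 0.2, 0.7], dead)
```

The two alternative cell-loss simulations would differ only in how `dead` is chosen: a random sample of indices, or the lowest-ranked rather than the highest-ranked neurons.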
DISCUSSION
Random selection of the word order of input sentences dramatically disrupted performance of the standard network, demonstrating clearly the network's reliance on linguistic expectations generated by a specialized working memory. This property of the network provided the basis for investigating the impact of neuroanatomic alterations of working memory on narrative speech perception. These efforts yielded two results.
First, eliminating working memory connections within a certain range improved the network's ability to perform the perceptual task. Functional advantages could also be obtained from excitotoxic elimination of temporary storage neurons. The model thus provides a functional accounting for these two neurodevelopmental trends. Although their relationship is poorly understood, a clue is provided by the Huttenlocher study (4), which suggests that cell death occurs somewhat before synaptic elimination.
Our study examined the relation between one aspect of language capacity and cortical connectivity. There are no neurobiological studies to date that provide a direct comparison. However, a study of birdsong acquisition demonstrated an associated reduction of synapses in brain areas responsible for this communication function (40). Birdsong is not language, but it is a highly structured system involving sound sequences. Perhaps a parallel developmental process occurs in humans, where cortical pruning of synapses results in enhanced efficiency in processing sequential linguistic behavior. In bird studies, loss of synapses was accompanied by inability to learn new birdsong sequences. Reduced aptitude for learning a second language that is associated with the end of childhood may also be due to a cortical pruning process (41).
Second, synaptic pruning, when applied excessively, simulated hallucinated speech. This concept is of interest given the high prevalence of this symptom in schizophrenia, the characteristic age at onset of this disorder, the dramatic loss of frontal synapses occurring in adolescence normally, and empirical evidence suggesting further reductions in cortical synapses in schizophrenia. The characteristic age at onset of psychosis, combined with our observation that synaptic elimination rather than cell elimination underlies this phenomenon, fits well with the view that the former occurs developmentally later than the latter (4). The fact that hallucinations were not generated by cell death models suggests that a relatively full array of neurons is necessary in order to produce hallucinatory percepts.
Our results estimated reductions in connectivity associated with normal neurodevelopment and induction of psychosis. The model can also be used to estimate corresponding reductions in synapses. These estimates can be generated if one assumes that the number of synapses mediating a projection from one neuron to another is linearly correlated with the strength or weight of the projection. In other words, higher numbers of synapses mediating a projection would increase the functional weight of the connection. Our model also assumes that pruning is “Darwinian” (where weaker interneuron connections are preferentially eliminated). Consequently, connections mediated by a smaller number of synapses would tend to be pruned away. A 64% reduction in connections optimized the perceptual functioning of the model and thus estimates stable, normal adult levels of connectivity. On the basis of the distribution of connection weights of the unpruned network, a reduction in connectivity of this magnitude would correspond to a synaptic reduction of 29%. This figure approximates the 30%–40% reduction of synapses from peak childhood levels to adult levels indicated by postmortem study of frontal areas of normal brains (4). “Hallucinosis” was observed for connectivity reductions greater than 77%. The model predicts that connectivity reductions of this magnitude correspond to an additional 20% loss of synapses compared to optimized adult levels of (reduced) connectivity. This estimate approximates the findings of Selemon et al. (15), who found that the mean neuropil volume of the frontal cortex of schizophrenic brains is reduced by 17% relative to that of normal adult brains (16). Neuropil consists of the dense intertwining of axons and dendritic arbors surrounding neuronal cell bodies and is likely to correlate with overall synaptic density. Thus, the model provides reasonable estimates of synaptic elimination, which, in a preliminary fashion, converge with the findings of these postmortem studies of normal brain development and schizophrenia (4, 15).
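The conversion from pruned connections to lost synapses can be sketched as follows, assuming (as stated above) that synapse number is proportional to the absolute connection weight and that the weakest connections are pruned first. The function names and the toy weights are hypothetical, and the reading of the "additional" loss as a fraction of the synapses remaining at the baseline pruning level is an interpretation; the sketch cannot reproduce the 29% and 20% figures, which depend on the weight distribution of the trained network.

```python
def synapse_loss(weights, fraction_pruned):
    """Fraction of synapses lost when the weakest connections are pruned.

    Assumes synapse count is proportional to the absolute connection weight.
    """
    magnitudes = sorted(abs(w) for w in weights)
    n_pruned = round(fraction_pruned * len(magnitudes))
    return sum(magnitudes[:n_pruned]) / sum(magnitudes)


def additional_loss(weights, baseline_pruned, further_pruned):
    """Extra synapse loss relative to what remains at the baseline pruning level."""
    base = synapse_loss(weights, baseline_pruned)
    further = synapse_loss(weights, further_pruned)
    return (further - base) / (1.0 - base)


toy_weights = [0.1, -0.1, 0.2, -0.2, 0.4, -1.0]  # hypothetical, not the trained network's
loss = synapse_loss(toy_weights, 0.5)
extra = additional_loss(toy_weights, 0.5, 4 / 6)
```

Because weak connections carry little of the total weight, the fraction of synapses lost is smaller than the fraction of connections pruned, which is the qualitative pattern behind a 64% reduction in connections corresponding to a much smaller synaptic reduction.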
Another study of pruning in artificial networks indicates that this alteration can promote better generalization (42). In our simulation, pruning at low levels helped the network to fill in the gaps during perceptual processing. It is therefore not surprising that additional pruning could push the network to produce spontaneous hallucinations. The nonrandom pattern of occurrence of hallucinated percepts (i.e., “won't” following sequentially appropriate nouns) in the model indicates that this “pathology” arose from misapplied sequential expectations derived from working memory. This finding is of interest given the large number of studies indicating working memory impairments in schizophrenia (30–32). Reduced working memory capacity may derail thought processes, thereby suggesting a mechanism of thought disorder (43). Our model suggests that this functional system can also produce spurious outputs that give rise to other positive symptoms such as hallucinations.
Physiological studies indicate that synaptic elimination reduces local metabolic requirements (44, 45). Along these lines, imaging studies of human brain development have shown downward shifts in local cerebral metabolic rates that parallel developmental shifts in synaptic density (46). Our synaptic elimination model therefore provides an accounting for the many studies demonstrating reduced cerebral metabolism in schizophrenic patients (47).
The model may address an intriguing question raised by Crow (48): why has the genetic predisposition to schizophrenia remained robust in diverse human populations in spite of obvious fertility disadvantages? Our results suggest that genes that lead to postnatal reductions in corticocortical connectivity might be advantageous cognitively up to a certain point (and hence selected for) but in certain combinations could produce too much pruning, with psychotic symptoms resulting.
The model also predicts that when phonetic clarity is curtailed, the narrative speech perception abilities of schizophrenic patients reporting voices are reduced compared to those of nonhallucinating schizophrenic patients. These differences were demonstrated in a recent study of schizophrenic patients that used a speech tracking task (35).
One limitation of the model is that the simulated hallucinations consisted of a single word following “external speech.” The content of hallucinated speech has been shown to be highly constrained (49). However, actual hallucinations in most cases consist of whole phrases or sentences and are often not in response to external speech. With much more complex vocabularies and linguistic knowledge, simulated speech perception networks could, at least in theory, produce whole phrases or sentences as hallucinations. Moreover, thoughts or even affects could trigger spurious working memory outputs that are experienced as hallucinations. Therefore, we propose that the simulation does provide a useful, albeit simplified, model of this phenomenon.
Many issues are not addressed by our simulations. First, the model does not provide an explanation for the mechanism of action of neuroleptic drugs. We have, however, simulated a hypodopaminergic state and demonstrated that hallucinations can be eliminated; these findings will be reported separately. Second, our simulation addresses only a single psychotic phenomenon, namely, hallucinations. Other psychotic symptoms may have different mechanisms. In addition, the model is limited in terms of its fidelity in simulating human neurobiology, including the simplicity of neuronal types and architecture and the learning paradigm used. However, as we stated earlier, a model should not be judged on the basis of its complexity but on its ability to extrapolate from current observations and data to account for unexplained phenomena. Many facets of schizophrenia remain mysterious and even paradoxical. We predict that neural models will be needed to assemble a comprehensive picture of this disorder.