Depression is associated with varied symptoms, from mood changes to cognitive impairment. A large proportion of these symptoms may be driven at least in part by abnormal responses to affective stimuli (1). Specifically, depression is associated with a strong “negative” bias: enhanced sensitivity to negative (punishing) stimuli and a behavioral neglect of positive (rewarding) stimuli (2). This affective bias, which is manifested across many facets of learning, memory, and cognition, putatively serves both to instigate and to uphold the debilitating negative and anhedonic mood state (3, 4). A clearer understanding of the neural basis of affective bias in depression will thus lead to a clearer understanding of the overall pathology.
In this study, we focused on affective biases seen in flexible learning in depression. Adaptive behavior in our daily life, where the consequences of our actions are often uncertain and variable, requires individuals to frequently and flexibly update their behavior. The experimental model most often used to examine such flexible behavior is the probabilistic reversal learning paradigm. In this paradigm, subjects learn by trial and error to choose the most rewarding stimulus and then reverse their choice when contingencies change and this previously rewarding stimulus is unexpectedly followed by punishment. In this probabilistic task, where around one-fourth of the reward and punishment feedback is misleading, depressed individuals reverse more often than do healthy individuals when they receive misleading negative feedback (5–7). This problem has been interpreted to reflect a negative affective bias and may underlie the tendency of depressed individuals to emphasize negative life experiences at the expense of positive ones.
However, this negative affective bias could be driven by at least two different processes: 1) increased behavioral sensitivity to unexpected punishment in depression (encouraging reversal during misleading negative feedback), and/or 2) reduced behavioral sensitivity to reward in depression (reducing the ability to maintain the correct stimulus-reward association). To elucidate the nature of affective biases in reversal learning, we developed a novel reversal learning paradigm that enabled direct comparison of reversals signaled by unexpected reward with reversals signaled by unexpected punishment (8–11). In this task, subjects do not directly choose the rewarded or punished stimulus but rather predict the outcome of stimuli selected by the computer. Unlike the probabilistic tasks, this task is deterministic and subjects are required to reverse their behavior as soon as they receive unexpected outcomes. Critically, our study design involved Pavlovian rather than instrumental conditioning, which allowed the assessment of reversals on the basis of unexpected reward as well as unexpected punishment (8, 11).
Using this task, we previously demonstrated that both punishment and reward reversals rely on overlapping but distinct regions of the striatum (11). This involvement of the striatum is consistent with imaging studies of the classic probabilistic reversal learning task in healthy individuals, in whom increased striatal response precedes behavioral switching (12), and it concurs with the frequently highlighted role of the striatum in dopamine-mediated prediction error learning (13).
Extrapolating from these findings, it is plausible that the behavioral bias in reversal learning seen in depression (5–7) is driven by altered striatal processing. Indeed, attenuated striatal function is seen in the depressive pathology across multiple cognitive tasks, from higher-order planning to gambling (1, 14, 15). However, previous work using the classic probabilistic reversal learning paradigm in depressed individuals has not found significant differences in striatal response during reversal learning (5, 16, 17). Thus, although the striatum is a key region involved in reversal learning in healthy individuals, reversal learning is impaired in depression, and the striatum is involved in the neuropathology of depression (14), studies to date have not demonstrated striatal involvement in the negative bias in reversal learning in depression.
The negative bias in reversal learning in depression therefore might not directly involve the striatum but rather aberrant function in, for example, the orbitofrontal cortex (18–20) or the amygdala (5). Alternatively, however, previous studies may have failed to reveal the contribution of the striatum because they inadequately disentangled the separate reward and punishment components of reversal learning. In this study, we therefore employed our new deterministic reversal learning task to examine differences in the hemodynamic response during separate punishment and reward reversal trials across unmedicated depressed individuals and healthy comparison subjects. We predicted that depressed individuals would demonstrate a negative bias in reversal learning and that this would be associated with a corresponding attenuation in striatal response during reversal trials. However, given the absence of striatal differences across diagnosis in punishment-based probabilistic reversal learning, we predicted that any alteration in striatal response would be restricted to reward-based reversals.
Method
Volunteers (N=27; 15 Caucasian, one Asian, 11 African American; all right-handed) 18–50 years of age underwent screening evaluations that included a medical history, physical examination, laboratory testing, and structural MRI. Psychiatric assessment was conducted using the Structured Clinical Interview for DSM-IV-TR and an unstructured interview with a psychiatrist; 14 volunteers had no psychiatric disorders (healthy comparison subjects), and 13 had major depressive disorder. Exclusion criteria for all participants included psychotropic drug exposure (including nicotine) within the past 3 weeks; major medical or neurological illness; illicit drug use or alcohol abuse within the past year; lifetime history of alcohol or drug dependence; psychiatric disorders other than major depression (excepting comorbid anxiety disorder and a remote history of substance abuse); current pregnancy or breastfeeding; structural brain abnormalities on MRI; and general MRI exclusions. Additional exclusion criteria for comparison subjects were a history of any psychiatric disorder (except a remote history of substance abuse) and a history of any mood disorder in a first-degree relative. After receiving a complete description of the study, participants provided written informed consent as approved by the National Institutes of Health Combined Neuroscience Institutional Review Board.
Participants were group matched for age (healthy comparison group, mean=31 years [SD=6], depressed group, mean=36 years [SD=11]), gender (eight male participants in each group), years of education (healthy comparison group, mean=17 years [SD=2], depressed group, mean=16 years [SD=2]), and IQ (healthy comparison group, mean=120 [SD=15], depressed group, mean=120 [SD=15]). IQ scores were not available for eight participants (five in the depressed group): four because English was not their first language (one in the depressed group), one (in the depressed group) because he vocationally administered IQ testing, and three because they dropped out of the study after scanning but before neuropsychological testing. The mean score on the 21-item Hamilton Depression Rating Scale (HAM-D) (21) was higher in the depressed than in the comparison group (depressed group, mean score=20 [SD=7]; comparison group, mean score=1 [SD=1]; F=95, df=1, 25, p<0.001).
Behavioral and Functional Neuroimaging Measures
Task.
The behavioral task was adapted from a previously developed paradigm (8, 9, 11) and programmed using E-PRIME (Psychological Software Tools, Inc., Pittsburgh).
On each trial, participants were presented with two vertically adjacent stimuli, one scene and one face (location randomized) on a projector viewed by means of a mirror attached to the head coil in the functional MRI (fMRI) scanner. One of these two stimuli was associated with reward and the other with punishment. Participants were required to learn these deterministic stimulus-outcome associations by trial and error. Unlike standard probabilistic reversal paradigms, however, participants were not required to choose between the two stimuli but were instructed to predict whether a stimulus that was highlighted with a black border (randomized from trial to trial) would lead to reward or to punishment (the task contingencies were thus Pavlovian and expected to be processed more specifically in the ventral striatum [22]). They indicated their outcome prediction for the highlighted stimulus by pressing, with the index or middle finger of their dominant (right) hand, one of two buttons (one for reward, one for punishment; response mappings counterbalanced) on a button box placed on their abdomen. They had up to 1,500 msec to provide a response. Once they responded, the outcome was presented for 500 msec in the center of the screen (between the two stimuli). Reward consisted of a green smiley face and punishment a red sad face. If they failed to make a response, “Too late!” was displayed instead of the outcome. After the outcome, the screen showed only a fixation cross for a reaction time-dependent interval, so that the interstimulus interval was jittered modestly between 2,000 and 4,000 msec.
Each experimental block consisted of one acquisition stage and a variable number of reversal stages. The task proceeded from one stage to the next following a specific number of consecutive correct trials as determined by a preset learning criterion. This criterion varied between stages (four, five, or six correct responses) to prevent predictability of reversals. The task also terminated after 10 consecutive incorrect trials in order to avoid scanning blocks in which participants were not performing the task correctly (e.g., because of having forgotten the outcome-response mappings). Reversals of contingencies were signaled to participants either by an unexpected reward presented after the previously punished stimulus was highlighted or by an unexpected punishment presented after the previously rewarded stimulus was highlighted. Unexpected reward and unexpected punishment events were interspersed within blocks. Consistent with previous versions of this task (8, 11), the same stimulus was highlighted after the unexpected outcome and was presented until participants correctly reversed their predictions.
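The stage logic described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the E-PRIME implementation: the names `run_block` and `PerfectLearner` are hypothetical, the contingencies are assumed to reverse on the first trial after the criterion is met, and the unexpected-outcome trial itself is assumed not to count toward either the correct streak or the error count.

```python
import random

FLIP = {"reward": "punishment", "punishment": "reward"}


class PerfectLearner:
    """Hypothetical agent: stores the last outcome and infers the other stimulus."""

    def __init__(self):
        self.belief = {}

    def predict(self, stim):
        return self.belief.get(stim, "reward")  # arbitrary first guess

    def update(self, stim, outcome):
        other = "scene" if stim == "face" else "face"
        self.belief[stim] = outcome
        self.belief[other] = FLIP[outcome]


def run_block(agent, max_trials=150, criteria=(4, 5, 6), max_errors=10, seed=0):
    """Simulate one block; returns a list of per-trial records."""
    rng = random.Random(seed)
    outcome = {"face": "reward", "scene": "punishment"}  # current contingencies
    criterion = rng.choice(criteria)
    streak = errors = 0
    forced = None  # stimulus kept highlighted after an unexpected outcome
    log = []
    for _ in range(max_trials):
        unexpected = streak >= criterion
        if unexpected:  # criterion met: reverse contingencies, draw a new criterion
            outcome = {s: FLIP[o] for s, o in outcome.items()}
            criterion, streak = rng.choice(criteria), 0
        stim = forced or rng.choice(["face", "scene"])
        correct = agent.predict(stim) == outcome[stim]
        agent.update(stim, outcome[stim])
        log.append({"stim": stim, "outcome": outcome[stim],
                    "unexpected": unexpected, "correct": correct})
        if unexpected:
            forced = stim  # assumption: the signaling trial is not scored
        elif correct:
            streak, errors, forced = streak + 1, 0, None
        else:
            streak, errors = 0, errors + 1
            if errors >= max_errors:  # stop after 10 consecutive errors
                break
    return log


log = run_block(PerfectLearner())
```

In the returned log, an unexpected reward is a record with `unexpected` set and `outcome == "reward"`, and an unexpected punishment is the converse, so both reversal types occur within the same block. Because this sketch reverses as soon as each criterion is met, an idealized learner reverses more often than the per-block average reported for participants.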
During the scan session, participants completed six experimental blocks. The average number of reversal stages per experimental block was eight (four signaled by punishment), although the block terminated automatically after completion of 150 trials (7.4 minutes), so that each participant performed 900 trials (six blocks) per experimental session (approximately 90 minutes, including breaks). A 30-second fixation period was also included at the beginning and end of each block to provide a baseline with which to compare blood-oxygen-level-dependent (BOLD) response during trials.
All participants performed a practice block before entering the fMRI scanner to familiarize them with the task. The practice task was identical to the main task except that the stimuli were presented on a laptop computer.
Behavioral analysis.
Reaction times and accuracy rates were assessed in an analysis of variance with reversal (reversal versus nonreversal trials) and valence (reward versus punishment) as within-subject factors and group (depressed versus healthy comparison group) as the between-subjects factor. Trials on which participants failed to make a response were excluded from reaction time analyses, and the rare trials in which participants coincidentally made a nonreversal error on an unexpected outcome trial were excluded from all analyses (as this meant that they accidentally preempted the reversal, making the expectancy of outcome unclear). Accuracy was determined as a proportion of the total number of trials for the type being examined; nonreversal reward errors were divided by the total number of nonreversal reward trials, and punishment reversal errors were divided by the total number of punishment reversals. As the task was deterministic, reversal errors were defined as errors on the trial immediately following the unexpected outcome (9, 11). Partial eta-squared (ηp²) effect sizes are reported for all significant contrasts, and p values are Bonferroni adjusted.
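The proportion-based accuracy measure can be sketched as follows. The function name `error_rates`, the record fields, and the toy trial list are illustrative assumptions and do not come from the study data.

```python
from collections import defaultdict


def error_rates(trials):
    """Errors divided by the number of trials, per (trial type, valence) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [errors, trials]
    for t in trials:
        if t.get("excluded"):  # e.g. a preempted reversal, dropped from all analyses
            continue
        cell = (t["trial_type"], t["valence"])
        counts[cell][0] += not t["correct"]
        counts[cell][1] += 1
    return {cell: errs / n for cell, (errs, n) in counts.items()}


# Invented toy data: four nonreversal reward trials (one error) and
# three punishment reversal trials (one excluded, one error).
example = (
    [{"trial_type": "nonreversal", "valence": "reward", "correct": c}
     for c in (True, True, True, False)]
    + [{"trial_type": "reversal", "valence": "punishment", "correct": False},
       {"trial_type": "reversal", "valence": "punishment", "correct": True},
       {"trial_type": "reversal", "valence": "punishment", "correct": False,
        "excluded": True}]
)
rates = error_rates(example)
```

Normalizing within each cell keeps the rates comparable even though nonreversal trials greatly outnumber reversal trials.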
Functional Neuroimaging
Image acquisition.
A GE Signa HDxt 3-T scanner (GE Healthcare, Milwaukee) was used to acquire structural and functional MR images. The functional sequence comprised six echo-planar imaging sessions of 255 volume acquisitions (flip angle=90°; repetition time=2,000 msec; echo time=30 msec; field-of-view=24×24 cm; slice thickness=3 mm; slice spacing=0.5 mm; matrix=64×64; sagittal slices with array spatial sensitivity encoding technique). The first 10 volumes from each session were discarded to avoid T1 equilibrium effects. The structural sequence comprised a magnetization-prepared rapid gradient echo anatomical reference image (flip angle=60°; repetition time=7,800 msec; echo time=3,000 msec; field of view=22×22 cm; slice thickness=1.2 mm; slice spacing=0 mm; matrix=246×192) for spatial coregistration and normalization.
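As a rough consistency check on these parameters, the duration of one functional session can be compared with the duration of one task block. The sketch assumes that each of the six echo-planar sessions covered exactly one experimental block, which the text implies but does not state. The variable names are illustrative.

```python
TR_S = 2.0             # repetition time: 2,000 msec
VOLUMES_PER_RUN = 255  # volume acquisitions per session
DISCARDED = 10         # initial volumes dropped for T1 equilibrium
TASK_MIN = 7.4         # duration of a 150-trial block
FIXATION_S = 30        # fixation period at the start and at the end of a block

run_s = VOLUMES_PER_RUN * TR_S                  # scan time per session, in seconds
retained = VOLUMES_PER_RUN - DISCARDED          # volumes entering the analysis
block_s = round(TASK_MIN * 60) + 2 * FIXATION_S  # task plus both fixation periods

print(f"session: {run_s:.0f} s, block: {block_s} s, retained volumes: {retained}")
```

Under that assumption, a full block with both fixation periods (504 seconds) fits within one 510-second session.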
Image analysis.
Images were preprocessed (see the data supplement that accompanies the online edition of this article) and analyzed using SPM8 (Wellcome Department of Cognitive Neurology, London). We estimated a general linear model, for which parameter estimates were generated at the onsets of all expected and unexpected reward and punishment trials (with zero duration), which co-occurred with the response. Consistent with our previous study, an unexpected outcome was the first outcome of a new stage, presented after learning criterion had been obtained (i.e., the outcome signaling contingency reversal), and all other outcomes were coded as expected outcomes, irrespective of task performance (11).
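The event coding described above can be sketched as a function that sorts outcome onsets into the four conditions. This is an illustration only and not the SPM8 batch used in the study: the function name, the field names, and the toy onsets are assumptions, as is the choice to skip trials without a response (on which no outcome was shown).

```python
def onsets_by_condition(trials):
    """Group zero-duration outcome events into four expectancy-by-valence conditions."""
    onsets = {f"{e}_{v}": []
              for e in ("expected", "unexpected")
              for v in ("reward", "punishment")}
    for t in trials:
        if t["outcome"] is None:  # assumption: missed trials are not modeled here
            continue
        # Only the first outcome of a new stage counts as unexpected;
        # every other outcome is expected, irrespective of performance.
        expectancy = "unexpected" if t["first_of_stage"] else "expected"
        onsets[f"{expectancy}_{t['outcome']}"].append(t["onset_s"])
    return {name: {"onsets": o, "durations": [0.0] * len(o)}
            for name, o in onsets.items()}


# Invented toy onsets, in seconds from the start of the session.
toy = [
    {"onset_s": 30.0, "outcome": "reward", "first_of_stage": False},
    {"onset_s": 33.5, "outcome": "punishment", "first_of_stage": False},
    {"onset_s": 37.0, "outcome": None, "first_of_stage": False},
    {"onset_s": 40.5, "outcome": "punishment", "first_of_stage": True},
]
design = onsets_by_condition(toy)
```

Each entry pairs a list of onsets with matching zero durations, the form in which stick-function regressors are typically specified before convolution with a hemodynamic response function.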
Because of strong a priori hypotheses regarding the role of the striatum in this task, a region-of-interest analysis was performed by extracting standardized β values from the anatomically defined (23) left and right caudate and putamen using the MarsBar software package (24) for each trial type. In line with our hypotheses, across-group analyses were performed separately for each trial type.
Next, to localize more specifically the peak differences in responses within the striatum and to investigate the extended functional anatomical network of regions that may interact with the striatum during task performance, a whole brain voxel-wise analysis was performed post hoc for each of the four trial types. For this whole brain analysis, a one-sample t test was created for each trial type (unexpected punishment and unexpected reward) with group as a covariate. Clusters are reported at voxel-level p values <0.001 (labels assigned using the automated anatomical labeling toolbox for SPM [23]) and defined using a voxel-level threshold corresponding to an uncorrected p value <0.001, with coordinates reported (Montreal Neurological Institute [MNI]/Talairach) for the peak voxel t value. Family-wise error voxel-level corrected p values are also reported for the peak voxel t values within small-volume-corrected regions of interest.
Discussion
Consistent with our hypothesis, a negative bias in reversal learning in depression was accompanied by altered reward-related striatal response. Specifically, we found impaired reward (but not punishment) reversal behavior in depression alongside attenuated ventral striatal response to unexpected reward. Thus, we provide a potential neural basis for the negative bias underlying the flexible-learning impairment in depression.
The attenuated reward-related striatal response in major depressive disorder is consistent with results of several recent studies examining reward processing deficits in different aspects of cognition in depression (1, 2, 15, 25, 26). However, this study is the first to demonstrate valence specificity in the striatal response to reward and punishment in depression and the first to demonstrate that striatal attenuation in depression extends beyond the receipt and anticipation of reward (15) to reward-based reversal learning. This blunted behavioral response to reward and not to punishment also provides an alternative explanation for the previously demonstrated impairment in reversal learning in depression (5–7); it may be driven by attenuated reward responses rather than by elevated punishment responses. Previous studies with the probabilistic reversal learning task failed to reveal differences in striatal function while solely examining reversals based on unexpected punishment (5, 16, 17), and (although the interpretation of this latter negative finding was limited by the low generalizability and statistical sensitivity conferred by the relatively small sample sizes) we saw significant three-way interactions of valence, reversal, and depression and also failed to demonstrate striatum-specific differences between depressed and comparison groups on punishment-based reversals. The group difference in the striatal hemodynamic response was significant only when we compared responses to unexpected reward.
Under a variety of experimental conditions, mood disorders have been associated with abnormal neural processing in structures implicated in appetitive and aversive learning, including the orbitofrontal cortex (18–20) and the amygdala (5), which likely contributes to the overall neurocognitive profile of depression. The locus within the striatum where we observed an attenuated hemodynamic response to unexpected rewards was a region of the anterior ventrolateral putamen, which receives projections from both the medial and orbital prefrontal cortical networks (14, 27) as well as the amygdala (28). Thus the attenuated BOLD response in the putamen may have been driven by abnormal afferent transmission from these cortical regions (27, 29) rather than by a specific abnormality within the striatum. Notably, lesions in the ventral striatum, orbitofrontal cortex, pallidum, or mediodorsal nucleus of the thalamus have all been shown to cause perseverative deficits in stimulus-reward reversal tasks in rats and monkeys, such that the animals have difficulty switching away from previously rewarded but not unrewarded stimuli (14). The present study thus extends the sources of altered neural transmission in depression to encompass attenuated reward reversal-related responses in the ventral striatum, but this finding is interpreted within the context of the limbic-prefrontal cortical-striatal-pallidal-thalamic circuits involving this part of the striatum (11, 12, 14).
While the negative bias demonstrated with the reversal learning task used here joins the affective biases demonstrated by a range of cognitive tasks in depression (2), the specific direction of the impairment we observed (attenuated reward processing rather than improved punishment processing) may be related to the impaired ability to derive pleasure from rewarding activities seen in depression. This hypothesis would be compatible with evidence that the functioning of the mesolimbic dopaminergic system, which plays a major modulatory role within the limbic-cortical-striatal-pallidal-thalamic circuitry (30), is reduced in depression (both in general and in response to unpredicted reward) (1, 14, 31, 32) and with evidence for the involvement of dopamine in punishment and reward learning in the striatum in this (8–10, 33) and other (34, 35) tasks. Individuals with higher dopamine synthesis capacity, for instance, demonstrate improved reward-based relative to punishment-based reversal learning on the task we used here (10, 36). Moreover, amphetamine-induced dopamine release within the anteroventral putamen is correlated with subjective feelings of euphoria (or hedonia) in healthy individuals (37, 38). Thus, the attenuated anteroventral putamen response we identified in depression may reflect a reciprocal process: attenuated striatal response associated with reduced dopamine release and anhedonia. It is conceivable, furthermore, that amelioration of the reversal learning impairment and anhedonia in depression would result from enhancement of the mesolimbic dopaminergic system (17, 39). Nevertheless, these hypotheses require testing in future studies, since the present study included neither anhedonia ratings nor assessments of central dopaminergic function.
Finally, our findings do not invalidate the proposition that depression is also associated with hypersensitivity to punishment in other contexts, such as when performance declines after a perceived error (and associated aversive feedback) on planning or mnemonic tasks (2). Indeed, alterations to both reward and punishment processing are seen in depression (1), and while this “catastrophic response to perceived failure” (2, p. 64) is likely due to an enhanced impact of negative (punishing) judgment on performance, the task used in this study does not provide patients with explicit judgment about their performance and may therefore tap into distinct reward and punishment processing mechanisms. Indeed, one key advantage of neurocognitive assessment as a measure of pathology is that it is possible to target distinct neural systems with different cognitive tasks, thereby breaking down the underlying architecture of such multifaceted and subjective behaviors. Recent findings in fact implicate a habenula-rostromedial tegmental circuit in the processing of reward omission and expected punishment (40), but our fMRI parameters were not optimized to detect signal change in a structure of this small size. Whether this circuit therefore underlies altered punishment processing in depression is a question for future research.