Functional abnormalities in fronto-striatal circuits underlie inhibitory control deficits and cognitive inflexibility in obsessive-compulsive disorder (OCD) (
1), while dysfunction in mesolimbic regions (e.g., the hippocampus and amygdala) underlies fear expression in patients with the disorder (
2). Together with ventral fronto-striatal regions (e.g., the orbitofrontal cortex and ventral striatum), these mesolimbic regions comprise a reward-processing system (
3) that allows us to anticipate, respond to, and learn from reward outcomes in quotidian life. Whereas the ventral/anterior hippocampus is intrinsically connected to the ventral striatum (
4), processes reward-related information, and is preferentially involved in anxiety (
5), the dorsal/posterior hippocampus preferentially processes spatial information (
6). Using an ecologically valid task of reward-based spatial learning adapted from animal research, we sought to identify functional impairments in mesolimbic and ventral striatal circuitry that may contribute to OCD behaviors.
OCD patients perform poorly on tasks that require adjusting responses based on changing reward feedback (
7), consistent with findings of aberrant processing of reward in the orbital frontal cortex and ventral striatum during reversal learning (
8) and reward anticipation (
9). Visuospatial impairment has also been described (
10,
11), but the neural correlates of spatial learning have not been assessed in OCD. Together with anatomical findings of reduced gray matter in corticolimbic areas (
12) involved in reward expectancy (
13) and of smaller amygdala and hippocampal volumes in patients with refractory OCD than in a comparison group (
14), these data suggest that OCD patients have functional and structural abnormalities in the brain regions that support reward-based spatial learning.
Spatial learning is often assessed in rodents by having them navigate an eight-arm radial maze (
15). We adapted this paradigm to a virtual reality environment for use with functional MRI (fMRI) (
16). Both the animal and human tasks require learning to use extramaze cues to navigate and find hidden rewards. Healthy individuals activate temporoparietal areas when searching the maze, as occurs with other spatial navigation tasks (
17–
20). Importantly, our task includes a control condition in which the use of spatial cues to find hidden rewards (i.e., spatial learning) is experimentally disabled, allowing us to assess the neural correlates of reward processing in the absence of spatial learning and disentangle the neural correlates of reward processing and learning. Healthy individuals activate the hippocampus and amygdala when receiving unpredicted rewards in the control condition compared with the learning condition, a finding that we speculated could be attributed to enhanced dopaminergic firing from ventral tegmental areas to the ventral striatum and these mesolimbic areas in response to unpredicted rewards (
21).
In the present study, we used our translational fMRI task to assess the neural correlates of reward-based spatial learning in unmedicated individuals with OCD who were free of comorbid illnesses. Twenty-one of these individuals were treatment naive, and 12 were off psychotropic medications for at least 12 weeks. Given findings of functional deficits in reward-processing circuits (
9) and compensatory hippocampal engagement during other learning tasks (
22) in OCD, together with the differential roles of the posterior and anterior hippocampus in processing spatial and reward information, respectively (
6), we tested two hypotheses. First, we hypothesized that although both the OCD and healthy comparison groups would engage temporoparietal areas while navigating the maze, OCD participants would overengage the posterior hippocampus during spatial learning. Second, we hypothesized that OCD participants would not engage the anterior hippocampus and ventral striatum to the same extent as healthy participants during reward anticipation or in response to reward receipt, especially when rewards were unpredicted in the control condition compared with the learning condition. We also explored associations of mesolimbic and temporoparietal activations with OCD severity and symptom dimensions.
Method
Participants
Unmedicated adults with OCD (N=33) and healthy comparison subjects (N=33), group-matched by age, sex, and ethno-racial groups, were recruited through flyers, Internet advertisements, and word-of-mouth. The institutional review board of the New York State Psychiatric Institute approved this study. Participants provided written, informed consent prior to entering the study.
Details of the exclusion criteria, clinical assessments, and behavioral analyses are described in the data supplement accompanying the online version of this article.
Reward-Based Spatial Learning Paradigm
Our reward-based spatial learning paradigm has been described elsewhere (
16,
23). The virtual reality environment consisted of an eight-arm radial maze surrounded by a naturalistic landscape (e.g., mountains, trees, and flowers) that constituted the extramaze cues that could be used for spatial navigation (
Figure 1). Prior to scanning, participants practiced navigating a similar maze on a desktop computer.
Stimuli during scanning were presented through nonmagnetic goggles. Participants used an MRI-compatible joystick (Current Designs, Philadelphia) to navigate the maze. Before scanning, participants were informed that they would be in the center of a maze with eight identical arms extending outwards and that hidden monetary rewards would be available at the end of the arms. They were instructed to navigate the maze to collect the rewards and were told that they could keep any money they found but would lose money if they revisited an arm. They were told that they would complete several sessions of the task but not that the sessions differed from one another. Therefore, they believed that they would perform the same task multiple times.
The paradigm included an active-learning condition and a control condition. In the learning condition, all eight arms were baited with rewards. As participants navigated the maze, they had to learn the spatial layout of the extramaze cues to avoid revisiting arms. After each arm visit (trial), participants reappeared at the center of the maze with their viewing perspective randomly reoriented to prevent use of strategies such as chaining (systematically selecting neighboring arms). The learning condition ended once a participant had collected all eight rewards.
Next, participants were presented with a screen indicating that a new session was beginning. In this control condition, the same extramaze cues used in the learning condition were randomized among locations after each trial, which made it impossible to use the spatial layout of the cues (spatial learning). To control for the reward/punishment frequency in the learning condition, participants were rewarded at the same frequency but without regard to their actual performance. This control condition thus shared all salient features with the learning condition, including lower-order stimulus features and higher-order task features. The control condition ended after the same number of trials that a given participant had needed to obtain all eight rewards in the learning condition. For example, if a participant required 18 trials to find all eight rewards in the learning condition (i.e., eight correct and 10 error trials), they would be given 10 unbaited trials, randomly placed, in the control condition. Thus, contrasting neural activity in the learning condition (during spatial learning) and the control condition (in which spatial learning is impossible) reveals the neural correlates of reward-based spatial learning.
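The yoking of the control condition to the learning condition can be illustrated with a minimal Python sketch. The function name `control_schedule` and the uniform random shuffle are assumptions made for illustration; the text specifies only that the number of trials and the reward frequency were matched and that the unbaited trials were given randomly.

```python
import random

def control_schedule(n_learning_trials, n_rewards=8, seed=None):
    """Hypothetical helper: build a yoked reward schedule for the control condition.

    Returns a list with one entry per control trial (True = rewarded,
    False = unbaited), matched in length and reward count to the
    participant's own learning-condition run.
    """
    if n_learning_trials < n_rewards:
        raise ValueError("a learning run needs at least one trial per reward")
    schedule = [True] * n_rewards + [False] * (n_learning_trials - n_rewards)
    # Assumption: trial order is a uniform random permutation.
    random.Random(seed).shuffle(schedule)
    return schedule

# Example from the text: 18 learning trials = 8 correct + 10 error trials.
schedule = control_schedule(18, seed=0)
print(sum(schedule), schedule.count(False))  # 8 10
```

Under this scheme the outcome of each control trial is independent of which arm the participant chooses, which is what makes rewards in the control condition unpredictable.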
Participants underwent two runs of each condition. The learning condition always preceded the control condition to establish the number of trials and reward frequency for the control condition. Thus, the paradigm contained 32 rewarded navigation events (8 rewards × 2 conditions × 2 runs), but the number of unrewarded events varied for each participant. Following completion of the paradigm, participants were provided the same amount of money regardless of performance.
Image Acquisition and Processing
A General Electric Signa 3T-LX scanner (General Electric, Milwaukee) and a standard quadrature General Electric head coil were used for image acquisition. Axial functional images were positioned parallel to the anterior commissure-posterior commissure line using a T1-weighted sagittal localizing scan. Functional images were obtained using a T2*-sensitive gradient-recalled single-shot echo-planar pulse sequence (time to repeat=2,800 msec; echo time=25 msec; 90° flip angle; single excitation per image; field of view=24×24 cm; matrix=64×64; 43 slices 3 mm thick, no gap, and covering the entire brain). The number of echoplanar imaging volumes collected was determined by the performance of each participant in the learning condition, with a maximum of 322 volumes per run.
As described elsewhere (16), image preprocessing procedures were run in batch mode using MATLAB 7.9 (Mathworks, Natick, Mass.) and implemented in SPM8 (Wellcome Department of Imaging Neuroscience, London) and FSL (FMRIB Software Library, www.fmrib.ox.ac.uk). Preprocessing consisted of slice-time correction, using a windowed Fourier interpolation to minimize dependence on the reference slice, motion correction, and realignment to the middle image of the middle scanning run (24). Images with estimates for peak motion exceeding 3-mm (one voxel) translation were repaired with ArtRepair (25). Runs with more than 15% of such images were discarded for poor quality (26). Motion-corrected functional images of each participant were coregistered to the corresponding three-dimensional spoiled gradient recall anatomical image, which was spatially normalized to Montreal Neurological Institute (MNI) space (avg152T1 template brain) with a voxel size of 2×2×2 mm³. Normalization parameters warped the functional images into the same MNI space as the spoiled gradient echo image. Normalized images were spatially smoothed using a Gaussian-kernel filter with a full width at half maximum of 8 mm. Spatially smoothed fMRI time series were temporally high-pass filtered with a cutoff frequency of 1/128 Hz through a discrete cosine transform to remove low-frequency noise (e.g., scanner drift).
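To make the high-pass filtering step concrete, the following Python sketch removes discrete cosine components with periods of at least 128 seconds from a single time series. It is an illustration only, not the SPM8 code used in the study; the function name `dct_highpass` and the rule used to count the low-frequency basis functions are assumptions.

```python
import math

def dct_highpass(series, tr, cutoff_s=128.0):
    """Project low-frequency DCT-II components out of one time series.

    series   : signal values, one per volume
    tr       : repetition time in seconds
    cutoff_s : components with periods of at least this length are removed
    """
    n = len(series)
    # Assumed rule: component j has period 2 * n * tr / j seconds.
    # Index 0 (the constant term) is kept so that the mean is preserved.
    n_basis = min(int(2.0 * n * tr / cutoff_s + 1), n)
    filtered = list(series)
    for j in range(1, n_basis):
        basis = [math.sqrt(2.0 / n) * math.cos(math.pi * (2 * t + 1) * j / (2.0 * n))
                 for t in range(n)]
        # The basis functions are orthonormal, so each one can be removed separately.
        coef = sum(b * y for b, y in zip(basis, series))
        filtered = [f - coef * b for f, b in zip(filtered, basis)]
    return filtered

# Toy signal: a slow drift (560-s period) plus a faster fluctuation (14-s period).
tr, n = 2.8, 100
slow = [math.cos(math.pi * (2 * t + 1) * 1 / (2.0 * n)) for t in range(n)]
fast = [math.cos(math.pi * (2 * t + 1) * 40 / (2.0 * n)) for t in range(n)]
cleaned = dct_highpass([s + f for s, f in zip(slow, fast)], tr)
residual = max(abs(c - f) for c, f in zip(cleaned, fast))
print(residual < 1e-9)  # True: only the fast component remains
```

The toy signal is built from exact cosine basis functions so that the effect of the filter is easy to verify; real scanner drift is only approximately captured by the low-frequency set.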
Image Analysis
Extraction of subject-level signal differences across the learning and control conditions of the spatial learning task was conducted using general linear models in SPM8. Four regressors corresponding to specific events that occurred during each trial of each condition were defined (
Figure 1A). “Searching” was defined from the start of a trial until an arm was selected and committed to (and 10% of its length was traversed). Reward “anticipation” began after the first 10% of an arm was traversed and extended until its baited area was reached. The two types of reward feedback possible at an arm’s terminus were defined as “reward,” when a monetary reward was won, and “no-reward,” when no monetary reward was won. These regressors were convolved with a canonical hemodynamic response function, with the durations of the regressors for each participant modeled according to the durations of these events during performance in the learning condition. For these regressors, a t contrast vector was applied to the parameters (beta_j) estimated for each voxel j producing four contrast images for each participant representing each regressor/event (searching, anticipation, reward, no reward) compared across the two conditions (learning, control).
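The construction of one event regressor can be sketched in Python as a boxcar spanning each event's duration, convolved with a double-gamma hemodynamic response function and sampled once per volume. This is a simplified illustration rather than the SPM8 implementation; the function names, the HRF parameter values (peak delay 6 s, undershoot delay 16 s, ratio 6), and the 0.1-s time grid are assumptions.

```python
import math

def canonical_hrf(t, peak=6.0, undershoot=16.0, ratio=6.0):
    """Double-gamma HRF at time t in seconds (parameter values are assumed)."""
    if t < 0:
        return 0.0
    g1 = t ** (peak - 1) * math.exp(-t) / math.gamma(peak)
    g2 = t ** (undershoot - 1) * math.exp(-t) / math.gamma(undershoot)
    return g1 - g2 / ratio

def event_regressor(onsets, durations, n_scans, tr, dt=0.1, hrf_len=32.0):
    """Boxcar for one event type convolved with the HRF, sampled at each volume."""
    n_fine = int(round(n_scans * tr / dt))
    box = [0.0] * n_fine
    for onset, dur in zip(onsets, durations):
        start = max(int(round(onset / dt)), 0)
        stop = min(int(round((onset + dur) / dt)), n_fine)
        for i in range(start, stop):
            box[i] = 1.0
    kernel = [canonical_hrf(k * dt) * dt for k in range(int(round(hrf_len / dt)))]
    regressor = []
    for scan in range(n_scans):
        i = min(int(round(scan * tr / dt)), n_fine - 1)
        # Discrete convolution evaluated only at the volume acquisition times.
        total = 0.0
        for k in range(min(i + 1, len(kernel))):
            total += box[i - k] * kernel[k]
        regressor.append(total)
    return regressor

# Hypothetical "searching" events: two trials with participant-specific durations.
searching = event_regressor(onsets=[0.0, 30.0], durations=[5.6, 4.2], n_scans=20, tr=2.8)
print(len(searching))  # 20
```

One such regressor would be built per event type (searching, anticipation, reward, no reward) and per condition, with onsets and durations taken from each participant's own behavior.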
A random-effects “omnibus” analysis (F test in SPM8) was used to test the significance of interactions between group (OCD, healthy comparison), condition (learning, control), and event (searching, anticipation, reward, no reward) across the whole brain, covarying for sex. To correct for multiple comparisons, we applied a cluster extent threshold with an a priori significance threshold set at a p value of 0.01. The cluster extent threshold was obtained with Monte Carlo simulations (10,000 iterations) implemented in custom software written in Matlab. Group composite activation maps generated for each contrast were used to examine the interactions resulting from the omnibus test; voxels identified using a p value threshold <0.01, together with a cluster extent threshold of 25, are reported. Subject-level fMRI signal differences across the learning and control conditions and an implicit baseline (consisting of the unmodeled components of the task) were extracted to derive parameter estimates for individual participants at specific peaks of the statistical map for that contrast. These post hoc tests determined group differences in activation associated with the learning and control conditions for each event.
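The logic of a Monte Carlo cluster extent threshold can be sketched in Python as follows: simulate null maps, threshold each at the voxel-level p value, record the largest suprathreshold cluster, and choose an extent that is rarely exceeded by chance. This is not the custom software used in the study. The grid size, the iteration count, the corrected alpha of 0.05, and the use of spatially independent noise are all assumptions; a realistic simulation would use the brain mask and noise smoothed to match the data, which yields much larger thresholds.

```python
import math
import random
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def max_cluster_size(active):
    """Size of the largest face-connected cluster in a set of voxel coordinates."""
    seen, best = set(), 0
    for start in active:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            x, y, z = queue.popleft()
            size += 1
            for dx, dy, dz in NEIGHBORS:
                nb = (x + dx, y + dy, z + dz)
                if nb in active and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def cluster_extent_threshold(shape, voxel_p, alpha, n_iter, seed=0):
    """Smallest extent k with P(max null cluster >= k) <= alpha (simulated)."""
    rng = random.Random(seed)
    nx, ny, nz = shape
    maxima = []
    for _ in range(n_iter):
        # Assumption: independent noise, so each voxel exceeds the
        # voxel-level threshold with probability voxel_p.
        active = {(x, y, z)
                  for x in range(nx) for y in range(ny) for z in range(nz)
                  if rng.random() < voxel_p}
        maxima.append(max_cluster_size(active))
    maxima.sort()
    idx = int(math.ceil((1.0 - alpha) * n_iter)) - 1
    return maxima[idx] + 1

k = cluster_extent_threshold((12, 12, 12), voxel_p=0.01, alpha=0.05, n_iter=200)
print("cluster extent threshold (voxels):", k)
```

Because unsmoothed noise produces very small chance clusters, the value printed here is not comparable to the 25-voxel extent reported in the text; the sketch only shows how such a threshold is derived.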
Discussion
We used a translational paradigm to investigate the neural correlates of reward-based spatial learning in unmedicated individuals with OCD. In the learning condition, participants had to use extramaze cues to navigate the maze and find rewards. In the control condition, randomization of the scene prevented use of the cues to learn the reward locations, so spatial learning was experimentally prevented. Both OCD and healthy participants demonstrated spatial learning, taking less time and fewer trials to find all eight rewards in the second scan run than in the first. Group differences in neural activity associated with searching the maze and with anticipating and receiving rewards were detected in a left-hemisphere cluster encompassing the hippocampus, amygdala, and ventral putamen. Although both groups engaged temporoparietal areas typically engaged by healthy individuals during spatial navigation (
17–
20), only participants in the OCD group engaged the left posterior hippocampus. Additionally, healthy participants exhibited activation in the left anterior hippocampus, amygdala, and ventral putamen when receiving unexpected rewards in the control condition, consistent with our previous findings with this task in another sample of healthy individuals (
16). In contrast, in OCD participants, signal in these mesolimbic regions decreased relative to baseline in response to receiving unexpected rewards; activation was instead detected in response to receiving expected rewards in the learning condition. Finally, only healthy participants showed activation in the left ventral putamen and amygdala when anticipating rewards in the learning condition. These findings suggest abnormal functioning of mesolimbic and ventral striatal circuitry in OCD during reward-based spatial learning.
Healthy participants did not show activation in the posterior hippocampus when searching the maze, a finding we previously interpreted as evidence that the (posterior) hippocampus works with other medial temporal regions to create a map of the environment (
28). In contrast, participants in the OCD group exhibited activation in the left posterior hippocampus when searching and receiving rewards in the learning condition, suggesting that they required additional neural processing resources to learn/remember the spatial layout of the cues, consistent with their needing more trials (attempts) than healthy participants to obtain all eight rewards in run 1 and with the role of the left hippocampus in episodic memory (
29). OCD participants took more time to find all rewards in run 1, and their performance speed correlated positively with activation in the left posterior hippocampus during navigation. Perhaps their greater engagement of this region contributed to their greater improvement (compared with healthy participants) in performance (speed and number of trials) from run 1 to run 2. Greater reliance on the hippocampus is consistent with findings of compensatory hippocampal engagement in OCD participants during performance of other learning tasks (
22). Both performance speed and activation in the left posterior hippocampus during navigation were positively associated with doubt/checking symptoms, suggesting that the OCD participants who endorsed more of these symptoms required the most time and greatest reliance on the posterior hippocampus to find all rewards.
Unlike healthy participants, unmedicated OCD participants did not show activation in the ventral striatum in response to receiving unexpected rewards in the control condition. Lesion, neurophysiological, and fMRI studies typically implicate the ventral striatum, specifically the nucleus accumbens, in processing reward prediction errors (
30). The fMRI data from healthy individuals suggest that ventral striatal activation increases with positive prediction errors (i.e., when reinforcement is greater than expected [
31,
32]). Our findings suggest that the receipt of unexpected rewards is the prediction error signal that activates the ventral striatum on this task in healthy participants. However, in OCD participants, the receipt of unexpected rewards was associated with decreased BOLD signal relative to baseline in the ventral putamen, an effect typically associated with omitted rewards in healthy individuals (
32,
33). Abnormal ventral striatal function when processing rewards is consistent with findings from studies using a monetary incentive delay task of reward processing in OCD patients (
9,
34). Our finding of attenuated ventral striatal activation during reward anticipation in OCD participants is also consistent with these previous data (
9).
Together, these findings suggest ventral striatal dysfunction in reward signaling in OCD pathophysiology, perhaps contributing, in part, to the inflexible control over behaviors. Blunted reward signaling, for example, might decrease the rewarding relief that should normally result from a behavior, thereby contributing to difficulty controlling the urge to repeat it. These findings can also be interpreted in terms of the dopaminergic system, since dopamine is associated with reward-based learning (
21). Neurophysiological findings suggest that ventral striatal dopaminergic neurons fire in response to unpredicted rewards (
35). If ventral striatal activation reflects the normal phasic activity of dopaminergic neurons in response to reward unpredictability, then our fMRI findings suggest abnormal phasic activity of striatal dopaminergic neurons in OCD, consistent with positron emission tomography data (
36).
In healthy participants, activation in the left ventral putamen, along with the anterior hippocampus and amygdala, was detected in response to receiving unexpected rewards. The ventral striatum is intrinsically connected to the anterior hippocampus (
4), which has a preferential role over the posterior hippocampus in processing reward information (
6), and to the amygdala, which is also involved in reward prediction error signaling (
37). In OCD participants, signal in these regions decreased relative to baseline in response to receiving unexpected rewards, suggesting that the processing of reward prediction errors is abnormal in OCD. However, given the role of the anterior hippocampus in anxiety (
5), this lack of activation may also represent greater baseline activity within these connected regions in persons with OCD, particularly in the right hippocampus, since our findings of group differences were localized to the left hemisphere. Thus, baseline psychophysiological measures of anxiety should be incorporated into future studies.
In OCD participants, activation in the left ventral putamen and anterior hippocampus during reward receipt in the control condition was inversely associated with severity ratings on the doubt/checking dimension, suggesting the least activation in those who endorsed the most doubt/checking symptoms. Mesolimbic dysfunction specific to this symptom dimension may, in part, be a result of reduced gray matter volume in mesolimbic areas in OCD patients with prominent checking compulsions (
12,
38). Electrophysiological (
39) and fMRI (
40) data indicate that the anterior hippocampus encodes uncertainty, consistent with our findings of anterior hippocampal activation in response to unexpected reward in healthy individuals. Thus, the processing of uncertainty within these regions is likely altered in OCD, consistent with evidence that OCD patients—especially those with checking compulsions—are highly intolerant of uncertainty (
41).
This study is limited by its modest sample size and by the spatial resolution of fMRI, which does not allow differentiation of the hippocampal subregions that may contribute differently to reward-based learning. We also cannot exclude the possibility that group differences in brain activations were due, in part, to group differences in visuospatial processing or affective responses to receiving/not receiving rewards. Finally, searching and reward-related activations might be less distinct than we suggest, given the timing of the task and the slowness of the hemodynamic response.
In conclusion, this is the first study, to our knowledge, to assess the neural correlates of reward-based spatial learning using a translational fMRI paradigm in unmedicated participants with OCD. Our data point to mesolimbic and ventral striatal dysfunction associated with reward-based spatial learning in OCD, confirm findings of hippocampal compensation (
22), and suggest that the neural processing of unpredictable rewards is abnormal in OCD. Future studies will determine whether these functional abnormalities precede clinical expression of OCD (and could be biomarkers of risk) or whether these abnormalities follow the clinical expression of OCD (and could be targets for treatment).