In clinical practice, TMS targeting is usually based on scalp measurements, resulting in different patients receiving stimulation of different sites in the prefrontal cortex (
6–
8). This practice produces variance in the stimulation site, which can be linked to an underlying circuit by using the human connectome as a template (
9). This variance has been used to retrospectively determine which stimulated circuit led to the best overall antidepressant response (
10,
11). For example, patients who received stimulation at cortical sites connected to the subgenual cingulate have shown greater improvement in global depression scores (
10,
11). This retrospective association has been used to optimize TMS targeting for recent prospective clinical trials (
12–
15).
Methods
Complete methodological details are presented in the online supplement.
Subjects and Data Collection
This study included two retrospective data sets of patients who received TMS for treatment-resistant depression (see Table S1 in the online supplement). All patients received high-frequency TMS using a standard figure-of-eight coil to the left dorsolateral prefrontal cortex (DLPFC) at 120% of resting motor threshold for 3–6 weeks. Stimulation sites were recorded on the patient’s MRI and transformed to a standard brain atlas using the nonlinear transform function in SPM8. Further study details are delineated in the online supplement.
Discovery data set.
Thirty patients received a 4- to 6-week naturalistic course of clinical TMS at the Berenson-Allen Center for Noninvasive Brain Stimulation in Boston. Treatment was targeted using the 5.5-cm rule. The stimulation site was recorded using stereotactic neuronavigation software (BrainSight, Rogue Research, Montreal). Outcomes included patients’ self-reported responses on the Beck Depression Inventory (BDI) and a modified clinician-reported version of the 24-item Hamilton Depression Rating Scale (HAM-D). The BDI was treated as the primary outcome measure because responses were collected immediately before and after the clinical TMS course. The HAM-D was treated as a secondary outcome measure because responses were collected several months before or after treatment (mean pre- to posttreatment interval, 126 days), which may limit its reliability (
17). The HAM-D was also modified for routine clinical use, including omission of three of the 24 items (genital/libido, paranoia, and insight). A subset of this cohort was previously used to predict global antidepressant response (
10).
Replication data set.
Eighty-one patients received active TMS, and 87 received sham treatment as part of the Optimization of TMS for the Treatment of Depression Study (
6,
18). Patients received 3 weeks of blind TMS (active or sham), which was used as our primary endpoint. Some patients also received up to 3 additional weeks of open-label active TMS, which we used as a secondary endpoint. Treatment targeting was done using the 5-cm rule. The stimulation site was recorded by placing a vitamin E capsule over the site during an MRI scan. The primary outcome was the clinician-reported 28-item HAM-D, as in the original clinical trial (
6). The self-reported Inventory of Depressive Symptomatology was also collected as a secondary outcome measure. This data set was used for replication rather than discovery to enable comparisons between active and sham treatment for the replication and because the method of recording the stimulation site may be less precise.
Statistical Testing
For permutation testing, each patient’s symptom response was randomly reassigned to a different patient’s stimulation site. After 10,000 iterations, solutions were considered significant if the outcome of interest was stronger for the real data than for 95% of random permutations. For correlation analyses, normality was assessed using the Shapiro-Wilk test, and nonnormally distributed data were rank-transformed.
Seed-Based Normative Connectivity Analysis
For each patient, the volume of brain tissue modulated by TMS was estimated using a previously specified model (
10,
19) and using the above record of the stimulation coordinates in standard atlas space (
Figure 1A). Normative connectivity of this stimulation site was computed using resting-state functional MRI (rsfMRI) data from a large connectome database (N=1,000) (
20), as in past work (
10). This method provides a highly reliable estimate of each stimulation site’s expected connectivity profile (
Figure 1B), enabling direct comparison of stimulated circuits and clinical changes (
9–
11,
16). Subject-specific functional connectivity data were available only for a subset of patients (N=30, out of 198) and thus were not incorporated into the model.
Correlation was computed between these stimulation site connectivity maps and the total change in each depression symptom across subjects. For example, in our discovery cohort, we performed correlations across 30 patients for each of the 21 symptoms measured by the BDI. At each voxel, this yielded a value representing the degree to which this voxel’s connectivity to the stimulation site predicted efficacy for any given symptom. We refer to these voxel-wise maps as symptom-response maps (
Figure 1C).
Clustering of Symptom-Response Maps
Because some symptom-response maps looked similar to one another, we classified similar maps into common clusters. The degree of similarity was quantified by calculating spatial correlation between each pair of symptom-response maps. Maps that were most similar to one another were grouped into clusters using Ward’s hierarchical clustering method (
21). The strength of clustering was evaluated using the gap statistic (
22). Symptom-response maps within a cluster were averaged to generate a mean cluster-response map.
Control analyses were conducted to confirm that the results were not driven by the following factors:
1.
Characteristics of the stimulation site alone: The analysis was repeated using baseline symptom severity (which should be unrelated to the stimulation site) rather than symptom improvement.
2.
Covariance in symptom response alone: The analysis was repeated using symptom change alone, without considering the stimulation sites or their connectivity.
3.
Random chance: The analysis was repeated after randomly assigning each patient’s stimulation site to a different patient’s clinical response.
4.
Artificial binarization of a continuous distribution: We tested for normality of the real data (which should be nonnormal) using the Shapiro-Wilk test. We also used a permutation test to confirm that the strength of clustering (gap statistic) was stronger than would be expected by chance.
5.
Influence of confounders: The analysis was repeated after controlling for the effect of age, sex, number of current antidepressant medications, and number of current psychotropic medications.
Reproducibility of Clustering Model
To assess reproducibility of our cluster-response maps, we repeated our clustering analysis for both symptom inventories in both data sets. Because “sadness” was a consistent question item across all symptom scales and data sets, we designated the cluster containing this symptom as cluster 1 and the remaining cluster as cluster 2. Discovery cluster-response maps were compared with replication cluster-response maps using spatial correlation, comparing cluster 1 to cluster 1 and cluster 2 to cluster 2. To assess significance, these values were compared with the spatial correlation generated from randomly permuted replication data. We also tested whether this replication was specific to active relative to sham stimulation. To confirm that the results were not driven by the choice of data set, we repeated our analysis using our discovery data to reproduce results from our replication data set. For all control analyses, we maintained the same definition of cluster 1 as the cluster containing “sadness.”
Building a Targeting Atlas
For each voxel in the DLPFC, we used spatial correlations to compute the similarity between that voxel’s connectivity and our two cluster-response maps. The DLPFC was defined as in Fox et al. (
23). By definition, these maps reflect the symptomatic improvement expected after TMS to each voxel in the DLPFC. Because the two maps showed minimal overlap, we combined them into a single targeting atlas, which depicts targets expected to preferentially improve one of the two symptom clusters. We repeated this analysis for each data set independently.
Prediction of Relative Improvement in Symptoms Across Patients
The targeting atlas from one data set was used to predict preferential improvement in one of the two symptom clusters for each patient in the other independent data set. This prediction was based solely on the location of the patient’s stimulation site on the independent targeting atlas. Correlation was calculated between this predicted symptom response and the actual symptom response (the ratio of clinical improvement between the two clusters). Symptom ratios were rank-transformed because ratio analyses inherently show skewed distributions (
24). This analysis was repeated for the sham arm of the replication data set. The significance of the interaction between the active and sham arms was assessed through permutation testing.
To confirm that the result was not influenced by features related to the normative connectome database (N=1,000), each cohort was also analyzed using an independent subset of this connectome database (N=500, for each).
Combining Both Data Sets
Symptom-response maps were generated for all symptoms across both data sets (active TMS only) and were clustered using the same methods as described above. Symptom-response maps in each cluster were averaged to form two overall cluster-response maps. These two cluster-response maps were used to build a composite targeting atlas following the methods described above.
Prediction of Relative Symptom Improvement Across Studies
A literature review was conducted to identify therapeutic TMS trials that used a focal figure-of-eight coil and measured distinct mood and anxiety rating scales, similar to previous work (
16). Because individual symptom items were not reported in most studies, we used depression scales as a proxy for dysphoric symptoms and anxiety scales as a proxy for anxiosomatic symptoms. Each study’s stimulation site was identified and converted to Montreal Neurological Institute (MNI) coordinates. The location of this stimulation site on the composite targeting atlas was used to predict the ratio of improvement between the two symptom domains. Correlation was calculated between this predicted symptom response ratio and the actual symptom response ratio. We also calculated the percentage of studies in which the model correctly predicted which symptom scale would improve preferentially. A single-proportion z test was used to determine whether this percentage was significantly different from 50%.
As a control, we repeated this analysis for trials of “deep” TMS that use a broader stimulation field than traditional figure-of-eight coils. We hypothesized that these nonfocal coils would stimulate both circuits, resulting in similar improvement in dysphoric and anxiosomatic symptoms. Specifically, we tested whether symptom response ratios showed greater variance in trials using focal TMS coils compared with nonfocal TMS coils.
Results
Different Circuits for Different Symptom Clusters
Starting with our discovery data set and the patient-reported BDI, we computed 21 symptom-response maps that show connections with the stimulation site most correlated with improvement in each of the 21 symptoms (
Figure 1C; see also Figure S1 in the
online supplement). The correlation values in these maps were stronger than would be expected by chance (permutation test p<0.001). Although each symptom-response map was unique, the maps demonstrated one of two general topographic patterns (
Figure 2; see also Figure S2 in the
online supplement). An optimal two-cluster solution explained 73% of the variance as quantified by the gap statistic (see Figure S2 in the
online supplement). One cluster included symptoms such as sadness, decreased interest, and suicidality, and a smaller cluster included symptoms such as irritability, sexual disinterest, and insomnia (
Figure 2). For convenience, we refer to these clusters as “dysphoric” and “anxiosomatic,” respectively.
We performed multiple control analyses to validate the clustering model (details are in Figure S2 in the online supplement). When this analysis was repeated using baseline symptom severity rather than symptom change, individual symptom-response maps showed weaker cross-correlation (unpaired t test, p<1.0×10−12) and did not result in a two-cluster solution. When this analysis was repeated using only symptom change, without consideration of the stimulation site, no clusters of covarying symptoms were identified. When both stimulation site and symptom change were included but randomly permuted, individual symptom-response maps showed weaker cross-correlation (p<0.005), and a two-cluster solution was identified less than half the time (42%). When a two-cluster solution was forced, it explained less variance than the two-cluster solution from the real data (p=0.02). Our two clusters did not result from artificial binarization of a normal distribution, as the real data deviated significantly from the normal distribution (p<0.05; see Figure S2 in the online supplement) and showed significantly stronger clustering than randomly permuted data (p<0.05; see Figure S2D in the online supplement). Finally, the maps were nearly identical when recomputed to control for age, sex, number of antidepressant medications, and number of psychotropic medications (spatial r>0.99).
Reproducible Clusters Across Symptom Scales and Independent Cohorts
An optimal two-cluster solution with similar topography was identified in our replication cohort, despite using a different symptom scale as the primary outcome measure (
Figure 2). The use of secondary outcome measures for each data set (the HAM-D for the discovery sample and the Inventory of Depressive Symptomatology for the replication sample) produced nearly identical results (see Figures S4 and S5 in the
online supplement). These cluster-response maps were more similar than would be expected by chance across different symptom scales (p<0.01) and across independent data sets (p<0.01). These results were not seen using the sham data (N=87), which produced a six-cluster solution. When forced to use a two-cluster solution, the sham data failed to match the discovery cohort as well as the active data (p<0.05). The reproducibility was unrelated to the choice of cohort, as the discovery cohort also significantly reproduced the maps derived from the replication cohort (p<0.01).
Identification of Potential Treatment Targets in Each Cohort
The maps described above represent connections specific to “antidysphoric” or “antianxiosomatic” TMS sites in the discovery (
Figure 2A) and replication (
Figure 2B) cohorts. To identify TMS targets connected to these regions, we used spatial correlations to compare each voxel’s connectivity profile with each dysphoric and anxiosomatic cluster map. This revealed an atlas of potential antidysphoric and antianxiosomatic TMS sites (
Figure 3A and
3B). For each cohort, the antidysphoric and antianxiosomatic atlases were combined into a targeting atlas (
Figure 3C) to predict relative improvement in dysphoric and anxiosomatic symptoms in the other cohort (examples shown in
Figure 4A).
Validation Across Patients
We tested whether we could predict clinical improvement in one patient cohort using the targeting atlas derived from the other independent cohort (
Figure 4). Starting with the stimulation sites from our discovery cohort, we found that the location of each patient’s stimulation site on the targeting atlas derived from our replication cohort predicted the ratio of dysphoric to anxiosomatic symptom improvement (N=30, r
s=0.69, p<10
−4). This prediction was significantly stronger for the real discovery data than for randomly permuted discovery data (p<10
−3).
Similarly, we found that the location of each patient’s stimulation site in our replication cohort on the targeting atlas derived from our discovery cohort could predict clinical response at both the 3-week time point (N=81, rs=0.47, p<10−5) and after the full course of treatment (N=81, rs=0.52, p<10−6). This prediction was significantly stronger in the active arm of the replication data set than in the sham arm of the same data set (p<10−3).
Results were unchanged when controlling for baseline symptom severity or when using an independent connectome data set (2×N=500) to compute the discovery and replication targeting atlases.
Combining Both Data Sets
All symptom-response maps across both data sets (active arm only) and both symptom scales were combined into a single large clustering analysis. This model again identified distinct dysphoric and anxiosomatic clusters. The symptom-response maps in each cluster were averaged to create combined cluster maps across data sets.
These cluster maps were largely the inverse of one another (spatial r=−0.85), but this result was not mandated by the clustering analysis, as permuted data with a forced two-cluster solution showed a significantly weaker spatial correlation (mean r=−0.07, p<0.05). Furthermore, some regions were not inverted and were common to both clusters, including the left DLPFC, the right orbitofrontal cortex, the periaqueductal gray, and the left anterior insula (see Figure S6 in the online supplement). Because connectivity may differ in the included patient population, we repeated this final clustering analysis using a smaller connectome database of patients who received rsfMRI scans before a clinical course of TMS for major depression (N=38). This analysis revealed a solution similar to that derived from our larger normative connectome database (see Figure S7 in the online supplement).
These overall antidysphoric and antianxiosomatic maps were combined into a single composite targeting atlas following the same procedure as described above (
Figure 5A–C). The peak dysphoric target was at MNI coordinates [−32, 44, 34], close to the “anti-subgenual” target used in recent TMS trials (
Figure 5A) (
12,
13). This dysphoric target lies at the intersection of Brodmann’s areas 9, 10, and 46 and is part of a brain network variably referred to as the ventral attention, salience, and cingulo-opercular network (see Figure S8 in the
online supplement) (
20).
Within the left DLPFC, the peak anxiosomatic target was at MNI coordinates [−37, 22, 54], close to the 5-cm target used in early TMS clinical trials for depression (
6,
25). There was also an anxiosomatic target in the dorsomedial prefrontal cortex at MNI coordinates [18, 37, 55], close to the target used in TMS trials for eating disorders, obsessive-compulsive disorder (OCD), and the anxiety/insomnia biotype of major depression (
Figure 5B) (
2,
26–
28). Both anxiosomatic targets fall within Brodmann’s area 8, part of the default mode network (see Figure S8 in the
online supplement) (
20).
A recent lesion network mapping study identified a specific DLPFC region with increased connectivity to lesion locations associated with overall depression scores. This region falls at the intersection of our dysphoric and anxiosomatic circuit targets (see Figure S8C in the
online supplement) (
29).
Prediction of Relative Symptom Improvement Across Studies
In an exploratory analysis, we tested whether our composite targeting atlas could predict variability in past TMS clinical trials. In addition to the two data sets described above, we identified 12 studies (see Table S4 in the
online supplement) reporting distinct depression and anxiety scales after at least 3 weeks of high-frequency focal TMS (
6,
12,
14,
26–
28,
30–
36). Six studies stimulated relatively dysphoric sites, and eight studies stimulated relatively anxiosomatic sites (
Figure 5C). Our prediction was correct in 13 of 14 studies (p<10
−3, single proportion z test). Relative improvement in depression as opposed to anxiety was strongly correlated with the predicted improvement based on the location of each stimulation site on our targeting atlas (r=0.88, p<10
−4) (
Figure 5D).
Of note, the seminal clinical trial by O’Reardon et al. (
25) was excluded because it lacked anxiosomatic scales, but it was consistent with our prediction for the 5-cm stimulation site. Specifically, the study reported greater improvement in scores on the HAM-D, a multidimensional scale that also incorporates anxiety and somatic symptoms (
37), than on the Montgomery-Åsberg Depression Rating Scale, a unidimensional metric more heavily focused on dysphoric symptoms (
37).
As a control, we identified six additional studies of nonfocal “deep” TMS, which stimulates a broad field covering all the stimulation targets identified in this study (
38–
44). We hypothesized that nonfocal TMS would modulate symptoms nonspecifically, leading to a uniform depression-to-anxiety ratio across studies. Indeed, the depression-to-anxiety ratio showed significantly less variance with nonfocal TMS (range=0.94–1.19, SD=0.10) than with focal TMS (range=0.67–1.94, SD=0.38) (p<0.01, F test for equal variances).
Discussion
Despite long-standing recognition that patients with major depressive disorder can have very different symptoms (
45), the field has yet to reach a consensus subclassification. Traditional classifications have contrasted “melancholic” with “agitated” or “atypical” subtypes, while contemporary classifications have contrasted “core” symptom clusters with “sleep” and “anxiety” clusters (
46,
47). Some studies have proposed distinct circuit-based “biotypes” associated with distinct clinical features, such as anhedonia, anxiety, or rumination (
1,
2). However, despite these advances, subtyping based on symptoms and biological correlates has yet to improve treatment outcomes (
48).
Here, we took a reverse approach toward therapeutically oriented biotyping. Rather than identifying biotypes and then searching for therapeutic relevance, we started with therapeutic response to an anatomically targeted intervention. This intrinsically links therapeutic specificity with anatomical specificity and identification of treatment targets. Thus, we were able to simultaneously identify two symptom clusters that respond to two distinct neuroanatomical treatment targets. We labeled these response patterns as dysphoric and anxiosomatic to prevent confusion with existing diagnostic classifications. Although these cluster labels and their constituent symptoms may be refined in future work, our analyses confirm that different patterns of response are associated with different TMS targets. These clusters were not explained by random chance or by anatomical features inherent to the stimulation sites. The clusters were also not driven by covariance in symptom improvement alone, suggesting that the anatomical contrast greatly increases our power to detect symptom-specific efficacy.
At the individual patient level, our targets predicted clinical improvement across independent data sets, suggesting potential clinical utility. At the population level, our targets explained variance in past TMS clinical trials. This exploratory population-level analysis is somewhat limited by inclusion of anxiosomatic symptoms in most depression scales, which may introduce noise into the analysis. Despite this noise, our model predicted significant variance in past clinical trials, suggesting that this approach may be useful for assessing future clinical trial outcomes.
Our finding of distinct dysphoric and anxiosomatic circuit targets is consistent with recent work identifying anhedonia/psychomotor retardation and anxiety/insomnia as distinct biotypes in major depressive disorder (
2). In this study, the anxiety/insomnia biotype was preferentially responsive to dorsomedial prefrontal TMS (
2), which aligns with our anxiosomatic target (
Figure 5). Our work is also consistent with recent studies showing that anhedonia was preferentially responsive to DLPFC stimulation sites that align with our dysphoric target (
49–
51).
Strengths of our study include an empirical data-driven approach independent of a priori models of depression or symptom localization (
52), replication across independent cohorts with very different study characteristics, specificity to active versus sham TMS, permutation-based statistics to ensure valid clustering (
53), and direct therapeutic relevance for guiding future trials. The primary weakness of our study is that both of our independent cohorts were retrospective. This was not a clinical trial. The goal of our study was to derive reproducible symptom-specific TMS targets. We achieved this goal, but the clinical utility of the identified targets and symptom clusters remains to be prospectively tested in a randomized clinical trial.
It is worth noting that not all symptom-response maps were equally consistent across data sets and symptom scales. Some symptoms, such as sadness, decreased interest/activities, suicidality, guilt, and hopelessness/pessimism, were consistently part of the dysphoric cluster. Symptoms such as insomnia and sexual dysfunction were consistently part of the anxiosomatic cluster, with the exception of the HAM-D in the discovery cohort. This HAM-D score may be less reliable because it was collected at a clinical appointment months before the start of TMS and excluded questions about sexual function. Alternatively, reduced consistency in our anxiosomatic cluster may be due to greater heterogeneity in the interpretation of anxiosomatic relative to dysphoric question items (
54) or the fact that our study focused on patients with major depression, who will have an overrepresentation of dysphoric relative to anxiosomatic symptoms. Further work using a truly transdiagnostic patient cohort with deeper phenotyping (including a discrete anxiety scale) may enable more definitive characterization and subclassification of these provisional clusters.
Additional limitations of our study that may be improved in future work include individualization of TMS and functional connectivity models. TMS electric field modeling may help to incorporate additional variables, such as angle and rotation of the TMS coil, stimulation intensity, and patient-specific anatomy (
55). Individualized rsfMRI data may further enable modeling of interindividual variability in brain circuit topography (
56). However, these specialized tools may be difficult to implement in clinical practice. Electric field modeling can require a high-performance computing cluster (
55), and reliable individualized functional connectivity measurements may require hours of specialized MRI scanning (
56). The added value of these advances needs to be determined and weighed against the additional cost and complexity for guiding clinical intervention.
An important question is whether our results are biased by factors that are mandated by the analysis. For example, a recent connectivity-based biotyping study of major depression may have artificially clustered a continuous phenomenon (
53). We ensured that our clustering model was based on a parameter that did not follow a continuous distribution and that randomly permuted data showed significantly weaker clustering. Similarly, because the human brain is divided into two large anticorrelated networks (
57), it is possible that any analysis would identify two symptom clusters that map to these two networks. We ensured that this was not the case by showing that randomly permuted data and sham TMS data both failed to yield a similar two-cluster solution. Finally, because we first clustered both data sets and then predicted the ratio of symptom improvement using those same clusters, the analysis may seem circular. However, the ratio of symptom improvement for each data set was predicted on the basis of a targeting atlas derived from an independent data set, mitigating this concern. Furthermore, a biased or circular analysis would have produced similar results for the sham replication cohort, which was analyzed in the same way as the active replication cohort but failed to produce similar results. Randomly permuted data also failed to produce similar results when analyzed in the same manner.
Our findings in this study may help facilitate personalized medicine, targeting patient-specific symptoms based on neuromodulation of specific brain circuits. At a minimum, this work supports distinct assessment of dysphoric and anxiosomatic symptoms for patients receiving clinical TMS for depression. This may enable patient-specific selection and adjustment of treatment targets. For instance, residual anxiosomatic symptoms may be targeted in patients who initially show improvement only in dysphoric symptoms. Anxiosomatic sites may also be chosen for certain patients with comorbid anxiety disorders, OCD, posttraumatic stress disorder (PTSD), or somatoform disorders.
This work may also inform the design of clinical trials in neuromodulation for disorders with predominantly dysphoric versus anxiosomatic symptoms. Existing neuromodulation trials have employed broad symptom scales (
6,
12,
25,
58–
60) that may be insensitive to symptom-specific effects, potentially leading to false negative results (
14,
35,
61,
62). For instance, a recent trial failed to demonstrate antidepressant efficacy of TMS in part because it appeared to exacerbate comorbid PTSD symptoms (
14). Our results suggest that the 6-cm TMS target employed in this study would be expected to improve dysphoric but not anxiosomatic symptoms, offering a potential explanation for these results. Future neuromodulation trials may benefit from an assessment tool that clearly discriminates between different patterns of response.
Finally, our results may be relevant beyond the field of TMS and depression. For instance, different symptoms of Parkinson’s disease respond to different deep brain stimulation (DBS) targets. Tremor (but not other symptoms) responds well to thalamic stimulation, while dyskinesias respond best to stimulation of the globus pallidus (
63). Given evidence that symptom improvement depends on connectivity between the DBS site and other brain regions (
64), distinct circuit targets for specific symptoms of Parkinson’s disease could potentially be derived using the method outlined in this study.
In summary, we describe a novel approach for harnessing a targeted intervention to distinguish anatomical-behavioral relationships that previously remained elusive. This approach provides a framework for empirical derivation of symptom-specific treatment targets that may prove useful in other disorders and for other types of focal neuromodulation.