Precise effect size estimates are specifically important for novel biomarkers being used in clinical applications. This report extensively probes possible explanations for mixed associations between sgACC-StimFC and clinical outcomes. First, by examining sgACC connectivity in single subjects with large amounts of fMRI data (“precision functional mapping” data sets), we show that existing methods (
2) compensate adequately for noisy sgACC signals. Second, by examining the association between sgACC-StimFC and treatment outcome in the largest data set to date—295 patients with major depressive disorder who received either 10 Hz repetitive transcranial magnetic stimulation (rTMS) or intermittent theta-burst stimulation (iTBS) (
19)—we find the model used to estimate the stimulated area in the DLPFC to be a relevant source of variability. Third, by subsampling this large population, which showed a weak overall effect, we demonstrate that sampling variability alone can account for the variable efficacy reported in the literature. Finally, by examining the association between signal quality and treatment prediction, we show, unexpectedly, that a subsample of the population with pronounced global signal variation drives the relationship of sgACC-StimFC to clinical outcome; this subsample was characterized by an especially high variance in global fMRI signals that occur in association with a known pattern of breathing that is prominent in some patients with depression (
20).
Collectively, these results provide strong evidence for a true association between FC of the stimulation site with the sgACC and clinical outcome but call into question the utility of current FC-based targeting protocols that rely on single-echo fMRI data and isolated sgACC FC measures, because our results imply that only about 3% of clinical outcome variability is modifiable by this approach.
Discussion
Here, in a large rTMS data set, we found a relationship between sgACC-StimFC and treatment outcome, as previously reported. The effect we observed was substantially smaller than that described in several previous reports and accounted for only 3% of treatment response variance, hence calling into question the utility of this marker in its current form. Furthermore, this relationship was obtained only when weight maps were used to impute otherwise noisy sgACC FC, when the stimulated part of the DLPFC was estimated with E-field modeling, and when the sample included certain subjects, namely, those with low tSNR due to high signal fluctuations that have been specifically linked to irregular breathing. The interpretation of these findings is multifaceted. On the one hand, the association between sgACC-StimFC and treatment outcomes was modest in this sample and was contingent on certain choices. On the other hand, the effect was strengthened by more accurate modeling approaches, and it was robust to the partitioning of the data, robust across measurement scales, and robust to the inclusion of covariates. Furthermore, it was concentrated in a particular set of subjects who display identifiable characteristics. Our interpretation is at once tempered and hopeful: striking, strong linkages of FC to outcome in a large sample were not found, but the concentration of these linkages in a subpopulation suggests potential avenues toward understanding who is responsive to treatment and why.
Unexpectedly, the predictive value of sgACC-StimFC was strongest in patients with patterns of global signal variance associated with burst breathing. One possible explanation derives from the recent observation that burst breathing coincides with high-amplitude BOLD events that are time-lagged between networks (
20). The spatiotemporal structure of these time-lagged events is clearly revealed after removing CO
2-related global signal fluctuations and is such that it accentuates negative FC between the default mode network (of which the sgACC is commonly a part) and networks represented in the DLPFC (salience and frontoparietal networks). Consistent with this notion, subjects with prominent burst breathing had more negative sgACC-StimFC (see Figure S10B in the
online supplement), and omission of global signal regression removed the observed effect (see Figure S6B in the
online supplement). Hence, the anticorrelation in FC between the stimulation site and the sgACC might be an indirect index of what intrinsic networks have been engaged by TMS, which might be the true mediator of the effect. This would be aligned with a rich literature linking individually variable functional network topology to various behavioral domains (
21,
26,
27). Future work might elucidate the utility of these more comprehensive measures of functional brain organization that are only obtainable from high-quality data, such as multiecho fMRI scans. Alternatively, a propensity for burst-like respiration may differentiate a subsample of depressed patients for whom sgACC FC strongly mediates rTMS outcomes. We did not find strong evidence in the MSC and ME data for burst breathing having an impact on the validity of the weight-map method, besides a minimally better run-to-run reliability in male versus female MSC subjects (which could be indirect evidence, as burst breathing is more prevalent in males; see Results section in the
online supplement). If reliability is increased in subjects with burst breathing, this could increase the power to detect a true effect in these subjects. All of these factors may contribute to the effect. It is also possible that burst breathing may be influenced by somnolence or fluctuating levels of arousal during the scan. However, at present, there is no evidence linking burst breathing to low vigilance or somnolence, and several observations in the HCP data speak against it (
20), rendering vigilance an unlikely mediator of these effects.
With regard to data quality, it should be noted that the mean tSNR values were high for BOLD fMRI data, which was likely due to the somewhat larger-than-average voxel size. In terms of acquisition parameters, the THREE-D data compare favorably to data from previous studies (
5,
6,
9,
16–
18), with the shortest TR (2 seconds) and second-longest acquisition time (total of 20 minutes) among those studies. These factors render data quality an unlikely candidate to explain effect size discrepancies between our findings and those of previous studies.
We discuss other sources of heterogeneity that potentially contribute to the discrepant effect sizes between those of our study and previous studies in the Discussion section in the
online supplement. One possible source of heterogeneity is the study population: major depressive disorder is not a unitary disease entity, and sgACC-StimFC might be more relevant for predicting rTMS outcome in particular subpopulations with major depressive disorder. The THREE-D sample comprised patients with major depressive disorder and a relatively high rate of comorbid anxiety and polypharmacy (
19), as seen in naturalistic samples. However, previous reports of large effect sizes also included populations with similarly high rates of comorbid anxiety and polypharmacy (
5,
6), indicating that these variables alone are unlikely to explain the small effect we observed. The effect size we report (r=−0.16) represents a “best-case scenario” in our sample, where data across scans and treatment conditions were concatenated to increase power, and results with the QIDS-SR, which yielded stronger effects than the HAM-D, were used. Consequently, the small effect size might itself be inflated and likely represents the upper bound of a true effect obtainable in this data set.
Several limitations should be noted. First, like all previous studies, this was a retrospective analysis as opposed to a prospective analysis with a preformulated analysis pipeline. Second, subjects in the THREE-D study received either 10 Hz rTMS or iTBS, while previous studies of sgACC FC investigated the effects with 10 Hz rTMS only. However, we found a similar effect in both conditions, which suggests that our conclusions apply to both 10 Hz rTMS and iTBS treatment modalities. Third, like other studies of sgACC FC and rTMS, the THREE-D study did not include a sham control condition. Both the sgACC and DLPFC have been implicated in placebo responses (
28,
29), and our observation of a modest correlation between symptom improvement and sgACC-StimFC might be attributable to treatment responses, placebo responses, or both. The DLPFC, in particular, has shown consistent involvement in placebo antidepressant responses across metabolic, perfusion-based, and fMRI studies (
30). On the other hand, we also found that accurate E-field modeling strengthened the association between sgACC-StimFC and outcome, which may point to a treatment effect. E-field modeling of the stimulation site decreases variance by sampling FC features that were stimulated with TMS but increases subject-to-subject variance with respect to which FC features are being averaged within the DLPFC (i.e., the chosen features depend on each subject’s unique E-field distribution and shape). Since the neuroanatomical substrate of a placebo response is unlikely to move from subject to subject in line with their E-field, we interpret this as supporting evidence for a TMS treatment effect and not just a placebo response. Fourth, neither our study nor earlier ones included pupillometric or electrophysiological monitoring of vigilance, which can have marked confounding effects on the global signal and FC. A further limitation was the lack of respiratory belt traces in the THREE-D data. However, bursting signal patterns closely resembled previously reported patterns in the HCP data (
Figure 4B), and we exactly replicated a burst breathing–specific sex bias previously reported in the HCP data (
20), which is supportive evidence for the correct identification of irregular breathing patterns in the THREE-D data set (
Figure 4C). Lastly, the validity of our results—and those of previous studies—rests on the accurate identification of the stimulated cortex, which is an active area of research (
23). We used a biophysical E-field model that is, to our knowledge, the primary method for estimating the spatial distribution of TMS effects. However, the extent to which this model accurately predicts neuronal populations modulated by rTMS is still partially unknown, and future models might yield better approximations. Additional discussion of alternative explanations of our findings associated with burst breathing, our effect size estimate, and analytical choices is provided in the Discussion section in the
online supplement.
Together, these results suggest the need to reevaluate the utility of current sgACC FC-based targeting approaches for rTMS and elucidate relevant sources of variability that might have led to mixed results in previous studies. At the same time, these results provide strong evidence for the existence of a true—albeit weak—association. This highlights the need to further explore fMRI predictors for rTMS targeting, possibly involving additional network measures and fMRI data types. Methodological improvements, including multiecho fMRI and dense-sampling approaches, have already shown promise for enabling more comprehensive and personalized characterizations of functional brain organization that could be used to make better treatment predictions in the future. Prospective clinical trials designed to evaluate the clinical utility of fMRI-guided targeting strategies in naturalistic, real-world settings will be critical.