Unlike in most fields of medicine, diagnosis in psychiatry remains restricted to subjective self-reports and observable symptoms (
1). The biological mechanisms underlying psychiatric disorders are complex and still poorly understood. The weak link between established diagnostic methods and objectively measured biological indices (
2) forms a barrier to the development of effective personalized treatments (
3). To overcome this problem, many studies have aimed to stratify psychiatric disorders in an attempt to identify consistent subgroups based on objective biological markers (
4,
5).
Accordingly, a recent innovative study by Stevens et al. (
6) aimed to discover brain-based biotypes of trauma resilience and psychopathology in the acute aftermath of trauma, using data from the AURORA longitudinal study of trauma survivors (
7). Using functional MRI (fMRI) data obtained during performance of simple and widely used tasks that probe threat reactivity (
8), reward reactivity (
9), and inhibitory engagement (
10), the authors identified four different clusters of individuals in a discovery cohort (N=69). In an internal replication cohort (N=77), three clusters were replicated: “reactive-disinhibited,” “low reward-high threat,” and “inhibited” clusters (see Figure 2 in the original paper [
6]). Those replicated clusters were associated with different longitudinal trajectories of posttraumatic stress disorder (PTSD) and anxiety symptoms. Interestingly, the cluster of individuals showing heightened reactivity to both threat and reward was associated with the highest levels of subsequent PTSD and anxiety symptoms (
6).
Multimodal longitudinal studies of the posttraumatic stress response are rare because they present formidable technical and conceptual challenges (
11). These challenges include obtaining a large enough sample of recent trauma survivors; optimal timing of assessments and sufficient follow-up duration to capture critical stages in the longitudinal development of PTSD (
12); minimizing subjects’ burden; and contacting, enrolling, evaluating, and retaining sensitive clinical populations. To the best of our knowledge, the only existing data set comparable to the AURORA study comes from the Neurobehavioral Moderators of Posttraumatic Disease Trajectories study (
11) (ClinicalTrials.gov identifier: NCT03756545).
Here, using a similar data set from recent trauma survivors and closely matched analytic processes, we aimed to perform a conceptual nonexact replication of the Stevens et al. study (
6). Independent replications are needed as they represent a fundamental part of science and lead to greater confidence in previously reported findings (
13,
14). These replications are particularly relevant to neuroimaging studies because of their large analytical variability (
15) and to the field of psychiatry, which suffers from a well-known heterogeneity problem (
16). Nevertheless, replication studies are often undervalued, can be difficult to publish, and offer few direct incentives for researchers (
17).
The main objective of the present study was to examine whether the previously identified brain-based clusters would generalize to our independent sample of trauma survivors. If these clusters were replicated, we aimed to test whether a cluster characterized by heightened reactivity to both threat and reward would be similarly associated with increased subsequent symptoms of PTSD and anxiety, which would demonstrate the stability of a biological phenomenon across similar measures of psychopathology.
Methods
The study was approved by the ethics committee in the local medical center (reference number 0207/14). All participants provided written informed consent in accordance with the Declaration of Helsinki and received financial remuneration at each assessment (1, 6, and 14 months posttrauma).
Data included in this study were collected between 2015 and 2020 as part of the NIMH-funded Neurobehavioral Moderators of Posttraumatic Disease Trajectories study. The study’s design and detailed methodologies have been previously published (
11), and aspects relevant to the present study are summarized below. Overall, we conducted all analyses as closely as possible to the published analytic pipeline of the original study (
6), with the aim of replication, and included similar preprocessing and analysis of the neuroimaging data using fMRIprep (
18) and SPM-12 (
19), the same anatomical regions of interest (ROIs), and an identical clustering analysis. We used an adapted version of the R code applied in the original study, which was kindly provided by the corresponding author. The methods section here is organized in a similar way to the original publication to further facilitate the comparison between the two studies.
Participants
Potential subjects for this study were adult civilians 18–65 years old who were consecutively admitted to a general hospital emergency department after one of the following events: motor vehicle accident, bicycle accident, physical assault, robbery, hostilities, electric shock, fire, drowning, work accident, terror attack, or large-scale disaster. Individuals who had sustained head injuries, were unconscious on admission to the emergency department, or were not able to provide informed consent or comprehend the study’s procedures were excluded from the study. Participants with conditions precluding MRI scanning (e.g., pacemaker, metal implants, and large tattoos) and those with current substance use disorder, current suicidal ideation, or lifetime psychotic disorder were also excluded. In contrast to the AURORA study, this study also excluded individuals with a prior diagnosis of PTSD. All participants provided oral consent to the study’s screening telephone interview and written informed consent upon attending a subsequent diagnostic and eligibility ascertainment clinical interview.
Clinical Assessments
A comprehensive clinical interview was conducted by trained and certified clinical interviewers using the Clinician-Administered PTSD Scale (CAPS) (
20,
21) to assess PTSD diagnosis and severity at each time point. To maintain continuity with decades of DSM-IV-based PTSD research in light of evidence of nonoverlapping groups of individuals diagnosed with PTSD based on DSM-IV or DSM-5 criteria and a consequent recommendation to use a broader PTSD definition for empirical research (
22–
24), we administered a combined clinical interview scoring both CAPS-4 (DSM-IV) and CAPS-5 (DSM-5) items (
20,
21). A positive diagnosis of PTSD was inferred when a participant met either DSM-IV or DSM-5 diagnostic criteria or, in line with previous recommendations (
25), had a total score ≥40 on the CAPS-4. As a secondary continuous measure of PTSD symptom severity, participants completed the PTSD Checklist (PCL) (
26), a 17-item self-report questionnaire corresponding to the DSM symptom criteria for PTSD. As a continuous measure of anxiety symptom severity, participants completed the Beck Anxiety Inventory (BAI) (
27), a 21-item self-report questionnaire measuring physical and cognitive anxiety symptoms.
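As an illustration only, the diagnostic rule described above can be written as a short decision function. The sketch below is in Python (the study's own analyses were run in R), and the names `CapsInterview`, `meets_dsm4`, `meets_dsm5`, `caps4_total`, and `has_ptsd` are hypothetical labels introduced here; only the three criteria and the cutoff of 40 come from the text.

```python
from dataclasses import dataclass

# Cutoff stated in the text: a CAPS-4 total score of 40 or more.
CAPS4_TOTAL_CUTOFF = 40


@dataclass
class CapsInterview:
    """Hypothetical container for one combined CAPS-4/CAPS-5 interview."""
    meets_dsm4: bool   # DSM-IV diagnostic criteria met
    meets_dsm5: bool   # DSM-5 diagnostic criteria met
    caps4_total: int   # CAPS-4 total severity score


def has_ptsd(interview: CapsInterview) -> bool:
    """Positive diagnosis if any one of the three criteria is satisfied."""
    return (
        interview.meets_dsm4
        or interview.meets_dsm5
        or interview.caps4_total >= CAPS4_TOTAL_CUTOFF
    )
```

Because the three criteria are combined with a logical OR, this definition is broader than either DSM-based definition taken alone, which is the stated intent of the combined interview.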
MRI
Acquisition.
Whole-brain functional and anatomical images were acquired with a 3.0-T Siemens MRI system (MAGNETOM Prisma, Germany) with a 20-channel head coil at the Sagol Brain Institute, Wohl Institute for Advanced Imaging, Tel Aviv Sourasky Medical Center. Functional images were acquired in an interleaved order (anterior to posterior) with a T2*-weighted gradient-echo planar imaging pulse sequence (TR=2,000 ms, TE=28 ms, flip angle=90°, voxel size=2.2 mm³, FOV=220×220 mm, slice thickness=3 mm, 36 slices per volume). A T1-weighted three-dimensional anatomical image was obtained with a magnetization-prepared rapid gradient-echo (MPRAGE) sequence (TR=2,400 ms, TE=2.29 ms, flip angle=8°, voxel size=0.7 mm³, FOV=224×224 mm) to enable optimal localization of the functional effects.
fMRI tasks.
A face-matching task (
28) similar to that used by Stevens et al. (
6) was used to probe threat reactivity. The safe or risky domino choice paradigm (
29) was used to measure reward reactivity. Although this paradigm differs from the simpler reward task used by Stevens et al., the two tasks elicited similar neural activations in key brain regions associated with response to reward (see results in the
online supplement). Unlike the original study, we did not have an fMRI task measuring response inhibition (for a full description of the tasks, see the
online supplement).
fMRI data preprocessing.
Functional images were preprocessed with fMRIPrep, version 1.5.8 (
18). Functional imaging scans were coregistered to the anatomical T1-weighted images, corrected for motion, spatially realigned, slice-time corrected, normalized to the 2009 ICBM-152 template, and smoothed with a 6-mm kernel (for full details, see methods in the
online supplement).
fMRI data analysis.
Similar to the original study (
6), the analysis was performed with SPM, version 12 (
19). The final sample was restricted to participants with good-quality data across all fMRI tasks (N=130; see below). The ROIs were the same regions used in the original study, which were kindly provided by the corresponding author. They were anatomically defined and included the left and right amygdala, insula, subgenual and dorsal anterior cingulate cortex (sgACC and dACC, respectively), nucleus accumbens (NAcc), and orbitofrontal cortex (OFC) (for full details, see methods in the
online supplement).
Procedure
A total of 4,058 consecutive trauma survivors admitted to the emergency department were contacted by telephone within 10–14 days after trauma exposure, were given information about the study, and provided informed assent (
Figure 1). Of those, 3,476 individuals underwent initial screening (i.e., “short” interview) and confirmed the occurrence of a psychologically traumatic event and related symptoms; 1,351 individuals subsequently underwent eligibility assessment (i.e., “long” interview), which further assessed acute stress symptoms that were suggestive or indicative of chronic PTSD risk (
30). A total of 435 individuals met the inclusion criteria for this study, did not meet any of the exclusion criteria, and were subsequently invited for an in-person clinical interview. Of these individuals, 300 attended the interviews, and 171 also underwent fMRI assessment, both within 1 month after the trauma. Of these, 41 individuals were excluded for the following reasons: missing (N=16) or partial (N=5) functional scans of the threat or reward reactivity tasks; poor-quality functional scans (e.g., excessive movement or artifacts) (N=6); missing or poor-quality structural scans (N=5); missing or partial behavioral data from the tasks (N=5); failure to properly understand the instructions (N=1); or missing clinical data (N=3). A final sample of 130 individuals with valid anatomical and functional brain imaging data from both fMRI tasks were included in this report (
Table 1).
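The exclusion counts reported above can be checked with simple arithmetic. The following Python sketch only re-adds the numbers given in the text; the dictionary keys are shorthand labels introduced here for illustration.

```python
# Exclusion counts as reported in the text (labels are illustrative shorthand).
exclusions = {
    "missing functional scans": 16,
    "partial functional scans": 5,
    "poor-quality functional scans": 6,
    "missing or poor-quality structural scans": 5,
    "missing or partial behavioral data": 5,
    "did not understand instructions": 1,
    "missing clinical data": 3,
}

scanned = 171  # participants who underwent fMRI assessment
total_excluded = sum(exclusions.values())
final_sample = scanned - total_excluded

print(f"excluded: {total_excluded}, final sample: {final_sample}")
```

The seven categories sum to the 41 exclusions stated in the text, and removing them from the 171 scanned participants gives the final sample of 130.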
Clustering Analysis
As noted above, we used R code that was adapted from the code applied in the original study (
6) with R, version 4.1.1, and RStudio, version 1.4.1717. Hierarchical agglomerative clustering was conducted on the ROI data extracted from the two fMRI tasks (i.e., the threat and reward reactivity tasks) using the
cluster package (version 2.1.2) following Ward’s criterion (agnes function). This bottom-up method is designed to preserve the existing data structure, does not impose any assumptions of linearity, and is appropriate for exploratory analyses. The optimal number of clusters was determined using both the silhouette width metric (
31) and Hartigan’s distance metric (
32).
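To make the silhouette criterion concrete, the sketch below computes the mean silhouette width of a given cluster assignment in plain Python. It is not the code used in the study (which relied on R and the cluster package); the function names, the Euclidean distance, and the convention of scoring singleton clusters as 0 are assumptions made for this illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_silhouette_width(points, labels):
    """Average silhouette width over all points for one cluster assignment."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Toy data: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_fit = mean_silhouette_width(toy_points, [0, 0, 1, 1])
poor_fit = mean_silhouette_width(toy_points, [0, 1, 0, 1])
```

In practice, the mean width would be computed for each candidate number of clusters, and the solution with the largest value would be preferred.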
Clustering algorithms are prone to finding clusters even when the underlying data contain none, for instance when the data follow a single multivariate normal distribution (
33). Therefore, we expanded the analytic plan of Stevens et al. (
6) by testing whether the results of our cluster analysis meaningfully differed from the null hypothesis that our data did not contain clusters. For this purpose, we used a procedure reported by Dinga et al. (
34) that was based on a procedure originally proposed by Liu et al. (
33). In this approach, the null hypothesis is that the data come from a single multidimensional Gaussian distribution, that is, a distribution with no underlying clusters, with the number of dimensions equal to the number of features included in the clustering analysis (for full details, see methods in the
online supplement).
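The logic of this null-model test can be sketched as a Monte Carlo procedure: fit a single Gaussian to the data, simulate many data sets from it, compute the cluster index on each, and ask how often the simulated index is at least as large as the observed one. The Python sketch below is a simplified illustration and not the procedure implemented in the study. In particular, `largest_gap_ratio` is a deliberately simple stand-in statistic introduced here (the study used Hartigan's distance index after hierarchical clustering), and all function names, the number of simulations, and the toy data are assumptions.

```python
import math
import random


def fit_gaussian(data):
    """Sample mean vector and covariance matrix of the rows in data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[a] - mean[a]) * (row[b] - mean[b]) for row in data) / (n - 1)
            for b in range(d)] for a in range(d)]
    return mean, cov


def cholesky(matrix):
    """Lower-triangular factor of a positive definite matrix."""
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_gaussian(rng, mean, lower, n):
    """Draw n rows from the Gaussian defined by mean and Cholesky factor."""
    d = len(mean)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append([mean[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
                     for i in range(d)])
    return rows


def largest_gap_ratio(data):
    """Stand-in statistic: largest gap in feature 0 relative to its range."""
    values = sorted(row[0] for row in data)
    gaps = [b - a for a, b in zip(values, values[1:])]
    return max(gaps) / (values[-1] - values[0])


def null_model_p_value(data, statistic, n_sim=200, seed=0):
    """Monte Carlo p-value of the statistic under a single-Gaussian null."""
    rng = random.Random(seed)
    mean, cov = fit_gaussian(data)
    lower = cholesky(cov)
    observed = statistic(data)
    null_values = [statistic(sample_gaussian(rng, mean, lower, len(data)))
                   for _ in range(n_sim)]
    exceed = sum(1 for value in null_values if value >= observed)
    return observed, (exceed + 1) / (n_sim + 1)


# Toy data with two clearly separated groups along feature 0.
toy_data = ([[0.1 * i, (i % 5) * 0.3] for i in range(20)]
            + [[10 + 0.1 * i, ((i * 3) % 5) * 0.3] for i in range(20)])
observed_stat, p_value = null_model_p_value(toy_data, largest_gap_ratio)
```

A small p-value indicates that the observed index would be unusual for data with no cluster structure; a large p-value means that the clustering result cannot be distinguished from what a single Gaussian would produce.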
Analysis of Posttrauma Outcomes by Cluster
First, because different demographic characteristics might influence the cluster solution due to the unconstrained analytic approach, chi-square tests (categorical variables) and analyses of variance (ANOVAs) (continuous variables) were used to assess whether demographic characteristics differed between the clusters. Second, chi-square tests (categorical variables) and ANOVAs (continuous variables) were used to assess whether the clusters were significantly different in PTSD dichotomous diagnosis (i.e., PTSD or no PTSD), PTSD symptom severity (i.e., CAPS-4 and CAPS-5 total scores), self-reported PTSD symptoms (i.e., PCL scores), and self-reported anxiety symptoms (i.e., BAI scores). Benjamini-Hochberg (
35) false discovery rate (FDR) correction (q<0.05) was applied to control for multiple comparisons across these clinical measures.
All tests were two-tailed, with a significance threshold of p<0.05.
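For readers unfamiliar with the Benjamini-Hochberg procedure, the following Python sketch computes FDR-adjusted p-values from a list of raw p-values. It is a minimal illustration of the standard step-up adjustment, not the study's code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four comparisons.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

A comparison is declared significant at q<0.05 when its adjusted p-value falls below 0.05; in the hypothetical example only the first comparison would survive.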
Results
Participants’ Demographic and Clinical Characteristics
A sample of 130 recent trauma survivors (mean age, 33.61 years, SD=11.21, range=18–64 years; 62 [48%] women) were included in all the analyses reported below. The most common trauma type among participants was a motor vehicle accident (N=115, 88%); 10 (8%) participants experienced an assault or brawl, and five (4%) participants experienced other traumatic events (for full demographic and clinical characteristics, see
Table 1). Similarities and differences in these characteristics between our sample (N=130) and the original cohorts (
6) (discovery: N=69; replication: N=77) are reported in the results in the
online supplement.
Covariance Among fMRI Tasks and ROIs
To assess feature redundancy, we examined the covariance structure among the tasks and ROIs. In line with Stevens et al. (
6), different ROIs showed high positive within-task covariance but low between-task covariance (
Figure 2A). Within the same task, there were moderate to high correlations between different ROIs, with the highest correlations being between reactivity to threat observed in the amygdala and reactivity to threat observed in both the sgACC and dACC (for both, r=0.56, p<0.01) (
Figure 2A). Similar to the original findings (
6), reactivity in a particular region was uncorrelated across the two tasks. Specifically, threat reactivity in the amygdala was not correlated with reward reactivity in the amygdala (r=0.08, p=0.77) (
Figure 2A).
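The correlations reported here are Pearson coefficients between pairs of ROI-by-task features. As a reminder of what is being computed, a minimal Python sketch follows; the function name and the toy vectors are illustrative and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy vectors (not study data): perfectly related and unrelated features.
r_related = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_unrelated = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])
```

Applying such a function to every pair of features yields the matrix summarized in Figure 2A, where high within-task and low between-task values indicate that the two tasks contribute largely non-redundant information.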
Clustering of Individuals Using Task-Based fMRI at 1 Month Posttrauma
Hierarchical clustering performed with data from all 130 participants suggested an optimal solution of four clusters (k=4) according to Hartigan’s distance metric (see Figure S1a in the
online supplement) and two clusters (k=2) according to the silhouette width metric (see Figure S1b in the
online supplement); we therefore tested both solutions. To avoid redundancy, and in line with Stevens et al. (
6), we report the four-cluster solution here (
Figure 2B) and the two-cluster solution in the supplemental results (see Figure S2 in the
online supplement).
Assessment of different fMRI activation patterns revealed a subgroup of 18 individuals showing high reactivity of all brain regions to threat, predominantly in the dACC and sgACC, and to reward, predominantly in the NAcc (cluster 4 in
Figure 2C). The other three clusters were more similar to each other and were indeed part of the same cluster according to the two-cluster solution (clusters 1–3 in
Figure 2C). Whereas individuals in cluster 1 (N=28) showed low reactivity to both threat (in the dACC and sgACC) and reward (in the amygdala), those in cluster 2 (N=44) showed low reward reactivity (predominantly in the NAcc) and those in cluster 3 (N=40) showed high reward reactivity (similar levels across the three regions), with relatively little threat reactivity (clusters 1–3 in
Figure 2C).
Notably, as in the original study, the clusters were unrelated to any of the demographic characteristics. There was no significant association between cluster assignment and participants’ age (F=0.492, p=0.688), years of education (F=0.412, p=0.745), gender (χ²=0.865, p=0.834), or marital status (χ²=2.032, p=0.236).
Finally, following Dinga et al. (
34) and Liu et al. (
33), we tested the statistical significance of the observed Hartigan distance index. In our data set, the four-cluster solution resulted in the optimal Hartigan’s distance index. Using a simulation approach (described in the methods section in the
online supplement), we found that this index was not statistically significant (Hartigan’s distance index=18.15, p=0.371) (see Figure S1c in the
online supplement). In other words, it is not unusual to observe such an index even when the hierarchical clustering is performed on a multivariate, normally distributed data set with no clusters.
Prospective Trajectories of Mental Health Among the Different Clusters
The four clusters did not differ in prospective 6-month PTSD dichotomous diagnosis (i.e., PTSD or no PTSD), PTSD symptom severity (i.e., CAPS-4 or CAPS-5 scores), self-reported PTSD (i.e., PCL scores), or anxiety symptom severity (i.e., BAI scores) (
Figure 3 and
Table 2). Similarly, these four clusters did not differ in any clinical measure at 14 months posttrauma (
Table 2; see also Figure S3 in the
online supplement). None of these comparisons approached significance after FDR correction for multiple comparisons (
35) (for all comparisons, 0.907≤pFDR≤1.000) (
Table 2). In summary, there was no association between individuals’ cluster membership and PTSD or anxiety at 6 months posttrauma (the original study’s endpoint) or at 14 months posttrauma (this study’s endpoint).
Discussion
In this conceptual nonexact replication and extension of the Stevens et al. study (
6), we failed to replicate the previously identified neuroimaging-based biotypes (
6) or their association with prospective posttraumatic stress symptoms. Despite overall similarities in study design and aims, participant characteristics, and fMRI probes, ours is not an exact replication of the original study. Nevertheless, nonexact replications can provide strong evidence for robustness and external validity of previous findings and demonstrate generalization beyond specific study design choices and populations (
36). On the other hand, when original findings are not replicated (
37), it is hard to determine whether the failure reflects methodological differences between the studies or false-positive findings in the original study.
There are several potential explanations for our inability to replicate the results of Stevens et al. (
6), most of which concern methodological differences between the studies. First, the replication data were obtained from a single site in Israel, whereas the original data were collected at several sites across the United States. Although some demographic characteristics were similar in the two studies (e.g., participant ages), other characteristics differed (e.g., gender and trauma type) or were not collected in the present study (e.g., race/ethnicity and childhood trauma) (
Table 1; see also results in the
online supplement). Second, while this study specifically screened participants for early posttraumatic stress symptoms and excluded individuals with prior PTSD diagnoses, the original study did not have these constraints. Third, while our neuroimaging data were collected 1 month posttrauma (mean=30.43 days [SD=9.54] posttrauma), MRI data in the original study were obtained at a slightly earlier time point (mean=21 days [SD=6] posttrauma). Fourth, while Stevens et al. (
6) assessed symptoms based on abbreviated self-report tools (a limitation noted by the authors), we used gold-standard structured interviews (CAPS) administered by trained and certified clinicians. These assessments were performed at 1, 6, and 14 months following trauma exposure, whereas the assessments in the original study ranged from 30 days pretrauma (queried retrospectively in the emergency department) to 6 months posttrauma (for a total of five assessments). Importantly, the 6-month follow-up falls at a dynamic point in the course toward chronic PTSD (
38), whereas the 14-month follow-up captures a clinically stable state indicative of chronic disease, as further recovery is marginal (
12). Finally, while clustering in the original study was performed with nine neuroimaging measures from three different tasks, in the present study, clustering was based on seven measures from two different tasks (we did not include an fMRI inhibition task). Furthermore, although both studies used the same fMRI task to probe threat reactivity, different tasks were used to assess reward reactivity. Nevertheless, both reward tasks showed similar whole-brain activations in the contrast of reward-gain versus punishment-loss (see results in the
online supplement).
The large clinical heterogeneity of posttraumatic psychopathology, together with recent advances in statistical and computational methods, motivated the search for homogeneous PTSD subtypes through data-driven approaches (
39). Nevertheless, the presumption of distinct and homogeneous subgroups might not be clinically useful or represent the biology underlying PTSD (
34). For example, most clustering approaches yield clusters regardless of the data structure, even when no true clusters exist (
33). Hence, it is crucial to distinguish biologically and clinically meaningful subtypes from random data fluctuations or noise (
33). Here, following a procedure by Dinga et al. (
34), we showed that the four-cluster solution could be found even if the data came from a single Gaussian distribution with no underlying clusters; this finding suggests that the data support four clusters (as shown in
Figure 2C) no more strongly than no clusters at all. This statistical test was not performed by Stevens et al. (
6). Another way to deal with the disadvantages of data-driven approaches is the use of hybrid analytic methods (
16), which combine prior knowledge and assumptions (theory-driven and supervised) with data-driven (unsupervised) approaches (
40).
In conclusion, our results highlight that slight changes in sample characteristics or experimental tasks can have a critical impact on the replicability of neuroimaging-based biotypes and their association with posttraumatic stress symptoms. This is in line with recent findings suggesting smaller than expected brain-phenotype associations and large variability across population subsamples (
41), as would be expected from a disorder with over 600,000 potential phenotypes (
42). Therefore, studies should carefully specify their design and methodology to define populations to which the results could be generalized until more stable and unified measures are established in psychiatry. Importantly, caution is warranted when attempting to define PTSD subtypes using neuroimaging data before treatment implications can be fully realized. Future replication studies may assist in closing the translational gap between basic psychiatric research and practice, advancing the development of meaningful biological tools to assist diagnosis or predict clinical outcomes (
3).
Acknowledgments
The authors thank Jennifer Stevens, Ph.D., for her collaboration and willingness to share her materials and methods (namely, the R code used for the clustering analysis and the anatomical regions of interest). The authors also thank the research team at Tel Aviv Sourasky Medical Center (including Naomi Fine, Nili Green, Mor Halevi, Sheli Luvton, Yael Shavit, Olga Nevenchannaya, Iris Rashap, Efrat Routledge, and Ophir Leshets) for their significant contributions to participant screening, enrollment, and assessments. The authors also extend their gratitude to all the participants of this study, who completed assessments at three different time points after experiencing a traumatic event, thus contributing to scientific research of posttraumatic psychopathology.