Gene-environment interactions (GxE) remain a contentious subject in psychiatry. It is uncontroversial to suggest that life experiences affect different individuals in different ways. However, disagreements about conceptual models, statistical designs, and standards of evidence for establishing specific GxE that crystalized a decade ago (1–3) remain unresolved. Two key areas of progress are improved precision in measurements of genetics, guided by genome-wide association studies (GWASs), and refined study designs that better isolate environmental risk factors. Both advances, along with enduring challenges, are showcased in the report by Cleary et al. (4) in this issue of the Journal.
Cleary et al. report an analysis of interactions between polygenic risk for depression and life stress in two cohort studies, the Intern Health Study and the U.S. Health and Retirement Study. The authors found that individuals at high genetic risk both benefit more from the buffering effects of social support in the context of a life stressor and suffer more from their absence. The observation is potentially meaningful in terms of advancing the understanding of the etiology of depression and the biology of genetic risk and could have clinical implications to the extent that genetic liability is incorporated into clinical practice.
The primary sample consisted of 1,011 medical residents (interns) of White European ancestry participating in the Intern Health Study (Intern Study), a prospective cohort of first-year interns in the United States. Residents completed an online questionnaire 2 months prior to the internship and then every 3 months during the internship year. Saliva samples were collected for DNA analysis. The main analysis tested whether genetic risk for depression modified the degree to which increases in social support following the start of internship protected against the development of depressive symptoms. The finding was that higher levels of genetic risk predicted a stronger protective effect of increasing social support. The authors interpreted this result as consistent with a differential susceptibility hypothesis (2): that is, what GWASs have uncovered as genetic risks for depression in fact reflect genetic plasticity factors modulating sensitivity to environmental inputs. Under this interpretation, putative risk alleles identified in GWASs affect the likelihood of developing depression because they make an individual more susceptible to both adverse and protective environments, while alternate alleles encode a less malleable psychology. If the differential susceptibility interpretation is correct, there are implications for precision medicine; individuals with high-plasticity values on the polygenic score could be more responsive to treatment and intervention than those with low-plasticity values.
Cleary et al. marshal further support from a conceptual replication using a data set of 435 older adults participating in the U.S. Health and Retirement Study (HRS) who were followed up after the death of their spouse. As in the Intern Study, higher genetic risk for depression predicted a stronger protective effect of increased social support, although to a lesser extent than in the Intern Study. Moreover, in the regression analysis of the HRS data, the “main effects” of both the polygenic score and social support were near zero in magnitude, in the direction opposite to expectations, and not statistically significant (p>0.50). In other words, while the interaction result was consistent with the Intern Study finding, the overall effect was much weaker.
These findings are provocative, but clarifying the interpretation will require further research. A key strength of both the Intern Study and HRS analyses is the inclusion of a stressor that is plausibly exogenous to the depression outcome, meaning that the stressor is experienced independent of the participants’ behavior. However, this is not the case with respect to social support, which could change in direct response to the participants’ experiences of depressive symptoms. For example, individuals experiencing depression may elicit increased social support from their network or withdraw from potential sources of support. Alternatively, participants’ self-reported changes in social support may themselves be reflections of depressive symptoms rather than veridical accounts of the behavior of friends and relatives. In either case, statistical observations of an interaction could arise from mechanisms other than the differential susceptibility interpretation advanced by the authors. For example, environmental effects on phenotypic variance can produce this pattern of interaction (5). A clear next step following from this analysis is to conduct a trial in which levels of social support are experimentally increased, for example, by providing some interns with access to peer support groups or soliciting nominations of friends and relatives at baseline and later prompting these individuals to increase social support over the course of the intern year.
Beyond its specific results, the study by Cleary et al. highlights progress and challenges in two features of study design central to long-standing debates over GxE in stress and depression: measurement and statistical modeling (1, 3). Environments are hard to measure. Genomes are also hard to measure. And there is no settled answer as to how these pieces best fit together to investigate the etiology of depression. The study by Cleary et al. showcases some advances in the field and also illustrates areas where progress is needed.
In their measurement of “G,” the authors follow two important best practices. First, they based their measurement on genetic discoveries from a GWAS conducted in an independent sample. Although hypothesis-driven research on candidate genes for major depressive disorder continues, the empirical evidence to support a role for common variants in these genes in the etiology of major depressive disorder remains limited (6). In contrast, GWASs have produced more consistent and replicable discoveries (7) and therefore constitute a sounder basis for research into gene-environment interplay. Second, the authors composed their measurement of genetic liability by combining information across thousands of variants into a polygenic score. Major depressive disorder is a polygenic disorder, one influenced by many small-effect genetic variants across the genome (8). The polygenic score approach integrates weak signals from these many small-effect variants into a single, stronger signal for analysis (9). While the use of a GWAS-based polygenic score is a strength, the method used to construct polygenic scores lags behind current state-of-the-art approaches. The polygenic scores used in the Cleary et al. study were computed by weighting alleles at each SNP by the coefficients estimated in the GWAS and computing an average across the entire genome. This is a valid approach. However, new methods are now able to integrate GWAS results with information about patterns of correlation between different genotypes (linkage disequilibrium) to achieve polygenic scores with superior predictive accuracy (10). Establishing the robustness of the findings to alternative polygenic score specifications that are more predictive of major depressive disorder will be an important next step in this work.
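The weighted-average construction described above can be sketched in a few lines. This is only an illustration of the general idea, not the authors' pipeline: the function name, the four SNPs, and the weights are all hypothetical, and real scores involve many thousands of variants plus quality-control steps that are omitted here.

```python
from typing import Sequence


def polygenic_score(allele_counts: Sequence[int],
                    gwas_weights: Sequence[float]) -> float:
    """Average of effect-allele counts (0, 1, or 2 per SNP), each
    weighted by the coefficient estimated for that SNP in a GWAS."""
    if len(allele_counts) != len(gwas_weights):
        raise ValueError("need exactly one GWAS weight per SNP")
    if not allele_counts:
        raise ValueError("need at least one SNP")
    weighted_sum = sum(count * weight
                       for count, weight in zip(allele_counts, gwas_weights))
    return weighted_sum / len(allele_counts)


# Hypothetical toy data: four SNPs with invented weights
# (not taken from any real GWAS).
toy_weights = [0.02, -0.01, 0.03, 0.005]
person_a = [2, 0, 1, 1]  # effect-allele counts for one individual
person_b = [0, 2, 0, 1]

score_a = polygenic_score(person_a, toy_weights)
score_b = polygenic_score(person_b, toy_weights)
print(score_a > score_b)  # person_a carries more positively weighted alleles
```

The newer methods mentioned in the text differ mainly in how the weights are obtained: they adjust the GWAS coefficients for correlation among nearby variants before a sum like this one is taken.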
In their measurement of “E,” the authors take full advantage of the exceptionally strong design of the Intern Study. What makes the Intern Study such an ideal platform for research on GxE in the context of stress and major depressive disorder is that it fixes two key parameters of the environmental exposure: intensity and timing. Stress measurement is an enduring challenge across behavioral sciences. Participant reports of stressful experiences are subject to reporter bias; for example, the same genetics that might modify the pathogenicity of stressful experiences could also affect the extent to which experiences are perceived or recalled as stressful. Experimental induction of chronic stress is unethical. The designers of the Intern Study did the next best thing: they enrolled a cohort of young people about to experience the same stressor for an extended period of time. The long hours and high pressure of medical residency are not uniform across residents. But few residencies could be described as low stress. And the design has the additional advantage of fixing onset of exposure across participants so that measurements can be obtained after the same exposure duration. Unfortunately, the HRS sample does not provide the same advantage. Participants’ depressive symptoms were measured at 2-year intervals, allowing substantial variation in follow-up time from a spouse’s death. In a study of the HRS data published in the Journal several years ago, we found that this variation is consequential for depressive symptoms (11), possibly explaining the weaker results from the HRS analysis in the Cleary et al. study. In our earlier study, we used nonlinear regression methods to address variation in timing to establish inferences about effects at a fixed exposure duration. Statistical models like the one we used have the potential to enhance analyses of GxE in studies where exposure timing cannot be fixed by design.
Statistical design is also a major point of contention in studies of GxE. Most individual differences among humans reflect some combination of genes and environments. GxE represent a special case of this combination. The standard interpretation of GxE can be phrased as “the whole is more than the sum of the parts.” In other words, being exposed to both G and E yields a level of risk that is greater than the sum of the individual risks associated with G and E on their own. This understanding of an interaction has different names, including “biological synergy” in epidemiology and “super-additivity” in statistics (12). The test of interaction used by Cleary et al. was different. Instead of super-additivity (the whole is more than the sum of the parts), their model tested super-multiplicativity (the whole is more than the product of the parts). When both effects are >0 and <1, the product will be smaller than the sum. The interpretation becomes even more complicated when one factor is associated with risk and the other factor is protective. The bottom line is that tests of super-additivity and super-multiplicativity can yield different results. It is possible to recover tests of super-additivity from models like the ones fitted by the authors; best practice in the analysis of GxE should be to report results of both tests (13). There are additional considerations complicating the interpretation of interaction terms in models such as the ones used by Cleary et al., again requiring extra steps to clarify the correct interpretation (14).
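A small numerical sketch may help show how the two notions of interaction can disagree. The four risks below are invented for illustration and are not taken from Cleary et al. or any other study. The function name is hypothetical, and the two contrasts are the textbook interaction contrasts on the risk-difference and risk-ratio scales, not the authors' fitted model.

```python
def interaction_contrasts(r00: float, r10: float, r01: float, r11: float):
    """Compare joint risk with what separate effects would predict.

    r00: risk with neither factor    r10: risk with G only
    r01: risk with E only            r11: risk with both G and E
    """
    # Additive scale: excess risk under joint exposure minus the sum of
    # the two separate excess risks. A value > 0 means super-additive.
    additive = (r11 - r00) - ((r10 - r00) + (r01 - r00))
    # Multiplicative scale: joint risk ratio divided by the product of
    # the two separate risk ratios. A value > 1 means super-multiplicative.
    multiplicative = (r11 / r00) / ((r10 / r00) * (r01 / r00))
    return additive, multiplicative


# Hypothetical risks: 10% baseline, 20% with G only, 30% with E only,
# and 50% with both.
additive, multiplicative = interaction_contrasts(0.10, 0.20, 0.30, 0.50)
print(additive > 0)        # joint excess risk exceeds the sum of the parts
print(multiplicative < 1)  # joint risk ratio falls short of the product
```

With these made-up numbers the same data are super-additive yet sub-multiplicative, which is why the choice of scale matters and why reporting both tests is informative.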
These statistical considerations are just as relevant when results do not indicate the presence of GxE. For example, a recent study of American soldiers assessed before and after deployment to Afghanistan found that both genetic risk and a measure of social support (unit cohesion) were predictive of incident postdeployment major depressive disorder but did not find evidence of GxE (15). That study had the same powerful design as the Intern Study, because the timing and intensity of exposure were fixed. However, the statistical approach taken to testing GxE had the same limitations outlined above. Reconciling these apparently conflicting findings is therefore complicated not just by differences in the subjects and stressors but also by a lack of clarity about the interpretation of statistical results. For research on GxE in psychiatry to move beyond old arguments and build a solid foundation for progress, future efforts should seek to combine the powerful designs of studies like the Intern Study and the deployment study with more robust and comprehensive statistical approaches to testing GxE.
Finally, it is worth amplifying the authors’ comments regarding the restriction of the analysis sample to White European-descent participants, which was appropriately noted as a limitation. The reason for this restriction was that the polygenic score they used was derived from a GWAS conducted in a White European-descent sample. Predictive accuracy of polygenic scores tends to decline as genetic distance from the sample studied in the original GWAS increases (16). This is a major barrier to the equitable clinical translation of GWAS discoveries (17). It may also impede the progress of research on GxE given the extent to which our environments are determined by ancestry-correlated features of our identities, such as racial and ethnic identification. There is progress in transancestry genetic discovery (18). Expanding these efforts is a critical frontier in psychiatric genetics.