To the Editor: We respect Border et al.’s ambitious undertaking (1), which was published in the May 2019 issue of the Journal. We endorse greater scrutiny of novel effects within the literature on gene-by-environment (G×E) interactions, including replication using highly powered methods. We argue, however, that the approach used here to measure the environment (several dichotomized questionnaire items) is insufficient to draw conclusions, despite probing measurement error. Evidence for the importance of the environment for depression is well established (i.e., approximately 63% of liability; [2]), and vulnerability-stress models of depression are widely endorsed and long precede G×E research. However, G×E research has failed to invest in measuring the environment (3); we urge adoption of “E” assessment that matches the rigor of “G” assessment.
Before G×E research, stress interview measures emerged as gold-standard measures among researchers studying stress and depression. Indeed, one 1998 review indicated, “today, use of life event checklists designed to rate the presence or absence of a finite number of events has largely been abandoned” (4, p. 301). With the advent of G×E research, a need for quick measures resurrected questionnaire measurement. Although a full rationale is beyond this letter (5), appropriate measures maximally disambiguate stress exposure from response, account for investigator-rated severity, distinguish interpersonal from noninterpersonal stress, attend to events’ depressogenic time-frame (<3 months), and establish temporal precedence.
Why is it insufficient to use a large N and estimate the impact of measurement error? First, even in their “catastrophic” simulations, the authors vastly underestimate the amount of random error introduced by inadequate stress measures. One estimate suggests that questionnaires accounted for only 16% of the variance in interview measures (6), even without artificially dichotomizing questionnaires as done here. Would the field tolerate such poor measurement validity for genotypes? Second, large samples address random error but not systematic error. Specifically, the authors’ approach does not account for findings that different types of stress confer significantly different unique variance for depression (7), are poor indicators of each other (major interpersonal compared with noninterpersonal events, r=0.04 and r=0.32, respectively; [7, 8]), and can produce G×E effect sizes in opposite directions. For example, in a simultaneous model, a significant interpersonal major event G×E effect was in the opposite direction from the trending noninterpersonal major event G×E effect (9). Similarly, a serotonergic multilocus score produced significantly stronger G×E effects for major interpersonal events than for other event categories (full results available from the authors).
Further, differential susceptibility theory (10) suggests that some genetic variants (including many in the present study) confer sensitivity to the environment for better and worse, meaning the same genetic variant may have opposite effects depending on environmental conditions. Differential susceptibility renders the effects of many variants uninterpretable without the accurate capture of environmental effects. The field has attempted to address inconsistent findings with ever-growing sample sizes. To the extent that they create the need to rely on inadequate “envirotyping,” dazzling sample sizes not only fail to increase the likelihood of the detection of real G×E effects for differential susceptibility variants; they may completely obscure them.