Depression contributes substantially to disease burden, particularly among people aged 25–49, for whom depressive disorders constitute the sixth most burdensome condition globally (1). This burden is likely to grow as incidence increases: cases rose by roughly 50% worldwide from 1990 to 2017 (2). On an individual level, depression impairs quality of life, heightens suicide risk (3), and increases risk for later-life conditions such as cardiovascular disease and associated morbidity and mortality (4). Prevention of depressive symptoms and disorders is therefore a public health priority. However, prevention relies on the accurate identification of modifiable risk factors for the condition, an area where current understanding is limited (5).
Alcohol consumption is one candidate worthy of further study, given the well-established comorbidity between alcohol use disorders and depression (6, 7). Specifically, alcohol use disorders have been found to precede major depression (6, 8, 9), with some additional evidence for bidirectionality, that is, depression also being associated with subsequent alcohol use disorders (10), consistent with the “self-medication” hypothesis (11, 12). However, for non-disordered drinking at a population level, the nature of the alcohol-depression relationship is less clear. There are indications that light to moderate consumption is associated with lower risk for depressive conditions when compared with abstinence, with risk increasing again for heavier drinkers, resulting in a J-shaped relationship; a recent meta-analysis supports this finding (9). However, the extent to which these protective effects are genuine causal relationships, as opposed to biased associations driven by methodological limitations, has not been established.
One such methodological bias is confounding, given that background environmental (e.g., socioeconomic) factors may explain observed alcohol-depression associations (13). Most studies take a “conditional” approach to controlling for confounding, that is, adjusting for covariates in a regression model. To be effective, this approach requires accurate model specification, including the shape of the relationship between each covariate and the outcome (e.g., linear or exponential) (14–16). Even then, this approach is limited when few individuals have certain combinations of covariate values (15, 17, 18), and it may be biased when a time-varying factor acts as both a confounder and a mediator over time (18).
For example, income at some point during the exposure period may be a common cause of later alcohol consumption and depression while itself being affected by past alcohol consumption. Conditional approaches that control for income at various waves of a study will give biased estimates by removing (i.e., adjusting away) the effects of prior alcohol consumption mediated through income. On the other hand, simply not adjusting for income would leave the results biased by confounding. See Figure S1 in the online supplement for a visualization of these complex longitudinal relationships.
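The dilemma can be made concrete with a small simulation. The Python sketch below uses made-up parameters, not NLSY79 data: an early exposure `a0` raises a time-varying covariate `l1` (playing the role of income), which in turn affects the later exposure `a1` and the outcome `y`. It is a hypothetical illustration of the general point rather than the authors' analysis; the helper names `wls` and `stratum_prob` are our own, and the weights are estimated nonparametrically within strata because the covariate is binary.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 200_000

# Hypothetical data-generating process (illustrative values only)
a0 = rng.binomial(1, 0.5, n)             # wave-0 exposure, randomized for simplicity
l1 = rng.binomial(1, 0.3 + 0.4 * a0)     # covariate affected by prior exposure
a1 = rng.binomial(1, 0.2 + 0.6 * l1)     # later exposure depends on the covariate
y = 1.0 * a0 + 1.0 * a1 + 2.0 * l1 + rng.normal(0.0, 1.0, n)
# Implied marginal model: E[Y(a0, a1)] = 0.6 + 1.8 * a0 + 1.0 * a1


def wls(columns, outcome, weights=None):
    """(Weighted) least squares; returns [intercept, coef_1, coef_2, ...]."""
    X = np.column_stack([np.ones(len(outcome)), *columns])
    w = np.ones(len(outcome)) if weights is None else weights
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * root_w[:, None], outcome * root_w, rcond=None)
    return coef


def stratum_prob(exposure, strata):
    """Empirical P(exposure = 1 | stratum), returned for every observation."""
    p = np.empty(len(exposure))
    for s in np.unique(strata):
        in_s = strata == s
        p[in_s] = exposure[in_s].mean()
    return p


# Conditional strategy 1: adjust for l1, which removes a0's effect mediated by l1
cond = wls([a0, a1, l1], y)
# Conditional strategy 2: ignore l1, which leaves a1's coefficient confounded by l1
naive = wls([a0, a1], y)

# MSM: stabilized weight for a1 = P(a1 | a0) / P(a1 | a0, l1)
# (a0 is randomized in this simulation, so it needs no weight)
p_num = stratum_prob(a1, a0)
p_den = stratum_prob(a1, 2 * a0 + l1)
sw = np.where(a1 == 1, p_num / p_den, (1 - p_num) / (1 - p_den))
msm = wls([a0, a1], y, sw)

print("adjusted for l1:", cond[1:3].round(2))   # a0 effect understated
print("ignoring l1:    ", naive[1:3].round(2))  # a1 effect overstated
print("weighted MSM:   ", msm[1:3].round(2))    # close to the true 1.8 and 1.0
```

Adjusting for `l1` recovers only the direct part of the `a0` effect (about 1.0 instead of 1.8), ignoring it inflates the `a1` coefficient, and the weighted regression recovers both marginal effects.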
In addition to confounding, it is difficult to isolate effects in a single direction, with reverse causation often present in the form of “sick quitters,” those for whom illness precipitates abstinence from or a reduction in alcohol consumption (19). Selection biases may also be at play, such as healthier drinkers being overrepresented in recruited cohorts (20) and/or those with poorer outcomes being more likely to drop out over the follow-up period. Finally, alcohol consumption is often categorized on the basis of a single baseline measurement, which ignores changes in consumption (“drinker drift”) between baseline and outcome measurement (21).
Modern statistical methods for causal inference, such as marginal structural models (MSMs), are designed to overcome these limitations. MSMs attempt to fully adjust for measured confounders (both time-invariant and time-varying) through a “marginal” approach, that is, by balancing confounders across exposure groups (i.e., different levels of alcohol consumption) before regression modeling. This is accomplished by assigning a weight to each observation. These weights reflect the extent to which each observation is under- or overrepresented in the sample relative to a target population in which potential confounders are balanced across exposure groups. Applying these weights to the sample creates a “pseudo population” in which the distribution of confounders is balanced between exposure groups at each wave, while still allowing for longitudinal mediation via these confounders (22).
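The balancing idea can be seen in a toy example. The sketch below assumes a single hypothetical binary confounder (`employed`) and four drinking levels with made-up exposure probabilities; it computes stabilized weights P(A = a) / P(A = a | L = l) from empirical proportions and compares the share employed in each exposure group before and after weighting. With a saturated (cell-by-cell) model like this, the weighted share equals the overall share exactly; with real data and regression-based propensity models, balance is only approximate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
LEVELS = ["abstinence", "occasional", "moderate", "above-guideline"]

# Hypothetical binary confounder and exposure probabilities that depend on it
employed = rng.binomial(1, 0.6, n)
probs = np.where(employed[:, None] == 1,
                 [0.25, 0.25, 0.15, 0.35],   # P(level | employed)
                 [0.45, 0.15, 0.05, 0.35])   # P(level | not employed)
# Draw one exposure level per person from their row of probabilities
u = rng.random(n)
exposure = np.minimum((u[:, None] > probs.cumsum(axis=1)).sum(axis=1), len(LEVELS) - 1)


def stabilized_weights(exposure, confounder):
    """w = P(A = a) / P(A = a | L = c), with both probabilities estimated empirically."""
    w = np.ones(len(exposure))
    for a in np.unique(exposure):
        p_a = np.mean(exposure == a)
        for c in np.unique(confounder):
            cell = (exposure == a) & (confounder == c)
            if cell.any():
                w[cell] = p_a / np.mean(exposure[confounder == c] == a)
    return w


w = stabilized_weights(exposure, employed)
overall = employed.mean()
raw, weighted = {}, {}
for k, name in enumerate(LEVELS):
    group = exposure == k
    raw[name] = employed[group].mean()
    weighted[name] = np.average(employed[group], weights=w[group])
    print(f"{name:16s} raw={raw[name]:.3f} weighted={weighted[name]:.3f}")
print(f"overall share employed: {overall:.3f}")
```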
Thus, under certain assumptions (see section 6 in the online supplement), the pseudo population should be free of confounding by measured covariates and can be used to approximate randomized controlled trials, the gold standard for assessing causality, in contexts where such trials are not feasible (23). This approach can address many of the limitations of conventional cohort studies (see Table S1 in the online supplement). Our recent review (24) identified a single study applying MSMs to the alcohol-depression relationship: in a sample ages 20–64, Gemes et al. (25) found a U-shaped relationship with major depression after 7–9 years of follow-up.
However, determining whether the Gemes et al. findings are truly robust to the methodological biases outlined above requires extending the application of MSMs. First, to use the full potential of MSMs and account for drinker drift, multiple waves of exposure and covariate information should be included. Additionally, a substantial portion of participants’ drinking (or abstinence) histories must be incorporated, lest the complex cumulative and/or delayed effects of alcohol use (26) or the effects of sick quitters bias estimates of the targeted exposure’s effect. Similarly, given that the nature of alcohol-depression relationships may change over the lifespan (27), estimates should be derived from a sample with a fairly narrow age range so as not to mask underlying heterogeneity. Finally, exposure weights should incorporate only time-varying covariates that are known to temporally precede exposure.
Questions about the protective effects of moderate consumption tend to imply a stable level of consumption (e.g., “Does moderate consumption confer benefits over abstinence?”), allowing for simple, interpretable public health messages. It is therefore important to estimate the effects of consistent levels of consumption. Using a secondary analysis of the National Longitudinal Survey of Youth 1979 (NLSY79), a cohort offering rich longitudinal data, we aimed to apply a robust MSM approach to the following research question: Compared with consistent abstinence from alcohol, what is the effect of consistent occasional, moderate, and above-guideline alcohol consumption throughout early to middle adulthood on depression at age 50? Given evidence that methodological biases may be responsible for previously identified J-shaped relationships, we hypothesized that an MSM approach, by reducing the impact of these biases, would yield a positive, linear relationship.
Methods
Study Design and Participants
The NLSY79 is a nationally representative U.S. cohort of 12,686 individuals born between 1957 and 1964 (the sampling procedure has been described elsewhere [28]). Participants were first interviewed in 1979, at ages 14–22, and data on depressive symptoms (as indexed by the Center for Epidemiologic Studies Depression Scale–Short Form [CES-D-SF]) were first collected in 1992. Given the need to control for pre-baseline depressive symptoms, baseline for the present study was set at the next measurement occasion, in 1994. Eligible individuals were those with valid baseline alcohol data and both time-fixed and pre-baseline (1992) time-varying covariate data (including CES-D-SF values), yielding 5,667 participants. All NLSY79 participants provided informed consent.
Guided by the timing of CES-D-SF readministration, which changed after 1994 from year-based measurement to age intervals, the MSM incorporated alcohol consumption in 1994, 2002, and 2006; covariates and CES-D-SF values in 1992, in 1994, and at age 40; and a final CES-D-SF outcome measured at age 50 (see section 1 and Figure S1 in the online supplement for more detail and a visualization, respectively, of which data were used from which assessment points). Thus, the alcohol consumption variables span ages 29–37 through 41–49, corresponding to early to middle adulthood. In addition to those who dropped out of the study entirely, individuals missing any alcohol consumption, covariate, or CES-D-SF values in 2002 or 2006, or CES-D-SF values at age 50, were considered censored at the earliest of those assessments.
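The censoring rule can be expressed as a small function. This is an illustrative sketch only: the record layout and the component names (`alcohol`, `covariates`, `cesd`) are hypothetical, with `covariates` standing in for the full set of time-varying covariates. Because the function returns at the first failing assessment, once a participant is censored, later assessments are ignored.

```python
# Follow-up assessments and the components each must supply (hypothetical encoding)
REQUIRED = {
    "2002": ("alcohol", "covariates", "cesd"),
    "2006": ("alcohol", "covariates", "cesd"),
    "age50": ("cesd",),
}


def censoring_wave(record):
    """Return the earliest follow-up assessment at which a participant is censored.

    `record` maps an assessment name to a dict of {component: observed?}. A missing
    assessment (e.g., after dropout) counts as unobserved. Returns None if the
    participant supplies everything through the age-50 outcome.
    """
    for wave, components in REQUIRED.items():
        observed = record.get(wave, {})
        if not all(observed.get(c, False) for c in components):
            return wave
    return None


complete = {w: {c: True for c in comps} for w, comps in REQUIRED.items()}
dropped_after_2002 = {"2002": complete["2002"]}
print(censoring_wave(complete))            # None
print(censoring_wave(dropped_after_2002))  # 2006
```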
Variable Derivation
Alcohol consumption.
Participants were asked about their current drinking frequency (number of drinking days in the previous month), volume (number of drinks per drinking day, with a drink explicitly defined as “the equivalent of a can of beer, a glass of wine, or a shot glass of hard liquor”), and heavy episodic drinking (defined by the NLSY79 questionnaire as six or more drinks per occasion). For this study, alcohol consumption was categorized separately for 1994, 2002, and 2006, using methods (29) that incorporate frequency, volume, and heavy episodic drinking, modified so that volume criteria were concordant with current U.S. guidelines (30) (see Table 1 for category criteria). For every pre-baseline occasion with adequate information, individuals were similarly categorized, and these wave-specific categorizations were condensed into a single historical variable (see section 2 in the online supplement).
Covariates and CES-D-SF scores.
The CES-D was designed to measure depressive symptoms in the general population; respondents indicate how often they experienced a range of symptoms during the previous week, with higher scores indicating greater symptoms (31). The CES-D-SF is a reduced, seven-item version of the CES-D and has demonstrated strong psychometric properties (32). Two versions of the analyses were performed: one in which CES-D-SF scores were analyzed in their continuous form and another in which the scale was dichotomized into non-depression (scores <8) or probable depression (scores ≥8), based on suggested cutoffs for probable depression (32). The distribution of the outcome variable was not sufficiently skewed to warrant transformation (see section 3 in the online supplement) (33).
Covariate selection was guided by a recent meta-analysis (9). The baseline time-fixed variables were historical alcohol consumption, sex, ever smoked, ever used illicit drugs, pre-baseline self-esteem, pre-baseline frequent religious service attendance, race, educational attainment, and average parental educational attainment. The time-varying variables were CES-D-SF scores, age, self-reported health limitations, marital status, smoking status (not available at age 40), illicit drug use (not available at age 40), body mass index, employment status, health insurance status, household size, urban/rural residence, welfare receipt, and income. Because CES-D-SF scores measured before the final outcome were incorporated as covariates, the model specifically isolates the effect of alcohol consumption on depression at age 50. Further detail on covariate selection and derivation is presented in sections 2, 4, and 5 and Figure S2 in the online supplement.
Statistical Analysis
Marginal structural models (MSMs).
The first step in constructing MSMs is to sequentially estimate inverse probability of treatment weights (IPTWs) for each wave, that is, weights based on the inverse of the probability of an individual being in their observed treatment (or exposure) group given their covariate profile. Individuals whose covariate profiles are atypical of their exposure category are therefore assigned larger weights than those whose profiles align closely with their group’s typical profile. Inverse probability of censoring weights (IPCWs) are also calculated at each wave, such that participants who remained in the study despite a low estimated probability of remaining, given their covariates, are upweighted to represent similar participants who were lost to follow-up. Combined with a survey weight, IPCWs allow MSM estimates to be representative of the target population. The final weight for each individual is the product of all time-specific IPTWs, IPCWs, and the survey weight.
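The final-weight product can be sketched as follows, assuming the wave-specific IPTWs and the probabilities of remaining in the study have already been estimated. The function names and all numbers below are hypothetical; each IPCW is taken as 1 divided by the estimated probability of remaining at that wave, given history and having remained up to the previous wave.

```python
import numpy as np


def ipcw(p_remain):
    """Inverse probability of censoring weight for participants who remained:
    1 / P(remaining at this wave | history, still in study at previous wave)."""
    return 1.0 / np.asarray(p_remain, dtype=float)


def final_weight(iptws, ipcws, survey_weight):
    """Product of all wave-specific IPTWs and IPCWs, times the survey weight."""
    treatment_part = np.prod(np.column_stack(iptws), axis=1)
    censoring_part = np.prod(np.column_stack(ipcws), axis=1)
    return treatment_part * censoring_part * np.asarray(survey_weight, dtype=float)


# Two hypothetical participants who remained through follow-up (made-up values)
iptws = [np.array([1.2, 0.8]),   # 1994 exposure
         np.array([0.5, 1.5]),   # 2002 exposure
         np.array([2.0, 1.0])]   # 2006 exposure
ipcws = [ipcw([0.8, 0.5]),       # remained at 2002
         ipcw([0.5, 1.0]),       # remained at 2006
         ipcw([1.0, 0.8])]       # remained to the age-50 outcome
w = final_weight(iptws, ipcws, survey_weight=[1.0, 2.0])
print(w)  # approximately [3. 6.]
```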
In the second step, the weighted pseudo population is statistically analyzed, comparing hypothetical joint interventions (i.e., exposure trajectories of interest) in terms of their effect on an outcome. Such an analysis provides marginal estimates that apply to the whole population because these estimates are not conditional on other variables (i.e., they are not based on models that hold covariates constant) (34).
Weight generation.
Stabilized IPTWs were created for alcohol consumption at the 1994, 2002, and 2006 waves, as were IPCWs (relating to attrition status in 2002, in 2006, and at age 50). The final MSM weight was trimmed at the 98th percentile to mitigate the effect of extreme weights (35). Further information on weight calculation, distribution, and covariate balance is presented in section 7 and Figure S3 in the online supplement.
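Trimming and balance checking can be sketched as below. This assumes that "trimming" means capping weights above the 98th percentile at that percentile's value (rather than discarding observations), and that balance is summarized with a weighted standardized mean difference (SMD) whose denominator is the unweighted pooled SD, one common convention. Both functions are illustrative helpers, not the study's code; the study's exact settings are described in its online supplement.

```python
import numpy as np


def trim_upper(weights, quantile=0.98):
    """Cap weights above the given quantile at that quantile's value."""
    weights = np.asarray(weights, dtype=float)
    return np.minimum(weights, np.quantile(weights, quantile))


def weighted_smd(x, group, weights):
    """Weighted standardized mean difference of covariate x, group 1 minus group 0.

    Uses the unweighted pooled SD as the denominator; balance is often judged
    against |SMD| < 0.1.
    """
    x, group, weights = (np.asarray(v, dtype=float) for v in (x, group, weights))
    g1, g0 = group == 1, group == 0
    diff = np.average(x[g1], weights=weights[g1]) - np.average(x[g0], weights=weights[g0])
    pooled_sd = np.sqrt((x[g1].var(ddof=1) + x[g0].var(ddof=1)) / 2)
    return diff / pooled_sd


raw_w = np.arange(1, 101, dtype=float)       # toy weights 1..100
trimmed = trim_upper(raw_w)                  # the two largest weights are capped
print(int((trimmed < raw_w).sum()))          # 2

x = [1, 1, 0, 1, 0, 0]                       # toy binary covariate
group = [1, 1, 1, 0, 0, 0]                   # toy exposure indicator
print(round(weighted_smd(x, group, np.ones(6)), 3))   # 0.577 (unweighted)
balanced = weighted_smd(x, group, [1, 1, 4, 1, 1, 1])  # weights equalize the means
```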
Linear regression and contrasts.
Using these final weights, weighted multivariable regression models regressed CES-D-SF scores (linear model) and probable depression (logistic model) at age 50 on 1994, 2002, and 2006 alcohol consumption. The first set of analyses, using a continuous CES-D-SF outcome, provides predicted mean depressive symptoms, while the second, using a binary outcome, provides predicted probabilities of probable depression. To focus on stable trajectories of alcohol consumption, we used the parameters estimated from the whole sample to generate predictions for hypothetical drinking trajectories of interest. This involved using linear contrasts to estimate the predicted difference in outcome between hypothetical trajectories of consistent consumption. Specifically, consistent occasional, consistent moderate, and consistent above-guideline drinking were compared against a consistent abstinence reference category. We selected these stable drinking categories because they are particularly informative for formulating national alcohol guidelines. For the continuous outcome, this is equivalent to summing the wave-specific effect estimates for a given category of alcohol consumption from the initial regression and then taking the difference from the reference group. Confidence intervals were obtained from 500 bootstrap replicates to account for the additional uncertainty introduced by weighting.
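The contrast calculation can be sketched as follows, assuming exposure at each wave is dummy-coded with abstinence as the reference, so that "consistent moderate vs consistent abstinence" is the sum of the three moderate coefficients. The coefficients below are hypothetical (not the study's estimates), and the bootstrap replicates are a random stand-in; a real bootstrap would resample participants and re-estimate both the weights and the regression in every replicate.

```python
import numpy as np

# Coefficient order for dummy-coded exposure (abstinence is the reference at every wave)
WAVES = ("1994", "2002", "2006")
LEVELS = ("occasional", "moderate", "above")
TERMS = ["intercept"] + [f"{lvl}_{wave}" for wave in WAVES for lvl in LEVELS]


def contrast_vector(level):
    """Contrast: consistent `level` at all three waves minus consistent abstinence."""
    return np.array([1.0 if term.startswith(level + "_") else 0.0 for term in TERMS])


def contrast_estimate(beta, level):
    return float(contrast_vector(level) @ beta)


def percentile_ci(boot_betas, level, alpha=0.05):
    """Percentile CI from bootstrap replicates of the full coefficient vector."""
    values = boot_betas @ contrast_vector(level)
    return tuple(np.percentile(values, [100 * alpha / 2, 100 * (1 - alpha / 2)]))


# Hypothetical coefficients (NOT the study's estimates), ordered as in TERMS
beta = np.array([3.9, -0.5, -0.7, 0.1, -0.2, 0.1, 0.0, -0.1, -0.3, 0.4])
# Stand-in for 500 bootstrap refits of the weighted model
rng = np.random.default_rng(1)
boot_betas = beta + rng.normal(0.0, 0.2, size=(500, len(beta)))

est = contrast_estimate(beta, "moderate")    # -0.7 + 0.1 - 0.3
lo, hi = percentile_ci(boot_betas, "moderate")
print(f"consistent moderate vs consistent abstinence: {est:.2f} ({lo:.2f}, {hi:.2f})")
# For the logistic model the same contrast is on the log-odds scale: OR = exp(estimate)
```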
Sensitivity analyses.
Although a wide range of potential confounders was incorporated, E-values were calculated for the contrast estimates to assess robustness to unmeasured confounding. The E-value is a recently developed metric representing the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both exposure and outcome to fully account for the observed association (36). It is calculated from the (here, standardized) effect estimate and its standard error or confidence interval. A small E-value suggests that only a small amount of unmeasured confounding would be needed to account for the association; as E-values grow larger, it becomes increasingly unlikely that unmeasured confounding fully accounts for the association. The main analyses were also performed separately in men and women.
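The closed-form E-value expressions can be sketched in a few lines. This sketch assumes that the standardized β is treated as a standardized mean difference d (converted with RR ≈ exp(0.91 × |d|)) and that probable depression is common enough to use RR ≈ √OR; these are standard approximations from the E-value literature, not necessarily the exact procedure used in this study. They do, however, give values close to those reported in the sensitivity analyses (about 1.82 for a standardized β of −0.25 and 1.95 for an odds ratio of 0.58).

```python
import math


def evalue_rr(rr):
    """E-value for a point estimate on the risk-ratio scale (VanderWeele and Ding)."""
    if rr < 1:          # protective estimates are inverted first
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def evalue_smd(d):
    """Approximate E-value for a standardized mean difference, via RR ~ exp(0.91 * |d|)."""
    return evalue_rr(math.exp(0.91 * abs(d)))


def evalue_or_common(odds_ratio):
    """Approximate E-value for an odds ratio with a common outcome, via RR ~ sqrt(OR)."""
    return evalue_rr(math.sqrt(odds_ratio))


def evalue_ci(lower, upper, to_rr=lambda x: x):
    """E-value for the confidence limit closest to the null; 1.0 if the CI contains it."""
    rr_lo, rr_hi = to_rr(lower), to_rr(upper)
    if rr_lo <= 1.0 <= rr_hi:
        return 1.0
    return evalue_rr(rr_hi if rr_hi < 1.0 else rr_lo)


print(round(evalue_smd(-0.25), 2))                    # 1.82
print(round(evalue_or_common(0.58), 2))               # 1.95
print(round(evalue_ci(0.36, 0.88, math.sqrt), 2))     # 1.33
```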
All analyses were conducted in R, version 4.1.0 (see section 8 in the online supplement for packages and code used). We considered p values <0.05 statistically significant, but we also interpreted results with respect to the magnitude of the effect estimates and their confidence intervals.
Results
Baseline Sample Characteristics
The sample was 50.5% female and had a mean age of 30.81 years in 1992. At baseline, 1,947 participants abstained from alcohol, 1,259 drank occasionally, 660 drank moderately, and 1,801 drank above guidelines. The mean CES-D-SF score in 1992 was 4.16 (SD=4.01), with baseline abstainers having the highest mean (4.55; SD=4.36) and baseline moderate drinkers the lowest (3.17; SD=3.33). Compared with abstainers and above-guideline drinkers, occasional and moderate drinkers had a higher mean income at baseline, were less likely to be Black, and were more likely to be employed and married. Additional baseline sample characteristics are presented in Table 2 (note that the IPTWs effectively removed baseline differences between exposure groups; see Figure S3 in the online supplement).
Primary Analyses
A total of 3,593 individuals provided valid outcome data at age 50. The results of the regression models are presented in Table 3. For both the continuous and binary outcomes, the largest protective effects of occasional drinking (b=−0.61, bootstrap 95% CI=−1.19, −0.01; odds ratio=0.70, bootstrap 95% CI=0.46, 1.04) and moderate drinking (b=−0.82, bootstrap 95% CI=−1.46, −0.13; odds ratio=0.58, bootstrap 95% CI=0.29, 1.00) corresponded to 1994 consumption, while the greatest detrimental effects of above-guideline drinking corresponded to 2006 consumption (b=0.55, bootstrap 95% CI=−0.98, 0.40; odds ratio=1.17, bootstrap 95% CI=0.72, 1.96).
Results from the contrast analyses approximated a J-shape (see Figures 1 and 2): after bootstrapping, consistent occasional drinkers (b=−0.84, 95% CI=−1.47, −0.11) and consistent moderate drinkers (b=−1.08, 95% CI=−1.88, −0.20) were both predicted to have statistically significantly lower mean CES-D-SF values at age 50 than consistent abstainers. Consistent above-guideline drinkers had greater predicted CES-D-SF values than consistent abstainers, but the difference was not statistically significant (b=0.34, 95% CI=−0.47, 1.25). Similar results were obtained for the binary outcome analyses, with consistent occasional drinkers predicted to have significantly lower odds of probable depression than consistent abstainers at age 50 (odds ratio=0.58, 95% CI=0.36, 0.88) and consistent above-guideline drinkers having slightly, but not significantly, higher odds (odds ratio=1.06, 95% CI=0.66, 1.72). Consistent moderate drinkers had reduced odds of probable depression, comparable to those of occasional drinkers, but this was not significant after bootstrapping (odds ratio=0.59, 95% CI=0.26, 1.13).
See Tables S2 and S3 in the online supplement for full statistical output from contrast analyses.
Sensitivity analyses.
E-values for the consistent abstinence versus consistent moderate consumption contrast in the continuous outcome analyses were 1.83 (for a standardized β of −0.25) and 1.30 (for the confidence interval). This indicates that to fully account for the observed association, one or more unmeasured confounders would need to almost double both the probability of an individual belonging to the moderate exposure group and the probability of being high (compared to low) on CES-D-SF scores (with a smaller effect required to alter the confidence interval such that it includes the null). The E-value was similar for the binary outcome analyses (1.93). For the consistent abstinence versus consistent occasional consumption contrast, the E-values for mean depression score were 1.68 and 1.19 for a standardized β of −0.20 and confidence interval, respectively, and for the binary outcome were 1.95 for the effect and 1.33 for the confidence interval.
Women were less likely to be above-guideline drinkers than men (see Table 2) and had higher depressive symptom levels overall: at age 50, the mean CES-D-SF score for women was 4.20 (SD=4.48), compared with 3.05 (SD=3.91) for men. However, in stratified analyses, the pattern of results from the overall sample replicated for both men and women (see section 9 and Figures S4 and S5 in the online supplement). This was true of both the continuous and binary outcome analyses, although for the latter, consistent moderate drinking yielded the lowest predicted probability for men, rather than being equivalent to consistent occasional drinking, as it was for women.
Discussion
This MSM analysis of the relationship between alcohol consumption in early to middle adulthood and depression at age 50 replicated the classic J-shaped relationship reported in the conventional epidemiological literature. In the present study, consistent occasional or moderate drinking was predicted to result in lower mean CES-D-SF scores and lower odds of probable depression compared with consistent abstinence.
This finding was contrary to the hypothesis that an MSM using methods that address potential biases would yield a positive, linear relationship between alcohol use and depression. That protective effects of low to moderate consumption were found even after accounting for common biases offers preliminary evidence that these effects may be causal. Unlike the harmful effects of above-guideline consumption, which were largest at the measurement wave just prior to outcome measurement, baseline alcohol consumption contributed most strongly to the protective effects of occasional and moderate drinking. (Although moderate drinking at the 2002 wave was associated with increased risk in the binary outcome analyses, this estimate had wide confidence intervals after bootstrapping.) This suggests that the benefits of low-level drinking accrue over time. Various mechanisms have been proposed for these benefits, including possible biological mediation via both GABAergic (37, 38) and dopaminergic effects (39). Moderate alcohol consumption is also associated with increased levels of brain-derived neurotrophic factor and lower levels of inflammatory biomarkers (39–41), both of which are implicated in depression (42). However, given that moderate drinking is known to reflect or facilitate healthy social interaction (43), an established protective factor for depression (44), this should also be acknowledged as a key pathway for the observed benefits (25).
Analyses using continuous CES-D-SF scores maximized statistical power and offer population-level interpretability. Although effect size heuristics apply imperfectly to psychological research, where small or very small effects are typical (45), the protective effects found in the continuous outcome analyses were small (46). At first glance, these may not represent a clinically meaningful benefit, but given evidence that increased depressive symptom severity is associated with risk for subsequent physical health conditions (4), changes in mean scores, even below clinical thresholds, are important to study. Further, these effects are consequential at the population level, particularly when they contribute to modeling that informs national drinking guidelines, where incorporating evidence of alcohol’s protective effects can substantially alter recommendations (47). Indeed, the present findings offer support for the current U.S. drinking guidelines (30), at least with respect to depression.
The second set of analyses, using a binary outcome, provided predicted probabilities of probable depression and maximized clinical interpretability. These analyses also supported protective effects for stable occasional and moderate consumption (although, owing to reduced statistical power, the moderate drinker–abstainer contrast was no longer statistically significant). However, any evidence for protective effects must be balanced against evidence of increasing dose-response relationships between alcohol consumption and other health outcomes, including several cancers (48).
While consistent above-guideline drinking was associated with increased depressive symptoms and risk for probable depression, this increase was modest and not statistically significant. At higher levels, alcohol begins to exert opposite effects on the same pathways through which lower consumption may offer protection (39). It is therefore somewhat surprising that consistent above-guideline drinkers did not have greater depression risk. However, recent meta-analytic evidence suggests that the increased risk of depression among heavy drinkers may be largely driven by confounding (9). Our findings are consistent with this: only modest increases in depression were found among above-guideline drinkers after rigorously controlling for available confounders. Importantly, evidence still indicates that disordered drinking specifically, that is, a compulsive pattern of drinking associated with significant physical and social harms, is associated with increased depression risk (9).
That the results of the primary analyses replicated when the sample was stratified by sex is consistent with previous findings that, despite sex-based differences in the epidemiology of alcohol consumption and depression more generally, the nature of the alcohol-depression relationship is similar in men and women (25, 49).
Strengths
Causal inference methods have only recently been applied in this area, and the present study contributes preliminary evidence that associations between moderate alcohol consumption and reduced depression may reflect genuine causal effects. The key strength of MSMs in this context is that they generate effect estimates that account for the dynamic nature of alcohol consumption and confounders over time. Studies that ignore drinker drift are likely to produce biased estimates. Using latent class growth models (LCGMs) to characterize trajectories of consumption avoids this problem, but it relies on modeled classes rather than observed patterns in the actual data and cannot answer questions about stable levels of exposure. The same limitation applies to emerging methods combining LCGMs and MSMs (50).
In this study, we controlled for a wide range of variables commonly considered to confound the alcohol-depression relationship. This included pre-baseline alcohol consumption, which was key to mitigating sick-quitter bias given that 60% of baseline abstainers were previous above-guideline drinkers. Robust pre-baseline data and the incorporation of multiple waves of alcohol consumption measures over follow-up represent considerable improvements over the only other extant MSM analysis addressing the alcohol-depression relationship (25).
Limitations
MSMs, like any statistical model, depend on the quality and breadth of confounder measurement. The breakdown of exposure group by covariates in Table 2 illustrates that individual covariates, excepting prior alcohol consumption, tended to associate only modestly with baseline drinker category. This may suggest that the identified protective effects are not solely the result of unmeasured confounding. That said, the trends that were evident, such as occasional and moderate drinkers having higher income and employment levels, constitute important social determinants of health. It is plausible that one or a combination of similar unmeasured covariates may have associations with exposure and outcome exceeding the generated E-values. In particular, the present study lacked data on parental alcohol use disorder, as well as sophisticated measures of social support and interaction, which may confound (and mediate) the alcohol-depression relationship in a longitudinal context. Residual confounding may also be possible given that weighting for the 2002 and 2006 exposure groups did not achieve balance for all covariates within the conservative 0.1 standardized mean difference heuristic (see Figure S3 in the online supplement). Moreover, the desired interpretation of the weighted estimates rests on strong assumptions (discussed in section 6 in the online supplement and at length elsewhere [22, 51]) that, at best, hold only approximately.
Future Directions
More research applying novel approaches to the alcohol-depression relationship, such as flexible, nonlinear Mendelian randomization (which, given valid genetic instruments, avoids confounding of the exposure-outcome association), is needed to answer remaining questions about causality. Alongside this, triangulation of evidence across various observational designs and statistical methods is key. Due consideration must also be given to the many data processing and analysis decisions researchers make, the impacts of which can be quantified using multiverse analysis (52).
Acknowledgments
The authors thank Dr. Noah Greifer for his advice on the construction of inverse probability weights and on the use of his R package WeightIt, as well as Dr. Alexa Yakubovich for her advice on the use of marginal structural models.