Optimizing clinical management of major depressive disorder during pregnancy entails weighing the respective risks, to mother and baby alike, of prenatal major depressive disorder compared with prenatal antidepressant exposure (for a review, see reference
1). Unfortunately, because ethical considerations generally preclude conducting randomized clinical trials to evaluate antidepressant safety or efficacy during gestation, risk estimates are derived from observational studies, which are susceptible to numerous sources of bias and confounding. For example, physician awareness of prenatal antidepressant exposure may generate more intense screening for potential adverse outcomes, creating an ascertainment (surveillance) bias affecting even large-scale national and health system databases (
2–
3). In addition, purportedly prospective studies of prenatal safety often rely on retrospective data collection, which has been shown to introduce a recall bias, potentially overestimating the effect of antidepressant exposure (
4). Recognizing these and other deficiencies, a 2014 review by the Agency for Healthcare Research and Quality (AHRQ) of the U.S. Department of Health and Human Services found that studies of prenatal antidepressant safety are “inadequate to allow well-informed decisions…because comparison groups were not exclusively depressed women” (
5). Studies of the purported association between antidepressant therapy during pregnancy and subsequent diagnosis of autism in exposed offspring provide an opportunity to examine the effect of these methodological concerns.
A causal link between fetal antidepressant exposure and autism is biologically plausible and therefore merits rigorous investigation. Converging lines of preclinical and clinical evidence suggest a pivotal neurotrophic role for serotonin in neurodevelopment (
6) and implicate aberrant serotonin signaling in the pathophysiology of autism spectrum disorder (for reviews, see references
7–
9). Considering that antidepressants alter serotonin neurotransmission, readily cross the human placenta (
10–
11), and have been shown in a rodent model to bind to the serotonin transporter in the fetal brain (
12), it is reasonable to posit a role for fetal antidepressant exposure in the pathogenesis of autism.
Beginning in 2011, a rapidly accruing series of observational studies examining the possible link between autism and prenatal antidepressant exposure produced discrepant findings. Despite these inconsistencies, previous meta-analyses (
13–
15) have reported a significant association between antidepressant exposure and autism, although Brown and colleagues (
13) postulated that the association, nonsignificant when limited to women with histories of mental illness, is perhaps a by-product of residual confounding.
These previous meta-analyses have not systematically examined the potential for bias in these studies, nor have they examined how alternative study designs, particularly comparator group selection (which was specifically emphasized by AHRQ), might help to distinguish the contribution of bias from that of confounding to the discordant results. This meta-analysis is the first, to our knowledge, to systematically evaluate the effect of alternative study designs, particularly comparator group selection, on the observed association between prenatal antidepressant exposure and subsequent autism diagnosis.
Methods
We followed the guidelines established by the Meta-analysis of Observational Studies in Epidemiology Group (
16).
Search Strategy and Selection Criteria
The senior author (D.J.N.) conducted searches using BIOSIS, CINAHL Plus, Embase, MEDLINE, PsycINFO, PubMed, and Scopus databases from their respective inceptions through August 2017 for articles addressing the association between prenatal antidepressant exposure and autism and autism spectrum disorder diagnoses. The search strategy comprised three initial searches selecting articles regarding antidepressants (search terms: antidepressant, serotonin reuptake inhibitor, serotonin-norepinephrine reuptake inhibitor [SNRI], selective serotonin reuptake inhibitor [SSRI], tricyclic antidepressant, and the generic names for all commercially available antidepressants), pregnancy (search terms: antenatal, fetal, pregnancy, and prenatal), and autism (search terms: Asperger’s syndrome, autism, autism spectrum disorder, and autistic). These three sets were then joined into a single result set using the Boolean AND operator. This process was repeated for each of the seven databases. Finally, the bibliographies of all selected articles and review articles were searched to identify any articles that were overlooked in the database searches.
Peer-reviewed original research articles of controlled studies were selected for inclusion. Articles were excluded if the outcome was operationalized as the presence or severity of symptoms of autism rather than autism diagnosis.
Quality Assessment
Publication quality was assessed using the Newcastle-Ottawa Scale (
17–
18). The authors rated each article independently, and then a consensus rating was assigned. The cohort study version of the Newcastle-Ottawa Scale awards each study up to nine stars across three sections (selection, 4 stars; comparability, 2 stars; and outcome, 3 stars). The case-control version of the scale also awards up to nine stars across three sections (selection, 4 stars; comparability, 2 stars; and exposure, 3 stars). By convention, the two comparability factors of the scale were designated with respect to covariates deemed most important to the analysis in question. In view of the importance placed by the AHRQ on adequately controlling for prenatal depression (
5), we elected a priori to designate current maternal depression during pregnancy as a comparability factor. We selected maternal ethnicity and nationality a priori as a comparability factor, knowing that ethnicity and nationality (i.e., country of origin) have previously been implicated as sources of autism diagnosis misclassification (
19–
22) and having observed during our preliminary review that the majority of the qualifying studies reported statistically significant between-group differences in maternal ethnicity and nationality.
A Newcastle-Ottawa Scale quality threshold developed for the AHRQ (
23) was used to rate the quality of each study as good, fair, or poor. To be assigned a good rating, a study must have three or more stars in the selection domain, one or more stars in the comparability domain, and two or more stars in the outcome or exposure domain (outcome for cohort studies, exposure for case-control studies). Studies rated as fair must have two stars in the selection domain, one or more stars in the comparability domain, and two or more stars in the outcome or exposure domain. Finally, studies rated as poor have ≤1 star in the selection domain, zero stars in the comparability domain, or ≤1 star in the outcome or exposure domain.
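The rating thresholds described above can be expressed as a short decision rule. The following is a minimal sketch in Python, not the tool used in this study; the function name `rate_study` and its argument names are hypothetical and were chosen only for illustration.

```python
def rate_study(selection, comparability, outcome_or_exposure):
    """Map Newcastle-Ottawa Scale stars to a good/fair/poor rating.

    Hypothetical helper illustrating the thresholds stated in the text.
    selection: 0-4 stars; comparability: 0-2 stars;
    outcome_or_exposure: 0-3 stars (outcome for cohort studies,
    exposure for case-control studies).
    """
    if not (0 <= selection <= 4):
        raise ValueError("selection must be between 0 and 4 stars")
    if not (0 <= comparability <= 2):
        raise ValueError("comparability must be between 0 and 2 stars")
    if not (0 <= outcome_or_exposure <= 3):
        raise ValueError("outcome/exposure must be between 0 and 3 stars")

    # Good and fair share the comparability and outcome/exposure thresholds;
    # they differ only in the number of selection stars.
    meets_other_domains = comparability >= 1 and outcome_or_exposure >= 2
    if selection >= 3 and meets_other_domains:
        return "good"
    if selection == 2 and meets_other_domains:
        return "fair"
    # Everything else: <=1 selection star, 0 comparability stars,
    # or <=1 outcome/exposure star.
    return "poor"
```

For example, under these thresholds `rate_study(3, 1, 2)` returns `"good"`, whereas a study with four selection stars but no comparability stars is rated `"poor"`.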
Meta-Analysis
Analyses were performed using the Comprehensive Meta-Analysis software program, version 3.3 (BioStat, Frederick, Md.). All statistical tests were two-tailed with alpha set at 0.05. A meta-analysis using a random-effects model was performed with summary measures of effect presented as odds ratios or hazard ratios, the latter for time-to-event data, with 95% confidence intervals. Because random-effects modeling accommodates analysis of studies drawn from different populations, it was used (in lieu of fixed-effects modeling) in light of our hypothesis that the varied approaches to comparator group operationalization alter effect estimates (
24–
25). The most fully adjusted odds ratio or hazard ratio estimates reported in each study were used in the meta-analysis. Under the rare disease assumption (
26), risk estimates from case-control and cohort studies were combined when calculating summary odds ratio and hazard ratio estimates.
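To make the pooling step concrete, the following sketch shows one common way to compute a random-effects summary estimate from reported ratios and their 95% confidence intervals. It assumes the DerSimonian-Laird (method of moments) estimator of between-study variance; the text above does not state which estimator the software applied, so this choice, the function names, and any input values supplied to it are illustrative assumptions rather than a reproduction of the published analysis.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of a log ratio, recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z_95)


def pool_random_effects(estimates, cis):
    """Pool odds or hazard ratios on the log scale.

    Assumes the DerSimonian-Laird estimator of tau^2 (an assumption,
    not confirmed by the source).
    estimates: list of ratio point estimates.
    cis: list of (lower, upper) 95% confidence limits.
    """
    y = [math.log(e) for e in estimates]
    v = [se_from_ci(lo, hi) ** 2 for lo, hi in cis]

    # Fixed-effect (inverse-variance) weights and mean, needed for Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_star = [1.0 / (vi + tau2) for vi in v]
    sum_w_star = sum(w_star)
    pooled_log = sum(wi * yi for wi, yi in zip(w_star, y)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)

    return {
        "pooled": math.exp(pooled_log),
        "ci_lower": math.exp(pooled_log - Z_95 * pooled_se),
        "ci_upper": math.exp(pooled_log + Z_95 * pooled_se),
        "q": q,
        "df": df,
        "tau2": tau2,
    }
```

Working on the log scale keeps the confidence interval symmetric around the pooled log ratio before it is back-transformed. When the estimated between-study variance is zero, the result coincides with the fixed-effect inverse-variance estimate.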
In several cases, findings from the same, or overlapping, patient samples were reported in two or more studies (
Table 1). Specifically, four studies used data from the Swedish Medical Birth Register (
27–
30), three from the Danish Registry (
31–
33), and two from the Partners Healthcare database (
34–
35). To avoid data redundancy in the meta-analyses, preliminary random-effects meta-analyses were performed to produce a single pooled odds ratio or hazard ratio estimate for each set of overlapping studies, and the pooled estimates were incorporated into the final meta-analyses.
Results of meta-analyses were grouped by antidepressant class and trimester of exposure (first, second, third, any). Because the window of risk for adverse neurodevelopmental effects of fetal antidepressant exposure is unclear, data from all trimesters were evaluated. Initial meta-analyses were performed for studies reporting results using population-based comparator groups. Analyses were then performed for psychiatric control (i.e., control mothers limited to those with histories of depression) comparator group results and family-based (i.e., siblings discordant for prenatal antidepressant exposure or autism diagnosis) comparator group results. Finally, heterogeneity testing was performed to evaluate differences between results yielded with the three comparator group definitions.
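The between-comparator heterogeneity test can be illustrated with a between-subgroup Q statistic computed from the three subgroup summary estimates. The sketch below is an assumption about the general form of such a test, not a description of the software's internal procedure; the function names are hypothetical, and the closed-form p-value applies only to the case of three subgroups (2 degrees of freedom).

```python
import math


def q_between(log_estimates, standard_errors):
    """Between-subgroup Q statistic for subgroup summary estimates.

    log_estimates: pooled log odds or hazard ratio for each subgroup.
    standard_errors: standard error of each pooled log ratio.
    Returns (Q, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    # Inverse-variance weighted mean across subgroups.
    grand_mean = sum(w * y for w, y in zip(weights, log_estimates)) / total_weight
    q = sum(w * (y - grand_mean) ** 2 for w, y in zip(weights, log_estimates))
    return q, len(log_estimates) - 1


def p_value_two_df(q):
    """Upper-tail chi-square probability for exactly 2 degrees of freedom.

    For 2 df the chi-square survival function reduces to exp(-q / 2);
    other degrees of freedom need a general chi-square routine.
    """
    return math.exp(-q / 2.0)
```

With three comparator definitions (population-based, psychiatric control, and discordant-sibling), Q is referred to a chi-square distribution with 2 degrees of freedom; a small p-value indicates that the subgroup summary estimates differ by more than sampling error would predict.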
Meta-Regression
Post hoc meta-regression was performed to investigate whether comparability factors used in the assessment of study quality (prenatal depression and maternal ethnicity and nationality) and other study characteristics (study design, study location, and publication year) were sources of heterogeneity in the population-based studies. Candidate moderators were entered into the regression model incrementally.
Publication Bias
Potential publication bias was explored by constructing funnel plots and performing the Egger funnel plot asymmetry test (linear regression method) (
36). Standard error of the exposure effect was used as the measure of study precision.
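For readers unfamiliar with the asymmetry test, the sketch below shows the linear regression form of the Egger test: the standardized effect (effect divided by its standard error) is regressed on precision (the reciprocal of the standard error), and an intercept that departs from zero suggests funnel plot asymmetry. The function name and return structure are hypothetical, and the sketch reports the t statistic and degrees of freedom without a p-value, which would require a t-distribution routine.

```python
import math


def egger_test(log_effects, standard_errors):
    """Egger regression: standardized effect regressed on precision.

    log_effects: study effects on the log scale.
    standard_errors: standard error of each log effect.
    Returns the intercept (asymmetry term), slope, standard error of
    the intercept, t statistic, and degrees of freedom (n - 2).
    """
    n = len(log_effects)
    if n < 3:
        raise ValueError("at least three studies are required")

    x = [1.0 / se for se in standard_errors]                      # precision
    z = [y / se for y, se in zip(log_effects, standard_errors)]   # standardized effect

    x_mean = sum(x) / n
    z_mean = sum(z) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))

    # Ordinary least squares fit of z on x.
    slope = sxz / sxx
    intercept = z_mean - slope * x_mean

    # Residual variance and the standard error of the intercept.
    sse = sum((zi - (intercept + slope * xi)) ** 2 for xi, zi in zip(x, z))
    residual_variance = sse / (n - 2)
    se_intercept = math.sqrt(residual_variance * (1.0 / n + x_mean ** 2 / sxx))

    # A perfect fit gives a zero standard error; report NaN in that case.
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")

    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "t": t_stat,
        "df": n - 2,
    }
```

The t statistic is compared with a t distribution on n − 2 degrees of freedom. The slope estimates the underlying effect, and the intercept captures the tendency of smaller, less precise studies to report systematically different effects.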
Discussion
In this meta-analysis, we demonstrated a marked effect of comparator group composition on observational studies of prenatal antidepressant exposure and autism. First, most of the analyses of between-comparator subgroup heterogeneity (
Table 2) demonstrated highly significant differences. Moreover, whereas summary effect estimates derived from population-based comparator studies uniformly implicated fetal antidepressant exposure in the pathogenesis of autism, psychiatric control and discordant-sibling comparator studies, with largely nonsignificant and progressively lower summary estimates, indicate otherwise (
Table 2).
Additionally, our finding of significant between-group heterogeneity indicates that meta-analyses of prenatal antidepressant exposure and autism risk should be limited to studies using the same comparator group definition, because combining studies using different comparison groups will likely produce unreliable effect estimates. Thus, we are led to question which comparator group definition is preferred. Results of the meta-regression of population-based studies (
Figure 3; see also Table S2 in the
online supplement), a test of within-group heterogeneity, indicate that the effect estimates provided by the meta-analyses using population-based studies are unreliable (
43) because of unresolved differences in maternal ethnicity. Unfortunately, existing studies did not permit meta-regression of psychiatric control and discordant-sibling studies. Nevertheless, if the AHRQ position (i.e., that studies employing comparator groups comprising depressed women are better equipped to elucidate the risks of antenatal antidepressant therapy [
5]) is accepted, then the summary estimates from psychiatric control and discordant-sibling studies, which do not support an association between prenatal antidepressant exposure and autism, are preferred. In summary, our meta-analysis does not support an association between prenatal antidepressant exposure and autism.
The AHRQ design recommendation is predicated on a conviction that control subjects with depression are necessary to disentangle the effect of depression itself from that of antidepressant exposure. Numerous adverse outcomes have, in fact, been attributed to both prenatal depression and prenatal antidepressant therapy, including preterm birth, miscarriage, low birth weight, gestational hypertension and preeclampsia, child motor and cognitive deficits, and a variety of offspring behavioral and emotional perturbations (
44–
45), including autism (
46). Additionally, prospective studies concomitantly controlling for the occurrence of both depression and antidepressant exposure during gestation have successfully discriminated the adverse effects of prenatal depression (
47) from those of prenatal antidepressant exposure (
48), underscoring the importance of appropriate control for maternal depression.
However, none of the 14 studies included in our meta-analysis reliably ascertained whether participants experienced an acute episode of depression during the index pregnancy; instead, they relied on a level of ICD diagnostic coding that delineates only lifetime diagnoses of depression. In planning our analyses, we determined a priori that adequate control for an episodic disorder such as depression necessitates determining whether a depressive episode occurred during pregnancy. How, then, in the absence of adequate control for prenatal depression with any comparator group design, are we to understand the decidedly lower summary estimates derived from psychiatric control and discordant-sibling studies? The answer must lie elsewhere. If, as has been suggested, a genetic relationship exists between maternal depression and autism (
49–
50), then perhaps controlling for depression as a lifetime trait variable is adequate. Alternatively, because any lifetime history of depression is a risk factor for recurrence of depression during pregnancy (
51), both psychiatric control and discordant-sibling comparisons may have afforded at least partial control for acute prenatal depression.
Factors other than adjustment for maternal depression, however, likely explain our finding that summary hazard ratio and odds ratio estimates from discordant-sibling comparisons are decidedly lower than those from both population-based and psychiatric control designs. For example, discordant-sibling comparisons afford better control for genetic susceptibility to autism (
52), which is especially important in view of the recently reported 83% heritability (
53). Perhaps more importantly, significant differences in ethnicity in half of the studies (
27–
29,
31,
38–
40), coupled with failure to document ethnicity in five more studies (
30,
32–
33,
37,
41), indicate an additional advantage of the discordant-sibling design. With one exception (
38), the specific ethnic differences reported were that the prevalence of autism was significantly lower among children of Hispanic or immigrant mothers (
27,
40), and the prevalence of antidepressant exposure was significantly lower among immigrant mothers (
28–
29,
31,
39). Such differences should not be surprising. Ethnic disparities in clinical recognition of autism are well established, with U.S. studies consistently reporting underrecognition of autism in Latino children (
19–
22) and other studies specifically denoting poor English proficiency as a principal barrier to autism diagnosis among Latinos in the United States (
21,
54). In contrast to a recent review of 17 studies suggesting a higher prevalence for autism diagnoses among children of immigrant mothers in Europe (
55), autism diagnoses were significantly lower among children of immigrant mothers in the only European case-control study in our meta-analysis to report maternal ethnicity (
27). Similarly, immigrant status and poor language proficiency have been identified as barriers to access to mental health care in several countries (
56–
60). Taking these data together, we may surmise that ethnic minority and immigrant mothers in the contributing studies, particularly those with poor language proficiency, were less likely to have access both to treatment for depression during pregnancy and to a diagnostic evaluation for their children exhibiting symptoms of autism. Consequently, the observed association between prenatal antidepressant exposure and autism in population-based comparisons is unlikely to denote a causal relationship. It is also unlikely to be a consequence of the residual confounding proposed by Brown and colleagues (
13). Instead, the most plausible explanation is that the association is the product of a surveillance bias, arising because women in prenatal antidepressant exposure groups are more likely to secure a diagnostic evaluation for autism for their children. Moreover, reports linking maternal depression with autism (
61) in offspring may lead to enhanced autism screening, thereby constituting another potential source for surveillance bias. Such biases are not amenable to statistical adjustment but can be addressed by designating a comparator group (such as a discordant-sibling control) with a similar likelihood of screening for the outcome of interest (
62).
Limitations of this meta-analysis are those imposed by the 14 contributing studies. As previously noted, the available studies did not permit an analysis of the effect of research design on results of SNRI exposure or second- and third-trimester exposure to other antidepressants. Moreover, the studies did not afford sufficient control of the confounding effect of maternal depression, although it appears that some measure of control for depression may have been provided by psychiatric control and discordant-sibling designs. In addition, the studies did not reliably provide the data necessary to control adequately for concomitant prenatal pharmacological exposures, with only five studies controlling for exposure to other psychotropic drug classes (
28,
30–
31,
38–
39) and five studies controlling for tobacco exposure (
27–
28,
31,
38–
39).
The design implications of this meta-analysis for future observational studies using data derived from large-scale national registries and health care databases are far-reaching. While it is easy to be impressed by the large sample sizes these sources provide, their data are primarily collected to address clinical and business demands, not to answer research questions. Thus, such databases are especially susceptible to the sources of bias known to hinder all observational designs. Whereas the value of family-based designs in genetic association studies has long been recognized, our study highlights an additional strength of a family-based design, namely, holding constant not only genetic but also family-level environmental variables, thereby minimizing the potential for surveillance bias and residual confounding (
52). Even psychiatric control designs fail to achieve this level of rigor. These results lead us to recommend that pharmacovigilance reports of large-scale registry and database data more carefully consider sources of bias, particularly surveillance bias, and that they accordingly consider incorporating alternative designs, such as family-based discordant-sibling designs, in lieu of conventional population-based comparisons to more effectively address the potential for surveillance bias.