To date, no biomarkers have passed the multi-step approval process of the U.S. Food and Drug Administration’s Center for Drug Evaluation and Research (CDER) Biomarker Qualification Program (BQP) to be “qualified” for use in ASD. The U.S. Food and Drug Administration and National Institutes of Health Biomarker Working Group generated the “Biomarkers, Endpoints, and Other Tools” (BEST) resource to harmonize biomarker terminology. BEST defines several biomarker categories based on use case, such as diagnostic and safety biomarkers. Here, we focus on response biomarkers, defined by BEST as “a biomarker used to show that a biological response, potentially beneficial or harmful, has occurred in an individual who has been exposed to a medical product or an environmental agent” (
4). Given the strong evidence for pathological vulnerability during fetal and perinatal development (
6,
7), the challenges of early detection of social deficits, and the paucity of somatic treatments that target ASD-defining deficits (
8), more reliable, biologically based assays would be transformative. In light of the substantial recent progress in the genetics and biology of ASD and the associated promise of identifying novel molecular treatment targets (
9), identifying reliable response biomarkers in ASD could revolutionize the field, providing a standardized metric to assess and refine therapeutic strategies (
5).
There is ample reason to be optimistic that response biomarkers can be found. Over the past decade, substantial progress has been made in identifying specific genes that dramatically increase the risk for ASD (
6). Moreover, the study of these definitive molecular risk factors, both individually and collectively, has identified a wide range of potential biological mechanisms (
6,
10) and also provided evidence that these genes converge to disrupt a smaller number of molecular pathways, cell types, and circuits in particular brain regions at specific points in development, resulting in the clinical phenotype (
6,
11–
15). Indeed, while the genetic contribution to ASD has been defined in only a minority of affected individuals, findings to date strongly suggest that markers of altered biological processes are likely identifiable, whether there is contribution from rare large-effect mutations, common polygenic inheritance, or environmental factors, all of which play a role in ASD pathogenesis (
16).
Methods
A systematic search of the literature was performed following the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (
18), although we note that our study was not registered in advance and includes an additional 32 articles identified from citations of the reviewed papers that were not captured by our initial database search (see Table S2 in the
online supplement). In April 2020, three databases (MEDLINE, Embase, and Scopus) were searched for relevant articles from January 1, 1900, to February 29, 2020, with the terms (autism, ASD, pervasive developmental disorder, or PDD) and (biomarker, marker, or endophenotype). For exact search terms, methods, and the results of these searches, see the Methods section and Table S1 in the
online supplement. Our initial search identified 3,571 MEDLINE records, 1,894 Embase records, and 4,577 Scopus records (
Figure 1A). Duplicate records within and across these databases were identified to yield a total of 5,799 independent records (
Figure 1; see also Table S1 in the
online supplement).
Two authors (A.A. and A.S.J.) independently applied a first round of filtering based on the title, article type, and language of the record, using three inclusion criteria:
• Criterion 1: The article must be peer-reviewed and published in English.
• Criterion 2: The article must describe original research.
• Criterion 3: The article must focus on nonsyndromic ASD, although it may use a different term, such as pervasive developmental disorder or Asperger syndrome.
Applying these criteria, we retained 1,654 records (28.5%, 1,654/5,799) and excluded 4,145 records (
Figure 1A–C). The majority of exclusions (70.2%, 2,908/4,145) were due to not meeting criterion 3 (focus on ASD), followed by criterion 2 (original research) in 1,121 of the remaining 1,237 (90.6%). For the 1,654 records retained, we assessed our fourth inclusion criterion by reading the abstract and, if necessary, the full article:
• Criterion 4: The article must describe new data assessing at least one biomarker.
Applying this fourth criterion identified 1,025 records of biomarkers in ASD (62.0%, 1,025/1,654) and excluded 629 (
Figure 1A and
1B). For all 1,025 records, we assessed the full article to apply three additional criteria:
• Criterion 5: The biomarker assessed must have been both quantifiable and potentially variable (i.e., not fixed/structural).
• Criterion 6: The research must have included a measure of ASD severity using behavioral measures or scales to assess social-communication and/or repetitive and restrictive behaviors. Noninterventional studies must also assess the association between these measures and the biomarker.
• Criterion 7: If the research was based on animal models, there must have been an intervention.
To apply these criteria, each article was assessed independently by at least two of the authors (A.A., A.S.J., M.P., M.B., E.U.), and discrepancies were reviewed by an additional author. Applying the remaining criteria yielded a final sample of 248 articles (24.2%, 248/1,025) and excluded 777 (
Figure 1A,
1B, and 1D). The majority of exclusions (94.3%, 733 of 777) were due to criterion 6 (requirement for a measure of ASD severity), with 567 of the 733 being case-control studies, which needed to assess the relationship between the biomarker and ASD severity to be included (
Figure 1D). We note that these 777 excluded articles may nevertheless provide insights into potential ASD diagnostic biomarkers for future study, and we include a complete list of these articles in Table S1 in the
online supplement. From the citations of these 248 articles, we identified another 32 articles that met our seven inclusion criteria but had not been identified by the initial search, due to the absence of the words “biomarker,” “marker,” or “endophenotype” in the title or abstract, to yield 280 articles for final review. Table S1 in the
online supplement details all 5,799 articles and the outcome as these criteria were applied sequentially, plus the 32 articles identified from references cited by the articles we reviewed. We developed a standardized data extraction form, which five authors (A.A., A.S.J., M.P., M.B., and E.U.) used to extract data manually from all 280 eligible studies. If data were unclear or ambiguous, a consensus decision was taken by all five authors. The following metrics were extracted: biomarkers, study design, sample size, trial registration number if applicable (i.e., interventional study), ASD diagnostic criteria, inclusion and exclusion criteria, intervention (including dose and duration), primary outcomes, behavioral measures, and participants’ age, sex, and cognitive ability (see Tables S3 and S4 in the
online supplement). For the most frequently analyzed biomarkers, we also extracted the following outcomes: statistical association for the biomarker, direction of effect for the biomarker, whether the biomarker correlated with behavioral symptoms, and, for interventional studies, whether the intervention led to behavioral improvement. Where multiple outcomes were stated, the result based on the largest sample size was recorded. Missing data were recorded as “not stated.” For response biomarkers with consistent directions of effect across multiple studies (
Tables 1 and 2), we also extracted t statistics or means and standard deviations in cases and controls; studies in which these metrics were not reported were excluded from this step.
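The screening arithmetic reported above can be checked with a short tally. The sketch below only restates the record counts given in the text; the variable and function names are illustrative and are not part of the published workflow.

```python
# Record counts as reported in the text; names are illustrative.
database_hits = {"MEDLINE": 3571, "Embase": 1894, "Scopus": 4577}
unique_records = 5799
citation_additions = 32  # articles found through reference lists

stages = [
    # (stage label, records entering, records retained)
    ("criteria 1-3 (title, article type, language)", 5799, 1654),
    ("criterion 4 (new biomarker data)", 1654, 1025),
    ("criteria 5-7 (full-text assessment)", 1025, 248),
]


def summarize(stages):
    """Return (label, retained, excluded, percent retained) per stage."""
    rows = []
    for label, entering, retained in stages:
        excluded = entering - retained
        pct_retained = round(100 * retained / entering, 1)
        rows.append((label, retained, excluded, pct_retained))
    return rows


rows = summarize(stages)
for label, retained, excluded, pct in rows:
    print(f"{label}: retained {retained} ({pct}%), excluded {excluded}")

duplicates_removed = sum(database_hits.values()) - unique_records
final_total = rows[-1][1] + citation_additions
print(f"duplicates removed: {duplicates_removed}; articles reviewed: {final_total}")
```

Each stage takes as input exactly the number of records retained by the previous stage, so the retained and excluded counts at every step sum to the records entering it.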
Data Analysis and Statistical Methods
Power calculations (
Figure 2) were performed using the
TTestIndPower function in the Python “statsmodels” library to perform a two-sided t test with alpha values of 0.05 (nominal) and 5.3×10⁻⁵ (after Bonferroni correction for 940 biomarkers). Biomarkers reported in more than one study are displayed as co-publication networks using Cytoscape with the default “ForceDirected” layout. For response biomarkers with consistent directions of effect across multiple studies (i.e., glutathione), Cohen’s d was estimated from the t statistic (using the
t2d function in the R “psych” library) or means and standard deviations (see Methods in the
online supplement) and converted to Hedges’ g* with 95% confidence intervals (see Methods and Table S7 in the
online supplement). Hedges’ g* values were represented alongside 95% confidence intervals and sample size to provide insight into potential sample size biases (
Figure 3D and
3E). Given the small number of studies and their heterogeneous designs, the data were not subjected to meta-analysis or statistical assessment of heterogeneity, robustness, or bias.
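The effect-size conversions described above can be sketched as follows. This is a minimal illustration using standard textbook formulas (Cohen's d from an independent-samples t statistic or from pooled standard deviations, the usual small-sample correction factor for Hedges' g, and a large-sample variance approximation for the confidence interval). The exact formulas used in the online supplement are not reproduced here, so these should be read as assumptions; the function names and the example values are hypothetical.

```python
import math


def cohen_d_from_t(t, n1, n2):
    """Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def cohen_d_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d from group means and SDs, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def hedges_g(d, n1, n2, z=1.959964):
    """Bias-corrected effect size with an approximate 95% CI.

    Uses the common approximation J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    se_g = j * math.sqrt(var_d)
    return g, g - z * se_g, g + z * se_g


# Hypothetical example: cases (mean 10, SD 2, n 20) vs. controls (mean 12, SD 2, n 20)
d = cohen_d_from_summary(10.0, 2.0, 20, 12.0, 2.0, 20)
g, ci_low, ci_high = hedges_g(d, 20, 20)
print(f"d = {d:.2f}, g = {g:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Because the correction factor is below 1, the corrected value is always slightly closer to zero than Cohen's d, and the difference shrinks as sample size grows.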
Discussion
Through a systematic review of quantitative biomarkers in ASD, we identified 280 papers that detailed analyses of 940 potential response biomarkers. The majority of papers reported an association between a biomarker and ASD, yet no biomarkers have been qualified by the CDER BQP. Furthermore, biomarkers assessed multiple times mostly reveal both inconsistent evidence of ASD association and variable direction of effect (
Figures 3 and
4,
Tables 1 and 2). These discrepancies suggest a replication crisis, as observed in other fields of biomedical research (
32,
33). Our review identifies small sample sizes (
Figure 2), inadequate correction for multiple comparisons (
Figure 2), and the absence of replication cohorts as contributory factors. Given the high proportion of positive findings despite minimal replication, it is likely that there is also substantial publication bias, although this was not assessed statistically due to the limited reporting of quantitative outcomes in many analyses.
Against this background, distinguishing biomarkers that show true association with ASD symptoms is challenging. Mega-analysis of individual-level biomarker values would facilitate clear comparisons across biomarkers and assessment of biomarker-wide significant results; however, few studies include individual-level data. Similarly, meta-analysis, based on summary statistics, would also allow consistent comparisons across the field; surprisingly, many studies did not include these metrics. Furthermore, the heterogeneity of study design (e.g., phenotypes, demographic characteristics, methods) complicates comparison across multiple biomarkers. Well-known developmental changes are rarely taken into account in study designs (
30). In principle, replication should distinguish true positive associations; yet among the most frequently assayed biomarkers in each class, the majority do not show consistent results (
Figures 3 and
4).
The most consistent results were lower levels of reduced glutathione in ASD, observed in seven of eight case-control cohorts, with corresponding changes in oxidized glutathione (
34). The median effect size of −1.77 (Hedges’ g*) (
Figure 3D) is substantial. This pattern could be consistent with true ASD association, or it could arise by chance from the 940 biomarkers assayed and/or publication bias. A rigorous, large-scale, preregistered analysis is needed to resolve this question; with this effect size, the results should be definitive. Glutathione is considered to be a marker of oxidative stress in ASD, a hypothesis based on early findings of increased lactic acid in some children with ASD and the high frequency of ASD in children with genetic defects in mitochondrial enzymes (
35). To date, no gene related to oxidative stress has been associated with ASD through common or rare variation, suggesting that a causal role is unlikely, although it is possible that oxidative stress reflects nonspecific systemic dysfunction.
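As a rough illustration of why an effect of this size should be resolvable, the sketch below estimates the per-group sample size for a two-sided, two-sample comparison. It relies on a normal approximation, which is an assumption made for simplicity; it is not the statsmodels calculation described in the Methods and will somewhat underestimate the requirement when groups are small. The function names and the 80% power target are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(effect_size, alpha, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(effect_size)) ** 2)


def approx_power(effect_size, n, alpha):
    """Approximate power with n participants in each group."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) * math.sqrt(n / 2)
    return _STD_NORMAL.cdf(shift - z_alpha) + _STD_NORMAL.cdf(-shift - z_alpha)


nominal_alpha = 0.05
bonferroni_alpha = 0.05 / 940  # correction for the 940 biomarkers reviewed

for d in (0.5, 1.77):  # a moderate effect vs. the reported median for glutathione
    print(
        f"d = {d}: n per group = {n_per_group(d, nominal_alpha)} (nominal), "
        f"{n_per_group(d, bonferroni_alpha)} (Bonferroni)"
    )
```

Under these assumptions, the stringent threshold raises the sample size needed for a moderate effect several-fold, whereas an effect near 1.77 would still require fewer than 50 participants per group.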
Many of the biomarkers assessed were cytokines, aiming to detect a proinflammatory state. Inflammation is a component of the maternal immune activation and lipopolysaccharide animal models that induce social impairments, and gene expression analyses of the postmortem human cortex show a consistent upregulation of coexpressed modules enriched for immune-related genes in ASD (
14,
36,
37), but neither common nor rare variation associated with ASD implicates immune processes. Proinflammatory IL-6 was the most frequently studied cytokine, and most studies failed to identify an association with ASD (
Figure 3), but those that did consistently reported higher IL-6 in ASD cases, often correlated with severity. In a meta-analysis of proinflammatory cytokines in ASD, interferon gamma, IL-1β, IL-6, and tumor necrosis factor–alpha reached nominal significance (
38); however, these results did not survive correction for the 21 cytokines assayed, let alone the 940 identified in this review.
The growth factor BDNF and the neurotransmitter serotonin both play critical roles in neurodevelopment and neurophysiology and highlight the importance of developmental age. Both BDNF and serotonin have some evidence to support a role as diagnostic biomarkers (
39–
42), although physiological levels and correlation with ASD vary across development (
41,
43). Hyperserotonemia was one of the first biomarkers implicated in ASD, with early studies showing that one-third of subjects with ASD showed increased blood levels (
44), a finding that has been validated in a recent meta-analysis (
45). Subsequent human and animal studies showed a physiological reduction in serotonin with increasing age. This age-related reduction is attenuated in autistic individuals (
43,
46), so that ASD-related hyperserotonemia is most apparent in late childhood (
47). While the papers we reviewed all report an association of BDNF and serotonin levels with ASD, the direction of effect varies (
Table 1,
Figure 3C). Prior evidence of ASD association may have been incorrect, or, alternatively, the heterogeneous ages of individuals in these cohorts may have masked or augmented the ASD-related differences.
The extensive variability of biomarkers, both within and between individuals, presents one of the biggest challenges to biomarker research. Likely sources of variability include symptom severity, comorbidities, developmental age, sex, ancestry, genetic variation, environmental conditions (e.g., diet, medications, infections), time of day, sample processing methods, tissue assayed, and experimental assay. The blood-brain barrier is expected to be a major source of variability, leading to different results if molecular biomarkers are assayed centrally or peripherally. This will vary between molecules; for example, plasma-brain correlation has been demonstrated for BDNF in rodents, but not in humans (
48), while such correspondence has not been established for GABA (
49) or most other biomarkers. Differences between central and peripheral assays may also occur across development, as the permeability of the blood-brain barrier varies with maturity. Within the central or peripheral compartments, the tissue assayed is also critical. For example, platelets store serotonin, so platelet-rich blood provides more accurate assays. Where possible, recording likely sources of variability enables their inclusion as covariates in statistical models—for example, correcting for population stratification based on genotypic data. For unrecognized variables, careful experimental design, including selection of cases and controls, is critical. In discovery cohorts, large sample sizes are essential to overcome this variability, while longitudinal analyses in the same individuals can help delineate the major sources of intra- and interindividual variability in validation studies. Identifying these key covariates will be critical to define the homogeneous cohorts in which a specific biomarker may augment a clinical trial.
Neuroimaging and neurophysiological studies have clear potential for detecting biomarkers, although we found that sample sizes have been modest (the median total sample size was 44; see Table S6 in the
online supplement), and a myriad of techniques, instruments, tasks, brain regions, and data processing methods pose additional challenges to distinguishing true biomarkers (
Figure 4). Both alpha power and fixation time, the most frequently assayed biomarkers in neurophysiological studies, showed inconsistent direction of effect across studies (
Table 1,
Figure 4C). As for neuroimaging, both task activation and functional connectivity studies implicated brain regions that are generally considered part of the “social brain” (
50), including the medial prefrontal and temporal cortices. Activation of regions in the medial prefrontal cortex and the parieto-temporal junction replicates across different study designs and shows changes that correlate with ASD symptoms in clinical trials (
29,
30,
51). Functional connectivity analyses show consistent ASD association, and some correlation with ASD symptoms, for the default-mode network, including decreased connectivity between the medial prefrontal cortex and the posterior cingulate cortex (
52–
55).
The variability issues that embroil neuroimaging markers are compounded by variability in data acquisition methods, such as differences in hardware, image acquisition sequence parameters, and tasks, and differences in data-analytic approaches, including data postprocessing and quality control, as well as focus on various brain regions. While there is some replication in studies implicating social brain regions and the default mode network, the heterogeneity between studies prevents clear conclusions from being drawn at this time. Data repositories, such as the Autism Brain Imaging Data Exchange (ABIDE), and open-source data analysis pipelines (e.g., ABIDE imaging masks and analysis pipelines) are enabling a new generation of larger-scale neuroimaging analyses with clear correction for multiple comparisons and open data sets for replication (
56). It remains to be seen whether these initial findings prove to be robustly replicated in subsequent studies and whether some of the methodological difficulties are best overcome by pooling multiple studies or designing large-scale studies with consistent methods.
Animal models enable direct measurement of brain tissue and greater control over experimental conditions. All biomarkers assessed in animals were molecular in the studies we reviewed, and many overlapped with the molecules and classes most frequently assayed in human studies (
Figure 3; see also Table S8 in the
online supplement). The model most frequently used in the reviewed studies was the BTBR mouse, which is defined solely on ASD-like behaviors (
57). The reliance on purely behavioral phenotypes, particularly in rodents, has generally not been productive for illuminating the biology of psychiatric phenotypes, with rare exceptions (
58). In ASD, particularly given the discovery of dozens of large-effect mutations that can subserve the creation of “construct-valid” animal models, reliance on the BTBR mouse model has come under increasing scrutiny and at present has questionable relevance to the human syndrome. In addition, several environmentally induced models, including maternal immune activation and exposure to valproic acid, have been studied, but it is unclear to what extent these model common etiological mechanisms of ASD in humans. After applying our review criteria (
Figure 1), none of the animal models included were based on ASD-associated genes. Relaxing our exclusion criteria by discounting criteria 6 and 7 (
Figure 1) identified another 53 animal studies, but even here, only a few genetic models were used (e.g., 16p11.2,
MECP2,
FMR1; see Table S1 in the
online supplement). With CRISPR-Cas9 gene editing and biorepositories, genetic model systems are an underutilized resource in biomarker discovery, including genetic animal models and isogenic and patient-derived human cell culture models.
Limitations of the Review
While we identified over 1,000 original research articles that included putative biomarkers in ASD, we focused on about one-quarter of these—those that were most relevant to the response biomarker classes defined by BEST (
4) (see Methods and Table S1 in the
online supplement), by selecting articles that assessed whether biomarkers correlate with ASD symptom intensity. We note that biomarkers can overlap between classes; for example, a diagnostic biomarker that distinguishes individuals with high or low ASD liability might also change in relation to a behavior of interest. However, since no ASD biomarker in any class has been qualified by the CDER BQP, and since assessing correlation with symptom severity is a logical follow-up analysis for promising diagnostic biomarkers, it is unlikely that in-depth analysis of the other 777 articles would change our main conclusions. We note that we did not systematically assess whether these conclusions generalize to other biomarker classes.
Our search is also limited by the accuracy and sensitivity of ASD severity measures, especially in the older studies. We cast a wide net, including instruments the authors of each study utilized as a measure of autistic severity, unless the measure was solely assessing global functioning or disability. Consequently, some of the measures included are confounded by behavioral or intellectual impairment. Severity measures and validation efforts continue to improve (
59); it remains to be seen whether currently available metrics can detect the modest short-term changes that are likely to be necessary for evaluating biomarkers or therapeutic response.
Our most in-depth analysis focused on 12 biomarkers assayed across multiple papers (
Table 1), allowing us to assess replication in order to distinguish true biomarkers. An alternative strategy would be to rank all biomarkers assayed by effect size or p value to find the most promising. However, we found the analytic methods and summary statistics reported to be too heterogeneous and incomplete for this approach; it is possible that true biomarkers are included among those studied but do not currently stand out from the crowd (see Tables S4 and S5 in the
online supplement).
Finally, we note several items missing from the PRISMA checklist, specifically that the review and protocol were not registered in advance, that we included 32 articles identified from citations but not the initial search, and that formal analyses of risk of bias, robustness, and heterogeneity were not performed, given the heterogeneity of the studies reviewed.
Future Studies
The ASD biomarker field is reminiscent of the era of candidate gene discovery, in which technological and biobank limitations necessitated a focus on small numbers of loci in small cohorts, which in turn led to a replication crisis (
19,
20). Many of the lessons learned from candidate gene approaches are transferable to biomarkers, including the need for larger sample sizes, appropriate correction for multiple comparisons (
Figure 2), and replication cohorts. Correcting statistical significance thresholds to reflect all biomarkers assayed in a study is a bare minimum. A higher statistical threshold is required to overcome publication bias. Ideally, this threshold would be based on the total number of effective tests across all biomarkers, estimated from the degree of interdependence between biomarkers (
60). Until such estimates can be made, family-wise error correction (e.g., Bonferroni) for all biomarkers tested to date, currently about 1,000, is a simple and conservative approach. Alternatively, widespread sharing of individual-level data and key metrics (e.g., tissue, collection conditions, assay, demographic characteristics, deep phenotyping) would allow false discovery rates to be estimated in multi-biomarker mega-analyses. The widespread data sharing required would be simplified by the adoption of community-wide standards, for example, using standardized ontologies and machine-readable file formats to share biomarker results and key metrics. Once true biomarkers are identified for specific subgroups of individuals with ASD, specific developmental stages, or specific symptoms, their usefulness for monitoring change due to specific interventions could be tested in clinical trials. Only when the evidence for the biomarker is comparable to the confidence in existing behavioral measures of ASD severity will biomarkers become a useful outcome measure for interventional studies.
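The contrast between family-wise correction and false discovery rate control can be illustrated with a small sketch. It implements the Bonferroni threshold and the Benjamini-Hochberg step-up procedure, which is one common way to control the false discovery rate and is named here as an assumption rather than as the method the field would necessarily adopt. The p values are invented purely for illustration and do not come from any study in this review.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p value is under rank * q / m
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical p values for eight biomarkers (illustrative only)
p_values = [0.0001, 0.008, 0.015, 0.02, 0.2, 0.5, 0.7, 0.9]

threshold = bonferroni_threshold(0.05, len(p_values))
bonferroni_hits = [i for i, p in enumerate(p_values) if p <= threshold]
fdr_hits = benjamini_hochberg(p_values, q=0.05)

print("Bonferroni:", bonferroni_hits)
print("Benjamini-Hochberg:", fdr_hits)
print("Threshold for 940 biomarkers:", bonferroni_threshold(0.05, 940))
```

In this toy example the family-wise threshold retains fewer results than the false discovery rate procedure, which is the usual trade-off: Bonferroni is simpler and more conservative, while false discovery rate methods require the full set of test results to be available.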
Based on our review, we provide a list of recommendations to help identify and distinguish true ASD biomarkers (
Box 1). Many of these recommendations are already being applied in some recent neuroimaging studies and through the CDER BQP process initiated by the Autism Biomarkers Consortium for Clinical Trials. Advances in genetics, such as Mendelian randomization (
61) and CRISPR-derived model systems, and in technology, such as proteomics and metabolomics, have the potential to greatly accelerate the hunt for biomarkers. Although ASD biomarkers remain elusive, there is immense potential if community-wide efforts can be paired with rigorous scientific methodology.