Discussion
Here, we present an integrated analysis of ADHD-associated CNVs identified in the 11 studies published to date that have detected rare CNVs in ADHD case subjects. The limited power of the individual studies has been a crucial bottleneck in the definition of high-priority ADHD candidate genes for further studies. From the 1,532 CNVs described in the 11 studies, using strict criteria, we extracted 2,241 mRNA-coding genes; this number is likely to contain many false positives because of the individually low occurrence of the rare CNVs (
19), but the number is too large to allow each CNV to be studied in (animal) models for validation and mechanistic insights. In this study, we aimed to prioritize genes linked to ADHD among the 2,241 on the basis of the robustness of findings across different bioinformatics approaches, used in an integrative manner and including both human and animal model–derived data. For this, we selected only those genes that were recurrently affected by CNVs in the case subjects and focused on the minimal overlapping region of different CNVs in a region. Furthermore, we excluded all genes that were affected by a CNV in healthy control subjects of other studies, as well as those that were only partially duplicated (lacking a full coding sequence and promoter). These stringent criteria substantially reduced the number of candidate genes from the CNVs, to one-fifth of the original number. We showed that the selected 432 high-priority genes were significantly more coexpressed in the developing brain in comparison to random gene groups; this is evidence that our selection enriches for biologically coherent genes that are expressed at the same time in the same tissue, a prerequisite for them to be involved in the same biological processes and cause similar phenotypes when disturbed.
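The filtering steps described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the record type `GeneHit`, its field names, the function `prioritize`, the recurrence threshold of two case CNVs, and the placeholder gene names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneHit:
    """Hypothetical per-gene summary of CNV evidence (illustrative only)."""
    gene: str
    case_cnv_count: int            # case CNVs whose minimal overlapping region covers the gene
    in_control_cnv: bool           # gene affected by a CNV in control subjects of other studies
    partial_duplication_only: bool # only partially duplicated (no full coding sequence and promoter)


def prioritize(hits, min_recurrence=2):
    """Keep genes that are recurrent in cases, absent from controls,
    and not merely partially duplicated. The threshold is an assumption."""
    return sorted(
        h.gene
        for h in hits
        if h.case_cnv_count >= min_recurrence
        and not h.in_control_cnv
        and not h.partial_duplication_only
    )


# Placeholder data, not taken from the study.
example = [
    GeneHit("GENE_A", 3, False, False),  # passes all filters
    GeneHit("GENE_B", 1, False, False),  # not recurrent
    GeneHit("GENE_C", 4, True, False),   # also seen in controls
    GeneHit("GENE_D", 2, False, True),   # only partially duplicated
]

high_priority = prioritize(example)
print(high_priority)
```

Each filter is applied independently here, so the order of the checks does not affect which genes are retained.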
The contribution of rare genetic variation, including rare CNVs, to ADHD pathophysiology has received relatively limited attention thus far. Given the high prevalence of the disorder, the focus of genetic studies has been largely on common genetic variants, following the “common disease–common variant” hypothesis (
37,
38). However, in most common diseases, rare variants have also been shown to play a role, which has prompted researchers to start studying them for ADHD as well. Most studies published thus far have been limited both in sample size and in their focus on only one form of rare variant (CNVs), so that integration, as performed in the present study, is required to extract meaning. Recent work in larger samples and/or using alternative methods for rare variant detection (e.g., exome sequencing) confirmed a role for such rare variation in ADHD. Most notably, the large study in the iPSYCH sample (
39) confirmed a significantly higher burden of rare protein-truncating variants in evolutionarily constrained genes in ADHD case subjects than in control subjects. Also, a recent CNV study in over 400 patients identified rare CNVs in 9.4% of individuals with ADHD (
40), and two studies of parent-offspring trios pinpointed several de novo CNVs occurring in offspring with ADHD (
41,
42). Whether these rare variants are the sole (genetic) cause of ADHD in the individual carrying them remains to be clarified. So far, evidence supporting high penetrance has been limited (
43,
44). Irrespective of this, the fact that rare variants can often be more easily linked to gene dysfunction than common genetic variants makes them an invaluable source of information toward understanding the biology underlying ADHD.
Studies in model organisms can provide a wealth of phenotypic information, because of the high level of functional conservation across species. For ADHD, several model organisms have been shown to provide valid phenotypes. These include monkey, rat, mouse, zebrafish, and fruit fly, where relevant phenotypes can be observed on genetic manipulation or drug administration (
6). To identify biological processes underlying the selection of genes, we therefore took a novel approach that has not yet been applied in the field of neuropsychiatric disorders. We mined the Monarch Initiative database, which integrates genotype-phenotype relations across species. Using this approach, we found that 18 of the 432 high-priority genes, when manipulated in animal models, cause phenotypes that are face valid to the ADHD core phenotypes of attention deficit, hyperactivity, and impulsivity. The beauty of animal models is that paradigms can be tested robustly and under controlled conditions, with large numbers of genetically identical animals and under environmentally identical conditions. This is impossible to achieve in humans. Although animal models are not a copy of humans, there is ample scientific evidence that fundamental behaviors such as learning, memory, and circadian and activity/sleep rhythms rely on highly conserved molecular mechanisms and players (
45,
46). Importantly, ADHD has consistently been shown to represent the extreme of traits of activity and attention in the general population; this has been established on the phenotypic, genetic, and neuroimaging levels (
7,
47). This fact makes ADHD one of the psychiatric disorders that can be modeled particularly well in animal models. Our approach integrating across model systems is therefore a particular strength of this study, with the consistency of our findings across the different approaches illustrating its value.
It is highly likely that the proteins we identified using the Monarch Initiative database form functional biological connections with proteins for which detailed functional characterization, in particular information on ADHD-related phenotypes, is still lacking. We therefore retrieved direct interactors of the 18 proteins and identified an interconnected network of 66 proteins linked directly or indirectly to disease-relevant phenotypes (
Figure 2). We also tested whether our high-priority gene list itself formed protein-interaction modules with significantly interconnected proteins, which could provide us with information on biological processes. Indeed, we identified four modules comprising significantly connected proteins from our selection. Of these, the major modules, module 1 and module 4, connected to and were supported by ADHD-related phenotypes across species, showing the added value of cross-species analysis (
Figure 2).
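The network expansion described above, retrieving direct interactors of a set of seed proteins, can be sketched as a one-step neighbor lookup on an undirected edge list. This is a minimal sketch under stated assumptions: the function name `direct_interactors`, the edge-list representation, and the placeholder protein names are hypothetical and do not reproduce the interaction resource or tooling used in the study.

```python
def direct_interactors(edges, seeds):
    """Return proteins that share an edge with any seed protein.

    `edges` is an iterable of (protein_a, protein_b) pairs treated as
    undirected; seed proteins themselves are excluded from the result.
    """
    seeds = set(seeds)
    neighbors = set()
    for a, b in edges:
        if a in seeds and b not in seeds:
            neighbors.add(b)
        if b in seeds and a not in seeds:
            neighbors.add(a)
    return neighbors


# Placeholder interactions, not taken from the study.
toy_edges = [
    ("SEED_1", "NEIGHBOR_1"),
    ("NEIGHBOR_2", "SEED_1"),
    ("NEIGHBOR_1", "DISTANT_1"),  # two steps from the seed, so not returned
    ("SEED_1", "SEED_2"),         # seed-seed edge adds no new protein
]

found = direct_interactors(toy_edges, {"SEED_1", "SEED_2"})
print(sorted(found))
```

Restricting the lookup to a single step keeps the resulting network interpretable, since every added protein is at most one interaction away from a protein with a phenotype annotation.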
Module 1 contained two significantly linked proteins: WW-domain containing oxidoreductase (WWOX) and DNA-directed RNA polymerase I subunit RPA1 (POLR1A). WWOX is involved in autosomal recessive cerebellar ataxia-epilepsy-intellectual disability syndrome (
48), but no direct connections with ADHD are known. However, WWOX interacts with protein phosphatase 1F (PPM1F), a Ser/Thr protein phosphatase that modulates RhoA and Ca2+/calmodulin–dependent protein kinase II pathways (
49,
50). Other members of this phosphatase protein family are involved in mediating dopaminergic signaling via G-protein-coupled receptors (
51). In addition, misexpression of PPM1F in the substantia nigra in patients with Parkinson’s disease may implicate PPM1F more directly in dopaminergic biology (
52). Dopamine signaling pathways have repeatedly been found to be altered in ADHD patients, and they form the basis for the most widely used pharmacological treatment approach (
53,
54). As we can directly connect PPM1F to the cross-species term hyperactivity and indirectly link
WWOX,
POLR1A,
POLR2B, and
POLR3C to ADHD core phenotypes, we can extend the network with genes potentially regulating dopaminergic signaling (
Figure 2). The RNA polymerase subunits (POLR1A, POLR2B, and POLR3C) are involved in the regulation and fine-tuning of transcription.
Module 2 clusters proteins that function in cell-cell junctions and communication and are required for blood-brain barrier formation. This module contains one significantly connected protein: catenin alpha-3 (CTNNA3). This adherens junction protein, also known to be associated with autism spectrum disorder (ASD), likely modulates cerebral and ependymal regions through GABA-A receptor activation (
55). Tight junction protein ZO-1 (TJP1) forms the connection to the other three proteins in this module. This gene is affected by CNVs in 28 ADHD case subjects, being the most frequently occurring gene affected by copy number alterations in our survey of the ADHD CNV studies (see Table S1 in the
online supplement). TJP1, together with claudin-5 (CLDN5, affected in 12 ADHD case subjects) represents an important constituent of the blood-brain barrier (
56,
57). Protein kinase C eta type (PRKCH) regulates TJP1 (
58). Genes regulating neuronal cell adhesion are also significantly associated with ASD, schizophrenia, and bipolar disorder, raising the hypothesis that this mechanism plays a role across different neuropsychiatric disorders (
59,
60). Given the high number of ADHD case subjects with a CNV in this module, we postulate that cell-cell junctions play an important role in ADHD.
Module 3 contains two proteins that directly interact with each other: the significantly interconnected density-regulated protein (DENR) and the eukaryotic elongation factor 2 kinase (EEF2K), both involved in regulation and initiation of translation (
61). Regulation and initiation of translation have been linked to neuropsychiatric disorders, including ADHD, through the regulation of brain-derived neurotrophic factor (
62–
64). Based on the repeated association and the described functional work, we suggest translational regulation as one of the mechanisms that modulate the risk for ADHD.
Module 4 contains four significantly connected proteins: BIRC6, RAB15, SEPT5, and PHKB. Baculoviral IAP repeat-containing protein 6 (BIRC6 or Bruce) is an inhibitor of apoptosis involved in prostate cancer progression, but it also acts in neuronal protection against apoptosis (
65,
66). Ras-related protein Rab-15 (RAB15) is a direct connector of BIRC6, and it plays a role in regulating synaptic vesicle membrane flow in nerve terminals (
67,
68). Septin-5 (SEPT5) is involved in the binding of SNARE complexes, inhibiting synaptic vesicle exocytosis (
69). Recent studies have shown that manipulation of this gene in mice leads to altered social interaction and altered affective behaviors (
70,
71). Phosphorylase b kinase regulatory subunit beta (PHKB) is involved in glycogen metabolism and has been linked to neuronal plasticity (
72). Other proteins in the hub link to the cross-species phenotype term hyperactivity and attention deficit hyperactivity disorder: MAPK1 is directly connected, and SEPT5, PARK2, MYC, and TUBA3C are indirectly connected. They are thus prime candidates for further evaluation in functional assays.
While this study started from rare CNVs, we also found corroborating evidence for several of the genes implicated in ADHD in studies of common genetic variants. Among the 26 CNV-affected genes most consistently observed across the different bioinformatics approaches applied in this study, we also found
RBFOX1 and
POLR3C to be associated with the disorder in the largest SNP-based GWAS meta-analysis to date. In a recent study by Lee et al. (
73),
RBFOX1 was discovered to be the second most pleiotropic locus of the genome-wide meta-analysis among eight psychiatric disorders.
RBFOX1 encodes a splice regulator regulating several genes involved in neuronal development and mainly expressed in the brain (
73–
76). Animal models have shown that
RBFOX1 is involved in mouse corticogenesis and aggressive behaviors (
73,
75,
77,
78).
POLR3C is included in a well-known small CNV located at 1q21.1, which contributes to a broad spectrum of phenotypes in addition to ADHD, including morphological features and ASD (
79,
80). The 26 genes listed in
Table 1 may be viewed as having, individually, the highest credibility as ADHD candidate genes. We therefore recommend that these genes be prioritized in future studies searching for rare (single-nucleotide) variants in ADHD and for functional characterization of gene-disease pathways.
Table 1 also shows that among these 26 genes, common biological themes are present, such as transcription (already highlighted by the module analysis), mitochondria biology, mRNA metabolism, and cytoskeleton.
It is likely that the gene modules we identified for ADHD are also relevant for other neurodevelopmental disorders. Several of the individual studies that formed the basis for our work already reported overlap of identified CNVs and copy-variant genes with those found associated with ASD, intellectual disability, and schizophrenia (e.g.,
8,
18). Our analysis of the 26 top genes in cross-disorder GWAS data from eight different psychiatric disorders also suggested pleiotropy. Whether the gene-gene interactions in the specific modules may be more relevant for ADHD than for the other disorders will need to be clarified in future work.
Analysis of the expression patterns of the 26 most credible genes revealed a varied expression, with several genes being restricted to the brain and others showing a broad expression pattern across different tissue types. The developmental analysis showed the most overlap between genes during prenatal development. We were surprised to see that
TUBA3C and
DMRTB1 appeared to show expression restricted to the testis in the GTEx data and also did not show brain expression in the BrainSpan data. However, this could be a technical/analytical artifact, as the
TUBA3C gene is known to be mutated in Kabuki syndrome, a genetic syndrome including intellectual disability, and
DMRTB1 was earlier identified as an interactor of, for example,
RBFOX1 in a screen for protein-protein interactions relevant in inherited neurodegenerative disorders (
81). We did not observe restriction of gene expression to specific brain regions. This is consistent with results from recent brain imaging studies, where structural alterations in patients with ADHD were found to be a rather global phenomenon affecting multiple brain measures, with the highest effect sizes found for global measures, such as intracranial volume (a proxy for total brain volume) (
47,
82).
Our study should be viewed in the light of some strengths and limitations. We show that filtering based on recurrent CNVs restricted to ADHD case subjects, in conjunction with complementary bioinformatics methods, bears great potential for prioritizing ADHD candidate genes. For the first time, cross-species phenotypes were used to identify candidate genes linked to ADHD core phenotypes, pointing to high-priority candidate genes. Several of the identified genes form significantly connected protein networks characterized by shared functions. In the cross-species analyses, we found that 56% of the genes (684/1,213; see Table S3 in the
online supplement) were annotated with the object label “hyperactivity”; this overrepresentation may result from the fact that activity is more easily assessed in animal models than are cognitive phenotypes, such as attention and impulsivity. The selection of genes analyzed in this study is extensive but by no means exhaustive. First, it is important to note that by only analyzing recurrent CNVs and focusing on the minimal regions of overlap among CNVs in the same region, genes with relevance for ADHD may be overlooked; on the other hand, the stringent evidence-based filtering holds high potential to uncover the most ADHD-relevant biological pathways. In addition, neither the surveyed CNV studies nor our study consider effects of CNVs on the surrounding genetic landscape. A duplicated CNV translocation, for example, can have an impact on the expression of genes in the chromosomal region at the site of insertion, which alone or together with the duplicated genes can contribute to ADHD phenotypes (
83). An additional limitation was the fact that the criteria used for selection of CNVs were not identical across the 11 source studies. Furthermore, the primary studies included here evaluated only autosomal CNVs; given the known sex differences in the prevalence of ADHD, it will be interesting to extend such work to the sex chromosomal CNVs. Lastly, we used a concatenation approach across brain regions and developmental time points based on the BrainSpan data set in our bioinformatic analysis, in keeping with the broad involvement of cortical and subcortical brain regions in ADHD (
47,
82). Because not all brain regions and developmental stages are well represented in the BrainSpan data, we may have missed specificity in the coexpression modules. Although we opted for maximal power through concatenation, ideally, one would also want to consider spatial and temporal gene expression patterns independently. With the more comprehensive data sets now becoming available, opportunities for such analyses will improve for future studies (
84,
85).