The hypothesis that multiple genetic variants contribute to psychiatric disorders was first proposed for schizophrenia in the 1960s, based on epidemiological data (
1). The concept of a polygenic contribution of genetic variants, leading to a continuous liability to disease, arose long before molecular genetic studies allowed us to identify specific genetic markers associated with mental disorders. Genome-wide association studies (GWASs), which test for genetic differences between case and control subjects for a psychiatric disorder, have now identified thousands of genetic variants associated with psychiatric disorders (
2). These variants can be combined into polygenic scores (PGSs), which provide an individual-level single measure of genetic loading. Psychiatry led the way in developing PGSs, with the first application in 2009 by the International Schizophrenia Consortium (
3). At the time, some geneticists were skeptical: the polygenic contribution was inconsistent with the predominant Mendelian model for disease susceptibility, which postulated a small number of genetic variants, each having a large effect on disease risk (
4). However, evidence from early GWASs supported polygenic architecture for most psychiatric disorders, which became the dominant paradigm (
5).
The value of PGSs for capturing genetic liability to disease has proved unequivocal (
6,
7), and PGSs are used across medical disciplines. Research studies have established that for most disorders, PGSs provide a statistically significant but modest level of prediction for a single disorder and can establish common genetic underpinnings of two or more disorders. Clinical applications for PGSs are now being explored (
8). For example, in a landmark paper, Khera et al. (
9) argued that PGSs for common disorders, such as coronary artery disease, type 2 diabetes, inflammatory bowel disease, and breast cancer, can identify individuals with risk equivalent to that conferred by monogenic pathogenic variants. The use of polygenic risk prediction in clinical care was proposed, and studies to explore the use of PGSs in heart disease risk prediction and in cancer screening are under way (
10–
12).
What Is a Polygenic Score?
Through GWASs, we have begun to identify specific genetic variants that contribute to the genetic loading for diseases or traits. Large sample sizes of participants with psychiatric disorders and control subjects are needed to identify associated genetic variants, because of the low effect sizes conferred by each variant and the stringent multiple-testing correction required to assess millions of variants across the genome. These genetic variants can be combined into a polygenic score, calculated by summing the risk alleles that an individual carries, weighted by their effect size in the discovery GWAS. PGSs are continuous measures of liability to disease with a normal distribution: most individuals’ scores will be close to average, with some individuals having a score in the tails of the distribution, giving them an elevated or reduced risk of developing the disease.
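The weighted sum described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the variant identifiers, risk alleles, effect sizes, and genotypes are all hypothetical, and real PGS methods additionally handle linkage disequilibrium, variant selection, and effect-size shrinkage.

```python
from statistics import mean, pstdev

# Hypothetical discovery-GWAS weights: variant ID -> (risk allele, log odds ratio).
GWAS_WEIGHTS = {
    "rs0001": ("A", 0.05),
    "rs0002": ("G", 0.03),
    "rs0003": ("T", 0.08),
}

def polygenic_score(genotypes, weights=GWAS_WEIGHTS):
    """Sum risk-allele counts (0, 1, or 2 per variant) weighted by effect size."""
    score = 0.0
    for variant, (risk_allele, beta) in weights.items():
        # Simplification: a missing genotype contributes zero risk alleles.
        alleles = genotypes.get(variant, "")
        score += beta * alleles.count(risk_allele)
    return score

def standardize(scores):
    """Express scores as standard deviations from the sample mean."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

# Hypothetical genotypes, written as the two alleles carried at each variant.
cohort = {
    "person_1": {"rs0001": "AA", "rs0002": "GC", "rs0003": "CC"},
    "person_2": {"rs0001": "AC", "rs0002": "CC", "rs0003": "TT"},
    "person_3": {"rs0001": "CC", "rs0002": "CC", "rs0003": "CT"},
}

scores = {name: polygenic_score(g) for name, g in cohort.items()}
z_scores = standardize(list(scores.values()))
```

Standardizing the raw sums against a reference sample is what allows an individual's score to be placed within the distribution, for example in the top decile.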
PGSs are often used to test the hypothesis of association with a disorder and are more strongly associated with case-control status than any single genetic variant. Because only a single test is performed, sample sizes of hundreds of study participants are often sufficient to detect associations—a reduction by orders of magnitude from GWAS sample sizes. PGSs therefore combine our dispersed knowledge of multiple genetic contributors to disorders into a genetic loading specific to each individual. The key question is, What is the value of PGSs, given our incomplete knowledge of genetic predisposition to psychiatric disorders and the additional risk conferred by environmental factors?
As a motivating example, we take the PGSs generated in the largest GWAS for schizophrenia from the Psychiatric Genomics Consortium (PGC), a meta-analysis of 76,755 individuals with schizophrenia and 243,649 control subjects without schizophrenia, with 25% of participants having non-European ancestry (
14). This study identified 287 independent genetic variants associated with schizophrenia. Polygenic scores constructed from these results (available at
https://pgc.unc.edu) explained on average 7.3% of the variance in liability (i.e., case-control status). Across ancestries, those with a polygenic score in the top 1% of the distribution were at a sixfold increased risk of schizophrenia compared with all others, and those in the top PGS decile (i.e., the 10% of the population with highest polygenic loading) had a 16-fold increased risk compared with those in the lowest decile (
Figures 1A and
1B). These summary measures are useful in research studies, but for any communication of risk to an individual, or in a clinical setting, these must be converted to an absolute risk. The risk of schizophrenia is approximately 4%–6% for an individual with a PGS at the top 1% threshold and 2% for an individual with a PGS at the top 10% threshold (
15). These values are low, particularly in comparison with a positive family history of schizophrenia, which confers a seven- to eightfold increase in risk for first-degree relatives of patients with schizophrenia (
16,
17). However, only a small percentage of the general population have a first-degree relative with schizophrenia, and thus family history is noninformative for most people. The modest absolute risks for schizophrenia reflect the low lifetime prevalence of schizophrenia (∼1%) and missing risk factors (e.g., undetected common variants, rare variants, sociodemographic and environmental risk factors), which underpin the authors’ conclusion that “the liability explained is insufficient for predicting diagnosis in the general population” (
14).
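The conversion from variance explained to absolute risk can be illustrated with the liability threshold model. The sketch below assumes a standard normal liability, a lifetime prevalence of 1%, and a PGS that explains 7.3% of the variance in liability; the function name and the choice of model are illustrative assumptions and do not reproduce the calculation used in the cited studies.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability is assumed standard normal

def absolute_risk(pgs_percentile, prevalence, r2_liability):
    """Risk of disorder for a person whose PGS sits at a given percentile.

    Assumes liability = PGS + residual, with Var(PGS) = r2_liability and
    disorder occurring when liability exceeds a threshold set by prevalence.
    """
    threshold = STD_NORMAL.inv_cdf(1 - prevalence)
    # PGS value on the liability scale at the requested percentile.
    pgs_value = STD_NORMAL.inv_cdf(pgs_percentile) * r2_liability ** 0.5
    residual_sd = (1 - r2_liability) ** 0.5
    # Probability that the residual pushes liability over the threshold.
    return 1 - STD_NORMAL.cdf((threshold - pgs_value) / residual_sd)

risk_top_1 = absolute_risk(0.99, prevalence=0.01, r2_liability=0.073)
risk_top_10 = absolute_risk(0.90, prevalence=0.01, r2_liability=0.073)
print(f"top 1% threshold: {risk_top_1:.3f}, top 10% threshold: {risk_top_10:.3f}")
```

Under these assumptions the model gives risks of roughly 4% and 2% at the two thresholds, in line with the figures quoted above, and it makes explicit how strongly the absolute risk depends on the assumed prevalence.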
What About Ancestry?
Over 86% of GWASs have been performed in participants of European ancestry, with less than 5% of studies including participants from multiple ancestries (
24,
25). Such discrepancies are a particular problem in genetic studies because populations differ in the frequency of genetic variants and in the patterns of correlation between variants within short genetic regions (linkage disequilibrium). These global differences imply that PGSs constructed from one population may have lower predictive accuracy in another population. For example, in schizophrenia, polygenic prediction is much higher in European cohorts than in African American cohorts (liability R² of 0.073 compared with 0.017) (
14,
26). Poor transferability of PGSs between populations will hamper the translation from research studies to clinical implementation, where it is important that genetic prediction is relevant to all, globally and locally. Two complementary strategies are being followed to ensure equity of genetic studies. First, increased investment in genetic studies worldwide and the inclusion of all members of local communities will expand our knowledge of genetic variants associated with psychiatric disorders in any population (
27). Second, novel computational approaches will increase the power of PGS methods, by optimally combining GWASs across populations, to build PGSs that are transferable across ancestries (
28). These approaches are under development, with best practices still evolving. For example, in polygenic prediction of lipid traits, Kamiza et al. (
29) showed that PGSs built from African Americans increased prediction in sub-Saharan Africans compared to using a European GWAS, but not consistently across African populations, potentially because of differences in genetic or environmental risk factors. These findings reinforce the need to focus research on populations that are underrepresented in research studies and underserved in health care (
30).
Using PGSs to Increase Understanding of Nosology
One important application of PGSs is to give insights on clinical characteristics and heterogeneity within a psychiatric disorder. A powerful design uses large case-control GWASs to calculate PGSs that are then used for downstream analyses in smaller, deeply phenotyped clinical or population studies. For example, the Australian Genetics of Depression Study used major depressive disorder PGS (built from the large PGC case-control GWAS) to test for associations with self-report data on depression (
31). The researchers showed that higher depression PGS was associated with earlier age at depression onset, more episodes of depression, and comorbid anxiety disorder, but not with atypical depression (
32).
PGSs can also characterize disorder subtypes and have refined the immunometabolic hypothesis of depression. For example, in PGC studies, major depressive disorder case subjects with increased weight and appetite had higher PGSs for BMI, CRP, and leptin, which was not observed in case subjects with symptoms of decreased appetite and weight (
33). Similarly, UK Biobank participants with major depressive disorder who reported both insomnia and increased weight had higher BMI and CRP PGSs than those with major depression who did not have these symptoms (
34).
Population-based studies can identify traits associated with clinical disorder PGSs. For example, major depressive disorder PGS predicted childhood psychopathology in seven longitudinal cohorts (
35). PGSs for major depressive disorder, neuroticism, insomnia, and educational attainment predicted both general childhood psychopathology and specific measures of attention deficit hyperactivity disorder (ADHD) symptoms, internalizing problems, and social problems. These findings indicate that PGSs are relevant across the lifespan, from childhood symptoms to clinical disorders in adults. This is important if genetics is to play a role in prevention or prediction given the high proportion of mental disorders that develop during childhood and adolescence.
In interpreting these results, we should be mindful that there is no gold standard for PGSs. All scores capture the genetic differences between case and control subjects in the GWASs, and the individuals with psychiatric disorders ascertained for GWASs may not be fully representative of all cases of the disorder. For example, if ascertained GWAS cases are more severe than cases in the population, then the summary statistics will overrepresent the genetics of severe disorder relative to the genetics of population-ascertained cases. Unbiased case ascertainment is difficult to achieve, but national registries in the Nordic countries provide complete coverage, and results from these studies are reassuring about generalizability. For example, the iPSYCH study ascertained all Danish residents born between 1981 and 2002 who were diagnosed with depression in a psychiatric hospital before 2012. Analysis of PGSs confirmed that the depression risk in this representative cohort was similar to that attained in the more heterogeneous studies in the PGC GWAS (
31,
36).
Clinical Studies
The results of the schizophrenia GWAS presented above show that the current predictive ability of the PGS is insufficient to be used to predict risk of schizophrenia in the general population (
14,
37). Although PGSs show significant differences in mean values between case and control subjects, the PGS distributions for cases and controls have substantial overlap (
Figure 1C), and accurate individual-level prediction in the general population may not be attainable. Clinical applications of PGSs will likely focus mostly on subgroups with broader psychopathology—for example, disorder prediction in those who have prodromal symptoms that do not meet full diagnostic criteria or have a positive family history of mental disorders. For those at the early stages of illness, using PGSs to inform diagnosis, prognosis, and treatment response might be possible (
38).
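The degree of overlap between case and control PGS distributions can be summarized by the area under the receiver operating characteristic curve (AUC), the probability that a randomly chosen case has a higher PGS than a randomly chosen control. The sketch below uses a normal approximation under the liability threshold model; the prevalence of 1% and liability-scale variance explained of 7.3% are taken from the schizophrenia example, while the function name and the approximation itself are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approximate_auc(prevalence, r2_liability):
    """Approximate AUC of a PGS from the liability variance it explains."""
    threshold = STD_NORMAL.inv_cdf(1 - prevalence)
    density = STD_NORMAL.pdf(threshold)
    # Mean liability of cases (above threshold) and controls (below threshold).
    mean_liab_cases = density / prevalence
    mean_liab_controls = -density / (1 - prevalence)
    # Mean PGS in each group: regression of PGS on liability has slope r2.
    mean_pgs_cases = r2_liability * mean_liab_cases
    mean_pgs_controls = r2_liability * mean_liab_controls
    # PGS variance in each group, reduced slightly by selection on liability.
    var_pgs_cases = r2_liability * (
        1 - r2_liability * mean_liab_cases * (mean_liab_cases - threshold)
    )
    var_pgs_controls = r2_liability * (
        1 - r2_liability * mean_liab_controls * (mean_liab_controls - threshold)
    )
    separation = (mean_pgs_cases - mean_pgs_controls) / (
        var_pgs_cases + var_pgs_controls
    ) ** 0.5
    return STD_NORMAL.cdf(separation)

auc = approximate_auc(prevalence=0.01, r2_liability=0.073)
print(f"approximate AUC: {auc:.2f}")
```

Under these assumptions the AUC is roughly 0.7, well short of the values usually expected of a diagnostic test; an AUC of 0.5 would correspond to completely overlapping distributions.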
An important clinical question for the treating psychiatrist when someone develops a first depressive episode is whether this episode occurs in the context of unipolar depression or represents the onset of bipolar disorder. Currently, in the absence of hypomanic symptoms, the only relevant information for prediction is family history of bipolar disorder or psychosis. In the Danish iPSYCH study, the PGS for bipolar disorder improved prediction of progression from a provisional diagnosis of major depressive disorder to bipolar disorder, above that provided by family history (
39). Similarly, for patients with a first episode of psychosis, will the episode progress to schizophrenia, or to milder forms of psychotic experiences, or will it resolve? Schizophrenia PGSs can help separate patients who develop schizophrenia from those with other forms of psychosis (
40), and combining PGSs for schizophrenia and depression improves the distinction of schizophrenia from affective psychoses (
41).
Family history is often used as a proxy for genetic risk and is incorporated into the clinical history at no cost for collection. Although a powerful predictor for persons with a positive family history, it provides little information to patients without a close relative with a mental illness. If a reliable extended family history is available, as with national registers, it is possible to develop family genetic risk scores. These risk scores use information from a full family tree, not only first-degree relatives, and have good discriminative ability to separate major affective and psychotic disorders (
42). To justify the use of PGSs when family history is available, it is important to examine the performance of models incorporating both predictors. Using the Danish population registers, Agerbo et al. (
43) showed that schizophrenia PGS, parental socioeconomic status, and family history jointly predicted risk for schizophrenia. Each predictor explained a similar proportion of the variance (3.0%–3.5%), and family history was only partially mediated by PGS, indicating that these predictors contribute independent information. Similarly, depression PGS in conjunction with other parental psychosocial factors improves risk prediction for early-onset depression (
44).
In many clinical services, young persons with neurodevelopmental disorders are routinely screened with chromosomal microarrays for copy number variants (CNVs), a type of structural variation where a stretch of DNA is duplicated or deleted (
45). Persons with neurodevelopmental disorders frequently have psychiatric comorbidities, and the CNVs can also directly confer increased risk of psychiatric disorders. However, only a proportion of CNV carriers develop psychiatric disorders (
46), and it is therefore important to establish the joint contribution of CNVs and PGSs to risk. In schizophrenia, total CNV burden and PGS contribute additively, suggesting that total genetic risk can be estimated as the sum of these two risk scores (
47). However, interactive effects were found between PGSs and specific CNVs, which need to be explored further before the two genetic risks are combined in prediction models. Other risk factors also play a role, and a study of the Philadelphia Neurodevelopmental Cohort found that combining CNV risk scores, PGSs, and measures of environmental stress improved the prediction of cognition and psychopathology (
48). One of the near-term targets for clinical use of PGSs may be in individuals with 22q11.2 deletion syndrome, who have a 20-fold increase in risk for schizophrenia. High schizophrenia PGSs further increase the risk of schizophrenia in 22q11.2 deletion carriers (
49) and are associated with cognitive decline, psychopathology, and hippocampal volume reduction (
50).
Predicting treatment response and adverse effects is central to the development and delivery of precision medicine. Pharmacogenetic studies of genes in pharmacodynamic and pharmacokinetic biological pathways identify modest associations for specific drugs or drug classes (
51). These account for only a small proportion of adverse events or response, and studies exploring polygenic prediction are beginning to appear. Using primary care records to define treatment-resistant depression based on the number of switches between antidepressant drugs, we found that PGS for ADHD, but not PGS for major depressive disorder, was predictive of treatment-resistant depression (
52). In clinical samples with prospectively assessed response to antidepressants, we observed that genetic risk for depression is weakly associated with remission rates, which are better predicted by high PGS for educational attainment and low PGS for schizophrenia (
53). Two recent studies of individuals with major depressive episode undergoing ECT found that those with high PGSs for schizophrenia (
54) or bipolar disorder (
55) had improved response and remission rates. In contrast, PGS for depression was associated with either lower improvement following ECT (
55) or no change (
54). For the prediction of treatment-resistant schizophrenia, a recent study (
56) showed that PGSs calculated from GWASs with schizophrenia case subjects treated with clozapine are more predictive than scores from GWASs with non-clozapine-treated schizophrenia case subjects. These studies highlight that PGSs of disorder susceptibility are not the optimal predictors of treatment response. Larger sample sizes will be required to build therapeutic-specific PGSs for treatment response that can be used for clinical decision making.
The ultimate test for the clinical validity of genetic tests is the improvement in applied risk prediction models, which are commonly used in physical disorders. In cardiovascular disease, adding PGSs to conventional risk factors could help prevent 7% of new cardiovascular disease events in 10 years (
57), while other researchers have suggested that further investigation is required before clinical implementation of genetic information (
58). In breast cancer, PGS is already incorporated in risk models for pathogenic variants in
BRCA1 and
BRCA2 (
59). In psychiatry, in contrast, prediction models based on multivariable risk factors are rarely used. One exception is models to identify those at ultra-high risk for psychosis based on clinical criteria, including attenuated or brief intermittent psychotic symptoms and elevated genetic vulnerability (
60), which have steered the development of clinical services to prevent psychosis (
61). Results from studies integrating PGSs with these clinical models have been mixed. One study showed that psychosis risk prediction models based on clinical and historical data in targeted populations at high risk improved with the addition of schizophrenia PGS (
62). Conversely, a study assessing the prediction of poor outcomes in psychosis (aggressive behavior, psychiatric admissions, course of disorder) from clinical information captured in a standard psychiatric interview found no added value by including schizophrenia PGS (
63). One interpretation is that genetic risk, as measured through PGS, might be more relevant to the development of schizophrenia than to the clinical course of disorder. However, in a longitudinal study measuring illness severity over 20 years following first hospitalization (
64), schizophrenia PGS predicted consistent differences in the course of negative symptoms, cognition, and psychosocial function, highlighting its potential as a prognostic indicator. Further research is required to establish the role of PGSs in models for prediction and prognosis.
PGSs and Environmental Risk Factors
The importance of both genetic and environmental factors in disease liability is indicated by an average twin heritability of all psychiatric disorders of only 46% (
65). Genetic factors act together with the environment in disease causation, and models incorporating both could improve risk prediction (
66). The simplest models assume an additive effect, but more complex models may be needed to account for the dynamic interplay between genetic and environmental variation in disease causation. Emerging evidence suggests that genetic factors are associated with the environment we are exposed to, in a phenomenon called gene-environment correlation (
67); hence, summing the risk from PGSs and environmental factors is an oversimplification. In addition, many researchers explore the possibility of gene-environment interaction (
68), based on the idea that genetic liability is modulated by environmental exposures. Although this is an expanding field of research (
69), several limitations affect the reproducibility of findings (
70), including lower statistical power compared with studying main effects, different environmental exposures between studies, and different statistical models to test for interaction. For example, Coleman et al. (
71) found a higher genetic contribution to depression in persons reporting trauma exposure compared with those not reporting trauma. A significant interaction between depression PGS and reported trauma was observed on the additive scale, but not on the multiplicative scale, suggesting that interaction results depend on underlying assumptions on how genes combine with the environment in disease causation (
72). Substantial work is needed on the joint role of genetic and environmental risk factors in psychiatric disorders before we can assess the potential for clinical implementation of gene-environment interactions.
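The dependence of interaction findings on scale can be shown with a small worked example. The four risks below are invented for illustration and are not taken from any study; they are chosen so that high PGS and trauma exposure each double the risk and jointly quadruple it.

```python
def interaction_contrasts(r00, r10, r01, r11):
    """Interaction on two scales from risks in a 2x2 exposure table.

    r00: neither factor, r10: high PGS only, r01: trauma only, r11: both.
    """
    # Additive scale: does the joint risk difference exceed the sum of the
    # separate risk differences? Zero means no additive interaction.
    additive = (r11 - r00) - ((r10 - r00) + (r01 - r00))
    # Multiplicative scale: does the joint relative risk exceed the product
    # of the separate relative risks? One means no multiplicative interaction.
    multiplicative = (r11 / r00) / ((r10 / r00) * (r01 / r00))
    return additive, multiplicative

# Hypothetical lifetime risks of depression in the four groups.
additive, multiplicative = interaction_contrasts(
    r00=0.05, r10=0.10, r01=0.10, r11=0.20
)
print(f"additive contrast: {additive:.2f}, multiplicative ratio: {multiplicative:.2f}")
```

In this example the additive contrast is positive (0.05) while the multiplicative ratio equals 1, so the same data show an interaction on one scale and none on the other.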
Next Steps to Capitalize on PGSs
The previous sections have illustrated the use of PGSs to elucidate the role of genomics within and across psychiatric disorders and related traits. In research studies, the role of PGSs is well established, and PGSs have been highly successful in shedding light on clinical disorders and characteristics. The potential to use PGSs in clinical settings is more tentative. Many barriers remain to be overcome, primarily that the prediction utility of PGSs is too weak (see
Figure 1) to justify their use for assessing risk in the population. However, in the absence of stronger biomarkers in psychiatry, further exploration of the potential of PGSs for risk prediction is well justified. Increasing GWAS sample sizes will identify additional associated genetic variants, and thus increase the predictive ability of PGSs further, but this is unlikely to be transformational. For example, the most recent PGC schizophrenia GWAS reported 287 distinct genetic loci, a substantial increase over the 108 loci previously identified (
14,
73). However, when the Molecular Genetics of Schizophrenia cohort was used to benchmark prediction, the variance explained for case-control status increased only modestly, from 0.18 to 0.21, in the newer GWAS.
What should our expectations be for using PGSs clinically? Aside from using PGSs in combination with rare CNVs, especially 22q11.2 deletion, the evidence presented above suggests that implementing other applications is premature. Several possibilities exist for the next steps to assess the potential for using PGSs in clinical settings. First, we should not expect that genetics alone would give useful prediction. As in risk models for cardiovascular disease, it is more realistic to assume that genetics would be combined with other risk factors in an integrated risk model. Second, predicting risk for diagnosis of a psychiatric disorder in the general population (which is the implication from PGSs based on case-control studies) may not be the appropriate setting for applying genetic models. A particular concern is that people without a diagnosis may be stigmatized if, for example, they find out they have a high PGS for a psychiatric disorder. As described in the clinical studies section, predictive models for prognosis and for course of disease might be more useful, with the added advantage of being able to integrate information on clinical characteristics, family history, and other data from electronic health records alongside PGSs. Similarly, emerging studies for prediction of treatment response or treatment resistance might provide genetic evidence that is useful for therapeutic decision making.
Why should we be enthusiastic about PGSs? Any new tool that could contribute to solving the challenges of psychiatry should be welcomed. Establishing how PGSs could most usefully be applied in preventing, diagnosing, or treating psychiatric disorders will take substantial time. It is important that we be realistic about the likely role of genetics and its use alongside clinical data, other biomarkers, neuroimaging, or environmental risk factors. Other barriers to be overcome include public understanding of genetics, which often takes a yes/no, high-risk/low-risk perspective. We need public education to reframe and communicate genetics as a continuous measure that is probabilistic, but not deterministic, in measuring risk. In addition, any clinical applications should be relevant to all, irrespective of ethnicity and ancestry; to achieve this, substantial research investment in global and local populations is needed.
In psychiatry, we lack the easy targets for population-level implementations of risk prediction that are available in other areas of clinical medicine. In cardiovascular disorders, adding genetics to well-established risk models (the pooled cohort equations, the QRISK3 prediction algorithm) could identify those at high risk where this would not be detected in standard screening programs (
10). In cancer, genetics might be used to determine individual-specific screening programs (
11,
12). These implementations of genetics will be beneficial in making genetic data more widely available in electronic health records, recalling that the key advantage of genetics is its stability across the lifespan, so that a single genetic test can be used to calculate any PGS.
In summary, PGSs are now well established as a research tool, providing insights into relationships between psychiatric disorders and heterogeneity within disorders. Their adoption in research studies has been facilitated by the relatively low cost of genotyping. Further, powerful and flexible software tools allow PGSs to be easily integrated with study data and analysis pipelines. However, PGSs are not yet ready for clinical implementation in psychiatry, except potentially in combination with CNV testing. Looking forward, the utility of PGSs in other areas of medicine may lead to PGSs becoming widely available in medical records. We should continue to evaluate how, where, and when they might be useful in psychiatry.