This issue of the Journal features an article by Almasy et al. (1) that presents the results of a genome-wide linkage study of schizophrenia, using both categorical diagnoses and quantitative scores on multiple neurocognitive tasks as phenotypes. Deficits in many neurocognitive functions have been observed in individuals with schizophrenia and their clinically unaffected relatives (2). The central hypothesis of the Almasy et al. study was that the use of such neurocognitive phenotypes would increase the power to localize genes related to illness. The study sample consisted of 43 extended families with multiple members diagnosed with schizophrenia. On average, 16 persons per family participated in the study. Most families had two affected individuals, and one-third had between three and five affected family members. Although this would be considered a moderate-sized sample for a linkage study, the use of quantitative phenotypes provided additional strength to the study design.
Ideally, quantitative phenotypes or traits can be measured in each study participant, and then the correlation between the scores of different individuals and the extent of DNA sequence that they share in each region of the genome can be assessed. This allows for a more efficient use of the genetic information of all members of the study. Unaffected family members who have abnormal scores on some measures can now contribute as much to the analysis as individuals with the categorical disease diagnosis. In addition, all individuals with a given diagnosis (affected or unaffected) are no longer considered phenotypically equal. Two affected family members with very different scores on a measure will now be treated quite differently in the analysis. The use of quantitative measures instead of categorical ones can provide a tremendous boost to the power of a linkage analysis when the trait involves expression of the genetic variants of interest.
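The idea of relating phenotypic similarity to DNA sharing can be illustrated with a classic sibling-pair approach, Haseman-Elston regression, in which the squared trait difference between two siblings is regressed on the proportion of alleles they share identical by descent (IBD) at a locus. This is a minimal simulated sketch and not necessarily the method used by Almasy et al.; the function names, the single biallelic locus, the effect size, and the noise level are all assumptions chosen for illustration.

```python
import random


def simulate_sib_pair(rng, allele_effect=1.0, noise_sd=1.0):
    """Return (ibd_proportion, squared_trait_difference) for one sibling pair."""
    # Hypothetical locus: each parent carries two alleles; allele 1 raises the trait.
    father = [rng.randint(0, 1), rng.randint(0, 1)]
    mother = [rng.randint(0, 1), rng.randint(0, 1)]
    traits, inherited = [], []
    for _ in range(2):
        # Which copy (0 or 1) each parent transmits to this sibling.
        pat, mat = rng.randint(0, 1), rng.randint(0, 1)
        inherited.append((pat, mat))
        genetic_value = allele_effect * (father[pat] + mother[mat])
        traits.append(genetic_value + rng.gauss(0.0, noise_sd))
    # Alleles shared IBD: same paternal copy plus same maternal copy.
    ibd = int(inherited[0][0] == inherited[1][0]) + int(inherited[0][1] == inherited[1][1])
    return ibd / 2.0, (traits[0] - traits[1]) ** 2


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def haseman_elston_slope(n_pairs=5000, seed=1):
    """Regress squared sibling trait differences on IBD sharing at the simulated locus."""
    rng = random.Random(seed)
    pairs = [simulate_sib_pair(rng) for _ in range(n_pairs)]
    return ols_slope([p for p, _ in pairs], [d for _, d in pairs])
```

When the simulated locus influences the trait, siblings who share more alleles IBD tend to have more similar scores, so the slope returned by `haseman_elston_slope()` is expected to be negative. Every pair contributes information regardless of diagnostic status, which is the efficiency gain described above.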
Almasy et al. used a computerized battery to assess several cognitive domains commonly affected in schizophrenia, including abstraction and mental flexibility, attention, memory, and sensorimotor processing. Performance measures of each task were used as quantitative traits for genome-wide linkage analysis, the first such study to be reported for many of these domains. A categorical phenotype defining affected individuals as those who have schizophrenia or schizoaffective disorder, depressed type, was also used. Two noteworthy linkage signals were detected: one on chromosome 5q, with a quantitative phenotype for efficiency of abstraction and mental flexibility, and one on chromosome 19q, with the categorical phenotype. Because multiple correlated phenotypes were used in this study, it is difficult to assess whether these results rise to a rigorous level of statistical significance after correcting for multiple testing. The 5q region has been implicated by other linkage studies of schizophrenia using categorical phenotypes, while the 19q locus has not been previously reported. Despite the theoretical advantages of quantitative trait analysis and testing of multiple plausible domains, the quantitative trait analysis performed similarly to the analysis using categorical diagnoses. Each type of analysis identified one locus of interest, with similar magnitudes of statistical support. So we may be left asking: Did the quantitative trait analysis live up to expectations?
To answer this question, we need to better understand some of the features of quantitative trait analysis. Quantitative trait linkage analysis outperforms a categorical analysis when the trait of interest is truly quantitative and the categorical analysis is based on a forced dichotomization of the trait. For example, if we want to find genes related to height and we conduct a study categorizing our subjects as “tall” or “not tall” based on some arbitrary value, we have unnecessarily discarded much of the possible phenotypic information in our study sample. This can lead to a distorted relationship between members of the phenotypic classes. A person of average stature and a person with congenital dwarfism will both be classified as “not tall,” despite the very different phenotypes and genetic underpinnings of their height. On the other hand, a person 1 cm taller than the cutoff will receive a different categorical label than the person 1 cm shorter than the cutoff, despite the similarity in their height. In this situation, it is easy to understand why it is preferable to use the quantitative trait instead of a derived categorical assignment.
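The information lost by a forced dichotomization can be made concrete with a small simulation. The sketch below is a simplified illustration of the height example and is not a linkage analysis: it compares how strongly a genotype correlates with measured height versus with a "tall"/"not tall" label. The baseline height, per-allele effect, noise level, and 180 cm cutoff are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_heights(n=5000, seed=1, per_allele_cm=3.0, noise_sd=6.0):
    """Simulate genotypes (0, 1, or 2 height-raising alleles) and heights in cm."""
    rng = random.Random(seed)
    genotypes, heights = [], []
    for _ in range(n):
        g = rng.randint(0, 1) + rng.randint(0, 1)
        genotypes.append(g)
        # Assumed model: 170 cm baseline, additive allele effect, Gaussian noise.
        heights.append(170.0 + per_allele_cm * g + rng.gauss(0.0, noise_sd))
    return genotypes, heights


def compare_phenotypes(cutoff_cm=180.0, **kwargs):
    """Return (r_quantitative, r_dichotomized) for the same simulated sample."""
    genotypes, heights = simulate_heights(**kwargs)
    # Forced dichotomization at an arbitrary cutoff discards within-class variation.
    tall = [1.0 if h >= cutoff_cm else 0.0 for h in heights]
    return pearson(genotypes, heights), pearson(genotypes, tall)
```

Under these assumptions, the genotype is expected to correlate more strongly with measured height than with the dichotomized label, because two people 1 cm on either side of the cutoff receive different labels while people of very different heights within a class are treated as identical.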
Suppose instead that the trait of interest is something more complex, such as being a successful professional basketball player. We could readily derive a consensus definition that would allow us to reliably categorize any player as successful or not. Observing that the majority of players categorized as successful are tall and that being tall tends to run in families, we might well be tempted to conduct a linkage study of height as an endophenotype of basketball prowess, hypothesizing that genes that control height contribute to success as a basketball player. However, a review of linkage studies of human stature revealed multiple regions with generally modest support for linkage, even when large samples were used (3). Height is a complex trait, determined by the interplay of multiple genes as well as important environmental influences. Just because a trait can be accurately measured does not mean it will necessarily be simple to find the genes controlling it. Also, height may not be the most important factor in determining success at basketball. Although quantifiable traits are attractive for study, they may not be the most relevant phenotypes to investigate when one is interested in the genetic etiology of disease.
However, the situation is not this simple in the Almasy et al. study. Schizophrenia, unlike success in basketball, has been demonstrated to have a significant inherited component. Although their heritability is lower than that of schizophrenia, which is >80% (4), individual neurocognitive measures, including those used in this study, show moderate levels of heritability (5), supporting their attractiveness as endophenotypes for schizophrenia. Age and environmental influences on cognitive measures may preclude their use in the assessment of certain individuals. Nevertheless, such quantitative traits could increase power for genetic studies if they identified genetically relevant subtypes of schizophrenia or if their underlying genetic architecture was simpler than that of schizophrenia. To date, there is little evidence of the former (6), although globally lower intellect and characteristic physical features can clinically identify genetic subtypes such as those associated with 22q11.2 deletions (7). Studies of quantitative traits in simpler organisms and a review of endophenotypes in schizophrenia also provide little support for the latter hypothesis (8). Even measures of gene transcription levels, clearly closer to genotype than any clinical assessment, may not improve power to map complex diseases (9). In Alzheimer’s disease, for which cognitive endophenotypes are also being assessed, quantitative trait loci may not be the same as those for disease (10).
The “best” phenotype for genetic studies of schizophrenia is not entirely clear. The Almasy et al. study represents a very important empirical test of the utility of endophenotypes in linkage analysis of schizophrenia and suggests that there may be greater utility in combining endophenotypes with standard diagnostic phenotypes. For inherited genetic diseases, the goal is to discern the clinical definitions that most closely correspond with the genetic causes of the illness. The development of disease definitions usually includes surveying family members to determine what alternative forms the disorder may take. It is also critical to remember that genetic heterogeneity is the norm in nature and a single clinical label does not imply an underlying homogeneous etiology. A survey across the disease for clinically identifiable genetic subtypes, using coexisting features and/or familial segregation patterns, can significantly decrease genetic heterogeneity. With few exceptions, these are not the processes used to derive the reliable psychiatric disease definitions we currently use. However, these categorical definitions demonstrate high heritability (4), have resulted in the most significant findings to date, and are the entities we wish to better understand, providing a compelling argument for their continued use in genetic studies.
One may be tempted to consider using very large samples to test a quantitative (or categorical) trait hypothesis, but this will not guarantee success. In addition to potentially introducing greater genetic heterogeneity, increasing sample size at the cost of genetically relevant knowledge, such as accurate phenotype and familial segregation patterns, can result in a loss of power to localize susceptibility loci (11). While there is always the hope that the next new laboratory technique or statistical method will suddenly be able to find clear genetic signals where there were none before, the reality is that without excellent clinical data, the odds will always be heavily against us. Even technological advances that identify new molecular genetic entities, such as copy number variation, require detailed clinical characterization for interpretation. A focus on careful assessment of the most genetically relevant phenotypes must be maintained as we move into the next phase of genetic studies of schizophrenia.