Biomeasures and Biomarkers
Biomarkers depend on biomeasures, i.e., any measurable characteristic of living tissue or fluids (e.g., bioassays of blood, urine, or cerebrospinal fluid; tissue biopsies of organs; and electrophysiological and functional measures of the brain at rest and in response to medications or selected tasks) (4). Biomarkers must be reliable, that is, objective and minimally affected by the will, behavior, or attitudes of patients and evaluators or by transient environmental influences. For example, patients’ efforts affect reaction times but not visual evoked responses (4). Similarly, physiological responses to tasks that depend on the patient’s willingness to attend to and carry out the task would be less suitable as biomarkers, whereas images reflecting receptor density or binding in the brain would qualify. Actigraphy would not be suitable as a biomarker because the subject can choose to be active or not, but body temperature over the 24-hour cycle would be more suitable.
Biomarkers may be present before, during, or following the symptomatic expression of the illness (Figure 1). The antecedents are more likely to be mechanistically and potentially genetically relevant, whereas the consequences or “scars” might play a larger role in predicting the subsequent course of illness, the likelihood of complications, or residual disability. For example, some sleep EEG features appear to be trait-like (i.e., present before, during, and after the depressive episode), whereas others are largely episode concomitants (5).
In addition, the types and nature of biomarkers may change over the course of the condition. Early biological indicators of bipolar illness may differ from those found among persons with an established multiyear history of the condition as the disease processes evolve over time. By analogy from general medicine, left ventricular hypertrophy develops after years of chronic hypertension.
Furthermore, age and gender can affect the types and presence of biomarkers. For example, various sleep EEG features are affected by age, even among healthy persons. Women’s menstrual cycle phases, as well as smoking, diet, and exercise, can affect the presence, expression, and interpretation of biomarkers.
Whether a single biomeasure or a panel of biomeasures becomes a clinically useful biomarker depends on
how well the test performs,
costs and burdens to patients and care systems,
practical limitations (e.g., invalid results for patients taking oral contraceptives or antihypertensive medications),
comparative cost and performance vis-à-vis alternative competing tests, and
clinical utility (i.e., what the biomarker adds to what we already know).
In terms of acceptable test performance, Kraemer and colleagues (4) suggested a kappa of .6 for binary calls (e.g., a positive versus a negative diagnosis) and a Kendall or Spearman rank correlation coefficient of at least .6 for continuous variables. For example, a biomarker designed to reflect the severity of anxiety should correlate at least .6 with the anxiety symptom rating scale.
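The kappa criterion for binary calls can be illustrated with a minimal sketch. It assumes that the statistic is Cohen's kappa and that it compares a biomarker's positive/negative calls with a reference diagnosis in the same patients; the source does not specify either point. The function name and the ten example calls are hypothetical and purely illustrative.

```python
def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of binary calls (1/0 or True/False)."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(calls_a)
    a = [bool(x) for x in calls_a]
    b = [bool(x) for x in calls_b]
    # Observed proportion of cases on which the two sets of calls agree.
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance, from each rater's marginal positive rate.
    p_a = sum(a) / n
    p_b = sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical data: 1 = positive call, 0 = negative call, for ten patients.
biomarker_calls = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reference_diagnosis = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(biomarker_calls, reference_diagnosis)
print(f"kappa = {kappa:.2f}, meets .6 criterion: {kappa >= 0.6}")
```

In this made-up sample the two sets of calls agree in 8 of 10 patients, yet kappa falls just short of .6 because a good share of that agreement is expected by chance alone.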
If the test is “accurate enough,” its clinical utility will depend on what the test adds to the clinical decision that it is designed to inform (6). For example, a specific threshold can be set with a particular measure that identifies individual patients who have a high certainty of doing poorly or well with a treatment. The degree of certainty, if high enough, can lead the clinician to strongly encourage or avoid the treatment (for examples, see Grieve and colleagues [7], South and colleagues [8], Kuk and colleagues [9, 10], and Li and colleagues [11]). More definitive biomarkers are more likely to be tightly tied to pathobiology or etiology (12). Often, several biomarkers are combined in a panel to increase accuracy rather than relying on a single marker.
An important approach to dealing with the heterogeneous nature of psychiatric syndromes (complex disease phenotypes) has been to identify endophenotypes. In essence, this effort often combines clinical features with particular biological features that reflect the links between specific genes and the disease expression (13). The term, borrowed from insect biology, refers to the measurement of biological processes that lie between the genes and the clinical presentation. Fundamentally, the idea is that an endophenotype will be more biologically and etiologically homogeneous than the wider complex phenotype (syndrome) from which the endophenotype is drawn (14, 15). Because the endophenotype should be based on shared genes, it should be heritable, tend to cosegregate with the condition, and be found in some unaffected relatives in multiply affected families. Earlier work has been done on eye movement dysfunction in schizophrenia (see Levy and colleagues [16] for an overview), and other work has been done in alcoholism (17, 18).
Precision Medicine, Personalized Care, or Targeted Therapy
Personalized or precision medicine refers to the tailoring of medical treatment to the patient on the basis of individual patient characteristics (19). Fundamentally, treatment selection derives from our ability to sort patients into subgroups that differ in their biology, prognosis, or response to treatments (20). Biomarkers can aid us in selecting among multiple treatment options. This subgrouping enables the identification and selection of individual patients who are most suitable for (i.e., very likely to benefit from) or who are best advised to avoid specific therapies and the associated treatment costs and risks. This kind of targeted therapy is also referred to as “stratified medicine” (21). The notion is that through an analysis of biomarkers in the patient population, disorders can be stratified into subsets that exhibit differential outcomes and responses to specific therapeutics (22).
Beyond stratification, the term “precision medicine” is also used to include the creation of unique medical products for an individual patient, such as developing a cancer-fighting vaccine based on molecules derived from the patient’s own tumor.
From Biomarker Research to Clinical Applications: Please Mind the Gap!
When clinicians use a biomarker or any laboratory test, they are typically asking whether the patient does or does not have the condition, or will or will not have a good outcome. Unless the biomarker is almost uniquely associated with the condition in a highly sensitive and specific manner (as in the relation of hemoglobin SS to sickle cell disease), the actual performance of a biomarker test will depend substantially on the clinical context in which it is used.
When we ask, “How likely is my patient with a positive test to actually have the disease?” we are asking about the predictive value of a positive test. The answer depends on two test properties—sensitivity and specificity—and on the prevalence of the diagnosis or outcome in the population we are testing. Because prevalence affects the predictive value of any test, the same diagnostic test will have a different predictive accuracy according to the clinical context in which it is being applied.
Specifically, using the same test in a population with a higher prevalence of the positive outcome simultaneously increases the positive predictive value (PPV) and decreases the negative predictive value (NPV) of the test. Of course, if the prevalence of the disorder or outcome being tested for in the clinical sample is comparable to that in the research populations in which the test was developed, the PPV, false discovery rate (FDR), and NPV will also be comparable.
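PPV depends on the number of true and false positives as shown in the following equations:
$$\mathrm{PPV} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$
or
$$\mathrm{PPV} = \frac{\text{true positives}}{\text{all positive tests}}$$
To calculate the actual numbers noted above, one needs the test performance characteristics of sensitivity and specificity and the prevalence of the condition in the population being tested. That is:
$$\mathrm{PPV} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sensitivity} \times \text{prevalence} + (1 - \text{specificity}) \times (1 - \text{prevalence})}$$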
A concrete example is presented in Table 1, which illustrates the impact of prevalence changes on PPV for a test with 99% sensitivity and 95% specificity (23).
Table 1 shows how the prevalence of diabetes—the outcome being sought—affects the performance of even a very good test. The first and last rows show that when the prevalence of diabetes rises with age from 1% (among 30-year-olds) to 20% (among 70-year-olds), the PPV rises from 17% to 83%—a huge difference in the clinical interpretation of the same test result. (The middle rows of the table show how this result is calculated.)
Similarly, prevalence affects the FDR, which answers the question, “How often is a positive test wrong?” The FDR also depends on prevalence because it is calculated by using PPV as follows: FDR=1–PPV.
In brief, a biomarker or other lab test used in a very low-prevalence population will have a low PPV and a high FDR. Making biomarkers work well clinically depends on the timely and judicious use of the tests and on using the tests in a preferably target-rich environment.
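The dependence of PPV, NPV, and FDR on prevalence can be checked with a short calculation. The following is a minimal sketch, not the method used to build Table 1: it applies the standard definitions of true and false positives and negatives to the sensitivity (99%), specificity (95%), and the two prevalence values (1% and 20%) quoted in the text. The function name and return format are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return PPV, NPV, and FDR for a test applied at a given prevalence.

    All three arguments are proportions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fractions of the tested population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)

    if true_pos + false_pos == 0 or true_neg + false_neg == 0:
        raise ValueError("predictive values are undefined for these inputs")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return {"ppv": ppv, "npv": npv, "fdr": 1 - ppv}


# Test characteristics and prevalence values taken from the text.
for prevalence in (0.01, 0.20):
    result = predictive_values(0.99, 0.95, prevalence)
    print(f"prevalence {prevalence:.0%}: "
          f"PPV {result['ppv']:.0%}, NPV {result['npv']:.2%}, FDR {result['fdr']:.0%}")
```

Run with these inputs, the sketch gives a PPV of about 17% at 1% prevalence and about 83% at 20% prevalence, matching the figures cited from Table 1, while the NPV moves slightly in the opposite direction.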