Perspectives
Published Online: 1 March 2012

How Good Are Observational Studies in Assessing Psychiatric Treatment Outcomes?

In comparative effectiveness studies, treatment effects are calculated by contrasting outcomes between patients who have been assigned to different treatment groups. While randomized treatment assignment is preferred, constraints on resources, timeliness of results, ethical concerns, low frequency of outcomes, and demands for patient subgroup analyses often lead psychiatric investigators to rely on observational data in which patients and their physicians have self-selected treatments (1). With observational studies, researchers must account for how patients come to select their treatments so that estimates of treatment effects can be properly adjusted for these selection biases.
In this issue of the Journal, Leon et al. (2) provide a strong case for an informative observational study. Their analysis was conducted to further test the 2009 Food and Drug Administration warnings that suicidal behavior could accompany the use of antiepileptic medications. To account for selection biases in this observational cohort, the authors applied an advanced statistical method called “propensity scoring” (3, 4). When the authors analyzed panel data for 199 participants with bipolar disorder who were followed for 30 years, they found no association between antiepileptic medication use and risk of suicide attempts or completed suicides.
The natural question for the practitioner is: Do these analytic techniques lead to scientifically valid findings that can guide clinical decisions? To make this judgment, it might be helpful to explain how statisticians control for selection biases.
Leon et al. (2) computed a treatment effect size by comparing suicidal outcomes (rate of suicide attempt or suicide) between treatment groups (patients who were exposed and who were not exposed to antiepileptic medications) after adjusting for differences in patient demographic and clinical factors. To yield valid findings, these adjustments must be based on all relevant confounding factors and computed using a correctly specified outcomes model.
To be relevant, confounding factors must 1) vary across treatment groups and 2) be expected to have a direct impact on patient outcomes. Randomized treatment assignments that yield equivalent treatment groups are said to be unconditionally “exogenous.” In expectation, no confounding factor varies systematically across treatment groups, and calculating effect sizes reduces to simple comparisons of outcomes across groups. However, when patients and their physicians self-select treatments, treatment groups are not expected to be equivalent. Researchers analyzing the outcomes must then identify all confounding factors (e.g., in the Leon et al. study, clinical and demographic characteristics), determine covariates from the data set to measure these confounding factors (e.g., in this case, prior symptom severity, suicidal behaviors, and comorbidities as clinical factors and socioeconomic status, marital status, age, and gender as demographic factors), and then specify an outcomes model to compute effect sizes that are adjusted for these covariates (e.g., here, a mixed-effect, grouped-time survival model).
Outcomes models specify outcome as the dependent variable (here, time between the initial period and onset of suicidal behavior, if any) and the confounding covariates, along with a treatment indicator variable, as the independent variables. Treatment indicators take a value of 1 when the patient selects the treatment of interest (e.g., exposed to an antiepileptic medication in an initial period) and zero otherwise (e.g., not exposed). The outcomes model is fitted to the data set, and effect size is computed from estimates of the model parameters.
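The mechanics of such an adjustment can be sketched with the simplest possible outcomes "model": stratifying on a single confounder and averaging within-stratum treatment contrasts. All patients, strata, and outcome values below are hypothetical; an actual analysis such as Leon et al.'s would instead fit a mixed-effect, grouped-time survival model.

```python
# Sketch: adjusting a treatment effect for one confounder by stratification.
# Each patient is (treated, severity_stratum, outcome); all values are made up.
patients = [
    (1, "severe", 0.30), (1, "severe", 0.26), (1, "mild", 0.10),
    (0, "severe", 0.40), (0, "mild", 0.12), (0, "mild", 0.08),
]

def mean(xs):
    return sum(xs) / len(xs)

# Naive (unadjusted) effect: difference in mean outcome by treatment group.
naive = mean([y for t, _, y in patients if t == 1]) - \
        mean([y for t, _, y in patients if t == 0])

# Adjusted effect: average the treated-vs-untreated difference within each
# severity stratum, so severity no longer varies across the groups compared.
def stratum_diff(s):
    treated = [y for t, s2, y in patients if t == 1 and s2 == s]
    untreated = [y for t, s2, y in patients if t == 0 and s2 == s]
    return mean(treated) - mean(untreated)

strata = {s for _, s, _ in patients}
adjusted = mean([stratum_diff(s) for s in strata])
print(round(naive, 3), round(adjusted, 3))  # 0.02 -0.06
```

In this toy data set the naive comparison suggests a harmful treatment, while the severity-adjusted contrast reverses sign, which is exactly why unadjusted observational comparisons cannot be trusted.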
Few medical data sets will contain all relevant confounding factors (e.g., patient access to the means to commit suicide, patient access to psychiatric care for symptom relief). To account for these unobserved covariates, instrumental variables (5) are added to the list of independent variables in the outcomes model (e.g., here, the geographic location of patient residence, reflecting variations in gun regulations, drug trafficking enforcement, and availability of psychiatric services). Instruments must be observable in the data set, vary by treatment group, and be associated with one or more of the unobserved confounding factors. Unlike covariates, instruments are not expected to directly drive patient outcomes. Thus, any association observable in the data set between an instrumental variable and outcome variables can be attributed to the instrument's association with one or more unobserved factors. If the observable covariates and instrumental variables included in the outcomes model reflect all relevant confounding factors, we say that the treatment assignment is exogenous conditional on the data, or “conditionally exogenous.”
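The textbook use of an instrument can be sketched with the simple Wald estimator, which recovers the treatment effect as the ratio of the instrument-outcome covariance to the instrument-treatment covariance; this is a generic illustration of the instrumental-variable logic, not the specific model used by Leon et al., and all values below are hypothetical.

```python
# Sketch: the simplest instrumental-variable (Wald) estimator with a binary
# instrument z: effect = cov(z, y) / cov(z, t). Data are made up.
data = [  # (z: instrument, t: treated, y: outcome)
    (1, 1, 3.0), (1, 1, 2.6), (1, 0, 1.2),
    (0, 1, 2.8), (0, 0, 1.0), (0, 0, 0.9),
]

def mean(xs):
    return sum(xs) / len(xs)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return mean([(x - ma) * (y - mb) for x, y in zip(a, b)])

z = [r[0] for r in data]
t = [r[1] for r in data]
y = [r[2] for r in data]

# Because z is assumed not to drive y directly, its association with y can
# only flow through its association with treatment choice t.
effect = cov(z, y) / cov(z, t)
print(round(effect, 3))  # 2.1
```

The key assumption, mirrored in the text, is the exclusion restriction: the estimator is valid only if the instrument affects outcomes solely through treatment selection.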
The second problem is how to specify the outcomes model. Outcomes models that do not reflect the data set's true “data-generating process” are said to be misspecified (6). Adjusting for confounding factors using misspecified models could also lead to incorrect estimates of effect size (7).
To solve both exogeneity and specification problems in their outcomes model, Leon et al. summarized both covariates and instruments into a single score. This score was estimated by fitting a second model to the data set. Unlike outcomes models, these propensity models are designed to predict treatment assignment (e.g., exposed or not exposed to treatment with antiepileptics during the initial period) with covariates and instruments as independent variables. Effect sizes are computed by comparing outcomes between exposed and unexposed patients who have been matched by their respective propensity scores.
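A minimal sketch of the matching step, assuming the propensity scores have already been produced by a propensity model (here they are simply assigned, not fitted): each treated patient is paired with the nearest-scoring untreated patient within a caliper, patients with no admissible match are dropped, and the effect is the mean outcome difference across matched pairs. All scores and outcomes are hypothetical.

```python
# Sketch: matching on propensity scores within a caliper, then estimating
# the treatment effect from matched pairs. Scores here are given, not fitted.
treated   = [(0.82, 1.0), (0.55, 0.4), (0.30, 0.2)]   # (propensity, outcome)
untreated = [(0.80, 0.7), (0.52, 0.5), (0.10, 0.1)]
CALIPER = 0.05  # maximum allowed score distance for an admissible match

pairs, used = [], set()
for p_t, y_t in treated:
    # nearest not-yet-used untreated patient by propensity score
    candidates = [(abs(p_t - p_u), i, y_u)
                  for i, (p_u, y_u) in enumerate(untreated) if i not in used]
    dist, i, y_u = min(candidates)
    if dist <= CALIPER:
        used.add(i)
        pairs.append(y_t - y_u)  # within-pair outcome difference

unmatched = len(treated) - len(pairs)
effect = sum(pairs) / len(pairs)
print(len(pairs), unmatched, round(effect, 3))  # 2 1 0.1
```

Note that one treated patient has no control within the caliper and is discarded, the small-scale analogue of the unmatched patient-time intervals that arise in real propensity-matched analyses.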
There are advantages to the Leon et al. approach. Combining covariates and instruments into a single score 1) reduces the number of free parameters in the outcomes model and thus increases power to detect treatment effect sizes; 2) permits more variables to be included in the analyses of small sample sizes; and 3) reduces the exogeneity problem to searching for variables that predict treatment assignment and the specification problem to determining how patients should be divided into discrete propensity groups.
But these advantages do not come without a price. The more successfully the propensity model predicts treatment assignment, the less likely it will be to find untreated and treated patients with matchable propensity scores (e.g., in the Leon et al. study, 21% of sampled patient-time intervals could not be matched). Replacing covariates and instruments by a single score may introduce a misspecification error because the impact of each variable on outcomes is assessed only through its association with the propensity score. When the study's purpose is to determine whether exposure to antiepileptic medication increases hazard rates for suicidal behaviors, what is needed is the propensity for suicidal behaviors, rather than the propensity for medication exposure. For instance, both severe symptoms and low socioeconomic status are positively associated with suicidal behaviors (8), while severe symptoms but high socioeconomic status often drive the decision to use medication (2). If these characteristics hold, then low-socioeconomic-status patients with severe symptoms would have a very different initial suicidal behavior profile than their high-socioeconomic-status counterparts with mild symptoms, although the two groups may have comparable propensity scores.
While citing prior successes is informative, findings should be tested for robustness each time an analytic method is applied to a given data set. Leon et al. did show that results were stable across different approaches to classifying patients into discrete propensity groups. However, more can be done here to help the practitioner judge the validity of the reported findings. For instance, a test for robustness inspired by White and Lu (9) and Rubin and Thomas (10) involves recomputing effect size estimates in which exposed and unexposed patients are rematched based on the propensity score plus one or more selected confounding covariates (e.g., propensity scores and socioeconomic status). Since both matched and rematched estimates are designed to measure the same effect size, a significant difference between them would lead the investigator to reject the null hypothesis that the estimates are robust. By repeating across different sets of selected covariates (e.g., propensity and marital status, propensity and age group), the rematched sample that yields the greatest deviation from the original effect size estimate can be determined and tested for significance by bootstrapping the original data set.
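The rematching idea can be sketched as follows: estimate the effect from propensity-only matches, then re-estimate after also requiring the matched control to share a selected covariate (here a hypothetical socioeconomic-status category), and inspect the discrepancy. Matching is with replacement for brevity, and all data are made up; a real test would assess the largest such discrepancy for significance by bootstrapping.

```python
# Sketch of the rematching robustness check: compare the effect estimate from
# propensity-only matching with one that also forces the matched control to
# share a covariate category (SES). All scores, labels, and outcomes are fake.
treated   = [(0.80, "low", 1.0), (0.50, "high", 0.6)]   # (score, SES, outcome)
untreated = [(0.79, "high", 0.5), (0.78, "low", 0.4), (0.51, "high", 0.3)]

def match_effect(require_ses):
    diffs = []
    for p_t, ses_t, y_t in treated:
        # admissible controls: optionally restricted to the same SES category
        pool = [(abs(p_t - p_u), y_u) for p_u, ses_u, y_u in untreated
                if not require_ses or ses_u == ses_t]
        _, y_u = min(pool)          # nearest admissible control, with replacement
        diffs.append(y_t - y_u)
    return sum(diffs) / len(diffs)

original = match_effect(require_ses=False)
rematched = match_effect(require_ses=True)
discrepancy = abs(original - rematched)
print(round(original, 2), round(rematched, 2), round(discrepancy, 2))  # 0.4 0.45 0.05
```

Here the two patients with the closest propensity scores differ in SES, so forcing the covariate match changes the estimate, exactly the kind of sensitivity the robustness test is meant to surface.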
This discussion is intended to point to an “analysis gap” that exists between advanced analytic methods that are known among mathematical and computational statisticians and actual methods that medical researchers apply in observational studies. The Leon et al. study offers a good example of methodologists and clinical investigators working closely together to narrow that gap and apply advanced statistical methods to observational outcome studies. As the National Institutes of Health continues its support for observational studies (1), medical researchers should, rather than restating theory, reciting prior successes, or limiting results to those computable with a popular commercial software program, comb the statistical literature, apply the best analytic methods for their study purpose, and test the applicability of such methods against their data set. Only then can practitioners have confidence that observational findings are offering correct statistical inferences on the risks and benefits of medical treatments.

Footnote

Editorial accepted for publication January 2012.

References

1. Lauer MS, Collins FS: Using science to improve the nation's health system: NIH's commitment to comparative effectiveness research. JAMA 2010; 303:2182–2183
2. Leon AC, Solomon DA, Li C, Fiedorowicz JG, Coryell WH, Endicott J, Keller MB: Antiepileptic drugs for bipolar disorder and the risk of suicidal behavior: a 30-year observational study. Am J Psychiatry 2012; 169:285–291
3. Rubin DB: Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997; 127:757–763
4. Cepeda MS, Boston R, Farrar JT, Strom BL: Comparison of logistic regression versus propensity score when the number of events is low and there are multiple confounders. Am J Epidemiol 2003; 158:280–287
5. Heckman JJ, Vytlacil EJ: Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci USA 1999; 96:4730–4734
6. Golden RM, Henley SS, White H, Kashner TM: New directions in information matrix testing: eigenspectrum tests, in Causality, Prediction, and Specification Analysis: Recent Advances and Future Directions. Edited by Swanson NR. New York, Springer (in press)
7. Kashner TM, Henley SS, Golden RM, Rush AJ, Jarrett RB: Assessing the preventive effects of cognitive therapy following relief of depression: a methodologic innovation. J Affect Disord 2007; 104:251–261
8. Qin P, Agerbo E, Mortensen PB: Suicide risk in relation to socioeconomic, demographic, psychiatric, and familial factors: a national register-based study of all suicides in Denmark, 1981–1997. Am J Psychiatry 2003; 160:765–772
9. White H, Lu X: Robustness checks and robustness tests in applied economics (discussion paper). San Diego, University of California San Diego, Department of Economics, 2010
10. Rubin DB, Thomas N: Combining propensity score matching with additional adjustments for prognostic covariates. J Am Stat Assoc 2000; 95:573–585

Information & Authors

Published In

American Journal of Psychiatry
Pages: 244–247
PubMed: 22407110

History

Accepted: January 2012
Published online: 1 March 2012
Published in print: March 2012

Authors

Details

T. Michael Kashner, Ph.D., J.D.
From Loma Linda University Medical School, Loma Linda, Calif.; University of Texas Southwestern Medical Center at Dallas; and the Office of Academic Affiliations, Department of Veterans Affairs, Washington, D.C.

Notes

Address correspondence to Dr. Kashner ([email protected]).

Funding Information

Dr. Kashner reports no financial relationships with commercial interests.
