Editorials

Published Online: 1 December 2016

Making Causal Inferences From Observational Databases?

T. Michael Kashner, Ph.D., J.D.

Publication: American Journal of Psychiatry

https://doi.org/10.1176/appi.ajp.2016.16091091

Developing statistical models to make causal inferences from observational data poses important challenges for psychiatric researchers. Two variables may merely covary in a data set; A is said to cause B only if the following conditions are met: 1) plausibility: that A affects B is consistent with accepted theory; 2) temporal precedence: A precedes B in time; 3) simultaneous effects: the effect of A on B can be distinguished from the effect, if any, of B on A; and 4) exogeneity: A affects B after all confounding factors C are held constant. Whether known or unknown to the researcher, confounding factors are associated with both A and B and, if left unaccounted for, will contribute to the observed correlation between A and B.
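To illustrate the exogeneity condition, here is a minimal Python sketch (standard library only) under assumed, hypothetical values: a confounder C drives both A and B, and the true effect of A on B is set to 1.0. The small `ols` helper is illustrative and is not part of any method described in this editorial. Regressing B on A alone mixes the direct effect with the C pathway, so in large samples the naive slope approaches (1.64 + 1.5 × 0.8)/1.64 ≈ 1.73, whereas holding C constant recovers a value near 1.0.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` (lists of equal length).

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting; pass a column of ones to include an intercept.
    """
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


rng = random.Random(2016)  # fixed seed so the run is reproducible
n = 20_000
C = [rng.gauss(0, 1) for _ in range(n)]                          # confounder
A = [0.8 * c + rng.gauss(0, 1) for c in C]                        # candidate cause, partly driven by C
B = [1.0 * a + 1.5 * c + rng.gauss(0, 1) for a, c in zip(A, C)]   # outcome; true effect of A is 1.0
ones = [1.0] * n

naive = ols([ones, A], B)[1]         # B on A only: also picks up the C -> A, C -> B path
adjusted = ols([ones, A, C], B)[1]   # B on A with C held constant
print(f"naive slope:    {naive:.3f}")
print(f"adjusted slope: {adjusted:.3f}")
```

If C were not recorded, the adjusted regression could not be run; that is the situation control functions are meant to address.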

In this issue of the Journal, Blanco et al. (1) estimated the association between pain severity and prescription opioid use disorder in a nationally representative sample of noninstitutionalized adults. They found that patients with painful conditions were at higher risk of opioid use disorder and concluded that “careful monitoring and consideration of nonopioid alternative treatments is warranted.”

The authors addressed the above causality conditions, including simultaneous effects, by applying structural equation modeling to the survey’s two-wave structure (baseline, wave 1; follow-up, wave 2) and by selecting confounders from among “several background demographic and clinical characteristics.” This approach works when the cause (in this case, pain) is outside the patient’s control and a well-designed database includes all critically important confounding variables that affect outcomes. It is worth noting, however, what happens when important unobserved confounders are present.

As described by Heckman and Navarro-Lozano (2) and refined by Lu and White (3), control functions are designed to account for both observable and unobservable confounders. Let y be the outcome (opioid use disorder) and D the variable of interest (pain severity), with Z the vector of observable covariates of the outcome that are available in the data set and U the vector of remaining unobservable covariates that are not. If the true data-generating process of the outcome is linear, then:

$$y = \mu_0 + \beta_0 D + \alpha_0 Z + \gamma_0 U + \epsilon_0 \tag{1}$$

where β₀, the coefficient on D, is the true pain effect size and the parameter of interest. Similarly, α₀ and γ₀ are the true effects of Z and U, respectively; μ₀ is the intercept; and ϵ₀ is an independent and identically distributed random error.

To estimate β₀, suppose we model the contribution of the unobserved covariates to the outcome as

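$$\gamma_0 U = \delta_c + \delta_D D + \delta_Z Z + \delta_{W_0} W_0 + \delta_{W_1} W_1 + \epsilon_1 \tag{2}$$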

where W₀ are proxies for the missing covariates of opioid use disorder, W₁ are additional variables representing drivers of pain (D) and proxies for those drivers, and δ_i, for i = c, D, Z, W₀, and W₁, are model parameters; ϵ₁ is an independent and identically distributed random error. Substituting equation 2 into equation 1 yields:

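$$y = (\mu_0 + \delta_c) + (\beta_0 + \delta_D) D + (\alpha_0 + \delta_Z) Z + \delta_{W_0} W_0 + \delta_{W_1} W_1 + \epsilon_0 + \epsilon_1 \tag{3}$$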
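Equation 3 can be checked numerically under hypothetical parameter values (μ₀ = 1, β₀ = 2, α₀ = 0.5, γ₀ = 1.5), with one observed covariate Z and one unobserved confounder U that also drives D; W₀ and W₁ are left out of this sketch. Because least squares is linear in the outcome, the coefficient on D from a regression that omits U matches β₀ plus δ_D, where δ_D comes from regressing γ₀U on the same regressors (equation 2 without W₀ and W₁), up to sampling noise from ϵ₀. The coefficient on Z is shifted by δ_Z in the same way. With these assumed values, δ_D is roughly 0.70 and δ_Z roughly −0.35 in large samples. The `ols` helper is the same illustrative pure-Python solver as before.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` via the normal equations."""
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


rng = random.Random(1)
n = 20_000
# Hypothetical "true" parameters of equation 1.
mu0, beta0, alpha0, gamma0 = 1.0, 2.0, 0.5, 1.5

Z = [rng.gauss(0, 1) for _ in range(n)]                             # observed covariate
U = [rng.gauss(0, 1) for _ in range(n)]                             # unobserved confounder
D = [0.5 * z + 0.7 * u + rng.gauss(0, 1) for z, u in zip(Z, U)]     # pain, partly driven by U
y = [mu0 + beta0 * d + alpha0 * z + gamma0 * u + rng.gauss(0, 1)    # equation 1
     for d, z, u in zip(D, Z, U)]
ones = [1.0] * n

# Outcome regression that omits U (it is not in the data set).
_, b_D, b_Z = ols([ones, D, Z], y)
# Equation 2 without W0 and W1: project gamma0 * U onto the same regressors.
_, delta_D, delta_Z = ols([ones, D, Z], [gamma0 * u for u in U])

print(f"coefficient on D: {b_D:.3f}   beta0 + delta_D:  {beta0 + delta_D:.3f}")
print(f"coefficient on Z: {b_Z:.3f}   alpha0 + delta_Z: {alpha0 + delta_Z:.3f}")
```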

Note that the coefficient on D equals the true effect β₀ plus the confounding bias δ_D. The strategy is to deliberately misspecify equation 2 by entering additional variables W₁ so that pain (D) no longer contributes to the total impact of the unobserved covariates; that is, δ_D = 0, and the coefficient on D in equation 3 reduces to β₀. One can test for remaining bias by squaring the estimated residuals from equation 3, regressing them on the right-hand-side variables of equation 2, and testing directly whether δ_D ≠ 0. By re-estimating equation 3 for different sets of qualifying W₁ variables, one can test whether the resulting effect size estimates vary, i.e., whether they are robust. Because the added variables serve only to obtain an unbiased pain effect, the estimated coefficients on Z will likely remain biased (δ_Z ≠ 0), leaving the estimated effects of Z on opioid use disorder risk uninterpretable.
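The W₁ strategy and the robustness comparison can be sketched the same way. In this hypothetical setup (all names and values assumed), U is linked to pain only through a driver W1a, while a second driver W1b moves pain but is unrelated to U. Entering W1a, alone or together with W1b, makes δ_D zero, so the coefficient on D settles near the assumed β₀ = 2; sets without W1a do not qualify and stay biased. Comparing estimates across the qualifying sets is a simple version of the robustness check described above, not the formal robustness test of Lu and White (3).

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` via the normal equations."""
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


rng = random.Random(7)
n = 20_000
beta0 = 2.0                                     # hypothetical true pain effect

Z = [rng.gauss(0, 1) for _ in range(n)]
W1a = [rng.gauss(0, 1) for _ in range(n)]       # driver of pain that is also tied to U
W1b = [rng.gauss(0, 1) for _ in range(n)]       # driver of pain unrelated to U
U = [0.8 * w + rng.gauss(0, 1) for w in W1a]    # unobserved; linked to D only through W1a
D = [0.5 * z + wa + 0.6 * wb + rng.gauss(0, 1) for z, wa, wb in zip(Z, W1a, W1b)]
y = [1.0 + beta0 * d + 0.5 * z + 1.5 * u + rng.gauss(0, 1)
     for d, z, u in zip(D, Z, U)]
ones = [1.0] * n
drivers = {"W1a": W1a, "W1b": W1b}

# Re-estimate equation 3 for several candidate W1 sets; keep the coefficient on D.
estimates = {}
for names in [(), ("W1b",), ("W1a",), ("W1a", "W1b")]:
    cols = [ones, D, Z] + [drivers[name] for name in names]
    estimates[names] = ols(cols, y)[1]
    print(f"W1 = {names or 'none'}: coefficient on D = {estimates[names]:.3f}")
```

With these assumed values, the sets without W1a converge to about 2.5 and 2.6, while the two qualifying sets agree near 2.0.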

There are other strategies. For instance, two-stage residual inclusion replaces W₁ with the estimated residuals from a regression of pain on variables that are correlated with pain, do not drive opioid use disorder, and are uncorrelated with U and ϵ₀ (4).
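Two-stage residual inclusion can be sketched for the linear case under assumed values. X is a hypothetical variable that moves pain, is independent of U and of the outcome error, and does not enter the outcome equation. The first stage regresses pain on Z and X; the second stage adds the first-stage residuals to the outcome regression as a control function. In linear models this yields the same coefficient on D as two-stage least squares; Terza et al. (4) discuss the nonlinear settings where the distinction matters.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` via the normal equations."""
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


rng = random.Random(4)
n = 20_000
beta0 = 2.0                                      # hypothetical true pain effect

Z = [rng.gauss(0, 1) for _ in range(n)]
X = [rng.gauss(0, 1) for _ in range(n)]          # moves pain; independent of U and outcome error
U = [rng.gauss(0, 1) for _ in range(n)]          # unobserved confounder
D = [0.5 * z + 0.8 * x + 0.7 * u + rng.gauss(0, 1) for z, x, u in zip(Z, X, U)]
y = [1.0 + beta0 * d + 0.5 * z + 1.5 * u + rng.gauss(0, 1)
     for d, z, u in zip(D, Z, U)]
ones = [1.0] * n

# Stage 1: regress pain on Z and X and keep the residuals.
c0, cZ, cX = ols([ones, Z, X], D)
resid = [d - (c0 + cZ * z + cX * x) for d, z, x in zip(D, Z, X)]

# Stage 2: include the stage-1 residuals as an extra regressor.
naive = ols([ones, D, Z], y)[1]
two_sri = ols([ones, D, Z, resid], y)[1]
print(f"naive: {naive:.3f}   2SRI: {two_sri:.3f}   assumed true value: {beta0}")
```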

When unobserved confounders are present, correcting for them can make a difference. In our study of over 12,500 Veterans Affairs physician residents (5), we found that the association between psychologically safe learning environments and physician residents’ satisfaction with clinical learning, adjusted for 16 factors, fell by 50% when the estimate was further adjusted for unobserved confounders using a two-stage residual inclusion method.

One problem with control functions is that effect size estimates for the other covariates often remain biased and are thus uninterpretable. We have investigated an approach designed to identify the true data-generating process and then estimate its parameters. “Best approximating models” (6, 7) begin with a stochastic or exhaustive search of all possible models constructed from a list of plausible covariates, plus their nonlinear transforms and interaction terms. Each model is estimated in a way that accounts for overfitting and multicollinearity. The best-fitting model is then selected from among those that do not test positive for model misspecification. The key is to test whether the researcher’s final model is correctly specified, that is, whether it represents the true data-generating process (8, 9). Models that predict well do not necessarily reflect the true data-generating process. For instance, a misspecified candidate model that differed from a third-order power-series data-generating process by an incorrect transformation and an irrelevant predictor was observationally equivalent to it, yet it tested positive for model misspecification (8, 9). In head-to-head simulation experiments against propensity score matching (which does not account for unobserved confounders), estimates from best approximating models differed from the true effect size by less than 10% in the presence of unobserved confounders, whereas propensity score matching estimates fell between 75% below and 50% above the true effect size (10, 11).
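The search step can be illustrated, in a much simplified form, by an exhaustive search over subsets of candidate terms (raw covariates, a squared term, and an interaction), scoring each least-squares fit by the Bayesian information criterion (BIC). BIC is an assumption made for this sketch: the best approximating models described in (6, 7) also screen candidates for misspecification, which is not done here, and an exhaustive search is only feasible for short candidate lists. The data-generating process and all term names are hypothetical.

```python
import itertools
import math
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` via the normal equations."""
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


rng = random.Random(11)
n = 2_000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
# Hypothetical true process: uses x1 and its square only.
y = [1.0 + 2.0 * v + 0.5 * v * v + rng.gauss(0, 1) for v in x1]

# Candidate terms: raw covariates, a nonlinear transform, and an interaction.
candidates = {
    "x1": x1, "x2": x2, "x3": x3,
    "x1^2": [v * v for v in x1],
    "x1*x2": [u * v for u, v in zip(x1, x2)],
}
ones = [1.0] * n


def bic(names):
    """BIC of the Gaussian least-squares fit using an intercept plus `names`."""
    cols = [ones] + [candidates[name] for name in names]
    b = ols(cols, y)
    rss = sum((yi - sum(bj * col[i] for bj, col in zip(b, cols))) ** 2
              for i, yi in enumerate(y))
    return n * math.log(rss / n) + len(cols) * math.log(n)


# Exhaustive search: every subset of the candidate terms (2**5 = 32 models).
models = [combo for r in range(len(candidates) + 1)
          for combo in itertools.combinations(candidates, r)]
best = min(models, key=bic)
print("selected terms:", best)
```

With this setup the selected model should contain x1 and x1^2; whether an irrelevant term occasionally slips in depends on the sample, which is one reason a separate misspecification check is still needed.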

While Blanco et al. are to be applauded for using advanced analytics to estimate true associations from observational data sets, investigators should be encouraged to consider more principled and systematic approaches to selecting their covariates, followed by tests of whether the variable of interest is conditionally exogenous or whether the final model of the data-generating process is correctly specified.
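One ingredient of misspecification testing is the information matrix equality: for a correctly specified likelihood, the average outer product of the score equals the negative average Hessian. The sketch below checks this for a single parameter, the error variance σ² of a Gaussian linear model, and reports the ratio of the two quantities, which should be near 1 when the model is correct. It is not the generalized information matrix test of (8, 9): there is no full-matrix comparison, no test statistic, and no p-value. The data and the binary covariate g are hypothetical; omitting g turns the residuals into a two-component mixture, and with these assumed values the ratio falls to about 0.52 in large samples.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on `columns` via the normal equations."""
    k = len(columns)
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    c = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    for p in range(k):
        piv = max(range(p, k), key=lambda r: abs(a[r][p]))  # partial pivoting
        a[p], a[piv] = a[piv], a[p]
        c[p], c[piv] = c[piv], c[p]
        for r in range(p + 1, k):
            f = a[r][p] / a[p][p]
            for j in range(p, k):
                a[r][j] -= f * a[p][j]
            c[r] -= f * c[p]
    b = [0.0] * k
    for p in reversed(range(k)):
        b[p] = (c[p] - sum(a[p][j] * b[j] for j in range(p + 1, k))) / a[p][p]
    return b


def im_ratio(columns, y):
    """Mean squared score / (-mean Hessian) for sigma^2 in a Gaussian linear model."""
    b = ols(columns, y)
    n = len(y)
    resid = [yi - sum(bj * col[i] for bj, col in zip(b, columns)) for i, yi in enumerate(y)]
    s2 = sum(e * e for e in resid) / n                                  # MLE of sigma^2
    neg_hessian = 1.0 / (2.0 * s2 ** 2)                                 # exact at the MLE
    opg = sum((e * e - s2) ** 2 for e in resid) / n / (4.0 * s2 ** 4)   # score is (e^2 - s2) / (2 s2^2)
    return opg / neg_hessian


rng = random.Random(3)
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]
g = [float(rng.random() < 0.5) for _ in range(n)]    # hypothetical binary covariate
y = [1.0 + 2.0 * xi + 3.0 * gi + rng.gauss(0, 1) for xi, gi in zip(x, g)]
ones = [1.0] * n

ratio_ok = im_ratio([ones, x, g], y)    # model matches the simulated process
ratio_bad = im_ratio([ones, x], y)      # omits g, so residuals are a mixture
print(f"correct model: {ratio_ok:.3f}   misspecified model: {ratio_bad:.3f}")
```

Note that both fitted models can have well-estimated slopes on x here; the equality flags the omission through the shape of the residual distribution rather than through predictive fit.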

References

1. Blanco C, Wall MM, Okuda M, et al: Pain as a predictor of opioid use disorder in a nationally representative sample. Am J Psychiatry 2016; 173:1189–1195

2. Heckman J, Navarro-Lozano S: Using matching, instrumental variables, and control functions to estimate economic choice models. Rev Econ Stat 2004; 86:30–57

3. Lu X, White H: Robustness checks and robustness tests in applied economics. J Econom 2014; 178:194–206

4. Terza JV, Basu A, Rathouz PJ: Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. J Health Econ 2008; 27:531–543

5. Torralba KD, Loo LK, Byrne JM, et al: Does psychological safety impact the clinical learning environment for physician residents? Results from VA’s Learners’ Perceptions Survey. J Grad Med Educ (Epub ahead of print, Oct 6, 2016)

6. Westover AN, Kashner TM, Winhusen TM, et al: A systematic approach to subgroup analyses in a smoking cessation trial. Am J Drug Alcohol Abuse 2015; 41:498–507

7. Henley SS, Kashner TM, Golden RM, et al: Response to letter regarding “A systematic approach to subgroup analyses in a smoking cessation trial”. Am J Drug Alcohol Abuse 2016; 42:112–113

8. Golden RM, Henley SS, White H Jr, et al: New directions in information matrix testing: Eigenspectrum tests, in Causality, Prediction, and Specification Analysis: Recent Advances and Future Directions Essays in Honour of Halbert L. White, Jr. Edited by Swanson NR, Chen X. New York, Springer Science and Business Media, 2013, pp 145–177

9. Golden RM, Henley SS, White H, Kashner TM: Generalized information matrix tests for detecting model misspecification. Econometrics (in press)

10. Kashner TM, Chen GJ, Golden RM, et al: Recommendations from HSR&D’s Panel on Statistics and Analytics on VHA Datasets. Presented at the 2015 Health Services Research and Development/Quality Enhancement Research Initiative, Philadelphia, July 8–10, 2015

11. Kashner TM, Henley SS, Golden RM: New strategies to solve analytic challenges in HSR. Workshop presented at the Annual Research Meeting of Academy Health, Boston, June 25–28, 2016

Published In

American Journal of Psychiatry

Volume 173 • Number 12 • December 01, 2016

Pages: 1161–1162

PubMed: 27903100

History

Accepted: September 2016

Published online: 1 December 2016

Published in print: December 01, 2016

Authors

T. Michael Kashner, Ph.D., J.D.

From the Department of Medicine, Loma Linda University Medical School, Loma Linda, Calif.; the Department of Psychiatry, University of Texas Southwestern Medical Center at Dallas; and the Office of Academic Affiliations, Department of Veterans Affairs, Washington, D.C.

Notes

Address correspondence to Dr. Kashner.

Funding Information

The author reports no financial relationships with commercial interests.
