Communications and Updates

Published Online: 1 May 2012

Standards for DSM-5 Reliability

Robert L. Spitzer, M.D., Janet B.W. Williams, Ph.D., and Jean Endicott, Ph.D.

Publication: American Journal of Psychiatry

https://doi.org/10.1176/appi.ajp.2012.12010083

To the Editor: In the January issue of the Journal, Helena Chmura Kraemer, Ph.D., and colleagues (1) ask, in anticipation of the results of the DSM-5 field trial reliability study, how much reliability is reasonable to expect. They argue that standards for interpreting kappa reliability, which have been widely accepted by psychiatric researchers, are unrealistically high. Historically, psychiatric reliability studies have adopted the Fleiss standard, in which kappas below 0.4 have been considered poor (2). Kraemer and colleagues propose that kappas from 0.2 to 0.4 be considered “acceptable.” After reviewing the results of three test-retest studies in different areas of medicine (diagnosis of anemia based on conjunctival inspection, diagnosis of pediatric skin and soft tissue infections, and bimanual pelvic examinations) in which kappas fall within ranges of 0.36–0.60, 0.39–0.43, and 0.07–0.26, respectively, Kraemer et al. conclude that “to see κ_I for a DSM-5 diagnosis above 0.8 would be almost miraculous; to see κ_I between 0.6 and 0.8 would be cause for celebration.” Therefore, they note that for psychiatric diagnoses, “a realistic goal is κ_I between 0.4 and 0.6, while κ_I between 0.2 and 0.4 would be acceptable.”

When we (R.L.S., J.B.W.W.) conducted the DSM-III field trial, following the Fleiss standard, we considered kappas above 0.7 to be “good agreement as to whether or not the patient has a disorder within that diagnostic class” (3). According to the Kraemer et al. commentary, the DSM-III field trial results should be cause for celebration: the overall kappa for axis I disorders in the test-retest cohort (the one most comparable methodologically to the DSM-5 sample) was 0.66 (3). Therefore, test-retest diagnostic reliability of at least 0.6 is achievable by clinicians in a real-world practice setting, and any results below that standard are a cause for concern.

Kraemer and colleagues' central argument for these diagnostic reliability standards is to ensure that “our expectations of DSM-5 diagnoses…not be set unrealistically high, exceeding the standards that pertain to the rest of medicine.” Although the few cited test-retest studies have kappas averaging around 0.4, it is misleading to depict these as the “standards” of what is acceptable reliability in medicine. For example, the authors of the pediatric skin lesion study (4) characterized their measured test-retest reliability of 0.39–0.43 as “poor.” Calling for psychiatry to accept kappa values that are characterized as unreliable in other fields of medicine is taking a step backward. One hopes that the DSM-5 reliability results are at least as good as the DSM-III results, if not better.
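The thresholds discussed above can be made concrete with a minimal sketch. It assumes Cohen's kappa for two binary ratings of the same patients (a first and a second interview); the intraclass kappa (κ_I) used by Kraemer et al. is a related but distinct estimator that is not implemented here. The function names and patient counts are hypothetical, and the treatment of values falling exactly on a range boundary is an assumption, since the quoted ranges do not specify it.

```python
def cohen_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for a 2x2 test-retest table of a binary diagnosis.

    both_pos    : patients diagnosed at both interviews
    first_only  : diagnosed at the first interview only
    second_only : diagnosed at the second interview only
    both_neg    : not diagnosed at either interview
    """
    n = both_pos + first_only + second_only + both_neg
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_pos + both_neg) / n      # raw agreement
    p_first = (both_pos + first_only) / n     # base rate, first interview
    p_second = (both_pos + second_only) / n   # base rate, second interview
    # Agreement expected by chance if the two ratings were independent.
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def kraemer_label(kappa):
    """Label under the ranges quoted from Kraemer et al. in this letter.

    Boundary handling (>= versus >) is an assumption of this sketch.
    """
    if kappa > 0.8:
        return "almost miraculous"
    if kappa >= 0.6:
        return "cause for celebration"
    if kappa >= 0.4:
        return "realistic goal"
    if kappa >= 0.2:
        return "acceptable"
    return "below the ranges quoted in the letter"


def poor_by_fleiss(kappa):
    """True if kappa falls below 0.4, the 'poor' cutoff attributed to Fleiss."""
    return kappa < 0.4


if __name__ == "__main__":
    # Hypothetical counts for 100 patients interviewed twice.
    k = cohen_kappa(both_pos=20, first_only=10, second_only=10, both_neg=60)
    print(round(k, 2), kraemer_label(k), poor_by_fleiss(k))
    # Values taken from the letter: the DSM-III test-retest kappa and the
    # lower bound reported for the pediatric skin infection study.
    for value in (0.66, 0.39):
        print(value, kraemer_label(value), poor_by_fleiss(value))
```

In this sketch a kappa of 0.39 is labeled "acceptable" under the proposed ranges yet "poor" under the Fleiss cutoff, which is the disagreement the letter is about.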

Footnote

Accepted for publication in March 2012.

References

1. Kraemer HC, Kupfer DJ, Clarke DE, Narrow WE, Regier DA: DSM-5: how reliable is reliable enough? Am J Psychiatry 2012; 169:13–15

2. Fleiss J: Statistical Methods for Rates and Proportions, 2nd ed. New York, Wiley, 1981

3. Spitzer R, Forman J, Nee J: DSM-III field trials, I: initial interrater diagnostic reliability. Am J Psychiatry 1979; 136:815–817

4. Marin JR, Bilker W, Lautenbach E, Alpern ER: Reliability of clinical examinations for pediatric skin and soft-tissue infections. Pediatrics 2010; 126:925–930

Published In

American Journal of Psychiatry

Volume 169 • Number 5 • May 2012

Pages: 537

PubMed: 22549210

History

Published in print: May 2012

Authors

Robert L. Spitzer, M.D.

Princeton, N.J.

Janet B.W. Williams, Ph.D.

Princeton, N.J.

Jean Endicott, Ph.D.

New York City

Funding Information

Dr. Spitzer reports no financial relationships with commercial interests. Dr. Williams works for MedAvante, a pharmaceutical services company. Dr. Endicott has received research support from Cyberonics, the New York State Office of Mental Hygiene, and NIH and has served as a consultant or advisory board member for AstraZeneca, Bayer Schering, Berlex, Cyberonics, Eli Lilly, Forest Laboratories, GlaxoSmithKline, Otsuka, Shire, and Wyeth-Ayerst.
