To the Editor: Homage must be paid to the DSM-III field trials (1) that strongly influenced the design of the DSM-5 field trials. It could hardly be otherwise, since methods for evaluating categorical diagnoses were developed for DSM-III by Dr. Spitzer and his colleagues, Drs. Fleiss and Cohen. However, in the 30 years after 1979, the methodology and the understanding of kappa have advanced (2), and DSM-5 reflects that as well.
Like the DSM-III field trials, the DSM-5 field trials sampled typical clinic patients. However, in the DSM-III field trials, participating clinicians were allowed to select the patients to evaluate and were trusted to report all results. In the DSM-5 field trials, symptomatic patients at each site were referred to a research associate for consent, assigned to an appropriate stratum, and randomly assigned to two participating clinicians for evaluation, with electronic data entry. In the DSM-III field trials, the necessary independence of the two clinicians evaluating each patient was taken on trust. Stronger blinding protections were implemented in the DSM-5 field trials. Selection bias and lack of blinding tend to inflate kappas.
The sample sizes used in DSM-III, by current standards, were small. There appear to be only three diagnoses for which 25 or more cases were seen: any axis II personality disorder (kappa=0.54), all affective disorders (kappa=0.59), and the subcategory of major affective disorders (kappa=0.65). Four kappas of 1.00 were reported, each based on three or fewer cases; two kappas below zero were also reported based on 0–1 cases. In the absence of confidence intervals, other kappas may have been badly under- or overestimated. Since the kappas differ from one diagnosis to another, the overall kappa cited is uninterpretable (1).
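A minimal sketch, not taken from the DSM-III or DSM-5 analyses, may make the small-sample concern concrete. It computes Cohen's kappa for two clinicians who each rate the same patients as positive or negative for one diagnosis. The function name `cohen_kappa` and every count below are hypothetical, chosen only for illustration.

```python
def cohen_kappa(both_pos, only_first, only_second, both_neg):
    """Cohen's kappa for two raters and a yes/no diagnosis.

    Arguments are the four cells of the 2x2 agreement table
    (hypothetical counts, not field-trial data).
    """
    n = both_pos + only_first + only_second + both_neg
    observed = (both_pos + both_neg) / n          # observed agreement
    p_first = (both_pos + only_first) / n         # rater 1 positive rate
    p_second = (both_pos + only_second) / n       # rater 2 positive rate
    # Agreement expected by chance from the two marginal rates
    expected = p_first * p_second + (1 - p_first) * (1 - p_second)
    if expected == 1:
        return float("nan")                       # kappa is undefined
    return (observed - expected) / (1 - expected)


# 20 hypothetical patients, 3 cases, perfect agreement
print(f"{cohen_kappa(3, 0, 0, 17):.2f}")   # 1.00
# Same sample, but one clinician misses a single case
print(f"{cohen_kappa(2, 1, 0, 17):.2f}")   # 0.77
# Each clinician diagnoses one case, but never the same patient
print(f"{cohen_kappa(0, 1, 1, 18):.2f}")   # -0.05
```

With so few cases, changing one rating moves the estimate from 1.00 to about 0.77, and a table with a single case per clinician can yield a kappa below zero, which is why a point estimate without a confidence interval says little in this setting.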
Standards reflect not what we hope ideally to achieve but what the reliabilities are of diagnoses that are actually useful in practice. Recognizing the possible inflation in DSM-III and DSM-IV results, DSM-5 did not base its standards for kappa entirely on their findings. Fleiss articulated his standards before 1979, when there was little experience using kappa. Are the experience-based standards (3) we proposed unreasonable? There seems to be major disagreement only about kappas between 0.2 and 0.4. We indicated that such kappas might be acceptable with low-prevalence disorders, where a small amount of random error can overwhelm a weak signal. For such disorders, higher kappas may be achievable only under the following conditions: when we use longitudinal follow-up rather than a single interview; when we use biological markers that are not yet known; when we use specialists in that particular disorder; when we deal more effectively with comorbidity; and when we accept that “one size does not fit all” and develop personalized diagnostic procedures.
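The effect of low prevalence on kappa can be sketched under a simple model that is an assumption of this example, not the authors' method: two clinicians with the same sensitivity and specificity, making errors independently of each other given the patient's true status. The function name `kappa_at_prevalence` and the values 0.80 and 0.95 are hypothetical.

```python
def kappa_at_prevalence(prevalence, sensitivity, specificity):
    """Expected kappa between two raters under an assumed model:
    identical sensitivity/specificity and errors that are
    independent given the patient's true status."""
    # Probability that one rater calls a randomly chosen patient positive
    p_pos = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
    # Probability that both raters say positive, or both say negative
    both_pos = (prevalence * sensitivity ** 2
                + (1 - prevalence) * (1 - specificity) ** 2)
    both_neg = (prevalence * (1 - sensitivity) ** 2
                + (1 - prevalence) * specificity ** 2)
    observed = both_pos + both_neg
    expected = p_pos ** 2 + (1 - p_pos) ** 2      # chance agreement
    if expected == 1:
        return float("nan")                       # kappa is undefined
    return (observed - expected) / (1 - expected)


# Same hypothetical raters, different prevalence of the disorder
for prevalence in (0.50, 0.10, 0.02):
    kappa = kappa_at_prevalence(prevalence, sensitivity=0.80, specificity=0.95)
    print(f"prevalence={prevalence:.2f}  kappa={kappa:.2f}")
```

Under these assumptions the same pair of raters yields a kappa of roughly 0.58 at 50% prevalence and roughly 0.18 at 2% prevalence: the raters' error rates have not changed, but false positives among the many unaffected patients swamp the few true cases.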
Greater validity may be achievable only with a small decrease in reliability. The goal of DSM-5 is to maintain acceptable reliability while increasing validity based on the accumulated research and clinical experience since DSM-IV. The goal of the DSM-5 field trials is to present accurate and precise estimates of the reliability of DSM-5 diagnoses when they are made for real patients in real clinics by real clinicians trained in DSM-5 criteria.

Footnote

Accepted for publication in March 2012.

References

1. Spitzer RL, Forman JBW, Nee J: DSM-III field trials, I: initial interrater diagnostic reliability. Am J Psychiatry 1979; 136:815–817
2. Kraemer HC: Evaluating Medical Tests: Objective and Quantitative Guidelines. Newbury Park, Calif, Sage Publications, 1992
3. Kraemer HC, Kupfer DJ, Clarke DE, Narrow WE, Regier DA: DSM-5: how reliable is reliable enough? Am J Psychiatry 2012; 169:13–15

Information & Authors

Information

Published In

American Journal of Psychiatry
Pages: 537–538
PubMed: 22549210

History

Accepted: March 2012
Published online: 1 May 2012
Published in print: May 2012

Authors

Affiliations

Helena Chmura Kraemer, Ph.D.
David J. Kupfer, M.D.
Diana E. Clarke, Ph.D.
William E. Narrow, M.D., M.P.H.
Darrel A. Regier, M.D., M.P.H.

Funding Information

The authors' disclosures accompany the original commentary.
