To the Editor: We are highly appreciative of the excellent contributions provided by the authors of this letter, most of whom are members of the DSM-5 Substance-Related Disorders Work Group. We also welcome the opportunity to focus additional attention on the differences between previous DSM field trials and those conducted for DSM-5. Contrary to the initial statement in the Hasin et al. letter, test-retest reliability was a major focus of the DSM-5 field trials but not of the DSM-IV field trial. The latter was more concerned with the changes in prevalence that would occur if the same patients were examined using DSM-III, DSM-III-R, and ICD-10 criteria rather than the minimally changed DSM-IV criteria (1). The reliability of DSM-IV criteria for substance use disorders noted by Hasin et al. was derived from multiple studies unrelated to the DSM-IV field trial, in which highly trained interviewers used structured research interviews. These studies were often epidemiological, with a special interest in substance use disorders (2). They were not conducted in routine clinical settings, where DSM is most widely used and structured diagnostic interviews are not.
In research settings, it is well known that it is possible to maximize reliability, regardless of the validity of a diagnosis, by using selected populations, structured interviews, and highly trained interviewers. Robert Spitzer noted this in a 1997 review of the progression of diagnostic systems to that point in time: “Many studies show that researchers using structured interviews and assessing targeted populations (e.g., patients attending an anxiety disorder clinic) can achieve high reliability. However, the reliability of diagnosis in actual clinical practice by clinicians who have varying commitment to actually using the DSM criteria is unknown, but is probably low” (3, p. 13).
The use of clinicians (or nonclinicians) with special training and required structured interviews would indeed explain the discrepancy between the kappas. It would mean that, as a measure of how reliable the diagnosis would be in routine practice, the DSM-IV kappas are overestimates. Conversely, as a measure of how reliable the DSM-5 diagnosis could be under ideal conditions that usually do not obtain in practice, the DSM-5 kappas are underestimates (4). Again, we return to the reliability of diagnosis under ideal conditions versus its reliability under usual conditions. Which is of interest: the reliability that affects patient care or the reliability that could be obtained under ideal conditions for research purposes?
This is the same argument as that made for efficacy versus effectiveness trials (5). Does one want to know how a treatment works under ideal conditions in patients with no other comorbid conditions, a rare occurrence in actual practice, or how it works when clinicians use it with their patients under usual conditions? Clinicians and patients may answer this question quite differently from investigators trying to develop new treatments. The distinction is crucial, since treatments that work well under ideal conditions may perform differently when used in clinical settings.
With that in mind, the decision was made early in the development of DSM-5 to focus on improving the lot of psychiatric patients and to leave the optimization of diagnosis in research settings to the researchers in those settings. It was recognized that DSM-5 reliabilities would be lower than the reported DSM-IV reliabilities obtained under ideal conditions but would be comparable to the reliabilities of many medical diagnoses (6). We thought it was timely to invest resources in determining how our diagnostic criteria would function in usual settings, which could also advance our understanding of the validity of DSM-5 diagnoses.