To the Editor: In their recent article on the evaluation of the psychometric properties of the DSM-IV axis V scales, Mark J. Hilsenroth, Ph.D., and colleagues (1) reported that three instruments (i.e., the Global Assessment of Functioning Scale, the Global Assessment of Relational Functioning Scale, and the Social and Occupational Functioning Assessment Scale) “all exhibited very high levels of interrater reliability” (p. 1858), and they supported this finding with impressively high intraclass correlation coefficients (ICCs), which were “all in the excellent range (ICCs>0.74)” (p. 1860). My experience in evaluating comparable instruments designed for the assessment of the similarly conceptualized disability axis (axis II) of the Multiaxial Presentation of ICD-10 for Use in Adult Psychiatry (2) is markedly different.
The disability axis of the ICD-10 multiaxial system is accompanied by the World Health Organization (WHO) Short Disability Assessment Schedule (3), a semistructured instrument intended for clinician assessment of the following four specific areas of functioning in patients with mental disorders: personal care, occupation, functioning in relation to family and household members, and functioning in a broader social context, including participation in leisure and other social activities. The field trial version of the WHO Short Disability Assessment Schedule had a format similar to that of the DSM-IV axis V instruments and was based on a continuous scale ranging from 0 (“no disability”) to 99 (“gross disability”). The interrater reliability of the instrument was tested in the context of two international field trials of the ICD-10 multiaxial system, which involved 274 clinicians from 21 countries spanning multiple continents. The design of the field trials included the following steps: familiarization with the protocols and instruments and their pretest application on five local psychiatric patients, the rating of 12 internationally prepared psychiatric case vignettes, and assessment (by two clinicians making independent ratings) of 10 psychiatric patients selected locally in an unbiased manner.
With few exceptions, the ICCs for the WHO Short Disability Assessment Schedule disability categories ranged from 0.13 to 0.45, thus indicating poor reliability of the instrument. After these results were obtained in the first field trial, the WHO revised the Short Disability Assessment Schedule to improve its reliability, recasting it as a set of 6-point scales with precisely defined anchor points, accompanied by suggested guiding questions for clinicians to use in their exploration of the specific areas of functioning covered by the instrument. The interrater reliability of the revised version of the WHO Short Disability Assessment Schedule was tested in a second, smaller international field trial, which produced better but still hardly satisfactory interrater agreement (ICC range=0.40–0.74) (2).
In view of the field test results for the WHO Short Disability Assessment Schedule and the previously identified modest reliability of the DSM-IV axis V measures (3), might the exceptionally high ICCs reported by Dr. Hilsenroth and colleagues be more reflective of their within-center interrater reliability (based on extensive instrument administration training and high motivation of the clinicians) than of the true psychometric properties of the DSM-IV axis V scales? In other words, I doubt that these scales would produce similar reliability results if tested internationally by a group of less-devoted clinicians belonging to different schools of psychiatry and psychiatric traditions.