Uneasy lies the head that wears the crown, and a long-reigning monarch of psychiatric measurement may well see pitchforks and torches approaching the castle walls.
The Hamilton Depression Rating Scale became the de facto gold standard for calibrating depression severity in the decades after its development by British psychiatrist Max Hamilton in 1960. Psychiatrists need a reliable tool to measure the effects of treatment in both the research lab and the clinic, but many have questioned the utility of the Hamilton in recent years.
“Everyone knows there's a problem with the Hamilton Depression scale, but what's better?” asked R. Michael Bagby, Ph.D., of the Center for Addiction and Mental Health in Toronto.
“The aim of psychiatric treatment is remission, but you don't know if the patient is in remission unless you have a scale to measure signs and symptoms and their severity before and after treatment,” said A. John Rush, M.D., of the University of Texas Southwestern Medical Center in Dallas.
Bagby and four colleagues conducted a systematic review of 70 studies of the Hamilton scale. They found it somewhat reliable but “psychometrically and conceptually flawed,” they wrote in the December 2004 American Journal of Psychiatry.
Support for Bagby's team's analysis came in part from Eli Lilly and Co., as well as the Ontario Mental Health Foundation and the Michael Smith Foundation for Health Research. Besides clinicians and researchers, the pharmaceutical industry and regulators would like to find a reliable, widely accepted scale, one that would clarify for all parties how well a drug was working.
“There's so much error in these instruments that medicines are not getting into the marketplace,” said Bagby.
The researchers based their review on item response theory, which favors selecting the items most sensitive to change. Psychometric theory may sound like dry stuff, but it's very important, said Bagby, whose background is in statistics and test construction. Applying item response theory to the Hamilton scale revealed weaknesses at every level, he said.
“The Hamilton depression scale's internal reliability is adequate, but many scale items are poor contributors to the measurement of depression severity,” wrote the researchers. “Others have poor interrater and retest reliability.”
To be useful, each item on a scale should measure just one symptom and rate higher or lower amounts of that symptom. Listing unrelated elements as part of the same question only confounds the outcome. For instance, the general somatic symptoms item encodes “feelings of heaviness,” “diffuse backache,” and “loss of energy”—hardly steps along a continuum of severity, said Bagby.
Inclusion of psychotic symptoms may violate these parameters too.
“A patient with guilt-themed hallucinations may be more severely ill than a patient who has nonpsychotic guilt feelings, but is he or she feeling more guilt?” he asked.
A further problem arises because some items can be scored with more possible points than others. For instance, feeling tired all the time contributes at most two points to the general somatic symptoms item, while weeping all the time may contribute three or four points to the depressed mood item. This gives more weight to weepiness than to tiredness.
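The weighting problem can be made concrete with a small sketch. The item names and maximum scores below are simplified illustrations drawn from the example in the text, not the actual HAM-D scoring manual: two patients whose single symptom is equally pervasive end up with different totals simply because their items carry different maxima.

```python
# Hypothetical item maxima (illustrative, not the official HAM-D manual):
hamd_item_max = {
    "depressed mood": 4,            # constant weeping can score 3-4
    "general somatic symptoms": 2,  # constant fatigue tops out at 2
}

# Two patients, each with one pervasive symptom scored at its item maximum.
patient_weepy = {"depressed mood": 4, "general somatic symptoms": 0}
patient_tired = {"depressed mood": 0, "general somatic symptoms": 2}

def hamd_total(scores):
    """Sum item scores into a scale total, as the HAM-D does."""
    return sum(scores.values())

print(hamd_total(patient_weepy))  # 4
print(hamd_total(patient_tired))  # 2
```

The totals differ by a factor of two even though neither patient is, on the face of it, more severely ill; the difference is an artifact of how many points each item can contribute.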
Yet another question is whether the Hamilton is showing its age. Can a test designed 45 years ago accurately reflect current standards defining depression? Several items on the Hamilton scale (like loss of insight or hypochondriasis) are not among DSM-IV diagnostic criteria for depression, while some DSM-IV features (like weight gain or oversleeping) are not on the Hamilton.
Attempts have been made in recent decades to improve the Hamilton scale. The GRID-HAMD scale (2002) revised questions on the Hamilton scale but kept the original 17 items, retaining their discontinuities with DSM-IV definitions of depression, said Bagby.
Administering the Hamilton in a structured interview has improved reliability, said Rush, but doesn't fix the problem of differentially weighted items. “So a scale total is not as valid as it could be.”
A more radical solution is needed, Bagby argues.
“It is time to retire the Hamilton depression scale,” he said, rather than merely revise it.
“The Hamilton has seen better days and should be replaced,” agreed Rush. “Psychiatry would be better served by a rating scale that measured the signs and symptoms of the syndrome and also their severity.”
Alternatives do exist. The Montgomery-Asberg Depression Rating Scale (MADRS), published in 1979, has achieved some acceptance. Rush and colleagues published the Inventory of Depressive Symptoms (IDS) in 1985 and have been refining and validating it ever since.
The IDS and its short version, the Quick IDS (QIDS), capture all nine DSM-IV criteria for depression, said Rush. All items are scored with the same three-point severity scale. It picks up common, associated symptoms and can be used in clinical practice as well as in research. “Doctors can use the same scale in practice that they read about in journals,” he said.
A self-report version that patients fill out compares well with the IDS and demands less clinician time, said Rush.
APA is working with the American College of Physicians and the American Academy of Family Physicians to develop a nine-item Patient Health Questionnaire (the PHQ-9) with the support of Pfizer Inc. Each item is graded from 0 (“not at all”) to 3 (“nearly every day”). A simple chart translates scores into provisional diagnoses and treatment recommendations for primary care physicians.
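The PHQ-9's arithmetic is straightforward: nine items rated 0 to 3 yield totals from 0 to 27. The severity bands in the sketch below are the commonly cited cutoffs for the instrument, included here as an assumption; the interpretation chart described in the text may differ in detail.

```python
def phq9_total(responses):
    """Sum nine item responses, each rated 0 ("not at all") to 3
    ("nearly every day"), giving a total from 0 to 27."""
    assert len(responses) == 9 and all(0 <= r <= 3 for r in responses)
    return sum(responses)

def phq9_severity(total):
    """Map a total to a severity band (commonly cited cutoffs; the
    article's own chart is not reproduced here)."""
    if total < 5:
        return "minimal"
    if total < 10:
        return "mild"
    if total < 15:
        return "moderate"
    if total < 20:
        return "moderately severe"
    return "severe"

print(phq9_total([1] * 9))   # 9
print(phq9_severity(9))      # mild
```

Because every item shares the same 0-to-3 range, no single symptom can dominate the total, which is exactly the property critics find missing in the Hamilton.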
The Depression Inventory Development team, a Toronto-based consortium of 14 pharmaceutical companies and representatives from academia, is working on a new depression rating scale.
Even regulators appear open to new standards.
“The HAMD has been the most widely used depression instrument for many years; however, we have always been willing to accept other valid instruments, and in fact some programs now use the MADRS,” said a spokesperson for the Food and Drug Administration in a statement. “We have also indicated that we would accept the IDS.”
“Dr. Bagby's research has important implications for the future of DSM and how instruments like this should be considered,” said Darrel Regier, M.D., M.P.H., director of APA's Office of Research and executive director of the American Psychiatric Institute for Research and Education. “We must pay attention to how to characterize severity and how to link measurement to diagnostic criteria. Then we would be better prepared to study the accuracy of diagnosis and medical treatment.”
Am J Psychiatry 2004; 161:2163