Delirium has a high point prevalence, occurring in one out of five patients in general hospitals (1). Polypharmacy, organ insufficiency, postsurgical states, and greater medical burden are causal factors (2). Preexisting cognitive impairment and any brain disorder are major risk factors for delirium, as is advanced age (3). Delirium is associated with longer hospital stay, more medical complications, greater difficulty with rehabilitation compliance, psychological sequelae, and increased mortality during and after hospitalization (4). Delirium among elderly persons is associated with long-term cognitive and functional decline in the year following hospitalization, attributed to either acceleration of preexisting cognitive impairment or new insults as a result of medical conditions, anesthesia, or surgical procedures (5).
A brief and easy-to-administer tool for use by nonexpert physicians and nurses to routinely detect delirium in at-risk patient populations is needed. The challenge is to not compromise validity or standardization of administration, given the breadth and unique differentiating features of delirium symptoms.
Other screening tools are intended for nurses’ use, but reviews of such tools have not been prevalent in the literature. These tools usually involve observations of behaviors rather than phenomenological assessment of core delirium characteristics, although one could argue that the items are more accessible to the work focus of nursing staff. Additionally, the use of these tools does not provide a provisional diagnosis. The Single Question in Delirium, a single screening question to family members asking whether the patient is more confused lately, had an 80% sensitivity and a 71% specificity, and its sensitivity outperformed the CAM-A by 40% when administered by untrained staff (20). The Nursing Delirium Screening Scale does not assess attention, a cardinal feature of delirium (21). The Delirium Observation Screening Scale rates 25 behavioral observations (22), and a 13-item algorithm was developed from it. The Neecham Confusion Scale rates cognition, behaviors, and three nonbrain body functions (23). The Delirium-O-Meter rates 12 behaviors (24).
The main objective of the present study was to translate the DDT-Pro into Spanish and to validate its performance among patients ≥60 years old who were admitted to an internal medicine service. A second aim was to compare the performance of the commonly used original CAM-A version and that of the DDT-Pro against independently ascertained reference standards for delirium diagnosis: DSM-5 and the DRS-R-98.
Results
Face Validity of the DDT-Pro Spanish Translation
Mean scores from the expert survey about the comprehensibility of language and content of the tool and its instructions, as a final step of the translation process, ranged between 4.0 and 5.5 (score range, 1–6). The scale was slightly modified according to the experts’ suggestions for improvement, and the final Spanish version of the DDT-Pro was approved by both the translators and the authors and then used in the assessment of patients.
Characteristics of the Study Sample
All assessments were performed during day shifts, most within the same 8-hour period for a given patient. The patient flow diagram is presented in Figure 1. Demographic and clinical characteristics of the 200 study subjects, 74 (37.0%) of whom had a diagnosis of dementia, are summarized in Table 1. Fifty (25.0%) patients met DSM-5 criteria for delirium; of these, 34/50 (68.0%) had comorbid dementia. As expected, patients with delirium were older, had more comorbid conditions, and had a lower education level and higher dementia prevalence compared with patients without delirium. When the DRS-R-98 cutoff score of >14 for a diagnosis of delirium was used, 40 (20%) patients from the study sample had a positive diagnosis, 28/40 (70%) of whom had comorbid dementia. In the subgroup of 74 patients with a diagnosis of dementia, 34/74 (45.9%) met DSM-5 criteria for delirium, and 28/74 (37.8%) met DRS-R-98 criteria for delirium.
Description of DDT-Pro Scores, Concordance on Item Ratings, and Reliability
All DDT-Pro median scores (interquartile range) were significantly different between patients who met DSM-5 criteria for delirium and patients without delirium, regardless of who administered the measure or whether comparisons were made in the dementia subgroup (for further details, see Table S1 in the online supplement). Many of the percentages of DDT-Pro item ratings were similar, irrespective of administration by the physician or nurse (for further details, see Table S2 in the online supplement). The item with the highest number of concordances between raters was vigilance, followed by sleep-wake, and comprehension.
The intraclass correlation coefficient between the equivalent A and B forms of the DDT-Pro, as independently administered by a physician or nurse, was 0.873 (95% CI=0.832–0.904). In the dementia subgroup, it was also very good (intraclass correlation coefficient=0.875, 95% CI=0.801–0.921).
Internal Consistency
The internal consistency of the DDT-Pro was very good (Cronbach’s α for administration by the physician and nurse, 0.809 and 0.816, respectively). All items were important for consistency, because the alpha value lowered or remained almost the same when any item was removed (range, 0.698–0.794 and 0.669–0.823 for administration by the physician and nurse, respectively) and because all item-scale correlations were high (≥0.624 and ≥0.594 for the physician and nurse, respectively). In addition, the DDT-Pro had a very good Cronbach’s alpha in the subgroup of patients with dementia (0.824 and 0.843 for the physician and nurse, respectively).
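The internal-consistency analysis described above can be illustrated with a minimal sketch. It assumes a standard Cronbach's α computed from per-item score lists and recomputed with each item removed; the function names and the toy ratings are hypothetical and are not the study data or the authors' software.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of per-item score lists (one entry per patient)."""
    k = len(items)
    # Total scale score per patient = sum across items.
    totals = [sum(patient_scores) for patient_scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed after dropping each item in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical ratings for three items in five patients (illustration only).
toy_items = [
    [0, 1, 2, 3, 3],  # item 1
    [0, 1, 1, 3, 2],  # item 2
    [1, 1, 2, 3, 3],  # item 3
]
print(round(cronbach_alpha(toy_items), 3))
print([round(a, 3) for a in alpha_if_item_deleted(toy_items)])
```

If removing an item leaves α lower or roughly unchanged, that item is contributing to the consistency of the scale, which is the reasoning applied in the paragraph above.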
Convergent Validity
As expected, the DDT-Pro and the DRS-R-98 scores were inversely related according to Spearman’s rho. In the whole sample, the DDT-Pro administered by the physician correlated with the DRS-R-98 Total score at –0.698 and with the DRS-R-98 Severity score at –0.701 (−0.693 in both cases for the nurse). Correlations were even better in the dementia subgroup (−0.829 and –0.848 for the physician; –0.820 and –0.821 for the nurse).
When we assessed correlations only in the DSM-5 delirium cases (N=50), DDT-Pro scores correlated well with DRS-R-98 Total scores (−0.775 and –0.687 for administration by the physician and nurse, respectively) and DRS-R-98 Severity scores (−0.775 and −0.681 for administration by the physician and nurse, respectively). In the subgroup of patients with DSM-5 delirium and comorbid dementia (N=34), Spearman’s rho for DDT-Pro with the DRS-R-98 Total was –0.798 for administration by the physician and –0.615 for administration by the nurse, and Spearman’s rho for DDT-Pro with DRS-R-98 Severity was –0.803 for the physician and –0.608 for the nurse.
Criterion Validity and Selection of the DDT-Pro Cutoff Score for Delirium
The area under the curve (AUC) for the ROC analysis of the DDT-Pro for accuracy of the diagnosis of delirium per DSM-5 criteria as administered by the physician was 94.1% (95% CI=90.6%−97.5%), and as administered by the nurse it was 93.8% (95% CI=90.0%−97.6%). Accuracy of the DDT-Pro by both evaluators in the subgroup of patients with dementia was >92.3%. AUC of the DDT-Pro for DRS-R-98 delirium as administered by the physician was 94.9% (95% CI=91.6%−98.1%), and as administered by the nurse it was 96.3% (95% CI=93.9%−98.7%). Accuracy for both evaluators in the dementia subgroup was always >93.3%.
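As a rough illustration of what the AUC expresses here, the sketch below computes it as the probability that a randomly chosen patient with delirium has a lower score than a randomly chosen patient without delirium, with ties counted as one half. Lower scores are treated as indicating delirium, consistent with the ≤ cutoff used in this study. The function name and the scores are hypothetical; this is not the authors' analysis code.

```python
def auc_lower_is_positive(case_scores, control_scores):
    """AUC via pairwise comparison (Mann-Whitney formulation).

    A pair counts fully when the case scores lower than the control,
    and counts one half when the two scores are tied.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case < control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical scores (illustration only).
delirium = [3, 6]
no_delirium = [6, 8]
print(auc_lower_is_positive(delirium, no_delirium))  # 0.875
```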
Figure 2 shows that the AUCs for the physician administration were not significantly different from those of the nurse, nor were the AUCs for the physician or nurse in the whole sample different from those in the dementia subgroup.
ROC analysis sensitivity and specificity values for DDT-Pro diagnosis of delirium compared with either DSM-5 delirium or DRS-R-98 delirium are presented in Table 2. A gradient for ascending sensitivity and descending specificity was found. DDT-Pro scores ≤5 and ≤6 both had balanced sensitivity and specificity, although scores ≤6 had higher sensitivity, whereas scores ≤5 had higher specificity. Therefore, we selected a score of ≤6 as the DDT-Pro cutoff for delirium diagnosis.
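The trade-off between candidate cutoffs can be sketched as follows, assuming that a score at or below the cutoff is classified as positive for delirium. The function name, scores, and reference labels are hypothetical and serve only to show how raising the cutoff increases sensitivity at the expense of specificity.

```python
def sensitivity_specificity(scores, has_delirium, cutoff):
    """Sensitivity and specificity when score <= cutoff is called positive."""
    tp = fn = fp = tn = 0
    for score, reference_positive in zip(scores, has_delirium):
        test_positive = score <= cutoff
        if reference_positive and test_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and reference diagnoses (illustration only).
scores = [2, 4, 6, 7, 8, 9, 5, 6]
reference = [True, True, True, False, False, False, True, False]

for cutoff in (5, 6):
    sens, spec = sensitivity_specificity(scores, reference, cutoff)
    print(f"cutoff <= {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
# cutoff <= 5: sensitivity=0.75, specificity=1.00
# cutoff <= 6: sensitivity=1.00, specificity=0.75
```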
Comparisons and Performance of the DDT-Pro and CAM-A
Concordance crosstabs for delirium determined by DDT-Pro (≤6 cutoff score) compared with delirium determined by CAM-A are presented in Table 3. CAM-A was administered only by the expert physician. The results were discordant as a consequence of classification differences for one or the other test, regardless of the DDT-Pro administration by the physician (Cochran Q=19.703, p<0.001) or by the nurse (Cochran Q=21.125, p<0.001). The tests were also discordant in the dementia subsample when the DDT-Pro was administered by the physician (Cochran Q=12.800, p<0.001) or the nurse (Cochran Q=13.235, p<0.001).
Accuracy of the DDT-Pro at the cutoff score of ≤6 was comparable to the CAM-A, when compared with the reference standards of DSM-5 criteria and with the DRS-R-98 Total scale cutoff score of >14 (Table 4). Almost all accuracies were within the 80%−90% range. However, accuracies were lower in the dementia subgroup than in the whole sample for both tools, although more so for the CAM-A (>10 percentage points decline versus 6 percentage points for DDT-Pro).
Sensitivity of the DDT-Pro at the cutoff score of ≤6 was very high, ranging between 88.0% and 100%, considering all scenarios, whereas the CAM-A values were much lower, in the 61.8%−70.0% range. Furthermore, DDT-Pro sensitivity values were higher in the dementia subgroup (91.2%−100%) than in the whole sample, whereas CAM-A values were even lower (61.8%−64.3%). Specificity values tended to be higher for the CAM-A (84.8%−95.3%) than for the DDT-Pro (67.4%−86.7%).
Indicators assessing the likelihood of a patient actually having delirium when results from the DDT-Pro or the CAM-A were positive for the disorder (PPVs, +LRs), or not having it when the tools revealed negative results (NPVs, –LRs), are in accordance with indicators assessing internal properties of these instruments (sensitivity, specificity). At the cutoff score of ≤6, all DDT-Pro NPVs were >90%, whereas the PPVs were lower. Conversely, CAM-A NPVs ranged from 73.5% to 92.5%, with PPVs a little better than those of the DDT-Pro. The +LRs of the CAM-A tended to be higher than those of the DDT-Pro (i.e., better), and the –LRs of DDT-Pro tended to be lower than those of the CAM-A (i.e., better).
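The relationship among these indicators can be made concrete with a short sketch that derives them from a 2×2 table using the standard definitions. The function name and the counts are hypothetical (they are not taken from Table 4) and were chosen only so that the arithmetic is easy to follow.

```python
def diagnostic_indicators(tp, fp, fn, tn):
    """Standard diagnostic indicators from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # probability of delirium given a positive test
        "npv": tn / (tn + fn),  # probability of no delirium given a negative test
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts (illustration only): 50 cases and 150 non-cases.
result = diagnostic_indicators(tp=45, fp=30, fn=5, tn=120)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, high sensitivity yields a high NPV (0.960) and a low –LR (0.125), whereas the moderate specificity limits the PPV (0.600) and the +LR (4.5). This is the same pattern of trade-offs described qualitatively above.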
Sensitivity of the DDT-Pro to Patients’ Clinical Change
Delirium clinical improvement at day 4 or 5 of the study could be documented in 18 patients’ charts, 13 of whom had comorbid dementia. The median DDT-Pro score at follow-up was 7.0 (interquartile range, 6.0–8.0), which implies a three-point difference from the initial score by the physician, who had a median of 4.0 (interquartile range, 1.7–5.0; Z score of Wilcoxon test=–3.741, p<0.001), and the same regarding the nurse, who had a median of 4.0 (interquartile range, 1.7–6.0; Z score of Wilcoxon test=–3.312, p=0.001).
DDT-Pro was also sensitive to clinical change in the dementia subgroup. Its median of 7.0 (interquartile range, 4.5–7.5) at follow-up implied a four-point difference from that of the initial physician score of 3.0 (interquartile range, 1.0–5.0; Z score of Wilcoxon test=–3.195, p=0.001) and from that of the nurse score of 3.0 (interquartile range, 0.5–6.0; Z score of Wilcoxon test=–2.819, p=0.005).
Discussion
There remains a clinical need for a briefly administered, highly accurate, and reliable diagnostic tool for delirium to enhance delirium detection in clinical settings until expert physicians can confirm a diagnosis or when they are not available for all patients who might benefit from an evaluation. Such a tool should be accurate enough to be used as a provisional or “working” diagnosis to begin delirium evaluation until or in lieu of a delirium expert physician assessment or confirmation. The DDT-Pro was developed to provisionally diagnose delirium by assessing each of the three core domains of delirium (cognition [vigilance], higher-order thinking [comprehension], and circadian rhythm [sleep-wake]) and to be quantitative and highly structured, as a potential alternative to the widely used CAM-A screening tool. Therefore, we extended the validation of the DDT-Pro, which was originally studied in 36 subacute inpatients with TBI.
When the DDT-Pro interrater reliability was compared between a physician rater who had delirium expertise and a nurse who did not, we found high comparability between their ratings on the equivalent forms of the tool, which suggests that it could be useful in routine clinical care settings by clinical staff who are not physicians or experts in delirium. This may be attributable to its content, which is focused only on the key features of the syndrome, as well as its structured ratings, which enable better standardization and interrater reliability. Furthermore, it comprises three items from well-validated delirium tools, which enhance its validity and reliability. The latter is also revealed by the high rate of concordance between the physician and nurse for each of the score ratings of the DDT-Pro items.
Internal consistency and item-scale correlational analysis showed that all items were important for the performance of the DDT-Pro. Moreover, all individual item scores were statistically different between patients with delirium and control subjects (without delirium). This is consistent with the three items representing the three core domains of delirium. The delirium groups had high correlations of the DDT-Pro with the DRS-R-98 Total and Severity scores, which supports a high degree of construct validity (41).
Unlike in CAM-A, the DDT-Pro items are not dichotomously rated as present or absent without regard to severity. Its quantitative nature allows for ROC analyses of the DDT-Pro against delirium reference standards.
The DDT-Pro performed very well with regard to criterion validity, regardless of administration by a physician or nurse, against diagnoses made independently by a senior research physician using DSM-5 criteria or DRS-R-98 Total score cutoff for delirium (Figure 2). The DDT-Pro global diagnostic accuracy was high (>94.0%) and little affected by dementia, even though our population had a high comorbidity prevalence (nearly 40%). Accuracies were even higher against a DRS-R-98 delirium diagnosis (96.3% for administration by a nurse), with values >93% in the dementia subgroup. Beyond determining a single provisional diagnostic cutoff for the DDT-Pro, its excellent global accuracy warrants further study of its performance using other cutoffs for broader purposes, such as delirium confirmation in clinical settings or delirium diagnosis in research.
ROC analysis sensitivity and specificity of DDT-Pro scores enabled selection among cutoff scores in accordance with clinical interest. The cutoff score of ≤6 on the DDT-Pro had high sensitivity with balanced specificity and was the same cutoff score derived in the original TBI study (30), in which the mean DRS-R-98 score in the delirium group was nearly the same as the mean in the present cohort (22.3). This cutoff had a higher sensitivity than specificity in both studies, which is preferred for case detection, whereas those more interested, for example, in higher specificity may prefer the ≤5 cutoff. The dementia subgroup had the lowest specificity values, irrespective of rater type or reference standard. However, clinically speaking, it is better to overdiagnose possible delirium in an at-risk elderly population, for whom dementia may be a confounding factor, than to miss it. Additionally, when comorbid, delirium overshadows dementia symptoms (9).
Convergent validity was high for the DDT-Pro regardless of rater type. Correlations to the DRS-R-98 Total and Severity scores were both around –0.70 and even better in the dementia subgroup (−0.82 to –0.85). In the delirium-only group, there was a discrepancy between rater types, such that the nurse’s values were somewhat lower than the physician’s. This difference was most apparent in the comorbid delirium-dementia subgroup, although this was small (N=34).
We preferred to use the original CAM-A, which is not anchored by tests with the original descriptors (for further details, see the training manual [37]). We used the original version, because most routine clinical evaluators would likely prefer this version for its ease of use. Even though we chose a delirium expert and consultation-liaison psychiatrist to administer the CAM-A, its sensitivity was considerably lower (64.0%) against DSM-5 criteria than for the nonexpert nurse’s administration of the DDT-Pro (90%), which we attribute to differences in scale design and not to inadequate delirium training.
Concordance statistical analysis for the diagnostic performance of the DDT-Pro cutoff and the CAM-A for positive and negative diagnoses revealed notable differences when one was used over the other (Table 3). Even though they demonstrated high concordance for positive delirium (most scenarios >90%), their concordance was poorer in negative cases (<83.0%), especially in the dementia subgroup (as low as 63.3%). This degree of discordance implies that the performance of the tools was different for negative cases. This was corroborated by the DDT-Pro’s very high sensitivity (90.0%−100.0%), revealing a low percentage of false negative diagnoses, versus much lower sensitivity for the CAM-A (61.8%−70.0%), suggesting more false negatives among the true positives. It is possible that scale structure (CAM-A items 3 and 4 have an either-or option) and the fact that stupor and coma are allowed to be rated as delirium were contributors to the lower sensitivity when tested against reference standards that do not involve those potential confounds. It is preferable to have higher sensitivity than specificity for a diagnostic tool when the condition is a medical urgency, as with delirium.
There were also important differences in the performance characteristics between these tools when compared with the two reference diagnostic standards (Table 4). Sensitivity values were higher for DDT-Pro, whereas specificity values tended to be higher for CAM-A, although CAM-A values were more adversely affected by dementia. This was true irrespective of whether the DDT-Pro was administered by the expert physician or nonexpert nurse, even though CAM-A was always administered by a delirium expert physician. The better specificity than sensitivity for CAM-A in this study sample could be a result of our use of the expert physician, who would know that delirium is not consonant with stupor or coma (even though CAM-A allows the rating of stupor and coma as delirium). Additionally, an expert in delirium using the unanchored CAM-A would apply full clinical knowledge of the syndrome, thereby enhancing diagnostic specificity.
DDT-Pro’s performance data for PPVs, NPVs, and LRs were consistent with its design to maximize a provisional delirium diagnosis among patients who actually had delirium. The low number of false negative test results (with high percentage of sensitivity) also increased the likelihood of excluding patients without delirium with negative test results (true negatives), where its NPV was >90%, and the –LR was close to 0.1. In contrast to the DDT-Pro, NPVs of the CAM-A were lower, and the –LR was around 0.3–0.4 (as a rule of thumb for the understanding of LRs, +LRs >5 and –LRs <0.2 are considered good) (42).
Overall, our data suggest that the DDT-Pro administered by a physician or nurse would be more useful than the CAM-A to provisionally diagnose delirium, as a result of its performance characteristics. Furthermore, it has advantages over the CAM-A for use in elderly populations, among whom comorbid dementia is common. More research is needed before we can better determine the DDT-Pro performance specifications as a provisional diagnostic tool in clinical settings other than internal medicine wards and acute brain injury rehabilitation services, where it has been validated. Very brief screening tools, such as the 4AT or a single question, may be useful for triage in acute medical settings, where delirium prevalence is high and false positives from other psychiatric disorders are less likely, or in emergency departments, where time is of the essence, but such tools should not be relied on for provisional delirium diagnosis. A variety of nurse screening tools have not been as well studied as the 4AT, and their content is less focused on core delirium characteristics.
As the follow-up DDT-Pro scores revealed for 18 patients with delirium or with comorbid dementia, its continuous value scoring approach allows its use in repeated evaluation of patients. Additionally, another possible advantage of quantitative scoring could be for evaluation of subsyndromal delirium, in which core domain symptoms are present but at subthreshold intensity (43). Future studies might delineate different cutoff values for subsyndromal delirium using the DDT-Pro.
This study has several limitations. The low mean years of education of the study sample imposes restrictions with regard to the generalization of our data. However, our cutoff score was the same as that in the Kean et al. (30) study, in which U.S. patients had more than twice as many years of education, which suggests that the DDT-Pro is suitable across a wide range of education and in different cultures. IQCODE assessments were performed by two different researchers, which would have affected the prevalence of dementia; however, the IQCODE has shown good interrater reliability (44). The natural fluctuation of delirium symptom severity throughout a 12- to 24-hour period also contributes variability among the measurements, even though all of them were made during day 1 of the study. Patient fatigue could also have influenced some results.
Because delirium severity can fluctuate over a 24-hour period, it is recommended that for tools such as the DRS-R-98, symptoms used for ratings should include the preceding 12–24 hours in addition to the information gleaned at the interview. The DDT-Pro sleep-wake cycle item uses information for the preceding 24 hours, whereas its other two items rate the contemporaneous performance during the interview. Therefore, a potential limitation is that the measure might insufficiently capture the most severe symptoms for two of three items if administered when symptom severity is milder, thereby potentially producing a lower score that does not meet the cutoff score for delirium diagnosis (a false negative). However, our ROC analyses revealed a high sensitivity for the DDT-Pro versus reference standards, both of which assessed symptoms for up to 24 hours, consistent with false negatives not being a serious limitation.
In conclusion, we found excellent reliability and validity for the DDT-Pro in this large, medical inpatient study sample, which was less homogeneous than the original TBI cohort. High accuracies and performance characteristics may relate to the tool’s design: anchored item ratings and objective scoring on the basis of previously validated delirium scales, straightforward administration, no need for training or interview expertise on delirium phenomenological characteristics, brevity, and items representing the three core domains of delirium. Encouraging are the confirmation of the original cutoff score and its high performance for nonphysician clinician assessors, which is its intended purpose for ease and accuracy of routine delirium detection and provisional diagnosis. Additionally, the DDT-Pro’s high sensitivity among dementia patients suggests utility in real-life clinical settings with older patients, an advantage over existing tools. Its highly structured, quantitative items may enhance its performance over less-structured brief assessment tools, such as the CAM-A. Further validation in other settings is warranted, such as in emergency departments, nursing homes, and units for critically ill and postsurgical patients, where the severity of the clinical state of the patient, the resource availability, and the environment differ from those of the present study.