Rising health care costs have led to growing interest in the effects of untreated health problems on work performance and in the design of health insurance plans that optimally balance the direct costs of treatment against the indirect workplace costs of lost performance (1–3). Depression has become the focus of special interest in this regard for two reasons. First, epidemiologic studies show that depression not only is a common occurrence among employed people (4, 5) but also is associated with substantial work impairment (6). Second, experimental studies show that adequate depression treatment can substantially reduce work impairment (7). Both of these lines of evidence are limited, however, because they are based largely on retrospective self-reports that require respondents to summarize information about their work performance over recall periods ranging from 1 week to 1 month. There is good reason to believe that recall bias associated with negative depressive cognitions could distort these reports, leading to upwardly biased estimates of both the work impairment associated with depression and the effects of depression treatment on work performance (8–10).
An obvious way to address this concern would be to base epidemiologic and experimental studies on objective measures of work performance (11). However, such measures are lacking for most occupations, and they are very difficult to obtain even when they exist. Alternatively, simulation can be used to collect objective performance data by approximating the work conditions in a single occupation (12). However, simulations are difficult and expensive to create. Simulations also limit investigation to the occupations for which the simulations have been created. Furthermore, because simulations are tests, workers typically do their best to perform as well as they possibly can, which means that simulations are much better at assessing abilities than performance. In the case of depression, where work impairment is arguably attributable as much to lack of motivation and effort as to lack of ability, simulation might be an especially poor measurement approach.
A technique known as the experience sampling method (13–15) has been developed to address the problem of self-report recall bias in situations where objective data are not available and where simulation is inappropriate. The experience sampling method uses a diary and a pager to collect concrete self-report data on moment-in-time work performance (i.e., performance at the very moment of being paged) for a randomly sampled set of moments-in-time for each respondent over the time interval of interest. The summation of the multiple “snapshots” captured in the diaries is used to build up a portrait of the typical work performance of each respondent in a way that avoids recall bias. However, it should be kept in mind that the experience sampling method relies on self-reports and may still be vulnerable to other information bias, such as might occur if depressed individuals systematically devalue their performance (10).
The aim of the current report is to present data gathered by using the experience sampling method on the relationship between major depression and measures of work performance among service workers in two large corporations. Several previous reports have been published regarding the impaired performance of workers with individual conditions, such as migraine headaches (16, 17), chronic fatigue (18), diabetes (19), asthma (20), hay fever (21), and arthritis (22). These previous studies focused on single conditions in clinical samples. The current report, in comparison, presents data from nonclinical subjects on the work impairments associated with major depression compared with impairments associated with a range of chronic physical conditions.
Method
Subjects
The experience sampling method study was embedded within a larger survey of health and productivity among two types of service workers: reservation agents working for a major airline and customer service representatives working for a major telecommunications company. Most of the individuals in both groups were women, their median ages were in the 30–44-year range, and most had completed at least some college. As described in more detail elsewhere (23), the larger survey was carried out by telephone as part of the calibration of the World Health Organization Health and Work Performance Questionnaire. A probability subsample of 286 respondents from this larger survey, including 105 reservation agents and 181 customer service representatives, was recruited into the experience sampling method study. Respondents who reported either major depression or any of several other commonly occurring chronic conditions in the survey were oversampled. A weight was used to adjust for this oversampling during data analysis.
Experience Sampling Method Procedures
Respondents were recruited by telephone to participate in what was described as a health-at-work diary study in which they would have to carry a pager and make diary entries about moment-in-time experiences at five randomly paged moments each day over 7 consecutive days. Once informed consent was obtained, the start day of the diary period was selected and an appointment was made with the respondent for a telephone interview the evening before the start day. The day of the week of the start day was randomized so that the day of the week would not be confounded by the number of days in the study.
A special delivery mail packet containing a pager, experience sampling method diaries, return envelopes, and a pen was sent to respondents for delivery the day before the start day. The telephone interview the evening before the start day instructed respondents in how to use the pager and diaries that were included in this packet. Information was also obtained at this time about each respondent’s waking time, work schedule, and bedtime. On the basis of this information, a computerized autodialer was programmed to page each respondent at five random times each day beginning at the start of the workday (or, on regularly scheduled days off, 1 hour after the respondent usually woke up on that day of the week) and ending 2 hours before the respondent usually went to bed on that day of the week. The random time points varied across days within respondents and also varied among respondents.
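The paging rule described above can be illustrated with a minimal sketch. It assumes a uniform random draw within the daily window; the function name `draw_page_times`, its parameters, and the example date are hypothetical and are not taken from the study's autodialer software.

```python
import random
from datetime import datetime, timedelta


def draw_page_times(window_start, bedtime, n_pages=5, buffer_hours=2, rng=None):
    """Draw n_pages distinct random page times for one respondent-day.

    Assumption: times are uniform between window_start (start of the
    workday, or 1 hour after waking on a day off) and buffer_hours
    before the respondent's usual bedtime.
    """
    rng = rng or random.Random()
    window_end = bedtime - timedelta(hours=buffer_hours)
    span_seconds = int((window_end - window_start).total_seconds())
    if span_seconds < n_pages:
        raise ValueError("paging window is too short for the requested pages")
    # Sample distinct second offsets, then sort them chronologically.
    offsets = sorted(rng.sample(range(span_seconds), n_pages))
    return [window_start + timedelta(seconds=s) for s in offsets]


# Hypothetical respondent-day: work starts at 08:00, usual bedtime is 23:00.
day_start = datetime(2000, 1, 3, 8, 0)
day_bedtime = datetime(2000, 1, 3, 23, 0)
page_times = draw_page_times(day_start, day_bedtime, rng=random.Random(0))
```

Calling the function once per respondent per day, each time with a fresh draw, reproduces the property that the time points vary across days within respondents and among respondents.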
A separate diary book was provided for each day. Respondents were asked to mail back each day’s completed book the following morning in an effort to avoid retrospective completion. Reminder phone calls made on the evening of the first, third, and fifth diary days reinforced these procedures. A debriefing interview was administered the day after the end of the diary week.
Chronic Conditions
Seven conditions were considered sufficiently common in this group for study: allergies, arthritis, asthma, back pain, headaches, high blood pressure, and major depression. The first five of these seven were assessed with simple symptom checklists. High blood pressure was assessed with questions about whether a doctor ever told the respondent that he or she had high blood pressure and, if so, whether the respondent was under treatment for high blood pressure at the time of the interview. Major depression was assessed with the Composite International Diagnostic Interview Short Form (24), a screening scale found to have good sensitivity (90%) and specificity (94%) for depression diagnoses made with the full Composite International Diagnostic Interview.
Work Performance Measures
Each daily diary included five identical sets of structured questions that focused on the moment in time when the respondent was paged. The questions began by asking respondents whether they were at work and, if so, whether they were on break or on the job at the moment of the page. If on the job, further questions using 7-point self-anchoring scales were asked about work performance. Exploratory factor analysis found two dimensions of performance in these reports. The first dimension, which we refer to as task focus, is indicated by responses to questions about concentration and daydreaming (reversed), coded by using a self-anchoring 1–7 response scale with anchors of “not at all” for the lowest score and “very much” for the highest score (Cronbach’s alpha=0.63). The second dimension, which we refer to as productivity, is indicated by responses to questions about quality, speed, and efficiency (coded by using a 1–7 self-anchoring scale with responses of “low” for the lowest score and “high” for the highest score) and responses to a question about the extent to which the respondent was succeeding at the moment of the page (coded by using a 1–7 self-anchoring scale with responses of “not at all” for the lowest score and “very much” for the highest score) (Cronbach’s alpha=0.76). The scale ranges of 2–14 for task focus and 4–28 for productivity were changed to a new range of 0–100 (25) by using simple linear transformation (N=[raw score–2]×100/12 for task focus and N=[raw score–4]×100/24 for productivity).
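The linear transformation to the 0–100 range can be written as a short function. This is a sketch only; the name `rescale_to_100` and its arguments are illustrative, and the function assumes that each scale score is the sum of its 1–7 items (two items for task focus, four for productivity), as implied by the stated ranges of 2–14 and 4–28.

```python
def rescale_to_100(raw_score, n_items, item_min=1, item_max=7):
    """Linearly map a summed scale score onto the 0-100 range."""
    lowest = n_items * item_min    # 2 for task focus, 4 for productivity
    highest = n_items * item_max   # 14 for task focus, 28 for productivity
    if not lowest <= raw_score <= highest:
        raise ValueError("raw score is outside the possible scale range")
    return (raw_score - lowest) * 100 / (highest - lowest)


# Task focus: (8 - 2) * 100 / 12
task_focus_example = rescale_to_100(8, n_items=2)
# Productivity: (16 - 4) * 100 / 24
productivity_example = rescale_to_100(16, n_items=4)
```

Both examples fall at the midpoint of their raw ranges and therefore map to 50 on the new scale.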
Analysis Procedures
The GLIMMIX macro and Mixed procedure in SAS (26) were used to estimate a two-level random effects linear regression model with the assumption of an unstructured covariance matrix among the 2,742 data records that were completed while the respondents were working. The 0–100 task focus and productivity outcomes were centered and regressed on dichotomous measures of the seven health problems plus person-level controls (age, sex, occupation, percent of beeps completed, percent of completed beeps at work, and percent of completed beeps at work while working) and within-person controls (day of the week, sequence of the observation among the five observations in a day). Interaction terms were used to identify whether the effects of the health problems varied significantly by time of day, day of week, or demographics. Statistical significance was assessed with two-sided tests.
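One way to write down a two-level model of this kind is shown below. It is a simplified random-intercept sketch, not the exact specification estimated in SAS: the notation is ours, and the unstructured covariance assumption mentioned above is not represented.

```latex
% i indexes moment-in-time records, j indexes respondents (notation assumed)
y_{ij} = \beta_0
       + \sum_{k=1}^{7} \beta_k C_{kj}
       + \boldsymbol{\gamma}^{\top} \mathbf{x}_{j}
       + \boldsymbol{\delta}^{\top} \mathbf{w}_{ij}
       + u_j + \varepsilon_{ij},
\qquad u_j \sim N(0, \tau^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

Here y_ij is the centered 0–100 outcome (task focus or productivity), C_kj is the dichotomous indicator for the kth of the seven health problems, x_j holds the person-level controls, w_ij holds the within-person controls, and u_j is a respondent-level random effect.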
Results
The sum of 35 moment-in-time assessments for 286 respondents yielded 10,010 logically possible data records, of which 6,607 (66.0%) were completed (N=2,254 [61.3% of 3,675] among reservation agents and N=4,353 [68.7% of 6,335] among customer service representatives). In the total group, 3,460 (52.4%) of the completed entries were made while the respondent was at work (1,010 [44.8%] among reservation agents and 2,450 [56.3%] among customer service representatives), and 2,742 (79.2%) of these were made while working (816 [80.8%] among reservation agents and 1,926 [78.6%] among customer service representatives).
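The counts of possible and completed records follow directly from the design (five pages per day over 7 days). The short check below uses only figures reported in the text; the variable and function names are illustrative.

```python
PAGES_PER_DAY = 5
DIARY_DAYS = 7

# Respondents and completed diary entries per occupation, as reported.
respondents = {"reservation agents": 105,
               "customer service representatives": 181}
completed = {"reservation agents": 2254,
             "customer service representatives": 4353}


def possible_records(n_respondents):
    """Logically possible records: one per page per day per respondent."""
    return n_respondents * PAGES_PER_DAY * DIARY_DAYS


total_possible = sum(possible_records(n) for n in respondents.values())
total_completed = sum(completed.values())

# Completion rate (percent) within each occupation and overall.
completion_rates = {
    group: round(100 * completed[group] / possible_records(n), 1)
    for group, n in respondents.items()
}
overall_rate = round(100 * total_completed / total_possible, 1)
```

The same pattern extends to the at-work and while-working subsets, with each percentage using the preceding count as its denominator.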
Although agreement to participate in the experience sampling method study was not related to reports of depression in the baseline telephone interview, nonresponse to individual diary entries was higher among depressed workers than other respondents (280 [42.1%] versus 3,123 [33.4%]) (p<0.001).
The Distribution of Moment-in-Time Work Performance
The task focus and productivity scores were both strongly skewed to the right (Table 1), with means of 67.2 and 73.2, respectively. There was substantial variation among reports, with interquartile ranges (i.e., the difference in the scores at the 25th and 75th percentiles on the scale distributions) of 45.6 for task focus and 27.0 for productivity. Scores were somewhat higher, but with a similar distribution, among reservation agents than customer service representatives.
The Effects of Control Variables
Regression analyses were carried out among reservation agents and customer service representatives combined because of low statistical power in the two separate groups. Between-person control variable analysis (results not shown but available on request) showed that scores for both task focus and productivity were higher among women than men, among older than younger workers, and among reservation agents than customer service representatives. Productivity scores were also significantly related to the three characteristics of the interview process (percents of beeps completed, completed beeps at work, and completed beeps at work while working; results not shown but available on request); therefore, we adjusted for these scores in subsequent analyses. Within-person control variable analysis showed that scores on task focus were lower on weekends than on other days of the week and lower in the afternoon and evening than in the morning, while productivity scores were highest on Mondays, lowest on weekends, and generally unrelated to time of day. With regard to the within-person effects, workers in both occupations had rotating shifts that included weekday and weekend shifts and that covered the morning, afternoon, and evening hours.
Effects Associated With Chronic Conditions
Adjusting for the effects of the control variables, we found that task focus was negatively related to four of the seven conditions, the exceptions being arthritis, headaches, and high blood pressure, and that productivity was negatively related to six of the seven conditions, the exception being allergy. (The study was not conducted during allergy season.) Major depression was the only condition that was significantly associated with a decrement in task focus; back pain and major depression (at the p<0.10 level, two-sided test) were the only conditions that were significantly associated with decrements in productivity (Table 2). Major depression was associated with decrements of approximately 12 points in task focus and approximately 5 points in productivity on their 0–100 scales. These effect sizes are equivalent to a 0.4 standard deviation decrease in task focus and a 0.3 standard deviation decrease in productivity. Because these effects are based on experience sampling method evaluations at random moments in the workday, they can reasonably be assumed to describe the average decrements in these outcomes at all times across a typical workweek.
We investigated whether collinearities between individual conditions and either demographic characteristics or other conditions may have adversely affected the interpretation and statistical power of models containing all of these variables. To do so, we developed basic models and introduced demographic characteristics first and then other conditions; we observed little change in the size or significance of coefficients representing depression’s effects on either productivity or task focus, suggesting that collinearities with either demographic characteristics or other conditions did not materially affect our results (results not shown but available on request).
The possibility of variation in the magnitude of the significant effects was evaluated across both between-person and within-person control variables. Substantial consistency was found, with effects not varying meaningfully by age, sex, occupation, or day of the week. The only indication of meaningful effect size variation was for major depression by time of day (Table 3), with the effects on both outcomes largest late in the day.
Monetizing the Effects of Depression
Previous research has shown that depression is associated with roughly a doubling of annual sickness absence days, from an average of approximately 1 day per month among most workers to approximately 2 days per month among depressed workers (27, 28). This effect is typically monetized by calculating the daily salary and fringe benefits of depressed workers (29, 30), a conservative estimate in the light of the fact that it does not count lost profit or the inefficiencies in the workplace created by the absence of a depressed worker (31, 32).
It is less clear how to monetize the effects of depression on on-the-job performance. One approach is to treat the metrics of the task focus and productivity scales as representing absolute levels of work performance. Given means of 67.2 and 73.2 on these scales, respectively, the decrements of 12 points in task focus and 5 points in productivity represent proportional reductions of 18% and 7%, respectively, in these two outcomes. If we assume that the mean of these proportions, 12.5%, represents the proportional decrement in overall performance due to an episode of major depression, it is possible to convert these effects into sickness absence day equivalents by saying that the lost productivity during 8 workdays (the multiplicative inverse of 12.5%) is equivalent to 1 day absent for sickness. On a base of 225 workdays each year (i.e., a typical 250-day work year minus approximately 25 days of absence for sickness per year among depressed workers with episodes that persist the entire year), this is equivalent to somewhat more than 2 days of lost productivity per month of being depressed (225/[8×12]=2.3). Consequently, the estimate of lost productivity related to depression on days at work (2.3 days per month) is considerably greater than the lost productivity found in previous studies from sickness absence (approximately 1 day per month). Even with the relatively low salaries of the service workers in this study, the combined salary-equivalent effect of major depression on absenteeism and lost productivity is greater than $300 per month.
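The arithmetic behind the sickness absence day equivalents can be retraced step by step. The sketch below follows the rounding used in the text (proportional reductions rounded to whole percentages before averaging); the function and parameter names are illustrative, and no salary figures are assumed.

```python
def lost_days_per_month(mean_task_focus=67.2, mean_productivity=73.2,
                        decrement_task_focus=12, decrement_productivity=5,
                        work_year_days=250, absence_days=25):
    """Convert on-the-job decrements into absence-day equivalents per month."""
    # Proportional reductions, rounded to whole percentages as in the text.
    pct_task_focus = round(100 * decrement_task_focus / mean_task_focus)
    pct_productivity = round(100 * decrement_productivity / mean_productivity)
    # Mean proportional decrement in overall performance (percent).
    mean_pct = (pct_task_focus + pct_productivity) / 2
    # Workdays of impaired performance equivalent to 1 full day absent.
    days_per_absence_equivalent = 100 / mean_pct
    # Days actually worked per year by a depressed worker.
    days_at_work = work_year_days - absence_days
    return days_at_work / (days_per_absence_equivalent * 12)


estimate = lost_days_per_month()
```

With the reported values, the intermediate results are 18% and 7%, a mean of 12.5%, and 8 workdays per absence-day equivalent, so the estimate rounds to 2.3 days per month.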
Discussion
These results should be interpreted with four sets of limitations in mind. First, although the experience sampling method removes recall bias by focusing on assessments across a random sample of moments in time, the consistency of these assessments with actual performance is largely unknown (33). Some previous studies of idle time at work have found that experience sampling method rates are higher than rates recorded in official workplace records (34, 35), and this has been interpreted as evidence of the greater sensitivity of experience sampling method data (13). If discrepancies between experience sampling method reports and actual work performance at the sampled moments in time are random, associations with depression would in any case be largely unaffected. However, systematic discrepancies could still lead to overestimation of illness effects (36, 37). A plausible systematic discrepancy is a “pessimism bias” associated with depression (10), although this would seem on the face of it to be less likely for reports about task focus than for reports about productivity. Although doubtlessly imperfect, then, reports based on the experience sampling method may be the best measures we can obtain of actual behavior on the job, especially for task focus, where the estimated effect of depression was most pronounced.
Second, some of the comparator conditions that were sufficiently prevalent to examine may have been clinically mild, causing depression to appear to be inordinately impairing. Depression was also the only condition assessed with a screening instrument (others were assessed by symptom checklists and reports of earlier diagnoses), further raising the possibility that depression cases were more severe or acute than cases of other conditions. It may be worth noting that of the seven conditions examined here, four (arthritis, headaches, asthma, and depression) were significantly associated with absenteeism in the parent study that gave rise to our experience sampling method study (38). Furthermore, among medically serious conditions (e.g., cancer, diabetes, gastrointestinal ulcers, and heart disease) assessed in the parent study that were not sufficiently prevalent in our subgroup to examine, only chronic obstructive pulmonary disease/emphysema was significantly associated with absenteeism.
Third, we did not have the information necessary to determine whether conditions preceded work impairments or vice versa. For this reason, it is not possible to say with certainty whether depression caused impairments or vice versa. Finally, it is not clear to what extent our results on service workers generalize to other types of workers. In addition, some predesignated respondents refused to participate, and others who participated failed to complete moments-in-time responses. However, it should be pointed out that nonparticipation was unrelated to depression in the baseline survey. We also found that nonresponse (typical in experience sampling method studies because of the high demands [39]) was higher among depressed workers, confirming previous experience sampling method findings that nonresponse is related to both psychopathology and worse functioning (37, 40, 41). If low response is an indication of failure to perform tasks while at work, our results regarding the effects of depression on work performance may be conservative.
Within the constraints of these limitations, the results reported here show that major depression is more consistently related to poor work performance than any of the commonly occurring chronic physical conditions found in the group studied. Focused experience sampling method studies of depression have found similar results. In particular, depressed primary care patients have been shown to be less engaged in work and more likely to report “doing nothing” during work hours than healthy control subjects (42), and depressed adolescents have been shown to spend less time in schoolwork than either recovered patients or healthy control subjects (43).
Our finding that effects of depression are more pronounced late in the day was not examined in previous experience sampling method studies. This finding is consistent with a diurnal pattern of greater symptom severity later in the day that is often found among patients with the milder forms of depression likely to be found in working populations (44). It is also possible that depression exacerbates the fatigue and reduction in cognitive abilities that have been found to increase naturally over the workday (45–48).
An important implication of these results is that the cost-effectiveness of depression treatment from the perspective of the employer might be substantially greater than previously thought. An earlier simulation (27) based on data from two nationally representative general population samples of workers suggested that depression treatments with effect sizes found in recent effectiveness trials would lead to decreases in work loss and work cutback that would yield a value of $1,100–$1,800 in salary equivalents per year of treatment. These savings exceed the average costs of depression treatment, even though they focus exclusively on absenteeism. We found that the effects of depression on on-the-job work performance, at least for these service workers, were twice as large as the average effects on absenteeism. If the same effect sizes were shown to hold in the general population, then the estimated salary-equivalent value of guideline-concordant depression treatment might be two to three times as high as the estimate based on the earlier simulation.
An important caution with regard to the last point is that real-world treatment of depression often uses suboptimal regimens (29, 49). Employers are consequently reluctant to accept indirect evidence of cost-effectiveness. As a result, efforts to increase employer enthusiasm for expanded depression treatment will require effectiveness trials to be carried out that estimate the cost-effectiveness from the employer’s perspective of usual care as well as the perspective of enhanced depression care. A new effectiveness trial known as the Work Outcomes Research and Cost-Effectiveness Study, sponsored by the National Institute of Mental Health, is currently under way to obtain such estimates (28). This new trial as well as future initiatives aimed at evaluating cost-effectiveness from the employer’s perspective need to consider not only the effects of absence because of sickness but also the effects on work performance in order to capture the full extent to which depression affects work performance and the full extent to which depression treatment has value for the employer.