A criticism of randomized controlled trials (RCTs) concerns the generalizability of their results for clinical decision making in the broader patient population (1–7). These concerns stem from several factors. One is that RCTs often exclude individuals with clinical characteristics common to many patients seen in community settings, such as co-occurring substance use disorders, chronic general medical conditions, or suicidality (1,4,5,8,9). Another is that patients who participate in RCTs may differ from those seen in community settings on the basis of socioeconomic characteristics, educational attainment, or race-ethnicity (10). Patient characteristics that may influence an individual’s participation in an RCT (for example, altruism and treatment adherence) (11) may also influence outcomes, and provider biases may influence which eligible patients are invited to participate. Another concern is that RCT results are not generalizable to usual-care settings, where care often is not delivered in highly protocol-guided, algorithmic ways and where structured outcomes are not routinely measured during treatment. These threats to the generalizability of RCT results to community populations and practice can have significant implications for our ability to translate knowledge gained from RCTs into an understanding of which treatments will be effective for which patients and under what circumstances.
Broadening our understanding of patient characteristics associated with participation in an RCT could provide a useful context for interpreting RCT findings. We can examine some of the typical threats to RCT generalizability by using data from the Systematic Treatment Enhancement Program for Bipolar Disorder (STEP-BD) study (12), in which RCTs were embedded within a larger, multisite observational study population. Specifically, we can learn whether there are clinical or demographic differences between RCT and non-RCT participants who are drawn from a common pool of patients and sites or clinicians, when the patient pool is clinically diverse and described with considerable clinical detail, and when the RCTs have few clinical exclusion criteria.
Methods
The STEP-BD Study and Population
The goal of the STEP-BD was to conduct clinical trials and other naturalistic studies that required a well-described, clinically diverse population of persons with bipolar disorder (13). Twenty-one sites in 12 states participated. Several sites partnered with local clinics (six partnerships in five cities) to further increase participation by community clinics delivering mainstream care (14). Per STEP-BD policy, these local clinics did not contribute patients to the RCTs (personal communication, Sachs GS, 2011). STEP-BD study participants gave informed consent to participate in the observational arm and additional consent for RCT participation. Approval was obtained from an institutional review board (IRB) at each site. For the analysis reported here, further IRB approval was obtained from McLean Hospital and Harvard Medical School.
STEP-BD began in November 1999 and was conducted through September 2005. Recruitment advertising for the STEP-BD consisted of public service announcements by the National Institute of Mental Health (NIMH) released in several cities that contained STEP-BD sites. Sites were quickly inundated with prospective participants, obviating the need for further active recruitment (15).
STEP-BD participation was offered to new or existing patients at STEP-BD sites who met study criteria for bipolar disorder. Patients were informed about STEP-BD by their program psychiatrist. Participation meant, at a minimum, entering the observational study arm, which served as an overall structure for assessment and treatment of bipolar disorder. Providers in the observational arm received additional training in bipolar disorder treatment, but their treatment choices were not constrained. Participants in the observational arm who met criteria for one of the RCTs were offered an opportunity to participate in the RCT. RCT participants underwent additional assessments, as well as randomized assignment to the RCT treatment protocols. RCT enrollment could begin at any point of a person’s STEP-BD participation (that is, at registration or thereafter) (13).
We compared two STEP-BD populations. Persons enrolled in at least one of two STEP-BD RCTs for the treatment of acute bipolar depression (the adjunctive antidepressant RCT or the psychosocial treatment RCT) (16,17) were compared with those in the observational arm who did not participate in either of these RCTs. In the adjunctive antidepressant RCT, participants were randomly assigned to receive either an adjunctive antidepressant medication or a placebo. In the psychosocial treatment RCT, participants were randomly assigned to receive one of three intensive psychosocial treatments or to a control arm of three educational sessions.
In brief, to be eligible, participants in both acute depression RCTs had to be adults (age 18 or older) who met DSM-IV criteria (18) for bipolar I or bipolar II disorder. Diagnoses were determined by a modified Structured Clinical Interview for DSM Disorders (19) and confirmed by the Mini-International Neuropsychiatric Interview (MINI) (20). Participants in the acute depression RCTs also met DSM-IV criteria for a major depressive episode and consented to take mood stabilizer and antipsychotic medication concomitantly (16). Few RCT exclusion criteria were employed. Both RCTs excluded persons who required short-term treatment for an active substance use disorder and those who were pregnant or planning to become pregnant in the coming year. Additional exclusion criteria for the adjunctive antidepressant RCT were history of nonresponse to the study antidepressants (bupropion and paroxetine) and either introduction of an antipsychotic or a change in dosage of an antipsychotic that had been prescribed for a long time. In the psychosocial treatment RCT, individuals unwilling to discontinue their current (nonstudy) psychotherapy or taper the sessions to one or two per month were excluded. Individuals could choose to participate in the adjunctive antidepressant RCT and not the psychosocial treatment RCT. However, the psychosocial treatment RCT was initially limited to participants in the adjunctive antidepressant RCT. Study investigators later modified this to allow participation of persons ineligible for the adjunctive antidepressant RCT because of a history of nonresponse to the study antidepressants.
Time-varying clinical characteristics (mood state and symptom severity) were noted on the clinical monitoring form (CMF), a template progress note for the STEP-BD observational and RCT arms. We defined “index acute bipolar-depressed visits” for participants in each study arm. For RCT participants, we defined the index visit as the CMF completed closest to the date of RCT randomization. Preliminary analyses indicated that this occurred from seven days before to seven days after randomization for 92% (N=380) of the RCT sample. For participants in the observational arm who were acutely depressed and who had never been enrolled in either of the acute depression RCTs (adjunctive antidepressant or psychosocial treatment), we took the first major depression clinical status noted in a CMF as the index acute bipolar-depressed CMF visit. We excluded from our sample the participants without a CMF and RCT participants for whom we were not able to identify the RCT randomization date.
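The index-visit rules described above can be sketched in a few lines of Python. This is a minimal illustration only, not the code used in the study: the function names, the tuple layout of the visit records, the status label "major depression", and the tie-breaking rule (earlier visit wins when two CMFs are equally close) are all assumptions made for the example.

```python
from datetime import date


def index_visit_for_rct(cmf_dates, randomization_date):
    """Return the CMF date closest to randomization.

    Assumption: ties are broken in favor of the earlier visit.
    Returns None when there is no CMF (such participants were excluded).
    """
    if not cmf_dates:
        return None
    return min(
        cmf_dates,
        key=lambda d: (abs((d - randomization_date).days), d),
    )


def index_visit_for_observational(cmf_visits):
    """Return the date of the first CMF with a major depression status.

    cmf_visits is an iterable of (visit_date, clinical_status) tuples;
    the status label is a hypothetical encoding.
    """
    depressed = [d for d, status in cmf_visits if status == "major depression"]
    return min(depressed) if depressed else None


def within_window(visit_date, randomization_date, days=7):
    """True when the visit falls within +/- `days` of randomization."""
    return abs((visit_date - randomization_date).days) <= days
```

Applying `within_window` to each RCT participant's index visit would give the share of the sample whose index CMF fell within seven days of randomization.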
Primary Outcome
Our primary outcome was a dichotomous variable designating whether or not a STEP-BD participant had been enrolled in either of the RCTs.
Explanatory Variables
We compared demographic characteristics and clinical characteristics of the two groups (that is, those enrolled in an acute depression RCT or not). We also included in the model a categorical variable for site. The demographic characteristics were age (centered), gender, race-ethnicity, education, income (divided by the median income of $40,000), and insurance type. Clinical characteristics included those related to bipolar disorder symptoms (domains from the Bipolarity Index [BPI] [21]) and baseline severity scores on the Clinical Global Impression (CGI) scales (22). The BPI is a categorical measure describing a patient’s bipolar disorder symptom history, illness course, age at onset, family history, and prior treatment response. We included the two domains that we felt would be most pertinent for this study: symptom history and prior treatment response. We categorized the CGI as 1–3, no or mild symptoms; 4, moderate symptoms; and 5–7, severe symptoms.
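The three-level CGI grouping can be written as a small recoding function. The sketch below assumes integer CGI scores from 1 to 7; the function name and the category labels are hypothetical and chosen only for illustration.

```python
def cgi_category(score):
    """Map a CGI severity score (1-7) to the three groups used in the text.

    1-3 -> no or mild symptoms, 4 -> moderate, 5-7 -> severe.
    The label strings are illustrative, not taken from the study data.
    """
    if not isinstance(score, int) or not 1 <= score <= 7:
        raise ValueError("CGI score must be an integer from 1 to 7")
    if score <= 3:
        return "none_or_mild"
    if score == 4:
        return "moderate"
    return "severe"
```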
We also included variables for specific co-occurring psychiatric and general medical conditions and characterized the comorbidity “burden” as categorical variables (for example, zero, one, two, or three or more co-occurring conditions). The co-occurring psychiatric and general medical conditions included were those that could complicate or otherwise influence bipolar disorder pharmacotherapy prescribing. For psychiatric conditions, these were anxiety disorders, attention-deficit hyperactivity disorder, and eating disorders. For general medical conditions, these were pregnancy or hepatic, renal, pancreatic, seizure, thyroid, or inflammatory disorders. Variables were also included for conditions or patient characteristics that often lead to exclusion from or influence selection into clinical trials (for example, substance use disorders).
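As an illustration of the comorbidity "burden" coding, the following sketch counts distinct co-occurring conditions and caps the count at three or more. The function name, the condition strings, and the "3+" label are assumptions for the example.

```python
def comorbidity_burden(conditions):
    """Return a categorical burden level: "0", "1", "2", or "3+".

    `conditions` is any iterable of condition names; duplicates are
    counted once. Names and labels here are hypothetical.
    """
    n = len(set(conditions))
    return str(n) if n < 3 else "3+"
```

For example, a participant recorded with an anxiety disorder and a thyroid disorder would fall in category "2" under this coding.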
The BPI and CGI are clinician-rated scales. All other explanatory variables were based on patient self-report, with the exception of comorbid mental and substance use disorders, which were determined by the MINI (20).
Statistical Analyses
This study was conducted with preexisting data from the STEP-BD, which we obtained from the STEP-BD Publications Committee; we subsequently obtained approval from NIMH for use of the data. We computed descriptive statistics (means and standard deviations) for the sample. Because some covariates were missing for some participants, we multiply imputed missing values. Our findings are based on combined multiply imputed data sets. We then fitted a mixed-effects logistic regression model in which site was a random effect (referred to as the full model). The mixed-effects model also enabled us to test the significance of site as an independent predictor of RCT participation allowing for the total number of patients the site contributed to STEP-BD. To separately quantify the impact of each characteristic significantly associated with RCT enrollment in the mixed-effects model, we fitted separate mixed-effects logistic regression models, each excluding a variable found to be significant in the full model.
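The text does not state how estimates from the imputed data sets were combined. A common approach is Rubin's rules, and the sketch below assumes that approach purely for illustration: the pooled estimate is the mean of the per-data-set estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. The function name and the example inputs are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed data sets (Rubin's rules).

    estimates: per-data-set point estimates (e.g., log odds ratios)
    variances: per-data-set squared standard errors
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

A pooled log odds ratio and its standard error obtained this way can be exponentiated to give an odds ratio and confidence limits.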
To interpret the impact of significant variables, we estimated the receiver operating characteristic (ROC) curves of the fitted probabilities under both models (that is, the full model and the model missing the significant variable) and evaluated the difference in the area under the curves (AUCs). The AUC represents the probability that the model assigns a higher predicted probability of enrollment to a randomly selected RCT participant than to a randomly selected nonparticipant. The difference in AUCs can be thought of as a scale-free standardized effect size of a given explanatory variable on RCT participation. We treated the site dummies as random effects so that inferences pertain to an entire population of sites providing psychiatric care as opposed to their representing the sites that participated in STEP-BD. We then calculated the mean predicted probability of a particular site enrolling a patient in the RCT by using the results from the mixed-effects logistic regression model (23). This yielded mean site-specific probabilities adjusted for patient characteristics, thereby ensuring valid comparisons.
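The AUC comparison between the full and reduced models can be illustrated with a rank-based calculation. The sketch below is a generic implementation under stated assumptions, not the software used in the study: it takes fitted probabilities and 0/1 enrollment indicators, counts the share of participant and nonparticipant pairs in which the participant has the higher fitted probability (ties count one-half), and reports the difference between two models. Function names are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: fitted probabilities; labels: 1 = enrolled in RCT, 0 = not.
    Equals the share of (enrolled, not enrolled) pairs ranked correctly,
    with tied scores counted as one-half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_gain(full_scores, reduced_scores, labels):
    """AUC of the full model minus AUC of the model omitting one variable."""
    return auc(full_scores, labels) - auc(reduced_scores, labels)
```

With fitted probabilities from a full model and from a model omitting one variable, `auc_gain` returns the quantity used here as the effect size of that variable.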
In discussions with the STEP-BD principal investigator, we learned that one site expressed reluctance to enroll participants with health insurance in the adjunctive antidepressant RCT (personal communication, Sachs GS, 2011). The reason given for this reluctance was that additional health care costs resulting from potential adverse outcomes related to study participation (for example, if participation in one study arm was harmful or was less efficacious than participation in another) would need to be shouldered by the health insurance plans. Therefore, we conducted a post hoc analysis in which we excluded that site to determine whether it altered the results for the association between insurance status and RCT participation.
We conducted other post hoc sensitivity analyses. Given that patient or site characteristics associated with enrollment in an adjunctive antidepressant RCT may differ from those associated with enrollment in a psychosocial treatment RCT, we fitted a separate model excluding participants who entered only the psychosocial RCT. However, the limited number of participants who enrolled only in the psychosocial RCT precluded us from also fitting a separate model to them alone. Although our mixed-effects modeling enabled us to examine site contribution to the RCT independent of site size, as an additional sensitivity analysis, we added a variable to the model that controlled for site volume in order to reduce the component of site variation in RCT enrollment that was explained by site volume.
Results
We excluded 12 RCT participants either because they lacked CMFs or because we were unable to match them to the enrollment file where the date of their consent to the RCT was located. Bivariate analyses found no difference between participants in the RCT or in the observational arm in rates of missing data for any of the characteristics. Our total sample size was 2,222 (RCT, N=413; observational arm, N=1,809). As expected on the basis of the linked inclusion criteria of the two RCTs, the RCT participant populations overlapped considerably: among the 413 individuals who participated in at least one of the two STEP-BD acute depression RCTs, 56% (N=233) participated in both. The 233 individuals who participated in both RCTs represented 65% of the adjunctive antidepressant RCT participants (N=359), and 81% of the psychosocial treatment RCT participants (N=287).
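The overlap figures above follow from simple counting, which the short check below reproduces from the reported sample sizes. The helper names are hypothetical; all numbers are those given in the text.

```python
def union_size(n_a, n_b, n_both):
    """Number of people in at least one of two groups (inclusion-exclusion)."""
    return n_a + n_b - n_both


def pct(part, whole):
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


antidepressant = 359   # adjunctive antidepressant RCT participants
psychosocial = 287     # psychosocial treatment RCT participants
both = 233             # participants in both RCTs
observational = 1809   # observational arm only

any_rct = union_size(antidepressant, psychosocial, both)
total = any_rct + observational
shares = (pct(both, any_rct), pct(both, antidepressant), pct(both, psychosocial))
```

Here `any_rct` is 413, `total` is 2,222, and `shares` holds the three percentages reported in the paragraph (56, 65, and 81).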
Participants in the observational arm and in the RCTs were largely white (>85%) (Table 1). About half had at least a college degree, and most were privately insured (>55%). CGI scores were predominantly in the moderate to severe range; about half had scores in the moderate range, and 22%–26% had scores in the severe range. About half had a co-occurring substance use disorder, over two-thirds had one or more comorbid mental health conditions that were not substance use disorders, and nearly a third had one or more of the general medical conditions that can influence pharmacotherapy choices for bipolar disorder.
In the full mixed-effects model (Table 2), being uninsured, compared with having private insurance, was a significant predictor of RCT participation (odds ratio [OR]=1.58), as was a baseline CGI score in the severe range (OR=1.52), compared with a score indicating mild symptoms. Four sites had significantly higher odds of contributing patients to the RCT, whereas three had significantly lower odds: site C, OR=2.23; site K, OR=1.67; site L, OR=2.10; site P, OR=2.21; site B, OR=.59; site F, OR=.51; and site Q, OR=.40. The mean predicted probabilities of individual sites contributing patients to the RCT ranged from 8% to 31% (data not shown). Despite the STEP-BD policy that patients from community clinics would not be enrolled in the RCTs, among the five STEP-BD sites that partnered with the six community clinics, only one site had lower odds of contributing patients to an RCT. For the others, no greater or lesser likelihood was noted.
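To give a rough sense of scale for these odds ratios, the sketch below converts an odds ratio into a probability at a chosen baseline. It is illustrative only. The baseline used here is the crude share of the sample enrolled in an RCT (413 of 2,222), which is an assumption made for the example and is not the model's reference-group probability; the model's odds ratios are conditional on the other covariates and on site.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    if not 0 < baseline_prob < 1:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: crude enrollment share in the analytic sample.
crude_baseline = 413 / 2222
uninsured_prob = apply_odds_ratio(crude_baseline, 1.58)  # roughly 0.27
```

Under this crude baseline of about 19%, an odds ratio of 1.58 corresponds to an enrollment probability of roughly 27%.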
When the full mixed-effects model was compared with the model that did not include site, the difference in AUC suggested that site increased the accuracy of the model by 9 percentage points (model excluding site, AUC=.61; full model, AUC=.70) (Table 3). Excluding insurance status from the model did not significantly change the AUC, nor did excluding the baseline CGI score (AUC=.70).
None of our sensitivity analyses (adding site volume to the mixed-effects model, dropping persons who participated only in the psychosocial RCT, or dropping the study site where site investigators expressed reluctance to enroll insured participants in an RCT) yielded different results.
Discussion
Site was the strongest determinant of RCT participation. That is, not all STEP-BD sites contributed similar proportions of patients from their observational arm to the acute depression RCTs. Our findings are independent of the actual number of overall STEP-BD participants from a given site. This finding is notable because clinics were selected for STEP-BD participation on the basis of criteria that would favor a capacity for completing RCT research tasks.
Site contributions to RCT participation are of interest because the outcomes of clinical trials often vary by site (that is, a site × treatment interaction effect) (3,24–26). This effect can be attributable to variation in protocol adherence and in participant characteristics, although one purpose of multisite studies is to increase diversity in key participant characteristics (for example, race-ethnicity and socioeconomic characteristics) to improve generalizability (27). Our findings of differential recruitment to RCTs by site raise important questions that require further study about whether site-specific enrollment to a multisite RCT may be related to a site’s resources to deliver care in general. For example, “high-enrolling” clinics may have different staffing composition or clinician-to-patient ratios that have an impact on care delivery or treatment quality. Important future work includes examining site characteristics in multisite clinical trials to understand why sites may differ in clinical trial outcomes and how we can interpret and extend results from RCTs to usual-care settings and populations.
Few clinical or demographic differences were found between participants in the STEP-BD acute depression RCTs and the observational arm. In contrast, previous studies have found that exclusion criteria typically used in RCTs often exclude patients with more complex presentations or heterogeneous characteristics (1,5,9). Our finding of few clinical or demographic differences between participants in the RCTs and in the observational arm is perhaps not surprising given STEP-BD’s goal of a broader representation of patients with bipolar disorder than typically seen in clinical trials. Nevertheless, it is noteworthy that the STEP-BD achieved this aim for the RCTs.
Also in contrast to prior research on RCTs that involved patients with general medical conditions (10,28), our study found that a lack of insurance was associated with RCT participation. This finding could be mediated by investigator biases or patient preferences and choices. Excluding from our analysis the STEP-BD site where investigators expressed reluctance to recruit insured individuals did not change the results; other investigators or sites may have had a similar reluctance but did not express it. If our finding is attributable to investigator biases, it raises ethical concerns that investigators may view patients with and without insurance differently in regard to RCT participation. Alternatively, uninsured patients may be more likely than those who are insured to choose RCT participation.
In STEP-BD, the RCT paid for the study treatments (for example, study antidepressants and psychosocial visits) but not for RCT participation, non-RCT–related study visits, or any other psychotropic medications. Thus study participation offered some financial benefit to uninsured individuals. If financial constraints due to lack of insurance encouraged some to participate in the RCTs, then this raises concerns about the disproportionate burden that persons without insurance may bear for clinical trials as a means to procure health care. Federal legislation to mandate parity and reduce the number of uninsured persons (that is, the Mental Health Parity and Addiction Equity Act and the Affordable Care Act) could reduce this ethical concern by providing more patients the opportunity to receive mental health care outside a clinical trial.
A limitation of our study is that it could not address all the potential threats to RCT generalizability. STEP-BD participants likely differed from patients seen in some community settings, such as in demographic characteristics (for example, race-ethnicity, educational attainment, and income) and in openness to participate in observational research. However, STEP-BD seemed to attract a patient population with a clinical complexity similar to that seen in usual-care settings; for example, compared with the community mental health center population in the Texas Medication Algorithm Project, STEP-BD participants had similar or higher proportions of chronic general medical conditions and co-occurring substance use disorders (29).
Conclusions
Our findings highlight the importance of future research focused on understanding not only how RCT patient populations may differ from patients seen in usual care but also the heterogeneity among clinics that participate in RCTs (that is, within an RCT) and whether this heterogeneity is predictive of patient outcomes as well.
Acknowledgments
The authors gratefully acknowledge support from the National Institute of Mental Health (grants RC4 MH092717 and K01MH071714 to Dr. Busch) and from the Health Services Research Division, Partners Psychiatry and Mental Health. They also thank Andrew Nierenberg, M.D., and Gary Sachs, M.D., for sharing their knowledge and expertise about the STEP-BD study.