
Original Articles

Published Online: 2012, pp. 205–310

Some Limitations on the External Validity of Psychotherapy Efficacy Studies and Suggestions for Future Research

Glenn D. Shean, Ph.D.

Publication: American Journal of Psychotherapy

Volume 66, Number 3

https://doi.org/10.1176/appi.psychotherapy.2012.66.3.227


Abstract

Increased emphasis on identifying empirically supported treatments (ESTs) has enhanced the scientific basis for psychotherapy practice, but uncritical acceptance of ESTs as the basis for credentialing and policy decisions risks stifling innovation and creativity in the field. Efficacy studies of psychotherapy have inherent limitations that can constrain their external validity. This article discusses several of these limitations, as well as other issues related to evaluating psychotherapy outcome research. These limitations and concerns include: 1) the practice of maximizing homogeneity by selecting participants diagnosed with a single Axis I disorder; 2) the practice of requiring manualized therapies for efficacy research; 3) the assumption that lasting and meaningful changes occur and can be assessed within a relatively short time frame; 4) the assumption that valid assessments of outcome can be conducted in randomized controlled trials (RCTs) without concern for researcher allegiance; and 5) the view that evidence of effectiveness from studies that do not use RCT designs can be ignored. Finally, alternative research approaches that can supplement knowledge gained from efficacy studies and foster continued innovation and creativity in the field are discussed.

Introduction

The purpose of psychotherapy research is twofold: 1) to establish an empirical basis for psychological therapies and 2) to increase understanding of the processes that facilitate change. Emphasis on establishing an empirical basis for psychological therapies increased substantially in 1995, when the Task Force on Promotion and Dissemination of Psychological Procedures, sponsored by Division 12 (Clinical Psychology) of the American Psychological Association (APA), published a list of criteria for identifying empirically validated treatments (later relabeled empirically supported treatments, or ESTs). The criteria for a “well established” EST were as follows:

a treatment should be manualized and demonstrated to be more effective than other treatments or placebo, or equivalent to an established EST in at least two randomized control group studies or in a number of single case design experiments conducted by different researchers (Chambless & Hollon, 1998, p. 9).

These criteria were patterned after those established by the U.S. Food and Drug Administration for drug efficacy studies, with the additional requirement that ESTs are defined as “those clearly specified psychological treatments that have been shown to be efficacious in controlled research studies with a clearly delineated population” (Chambless & Hollon, 1998, p. 9). Empirically supported treatments have been widely adopted as the standard of care required by many managed care organizations (MCOs) and state Medicaid programs (Carpinello, Rosenberg, Stone, Schwager, & Felton, 2002; Seligman & Levant, 1998). Randomized controlled trials have also become the “gold standard” for psychotherapy research (Sternberg, 2006), as indicated in the funding guidelines recently issued by the National Institute of Mental Health (2008).

An impressive body of evidence supporting the effectiveness of ESTs for treating a number of psychological dysfunctions has accumulated since 1995 (Barlow, Gorman, Shear, & Woods, 2000; Stuart, Treat, & Wade, 2000). This evidence underlies the growing recognition of ESTs as indicators of competence among the public, policymakers, and training programs, and as a basis for reimbursement among MCOs (LaRoche & Christopher, 2009). The enthusiasm of EST advocates is demonstrated by references to psychotherapies not included on lists of ESTs as “less essential and outdated” (Calhoun, Moras, Pilkonis, & Rehm, 1998, p. 151) and by articles in the popular press with titles such as “Is Your Therapist a Little Behind the Times?” (Baker, McFall, & Shoham, 2009) or “Ignoring the Evidence: Why Do Psychologists Reject Science?” (Begley, 2009).

Nevertheless, there is growing concern about the potential legal ramifications and practice limitations associated with adopting lists of ESTs as the basis for judging clinical competence and reimbursement (Rupert & Baird, 2004). There are also concerns about the applicability of ESTs to the diverse range of patients seen in actual practice settings (LaRoche & Christopher, 2009; Sue et al., 2006; Wampold, 2007; Westen, Novotny, & Thompson-Brenner, 2004).

The goal of grounding psychotherapy practice in research is wise from both scientific and public policy perspectives, but overzealous generalizations about the external validity of ESTs carry risks. Restrictions on training and practice based solely on psychotherapy efficacy studies may have adverse legal and practice effects and may stifle innovation and creativity in the field. Two kinds of studies can be used to evaluate psychotherapy outcomes. Efficacy studies use randomized controlled trial (RCT) designs to compare manualized therapies, delivered over a fixed number of sessions, for individuals diagnosed with specific uncomplicated disorders. Effectiveness studies are conducted in naturalistic settings without manuals or strict session limits, most often with individuals who have multiple problems. As Seligman (1996) has noted, there is a substantial inferential distance between efficacy studies of manualized therapies of fixed duration with selected patient samples and effectiveness studies of therapy as practiced in naturalistic settings.

Effectiveness studies lack control over variables such as number of sessions, therapeutic method, and sample characteristics, but they have fewer problems of inferential distance because they more closely resemble and test what the research is intended to generalize to: actual practice. Several prominent psychotherapy researchers have expressed additional concerns about the external validity of efficacy studies (e.g., Beutler, 1998; Sue et al., 2006; Wachtel, 2010; Wampold, 2007; Westen et al., 2004). These concerns include:

the potential limitations of RCT studies that attempt to maximize homogeneity by selecting participants diagnosed with a single, uncomplicated Axis I disorder;

the practice of limiting studies to manualized therapies;

the practice of assessing changes solely in terms of symptom reduction, often assessed over a relatively short time period;

the failure to incorporate appropriate safeguards in RCTs to minimize the effects of “therapist allegiance”; and

the dismissal of the relevance of evidence from effectiveness studies.

This paper will discuss these issues and suggest additional approaches to psychotherapy research that can contribute important knowledge about psychotherapy process and outcome.

1. Sample Restrictions

The rationale for limiting participants to a single diagnostic category is that specifying the sample population makes replication easier. While replication of research evidence is important, the goal of easy replication must be balanced against evidence of substantial heterogeneity within diagnostic groups (Howard et al., 1996). Limiting efficacy studies to populations with specific problems does not mirror the realities of actual clinical practice. Evidence indicates that about one-third to one-half of people seeking mental health treatment do not meet criteria for any one diagnostic category (Howard et al., 1996), and that when specific symptoms are the focus of treatment, about one-half of patients add new target complaints or change their complaints during the course of treatment (Kazdin, 2008).

Restricting efficacy studies to individuals with a single DSM diagnosis is likely to produce evidence that cannot be generalized beyond the characteristics of the study sample. Between 50% and 90% of Axis I diagnoses are comorbid with other Axis I or Axis II disorders (Kessler, Stang, Wittchen, Stein, & Walters, 1999; Zimmerman, McDermut, & Mattia, 2000; Thompson-Brenner & Westen, 2004a, 2004b; Westen & Harnden-Fischer, 2001). Axis I symptoms are often related to broad dispositional traits (e.g., negative affect, vulnerability, introversion) that may influence long-term outcomes and predispose individuals to develop symptoms under different circumstances (Blatt & Zuroff, 1992; Hammen, Ellicott, Gitlin, & Jamison, 1989; Krueger, 2002; Kwon & Whisman, 1998; Mineka, Watson, & Clark, 1992; Zinbarg & Barlow, 1996). One cannot assume that individuals with a single diagnosis are matched on additional characteristics, including the presence and degree of family and marital difficulties, the extent of life and job frustrations, pretreatment trauma, chronic illnesses, substance abuse problems, ethnicity, culture, or adjustment to life transitions and life stages. If, for example, depressive symptoms remit in response to a targeted intervention but dispositional and contextual problems are ignored, the outcome may be quite different from what a symptom-focused follow-up evaluation indicates. A further issue for efficacy studies is the questionable validity of many DSM-IV diagnoses. Studies of psychotherapy outcome should assess participants beyond DSM-based diagnostic symptoms, using broad-based assessments continued over several years. Instruments that provide broad-based assessments of functioning and outcome are available, but they are not widely used (Westen & Shedler, 1999a, 1999b, 2007). Including these measures in addition to DSM symptoms would allow more valid assessments of psychotherapy outcome.

In summary, restricting EST studies to a single diagnostic category limits the external validity of many efficacy studies because important comorbid problems and contributing dispositional and contextual factors may be ignored. People develop difficulties in different contexts and bring many attributes and assets to therapy, including their personal histories, ability to identify and describe their emotions and concerns, level of self-reflectiveness, affect tolerance, readiness and motivation for change, quality of social supports, coping abilities, and capacity for intimacy (Asay & Lambert, 1999; Prochaska & Norcross, 2002). These attributes and influences can have a significant impact on psychotherapy outcome. Therapy is not only about symptom reduction; often it is designed, by mutual agreement, to bring about broad changes, such as helping patients to face and deal with emotional and interpersonal issues, to make positive life changes, to risk being more honest with themselves and others, to develop more fulfilling relationships, to actualize talents and abilities, to tolerate and be more aware of affects, to understand self and others in more nuanced ways, and to meet life’s challenges with greater flexibility. The diversity of these goals can make it difficult to measure outcome systematically across individuals. Efficacy studies have ignored these complexities.

2. Brief Symptom Focused Interventions and Outcome

Psychotherapy efficacy studies typically assess outcomes of treatment achieved between sessions 6 and 16 (Morrison, Bradley, & Westen, 2003; Westen, Novotny, & Thompson-Brenner, 2004). The idea that many long-standing problems can be permanently resolved in 6 to 16 sessions is not consistent with evidence of a significant psychotherapy dose-response relationship (Howard, Kopta, Krause, & Orlinsky, 1986; Kopta, Howard, Lowry, & Beutler, 1994; Seligman, 1995). The median treatment length for depression in outpatient settings is 75 sessions, in marked contrast to the 6 to 16 sessions typical of efficacy studies. Field studies of an EST such as CBT for depressed patients report an average of 69 sessions (Morrison, Bradley, & Westen, 2003). Evidence indicates substantial variability in response to treatment, and the contributors to this variability are not well understood (Baldwin, Berkeljon, Atkins, Olsen, & Nielsen, 2009). Change in therapy has been shown, for example, to be a function of factors such as therapist competence and patient dispositional traits (Howard, Lueger, Maling, & Martinovich, 1993; Kopta et al., 1994). Many mental disorders are also characterized by intervals of remission and relapse, which makes the validity of relatively short-term follow-up evaluations potentially problematic. The well-designed NIMH Treatment of Depression Collaborative Research Program, for example, included a large sample, a placebo plus clinical management control group, and investigators with allegiance to each approach to therapy under investigation (Elkin et al., 1989). Short-term assessments indicated promising responses of depressed patients to 16 weeks of both CBT and Interpersonal Therapy (IPT), yet at 18-month follow-up the majority of those treated had either relapsed or sought further treatment (Shea et al., 1992).

3. Therapy From a Manual

The purpose of using therapy manuals in efficacy research is to reduce variability by standardizing interventions across participants and sites. Task force guidelines call for the use of “treatment manuals or their equivalent in the form of a clear description of the treatment” (Chambless & Ollendick, 2011) and assert that studies that do not include therapy manuals are “acceptable as evidence” only in “specific and rare exceptions” (Chambless et al., 1996, p. 6). The requirement of therapy manuals implies that psychotherapy can be standardized and administered in a manner parallel to drugs and other medical treatments, but the similarities between mental disorders and medical disorders are limited. Psychotherapy is rarely a “one-size-treats-all” process. Therapies for patients who are not preselected do not lend themselves to manualization, nor can providers be matched in study designs. Meta-analyses of effectiveness studies indicate that differences between approaches to psychotherapy account for less than 10% of the total variance in outcomes, suggesting that the “active ingredients” of therapy are not necessarily those associated with a particular treatment model (Lambert & Ogles, 2004; Luborsky et al., 2002; Wampold et al., 1997, 2002). In practice, approaches overlap so much that raters of session transcripts often have difficulty determining which manualized therapy was being provided (Ablon & Jones, 2002). A review of cognitive therapy outcome research described the lack of clarity about the processes that contribute to change in manualized therapy as follows: “Perhaps we can state more confidently now than before that whatever mechanisms of changes with cognitive therapy, it does not seem to be the cognitions as originally proposed” (Kazdin, 2007, p. 8).

A study of archival treatment records that rated therapists’ adherence to either a psychodynamic or a cognitive prototype, regardless of the model the therapists believed they were following, found that several prototypic dynamic practices best predicted successful outcome (e.g., encouraging open-ended dialogue; identifying recurring themes in the patient’s life; linking feelings and perceptions to past experiences; drawing attention to feeling states experienced as unacceptable; identifying defensive responses; interpretations; and discussing potential connections between the therapy relationship and other relationships; Ablon & Jones, 1998). Therapists’ adherence to CBT practices (e.g., focus on cognitive themes such as thoughts and belief systems; discussion of specific homework tasks and explanations of the rationale behind treatment and techniques; therapist introducing topics; therapist functioning in a didactic manner; discussion of specific treatment goals; and focus on the patient’s current life situation) was not related to successful outcome, regardless of therapists’ stated orientation (Ablon & Jones, 1998). A study of specific process predictors of positive outcome using the Psychotherapy Process Q-Set indicated that, although adherence to cognitive-behavioral process was most characteristic of the therapy delivered, adherence to interpersonal and psychodynamic process was most predictive of positive outcome (Ablon, Levy, & Katzenstein, 2006).

Surveys of practitioners indicate that most describe themselves as theoretically eclectic. Cognitive-behavioral clinicians report using psychodynamic strategies, exploring relationships and unconscious processes with patients evidencing higher levels of emotional dysregulation. Psychodynamic clinicians report that they use CBT treatment strategies with emotionally constricted patients (Thompson-Brenner & Westen, 2004a, 2004b). Psychotherapy outcome studies in naturalistic settings attempt to assess what is more often than not an individualized and fluid process that is not compatible with the use of specific therapy manuals as included in efficacy studies.

4. The “Wild Card” Effect in Psychotherapy Research

A limitation of nearly all efficacy studies that form the basis of EST lists is the omission of double-blind outcome ratings, an important safeguard in RCT designs. Double-blind ratings in psychotherapy efficacy studies are admittedly difficult, expensive, and often impractical; however, their absence leaves results susceptible to the “wild card” effect of therapist allegiance. This is not a minor issue, since therapist allegiance has been shown to account for about 69% of the variance in psychotherapy outcome studies (Luborsky et al., 1999). Luborsky and colleagues offered several practical suggestions for controlling the effects of therapist allegiance when double-blind ratings are difficult to include, but these suggestions have been ignored. Luborsky et al. (1999) suggest that, at a minimum, comparative treatment studies be conducted using raters of varied theoretical persuasions and researchers with minimal allegiance to the approaches studied. In addition, therapists for each treatment mode should be selected and supervised by representatives of that treatment mode, and should be assigned to each mode on the basis of ratings of their effectiveness. Outcome criteria should be developed with input from therapists of all persuasions under study, and long-term functional follow-up evaluations should be conducted using consistent outcome criteria. If all else fails, Luborsky et al. (1999) recommend that efficacy studies include researcher/therapist allegiance as a variable in all analyses.

5. Dismissal of Evidence From Trials That are Not Randomized and Controlled

A study of patients with panic disorder treated by self-identified psychodynamic clinicians found rates of remission and change scores commensurate with those of empirically supported therapies for panic disorder, and treatment gains were maintained at 6-month follow-up (Ablon, Levy, & Katzenstein, 2006). Meta-analyses of comparative outcome studies indicate that there are no large differences in efficacy among the major psychotherapies (e.g., cognitive-behavioral, interpersonal, behavioral activation, psychodynamic, problem solving, or social skills training) for mild to moderate depression (Benish, Imel, & Wampold, 2008; Cuijpers, van Straten, Andersson, & van Oppen, 2008; Luborsky et al., 2002; Westen & Morrison, 2001). There is also substantial evidence supporting the effectiveness of dynamic therapies (e.g., Clarkin et al., 2007; Leichsenring, 2001, 2005; Leichsenring & Leibing, 2003; Leichsenring & Rabung, 2008; Leichsenring, Rabung, & Leibing, 2004; Levy & Ablon, 2008; Lewis, Dennerstein, & Gibbs, 2008; Seligman, 1996; Shedler, 2010; Wampold, 2008). Nevertheless, evidence from effectiveness studies of psychodynamic therapies is ignored.

The Present

Efficacy studies provide reasonable assurance of internal validity (i.e., the degree to which reported results can be attributed to the therapeutic approach studied) but do not necessarily allow conclusions about effectiveness. Psychotherapy outcome research has been based on the assumption of a linear causal model in which patient symptoms → manualized therapeutic approach → outcome (symptom reduction). But psychotherapy as practiced in the field often involves more than applying specific strategies to fix specific problems: it is a complex series of interactions that take place over time between a therapist and a patient who each have unique characteristics, in a particular context, resulting in a reciprocal, unfolding causal process with unpredictable emergent properties (Wampold, Hollon, & Hill, 2011). Questions about outcome in terms of symptom reduction are important, but it is also relevant to evaluate the consequences of therapy for the patient’s life beyond symptom reduction, for example, whether long-term patterns have changed and whether the quality and character of the patient’s life have improved so that it has become more satisfying, meaningful, fulfilling, and socially productive (Orlinsky, 2009).

Psychotherapy research is not only about questions of outcome; there are also many important questions about how therapy works and how treatments can be improved. Process research examines what occurs during therapy. Process variables, such as patient characteristics, context, therapist responses, and the therapeutic alliance, cannot be experimentally manipulated, yet they can influence outcome. Asay and Lambert (1999), for example, estimated that the quality of the therapeutic relationship accounts for about 30% of the variance in outcome, therapist techniques for about 15%, expectancy (hope) and therapist credibility for about 15%, and environmental and patient characteristics (e.g., readiness for change, openness, engagement, active participation, ability to verbalize feelings) for about 40%. Characteristics of the therapist have a significant effect on outcome (Beutler et al., 2004; Horvath & Bedi, 2002; Wampold, 2001), and within treatments, therapists vary considerably in their outcomes (Huppert et al., 2001; Kim, Wampold, & Bolt, 2006; Wampold & Brown, 2005). We do not know enough about what highly effective therapists do that has a significant positive impact on outcome, but case studies suggest that process variables such as therapist immediacy (e.g., inquiring about reactions to the therapy relationship, pointing out parallels between the therapy relationship and other relationships, processing boundary crossings, disclosing feelings) have a significant effect on outcome (Hill et al., 2008; Kasper, Hill, & Kivlighan, 2008).

Therapist effects have been ignored in most EST research, an omission that limits generalizability (Serlin, Wampold, & Levin, 2003), inflates treatment effects (Wampold & Serlin, 2000), and obscures understanding of how process variables are related to outcome (Baldwin, Wampold, & Imel, 2007). The APA recently created a new Joint Task Force to develop guidelines for research on evidence-based psychotherapy relationships; its goals are to foster research identifying elements of effective therapy relationships and to identify effective methods of adapting therapy to the characteristics and needs of the patient (other than diagnosis; Norcross, 2002). These guidelines may help broaden the focus of psychotherapy research to include the influence of the therapist-patient relationship, therapy process variables, and the impact on outcome of patient characteristics beyond diagnostic symptoms. We know a fair amount about the processes that contribute to therapeutic effectiveness, including the importance of developing a positive therapeutic alliance, fostering a sense of hope and self-efficacy, and encouraging relevant emotional expression, but there is much more to be learned about the particular skills, processes, and practices associated with different conceptual models, how they may contribute to positive change, and how they interact with individual differences (Beutler et al., 2011). We need to learn more about the effects of other moderator variables on psychotherapy effectiveness, such as levels of distress and impairment, co-occurring problems, dispositional traits, self-reflectiveness, openness to experience, access to social supports, and coping skills, as well as how these moderators interact with therapeutic approaches (Goldfried & Eubanks-Carter, 2004).
We also do not adequately understand the conditions under which therapeutic interventions such as the two-chair technique, transference interpretation, enactment, and the use of metaphor and paradox are effectively applied. Shedler (2010) described the key features that distinguish dynamic psychotherapy as: a focus on affect and expression of emotion; exploration of attempts to avoid distressing thoughts and feelings; identification of recurring themes and patterns; discussion of past experiences with a developmental focus; focus on interpersonal relations; focus on the therapy relationship; and exploration of fantasy life. Research is needed to identify when, how, in what context, and with whom these strategies are most likely to be effective, and whether they can be effective across theoretical approaches. Clinical studies of comparable cases can be of value as illustrations of therapeutic principles (Goldfried & Wolfe, 1998). We can also learn about the processes that bring about change by systematically studying recorded therapy sessions of acknowledged expert clinicians. In-depth within-model comparisons of therapies, and comparisons of the components of therapy models, may help us better understand the change mechanisms that operate within a particular model, as well as how best to maximize efficacy in applying these mechanisms (Jacobson, 1999; Jacobson & Addis, 1993).

The APA Presidential Task Force on Evidence-Based Practice (2005) defined evidence-based practice as “the integration of the best available research with clinical expertise in the context of patient characteristics, culture and preferences” (APA, 2005, p. 273). This broader definition was apparently developed in response to growing concerns that ESTs were being narrowly interpreted and misused to justify questionable restrictions on access to care and treatments of choice (Norcross, Koocher, & Garofalo, 2006). The redefinition was intended to incorporate the strengths of EST studies while also recognizing the value of diverse methodological approaches that “require an appreciation of the value of multiple sources of scientific evidence” (APA, 2006, p. 280). Much more research is needed to help us better understand the interaction between process and outcome: the impact of the therapist, how and when certain interventions are most likely to be effective, and how the process of therapy changes over time.

Wampold et al. (2011) suggest that multilevel statistical models should be applied in psychotherapy research to estimate the proportion of variance in outcomes attributable to different levels of variables (e.g., patient characteristics, therapist characteristics, and treatments) and to assess the impact of these variables on outcome (Hox, 2002). Multilevel models can be used to analyze longitudinal process and outcome data nested within therapists, so that the effects of temporal and nested components of therapy can be identified and evaluated (Klein et al., 2003). Once process variables, such as the characteristics and actions of effective therapists, are better understood within and across therapeutic approaches, and evidence has accumulated about how these characteristics interact with patient variables over time, we will be better able to train effective therapists and to assess and make policy decisions about therapeutic effectiveness. Until then, it seems most appropriate to remain somewhat circumspect about the practice and policy implications of psychotherapy efficacy studies.
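The variance-partitioning idea behind multilevel models can be illustrated with its simplest case. As a minimal sketch (not the analysis Wampold et al. or Hox describe), the Python below estimates what share of outcome variance lies between therapists, using the classical one-way random-effects ANOVA estimator of the intraclass correlation for patients nested within therapists. The balanced caseloads, the simulated scores, and the names `therapist_variance_share` and `simulate_caseloads` are hypothetical assumptions for illustration; real psychotherapy data with unequal caseloads, repeated measurements, and treatment conditions would call for a fitted mixed-effects model instead.

```python
import random
import statistics


def therapist_variance_share(outcomes_by_therapist):
    """Estimate the share of outcome variance lying between therapists.

    Uses the one-way random-effects ANOVA estimator (an intraclass
    correlation, ICC(1)). Assumes a balanced design: every therapist
    treats the same number of patients, with one outcome score each.
    """
    k = len(outcomes_by_therapist)  # number of therapists
    if k < 2:
        raise ValueError("need at least two therapists")
    n = len(outcomes_by_therapist[0])  # patients per therapist
    if n < 2 or any(len(caseload) != n for caseload in outcomes_by_therapist):
        raise ValueError("need equal caseloads of at least two patients")

    means = [statistics.fmean(caseload) for caseload in outcomes_by_therapist]
    grand_mean = statistics.fmean(means)  # equals the overall mean when balanced

    # Between-therapist and within-therapist mean squares.
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2
        for caseload, m in zip(outcomes_by_therapist, means)
        for y in caseload
    ) / (k * (n - 1))

    # Method-of-moments variance components; a negative estimate is truncated to 0.
    var_therapist = max(0.0, (ms_between - ms_within) / n)
    total = var_therapist + ms_within
    return var_therapist / total if total > 0 else 0.0


def simulate_caseloads(k=300, n=30, therapist_sd=0.5, patient_sd=1.5, seed=1):
    """Hypothetical data: each therapist adds one random effect to all of their patients' scores."""
    rng = random.Random(seed)
    data = []
    for _ in range(k):
        therapist_effect = rng.gauss(0.0, therapist_sd)
        data.append([therapist_effect + rng.gauss(0.0, patient_sd) for _ in range(n)])
    return data


if __name__ == "__main__":
    share = therapist_variance_share(simulate_caseloads())
    print(f"Estimated therapist share of outcome variance: {share:.3f}")
```

With the assumed simulation settings, the true therapist share is 0.5² / (0.5² + 1.5²) = 0.25 / 2.5 = 0.10, so the printed estimate should land near that value. A share of this kind is the therapist effect that, as noted above, most EST research leaves unmodeled.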

REFERENCES

Ablon, J.S., & Jones, E.E. (1998). How expert clinicians’ prototypes of an ideal treatment correlate with outcome in psychodynamic and cognitive-behavioral therapy. Psychotherapy Research, 8, 71–83.
