Original Articles

Published Online: 2013, pp. 1–108

Practice-Based Evidence: 45 Years of Psychotherapy’s Effectiveness in a Private Practice

Paul Clement, Ph.D., ABPP

Publication: American Journal of Psychotherapy

Volume 67, Number 1

https://doi.org/10.1176/appi.psychotherapy.2013.67.1.23

Abstract

Of 2,259 patients seen during 45 years of private practice, outcome data were produced for 1,599 cases. The mean (SD) number of sessions per case was 18.82 (29.89). The dropout rate was 18.76%. Of all treated cases with outcome data 4 (0.25%) were rated as Much Worse; 11 (0.69%), Worse; 497 (31.08%), No Change from Intake; 546 (34.15%), Improved; and 541 (33.83%), Much Improved. The mean (SD) pre-/post-treatment effect size (ES) was 1.90 (1.61), the median was 1.62, and the range was from −2.91 to +15.22. Patients and parents of minors rated outcomes more positively than the therapist did. Outcome varied significantly across diagnostic categories. There was a significant, positive relationship between length of treatment and outcome. The therapist’s effectiveness did not improve across the years. Years with the largest patient caseloads or the greatest proportion of patients with managed-care insurance tended to show the poorest outcomes.¹

Introduction

About 60 years ago Eysenck (1952) reviewed the published research on adult psychotherapy and concluded, “The figures fail to support the hypothesis that psychotherapy facilitates recovery from neurotic disorder” (p. 323). Five years later Levitt (1957) reviewed the research on child and adolescent psychotherapy and reached a verdict similar to that of Eysenck: “. . . the results of the present study fail to support the view that psychotherapy with ‘neurotic’ children is effective” (p. 195). These two articles triggered many criticisms from throughout the world, but these two psychologists updated their literature reviews and came to the same conclusions made in their first articles (Eysenck, 1961; Levitt, 1963).

In spite of such negative reviews of research on psychotherapy, psychotherapists continued to practice. On rare occasions they reported their results. For example, Heilbrunn (1966) evaluated her outcomes from 17 years of practicing psychoanalysis and psychoanalytic therapy and published her results in the American Journal of Psychotherapy. She claimed that 77 of 173 patients (i.e., 45%) improved; however, she excluded more than 80 patients seen for less than 20 sessions. When I read her paper a few months before becoming a licensed psychologist in California, I resolved to do something similar. That is the purpose of the present paper.

Throughout the 1950s and 1960s researchers were responding to the challenges posed by Eysenck and Levitt. In 1970 Meltzoff and Kornreich reviewed that research. They concluded that well-designed and controlled research had demonstrated very positive outcomes from psychotherapy. They also reviewed research on characteristics of patients and therapists that contribute to positive treatment outcomes, and on patient-therapist relationship variables that make a difference. Although their review provided encouragement to psychotherapists in all work settings, it did not reveal what kind of outcomes were obtained by therapists in private practice. It did not identify brief outcome measures suitable for repeated administrations to gauge patient change across time.

Seven years later Smith and Glass (1977) introduced a quantitative approach for performing literature reviews of controlled treatment-outcome studies on adults. They called it “meta-analysis.” Three years later Smith, Glass, and Miller (1980) expanded and updated the previous review. Although many methodologists criticized their approach, many other psychologists (and researchers from many other disciplines) adopted and adapted meta-analysis for reviewing research findings. Many investigators have performed meta-analyses of controlled treatment-outcome research on psychotherapy for children and adolescents. Across the findings from 27 meta-analyses of child, adolescent, and adult psychotherapy research, the mean (SD) effect size (ES) was 0.76 (0.24), 95% CI [0.66, 0.86] (Abbass, Kisely, & Kroenke, 2009; Anderson & Lambert, 1995; Bratton, Ray, Rhine, & Jones, 2005; Casey & Berman, 1985; Driessen, Cuijpers, de Maat, Abbass, de Jonghe, & Dekker, 2010; Erion, 2006; Fossum, Handegard, Martinussen, & Morch, 2008; Kazdin, Bass, Ayers, & Rodgers, 1990; Leichsenring, Rabung, & Leibing, 2004; Lewinsohn & Clarke, 1999; Maughan, Christiansen, Jenson, Olympia, & Clark, 2005; McLeod & Weisz, 2004; Messer & Abbass, 2010; Michael & Crowley, 2002; Oei & Dingle, 2008; Olympia & Clark, 2005; Peleikis & Dahl, 2005; Reinecke, Ryan, & DuBois, 1998a, 1998b; Shadish et al., 1997; Shapiro & Shapiro, 1982; Smith & Glass, 1977; Smith et al., 1980; Smith, Bartz, & Richards, 2007; Stage & Quiroz, 1997; Weisz, McCarty, & Valeri, 2006; Weisz, Weiss, Alicke, & Klotz, 1987; Weisz, Weiss, Han, Granger, & Morton, 1995). I have used meta-analytic techniques as one of my basic ways of expressing magnitude of change within and across the patients of my private practice. My results have consistently surpassed the mean ESs given in the reviews listed above.

The most common meta-analyses are reviews of the results of randomized controlled treatments or trials (RCTs) in which one or more groups of treated cases are compared to one or more control or contrast groups. In contrast, my meta-analyses have compared how a patient was functioning at intake to how much that patient has changed over time. Meta-analyses of such within-cases results have not appeared as frequently as those of RCTs, but they do exist. Across the findings from 18 meta-analyses of within-patients-outcomes (pre-therapy versus post-therapy) research, the mean (SD) ES was 1.35 (0.39), 95% CI [1.16, 1.55] (Burlingame, Fuhriman, & Mosier, 2003; Clement, 2008; de Maat, de Jonghe, Schoevers, & Dekker, 2009; Driessen, Cuijpers, de Maat, Abbass, de Jonghe, & Dekker, 2010; Friedman, Cardemil, Uebelacker, Beevers, Chestnut, & Miller, 2005; Hofmann, Sawyer, Witt, & Oh, 2010; Huber, Henrich, & Klug, 2005; Kazdin & Whitley, 2006; Leichsenring & Leibing, 2003; Leichsenring, Rabung, & Leibing, 2004; Maughan, Christiansen, Jenson, Olympia, & Clark, 2005; Michael & Crowley, 2002; Minami, Wampold, Serlin, Kircher, & Brown, 2007; Norton & Philipp, 2008; Oei & Dingle, 2008; Stiles, Barkham, Connell, & Mellor-Clark, 2008; Stiles, Barkham, Mellor-Clark, & Connell, 2008; Stiles, Barkham, Twigg, Mellor-Clark, & Cooper, 2006).

A one-way ANOVA on the mean ES of the 27 reviews of RCT research listed above and the 18 reviews of within-cases studies produced the following results: F (1, 43) = 39.80, p = 1.299E-07. Published meta-analyses of mean ESs from RCTs have greatly underestimated how much a given patient improves during a course of psychotherapy.
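The comparison above can be approximately reproduced from the published summary statistics alone. The following is a minimal sketch, not the author's StatMost procedure: it assumes 27 RCT reviews with mean (SD) ES of 0.76 (0.24) and 18 within-cases reviews with mean (SD) ES of 1.35 (0.39), and the function name `one_way_anova_from_summary` is hypothetical. Because the inputs are rounded to two decimals, the result should land near, but not exactly on, the reported F of 39.80.

```python
def one_way_anova_from_summary(groups):
    """Compute a one-way ANOVA F statistic from (n, mean, sd) tuples.

    Hypothetical helper for illustration; it needs only summary statistics,
    not the raw effect sizes.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-groups sum of squares, recovered from each group's sample SD.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Summary values as given in the text (rounded to two decimals).
rct_reviews = (27, 0.76, 0.24)
within_case_reviews = (18, 1.35, 0.39)

f_stat, df_b, df_w = one_way_anova_from_summary([rct_reviews, within_case_reviews])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

With these rounded inputs the sketch gives degrees of freedom of 1 and 43 and an F of roughly 39.6, which is in line with the reported value.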

Some of the meta-analyses carried out throughout the 1980s and early 1990s identified specific psychological treatments that were effective for treating particular disorders, for example, depression (Dobson, 1989; Steinbrueck, Maxwell, & Howard, 1983), generalized anxiety disorder (Gould, Otto, Pollack, & Yap, 1997), and obsessive-compulsive disorder (Cox, Swinson, Morrison, & Lee, 1993; Christensen, Hadzi-Pavlovic, Andrews, & Mattick, 1987). In response to such findings David Barlow, president of the American Psychological Association’s Division of Clinical Psychology, appointed a Task Force on Promotion and Dissemination of Psychological Procedures in 1993 (Sanderson & Woody, 1995a, 1995b). During the ensuing years many journal articles and books identified, described, and provided treatment manuals for such empirically supported treatments. But there were many protests against these lists. For many years at the annual conventions of the American Psychological Association (APA) there were debates about the appropriateness of such lists.

Partly in response to these debates APA president Ron Levant created a Presidential Task Force on Evidence-Based Practice in 2005 to investigate the issues raised. That task force produced a report that ultimately became a policy statement of the association (APA Presidential Task Force, 2006). The task force concluded, “Evidence-based practice requires that psychologists recognize the strengths and limitations of evidence obtained from different types of research. Research has shown that the treatment method . . . the individual psychologist . . . the treatment relationship . . . and the patient . . . are all vital contributors to the success of psychological practice” (p. 275). The report called for the collection of effectiveness evidence to complement efficacy results from randomized controlled trials (RCTs).

Unfortunately there is very little published effectiveness data gathered from the routine private practice of psychotherapy. References to “usual clinical care” have been misleading (e.g., Weisz, Jensen-Doss, & Hawley, 2006) because they overwhelmingly refer to findings from institutional settings, such as clinics, hospitals, and residential treatment centers. In contrast, according to the APA database on “Employment Characteristics of APA Members,” between 58% and 78% of psychologists who are employed full-time providing mental health services are in independent practice. In addition there are many more psychologists who are salaried by colleges, universities, hospitals, clinics, etc. who maintain part-time private practices. We know almost nothing about their treatment outcomes. The present article provides an exception.

Most of the exceptions that do exist involve samples from an individual practice or from a group practice rather than presenting outcomes for all cases seen by an individual therapist or by a group of private practitioners. For example, Persons, Burns, and Perloff (1988) gave their results from treating 70 depressed adult patients in private practice using cognitive therapy. Similarly, Wise (2003) presented treatment outcomes for an intensive outpatient program with 225 patients.

There is another movement that complements empirically supported treatments and evidence-based practice. It is the call for practice-based evidence. Mellor-Clark, Barkham, Connell, and Evans (1999) gave an early example. Their article introduced the Clinical Outcomes in Routine Evaluation (CORE) information management system and emphasized the importance of collecting practice-based evidence to complement that obtained through RCTs and other avenues. More recently Barkham, Hardy, and Mellor-Clark (2010) edited a book calling for and demonstrating practice-based evidence. The present article is an example of practice-based evidence of the effectiveness of psychotherapy within a private practice.

Method

In general, I followed the methods described in several earlier articles (Clement, 1996, 1999, 2011), in which I used a five-level Global Estimate of Outcome (GEO) score as follows: 1 = Much Worse at termination than at intake (i.e., the level of functioning was at least 50% worse than at intake), 2 = Worse at termination than at intake (i.e., the level of functioning was 11% to 49% worse than at intake), 3 = No Change since intake (i.e., the level of functioning at termination was within plus or minus 10% of what it had been at intake), 4 = Better than at intake (i.e., the level of functioning at termination was 11% to 49% better than at intake), and 5 = Much Better than at intake (i.e., the level of functioning at termination was at least 50% better than at intake) (cf., Clement, 1994). When ESs were available I converted to GEO scores as follows: If ES = -1.50 or less, the GEO score was 1; if ES = -0.51 to -1.49, the GEO score was 2; if ES = -0.50 to +0.50, the GEO score was 3; if ES = +0.51 to +1.49, the GEO score was 4; and if ES = +1.50 or greater, the GEO score was 5 (cf., Clement, 1999, 2008, 2011).
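The ES-to-GEO conversion rule stated above is simple enough to express as a lookup. The sketch below is illustrative only; the function name `geo_from_es` is hypothetical, and it assumes that ESs are rounded to two decimals before the cutoffs are applied, since the published cutoffs (for example, 0.50 and 0.51) leave no gap only at that precision.

```python
def geo_from_es(es: float) -> int:
    """Map a pre-/post-treatment effect size to a 1-5 GEO score.

    Cutoffs follow the rule quoted in the text. Rounding to two decimals
    first is an assumption made here so that every value falls in a band.
    """
    es = round(es, 2)
    if es <= -1.50:
        return 1  # Much Worse
    if es <= -0.51:
        return 2  # Worse
    if es <= 0.50:
        return 3  # No Change
    if es <= 1.49:
        return 4  # Better
    return 5      # Much Better


# A few boundary values and the extremes of the reported ES range.
for value in (-2.91, -1.50, -0.51, 0.50, 0.51, 1.62, 15.22):
    print(value, geo_from_es(value))
```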

In the fall of 1988 I began reviewing all closed cases and assigned a GEO score based on all materials within the folder. For some cases, particularly children, observational data facilitated making a quantitative judgment. For a majority of cases such observational data did not exist.

About 25 years ago I started using problem checklists to evaluate functioning at intake and at subsequent re-evaluations. Each checklist contains over 60 items. These checklists are available in Clement (1999). The patient or the parent of a minor patient rates each problem using a 10-point Scale of Functioning (SOF): 10 = Excellent Functioning, 9 = Good Functioning, 8 = Slight Problem, 7 = Some Problem, 6 = Moderate Problem, 5 = Serious Problem, 4 = Major Problem, 3 = Unable to Function, 2 = In Some Danger of Hurting Self or Others, and 1 = In Persistent Danger of Hurting Self or Others. The mean (SD) number of items scored at intake has been 16.72 (11.13), 95% CI [15.58, 17.86], the median has been 14, and the range has been 1–62.

I also calculate a Global Assessment of Functioning (GAF) score at intake by determining the mean of the SOF scores, subtracting the standard deviation of the SOF scores for the patient from the mean, and multiplying the result by 10. This approach to determining the GAF score at intake uses the quantitative ratings of the patient as described in the preceding paragraph to determine level of functioning. The more common practice is for the therapist to make an impressionistic estimate of the patient’s level of functioning. Using my method the mean (SD) GAF score at intake across all cases has been 44.42 (12.84), 95% CI [43.73, 45.11], the median has been 45, and the range has been 2 to 85.
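As a worked illustration of the intake GAF calculation just described (mean of the SOF ratings, minus their standard deviation, times 10), here is a minimal sketch. The ratings are invented for the example, the function name `gaf_at_intake` is hypothetical, and two details are assumptions because the text does not specify them: the sample (n − 1) standard deviation is used, and a checklist with a single rated item is treated as having a standard deviation of zero.

```python
from statistics import mean, stdev


def gaf_at_intake(sof_ratings):
    """Return (mean - SD) * 10 for one patient's SOF ratings (1-10 scale).

    Assumptions: sample SD, and SD = 0 when only one item was rated.
    """
    if not sof_ratings:
        raise ValueError("at least one SOF rating is required")
    spread = stdev(sof_ratings) if len(sof_ratings) > 1 else 0.0
    return (mean(sof_ratings) - spread) * 10


# Hypothetical patient who rated five checklist problems at intake.
example_ratings = [6, 5, 7, 4, 6]
gaf = gaf_at_intake(example_ratings)
print(f"GAF at intake: {gaf:.1f}")
```

For these invented ratings the mean is 5.6 and the sample SD is about 1.14, so the score comes out near 44.6. Subtracting the SD means that a patient with a few very low ratings scores lower than the mean alone would suggest.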

Therapist

I am a Caucasian male of Western European descent. I grew up on a small farm north of Seattle. I started my career in 1965 at the Division of Medical Psychology, UCLA Department of Psychiatry. In 1967 I joined the faculty of the Graduate School of Psychology, Fuller Theological Seminary. Given my rural origins, I have always had a bias toward identifying what works to solve a given kind of problem and toward obtaining empirical evidence regarding how much the phenomenon in question has changed. I do not identify with any of the brand-name psychotherapies, for example, psychodynamic, client-centered, behavioral, cognitive, cognitive-behavioral, Gestalt, etc. Part of my approach has been to lean on theoretical models that have been supported by empirical research. In this spirit I spent the first 23 years of my career doing controlled treatment outcome studies of psychotherapy with children. In late 1988 I left academia and shifted from part-time to full-time private practice. I have always been interested in measuring treatment outcomes within my private practice. I have affirmed the empirically oriented movements including empirically supported treatments, evidence-based practice, and now practice-based evidence.

Participants

From July 1966 through July 2011 I had 2,258 intakes to my private practice. Of these 201 came only for psychological assessment, 386 (18.76%) dropped out without receiving any identified intervention, 40 cases involved consultation without any intervention, 32 intervention cases had not yet produced outcome data, and 1,599 cases had produced outcome data.

Diagnoses

My career has spanned all editions of the Diagnostic and Statistical Manual (DSM) of the American Psychiatric Association; however, originally I used DSM-III-R (American Psychiatric Association, 1987) to record diagnoses for patients seen during the first 22 years of my practice. Before performing the analyses for the present article I updated all diagnoses using DSM-IV-TR (American Psychiatric Association, 2000).

In decreasing order of frequency the most common diagnoses were as follows: dysthymic disorder (n = 468); adjustment disorders (n = 403); attention-deficit/hyperactivity disorder, combined or predominantly hyperactive/impulsive (326); oppositional defiant disorder (283); generalized anxiety disorder (239); partner relational problem (220); major depression (165); social phobia (141); attention-deficit/hyperactivity disorder, predominantly inattentive type (121); no diagnosis (101); specific phobia (87); obsessive-compulsive disorder (85); panic disorder with and without agoraphobia (68); separation anxiety disorder (54); acute stress disorder and posttraumatic stress disorder (48); parent-child relational problem (41); sibling relational problem and relational problem NOS (40); bipolar disorders (38); academic problems (36); anxiety disorders NOS (23); body dysmorphic, hypochondriasis, or somatoform disorder (20); sexual disorders (16); Tourette’s disorder (15); schizophrenias and schizoaffective disorder (13); and autistic disorder and pervasive developmental disorder NOS (10).

Of all patients seen I had recorded only one Axis I diagnosis for 66% of them, two Axis I diagnoses for 34%, three Axis I diagnoses for 13%, four Axis I diagnoses for 4%, and one Axis II diagnosis for 2.5%. I do not know how these numbers compare with trends in other life-span private practices.

Sessions per Case

Of the 1,599 treatment cases with outcome data the mean (SD) number of sessions per case was 18.82 (29.89), 95% CI [17.35, 20.29], the median was 10, and the range was 1-344. Three-quarters of my patients completed treatment within 20 sessions, and approximately 82% finished within 25 sessions; therefore, the majority of my practice has consisted of what the literature identifies as “short-term therapy” (e.g., Watkins, 2012).
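The confidence intervals reported throughout this article can be checked from the mean, SD, and n. The following is a minimal sketch that assumes a normal-approximation interval (mean ± 1.96 standard errors); the statistical package used for the article may have used a t-based interval, which is nearly identical at this sample size. The function name `ci95` is hypothetical.

```python
from math import sqrt


def ci95(mean, sd, n, z=1.96):
    """Normal-approximation 95% confidence interval for a mean."""
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Sessions per case: mean 18.82, SD 29.89, n = 1,599 (values from the text).
low, high = ci95(18.82, 29.89, 1599)
print(f"95% CI [{low:.2f}, {high:.2f}]")  # prints: 95% CI [17.35, 20.29]
```

The large SD relative to the mean, together with a median of 10, reflects a strongly right-skewed distribution of sessions, so the interval describes the precision of the mean rather than the typical case.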

Storage and Retrieval of Data

When I opened a new case, I entered essential data about the patient into an electronic database (Microsoft Access). The database includes fields for identifying information on the patient, diagnoses, level of functioning at intake, and treatment outcome scores as well as other facts. All patients ever seen by me in my private practice were included. I could retrieve information from the database by running a query on one or more fields, such as age, sex, or diagnosis. Similarly, I could arrange the retrieved information by giving a sort command, for example, to list patients by age from the youngest to the oldest. Before analyzing any quantitative data I pasted the query from Access into Microsoft Excel.

Statistical Analyses

I copied raw data from Excel and used the StatMost statistical software package (1994) to perform statistical analyses.

Cancellations and No-Shows

Although I did not track cancellations and no-shows during much of my career, I have done so since January 1999. The mean (SD) percentage of cancelled and broken appointments per week has been 15.97 (0.07), 95% CI [15.44, 16.51].

Results

Who Made Referrals to Me?

In descending order of frequency the following have been my sources of referrals: psychologists 23.85%, patients 18.96%, managed care companies 18.52%, miscellaneous sources 15.36%, physicians 8.13%, school personnel 4.57%, unknown sources 2.66%, clergy members 2.58%, family members of the patient 1.87%, health care providers other than those mentioned elsewhere in this paragraph 1.87%, attorneys 1.02%, friends of the patient 0.27%, social workers 0.27%, and patient him/herself 0.09%.

How Did Outcome Vary by Therapist Estimate and by Patient (or Parent) Rating?

For more than half of the 45 years in question, I did not calculate ESs; therefore, the GEO scores from those years were based on my assessment of all information within each patient’s record. In addition, since I started determining ESs many patients have drifted away from therapy without completing a termination interview. Obtaining self-ratings from patients after they stopped coming to sessions has been very difficult, so that there are many cases for which I have had to determine the GEO scores without the benefit of the patient’s self-rating. Only about 35% of treated patients provided self-ratings at termination using the Scale of Functioning and one of my checklists. For these patients I followed the procedures for transforming an ES into a GEO score as described in the first paragraph of the Method section above. I wondered who would evaluate my outcomes more favorably: Would I or the patients do so?

My estimates were the basis of GEO scores for 1,041 cases. The mean (SD) of these ratings was 3.78 (0.80), 95% CI [3.73, 3.83]. The patient’s or parents’ ratings using the SOF and subsequent ESs were the basis of GEO scores for 558 cases. The mean (SD) of these ratings was 4.43 (0.73), 95% CI [4.37, 4.49]. An analysis of variance on these data produced the following results: F (1, 1,597) = 260.14, p < 0.0001. Patients and parents rated the outcomes more positively than I did.

How Much Did Patients Improve?

Percent Improved

Out of all treated cases with outcome data 4 (0.25%) were Much Worse, 11 (0.69%) were Worse, 497 (31.08%) showed No Change from intake, 546 (34.15%) were Improved, and 541 (33.83%) were Much Improved, for an overall improvement rate of about 68%.
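The percentages above follow directly from the five outcome counts. As a quick arithmetic check, the sketch below recomputes them from the counts given in the text; the variable names are illustrative.

```python
# Outcome counts for the 1,599 treated cases with outcome data (from the text).
counts = {
    "Much Worse": 4,
    "Worse": 11,
    "No Change": 497,
    "Improved": 546,
    "Much Improved": 541,
}

total = sum(counts.values())
percents = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Overall improvement rate combines the two improved categories.
improved_rate = 100 * (counts["Improved"] + counts["Much Improved"]) / total

print(total, percents)
print(f"Overall improvement rate: {improved_rate:.2f}%")
```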

Global Estimate of Outcome (GEO) Score

The mean (SD) GEO score for 1,599 cases was 4.01 (0.84), 95% CI [3.97, 4.05], the median was 4.00, and the range was 1-5.

Effect Size

The mean (SD) ES for 558 cases at termination was 1.90 (1.61), 95% CI [1.77, 2.04], the median was 1.62, and the range was from −2.91 to +15.22. Of all ESs 3.14% were negative.

How Did Outcomes Vary By Diagnosis?

Percent Improved

Table I shows outcome by diagnosis with the best outcomes listed toward the top of the table and the worst outcomes listed toward the bottom. The table only includes diagnoses with at least 10 cases. For this list Chi Square was 99.18 with 23 df, p < 0.0001.

Table I. IMPROVEMENT RATES IN DESCENDING ORDER BY DIAGNOSIS.

Diagnostic Group	n	% Improved
Separation anxiety disorder	20	90.00
Encopresis and enuresis	16	87.50
Panic disorder without agoraphobia	15	86.67
No diagnosis on Axis I or II	28	85.71
Specific phobia	34	82.35
Eating disorders	15	80.00
Sexual disorders	10	80.00
Social phobia	60	76.67
Adjustment disorders	228	75.00
Generalized anxiety disorder	98	71.43
Sibling relational problem	21	71.43
Dysthymic disorder	221	70.14
A-D/HD, combined or hyperactive/impulsive	188	68.09
Oppositional defiant disorder	109	67.89
A-D/HD, predominantly inattentive type	48	64.58
Obsessive compulsive disorder	57	63.16
Anxiety disorder NOS	13	61.54
Parent-child relational problem	18	61.11
Major depressive disorder	99	60.61
Intermittent explosive disorder	12	58.33
Bipolar disorders	24	54.17
Partner relational problem	119	50.42
Panic disorder with agoraphobia	13	46.15
Conduct disorder	30	43.33

Effect Size

Table II lists mean ES by diagnosis in descending order of effectiveness. The table only includes diagnoses with 10 or more cases. For this list F (11, 464) = 1.38, p = 0.1768.

Table II. MEAN ES IN DESCENDING MAGNITUDE BY DIAGNOSIS

Diagnostic Group	n	ES (SD)	95% CI
Major depressive disorder	46	2.44 (2.56)	1.68, 3.20
Specific phobia	10	2.19 (1.38)	1.19, 3.17
Generalized anxiety disorder	39	2.11 (1.97)	1.47, 2.75
Dysthymic disorder	102	2.01 (1.52)	1.71, 2.31
Adjustment disorders	83	1.93 (1.36)	1.64, 2.23
Social phobia	22	1.92 (2.09)	1.00, 2.85
Oppositional defiant disorder	37	1.90 (1.64)	1.35, 2.44
A-D/HD, combined or predominantly hyperactive/impulsive	59	1.81 (1.31)	1.47, 2.16
Bipolar disorders	12	1.70 (1.21)	0.93, 2.47
Partner relational problem	22	1.43 (1.46)	0.78, 2.07
Obsessive compulsive disorder	25	1.35 (1.31)	0.81, 1.89
A-D/HD, predominantly inattentive	20	1.26 (1.06)	0.77, 1.76

How Did Outcomes Vary by Patient Age?

Percent Improved

In decreasing order of effectiveness the results were as follows: age 0.5 years (6 months)–5 years, 85.33% improved (n = 75); age 6 years–12 years, 71.59% improved (n = 352); age 20 years–29 years, 70.11% (n = 174); age 30 years–39 years, 68.04% (n = 291); age 40 years–49 years, 65.61% (n = 253); age 60 years–88 years, 63.64% (n = 44); age 50 years–59 years, 63.49% (n = 126); and age 13 years–19 years, 63.41% (n = 276). The Pearson correlation between the midpoint of each age range and the percent improved was r = -0.68, n = 8, p = 0.0609.
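The correlation between age-range midpoint and percent improved can be recomputed from the eight pairs listed above. This is a minimal sketch under one assumption: the midpoint of each range is taken as the simple average of its lower and upper bounds (for example, 2.75 years for the 0.5 to 5 range). The helper `pearson_r` is written out by hand so the example has no dependencies.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# (lower age, upper age) in years and percent improved, as listed in the text.
age_groups = [
    ((0.5, 5), 85.33),
    ((6, 12), 71.59),
    ((13, 19), 63.41),
    ((20, 29), 70.11),
    ((30, 39), 68.04),
    ((40, 49), 65.61),
    ((50, 59), 63.49),
    ((60, 88), 63.64),
]

midpoints = [(low + high) / 2 for (low, high), _ in age_groups]
percent_improved = [pct for _, pct in age_groups]

r = pearson_r(midpoints, percent_improved)
print(f"r = {r:.2f}, n = {len(age_groups)}")
```

Under this midpoint assumption the sketch yields r of about -0.68, matching the reported value. With only eight data points the estimate is heavily influenced by the youngest group.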

Effect Size

The mean ESs in descending order of magnitude by age group were as follows: age 60 years–88 years, 2.55 (3.54), 95% CI [0.66, 4.44], n = 16; age 0.5–5 years, 2.34 (1.78), 95% CI [1.62, 3.06], n = 26; age 50 years–59 years, 2.09 (1.76), 95% CI [1.59, 2.60], n = 49; age 30 years–39 years, 2.05 (1.54), 95% CI [1.74, 2.35], n = 102; age 20 years–29 years, 1.96 (1.58), 95% CI [1.56, 2.36], n = 62; age 6 years–12 years, 1.83 (1.46), 95% CI [1.55, 2.10], n = 110; age 40 years–49 years, 1.75 (1.56), 95% CI [1.45, 2.05], n = 105; and age 13 years–19 years, 1.66 (1.29), 95% CI [1.40, 1.92], n = 99. The Pearson correlation between the midpoint of each age range and the mean ES was r = 0.42, n = 8, p = 0.3010.

How Did Outcomes Vary by Sex of the Patient?

Percent Improved

Of all female patients 68.76% improved (n = 653). Of all male patients 67.60% improved (n = 929). Although this is a slight difference, it has held up across the decades.

Effect Size

For all female patients the mean ES was 1.94 (1.73), 95% CI [1.71, 2.16], n = 225. For all male patients the mean ES was 1.85 (1.51), 95% CI [1.69, 2.01], n = 354.

Was There a Relationship between Treatment Length and Outcome?

Percent Improved

To answer this question I ordered all treatment cases from the fewest sessions to the most. Then I identified blocks of adjacent cases. The first six blocks contained 49 cases each. The remaining 26 blocks contained 50 cases each. The first block had only one or two sessions per case with a median of two. The second block had a median of three sessions. The final block had a median of 125 sessions. Then I calculated the mean % improved within each block. Figure 1 presents the results in graphic form. The Pearson correlation between median sessions per block and percent improved was as follows: r = 0.63, n = 32, p = 9.56E-05.
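The blocking procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the original spreadsheet workflow: each case is represented as a hypothetical `(sessions, improved)` pair, the function names `block_sizes` and `block_summary` are invented for the example, and the smaller blocks are placed first, which reproduces the six blocks of 49 and 26 blocks of 50 mentioned in the text. The demonstration data are made up.

```python
from statistics import median


def block_sizes(n_cases, n_blocks):
    """Split n_cases into n_blocks near-equal blocks, smaller blocks first."""
    base, extra = divmod(n_cases, n_blocks)
    return [base] * (n_blocks - extra) + [base + 1] * extra


def block_summary(cases, n_blocks):
    """Return (median sessions, percent improved) for each block.

    cases: iterable of (sessions, improved) pairs, improved being True/False.
    """
    ordered = sorted(cases, key=lambda case: case[0])  # fewest sessions first
    summaries, start = [], 0
    for size in block_sizes(len(ordered), n_blocks):
        block = ordered[start:start + size]
        start += size
        med_sessions = median(sessions for sessions, _ in block)
        pct_improved = 100 * sum(1 for _, improved in block if improved) / size
        summaries.append((med_sessions, pct_improved))
    return summaries


# Block layout implied by the text: 6 * 49 + 26 * 50 = 1,594 cases in 32 blocks.
print(block_sizes(1594, 32))

# Tiny invented data set, only to show the output shape.
demo_cases = [(1, False), (2, False), (3, True), (5, False),
              (8, True), (12, True), (20, True)]
print(block_summary(demo_cases, 3))
```

The per-block medians and percentages are the pairs that would then be correlated, for example with a Pearson correlation, to relate treatment length to outcome.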

Effect Size

There was great variability in the improvement patients achieved with a given amount of therapy as measured by ES. The Pearson correlation of ES versus sessions-per-individual case was as follows: r = 0.00, n = 683. I did a second analysis similar to the one presented in the preceding paragraph. I created blocks of sessions from the fewest sessions to the most. The first eight blocks contained 28 entries. The remaining 17 blocks contained 27. The median number of sessions per block ranged from 3 for the first block to 115 for the last block on the right. Then I determined the mean ES within each of these 25 blocks. The Pearson correlation between median sessions and mean ES was as follows: r = 0.07, n = 25, p = 0.7339.

For some individual patients I obtained from two to five ESs across time. For such cases I determined the Spearman rank order correlation by comparing the order the ESs were obtained (i.e., first, second, etc.) with the magnitude of each ES within a patient (i.e., 1st = smallest ES, 2nd = next larger ES, etc.). The results were as follows: Spearman r = 0.63, n = 194, p = 5.42E-23.

Was there a Relationship between Treatment Format (Modality) and Outcome?

Percent Improved

Figure 2 shows the mean percent improved for each of five treatment formats. A Chi Square performed on the depicted data was 18.24 with 4 df, p = 0.001108.

Figure 2. MEAN OUTCOME BY TREATMENT FORMAT (MODALITY). CONSULTATION N = 18, GROUP N = 8, FAMILY N = 482, INDIVIDUAL N = 934, COUPLE N = 158.

Effect Size

Sufficient data to analyze outcome by treatment format were only available for three formats. The mean ES for individual therapy was 2.00 (1.72), 95% CI [1.82, 2.17], n = 366. The mean ES for family therapy was 1.77 (1.41), 95% CI [1.56, 1.98], n = 172. The mean ES for couples therapy was 1.37 (1.34), 95% CI [0.96, 1.79], n = 42. An analysis of variance on these data produced the following results: F(2, 577) = 3.44, p = 0.0326.

Has My Therapeutic Effectiveness Changed over Time?

Percent Improved

The Pearson correlation between mean percent improved and year of my career was as follows: r = -0.35, n = 45, p = 0.013. For the first 22 years of my career I maintained a part-time practice. For the more recent 23 years I shifted to full-time private practice. Given this fact I ran a Pearson correlation between the number of new cases opened per year versus the mean percent improved: r = -0.31, n = 45, p = 0.0391: the more cases seen in a given year, the poorer the outcome. I also determined the percentage of managed care cases seen each year and calculated a Pearson correlation between percentage of managed care cases and mean percent improved per year: r = -0.32, n = 45, p = 0.0319: the more managed care cases seen in a given year, the poorer the outcome. I computed the Pearson correlation between year of my career and percent of all cases that were children (12 years old and younger): r = -0.31, n = 45, p = 0.0375: as the years went by, a smaller proportion of my annual caseload consisted of children. Finally, I correlated percent improved each year versus percent of cases that were children each year: r = 0.51, n = 45, p = 0.0003: the greater the proportion of my annual caseload consisting of children, the better my outcomes.

Effect Size

I have only had 10 or more ESs per year for the most recent 20 years. I ran a Pearson correlation between the year of my practice and the mean ES within each year. The result was as follows: r = -0.29, n = 20, p = 0.1661.

Discussion

Distinctive Features of the Present Report

The present article includes many unique features. It covers the lifespan with the patients ranging in age from 6 months to 88 years at intake. It spans all editions of the DSM through DSM-IV-TR. It includes a wide range of DSM diagnoses and shows differences in outcomes among these diagnoses. It provides quantitative analyses of treatment outcomes. It traverses 45 years of one psychotherapist’s private practice. It incorporates all cases of the practice, not just a sample. It takes into account an extraordinary number of cases. It demonstrates the importance of performing pre-/post-treatment analyses. It reveals outcomes from routine clinical practice.

Therapist Versus Patient (or Parent) Outcome Rating

According to Minami, Wampold, Serlin, Kircher, and Brown (2007) measures are considered high on reactivity if they are assessed by a clinician and low on reactivity if patients provide self-report data. Measures that focus on symptoms or the targets of treatment are classified as high on specificity, but when the measures cover global functioning they are considered low on specificity. These authors (2007) provided pre-/post-treatment ES benchmarks in the treatment of major depression. High-reactivity/high-specificity measures produced the largest ESs. Low-reactivity/high-specificity measures produced intermediate ESs. And low-reactivity/low-specificity measures produced the smallest ESs. When my patients (or their parents) provided pre-treatment and post-treatment ratings on problem checklists, their ratings seemed to match Minami et al.’s definition of low-reactivity/high-specificity measures. When I provided a global estimate of outcome (GEO) by reviewing all evidence within a case folder, these scores seemed to be examples of high-reactivity/low-specificity. The results provided by Minami et al. did not provide a clue as to where high-reactivity/low-specificity results would fall in comparison to the three combinations that they did examine. My results were clear: the patients and their parents rated treatment outcomes more positively using low-reactivity/high-specificity measures than I did using high-reactivity/low-specificity measures.

Dropouts

Dropout from treatment before a mutually planned termination date is a problem in both routine practice and controlled research. According to a recent review of 669 studies covering 83,834 patients, the dropout rate was 19.7%, 95% CI [18.70%, 20.70%] (Swift & Greenberg, 2012). In my private practice, of all patients seeking treatment 386 (18.76%) dropped out after one, two, or three sessions without receiving any identified treatment.

Improvement

Overall about 68% of my patients have improved, 31% have not improved, and 1% have gotten worse. These results are strikingly similar to those reported for 3,672 cases of practice-based evidence in the United Kingdom (Barkham et al., 2008). These researchers said that 67.50% had improved, 31.8% showed no reliable change, and 0.7% deteriorated.
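As an illustrative check, not part of the original analysis, the rounded percentages above can be recomputed from the outcome counts given in the Abstract. The grouping of the five rating categories into three bands is an assumption made for this sketch.

```python
# Outcome counts as reported in the Abstract (1,599 cases with outcome data).
counts = {
    "Much Worse": 4,
    "Worse": 11,
    "No Change from Intake": 497,
    "Improved": 546,
    "Much Improved": 541,
}

total = sum(counts.values())

def percent(n: int) -> float:
    """Share of all cases with outcome data, as a percentage rounded to 2 places."""
    return round(100 * n / total, 2)

# Assumed grouping: the two positive and the two negative categories are pooled.
improved = percent(counts["Improved"] + counts["Much Improved"])
unchanged = percent(counts["No Change from Intake"])
worse = percent(counts["Worse"] + counts["Much Worse"])

print(f"improved {improved}%, unchanged {unchanged}%, worse {worse}%")
```

Rounded to whole numbers, these bands correspond to the "about 68%", "31%", and "1%" quoted in the text.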

My mean ES at termination of 1.90 is probably more representative of outcomes in routine private practice than mean ESs reported from RCTs. As mentioned earlier Minami et al. (2007) determined benchmarks for treatment outcomes with adults presenting with major depression. For 846 “completers” plus “early terminators” the mean pre-/post-therapy ES using the Beck Depression Inventory (a low-reactivity/high-specificity measure) was 1.71. For 1,387 “completers” the mean ES was 1.86. This latter mean is very similar to my mean ES of 1.90 across all of my patients. My mean ES for major depression was 2.44.

There are other findings from routine clinical practice to compare with my results. For example, Barkham, Mellor-Clark, Connell, Evans, Evans, and Margison (2010) reported pre-/post-therapy ESs for 9,337 patients treated in various service settings within the UK. The mean of the five means that they published was 1.34. The grand mean falls between the mean ES of 1.24 for children with oppositional defiant disorder and of 1.47 for those with conduct disorder published by Kazdin and Whitley (2006) in their research program. These latter two means were also based on pre-/post-therapy comparisons. As indicated in the Discussion section of this article, such pre-/post-therapy ESs are consistently greater than between-groups ESs normally reported in meta-analyses of controlled treatment outcome research (e.g., Casey & Berman, 1985; Smith & Glass, 1977; Smith et al., 1980; Weisz et al., 1995).

Hollon (1996) asserted, “[T]he available evidence appears to differ between the child and adult literatures, but in neither does [sic] outcomes observed in applied settings exceed those observed in controlled clinical trials” (p. 1028). In fact, the present mean ES clearly exceeds the mean ESs reported in published meta-analyses of RCTs on child, adolescent, and adult psychotherapies. Hollon did say, “I agree with Seligman that the best way to determine what goes on in actual clinical practice is to study it directly” (p. 1025), and Kazdin and Weisz (1998) declared, “[T]he magnitude of therapeutic change is an issue in need of much greater attention” (p. 30). The ES results reviewed above focused on magnitude of therapeutic change.

The typical RCT is not designed to determine magnitude of therapeutic change. Usually its purpose is to demonstrate a causal relationship between a particular treatment and its effect in comparison to a control or contrast condition (cf., Seligman, 1995). The resultant ESs show differences between the treated and the control/contrast group. Most investigators employing RCT designs do not report pre-/post-treatment ESs for their cases (Kazdin, Bass, Ayers, & Rodgers, 1990).
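The distinction between the two kinds of ES can be made concrete with a minimal sketch. The scores below are hypothetical and invented purely for illustration. The formulas are common textbook definitions, assumed here rather than taken from this article's Method section: the pre-/post-treatment ES is standardized by the pre-treatment SD, and the between-groups ES by the pooled post-treatment SD.

```python
import math
import statistics

def pre_post_es(pre: list[float], post: list[float]) -> float:
    """Pre-/post-treatment ES: mean change divided by the pre-treatment SD.
    Assumes lower scores mean fewer symptoms, so improvement is positive."""
    return (statistics.mean(pre) - statistics.mean(post)) / statistics.stdev(pre)

def between_groups_es(treated: list[float], control: list[float]) -> float:
    """Between-groups ES: difference in post-treatment means over the pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * statistics.variance(treated)
                  + (n2 - 1) * statistics.variance(control)) / (n1 + n2 - 2)
    return (statistics.mean(control) - statistics.mean(treated)) / math.sqrt(pooled_var)

# Hypothetical symptom scores (illustrative only, not data from this practice).
pre_scores = [30, 28, 32, 26, 34]      # treated group at intake
treated_post = [20, 22, 24, 18, 26]    # treated group at termination
control_post = [27, 25, 29, 23, 31]    # untreated controls at the same time point

es_within = pre_post_es(pre_scores, treated_post)
es_between = between_groups_es(treated_post, control_post)
print(f"pre/post ES = {es_within:.2f}, between-groups ES = {es_between:.2f}")
```

In this made-up example the controls also improve somewhat without treatment, so the between-groups ES is smaller than the pre-/post-treatment ES for the same treated patients. That is one reason the two kinds of ES are not directly comparable.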

Asay and Lambert (1999) warned, “Client outcome is principally determined by client variables and extratherapeutic factors rather than by the therapist or therapy” (p. 43); hence, the positive results presented above can be celebrated without claiming that I or the particular treatments that I used were primarily responsible for the measured gains. These same authors estimated that 15% of improvement comes from expectancy (placebo effects), 15% from techniques, 30% from the therapeutic relationship, and 40% from extratherapeutic change. There is evidence that placebo effects may even be greater than the estimate of Asay and Lambert. For example, Glass and Kliegl (1983) gave an average ES of 0.56 from “placebo treatment.” Dush, Hirt, and Schroeder (1989) claimed, “Collective results from other meta-analyses suggest an average placebo ES of about a third to a half of a standard deviation, in comparison with the ES of no treatment” (p. 100).

Of all ESs obtained in my results, 3.14% were negative. In comparison, Dush et al. (1989) reported that 15% of the ESs in their meta-analysis of self-statement modification in children were negative. Smith and Glass (1977) said that 12% of the 833 ES measures that they calculated were negative.

Outcome by Diagnosis

The percent of my patients who showed improvement varied substantially across diagnoses. Although this finding is consistent with findings from some controlled research (e.g., Lambert & Archer, 2006), researchers tend to focus on one diagnostic group at a time. And therapists in private practice rarely publish their treatment outcomes; therefore, the present report, covering 45 years of one therapist’s private practice and all patients seen, is unique. Although there was quite a large range in my mean ESs across diagnoses, the large variance in ESs within diagnoses blurred the apparent differences. Nevertheless, these findings are distinctive in the published literature on outcomes of psychotherapy.

Outcome by Patient Age

There was no clear or consistent relationship between age groups of patients and treatment outcome. Regarding percent improved there was a negative but not quite statistically significant correlation, indicating that my younger patients improved more than my older ones. However, for mean ES, there was a positive but not statistically significant correlation. Looking at just the three youngest age groups (age 0.5 years–5 years, 6 years–12 years, and 13 years–19 years) there was consistency in the trends of the two measures of outcome: percent improved and ES. Preschoolers fared best. Children came in second. And teenagers had the poorest outcome. This trend has held up across my career.

Outcome by Sex of Patient

The slight but statistically nonsignificant advantage in outcome for females in my practice is consistent with previous research. For example, Casey and Berman (1985) found that the greater the proportion of males in a study, the smaller the ESs. And in their review of psychotherapy with children and adolescents, Weisz, Weiss, Han, Granger, and Morton (1995) reported that females fared better than males. Similarly, Smith et al. (1980) determined a small, negative correlation between the percent of males in a study and ES. Thirty years later Parry, Castonguay, Borkovec, and Wolf (2010) reported that of 220 clients treated by 57 clinicians in the community, female clients showed more improvement than male clients.

Outcome by Duration of Treatment

With the exception of patients who improved with just one or two sessions, the greater the number of treatment sessions, the greater the percentage of patients who improved. This finding is consistent with that of Messer and Boals (1981). However, the dose-effect phenomenon is not always found. For example, in their meta-analysis of outcomes of psychotherapy with children, Casey and Berman (1985) concluded that length of treatment was negatively related to mean ES: r = −0.28 (p = 0.02), whereas Smith and Glass (1977) and Smith et al. (1980), in their meta-analyses of adult psychotherapy outcomes, found no significant relationship between the duration of treatment and outcome. In discussing such findings Howard, Kopta, Krause, and Orlinsky (1986) warned, “. . .a between-study analysis . . . has no necessary implication for the relationship between duration and benefit within each study” (p. 159). Their article provided a quantitative model estimating the percentage of patients improved for specific amounts of psychotherapy. I calculated the Pearson correlation between their model and my outcome data at 4, 8, 13, 26, 52, and 104 sessions: r = 0.92, n = 6, p = 0.0093. These findings are consistent with those of the Consumer Reports study (Seligman, 1995). Although there is no correlation across all cases between number of treatment sessions and mean ES, when multiple ESs are obtained on the same patient across time, there is a strong relationship between amount of treatment and outcome. This finding is also consistent with Howard’s dose-response model.
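For readers who want to see how a p value follows from a correlation based on only six points, the sketch below converts the reported r = 0.92, n = 6 into a t statistic and a two-tailed p value. It assumes the standard t test for a Pearson correlation with n − 2 degrees of freedom; the text does not state which test was used. Because r is entered as the rounded value 0.92, the result should be read as approximate. The function names are illustrative, and the closed-form CDF applies only when df = 4.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """t statistic for testing a Pearson r against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def two_tailed_p_df4(t: float) -> float:
    """Two-tailed p value from Student's t distribution.
    Uses the closed-form CDF that is valid only for 4 degrees of freedom."""
    scaled = abs(t) / math.sqrt(1.0 + t * t / 4.0)
    cdf = 0.5 + 0.375 * scaled * (1.0 - scaled * scaled / 12.0)
    return 2.0 * (1.0 - cdf)

r, n = 0.92, 6                 # values reported in the text
t = t_from_r(r, n)             # df = n - 2 = 4
p = two_tailed_p_df4(t)
print(f"t({n - 2}) = {t:.2f}, two-tailed p = {p:.4f}")
```

Under these assumptions the computed p value agrees with the reported 0.0093 to four decimal places. For other sample sizes a general routine such as SciPy's `scipy.stats.pearsonr` would be the usual choice.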

Outcome by Treatment Format

There is a significant difference in my outcomes based on treatment format (modality). Unfortunately I have not found an article by any other therapist or researcher that compared the five modalities compared in Figure 2. I am unsure whether the trends in my results would generalize to those of other therapists. Young (2007) retrospectively evaluated her outcomes from 55 years of practice as a psychoanalyst and psychoanalytic psychotherapist. During that period of time she had treated just 231 people. She said that she provided “classical analysis” to 22 patients (p. 316), “modified analysis” to 19 seen once or twice a week (p. 316), “analytically oriented psychotherapy” to 26 (p. 316), “insight-oriented supportive therapy” to 65 (p. 316), and “supportive therapy” to 47 (p. 316). She reported that her success rates ranged from 40.4% improved with supportive non-insight oriented treatment to 94.5% improved with modified analysis.

Therapist’s Effectiveness Across 45 Years

Measured by percent improved per year, not only have I failed to improve across the years, my outcomes have gotten worse across time. This trend was puzzling and troubling. Switching from part-time to full-time practice corresponded to a drop in effectiveness. This finding was similar to that reported by Borkovec, Echemendia, Ragusea, and Ruiz (2001); increases in therapists’ caseloads corresponded to lessened treatment outcomes. Parry et al. (2010) reported the same trend for 57 therapists in Pennsylvania. Similarly, increases in the proportion of my caseload per year that had been referred by a managed care company corresponded with poorer outcomes. During the first 23 years of my career I was a full-time academic. My research, writing, and teaching during those years focused on children. Unhappily there was a significant drop in the percent of my caseload consisting of children across the 45 years in question. This was too bad, because the greater the proportion of my caseload consisting of children in a given year, the greater the percentage of my patients that had improved.

The ES data did not show a significant change across time. Smith et al. (1980) concluded, “… there was no relationship between the years of experience of the therapists in a study and the magnitude of therapeutic effect produced in that study (r = 0.00)” (p. 117); however, the average amount of experience of therapists in the studies reviewed was only three and a quarter years. Dawes (2008) claimed, however, that over 500 studies of psychotherapy outcome had shown no relationship between amount of experience in the psychotherapists and patient outcomes.

One of my motives in starting to evaluate my treatment outcomes 24 years ago was to answer the question: How good am I as a psychotherapist? Although how many years of experience a therapist has had does not predict treatment outcomes, there is growing evidence that therapists vary greatly in their effectiveness. In their large-scale study in the UK Stiles and Barkham (2012) found that improvement rates for individual therapists ranged from 23.5% for the least effective therapist to 95.6% for the most effective therapist. Seidel (2012) reported very similar ranges of effectiveness in 268 therapists who had each treated at least 30 patients in the US. I suspect that most therapists would be interested in discovering where they fall in such distributions from least effective to most effective but simultaneously afraid to find out.

Closing Comment

Kazdin (2008) warned, “We do not benefit as a field from the accumulated practice of clinicians….” (p. 157). I hope that the results I have provided in the present report will encourage my psychotherapy colleagues to evaluate their outcomes and to publish their results. In closing, I want to say, “I have shown you my practice-based evidence. Now you show me yours!”

Footnote

This paper is an update of “Outcomes from 40 Years of Psychotherapy in a Private Practice,” which was published in the American Journal of Psychotherapy, 62/3

REFERENCES

Abbass, A., Kisely, S., & Kroenke, K. (2009). Short-term psychodynamic psychotherapy for somatic disorders: Systematic review and meta-analysis of clinical trials. Psychotherapy and Psychosomatics, 78, 265–274.
