Reviews of short-term outcome studies have suggested that patients make minimal improvements in the first two years after the diagnosis of borderline personality disorder has been established (5, 6). Najavits and Gunderson (7) have pointed out that many studies lack within-group statistical comparisons and that the studies show considerable variation in results. They note differences in methodology, assessment points, instruments used to measure outcome, blindness of raters, and rigor of diagnosis.
In this report we provide treatment evaluation data from a prospective study of 216 patients with severe personality disorders who received intensive inpatient treatment in one of two private, not-for-profit psychiatric hospitals. To our knowledge this is the largest sample of patients with serious personality disorders studied prospectively.
We wished to answer three questions: Do patients with severe personality disorders improve with intensive inpatient treatment? Alternatively, do they regress and develop pathological forms of dependency with such treatment? If they do improve, in what areas of ego functioning and symptomatology does the improvement occur? To study these questions, we used a pretest-posttest follow-up design involving repeated measures of one group to test the effectiveness of treatment under naturalistic conditions.
Methods
Treatment program
At the time of this study the C. F. Menninger Memorial Hospital in Topeka, Kansas, and Harding Hospital in Worthington, Ohio, the two study settings, had similar treatment programs. The inpatient approach was milieu oriented, with a strong emphasis on group treatment and individual psychotherapy. A continual effort was made to strike a therapeutic balance between exploratory goals and ego-building, rehabilitative goals.
Patients were seen in daily rounds by their hospital psychiatrists, and two-thirds of our sample received individual psychotherapy two or three times a week with a clinician other than the hospital psychiatrist. The vast majority of patients (96 percent) received psychopharmacologic treatment as well. Group approaches included group therapy, community meetings, patient and staff team meetings, and various specialized groups for psychoeducation, relapse prevention, and the like.
Subjects
Patients admitted to the general psychiatry units at the Menninger and Harding hospitals who were age 18 or older and had an IQ over 70 were eligible to be contacted for informed consent for participation in this project. Although clinicians had the option of recommending that a patient not be contacted, this option was never exercised.
Of the 364 Menninger patients who entered the project and participated in the admission assessment, 325 also participated in the discharge and one-year follow-up assessments, a dropout rate of 10.7 percent. Of the 253 Harding patients who entered the project, 62 completed the follow-up, a dropout rate of 75.5 percent. The overall dropout rate was thus 37.3 percent, with 387 patients completing follow-up assessments. The greater loss of patients at the Harding site appeared to be related to managed care. Many patients were bitter about having to leave the hospital abruptly, and they refused to participate in the research interviews. Also, staff attrition resulted in the loss of key figures in the project.
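The dropout figures above can be checked directly from the reported counts. The following is a minimal sketch of that arithmetic; the function name `dropout_rate` is ours, introduced only for illustration.

```python
def dropout_rate(entered, completed):
    """Percentage of entrants who did not complete the follow-up assessment."""
    return 100.0 * (entered - completed) / entered

# Counts reported in the text for each site.
menninger = dropout_rate(entered=364, completed=325)
harding = dropout_rate(entered=253, completed=62)

# Overall rate pools the two sites: 617 entered, 387 completed.
overall = dropout_rate(entered=364 + 253, completed=325 + 62)

print(round(menninger, 1), round(harding, 1), round(overall, 1))
```

Because the overall rate pools the two sites, it is weighted by the number of patients entering at each site and is not the simple average of the two site rates.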
The subjects we describe in this study are a subsample of the patients who entered the project and did not drop out at discharge or one-year follow-up and who had a DSM-III-R diagnosis of personality disorder. Overall, 216 such patients were studied between December 1986 and December 1993. The representativeness of the sample is unknown.
The largest group of patients, 99, had a DSM-III-R diagnosis of personality disorder not otherwise specified or mixed personality disorder, indicating that many of the patients with personality disorders did not fit neatly into DSM-III-R categories. Seventy-five received a diagnosis of borderline personality disorder, nine dependent, nine histrionic, seven narcissistic, five avoidant, five obsessive-compulsive, four schizotypal, two passive-aggressive, and one each antisocial, schizoid, paranoid, and masochistic-self-defeating. Three patients were given two personality disorder diagnoses: one borderline and narcissistic, one borderline and histrionic, and one borderline and not otherwise specified.
We excluded patients who had comorbid organic brain disorders or psychotic disorders. As one might expect, comorbidity with other conditions was common. Affective disorders (major depression, bipolar disorder, and dysthymia) were diagnosed for 179 (82.9 percent) of the patients with personality disorders.
Although the diagnoses were not made on the basis of research interviews, they were well-informed clinical diagnoses. They were based not only on a psychiatric history and evaluation but also on extensive psychological assessment, family history, and observation in intensive treatment over an extended period, which is especially useful in making axis II diagnoses.
The patients' mean±SD age was 37.9±10.9 years, with a range of 18 years to 79 years. Two-thirds of the sample (144, or 66.7 percent) were women. The mean±SD number of prior hospitalizations elsewhere was 2.6±4.5 (range, 0 to 35), indicating that the sample may have been "treatment resistant" in the sense that hospitalization elsewhere had not resulted in remission of symptoms.
Overall, 106 subjects (49.1 percent) were married or remarried, 70 (32.4 percent) had never married, 37 (17.1 percent) were divorced or separated, and three (1.4 percent) were widowed. Despite the severity of the psychopathology among patients in the sample, their educational achievements were impressive. Fifty patients (23.4 percent) had a baccalaureate degree, 38 (17.8 percent) had a master's degree, and 14 (6.5 percent) had a Ph.D. or M.D. degree.
Length of stay
Patients' length of stay varied widely. The mean±SD length of stay was 137.33±183.81 days, ranging from ten days to 1,014 days. Because the mean appeared to be inflated by several very long stays, the median, 58 days, is a better index of the typical patient's stay.
During the course of the study, patients' length of stay began to be influenced by managed care reviewers, who stipulated the number of days that would be reimbursed by insurance companies. Hence we included a measure of whether finances had a significant influence on discharge planning. Of the 215 patients for whom we had such data, the discharges of 103 patients (47.9 percent) were determined at least in part by finances, and the discharges of 112 patients (52.1 percent) were not.
Procedures
As soon as possible after admission, eligible patients were contacted by a research assistant. After the subjects had been given a complete description of the study, written informed consent was obtained as directed by the institutional review board. The research assistant then scheduled the first research interview with one of the project interviewers within one week after admission. The interviewers were practicing clinicians but did not work on the unit of the patient to be interviewed.
Data on admission were collected in semistructured face-to-face interviews lasting one to one and a half hours. Discharge data were collected in face-to-face interviews within two weeks before discharge when possible; otherwise, discharge interviews were conducted by telephone within two weeks after discharge. Follow-up interviews were conducted by telephone one year after discharge, with a window of one month.
All interviews were derived from Bellak's interview for the ego function scales (15), with specific questions added from the Brief Psychiatric Rating Scale, the Global Assessment Scale, and the risk scales described below.
All interviews were rated by two clinician-raters, trained to reliability, who were not the patients' treating clinicians. If rating discrepancies on a particular scale exceeded a predetermined amount, the raters met and reconciled differences by consensus. Otherwise, the ratings were averaged. This procedure helped to reduce measurement error below the level of expected change.
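The rule for combining the two raters' scores can be sketched as follows. This is an illustration only: the text does not report the predetermined discrepancy amount, so the `max_gap` value below is a hypothetical placeholder, and the function name is ours.

```python
def combine_ratings(rater_a, rater_b, max_gap=1.0, consensus=None):
    """Combine two raters' scores on one scale.

    max_gap is a HYPOTHETICAL threshold; the study's actual value is not given.
    If the raters differ by more than max_gap, a consensus score reached in a
    meeting must be supplied; otherwise the two ratings are averaged.
    """
    if abs(rater_a - rater_b) > max_gap:
        if consensus is None:
            raise ValueError("discrepancy exceeds threshold; consensus rating required")
        return consensus
    return (rater_a + rater_b) / 2.0

# Close ratings are averaged; discrepant ratings use the consensus value.
print(combine_ratings(3.5, 4.5))
print(combine_ratings(2.0, 5.0, consensus=3.5))
```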
Measures
Bellak's ego function scales. We used selected scales from Bellak's ego function scales to assess structural change (15). These scales have been used widely in outcome and efficacy research, and many studies attest to their validity and reliability. They are 7-point scales with detailed definitions and descriptions of anchor points. Midpoints between scale points, such as 3.5 or 4.5, were also used in ratings. We selected scales for the following ego functions on the basis of their relevance to our patient population and to the treatment goals: reality testing; judgment; regulation and control of drives, impulses, and affects; object relations; thought processes; autonomous functioning; mastery-competence; and superego.
Reliability estimates (Spearman-Brown corrected intraclass correlations) were calculated for 30 patients balanced on assessment point and length of stay. Reliability estimates for the three pairs of raters were as follows: reality testing, .88, .81, and .94; judgment, .89, .89, and .88; regulation and control, .81, .81, and .91; object relations, .83, .47, and .68; thought processes, .84, .86, and .89; autonomous functioning, .82, .86, and .87; mastery-competence, .81, .85, and .83; and superego, .65, .81, and .74.
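For readers unfamiliar with the Spearman-Brown correction, the sketch below shows the standard formula, which projects the reliability of a single rater to the reliability of the average of k raters. We assume here that the correction was applied for k = 2, since each interview was rated by two raters whose scores were averaged; the text does not state this explicitly, and the function names are ours.

```python
def spearman_brown(single_rater_r, k=2):
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * single_rater_r / (1.0 + (k - 1) * single_rater_r)

def implied_single_rater(corrected_r, k=2):
    """Inverse of the formula: single-rater reliability implied by a corrected value."""
    return corrected_r / (k - (k - 1) * corrected_r)

# Under the k = 2 assumption, a corrected estimate of .88 corresponds to a
# single-rater intraclass correlation of roughly .79.
print(round(implied_single_rater(0.88), 2))
```

The corrected value is always at least as high as the single-rater value when the latter is positive, which is why averaged ratings are the appropriate unit when the averaged score is what enters the analysis.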
Brief Psychiatric Rating Scale (BPRS). The Brief Psychiatric Rating Scale (16) is widely used to assess common psychiatric symptoms in outcome research. Symptoms are rated on a 7-point scale, ranging from 0, indicating not present, to 6, indicating extremely severe, on the basis of questions in the interview together with observations of the patient's behavior during the interview.
Because of problems in achieving reliability using the original scale-point descriptions, we developed a set of behavioral-example anchors as scale-point definitions (17). Although we were able to achieve adequate reliability for the 18-symptom version of the BPRS, the scale distributions for many of the symptoms at admission were positively skewed as a result of a preponderance of 0 or other low scores, indicating low levels of pathology; these scales would have been unlikely to show significant change with treatment. We therefore kept for statistical analysis only the five scales with a sufficient level of pathology at admission (that is, a skewness coefficient of less than 1): somatic concern, anxiety, depression, guilt, and hostility. Reliabilities, computed as described above, were as follows: somatic concern, .96, .97, and .97; anxiety, .96, .93, and .96; depression, .96, .96, and .95; guilt, .87, .90, and .98; and hostility, .88, .92, and .85.
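The skewness screen described above can be illustrated with a short sketch. It uses the simple moment-based skewness coefficient; statistical packages often report a small-sample adjusted version, and the text does not say which was used, so treat the formula choice as an assumption. The scale names and scores below are hypothetical, not study data.

```python
def skewness(scores):
    """Moment-based skewness: third central moment over variance to the 3/2 power.

    Assumes the scores are not all identical (variance must be nonzero).
    """
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5

def scales_to_keep(ratings, cutoff=1.0):
    """Names of scales whose admission scores have skewness below the cutoff."""
    return [name for name, scores in ratings.items() if skewness(scores) < cutoff]

# HYPOTHETICAL admission ratings on the 0-to-6 scale.
ratings = {
    "scale_a": [1, 2, 3, 3, 4, 5],        # spread across the range, symmetric
    "scale_b": [0] * 8 + [1, 6],          # mostly zeros, strongly right-skewed
}
print(scales_to_keep(ratings))
```

A scale dominated by scores of 0 has little room to improve, so excluding it guards against floor effects rather than reflecting any judgment about the symptom's clinical importance.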
Global Assessment Scale (GAS). The Global Assessment Scale is a 100-point scale that is divided into ten 10-point clinical bands to assess overall level of functioning (18). It has been widely employed in outcome and other research, and many studies attest to its validity and reliability when raters have been trained. Reliability estimates were .68, .89, and .83.
Risk scales. Four risk scales were developed by the third author for hospital outcome research. They are 5-point scales ranging from 1, very high, to 5, not at all. Only the scales for substance abuse and suicide risk showed sufficient pathology at admission (skewness less than 1) to potentially show change. Reliability estimates were .83, .85, and .85 for substance abuse and .79, .62, and .63 for suicide risk.
Data analysis
For testing the significance of the change from admission through discharge and then one-year follow-up, a multivariate profile analysis of variance (SPSS repeated-measures multiple analysis of variance) was conducted for the set of scores within each measure. Post hoc univariate analyses were conducted for each of the scales in the set.
The change from admission to discharge and the change from discharge to one-year follow-up were then tested using Roy-Bargmann step-down tests (19) to determine whether the change from admission to discharge was significant, and then whether the change from discharge to follow-up was significant when admission-to-discharge change was controlled for. This procedure avoids the inflated significance that often occurs when the changes from admission to discharge, from discharge to follow-up, and from admission to follow-up are tested separately.
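The logic of the step-down sequence for a single scale can be sketched as below. This is one reading of the procedure under stated assumptions, not a reproduction of the SPSS analysis: the first step tests whether the mean admission-to-discharge change is zero, and the second tests whether the discharge-to-follow-up change is zero after adjusting for the first change, implemented here as a test of the intercept in a regression of the second change on the first. Function names and the example scores are hypothetical.

```python
from statistics import mean

def first_step_f(changes):
    """F test (df = 1, n-1) that the mean change is zero; equals the squared t."""
    n = len(changes)
    m = mean(changes)
    var = sum((c - m) ** 2 for c in changes) / (n - 1)
    return n * m * m / var, (1, n - 1)

def step_down_f(first, second):
    """F test (df = 1, n-2) for the second change, with the first as covariate.

    Regresses the second change on the first and tests whether the intercept
    (the adjusted mean of the second change) is zero.
    """
    n = len(first)
    mx, my = mean(first), mean(second)
    sxx = sum((x - mx) ** 2 for x in first)
    sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(first, second))
    resid_var = sse / (n - 2)
    intercept_var = resid_var * (1.0 / n + mx * mx / sxx)
    return intercept ** 2 / intercept_var, (1, n - 2)

# HYPOTHETICAL change scores for four patients, for illustration only.
adm_to_dis = [1, 2, 3, 4]
dis_to_fu = [2, 3, 5, 6]
print(first_step_f(adm_to_dis))
print(step_down_f(adm_to_dis, dis_to_fu))
```

Because the second test conditions on the first change, the two F statistics are not counted twice against the same variance, which is the sense in which the procedure avoids inflated significance. Under this reading, n patients give 1 and n-1 degrees of freedom at the first step and 1 and n-2 at the second.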
Results
The results are summarized in Table 1. The analysis of variance for the GAS was highly significant (F=295.09, df=2, 213, p<.001). Step-down tests showed that the change in GAS ratings from admission to discharge was highly significant (F=491.65, df=1, 214, p<.001), as was the change from discharge to follow-up (F=30.58, df=1, 213, p<.001). Inspection of the means (admission mean±SD of 39.66±6.60, discharge mean of 52.51±10.12, and one-year follow-up mean of 57.65±14.79) showed that both changes were in the direction of improvement.
Improvement was particularly impressive when the GAS ratings were examined according to the proportion of patients with scores of 50 or higher. The proportion of patients in this category was only 3.7 percent at the time of admission. By the time of discharge the proportion had increased to 55.1 percent, and by the time of the one-year follow-up it had risen to 66.3 percent. The Bellak ego function scales and the BPRS provide some data on the psychological underpinnings of this significant change.
As indicated in Table 1, the multivariate analysis of all Bellak ego function scales was highly significant. The post hoc univariate tests were also highly significant for all of the scales. The Roy-Bargmann step-down tests revealed a significant change from admission to discharge for all eight scales and an additional change from discharge to follow-up that was significant for all of the scales. Examination of the means showed gains on all eight scales from admission to discharge and further gains from discharge to follow-up on all but autonomous functioning, which appeared to decline from discharge to follow-up.
The multivariate analysis of the two risk scales was highly significant, as were both post hoc univariate analyses. Step-down tests showed that the change from admission to discharge was highly significant and that the change from discharge to follow-up was significant. Inspection of the means showed that all of these changes were in the direction of improvement.
The multivariate analysis of the BPRS was highly significant as well, and all of the univariate analyses were significant. Step-down tests showed that change from admission to discharge was highly significant for all of the scales except anxiety. The change from discharge to follow-up was significant only for anxiety, and there was a trend toward significance for hostility. Inspection of the means revealed that the significant change (or trend toward significance) from admission to discharge was improvement for somatic concern, anxiety, depression, and guilt. On the other hand, hostility significantly increased from admission to discharge. The significant change from discharge to follow-up for anxiety was in the direction of improvement, as was the trend for hostility; however, the improvement for hostility was very small.
Discussion
In this sample of 216 patients with severe personality disorders, intensive inpatient treatment was associated with substantial positive change at discharge that held up at one-year follow-up. Previous long-term follow-up studies that measured global functioning at 14 years or more have found that about two-thirds of the patients were functioning at a level of fair or higher, a GAS rating above 50 (1, 2, 3, 4). At follow-up of only one year, our sample was functioning at a similar level. Najavits and Gunderson (7) noted that other short-term outcome studies have found patients with borderline personality disorder to be in the fair range after two to three years and that most of the improvement on the GAS may occur early in the treatment course. In our study a broad range of ego functions as well as several symptom measures improved.
The results are particularly encouraging because some evidence suggests that our sample included many patients who were treatment resistant. The patients had an average of 2.6 previous hospitalizations and high rates of comorbidity. Our results also are consistent with recent reports of patients with borderline personality disorder in outpatient psychotherapy who had some degree of hospitalization (7, 8, 9, 10, 20).
Our study had several shortcomings. First, rather than using research-based diagnostic instruments (that is, interviews by independent raters), we elected to use the DSM-III-R diagnoses made by clinicians. Although this approach may lack some rigor, it is in keeping with the naturalistic approach at the heart of the project.
Also, without randomization to different treatment conditions or comparison groups, we cannot be certain that the improvement documented in this study was a result of the treatment. As we noted earlier, we assigned patients to length of treatment on the basis of clinical decision making in a naturalistic manner. However, other research has documented the basic stability of severe personality disorder diagnoses over time (21). It would be difficult to imagine that the substantial gains seen in this study were unrelated to treatment. Moreover, random assignment could have resulted in a situation in which clinicians were forced to conduct the treatment in an artificial way that went against their better judgment. For example, patients who clearly needed long-term treatment could have received brief crisis intervention.
As with other studies of severe personality disorders, we must acknowledge that patients who complete a research protocol may represent a subgroup of patients with a better prognosis than those who do not. We do not have data on all patients who dropped out of our study, so we cannot draw more definitive conclusions.
Another deficiency in our study is that we lack detailed information on the type of treatment patients received between discharge and one-year follow-up. We do not have data on the Harding patients, and most of the Menninger patients had left the system. We do know from asking about the nature of aftercare that only some 20 percent had entered substantial day hospital programs after discharge. Most, however, continued with some type of psychotherapy, medication, or both.
Despite these shortcomings, our study had a number of strengths that bolster the value of our findings. First, the large sample size allowed us to study a substantial cross-section of patients with severe personality disorder. Second, the prospective design allowed us to make meaningful comparisons between admission ratings and discharge and follow-up ratings. Third, we used independent raters who were not involved with the treatment, thereby removing the common problem of clinician bias stemming from clinicians' wishes to see their own patients improve. Our raters also were able to achieve interrater reliability on the measures, whereas some other studies have failed to do so. Fourth, our follow-up assessment, unlike those in other studies, was conducted at a fixed period.