
Review of Research Evidence Supporting Guideline Statements

Appendix C

Assessment and Determination of Treatment Plan

Statement 1 – Initial Assessment

APA recommends (1C) that the initial assessment of a patient with possible borderline personality disorder include the reason the individual is presenting for evaluation; the patient’s goals and preferences for treatment; a review of psychiatric symptoms, including core features of personality disorders and common co-occurring disorders; a psychiatric treatment history; an assessment of physical health; an assessment of psychosocial and cultural factors; a mental status examination; and an assessment of risk of suicide, self-injury, and aggressive behaviors, as outlined in the APA’s Practice Guidelines for the Psychiatric Evaluation of Adults, 3rd Edition.

Evidence for this statement comes from general principles of assessment and clinical care in psychiatric practice. Expert opinion suggests that conducting such assessments as part of the initial psychiatric evaluation improves diagnostic accuracy, appropriateness of treatment selection, and treatment safety. For additional details, see Guideline I, “Review of Psychiatric Symptoms, Trauma History, and Psychiatric Treatment History,” Guideline III, “Assessment of Suicide Risk,” Guideline IV, “Assessment of Risk for Aggressive Behaviors,” Guideline V, “Assessment of Cultural Factors,” and Guideline VI, “Assessment of Medical Health,” in the APA’s Practice Guidelines for the Psychiatric Evaluation of Adults, 3rd Edition (American Psychiatric Association 2016a). A detailed systematic review to support this statement is outside the scope of this guideline; however, less comprehensive searches of the literature did not yield any studies related to this recommendation in the context of BPD treatment. Consequently, the strength of research evidence is rated as low.

Grading of the Overall Supporting Body of Research Evidence for Assessment of a Patient with Possible Borderline Personality Disorder

On the basis of the limitations of the evidence for assessment of a patient with possible BPD, no grading of the body of research evidence is possible.

Statement 2 – Quantitative Measures

APA suggests (2C) that the initial psychiatric evaluation of a patient with possible borderline personality disorder include a quantitative measure to identify and determine the severity of symptoms and impairments of functioning that may be a focus of treatment.

Evidence for this statement comes from general principles of assessment and clinical care in psychiatric practice. Consequently, the strength of research evidence is rated as low. Expert opinion suggests that conducting quantitative assessments as part of the initial psychiatric evaluation improves diagnostic accuracy, appropriateness of treatment selection, and longitudinal assessment of patient symptoms and treatment effects. This recommendation is also consistent with Guideline VII, “Quantitative Assessment,” as part of the APA Practice Guidelines for the Psychiatric Evaluation of Adults, 3rd Edition (American Psychiatric Association 2016a).

Grading of the Overall Supporting Body of Research Evidence for Use of Quantitative Measures

On the basis of the limitations of the evidence for use of quantitative measures, no grading of the body of research evidence is possible.

Statement 3 – Treatment Planning

APA recommends (1C) that a patient with borderline personality disorder have a documented, comprehensive, and person-centered treatment plan.

Evidence for this statement comes from general principles of assessment and clinical care in psychiatric practice. For additional details, see the APA Practice Guidelines for the Psychiatric Evaluation of Adults, 3rd Edition (American Psychiatric Association 2016a). A detailed systematic review to support this statement was outside the scope of this guideline; however, less comprehensive searches of the literature did not yield any studies that directly related to this recommendation in the context of BPD treatment. Consequently, the strength of research evidence is rated as low.

Grading of the Overall Supporting Body of Research Evidence for Evidence-Based Treatment Planning

On the basis of the limitations of the evidence for evidence-based treatment planning, no grading of the body of research evidence is possible.

Statement 4 – Discussion of Diagnosis and Treatment

APA recommends (1C) that a patient with borderline personality disorder be engaged in a collaborative discussion about their diagnosis and treatment, which includes psychoeducation related to the disorder.

In terms of collaborative discussion about diagnosis and treatment, evidence for this statement comes from general principles of clinical care in psychiatric practice. Psychoeducation is also generally accepted as an important element of psychiatric care. In addition, several studies have examined effects of psychoeducation in individuals with BPD, but these did not find a significant effect of psychoeducation, per se.

Psychoeducation Versus Wait-List

Two randomized controlled trials (RCTs; N = 50 and N = 80), rated as having a moderate risk of bias, assessed the effectiveness of psychoeducation compared with a wait-list control over 12 weeks (Zanarini and Frankenburg 2008; Zanarini et al. 2018). Psychoeducation consisted of an internet-based program detailing the latest information on BPD in one study (Zanarini et al. 2018) and a single workshop in the other (Zanarini and Frankenburg 2008). Participants received psychoeducation in addition to treatment as usual (TAU). Participants in the control group were on a wait-list for psychoeducation and continued with TAU only.

All participants were female, and the majority were White. The mean age was 21 years (Zanarini et al. 2018) and 19 years (Zanarini and Frankenburg 2008). Only one study reported the severity of BPD at baseline (Zanarini et al. 2018). Participants were mildly ill at baseline with mean Zanarini Rating Scale for BPD (ZAN-BPD) scores ranging from 10.13 to 12.13 (Zanarini et al. 2018).

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing psychoeducation with wait-list control

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with wait-list	Difference in effect with psychoeducation
				Anticipated absolute effects
Severity of BPD
Assessed with ZAN-BPD Follow-up: mean 12 weeks	130 (two RCTs: Zanarini and Frankenburg 2008; Zanarini et al. 2018)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 9.16	Mean 1.33 lower (ns)
Anxiety
Assessed with CUXOS Follow-up: mean 12 weeks	80 (one RCT: Zanarini et al. 2018)	⨁⨁◯◯; LOW^b for similar effects	–	Mean score at endpoint = 40.11	Mean 4.96 lower (ns)
Depression
Assessed with CUDOS Follow-up: mean 12 weeks	80 (one RCT: Zanarini et al. 2018)	⨁⨁◯◯; LOW^b for similar effects	–	Mean score at endpoint = 26.89	Mean 6.11 lower (ns)
Functioning
Assessed with SDS Follow-up: mean 12 weeks	80 (one RCT: Zanarini et al. 2018)	⨁⨁◯◯; LOW^b for similar effects	–	Mean score at endpoint = 9.76	Mean 2.18 higher (ns)

BPD = borderline personality disorder; CI = confidence interval; CUDOS = Clinically Useful Depression Outcome Scale; CUXOS = Clinically Useful Anxiety Outcome Scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; ns = not significant; N = sample size; RCT = randomized controlled trial; SDS = Sheehan Disability Scale; ZAN-BPD = Zanarini Rating Scale for Borderline Personality Disorder.

^aStudies do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of Borderline Personality Disorder

Both studies assessed the severity of BPD on the ZAN-BPD (Zanarini and Frankenburg 2008; Zanarini et al. 2018) and reported nonsignificant differences between the psychoeducation and the wait-list groups. In addition, one RCT reported similar treatment effects between groups on the Borderline Evaluation of Severity Over Time (BEST) scale (Zanarini et al. 2018). This RCT reported significantly better scores for the psychoeducation group after 12 months of follow-up (Zanarini et al. 2018). The investigators, however, tested 10 outcome measures and did not adjust for multiple comparisons.

Severity of Symptoms Associated With Borderline Personality Disorder

The larger of the two RCTs (N = 80; Zanarini et al. 2018) employing internet-based psychoeducation reported no significant differences between intervention and wait-list groups for anxiety and depressive symptoms. Participants in the psychoeducation group, however, achieved significantly better scores on the Social Adjustment Scale than participants in the wait-list group. As mentioned earlier, however, this study tested 10 outcome measures and did not adjust for multiple testing.

Global Impression and Functioning

One study (N = 80; Zanarini et al. 2018) reported similar effects and no significant differences on the Sheehan Disability Scale after 12 weeks and 12 months.

Incidence of Adverse Events, Serious Adverse Events, and Withdrawal Due to Adverse Events

None of the studies reported on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events (Zanarini and Frankenburg 2008; Zanarini et al. 2018).

Grading of the Overall Supporting Body of Research Evidence for Psychoeducation in Patients With BPD

▫ Magnitude of effect: None noted. In the two studies that specifically assessed psychoeducation in BPD, no differences were noted as compared to a wait-list control condition.

▫ Risk of bias: Moderate. Both studies of psychoeducation in BPD were rated as having a moderate risk of bias.

▫ Applicability: In both studies, participants were female, with a mean age of 19–22 years. Race was predominantly White in both studies, with some other races and ethnicities represented in one study. One study used in-person psychoeducation, whereas the other study used internet-based psychoeducation, which is less common. One of the studies also excluded individuals who were currently receiving psychiatric treatment, which would also be atypical. Thus, the applicability of these studies to typical treatment of individuals with BPD appears limited.

▫ Directness: Direct. Measured outcomes include BPD symptom severity and functioning.

▫ Consistency: Inconsistent. The internet-based psychoeducation study showed better outcomes with psychoeducation at 12 weeks on social adjustment and at 12 months on BPD severity, whereas the other study showed no differences with psychoeducation.

▫ Precision: Imprecise. The studies did not meet the optimal information size (i.e., number of participants in a meta-analysis).

▫ Dose-response relationship: Not applicable. Dose-response was not studied.

▫ Confounding factors (including likely direction of effect): Not identified.

▫ Publication bias: Not identified.

▫ Overall strength of research evidence: Low. Only two studies are available that assessed BPD severity, and, for other outcomes including functioning, only one study was available. Both studies were relatively small and were of moderate risk of bias. The strength of evidence was also downgraded for imprecision, and there was inconsistency in the findings of the two studies.

Psychosocial Interventions

Statement 5 – Psychotherapy

APA recommends (1B) that a patient with borderline personality disorder be treated with a structured approach to psychotherapy that has support in the literature and targets the core features of the disorder.

Evidence in the treatment of adults with BPD comes from the systematic review conducted by RTI. The data from clinical trials include comparisons with wait-list control and TAU conditions as well as head-to-head comparisons of specific psychotherapies. For the vast majority of treatments, there were only one or two studies of each comparison, which makes it challenging to draw robust conclusions. Notably, in the vast majority of studies that used TAU or an active comparator treatment, all treatment arms showed improvement with psychotherapy even when the differences between treatment groups were not statistically significant. This consistency, as well as the superiority of many of the psychotherapies to TAU, led the writing group to assess the overall strength of research evidence as moderate for psychotherapy in BPD.

For adolescents with BPD, the evidence for psychotherapeutic interventions is more limited but generally consistent with the benefits of treatment found in adults. Two studies in adolescents met the inclusion criteria for this review (Chanen et al. 2008; Santisteban et al. 2015) and are discussed in further detail later in this appendix and in Appendix D. Other studies in adolescents did not meet inclusion criteria, primarily because they included patients with borderline traits as well as patients who fulfilled criteria for a diagnosis of BPD. A systematic review of studies in adolescents concluded that additional rigorous trials are needed because current studies have small samples, high attrition rates, inconsistent findings, and high risks of bias (Jørgensen et al. 2021).

Interpersonal Psychotherapy Versus Wait-List Plus Clinical Management

One RCT (Bozzatello and Bellino 2020) evaluated the efficacy of interpersonal psychotherapy compared with wait-list plus clinical management. The study included 43 participants in Italy who were assessed at 10 months. This study was rated as having a moderate risk of bias. The trial was funded by the Italian government.

The majority of the study participants were female; race was not reported. The overall mean age of participants was 35 years of age. The study excluded patients receiving psychiatric services or who had existing schizophrenia, bipolar disorder, mental impairment, or drug or alcohol dependence.

The intervention group received 22 sessions in the first 20 weeks and 20 sessions in the last 20 weeks. Each session lasted 50 minutes. TAU consisted of case management provided by hospital and primary and community care services.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing IPT with wait-list plus clinical management

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with wait-list plus clinical management	Difference in effect with IPT
				Anticipated absolute effects
Severity of BPD
Assessed with BPDSI Follow-up: mean 10 months	43 (one RCT: Bozzatello and Bellino 2020)	⨁⨁◯◯; LOW^a for greater effects with IPT	–	Mean score at endpoint = 36.1	Mean 8.4 lower (P = 0.01)
Severity of BPD symptoms
Assessed with BIS-11 and SHI Follow-up: mean 10 months	43 (one RCT: Bozzatello and Bellino 2020)	⨁◯◯◯; VERY LOW^b for similar effects	–	Mean score at endpoint on BIS-11 = 64.8, on SHI = 6.91	Mean 12.6 lower on BIS-11 (P = 0.03) and 2.8 higher on SHI (P = 0.27)
Functioning
Assessed with CGI-S and SOFAS Follow-up: mean 10 months	43 (one RCT: Bozzatello and Bellino 2020)	⨁⨁◯◯; LOW^a for greater effects with IPT	–	Mean score at endpoint on CGI-S = 3.1, on SOFAS = 57.1	Mean 1.0 lower on CGI-S (P = 0.009) and 11.1 higher on SOFAS (P = 0.02)

BIS-11 = Barratt Impulsiveness Scale, Version 11; BPD = borderline personality disorder; BPDSI = Borderline Personality Disorder Severity Index; CGI-S = Clinical Global Impression-Severity; CI = confidence interval; GRADE = Grading of Recommendations, Assessment, Development, and Evaluation; IPT = interpersonal psychotherapy; RCT = randomized controlled trial; SHI = Self-Harm Inventory; SOFAS = Social Occupational Functioning Assessment Scale.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision; inconsistent direction of effect on measures of severity of BPD symptoms; downgraded one step for inconsistency.
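The downgrading described in the footnotes above can be illustrated with a short sketch. It assumes the usual GRADE convention that evidence from randomized trials starts at HIGH certainty and loses one level per downgrade step; the function name `grade_certainty` and the dictionary keys are hypothetical and are not taken from the systematic review itself.

```python
# Minimal sketch of GRADE-style downgrading (illustrative only).
# Assumption: RCT evidence starts at HIGH and drops one level per step,
# never falling below VERY LOW.
GRADE_LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_certainty(downgrades, start="HIGH"):
    """Return the certainty level after applying downgrade steps.

    downgrades: mapping of domain name -> number of steps (hypothetical keys).
    """
    total_steps = sum(downgrades.values())
    index = max(GRADE_LEVELS.index(start) - total_steps, 0)
    return GRADE_LEVELS[index]


# Footnote a: two steps for imprecision.
print(grade_certainty({"imprecision": 2}))
# Footnote b: two steps for imprecision plus one step for inconsistency.
print(grade_certainty({"imprecision": 2, "inconsistency": 1}))
```

Under these assumptions, the first call corresponds to the LOW ratings in the table and the second to the VERY LOW rating for severity of BPD symptoms.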

Severity of borderline personality disorder

After 10 months of treatment, the study reported significantly greater improvements on the Borderline Personality Disorder Severity Index for participants in the interpersonal psychotherapy group compared with the wait-list plus clinical management group (Bozzatello and Bellino 2020).

Severity of symptoms associated with borderline personality disorder

After 10 months of treatment, the study reported significantly greater improvements on the Barratt Impulsiveness Scale (BIS), version 11, but not on the Self-Harm Inventory, for participants in the interpersonal psychotherapy group compared with the wait-list plus clinical management group (Bozzatello and Bellino 2020).

Global impression and functioning

After 10 months of treatment, the study reported significantly greater improvements on the Clinical Global Impression (CGI) scale, Severity item and the Social Occupational Functioning Assessment Scale for participants in the interpersonal psychotherapy group compared with the wait-list plus clinical management group (Bozzatello and Bellino 2020).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Acceptance and Commitment Therapy Versus Treatment as Usual

One RCT (Morton et al. 2012) evaluated the efficacy of acceptance and commitment therapy (ACT) in addition to TAU compared with TAU alone. The Australian study included 41 participants who were followed for a duration of 13 weeks. The study was rated as having a moderate risk of bias because of high attrition. The trial did not report funding.

Almost all of the study participants were female. The mean age of the ACT group was 36 years, while the mean age of the TAU group was 34 years. The study excluded participants with psychotic symptoms (besides “reactive psychotic symptoms” associated with BPD [not specified further]), with intellectual disability, with cognitive impairment, or who were a significant risk to other participants.

ACT was delivered as weekly group sessions that included performing mindfulness exercises, doing emotions skills training, focusing on awareness of one’s values, and identifying choice points for action. TAU consisted of case management provided by public mental health services in Australia.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing ACT with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with ACT
				Anticipated absolute effects
Severity of BPD
Assessed with BEST Follow-up: mean 13 weeks	41 (one RCT: Morton et al. 2012)	⨁⨁◯◯; LOW^a for greater effect with ACT	–	Mean score at endpoint = 47.4	Mean 17.2 lower (P = 0.028)
Anxiety
Assessed with DASS Follow-up: mean 13 weeks	41 (one RCT: Morton et al. 2012)	⨁⨁◯◯; LOW^a for greater effect with ACT	–	Mean score at endpoint = 26.3	Mean 11.6 lower (P = 0.025)
Depression
Assessed with DASS Follow-up: mean 13 weeks	41 (one RCT: Morton et al. 2012)	⨁⨁◯◯; LOW^a for greater effect with ACT	–	Mean score at endpoint = 31.0	Mean 15 lower (ns)
Difficulties in emotion regulation
Assessed with DERS Follow-up: mean 13 weeks	41 (one RCT: Morton et al. 2012)	⨁⨁◯◯; LOW^a for greater effect with ACT	–	Mean score at endpoint = 140.0	Mean 35.3 lower (P = 0.008)
Hopelessness
Assessed with BHS Follow-up: mean 13 weeks	41 (one RCT: Morton et al. 2012)	⨁⨁◯◯; LOW^a for greater effect with ACT	–	Mean score at endpoint = 16.4	Mean 8.9 lower (P = 0.006)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

ACT = acceptance and commitment therapy; BEST = Borderline Evaluation of Severity Over Time; BHS = Beck Hopelessness Scale; BPD = borderline personality disorder; CI = confidence interval; DASS = Depression Anxiety Stress Scale; DERS = Difficulties in Emotion Regulation Scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; ns = not significant; RCT = randomized controlled trial; TAU = treatment as usual.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

After 13 weeks of treatment, the study reported significantly greater improvements on the BEST scale for participants in the ACT group compared with the TAU group (Morton et al. 2012).

Severity of symptoms associated with borderline personality disorder

After 13 weeks, participants who received ACT in addition to TAU had significantly greater improvements than participants treated with TAU only on the Beck Hopelessness Scale, the Difficulties in Emotion Regulation Scale, and the subscale for anxiety of the Depression Anxiety Stress Scale. Changes on the subscales for depression and stress of the Depression Anxiety Stress Scale were also greater for the ACT group but did not achieve statistical significance (Morton et al. 2012).

Global impression and functioning

The study did not assess measures of global impression or functioning (Morton et al. 2012).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Manual-Assisted Cognitive Therapy Versus Treatment as Usual

One U.S. RCT (Weinberg et al. 2006) evaluated the efficacy of manual-assisted cognitive therapy (MACT), compared with TAU. Overall, the study provided data on 30 participants. The study was rated as having a moderate risk of bias. Follow-up duration was 6 months after treatment. The study was supported by a Young Investigator Award from the Borderline Personality Disorder Research Foundation. The majority of the study participants were female and White and had a mean age of 28 years. The study did not report on baseline severity. The study excluded participants with psychotic disorders, substance abuse disorder, or risk of suicide.

MACT was administered as an adjunctive intervention to TAU and comprised six sessions, over 6–8 weeks, incorporating elements of dialectical behavior therapy (DBT), cognitive-behavioral therapy (CBT), and bibliotherapy, modified to focus on deliberate self-harm. Each session was structured around a chapter of a booklet, covering functional analysis of episodes of parasuicide (defined as deliberate self-harm or suicide attempts), emotion regulation strategies, problem-solving strategies, management of negative thinking, management of substance use, and relapse prevention strategies. TAU consisted of standard care.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing MACT with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with MACT
				Anticipated absolute effects
Deliberate self-harm
Assessed with deliberate self-harm frequency (scale NR) Follow-up: mean 6 months	30 (one RCT: Weinberg et al. 2006)	⨁⨁◯◯; LOW^a for greater effects with MACT	–	Mean at endpoint for frequency = 6.69	Mean 4.71 lower (P < 0.001)
Assessed with deliberate self-harm severity (scale NR) Follow-up: mean 6 months	30 (one RCT: Weinberg et al. 2006)	⨁⨁◯◯; LOW^a for greater effects with MACT	–	Mean severity score at endpoint = 1.01	Mean 0.5 lower (P < 0.001)

CI = confidence interval; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MACT = manual-assisted cognitive therapy; NR = not reported; RCT = randomized controlled trial; TAU = treatment as usual.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

The study did not report on the severity of BPD.

Severity of symptoms associated with borderline personality disorder

The study (Weinberg et al. 2006) reported significant reductions in the frequency and severity of deliberate self-harm for participants in the MACT group when compared with TAU after 6 months of treatment. The authors reported using the Parasuicide History Interview to assess the frequency and severity of deliberate self-harm but did not specify the range of the scale for assessing severity.

Global impression and functioning

The study did not report on global impression or functioning.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Cognitive-Behavioral Therapy Versus Treatment as Usual

The Borderline Personality Disorder Study of Cognitive Therapy (BOSCOT) RCT (Davidson et al. 2006) evaluated the efficacy of CBT in addition to TAU compared with TAU only. The study included 106 participants in the United Kingdom who were followed for a duration of 24 months. The study was rated as having a moderate risk of bias. The trial was funded by a public foundation.

The majority of the study participants were female, and all of them were White (Davidson et al. 2006). The overall mean age of participants was 32 years of age. The study excluded patients receiving psychiatric services or who had existing schizophrenia or bipolar disorder, mental impairment, or drug or alcohol dependence.

The intervention group received an average of 27 sessions of CBT over 12 months in addition to TAU (Davidson et al. 2006). Each session lasted 1 hour. TAU consisted of case management provided by hospital and primary and community care services.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing CBT with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with CBT
				Anticipated absolute effects
Anxiety
Assessed with STAI Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^a for greater effect with CBT	–	Mean score at endpoint = 50.9	Mean 7.96 lower (0 to 0)
Depression
Assessed with BDI Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 28.8	Mean 2.3 lower (0 to 0)
Proportion of participants with suicidal acts
Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^b for similar risks	OR 0.78 (0.30–1.98)	531 per 1,000	62 fewer per 1,000 (277 fewer to 161 more)
Mean number of suicidal acts
Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^b for greater effect with CBT	–	Mean number at endpoint = 1.73	Mean 0.91 lower (1.67 lower to 0.15 lower)
Quality of life
Assessed with EQ-5D Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 0.66	Mean 0.02 lower (0 to 0)
Social functioning
Assessed with SFQ Follow-up: mean 24 months	102 (one RCT: Davidson et al. 2006)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 12.3	Mean 0.7 lower (0 to 0)

BDI = Beck Depression Inventory; CBT = cognitive-behavioral therapy; CI = confidence interval; EQ-5D = European Quality of Life–5 Dimension; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; OR = odds ratio; RCT = randomized controlled trial; SFQ = Social Functioning Questionnaire; STAI = State-Trait Anxiety Inventory; TAU = treatment as usual.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bFew events; downgraded two steps for imprecision.
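The relationship between the odds ratio and the anticipated absolute effect reported for the proportion of participants with suicidal acts can be checked with a short calculation. The sketch below applies the standard conversion from a baseline risk and an odds ratio to a risk difference; it is an illustration under that assumption, not necessarily the exact procedure used by the review authors, and the function name `absolute_effect_per_1000` is hypothetical.

```python
# Minimal sketch: convert an odds ratio and a baseline (comparison-group) risk
# into an absolute risk difference per 1,000 participants.
def absolute_effect_per_1000(baseline_per_1000, odds_ratio):
    """Risk difference per 1,000 implied by an odds ratio and a baseline risk."""
    baseline_risk = baseline_per_1000 / 1000
    baseline_odds = baseline_risk / (1 - baseline_risk)
    intervention_odds = baseline_odds * odds_ratio
    intervention_risk = intervention_odds / (1 + intervention_odds)
    return (intervention_risk - baseline_risk) * 1000


# Values from the table: 531 per 1,000 with TAU; OR 0.78 (0.30-1.98).
for odds_ratio in (0.78, 0.30, 1.98):
    difference = absolute_effect_per_1000(531, odds_ratio)
    print(f"OR {odds_ratio}: {difference:+.0f} per 1,000")
```

With the tabulated inputs, the point estimate works out to roughly 62 fewer per 1,000, and the confidence limits to roughly 277 fewer and 161 more per 1,000, in line with the row above.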

Severity of borderline personality disorder

The study did not report on the severity of BPD.

Severity of symptoms associated with borderline personality disorder

The proportion of participants in the study (Davidson et al. 2006) who engaged in suicidal acts (defined as acts that were deliberate, life threatening, and resulting in or requiring medical intervention) was not significantly different between treatment groups after 24 months of follow-up. The mean number of suicidal acts per person did not differ significantly between groups at 12 months but was significantly lower for participants in the CBT group than for those in the TAU group after 24 months. Improvements on the State-Trait Anxiety Inventory were significantly greater for participants in the CBT group than for those treated with TAU alone after 24 months, but not after 12 months. No significant differences between treatment groups were detected on the Beck Depression Inventory (BDI) or for the number of hospitalizations after 12 months.

Global impression and functioning

No significant differences between treatment groups were detected for the Social Functioning Questionnaire and the European Quality of Life–5 Dimension instrument after 12 months (Davidson et al. 2006).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Dialectical Behavior Therapy Versus Treatment as Usual

Six studies, four RCTs (Carter et al. 2010; Feigenbaum et al. 2012; McMain et al. 2017; Verheul et al. 2003), a nonrandomized trial (Bohus et al. 2004), and a retrospective cohort study (Gregory and Sachdeva 2016), evaluated the efficacy of DBT compared with TAU. Overall, these studies provided data on 483 participants. Three studies were rated as having a high risk of bias, two as moderate risk of bias, and one as low risk of bias. Reasons for ratings of high risk of bias were lack of intention-to-treat analysis and high attrition. Follow-up durations ranged from 3 months to 12 months. One trial was funded by a health insurance company; the other studies were publicly funded or did not report source of funding.

The majority of study participants were female, and mean ages ranged from 25 years to 35 years. Only one study, in which the majority of participants were White, reported on race or ethnicity. Likewise, only one study reported the severity of BPD at baseline (Gregory and Sachdeva 2016). In this retrospective cohort study, participants were moderately ill at baseline, with BEST scores of 45 to 49. Studies excluded patients with psychiatric comorbidities such as schizophrenia, major depressive disorder (MDD), alcohol or substance use disorder, and bipolar disorder.

DBT combines weekly individual psychotherapy sessions, weekly skills training groups, and weekly supervision and consultation meetings for the therapists. One study assessed brief DBT with skills training only over 20 weeks (McMain et al. 2017). All studies enrolled outpatients, except a study from Germany, which conducted DBT as an inpatient treatment (Bohus et al. 2004).

TAU consisted of a range of individualized service provisions and professional mental health care. All except one study (Gregory and Sachdeva 2016) employed a wait-list design in which participants of the TAU groups were offered DBT at the end of the study.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DBT with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with DBT
				Anticipated absolute effects
Severity of BPD
Assessed with BSL-23 Follow-up: mean 32 weeks	125 (one RCT, one observational study: Gregory and Sachdeva 2016; McMain et al. 2017)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 45.99*	Mean 4.91 higher (ns)
Anger, depression
Assessed with various scales Follow-up: 3–12 months	227 (one RCT, one nRCT, one observational study: Bohus et al. 2004; Feigenbaum et al. 2012; Gregory and Sachdeva 2016; McMain et al. 2017)	⨁◯◯◯; VERY LOW^a,b,c,d for similar effects	–	Inconsistent effects with TAU	Inconsistent
Dissociative experiences
Assessed with DES Follow-up: 3–12 months	102 (one RCT, one nRCT: Bohus et al. 2004; Feigenbaum et al. 2012)	⨁◯◯◯; VERY LOW^a,e for similar effects	–	Mean score at endpoint = 83.3	Mean 0.1 higher (ns)
Impulsiveness
Assessed with BIS Follow-up: mean 32 weeks	84 (one RCT: McMain et al. 2017)	⨁⨁◯◯; LOW^e for similar effects	–	Mean score at endpoint = 55.16	Mean 1.84 lower (ns)
Self-harm
Assessed with DSHI, self-injury, self-mutilation Follow-up: mean 3–12 months	367 (four RCTs, one nRCT, one observational study: Bohus et al. 2004; Carter et al. 2010; Feigenbaum et al. 2012; Gregory and Sachdeva 2016; McMain et al. 2017; Verheul et al. 2003)	⨁⨁◯◯; LOW^b,c for greater effect with DBT	Not estimable	Mean score for DSHI at endpoint = 1.14*	Mean 0.34 lower (ns)
Suicidal and nonsuicidal self-injuries
Assessed with LSASI Follow-up: mean 32 weeks	184 (three RCTs: Feigenbaum et al. 2012; McMain et al. 2017; Verheul et al. 2003)	⨁⨁◯◯; LOW^a for greater effect with DBT	–	Mean score at endpoint = 2.56*
General psychopathology
Assessed with SCL-90-R; follow-up: mean 32 weeks	134 (one RCT, one nRCT: Bohus et al. 2004; McMain et al. 2017)	⨁⨁◯◯; LOW^a for greater effect with DBT	OR 3.44 (NR)	184 per 1,000*
Functioning
Assessed with GAF; follow-up: mean 4 months	50 (one nRCT: Bohus et al. 2004)	⨁◯◯◯; VERY LOW^a,e for greater effect with DBT	–	Mean score at endpoint = 49.4
Withdrawal due to adverse events
Follow-up: 12 months	41 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,e for similar risks	RR 1 (– to –)	0 per 1,000

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

BIS = Barratt Impulsiveness Scale; BPD = borderline personality disorder; BSL-23 = Borderline Symptom List–23; CI = confidence interval; DBT = dialectical behavior therapy; DES = Dissociative Experiences Scale; DSHI = Deliberate Self-Harm Inventory; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; LSASI = Lifetime Suicide Attempt Self-Injury Interview; NR = not reported; nRCT = nonrandomized controlled trial; ns = not significant; OR = odds ratio; RCT = randomized controlled trial; RR = risk ratio; SCL-90-R = Symptom Checklist–90–Revised; TAU = treatment as usual.

*Data based on McMain et al. 2017.

^aStudies do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bStudies report inconsistent results regarding differences in treatment effects; downgraded one step for inconsistency.

^cStudies do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^dTwo of three studies are high risk of bias.

^eStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.
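The table note on intervention-group risk can be illustrated with a short sketch. It assumes the standard conversions from a comparison-group risk and a relative effect (risk ratio or odds ratio) to an intervention-group risk; the function names are hypothetical and chosen for illustration only. Inputs are taken from the table rows, but the computed values are arithmetic illustrations and are not figures reported in the source.

```python
def risk_from_rr(assumed_risk, rr):
    """Intervention-group risk from comparison-group risk and a risk ratio."""
    return min(1.0, assumed_risk * rr)


def risk_from_or(assumed_risk, odds_ratio):
    """Intervention-group risk from comparison-group risk and an odds ratio."""
    numerator = odds_ratio * assumed_risk
    return numerator / (1 - assumed_risk + numerator)


def per_1000(risk):
    """Express a risk (0 to 1) as events per 1,000 participants."""
    return round(risk * 1000)


# Withdrawal row: RR 1 with an assumed TAU risk of 0 per 1,000.
print(per_1000(risk_from_rr(0.0, 1.0)))

# General psychopathology row inputs: OR 3.44, assumed TAU risk 184 per 1,000.
# Illustration of the formula only; the table does not report this cell.
print(per_1000(risk_from_or(0.184, 3.44)))
```

Because the confidence interval for the odds ratio is not reported (NR), no interval for the intervention-group risk can be derived in the same way.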

Severity of borderline personality disorder

In the study by McMain et al. (2017) (N = 84), rated as having a moderate risk of bias, participants receiving brief DBT achieved significantly greater reductions on the Borderline Symptom List–23 (BSL-23) compared with participants in the TAU group at the end of the intervention (20 weeks) but not at the 32-week follow-up. A retrospective cohort study (N = 41; Gregory and Sachdeva 2016) also reported no significant differences on the BEST scale between participants treated with DBT and TAU after 12 months.

Severity of symptoms associated with borderline personality disorder

All six studies reported on changes in symptoms associated with BPD (Bohus et al. 2004; Carter et al. 2010; Feigenbaum et al. 2012; Gregory and Sachdeva 2016; McMain et al. 2017; Verheul et al. 2003). The two RCTs (N = 84 and N = 58), rated as having a moderate risk of bias, reported fewer suicide attempts in participants assigned to the DBT group than in participants receiving TAU (McMain et al. 2017; Verheul et al. 2003). By contrast, two studies (one RCT [Feigenbaum et al. 2012] and one cohort study [Gregory and Sachdeva 2016]), rated as having a high risk of bias, reported no significant differences in suicide attempts between treatment groups.

All studies reported on self-harm, defined variously as deliberate self-harm, self-injury, and self-mutilation. The majority of trials also showed greater reductions in self-harm in the DBT group than in the TAU group. In two trials (total N of 108), the difference in self-mutilating behaviors reached statistical significance (Bohus et al. 2004; Verheul et al. 2003).

Two studies, rated as having a high risk of bias, reported no significant differences in dissociative experiences between DBT and TAU (Bohus et al. 2004; Feigenbaum et al. 2012). One study each reported on improvements in aggression (Feigenbaum et al. 2012) and impulsiveness (McMain et al. 2017); neither found significant differences between groups.

Studies reported mixed results regarding differences in efficacy between DBT and TAU to improve the severity of anger (Bohus et al. 2004; Feigenbaum et al. 2012; McMain et al. 2017) and depressive symptoms (Bohus et al. 2004; Feigenbaum et al. 2012; McMain et al. 2017).

Global impression and functioning

Significantly more participants in the brief DBT group than in the TAU group achieved clinically relevant improvements on the Symptom Checklist–90–Revised (SCL-90-R) at 32 weeks (McMain et al. 2017). Likewise, Bohus et al. (2004) reported greater improvements on the Global Severity Index and the Global Assessment of Functioning (GAF) Scale after 4 months of treatment with DBT than TAU.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

None of the studies reported on the incidence of adverse events or serious adverse events. The retrospective cohort study (Gregory and Sachdeva 2016) found no differences in withdrawals due to adverse events between participants treated with DBT and TAU (0% vs. 0%).

Dialectical Behavior Therapy Versus Mentalization-Based Treatment

One nonrandomized clinical trial (Barnicot and Crawford 2019), conducted in the United Kingdom and rated as having a high risk of bias, compared DBT with mentalization-based treatment (MBT) in 90 patients with BPD. The majority of participants were female (72%), with a mean age of 31 years. More than one-third (36%) were Black or belonged to a minority ethnic group. Mean baseline BPD severity ranged from 40.7 points to 44.8 points on the BEST scale. Reasons for the high risk of bias included selection bias and confounding.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. The table below presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing MBT with DBT for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with DBT	Difference in effect with MBT
				Anticipated absolute effects
Severity of BPD
Assessed with BEST Follow-up: 12 months	90 (one nRCT: Barnicot and Crawford 2019)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 35.0	Mean 0.8 higher (ns)
Dissociative experiences
Assessed with DES Follow-up: 12 months	90 (one nRCT: Barnicot and Crawford 2019)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 30.6	Mean 4 lower (ns)
Emotional dysregulation
Assessed with DERS Follow-up: 12 months	90 (one nRCT: Barnicot and Crawford 2019)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 103.1 points	Mean 5.6 higher (ns)
Self-harm incidents
Assessed with SASII Follow-up: 12 months	90 (one nRCT: Barnicot and Crawford 2019)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Median number at endpoint was 2.0	Mean 10.5 more (ns)

BEST = Borderline Evaluation of Severity Over Time; BPD = borderline personality disorder; CI = confidence interval; DBT = dialectical behavior therapy; DERS = Difficulties in Emotion Regulation Scale; DES = Dissociative Experiences Scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MBT = mentalization-based treatment; nRCT = nonrandomized controlled trial; ns = not significant; SASII = Suicide Attempt Self-Injury Interview.

^aHigh risk for bias in selection of participants into the study and high risk for confounding; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

Treatment duration for both DBT and MBT was 12 months, and the study was funded through the United Kingdom’s National Institute for Health (Barnicot and Crawford 2019). DBT included weekly individual therapy and group skills training, telephone skills coaching, and team consultation. MBT included weekly or fortnightly individual therapy and weekly group therapy along with a short-term, 10-week group program offering psychoeducation and support aimed at helping patients get a better understanding of their problems and suggestions for better ways of dealing with them.

At the end of the 12-month treatment phase, there was no significant difference in severity of BPD between DBT and MBT as measured by the BEST scale (Barnicot and Crawford 2019). There was significant improvement from baseline in both groups.

Severity of symptoms associated with borderline personality disorder

At the end of the 12-month treatment phase, there was no significant difference between DBT and MBT in the number of self-harm incidents over the previous 3 months, in dissociative symptoms, or in emotional dysregulation (Barnicot and Crawford 2019). Significant improvement from baseline in the severity of symptoms specific to BPD occurred in both groups.

Global impression and functioning

The study did not look at global impression or functioning at follow-up.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on treatment-related adverse events, including withdrawal due to adverse events.

Dialectical Behavior Therapy Versus General Psychiatric Management for Borderline Personality Disorder

One Canadian RCT (McMain et al. 2012; described in three publications), rated as having a high risk of bias, compared DBT with well-specified general psychiatric management (GPM) in 180 patients with BPD. The majority of participants were female (86%), with a mean age of 30 years. Race and ethnicity were not reported. Mean baseline BPD severity ranged from 14.9 points to 15.5 points on the ZAN-BPD. Reasons for high risk of bias included high attrition (38%) at 12 months.

Treatment duration was 12 months, and the study was funded through the Canadian Institutes of Health Research (McMain et al. 2012). DBT included weekly individual therapy and group skills training, weekly telephone coaching with explicit focus on self-harm and suicidal behavior, and weekly therapist team consultation. Manualized GPM consisted of weekly individual therapy that was expanded away from focusing on self-harm and suicidal behaviors and included medication management. GPM also included mandated weekly therapist supervision meetings.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–8 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DBT with GPM for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with GPM	Difference in effect with DBT
				Anticipated absolute effects
Severity of BPD
Assessed with ZAN-BPD Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint = 6.66	Mean 1.63 higher (ns)
Depression
Assessed with BDI Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for greater effect with GPM	–	Mean score at endpoint = 18.05	Mean 6.40 higher (P = 0.004)
Interpersonal functioning
Assessed with IIP Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint = 84.36	Mean 10.12 higher (ns)
Nonsuicidal self-injuries
Assessed with SASII Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean number at endpoint = 1.09	Mean 1.09 more (ns)
Suicidal episodes
Assessed with SASII Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean number at endpoint = 0.29	Mean 0.26 more (ns)
Symptom distress
Assessed with SCL-90-R total score Follow-up: 36 months	180 (one RCT: McMain et al. 2012)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint = 1.03	Mean 0.23 higher (ns)

BDI = Beck Depression Inventory; BPD = borderline personality disorder; CI = confidence interval; DBT = dialectical behavior therapy; GPM = general psychiatric management; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; IIP = Inventory of Interpersonal Problems; ns = not significant; RCT = randomized controlled trial; SASII = Suicide Attempt Self-Injury Interview; SCL-90-R = Symptom Checklist–90–Revised; ZAN-BPD = Zanarini Rating Scale for BPD.

^aHigh risk of bias due to attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.
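The footnotes above describe downgrading in steps. A minimal sketch of that bookkeeping follows, assuming the usual GRADE convention that evidence from RCTs starts at high certainty and evidence from observational designs starts at low, with very low as the floor. The function name and the dictionary of downgrade reasons are hypothetical, and upgrading criteria are deliberately left out.

```python
# Certainty levels in ascending order; the index is used as a score.
LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(design, downgrades):
    """Return the certainty level after applying downgrade steps.

    design: "RCT" starts at high; any other design starts at low (assumption).
    downgrades: mapping of reason -> number of steps, e.g. {"imprecision": 2}.
    """
    start = LEVELS.index("high") if design == "RCT" else LEVELS.index("low")
    steps = sum(downgrades.values())
    return LEVELS[max(0, start - steps)]


# One RCT downgraded one step for risk of bias and one step for imprecision,
# as in footnotes a and b above.
print(grade_certainty("RCT", {"risk of bias": 1, "imprecision": 1}))
```

Under these assumptions, the example yields a low rating, which agrees with the LOW ratings shown for the single RCT in this comparison.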

Severity of borderline personality disorder

At the end of the 12-month treatment phase and again at the 36-month follow-up, there was no significant difference in severity of BPD on the ZAN-BPD among patients receiving DBT and those receiving GPM (McMain et al. 2012). There was significant improvement from baseline in both groups.

Severity of symptoms associated with borderline personality disorder

With respect to symptoms specific to BPD, after 12 months of treatment and at the 36-month follow-up, there were no significant differences between DBT and GPM across multiple measures of symptom severity, including the number of suicidal episodes and the number of nonsuicidal self-injuries as measured on the Suicide Attempt Self-Injury Interview and improvement on the Inventory of Interpersonal Problems scale (McMain et al. 2012). With respect to depression, there was no significant difference between groups in BDI scores at the end of the 12-month treatment phase. However, at 36 months (24 months post-treatment), mean BDI scores were significantly lower among patients in the GPM group than in the DBT group.

Global impression and functioning

The study reported no significant differences between treatment groups on the SCL-90-R and the Inventory of Interpersonal Problems (McMain et al. 2012).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on treatment-related adverse events, including withdrawal due to adverse events.

Dialectical Behavior Therapy Versus Systems Training for Emotional Predictability and Problem-Solving

One nonrandomized clinical trial (Guillén Botella et al. 2021) conducted in Spain, rated as having a high risk of bias, compared DBT with Systems Training for Emotional Predictability and Problem-Solving (STEPPS) in 72 patients with BPD. The overwhelming majority of participants were female (94%), and all were White, with a mean age of 32 years. Mean baseline BPD severity ranged from 35.8 points to 38.6 points on the BSL-23. The study was rated as having a high risk of bias due to high attrition (32%).

Treatment duration was 6 months (Guillén Botella et al. 2021). DBT included weekly individual therapy and group skills training, telephone skills coaching, and team consultation. STEPPS included group therapy, a reinforcement team, telephone consultations with relatives, consultations with other professionals, and weekly clinician meetings. The study funding source was not reported.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–9 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DBT with STEPPS

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with DBT	Difference in effect with STEPPS
				Anticipated absolute effects
Severity of BPD
Assessed with BSL-23 Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for greater effect with DBT	–	Mean score at endpoint = 23.56	Mean 5.73 higher (P = 0.03)
Anxiety
Assessed with severity of participants index Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 8.40	Mean 0.71 higher (ns)
Depression
Assessed with BDI Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 28.03	Mean 6.7 lower (ns)
Dissociation experiences
Assessed with DES-II Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 20.81	Mean 2.8 lower (ns)
Suicide risk
Assessed with SRS Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 7.0	Mean 1.56 higher (ns)
Quality of life
Assessed with QoL Follow-up: 6 months	72 (one nRCT: Guillén Botella et al. 2021)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 6.31	Mean 1.16 lower (ns)

BDI = Beck Depression Inventory; BPD = borderline personality disorder; BSL-23 = Borderline Symptom List–23; CI = confidence interval; DBT = dialectical behavior therapy; DES-II = Dissociative Experiences Scale–II; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; nRCT = nonrandomized controlled trial; ns = not significant; QoL = Quality of Life Index; SRS = Suicide Risk Scale; STEPPS = Systems Training for Emotional Predictability and Problem-Solving.

^aHigh risk of bias due to high attrition and moderate for confounding; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

At the end of a 6-month treatment phase, compared with STEPPS, DBT resulted in a greater improvement in BPD symptom severity with significantly lower scores on the BSL-23 scale (Guillén Botella et al. 2021). Both DBT and STEPPS resulted in a significant improvement in BPD severity from baseline.

Severity of symptoms associated with borderline personality disorder

Following 6 months of treatment, there was no significant difference between DBT and STEPPS in suicide risk, depression, anxiety, dissociation experiences, and resilience scores (Guillén Botella et al. 2021). Severity of symptoms decreased across both groups.

Global impression and functioning

Following 6 months of treatment, there was no significant difference between STEPPS and DBT on quality-of-life scores (Guillén Botella et al. 2021).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report treatment-related adverse events, including withdrawal due to adverse events.

Dialectical Behavior Therapy Versus Dynamic Deconstructive Psychotherapy

One three-armed retrospective cohort study (Gregory and Sachdeva 2016; reported in two publications), conducted in the United States and rated as having a high risk of bias, compared DBT with dynamic deconstructive psychotherapy (DDP) and TAU in 68 patients with BPD. The majority of participants were female (81%) and White (88%), with a mean age of 31 years. Mean baseline BPD severity ranged from 45.5 points to 49.2 points on the BEST scale. Reasons for the rating of high risk of bias included high attrition (53%) and confounding.

Treatment duration for both DBT (N = 25) and DDP (N = 27) was 12 months (Gregory and Sachdeva 2016). DBT included weekly individual therapy, weekly group sessions, and telephone skills coaching. DDP included weekly individual sessions that combined elements of translational neuroscience, object relations theory, and deconstructionist philosophy. The study was supported by the American Psychoanalytic Association.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–10 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DBT with DDP for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with DBT	Difference in effect with DDP
				Anticipated absolute effects
Severity of BPD
Assessed with BEST Follow-up: 12 months	52 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effect with DDP	–	Mean score at endpoint = 41.8	Mean 8.8 lower (P = 0.04)
Depression
Assessed with BDI Follow-up: 12 months	52 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effect with DDP	–	Mean score at endpoint = 27.6	Mean 10.5 lower (P = 0.009)
Disability
Assessed with SDS Follow-up: 12 months	52 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effect with DDP	–	Mean score at endpoint = 6.1	Mean 2.3 lower (P = 0.049)
Self-harm
Follow-up: 12 months	52 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effect with DDP	–	Mean number at endpoint = 2.4	Mean 1.1 fewer (P = 0.02)
Suicide attempts
Follow-up: 12 months	52 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean number at endpoint = 1.3	Mean 0.74 fewer (ns)

BDI = Beck Depression Inventory; BEST = Borderline Evaluation of Severity Over Time; BPD = borderline personality disorder; CI = confidence interval; DBT = dialectical behavior therapy; DDP = dynamic deconstructive psychotherapy; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; ns = not significant; SBQ = Suicidal Behaviors Questionnaire; SDS = Sheehan Disability Scale.

^aHigh risk of bias due to confounding and attrition; downgraded one step due to risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

At the end of a 12-month treatment phase, participants receiving DDP achieved significantly greater reductions on the BEST scale compared with participants receiving DBT (Gregory and Sachdeva 2016). Both DBT and DDP resulted in a significant improvement in BPD severity from baseline.

Severity of symptoms associated with borderline personality disorder

Following 12 months of treatment, reductions in self-harm, as measured on the Suicidal Behaviors Questionnaire, and improvements in depression scores on the BDI were significantly greater among patients receiving DDP than for those receiving DBT (Gregory and Sachdeva 2016). There was no difference at 12 months between DDP and DBT in reported suicide attempts.

Global impression and functioning

At 12 months, DDP resulted in significantly greater improvement in disability, with significantly lower scores on the Sheehan Disability Scale compared with DBT (Gregory and Sachdeva 2016).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report treatment-related adverse events, including withdrawal due to adverse events.

Dialectical Behavior Therapy Versus Transference-Focused Psychotherapy Versus Supportive Therapy

One three-armed RCT (Clarkin et al. 2007), rated as having a high risk of bias and conducted in the United States, compared DBT with transference-focused psychotherapy (TFP) and supportive therapy in 90 patients with BPD and reported results for patients for whom they had at least three data points (N = 62). The majority of participants were female (92%) and White (68%), with a mean age of 31 years. Mean baseline BPD severity was not reported. We rated the study as having high risk of bias due to the randomization process and high attrition (31%). Treatment duration was 12 months. DBT included weekly individual therapy, weekly group sessions, and telephone skills coaching. TFP included two individual weekly sessions focused primarily on the dominant affect-laden themes that emerge in the patient-therapist relationship. Supportive treatment included one weekly session supplemented with additional sessions as needed. The study was supported by the Borderline Personality Disorder Research Foundation.
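The attrition figure can be checked against the sample sizes given in the paragraph above. The sketch below assumes attrition is the share of enrolled participants not included in the reported analysis; the source does not state its formula, so this definition and the function name are illustrative assumptions.

```python
def attrition_rate(enrolled, analyzed):
    """Share of enrolled participants missing from the reported analysis."""
    if enrolled <= 0 or not 0 <= analyzed <= enrolled:
        raise ValueError("expected 0 <= analyzed <= enrolled and enrolled > 0")
    return (enrolled - analyzed) / enrolled


# 90 patients enrolled; results reported for 62 of them.
print(f"{attrition_rate(90, 62):.0%}")
```

Under this definition, 28 of 90 participants are missing from the analysis, which is consistent with the 31% attrition cited as a reason for the high risk-of-bias rating.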

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Tables A3–11, A3–12, and A3–13 present certainty-of-evidence ratings for the different comparisons.

Certainty-of-evidence ratings of outcomes comparing DBT with TFP for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TFP	Difference in effect with DBT
				Anticipated absolute effects
Anxiety
Assessed with BSI Follow-up: 12 months	40 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Depression
Assessed with BDI Follow-up: 12 months	40 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Suicidal behaviors
Assessed with OAS-M Follow-up: 12 months	40 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Global functioning
Assessed with GAF Follow-up: 12 months	40 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)

BDI = Beck Depression Inventory; BPD = borderline personality disorder; BSI = Brief Symptom Inventory; CI = confidence interval; DBT = dialectical behavior therapy; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; NR = not reported; ns = not significant; OAS-M = Overt Aggression Scale–Modified; RCT = randomized controlled trial; TFP = transference-focused psychotherapy.

^aHigh risk of bias due to improper randomization and high attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Certainty-of-evidence ratings of outcomes comparing DBT with supportive therapy for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with supportive therapy	Difference in effect with DBT
				Anticipated absolute effects
Anxiety
Assessed with BSI Follow-up: 12 months	39 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Depression
Assessed with BDI Follow-up: 12 months	39 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Global functioning
Assessed with GAF Follow-up: 12 months	39 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Suicidal behaviors
Assessed with OAS-M Follow-up: 12 months	39 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)

^aHigh risk of bias due to improper randomization and high attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Certainty-of-evidence ratings of outcomes comparing TFP with supportive therapy for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TFP	Difference in effect with supportive therapy
				Anticipated absolute effects
Anxiety
Assessed with BSI Follow-up: 12 months	45 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Depression
Assessed with BDI Follow-up: 12 months	45 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Global functioning
Assessed with GAF Follow-up: 12 months	45 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)
Suicidal behaviors
Assessed with OAS-M Follow-up: 12 months	45 (one RCT: Clarkin et al. 2007)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = NR	NR (ns)

BDI = Beck Depression Inventory; BPD = borderline personality disorder; BSI = Brief Symptom Inventory; CI = confidence interval; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; NR = not reported; ns = not significant; OAS-M = Overt Aggression Scale–Modified; RCT = randomized controlled trial; TFP = transference-focused psychotherapy.

^aHigh risk of bias due to improper randomization and high attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

The study did not report on severity of BPD.

Severity of symptoms associated with borderline personality disorder

Following 12 months of treatment, there was a reduction in suicidal behavior (compared with baseline) among patients receiving DBT and TFP but not among those receiving supportive therapy (Clarkin et al. 2007). However, there was no significant difference between DBT, TFP, and supportive therapy. There was also no significant difference between treatment groups on the BDI.

Global impression and functioning

Following 12 months of treatment, there were no significant differences among DBT, TFP, and supportive therapy on the GAF scale or the Brief Symptom Inventory for anxiety (data not provided) (Clarkin et al. 2007).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report treatment-related adverse events, including withdrawal due to adverse events.

Dialectical Behavior Therapy Components Versus Other Components of Dialectical Behavior Therapy

DBT is a multifaceted cognitive-behavioral treatment approach that includes individual therapy, group skills training, telephone coaching, and a consultation team meeting for therapists. Three studies (one nonrandomized clinical trial [Andión et al. 2012], one RCT [Linehan et al. 2015], one prospective cohort study [Lyng et al. 2020]) assessed the comparative value of individual components of DBT. Together, these studies provided data on 238 participants. One study (Andión et al. 2012) compared the individual therapy component of DBT with combined individual and group therapy. Another (Lyng et al. 2020) compared the stand-alone group skills component with 6 months of the full four-component DBT program. A third three-armed study (Linehan et al. 2015) compared 12 months of standard DBT (i.e., the full four-component program) with stand-alone group skills training and individual therapy with an activities group. All three studies were rated as having a high risk of bias. Reasons for ratings of high risk of bias included high overall attrition or high differential attrition, bias due to deviations from the intended intervention, and bias due to confounding (Andión et al. 2012; Linehan et al. 2015; Lyng et al. 2020).

The majority of participants were female, with a mean age across studies ranging from 26 years to 33 years. Race was reported in just one of three studies, in which more than 70% of participants were White (Linehan et al. 2015). Two studies were conducted in Europe (Andión et al. 2012; Lyng et al. 2020) and one in the United States (Linehan et al. 2015). Just one study provided baseline information on BPD severity, reporting a mean score on the BSL-23 of 2.7 points (Lyng et al. 2020). Treatment durations ranged from 6 months (Lyng et al. 2020) to 1 year (Andión et al. 2012; Linehan et al. 2015). One study followed patients through 18 months (6 months after the end of the intervention) (Andión et al. 2012), and another study followed patients through 2 years (12 months following the end of treatment) (Linehan et al. 2015). Studies were generally funded by public funds with no commercial funding.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Tables A3–14, A3–15, and A3–16 present certainty-of-evidence ratings for different comparisons.

Certainty-of-evidence ratings of outcomes comparing DBT group skills training with standard DBT for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with standard DBT	Difference in effect with DBT group skills training
				Anticipated absolute effects
Severity of BPD
Assessed with BSL-23 Follow-up: mean 6 months	88 (one observational study: Lyng et al. 2020)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 2.56	Mean 0.51 lower (ns)
Self-harm acts (NSSI)
Assessed with SASII Follow-up: mean 2 years	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^b,c for similar effects	–	Mean number at endpoint = 7.9	Mean 1.5 more (ns)
Suicidal ideation
Assessed with SBQ and BSS Follow-up: 6 months to 2 years	154 (one RCT, one observational study: Linehan et al. 2015; Lyng et al. 2020)	⨁◯◯◯; VERY LOW^b,c for similar effects	–	Not estimable (different scales)	Mean 4.1 to mean 7.7 lower (ns)
Suicide attempts
Assessed with SASII Follow-up: mean 2 years	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^b,c for similar effects	–	Mean number at endpoint = 2.0	Mean 0.5 fewer (ns)
General psychopathology
Assessed with SCL-90 Follow-up: mean 6 months	88 (one observational study: Lyng et al. 2020)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 2.09	Mean 0.32 lower (ns)

BPD = borderline personality disorder; BSL-23 = Borderline Symptom List–23; BSS = Beck Scale for Suicide Ideation; CI = confidence interval; DBT = dialectical behavior therapy; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; ns = not significant; NSSI = nonsuicidal self-injury; RCT = randomized controlled trial; SASII = Suicide Attempt Self-Injury Interview; SBQ = Suicidal Behaviors Questionnaire; SCL-90 = Symptom Checklist–90.

^aHigh risk of bias due to attrition, confounding, and selection bias; downgraded two steps for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^cHigh risk of bias due to deviations from intended intervention and attrition; downgraded one step for risk of bias.
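The ratings in the table above are consistent with the usual GRADE convention, in which evidence from randomized trials starts at high certainty, evidence from observational studies starts at low certainty, and each downgrade step lowers the rating by one level, with very low as the floor. The following Python sketch illustrates that arithmetic only; the function name, the dictionary keys, and the numeric encoding of the levels are illustrative assumptions and are not taken from the guideline.

```python
from typing import Dict

# Ordered from lowest to highest certainty (index 0 = VERY LOW).
GRADE_LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

# Assumed starting points under the usual GRADE convention.
START_INDEX = {"rct": 3, "observational": 1}


def grade_certainty(design: str, downgrades: Dict[str, int]) -> str:
    """Return the certainty label after applying downgrade steps.

    design     -- "rct" or "observational" (hypothetical labels)
    downgrades -- steps deducted per domain, e.g. {"risk_of_bias": 1}
    """
    total_steps = sum(downgrades.values())
    # Certainty cannot fall below VERY LOW, so clamp at index 0.
    return GRADE_LEVELS[max(START_INDEX[design] - total_steps, 0)]


# Footnotes a and b: observational study, two steps each for
# risk of bias and imprecision.
print(grade_certainty("observational", {"risk_of_bias": 2, "imprecision": 2}))

# Footnotes b and c: RCT, one step for risk of bias and two for imprecision.
print(grade_certainty("rct", {"risk_of_bias": 1, "imprecision": 2}))
```

Both calls print VERY LOW, matching the ratings shown for the corresponding rows. The sketch ignores upgrading criteria and the judgment involved in deciding how many steps to deduct.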

Certainty-of-evidence ratings of outcomes comparing individual DBT with standard DBT for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with standard DBT	Difference in effect with individual DBT therapy
				Anticipated absolute effects
Anxiety
Assessed with Ham-A Follow-up: end of 1-year treatment	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 17.2	Mean 7.1 higher (ns)
Depression
Assessed with Ham-D Follow-up: end of 1-year treatment	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^a,b for greater effect with DBT	–	Mean score at endpoint = 12.3	Mean 5.9 higher (P = 0.03)
Self-harm acts (NSSI)
Assessed with SASII Follow-up: 2 years	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean number at endpoint = 7.9	Mean 8.1 more (ns)
Suicidal ideation
Assessed with SBQ Follow-up: 2 years	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 28.9	Mean 3.4 lower (ns)
Suicide attempts
Assessed with SASII Follow-up: mean 2 years	66 (one RCT: Linehan et al. 2015)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean number at endpoint = 2.0	Mean 1.6 more (ns)

BPD = borderline personality disorder; CI = confidence interval; DBT = dialectical behavior therapy; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; Ham-A = Hamilton Rating Scale for Anxiety; Ham-D = Hamilton Rating Scale for Depression; ns = not significant; NSSI = nonsuicidal self-injury; RCT = randomized controlled trial; SASII = Suicide Attempt Self-Injury Interview; SBQ = Suicidal Behaviors Questionnaire.

^aHigh risk of bias due to deviations from intended intervention and attrition; downgraded one step due to risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Certainty-of-evidence ratings of outcomes comparing combined individual plus group DBT with individual DBT for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with individual DBT therapy	Difference in effect with combined individual plus group therapy DBT
				Anticipated absolute effects
Self-harm behaviors
Follow-up: 18 months	51 (one nRCT: Andión et al. 2012)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean number at endpoint = 22	Mean 13 fewer (ns)
Suicide attempts
Follow-up: 18 months	51 (one nRCT: Andión et al. 2012)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean number at endpoint = 14	Mean 8 fewer (ns)

^aHigh risk of bias due to deviations from intended intervention; downgraded one step due to risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

One prospective cohort study (Lyng et al. 2020), rated as having a high risk of bias, assessed improvements in the severity of BPD. The study, which included 88 participants, reported no clinical improvements in BSL-23 scores among patients receiving 6 months of stand-alone DBT skills training or 6 months of the full four-component DBT program and no significant difference between the groups. There were several serious limitations to the study, including that high-risk patients (defined as those with a suicide attempt and/or deliberate self-harm that had required treatment by a physician in the previous 6 months) were excluded from the DBT skills training group but not from the full DBT group.

Severity of symptoms associated with borderline personality disorder

Three studies (Andión et al. 2012; Linehan et al. 2015; Lyng et al. 2020) investigating individual components of DBT assessed changes in the severity of symptoms associated with BPD, and all reported no significant differences between groups regarding reduction in suicide attempts and improvements in self-harm acts and suicidal ideation. One study (Linehan et al. 2015) found a significant improvement in Hamilton Rating Scale for Depression scores at the end of 1-year treatment among participants receiving standard DBT and the group skills component of DBT versus those receiving only the individual therapy component of DBT (P = 0.02). There were no differences in anxiety scores at the end of the 1-year treatment phase (Linehan et al. 2015).

Global impression and functioning

The study by Lyng et al. (2020; N = 88), rated as having a high risk of bias, comparing 6 months of stand-alone DBT skills training with 6 months of the full four-component DBT program reported no significant difference between groups on the Global Severity Index of the SCL-90-R.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

No studies reported on treatment-related adverse events including withdrawal due to adverse events.

Dialectical Behavior Therapy Versus Community Therapy by Experts

One RCT (N = 111) (Linehan et al. 2006), rated as having a high risk of bias, compared DBT with community therapy offered by nonbehavioral psychotherapy experts over 1 year. All participants were female and had at least two suicide attempts; the mean age was 29 years, and the majority were White (87%). The severity of BPD at baseline was not reported.

We rated the study as having a high risk of bias because of lack of intention-to-treat analysis. The follow-up duration was 2 years, and the study was funded by the National Institute of Mental Health (Linehan et al. 2006).

The intervention group received standard DBT for 1 year, including weekly individual psychotherapy sessions, weekly group skills training, and telephone consultation as needed (Linehan et al. 2006). Community treatment by experts involved selected psychotherapists who were matched with therapists administering DBT by controlling for sex, availability, expertise, allegiance, training, and experience.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–17 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DBT with community therapy by experts

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with community therapy by experts	Difference in effect with DBT
				Anticipated absolute effects
Suicide attempts
Follow-up: mean 2 years	101 (one RCT: Linehan et al. 2006)	⨁⨁◯◯; LOW^a,b for greater effects with DBT	HR 2.66 (2.40–18.07)	469 per 1,000	345 more per 1,000 (312–531 more)
Self-harm
Assessed with mean number of events Follow-up: mean 2 years	101 (one RCT: Linehan et al. 2006)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean number at endpoint = 3.0	Mean 0 lower (ns)
Depression
Assessed with Ham-D Follow-up: mean 2 years	101 (one RCT: Linehan et al. 2006)	⨁⨁◯◯; LOW^a,c for similar effects	–	Mean score at endpoint = 14.4	Mean 1.8 lower (ns)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI = confidence interval; DBT = dialectical behavior therapy; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; Ham-D = Hamilton Rating Scale for Depression; HR = hazard ratio; ns = not significant; RCT = randomized controlled trial.

^aLack of intention-to-treat analysis: downgraded one step for risk of bias.

^bOverall few events; downgraded one step for imprecision.

^cStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.
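The note above states that the risk in the intervention group is derived from the assumed comparison-group risk and the relative effect. For a hazard ratio, one common conversion assumes proportional hazards and computes the intervention risk as 1 − (1 − assumed risk)^HR. The Python sketch below applies that conversion to the values in the suicide attempts row; the choice of formula is an assumption (the guideline does not state which conversion was used), and the function names are illustrative. The sketch reproduces the arithmetic only and says nothing about which group the hazard ratio favors.

```python
def risk_with_intervention(assumed_risk: float, hazard_ratio: float) -> float:
    """Convert a hazard ratio to an absolute risk, assuming proportional hazards."""
    return 1.0 - (1.0 - assumed_risk) ** hazard_ratio


def difference_per_1000(assumed_risk: float, hazard_ratio: float) -> int:
    """Absolute risk difference versus the comparison group, per 1,000 patients."""
    diff = risk_with_intervention(assumed_risk, hazard_ratio) - assumed_risk
    return round(diff * 1000)


# Values from the suicide attempts row: 469 per 1,000 in the comparison
# group, HR 2.66 with confidence limits 2.40 and 18.07.
assumed = 469 / 1000
for hr in (2.66, 2.40, 18.07):
    print(hr, difference_per_1000(assumed, hr))
# prints 345, 312, and 531 for the three hazard ratios
```

Under this assumption the output matches the table entry of 345 more per 1,000 (312–531 more), which suggests, but does not confirm, that a conversion of this kind was used.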

Severity of borderline personality disorder

The study did not report on severity of BPD.

Severity of symptoms associated with borderline personality disorder

At the end of the treatment period (12 months) and after the 2-year follow-up, participants in the DBT group had significantly fewer suicide attempts and emergency department visits or hospital admissions because of suicidal ideation and behavior (Linehan et al. 2006).

No significant differences between treatment groups were apparent for self-harm and depressive symptoms (Linehan et al. 2006).

Global impression and functioning

The study did not report on global impression and functioning.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Dynamic Deconstructive Psychotherapy Versus Treatment as Usual

A retrospective cohort study (Gregory and Sachdeva 2016) evaluated the efficacy of DDP compared with TAU. The study provided data on 44 participants. The study was rated as having a high risk of bias because of confounding and attrition. The follow-up duration was 12 months, and the study was funded by the American Psychoanalytic Association. The majority of the study participants were female and White, and the mean age was 28 years. This study reported baseline BEST scores ranging from 46 to 49. The study excluded patients with schizophrenia, intellectual disabilities, or dementia.

DDP involved weekly individual sessions over a 12-month period and combined elements of translational neuroscience, object relations theory, and deconstruction philosophy (Gregory and Sachdeva 2016). TAU consisted of unstructured psychotherapy.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–18 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing DDP with TAU

Outcomes	Participants, N (studies); follow-up	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with DDP
				Anticipated absolute effects
Severity of BPD
Assessed with BEST Follow-up: mean 12 months	44 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effects with DDP	–	Mean severity score at endpoint = 42.9	Mean 9.9 lower (P = 0.006)
Depression
Assessed with BDI Follow-up: mean 12 months	44 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effects with DDP	–	Mean depression score at endpoint = 29.6	Mean 12.5 lower (P < 0.001)
Self-injuries
Follow-up: mean 12 months	44 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for similar effect	–	Mean number of self-injuries at endpoint = 1.8	Mean 0.5 lower (ns)
Suicide attempts
Follow-up: mean 12 months	44 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for similar effect	–	Mean number of attempts at endpoint = 1.5	Mean 0.94 lower (ns)
Functioning
Assessed with SDS Follow-up: mean 12 months	44 (one observational study: Gregory and Sachdeva 2016)	⨁◯◯◯; VERY LOW^a,b for greater effects with DDP	–	Mean functioning score = 7.0	Mean 3.2 lower (P < 0.001)

BDI = Beck Depression Inventory; BEST = Borderline Evaluation of Severity Over Time; BPD = borderline personality disorder; CI = confidence interval; DDP = dynamic deconstructive psychotherapy; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; SDS = Sheehan Disability Scale; TAU = treatment as usual.

^aNot controlled for confounding; downgraded two steps for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

Severity of borderline personality disorder

At the 1-year follow-up, participants in the DDP group had significantly greater improvement on the BEST scale than participants in the TAU group (Gregory and Sachdeva 2016).

Severity of symptoms associated with borderline personality disorder

The study reported no significant differences in the mean number of self-injuries or suicide attempts but did report significant improvements in mean scores on the BDI for participants in the DDP group when compared with TAU (Gregory and Sachdeva 2016).

Global impression and functioning

The study reported significant improvements in mean scores on the Sheehan Disability Scale for participants in the DDP group when compared with TAU after 12 months (Gregory and Sachdeva 2016).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Mentalization-Based Treatment Versus Treatment as Usual

One RCT (Beck et al. 2020) evaluated the efficacy of MBT compared with TAU alone. This Danish study included 112 participants who were followed for a duration of 12 months. The study was rated as having a high risk of bias because of high attrition. The trial reported no commercial funding.

All but one of the study participants were female (Beck et al. 2020). The mean age was 16 years. The study excluded participants with a comorbid diagnosis of pervasive developmental disorder, learning disability, anorexia, current psychosis, schizophrenia or schizotypal personality disorder, antisocial personality disorder, any mental disorder other than BPD considered the primary diagnosis, current (past 2 months) substance use disorder (SUD; but not substance abuse), and current psychiatric inpatient treatment.

MBT, delivered over 12 months, consisted of 3 introductory sessions, 37 weekly group sessions (90 minutes each), 5 individual case formulation sessions, and 6 sessions for caregivers (Beck et al. 2020). TAU consisted of at least 12 individual supportive sessions, one per month, comprising psychoeducation, counseling, and crisis management and sessions as needed.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–19 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing MBT with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with MBT (95% CI)
				Anticipated absolute effects
Severity of BPD
Assessed with BPFS-C, BPFS-P, ZAN-BPD Follow-up: mean 1 year	112 (one RCT: Beck et al. 2020)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint for BPFS-C = 71.3; for BPFS-P = 68.7; for ZAN-BPD = 8.0	Mean for BPFS-C 0 (ns), for BPFS-P 0.1 lower (–7.0 to 7.3), for ZAN-BPD 0.6 lower (–4.0 to 2.8)
BPD symptoms
Assessed with BDI-Y, RTSHIA Follow-up: mean 1 year	112 (one RCT: Beck et al. 2020)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint for BDI-Y = 64.3, for RTSHIA = 39.0	Mean for BDI-Y 0.7 lower (–6.5 to 5.1), for RTSHIA 1.4 lower (–7.1 to 4.3)
Functioning
Assessed with CGAS Follow-up: mean 1 year	112 (one RCT: Beck et al. 2020)	⨁⨁◯◯; LOW^a,b for similar effects	–	Mean score at endpoint = 46.7	Mean 0.5 higher (–5.8 to 6.7)

BDI-Y = Beck Depression Inventory-Youth; BPD = borderline personality disorder; BPFS-C = Borderline Personality Features Scale for Children; BPFS-P = Borderline Personality Features Scale for Parents; CGAS = Children's Global Assessment Scale; CI = confidence interval; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MBT = mentalization-based treatment; RCT = randomized controlled trial; RTSHIA = Risk-Taking and Self-Harm Inventory for adolescents; TAU = treatment as usual; ZAN-BPD = Zanarini Rating Scale for Borderline Personality Disorder.

^aHigh attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

Severity of borderline personality disorder

After 12 months of treatment, the study reported no significant differences between groups on the Borderline Personality Features Scale for Children, the Borderline Personality Features Scale for Parents, or the ZAN-BPD (Beck et al. 2020).

Severity of symptoms associated with borderline personality disorder

After 12 months of treatment, the study reported no significant differences between groups on self-harm (measured by the Risk-Taking and Self-Harm Inventory for Adolescents) or depression (measured by the BDI for Youth) (Beck et al. 2020).

Global impression and functioning

After 12 months of treatment, the study reported no significant differences between groups on the Children’s Global Assessment Scale (Beck et al. 2020).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study reported no adverse events in either arm.

Mentalization-Based Treatment Versus Supportive Therapy

Three RCTs, described in four articles, compared MBT with supportive therapy (Bateman and Fonagy 2009; Bateman et al. 2021; Carlyle et al. 2020; Jørgensen et al. 2013). Together, these studies provided data on 317 participants. Supportive therapy was not identical across the studies, but all included group sessions that focused on supportive techniques such as problem-solving. Two studies were rated as having a moderate risk of bias (Bateman and Fonagy 2009; Carlyle et al. 2020), and the other as having a high risk of bias (Jørgensen et al. 2013). Reasons for the rating of high risk of bias included high attrition and deviations from the intended intervention.

The majority of participants were female, and the mean age across the three studies was 31 years. Race was reported in two studies, in which the majority of participants were White (Bateman and Fonagy 2009; Carlyle et al. 2020). Two studies were conducted in Europe (Bateman and Fonagy 2009; Jørgensen et al. 2013) and one in New Zealand (Carlyle et al. 2020). No study reported severity of BPD at baseline; however, one study reported global severity of symptoms at baseline that ranged from 1.7 points to 2.0 points on the Symptom Checklist–90 (SCL-90) Global Severity Index scale (Jørgensen et al. 2013). Treatment durations ranged from 18 months (Bateman and Fonagy 2009; Carlyle et al. 2020) to 24 months (Jørgensen et al. 2013). No study had commercial funding; one was funded through a foundation grant (Bateman and Fonagy 2009).

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–20 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing MBT with supportive therapy for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with supportive therapy	Difference in effect with MBT
				Anticipated absolute effects
Anxiety
Assessed with BAI Follow-up: 24 months	85 (one RCT: Jørgensen et al. 2013)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 15.6	Mean 2.1 lower (ns)
Depression
Assessed with BDI Follow-up: 18–24 months	219 (two RCTs: Bateman and Fonagy 2009; Jørgensen et al. 2013)	⨁◯◯◯; VERY LOW^c,d,e for inconsistent effects	–	Mean score at endpoint = 18.68^f	Inconsistent findings
General psychopathology
Assessed with SCL-90-GSI Follow-up: 18–24 months	219 (two RCTs: Bateman and Fonagy 2009; Jørgensen et al. 2013)	⨁◯◯◯; VERY LOW^c,d,e for inconsistent effects	–	Mean score at endpoint = 1.55^f	Inconsistent findings
Global functioning
Assessed with GAF Follow-up: 18–24 months	219 (two RCTs: Bateman and Fonagy 2009; Jørgensen et al. 2013)	⨁⨁◯◯; LOW^c,e for greater effect with MBT	–	Mean score at endpoint = 53.2^f	Mean 7.7 higher^f (P < 0.001)
Interpersonal functioning
Assessed with IIP Follow-up: 18–24 months	219 (two RCTs: Bateman and Fonagy 2009; Jørgensen et al. 2013)	⨁◯◯◯; VERY LOW^c,d,e for inconsistent effects	–	Mean score at endpoint = 1.65^f	Inconsistent findings
Severe self-harm incidents
Assessed with SCL-90-R Follow-up: 18 months	206 (two RCTs: Bateman and Fonagy 2009; Carlyle et al. 2020)	⨁⨁◯◯; LOW^d,e for inconsistent effects	–	Mean number at endpoint = 1.66^f	Inconsistent findings
Suicide attempts
Assessed with SCL-90-R Follow-up: 18 months	206 (two RCTs: Bateman and Fonagy 2009; Carlyle et al. 2020)	⨁⨁◯◯; LOW^d,e for inconsistent effects	–	Mean number at endpoint = 0.32^f	Inconsistent findings

BAI = Beck Anxiety Inventory; BDI = Beck Depression Inventory; BPD = borderline personality disorder; CI = confidence interval; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; GSI = Global Severity Index; IIP = Inventory of Interpersonal Problems; MBT = mentalization-based treatment; ns = not significant; RCT = randomized controlled trial; SCL-90-GSI = Symptom Checklist–90–Global Severity Index; SCL-90-R = Symptom Checklist–90–Revised.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bHigh risk of bias due to attrition and deviations from intended intervention; downgraded one step for risk of bias.

^cOne of two studies was high risk of bias due to attrition and deviations from intended intervention; downgraded one step for risk of bias.

^dTwo studies reported opposite direction of outcome; downgraded one step for inconsistency.

^eStudies do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^fValue is for the study rated at a moderate risk of bias (Bateman and Fonagy 2009).

Severity of borderline personality disorder

No study reported on the severity of BPD.

Severity of symptoms associated with borderline personality disorder

All three studies assessed symptoms associated with BPD and reported mixed findings (Bateman and Fonagy 2009; Bateman et al. 2021; Carlyle et al. 2020; Jørgensen et al. 2013). Following 18 months of treatment, one study (N = 134), rated as having a moderate risk of bias, reported a significant reduction in suicide attempts, hospitalizations, and life-threatening self-harm in the previous 6-month period, along with improvements in interpersonal functioning and depression among patients receiving MBT compared with supportive therapy and case management (Bateman and Fonagy 2009). A 6-year follow-up of 97 participants reported that, compared with the supportive treatment and case management groups, significantly more of the MBT group who had achieved the primary recovery criteria (i.e., free of self-harm, suicide attempts, and inpatient hospital stays) had remained well during the follow-up period (Bateman et al. 2021).

In contrast, a similar study, rated as having a moderate risk of bias, attempting to replicate findings by Bateman and colleagues found no significant differences between groups in incidents of severe self-harm and suicide attempts in the previous 6 months (Carlyle et al. 2020). Similarly, a study (N = 111), rated as having a high risk of bias, reported no differences between groups in terms of interpersonal functioning, depression, and anxiety (Jørgensen et al. 2013).

Global impression and functioning

With the exception of one outcome for which there was agreement, studies reported mixed findings in terms of global impression and functioning (Bateman and Fonagy 2009; Bateman et al. 2021; Carlyle et al. 2020; Jørgensen et al. 2013). One study (N = 134), rated as having a moderate risk of bias, reported significant improvements in global symptom severity (measured using the SCL-90 Global Severity Index) among patients receiving MBT compared with supportive therapy and case management (Bateman and Fonagy 2009). In contrast, another study (N = 111), rated as having a high risk of bias, reported no differences between groups on the SCL-90 Global Severity Index (Jørgensen et al. 2013). Both studies reported significant improvement in independently rated Global Assessment of Functioning scores among patients receiving MBT compared with patients receiving supportive therapy (Bateman and Fonagy 2009; Jørgensen et al. 2013).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

No study reported on treatment-related adverse events including withdrawal due to adverse events.

Mentalization-Based Treatment Versus Specialized Psychotherapy

Two studies, one RCT (Laurenssen et al. 2018) that was rated as having a moderate risk of bias and one observational study with a nonconcurrent control group (Bales et al. 2015) that was rated as having a high risk of bias, compared day-hospital MBT with another specialized psychotherapy. Together, these studies provided data on 299 participants. Day-hospital MBT differed from typical MBT in terms of intensity; it involved daily group psychotherapy and weekly individual therapy along with art and writing therapy. The specialized psychotherapy comparator groups consisted of a variety of treatments, settings, and durations that were explicitly not limited to supportive therapy. Reasons for the high risk of bias rating included confounding and measurement of outcomes.

The majority of participants were female, and the mean age across the two studies was 32 years (Bales et al. 2015; Laurenssen et al. 2018). Race was not reported in either study, both of which were conducted in the Netherlands. One study reported BPD severity ranging from 32.8 points to 34.3 points at baseline using the BPD Severity Index (Laurenssen et al. 2018). Treatment duration was 18 months in both studies, with one study following patients through 36 months (Bales et al. 2015). Neither study had commercial funding.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–21 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing MBT with specialized psychotherapy

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with specialized psychotherapy	Difference in effect with day-hospital MBT
				Anticipated absolute effects
Severity of BPD
Assessed with BPDSI Follow-up: 18 months	95 (one RCT: Laurenssen et al. 2018)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = 21.39	Mean 0.76 lower (ns)
General psychopathology
Assessed with GSI of BSI Follow-up: 18–36 months	299 (one RCT, one observational study; Bales et al. 2015; Laurenssen et al. 2018)	⨁◯◯◯; VERY LOW^b,c,d for inconsistent effects	–	Mean score at endpoint = 1.04^e	Inconsistent findings
Interpersonal functioning
Assessed with IIP Follow-up: 18 months	95 (one RCT: Laurenssen et al. 2018)	⨁⨁◯◯; LOW^a for similar effects	–	Mean score at endpoint = NR	NR (ns)

BPD = borderline personality disorder; BPDSI = Borderline Personality Disorder Severity Index; BSI = Brief Symptom Inventory; CI = confidence interval; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; GSI = Global Severity Index; IIP = Inventory of Interpersonal Problems; MBT = mentalization-based treatment; NR = not reported; ns = not significant; RCT = randomized controlled trial.

^aStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bOne of two studies was rated at a high risk of bias due to confounding and high risk for bias in the measurement of outcomes; downgraded one step for risk of bias.

^cTwo studies reported opposite direction of outcome; downgraded one step for inconsistency.

^dStudies do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^eValue is for the study rated at a high risk of bias (Bales et al. 2015). Data NR for the study rated at a moderate risk of bias (Laurenssen et al. 2018).

Severity of borderline personality disorder

One study (N = 95), rated as having a moderate risk of bias, examined improvements in the severity of BPD as a primary outcome of interest and found no significant difference between 18 months of day-hospital MBT and 18 months of specialized psychotherapy in BPD Severity Index total scores and Personality Assessment Inventory–Borderline Features Scale scores (Laurenssen et al. 2018). There was significant improvement from baseline in both groups.

Severity of symptoms associated with borderline personality disorder

One study, rated as having a moderate risk of bias, reported no significant difference between day-hospital MBT and specialized psychotherapy on the Inventory of Interpersonal Problems (Laurenssen et al. 2018). There was significant improvement from baseline in both groups.

Global impression and functioning

Both studies (Bales et al. 2015; Laurenssen et al. 2018) examined global symptom severity using the Global Severity Index of the Brief Symptom Inventory and found mixed results. As with the other outcomes, the study by Laurenssen et al. (2018) (N = 95), rated as having a moderate risk of bias, reported no significant difference in the severity of symptoms among patients receiving day-hospital MBT and those receiving specialized psychotherapy. In contrast, at the end of 18 months of treatment and again at the 36-month follow-up, the study by Bales et al. (2015; N = 204), rated as having a high risk of bias, reported significant improvements in symptom severity (measured using the Global Severity Index of the Brief Symptom Inventory) among patients receiving day-hospital MBT compared with those receiving specialized psychotherapy.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

One study (Laurenssen et al. 2018), rated as having a moderate risk of bias, reported no serious adverse events among patients receiving either day-hospital MBT or other specialized psychotherapy.

Systems Training for Emotional Predictability and Problem Solving Versus Treatment as Usual

Two RCTs (Blum et al. 2008; Bos et al. 2010) and one prospective cohort study (González-González et al. 2021) evaluated the efficacy of STEPPS compared with TAU. The studies provided data on 362 participants. One RCT (Bos et al. 2010) was rated as having a moderate risk of bias because of differential attrition, and one RCT (Blum et al. 2008) and one cohort study (González-González et al. 2021) were rated as having a high risk of bias because of high overall attrition. Additionally, the cohort study had risks of bias from selection and confounding. The timing of the initial follow-up ranged from 20 weeks to 24 weeks for the two RCTs, and both reported 1-year outcomes. For one RCT, the primary endpoint was at 20 weeks (Blum et al. 2008); for the other, the primary endpoint was at 1 year (Bos et al. 2010). For the cohort study (González-González et al. 2021), the primary endpoint was at 2 years. Both RCTs were funded; neither had pharmaceutical industry support. The cohort study did not have specific funding. The majority of the study participants were female, and the mean age was 32 years in the RCTs and 34 years in the cohort study. Only one of the studies reported ethnicity; 94% of participants were White (Blum et al. 2008). One RCT reported mean baseline BEST scores ranging from 39 to 40 (Blum et al. 2008). The cohort study reported mean baseline BEST scores ranging from 50 to 52. Studies excluded patients who had psychotic or primary neurological disorders, who were cognitively impaired, or who had previously participated in STEPPS.

STEPPS involved 18 or 20 weekly therapy sessions; components included psychoeducation about BPD, emotion management skills training, and behavior management skills training. TAU consisted of usual care such as individual psychotherapy, medication, and case management.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. A3–22 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing STEPPS with TAU

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TAU	Difference in effect with STEPPS
				Anticipated absolute effects
Severity of BPD
Assessed with ZAN-BPD, BPD-40, BEST Follow-up: mean 20 weeks to 2 years	240 (two RCTs, one prospective cohort: Blum et al. 2008; Bos et al. 2010; González-González et al. 2021)	⨁⨁⨁◯; MODERATE for greater effects with STEPPS^a	–	Mean score at primary endpoint on ZAN-BPD = 13.4; on BPD-40 = 88.6; on BEST = 34.1 in trial and 28.8 in cohort	Mean 3.6 lower on ZAN-BPD; 10.4 lower on BPD-40 (P = 0.001); 2.3 lower on BEST (ns) in trial, 17.7 lower in cohort (P < 0.0)
Depression
Assessed with BDI Follow-up: mean 20 weeks	124 (one RCT: Blum et al. 2008)	⨁⨁◯◯; LOW^a,b for greater effect with STEPPS	–	Mean score at primary endpoint = 25.8	Mean 3.8 higher (P = 0.03)
Impulsiveness
Assessed with BIS Follow-up: mean 20 weeks	124 (one RCT: Blum et al. 2008)	⨁⨁◯◯; LOW^a,b for greater effect with STEPPS	–	Mean score at primary endpoint = 76.8	Mean 4.1 lower (P = 0.004)
Self-harm attempts
Follow-up: mean 1 year	124 (one RCT: Blum et al. 2008)	⨁⨁◯◯; LOW^a,b for similar effects	Not estimable	NR	(ns)
Suicide attempts
Follow-up: mean 1 year	124 (one RCT: Blum et al. 2008)	⨁⨁◯◯; LOW^a,b for similar effects	Not estimable	NR	(ns)
General psychopathology
Assessed with CGI-S, CGI-I, SCL-90 Follow-up: 20 weeks to 1 year	203 (two RCTs: Blum et al. 2008; Bos et al. 2010)	⨁⨁⨁◯; MODERATE for greater effects with STEPPS^a	–	Varied by study and measure	P ≤ 0.03
Quality of life
Assessed with WHOQOL Follow-up: mean 1 year	79 (one RCT: Bos et al. 2010)	⨁⨁⨁◯; MODERATE^b for greater effect with STEPPS	–	Mean score at primary endpoint = 11.3	Mean 1.3 higher (0 to 0)
Functioning
Assessed with GAS, SAS Follow-up: mean 20 weeks	124 (one RCT: Blum et al. 2008)	⨁⨁◯◯; LOW^a,b for greater effect with STEPPS	–	Mean score at primary endpoint on GAS = 43.5; on SAS = 26.3	Mean 7 higher on GAS (ns); 1.7 lower on SAS (ns)

BDI = Beck Depression Inventory; BEST = Borderline Evaluation of Severity Over Time; BIS = Barratt Impulsiveness Scale; BPD = borderline personality disorder; BPD-40 = Borderline Personality Disorder Checklist–40; CGI-I = Clinical Global Impressions-Improvement; CGI-S = Clinical Global Impressions-Severity; CI = confidence interval; GAS = Global Assessment Scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; NR = not reported; ns = not significant; RCT = randomized controlled trial; SAS = Social Adjustment Scale; SCL-90 = Symptom Checklist–90; STEPPS = Systems Training for Emotional Predictability and Problem Solving; TAU = treatment as usual; WHOQOL = World Health Organization Quality of Life; ZAN-BPD = Zanarini Rating Scale for Borderline Personality Disorder.

^aHigh overall attrition; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

Severity of borderline personality disorder

All three studies reported that STEPPS was associated with significant improvements in BPD-specific symptoms (measured by ZAN-BPD and Borderline Personality Disorder Checklist–40) at the primary endpoint (20 weeks, 1 year, and 2 years, respectively) as compared with TAU. However, one RCT, rated as having a high risk of bias, reported no differences on the BEST scale for participants in the STEPPS group compared with the TAU group at 20 weeks or between 20 weeks and 1 year (Blum et al. 2008).

Severity of symptoms associated with borderline personality disorder

An RCT (Blum et al. 2008), rated as having a high risk of bias, reported significant improvement in impulsiveness (measured by the BIS) and depression (measured by the BDI) for participants in the STEPPS group when compared with TAU at 20 weeks. The same RCT reported no significant differences in suicide attempts or self-harm acts at 1 year.

Global impression and functioning

Both RCTs (Blum et al. 2008; Bos et al. 2010) reported on global impression and functioning using five scales: global impression using SCL-90 (at 20 weeks and 1 year in one RCT and at 24 weeks in another) and the CGI (at 20 weeks and 1 year in one RCT), quality of life using the World Health Organization Quality of Life scale (at 1 year in one RCT), and functioning using the Social Adjustment Scale and Global Assessment Scale (at 20 weeks and 1 year in one RCT). Together, these findings suggest benefits in global impression and functioning for the STEPPS group compared with TAU.

Regarding global impressions at 20–24 weeks using SCL-90, both RCTs reported significant improvement for the STEPPS group when compared with TAU (Blum et al. 2008; Bos et al. 2010). One RCT (Blum et al. 2008), rated as having a high risk of bias, also reported significant improvement for the STEPPS group when compared with TAU at 20 weeks in CGI severity and improvement ratings. The same study reported no significant differences between 20 weeks and 1 year in SCL-90 or CGI severity or improvement ratings.

Regarding quality of life, one RCT (Bos et al. 2010), rated as having a moderate risk of bias, reported significant improvement for the STEPPS group when compared with TAU at 1 year.

Regarding functioning, one RCT (Blum et al. 2008), rated as having a high risk of bias, reported significant differences favoring the STEPPS group at 20 weeks and no significant differences between 20 weeks and 1 year in functioning (measured by the Global Assessment Scale). However, the same study reported no significant differences in social adjustment (measured by the Social Adjustment Scale) at 20 weeks or between 20 weeks and 1 year.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The studies did not report on adverse events or withdrawals due to adverse events.

Transference-Focused Psychotherapy Versus Treatment by Experienced Community Psychotherapists

One RCT (Doering et al. 2010) conducted in Austria and Germany evaluated the efficacy of TFP compared with TAU. The study provided data on 104 participants. The study was rated as having a high risk of bias because of high differential attrition from follow-up. Follow-up duration was 12 months. The trial was funded by the Austrian National Bank.

All of the study participants were female and had a mean age of 28 years (Doering et al. 2010). The ethnicity of the participants was not reported. The authors noted that, compared with other treatment studies of BPD, the study included participants with less severe BPD, higher GAF scores, fewer comorbid Axis I and II disorders, and fewer self-harming acts because patients with more severe symptoms would receive inpatient treatment in Austria and Germany. The study excluded patients with schizophrenia; bipolar I and II disorder with a major depressive, manic, or hypomanic episode during the previous 6 months; SUD in the past 6 months; or organic pathology or intellectual disability.

TFP is a modified psychodynamic therapy and consists of two 50-minute sessions delivered every week by experienced clinical psychologists or medical doctors, along with medications as needed for 1 year of treatment (Doering et al. 2010). TAU consisted of individualized standard care from community psychiatrists.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. A3–23 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing TFP with treatment by experienced community psychotherapists

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effects with treatment by experienced community psychotherapists	Difference in effect with TFP
				Anticipated absolute effects
Severity of BPD symptoms
Assessed with proportion meeting fewer than five DSM-IV diagnostic criteria Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁⨁◯◯; LOW^a,b for greater effect with TFP	RR 2.23 (1.07–4.65)	154 per 1,000	189 more per 1,000 (11 more to 562 more)
Anxiety
Assessed with STAI Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁◯◯◯; VERY LOW^a,c for similar effect	–	Mean score at endpoint for state = 50.47; for trait anxiety = 55.49	Mean score for state 2.30 higher and for trait anxiety 0.43 lower (ns)
Depression
Assessed with BDI Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁◯◯◯; VERY LOW^a,c for similar effect	–	Mean score at endpoint = 20.02	Mean 1.65 higher (ns)
Suicide attempts
Assessed with proportion with any suicide attempts Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁◯◯◯; VERY LOW^a,d for similar effect	RR 0.63 (0.27–1.51)*	135 per 1,000	50 fewer per 1,000 (98 fewer to 69 more)
General psychopathology
Assessed with BSI Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁◯◯◯; VERY LOW^a,c for similar effect	–	Mean score at endpoint = 1.27	MD 0.06 higher (ns)
Functioning
Assessed with GAF Follow-up: mean 1 year	104 (one RCT: Doering et al. 2010)	⨁⨁◯◯; LOW^a,b for greater effect with TFP	–	Mean score at endpoint = 56.06	Mean 2.6 higher (P = 0.001)

BDI = Beck Depression Inventory; BPD = borderline personality disorder; BSI = Brief Symptom Inventory; CI = confidence interval; DSM-IV = Diagnostic and Statistical Manual of Mental Disorders, 4th Edition; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; LOCF = last observation carried forward; MD = mean difference; ns = not significant; RCT = randomized controlled trial; RR = risk ratio; STAI = State–Trait Anxiety Inventory; TAU = treatment as usual; TFP = transference-focused psychotherapy.

*Calculated based on data at follow-up.

^aHigh overall and differential attrition; downgraded one step for risk of bias.

^bFew events or study does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^cStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); results likely had wide CIs, P not significant; downgraded two steps for imprecision.

^dFew events; significant LOCF results, adjustment for dose in completers analyses no longer significant; downgraded two steps for imprecision.
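The anticipated absolute effects in the table above can be related to the risk ratios with simple arithmetic. The following is a minimal sketch, assuming the common summary-of-findings convention that the absolute difference per 1,000 equals the comparison-group risk multiplied by (RR − 1); the function name is hypothetical and is not taken from the guideline or the underlying systematic review. It reproduces the "Severity of BPD symptoms" row (RR 2.23, 95% CI 1.07–4.65; 154 per 1,000 in the comparison group).

```python
def absolute_effect_per_1000(baseline_per_1000, risk_ratio):
    """Difference in events per 1,000 implied by a risk ratio.

    Assumes: absolute difference = baseline risk * (RR - 1).
    Positive values mean "more per 1,000"; negative values mean "fewer".
    """
    return baseline_per_1000 * (risk_ratio - 1.0)


# Values taken from the "Severity of BPD symptoms" row of the table.
baseline = 154                       # comparison-group events per 1,000
rr, rr_low, rr_high = 2.23, 1.07, 4.65

point = round(absolute_effect_per_1000(baseline, rr))
lower = round(absolute_effect_per_1000(baseline, rr_low))
upper = round(absolute_effect_per_1000(baseline, rr_high))

print(f"{point} more per 1,000 ({lower} more to {upper} more)")
# prints "189 more per 1,000 (11 more to 562 more)"
```

Applying the same arithmetic to other rows may differ from the tabulated values by a unit, presumably because the published figures were computed from unrounded inputs.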

Severity of borderline personality disorder

In last-observation-carried-forward analyses, TFP was superior to TAU on the mean number of DSM-IV (American Psychiatric Association 1994) diagnostic criteria met for BPD and on the proportion of participants meeting fewer than five DSM-IV borderline criteria after 1 year (Doering et al. 2010).

Severity of symptoms associated with borderline personality disorder

In last-observation-carried-forward analyses, the study (Doering et al. 2010) reported a significantly lower proportion of participants with suicide attempts for TFP than for TAU and a marginally significant difference in the number of suicide attempts. However, completers analyses controlling for dose response for number of psychotherapy sessions (48.5 sessions, on average, for TFP vs. 18.6 for community psychotherapists) found no significant differences in either measure. The study reported no significant differences in depression (measured by BDI) or state and trait anxiety (measured by State-Trait Anxiety Inventory).

Global impression and functioning

TFP was significantly superior to TAU for GAF scores but not for the Brief Symptom Inventory (Doering et al. 2010).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on the incidence of adverse events, serious adverse events, or withdrawal due to adverse events.

Transference-Focused Psychotherapy Versus Schema-Focused Therapy

One RCT (described in two publications; Giesen-Bloo et al. 2006; Spinhoven et al. 2007), rated as having a high risk of bias and conducted in the Netherlands, compared TFP with schema-focused therapy (SFT) in 88 patients with BPD. The majority of participants were female (93%), with a mean age of 31 years. Race and ethnicity were not reported. Mean baseline BPD severity ranged from 33.5 points to 34.4 points on the BPD Severity Index. Reasons for a rating of high risk of bias included high attrition (39%) and measurement of outcomes.

Treatment duration was 3 years (Giesen-Bloo et al. 2006). Both TFP and SFT included two 50-minute sessions per week. TFP focused on the patient-therapist relationship, whereas SFT involved integrated cognitive therapy focused on four schema modes. The study was funded by a grant from the Dutch Health Care Insurance Board.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. A3–24 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing TFP with SFT for BPD

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with TFP	Difference in effect with SFT
				Anticipated absolute effects
Severity of BPD
Assessed with BPDSI Follow-up: 3 years	88 (one RCT: Giesen-Bloo et al. 2006)	⨁◯◯◯; VERY LOW^a,b for greater effect with SFT	–	Mean score at endpoint = 21.87	Mean 5.63 lower (P = 0.005)
Quality of life
Assessed with EQ Follow-up: 3 years	88 (one RCT: Giesen-Bloo et al. 2006)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 67.5	Mean 3.0 lower (ns)
Assessed with WHOQOL Follow-up: 3 years	88 (one RCT: Giesen-Bloo et al. 2006)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 11.09	Mean 0.5 higher (ns)

BPD = borderline personality disorder; BPDSI = Borderline Personality Disorder Severity Index; CI = confidence interval; EQ = European Quality of Life scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; ns = not significant; RCT = randomized controlled trial; SFT = schema-focused therapy; TFP = transference-focused psychotherapy; WHOQOL = World Health Organization Quality of Life Scale.

^aHigh risk of bias due to high attrition and moderate risk of bias related to measurement of outcomes; downgraded one step for risk of bias.

^bStudy does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.
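The footnotes above show how downgrades combine into a certainty rating. The following is a minimal sketch, assuming the usual GRADE convention that evidence from randomized trials starts at HIGH and drops one level per downgrade step; the names `LEVELS` and `grade_certainty` are hypothetical, and actual GRADE ratings rest on reviewer judgment rather than a mechanical tally.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_certainty(downgrade_steps, start="HIGH"):
    """Return the certainty level after applying downgrade steps.

    downgrade_steps maps a domain (e.g., "risk of bias") to the number
    of steps downgraded. The rating cannot fall below VERY LOW.
    """
    index = LEVELS.index(start) - sum(downgrade_steps.values())
    return LEVELS[max(index, 0)]


# Footnotes a and b of the TFP versus SFT table: one step for risk of
# bias plus two steps for imprecision.
print(grade_certainty({"risk of bias": 1, "imprecision": 2}))
# prints "VERY LOW"
```

The result matches the VERY LOW rating shown for each outcome in the TFP versus SFT table.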

Severity of borderline personality disorder

At the end of a 3-year treatment phase, participants receiving SFT exhibited significantly greater clinical improvement on the BPD Severity Index than patients receiving TFP. Reliable clinical improvement (defined as improvement of at least 11.7 points at the last assessment) favored SFT over TFP (RR = 2.33 [95% CI 1.24 to 4.37]) (Giesen-Bloo et al. 2006).

Severity of symptoms associated with borderline personality disorder

The study did not report on symptoms associated with BPD.

Global impression and functioning

After 3 years of treatment, there was no significant difference between TFP and SFT in quality-of-life measures (Giesen-Bloo et al. 2006). There was significant improvement in quality-of-life scores from baseline in both groups.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report on treatment-related adverse events, including withdrawal due to adverse events.

Psychotherapy for Special Populations

Detailed information on main study characteristics and treatment effects is presented in Appendix D for nine studies that compared various psychotherapies within special populations. Overall, there is no evidence to support one psychotherapy over another for any of the special populations identified.

Comprehensive validation therapy plus 12-step versus dialectical behavior therapy for borderline personality disorder and substance use disorder

One RCT (N = 24; Linehan et al. 2002), rated as having a moderate risk of bias and conducted in the United States, compared comprehensive validation therapy plus 12-step (a manualized approach that provided the major acceptance-based strategies used in DBT in combination with participation in 12-step programs) with DBT for the treatment of comorbid BPD and SUD. At a 16-month follow-up, there was no significant difference between comprehensive validation therapy plus 12-step and DBT in percentage of opiate-positive urine specimens, Brief Symptom Inventory scores, and scores on the Global Adjustment Scale, although the percentage of opiate-positive urine specimens decreased and rating scale scores improved in both groups. In addition, the incidence of parasuicidal behavior, measured using the Parasuicide History Interview, did not differ between groups and was low throughout the treatment period.

Mentalization-based treatment plus substance use disorder treatment versus substance use disorder treatment alone for borderline personality disorder and substance use disorder

One feasibility RCT (N = 46; Philips et al. 2018), conducted in Sweden and rated as having a high risk of bias, compared MBT plus SUD treatment with SUD treatment alone for the treatment of BPD and SUD. The MBT included a combination of individual therapy and group therapy over 18 months. At 18 months, there was no significant difference between groups on any outcome measured, including borderline symptom severity, suicide attempts, self-harm, inventory of interpersonal problems, reflective functioning, and global functioning.

Dynamic deconstructive psychotherapy versus treatment as usual in the community for borderline personality disorder and alcohol use disorder

One RCT (N = 30; Gregory et al. 2008), conducted in the United States and rated as having a high risk of bias, compared DDP with TAU for the treatment of comorbid BPD and alcohol use disorder. DDP involved weekly individual therapy focused on fostering verbalization of affects and elaboration of recent interpersonal experiences into simple narratives. Participants were encouraged but not required to attend some form of group therapy. Most TAU participants received a combination of individual psychotherapy and medication management. At 12 months, there was no significant difference between DDP and TAU groups in parasuicide behavior (measured using the adapted 3-month version of the Lifetime Parasuicide Count), alcohol misuse, and dissociation. DDP led to significant improvements in depression and in core symptoms of BPD as measured by the BEST scale.

Dialectical behavior therapy plus dialectical behavior therapy–prolonged exposure versus dialectical behavior therapy alone for borderline personality disorder and posttraumatic stress disorder

One RCT described in two publications (N = 26; Harned et al. 2014, 2018), rated as having a high risk of bias and conducted in the United States, compared DBT plus DBT–prolonged exposure with standard DBT for the treatment of comorbid BPD and posttraumatic stress disorder (PTSD). This pilot study did not conduct a between-group statistical analysis on the primary outcomes related to intentional self-harm. Preliminary findings suggested that DBT plus prolonged exposure may improve global social adjustment, health-related quality of life, and achievement of good global functioning, but not interpersonal problems or quality of life.

Cognitive-behavioral therapy versus dialectical behavior therapy for borderline personality disorder and eating disorders

One nonrandomized clinical trial (N = 118; Navarro-Haro et al. 2021), rated as having a moderate risk of bias, compared CBT (described as TAU) with DBT for the treatment of comorbid BPD and eating disorders and found no significant differences between groups in the primary outcome of suicide attempts in the previous 6 months. Depression scores on the BDI-II were significantly better among patients receiving DBT than CBT. At a 6-year follow-up of 69 participants, there were no significant differences between participants who had received DBT and those who had received CBT for depression, emotional regulation, and resilience.

Specialist supportive clinical management versus modified mentalization-based treatment for borderline personality disorder and eating disorders

One RCT (N = 68; Robinson et al. 2016), conducted in the United Kingdom and rated as having a high risk of bias, compared specialist supportive clinical management with modified MBT for the treatment of comorbid BPD and eating disorders and found no significant difference between groups on the ZAN-BPD.

Cognitive therapy plus fluoxetine versus interpersonal therapy plus fluoxetine for borderline personality disorder and major depressive disorder

One RCT (N = 32; Bellino et al. 2007), conducted in Italy and rated as having a moderate risk of bias, compared cognitive therapy plus fluoxetine with interpersonal therapy plus fluoxetine for the treatment of comorbid BPD and MDD. At the 24-week follow-up, it found no differences between groups in symptoms of depression or anxiety or on global functioning scales.

Individual drug counseling versus integrative borderline personality disorder–oriented adolescent family therapy for borderline personality disorder and substance use disorder among adolescents

One RCT (N = 40; Santisteban et al. 2015), conducted in the United States and rated as having a high risk of bias, compared individual drug counseling with integrative BPD-oriented adolescent family therapy for the treatment of comorbid BPD and SUD. Individual drug counseling consisted of two sessions per week of individual manualized drug counseling with a monthly family meeting with caregivers. Goals of the treatment included identifying signs and symptoms of addiction and triggers to use, increasing motivation to achieve and sustain abstinence, and developing more effective problem-solving strategies. Integrative BPD-oriented adolescent family therapy consisted of two sessions per week that included family therapy, individual therapy, and skills-building interventions targeting factors that directly contribute to adolescent drug abuse and other self-harm behaviors, such as emotion dysregulation and impulsivity, failure to establish life goals and ineffective life skills, unstable family attachment, and maladaptive family interactions. At the 12-month follow-up, there was no significant difference between individual drug counseling and integrative BPD-oriented adolescent family therapy on BPD behavior as measured by the borderline personality scale from the Millon Adolescent Clinical Inventory and no significant difference in substance use.

Manualized good clinical care versus cognitive analytic therapy for adolescents with borderline personality disorder

One RCT (N = 86; Chanen et al. 2008), rated as having a moderate risk of bias and conducted in Australia, compared manualized good clinical care with cognitive analytic therapy (which uses integrative psychotherapy) for adolescents with BPD. At 24 months, there were no significant differences between groups across a range of outcomes including BPD severity, parasuicidal behaviors, and functioning.

Grading of the Overall Supporting Body of Research Evidence for Benefits of Psychotherapy in Borderline Personality Disorder

▫ Magnitude of effect: Low. When studies showed differences between treatments, these were typically small. Few studies used wait-list control comparison conditions, and the effects of BPD-specific psychotherapies may be greater if compared with no treatment.

▫ Risk of bias: Moderate. Although a few studies had a low risk of bias, the majority of studies had a moderate or high risk of bias.

▫ Applicability: The studies included individuals with BPD, but some studies excluded patients who were at significant suicide risk or who had other co-occurring conditions, which would limit applicability. Most samples were White, although some studies did not describe the race or ethnicity of participants. Study populations were primarily young adult women in the United States, Canada, United Kingdom, Australia, or Europe. Differences in health care delivery systems may result in some differences from practice in the United States. Most studies were conducted in outpatients, and there may be less applicability to inpatient settings.

▫ Directness: Direct. Some of the outcomes such as functioning addressed patient-oriented outcomes, whereas others such as BPD severity addressed symptom-related outcomes that are also of importance to patients.

▫ Consistency: Inconsistent. Findings for a specific treatment differed for measured outcomes, and findings for specific outcomes differed for various psychotherapies. Overall, however, there were consistent improvements in all treatment arms on at least some outcomes even when differences between the treatment groups were not statistically significant.

▫ Precision: Imprecise. For many of the psychotherapy comparisons, the studies did not meet the optimal information size (i.e., number of participants in a meta-analysis) and were downgraded for imprecision.

▫ Dose-response relationship: No information on dose-response relationships was available.

▫ Confounding factors (including likely direction of effect): Present. Confounding factors may increase the observed effect. Subjects and treating clinicians were aware of the treatment arm to which subjects were assigned, which may have caused confounding effects due to expectancy.

▫ Publication bias: Unable to be assessed. The relatively small number of studies for each comparison and the heterogeneity of study designs make it difficult to assess publication bias. However, publication bias seems possible because of the tendency for negative clinical trial results to go unpublished.

▫ Overall strength of research evidence: Moderate. The writing group assessed the overall strength of research evidence for psychotherapy in BPD as moderate. Although the relatively small number of studies for each comparison and the heterogeneity of study designs make it difficult to assess the strength of research evidence for specific psychotherapies, in the vast majority of studies, all treatment arms showed improvement with psychotherapy even when differences between the treatment groups were not statistically significant. When compared with TAU or other active comparison arms, superiority was noted on at least some outcomes for a number of specific psychotherapies (e.g., DBT, DDP, GPM, MBT, SFT, STEPPS, TFP).

Grading of the Overall Supporting Body of Research Evidence for Harms of Psychotherapy in Borderline Personality Disorder

On the basis of the lack of data on harms in studies of psychotherapies in BPD, no grading of the body of research evidence is possible.

Pharmacotherapy

Statement 6 – Clinical Review Before Medication Initiation

APA recommends (1C) that a patient with borderline personality disorder have a review of co-occurring disorders, prior psychotherapies, other nonpharmacological treatments, past medication trials, and current medications before initiating any new medication.

Evidence for this statement comes from general principles of assessment and clinical care in psychiatric practice. A detailed systematic review to support this statement is outside the scope of this guideline; however, less comprehensive searches of the literature did not yield any studies related to this recommendation in the context of BPD treatment. Consequently, the strength of research evidence is rated as low.

Grading of the Overall Supporting Body of Research Evidence for Clinical Review Before Medication Initiation in Patients With Borderline Personality Disorder

On the basis of the limitations of the evidence for assessment of patients with possible BPD, no grading of the body of research evidence is possible.

Statement 7 – Pharmacotherapy Principles

APA suggests (2C) that any psychotropic medication treatment of borderline personality disorder be time-limited, aimed at addressing a specific measurable target symptom, and adjunctive to psychotherapy.

Evidence for this statement comes primarily from the systematic review conducted by RTI on the efficacy and comparative effectiveness of second-generation antipsychotics (SGAs), anticonvulsants, and antidepressants in patients with BPD (Gartlehner et al. 2021). Few studies were designed to specifically address benefits of pharmacotherapy as an adjunct to psychotherapy. One small study found an adjunctive benefit of olanzapine as an add-on to DBT (Soler et al. 2005), but small studies of adjunctive fluoxetine in patients with (Bellino et al. 2006) and without (Simpson et al. 2004) MDD did not find a benefit for BPD. Older literature suggested possible effects of lithium, the monoamine oxidase inhibitor tranylcypromine, and the anticonvulsant carbamazepine (Cowdry and Gardner 1988; de la Fuente and Lotstra 1994; Gardner and Cowdry 1986; Links et al. 1990). However, sample sizes were small, and BPD was diagnosed using different criteria than at present.

Second-Generation Antipsychotics Versus Placebo

Nine double-blinded RCTs evaluated the efficacy of four SGAs (aripiprazole, olanzapine, quetiapine extended release [ER], ziprasidone) compared with placebo (Black et al. 2014; Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Nickel et al. 2006; Pascual et al. 2008; Schulz et al. 2008; Soler et al. 2005; Zanarini and Frankenburg 2001; Zanarini et al. 2011b). Overall, these studies provided data on 1,124 participants. Two studies were rated as having a moderate risk of bias (Black et al. 2014; Nickel et al. 2006) and seven as having a high risk of bias (Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Pascual et al. 2008; Schulz et al. 2008; Soler et al. 2005; Zanarini and Frankenburg 2001; Zanarini et al. 2011b). Reasons for ratings of high risk of bias were lack of intention-to-treat analysis and high attrition. Four trials employed fixed-dose designs assessing aripiprazole (15 mg/day; Nickel et al. 2006), olanzapine (2.5 mg/day or 5 mg/day; Linehan et al. 2008; Zanarini and Frankenburg 2001), and quetiapine ER (150 mg/day or 300 mg/day; Black et al. 2014); five trials used flexible-dose designs for olanzapine (2.5–20 mg/day; Bogenschutz and Nurnberg 2004; Schulz et al. 2008; Soler et al. 2005; Zanarini et al. 2011b) and ziprasidone (40–200 mg/day; Pascual et al. 2008). Follow-up durations ranged from 8 weeks to 6 months. All trials, except one (Nickel et al. 2006), were funded by the pharmaceutical industry.

The majority of trial participants were female and White; mean ages across studies ranged from 21 years to 34 years. Participants were moderately ill at baseline, with mean ZAN-BPD scores ranging from 14.6 to 17.7 and scores on the CGI scale modified for BPD from 4.3 to 4.8. Studies, in general, excluded patients with psychiatric comorbidities such as schizophrenia, MDD, alcohol or substance use disorder, or bipolar disorder.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–25 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing SGAs with placebo

				Anticipated absolute effects
Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with placebo	Difference in effect with SGA
Severity of BPD
Assessed with ZAN-BPD Follow-up: range 8–12 weeks	860 (three RCTs: Black et al. 2014; Schulz et al. 2008; Zanarini et al. 2011b)	⨁⨁◯◯; LOW^a for no effect of SGA	–	Mean score at endpoint = 10.3*	Mean 1.2 lower
Anger
Assessed with STAXI Follow-up: mean 8 weeks	52 (one RCT: Nickel et al. 2006)	⨁⨁◯◯; LOW^b for effect of SGA	–	Mean score at endpoint = 26.2	Mean 7.7 lower (P < 0.001)
Aggression
Assessed with MOAS Follow-up: range 8–12 weeks	610 (four RCTs: Black et al. 2014; Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Zanarini et al. 2011b)	⨁⨁◯◯; LOW^a,c for no effect of SGA	–	Mean score at endpoint = 18.6*	Mean 14.7 lower (ns)
Depression
Assessed with Ham-D and MADRS Follow-up: range 8–21 weeks	497 (five RCTs: Linehan et al. 2008; Nickel et al. 2006; Pascual et al. 2008; Soler et al. 2005; Zanarini et al. 2011b)	⨁⨁◯◯; LOW^d,e for no effect of SGA	–	Mean score at endpoint = NR	Mean 0.28 SDs (Cohen’s d) greater (-0.05 to 0.60)
Impulsiveness
Assessed with BIS Follow-up: range 8–12 weeks	155 (two RCTs: Black et al. 2014; Pascual et al. 2008)	⨁⨁◯◯; LOW^d,f for no effect of SGA	–	Mean score at endpoint = 69.1*	Mean 1.4 lower (ns)
General psychopathology
Assessed with SCL-90 Follow-up: range 8–12 weeks	698 (five RCTs: Black et al. 2014; Bogenschutz and Nurnberg 2004; Nickel et al. 2006; Pascual et al. 2008; Zanarini et al. 2011b)	⨁⨁⨁◯; MODERATE^a for effect of SGA	–	Mean score at endpoint = 10.3*	Mean 1.2 lower (ns)
Functioning
Assessed with GAF and SDS Follow-up: range 8–12 weeks	586 (three RCTs: Black et al. 2014; Bogenschutz and Nurnberg 2004; Zanarini et al. 2011b)	⨁⨁⨁◯; MODERATE^g for no effect of SGA	–	Mean score at endpoint = 63.2*	Mean 2.9 higher (ns)
Incidence of adverse events
Follow-up: range 8–12 weeks	920 (four RCTs: Black et al. 2014; Pascual et al. 2008; Schulz et al. 2008; Zanarini et al. 2011b)	⨁⨁⨁◯; MODERATE^a for higher risk with antipsychotics	RR 1.10 (1.00–1.21)	571 per 1,000	57 more per 1,000 (0 fewer to 120 more)
Withdrawal due to adverse events
Follow-up: range 8–12 weeks	917 (six RCTs: Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Pascual et al. 2008; Schulz et al. 2008; Zanarini and Frankenburg 2001; Zanarini et al. 2011b)	⨁⨁◯◯; LOW^a,h for similar risks	RR 1.91 (0.83–4.43)	69 per 1,000	63 more per 1,000 (12 fewer to 237 more)
Incidence of serious adverse events
Follow-up: range 8–12 weeks	957 (six RCTs: Black et al. 2014; Bogenschutz and Nurnberg 2004; Nickel et al. 2006; Pascual et al. 2008; Schulz et al. 2008; Zanarini et al. 2011b)	⨁◯◯◯; VERY LOW^i for higher risk with placebo	RR 0.46 (0.23–0.95)**	44 per 1,000	24 fewer per 1,000 (34 fewer to 2 fewer)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

BIS = Barratt Impulsiveness Scale; BPD = borderline personality disorder; CI = confidence interval; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; Ham-D = Hamilton Rating Scale for Depression; MADRS = Montgomery-Åsberg Depression Rating Scale; MOAS = Modified Overt Aggression Scale; NR = not reported; ns = not significant; RCT = randomized controlled trial; RR = risk ratio; SCL-90 = Symptom Checklist-90; SD = standard deviation; SDS = Sheehan Disability Scale; SGA = second-generation antipsychotic; STAXI = State-Trait Anger Expression Inventory; ZAN-BPD = Zanarini Rating Scale for Borderline Personality Disorder.

*Effect estimate from largest study or study with lowest risk of bias (Zanarini et al. 2011b or Black et al. 2014).

**Effect estimate from Zanarini et al. 2011b. Other studies reported no serious adverse events.

^aMajority of studies were high risk of bias; downgraded two steps for study limitations.

^bSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^cSchulz et al. 2008 assessed MOAS but did not report data; downgraded one step for reporting bias.

^dAt least half of studies were high risk of bias; downgraded one step for study limitations.

^eInconsistent effects, largest study shows substantially smaller treatment effect; downgraded one step for inconsistency.

^fSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^gDoes not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^hFew events; downgraded one step for imprecision.

^iVery few events; downgraded two steps for imprecision.
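The table note above states that the risk in the intervention group is derived from the assumed comparison-group risk and the relative effect. A minimal Python sketch of that arithmetic follows, assuming the absolute difference is simply the baseline risk per 1,000 multiplied by the risk ratio, minus the baseline, and rounded to a whole number. The function name `absolute_difference_per_1000` is hypothetical and is not taken from the review; the inputs are the values printed in the table.

```python
def absolute_difference_per_1000(baseline_per_1000, rr, rr_low, rr_high):
    """Return (point, low, high) differences per 1,000 versus the comparison group.

    Assumption: intervention risk = baseline risk x risk ratio, applied to the
    point estimate and to each bound of the 95% CI.
    """
    def diff(ratio):
        # Intervention-group risk minus the assumed comparison-group risk.
        return round(baseline_per_1000 * ratio - baseline_per_1000)

    return diff(rr), diff(rr_low), diff(rr_high)


# Values as printed in the table (placebo risk per 1,000, RR, 95% CI bounds).
rows = {
    "Incidence of adverse events": (571, 1.10, 1.00, 1.21),
    "Withdrawal due to adverse events": (69, 1.91, 0.83, 4.43),
    "Incidence of serious adverse events": (44, 0.46, 0.23, 0.95),
}

for outcome, args in rows.items():
    # Negative numbers mean "fewer per 1,000"; positive numbers mean "more".
    print(outcome, absolute_difference_per_1000(*args))
```

Under this assumption, the three rows reproduce the printed differences (57 more, 63 more, and 24 fewer per 1,000, with matching interval bounds). Small discrepancies can arise in other tables where the printed baseline risk has itself been rounded.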

Severity of borderline personality disorder

Three studies assessed changes in the severity of BPD on the ZAN-BPD (Black et al. 2014; Schulz et al. 2008; Zanarini et al. 2011b). Two multinational, flexible-dose trials on olanzapine, rated as having a high risk of bias, reported mixed results (Schulz et al. 2008; Zanarini et al. 2011b). A three-armed trial (N = 451) included a fixed-dosage arm with olanzapine 2.5 mg/day (n = 150), which did not achieve significant improvements compared with placebo on the ZAN-BPD (Zanarini et al. 2011b). A flexibly dosed arm showed significantly greater improvements for participants treated with olanzapine 5–10 mg/day than for those treated with placebo, although the absolute difference was small (1.5 points) (Zanarini et al. 2011b). By contrast, another large trial (N = 314) reported no significant differences between olanzapine 5–20 mg/day and placebo on the ZAN-BPD (Schulz et al. 2008).

A fixed-dosage trial assessing quetiapine ER (N = 95), rated as having a moderate risk of bias, reported significant improvements on the ZAN-BPD scale for low-dosage (150 mg/day) but not moderate-dosage (300 mg/day) treatment with quetiapine ER compared with placebo (treatment effects NR) (Black et al. 2014).

Severity of symptoms associated with borderline personality disorder

Studies assessing changes in the severity of symptoms associated with BPD reported mixed results regarding improvements in anger, impulsiveness, aggression, and depressive symptoms. A random-effects meta-analysis on the reduction of depressive symptoms numerically favored SGAs over placebo, but the difference was not statistically significant (Figure A3–1).

Standardized mean differences of changes of depressive symptoms for second-generation antipsychotics versus placebo.
CI = confidence interval; *N* = sample size; REML = restricted maximum likelihood; SD = standard deviation.
*Source.* Linehan et al. 2008; Nickel et al. 2006; Pascual et al. 2008; Soler et al. 2005; Zanarini et al. 2011b.
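As a rough reading aid for the pooled depression estimate reported in the certainty-of-evidence table (standardized mean difference 0.28, 95% CI -0.05 to 0.60), the Python sketch below backs out an approximate standard error and two-sided P value. It assumes a symmetric normal-approximation (Wald) interval, which may not be exactly how the review's REML model derived its interval. The function name `wald_summary` is hypothetical.

```python
import math


def wald_summary(estimate, ci_low, ci_high, z_crit=1.96):
    """Approximate (standard error, z statistic, two-sided P) from a 95% CI.

    Assumption: the interval is estimate +/- z_crit x SE on the reported scale.
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = estimate / se
    # Two-sided P value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se, z, p = wald_summary(0.28, -0.05, 0.60)

# An interval that contains 0 corresponds to a nonsignificant difference
# at the conventional 0.05 level.
includes_zero = -0.05 <= 0 <= 0.60
print(f"SE ~ {se:.3f}, z ~ {z:.2f}, P ~ {p:.2f}, CI includes 0: {includes_zero}")
```

Under these assumptions the P value is a little under 0.10, which agrees with the description of a result that favors SGAs numerically without reaching statistical significance.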

One study (N = 52), rated as having a moderate risk of bias, reported significant improvements in anger for aripiprazole on the State-Trait Anger Expression Inventory (STAXI; Nickel et al. 2006). For impulsiveness, two RCTs (N = 95 and N = 60), one rated as having a moderate and the other a high risk of bias, detected no significant improvements for quetiapine ER (Black et al. 2014) or ziprasidone (Pascual et al. 2008) on the BIS.

Regarding improvement of aggression, one moderate risk of bias RCT (N = 451) (Zanarini et al. 2011b) and two RCTs (N = 40 and N = 24) (Bogenschutz and Nurnberg 2004; Linehan et al. 2008), rated as having a high risk of bias, reported no significant differences between olanzapine and placebo on the Modified Overt Aggression Scale (MOAS). By contrast, another RCT (N = 95), rated as having a moderate risk of bias, detected significant improvements for quetiapine ER compared with placebo on the MOAS (Black et al. 2014).

Global impression and functioning

Five RCTs assessed differences between SGAs and placebo on the SCL-90-R and provided mixed results (Black et al. 2014; Bogenschutz and Nurnberg 2004; Nickel et al. 2006; Pascual et al. 2008; Zanarini et al. 2011b). Three RCTs (N = 451 [Zanarini et al. 2011b], N = 52 [Nickel et al. 2006], N = 95 [Black et al. 2014]), rated as having a moderate risk of bias, reported significantly greater improvements on the SCL-90-R for participants treated with SGAs (aripiprazole, olanzapine, quetiapine) compared with participants in the placebo groups. Two RCTs, one on olanzapine (N = 40; Bogenschutz and Nurnberg 2004) and the other on ziprasidone (N = 60; Pascual et al. 2008), that were both rated as having a high risk of bias favored SGAs over placebo but rendered no significant differences between active treatments and placebo on the SCL-90-R. Studies provided insufficient data for meta-analyses.

Likewise, two trials (N = 40 and N = 60), rated as having a high risk of bias, provided mixed results about improvements with olanzapine versus placebo on the CGI scale (Bogenschutz and Nurnberg 2004; Soler et al. 2005). Bogenschutz and Nurnberg (2004) reported a significant improvement with olanzapine, whereas Soler et al. (2005) found no significant differences in treatment effects for olanzapine and placebo on the CGI scale.

An 18-month follow-up of the trial by Nickel et al. (2006; N = 52) reported that the significant difference on the SCL-90-R between aripiprazole and placebo was maintained (Nickel et al. 2007).

Three trials, two rated as moderate risk of bias (Black et al. 2014; Zanarini et al. 2011b) and one rated as high risk of bias (Bogenschutz and Nurnberg 2004), with a total of 586 participants, reported no significant differences in functional capacity comparing quetiapine ER or olanzapine with placebo.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The incidence of adverse events was generally higher in the groups that received SGAs (Black et al. 2014; Pascual et al. 2008; Schulz et al. 2008; Zanarini et al. 2011b). A random-effects meta-analysis showed a small but statistically significant increase in the risk of adverse events for participants treated with antipsychotics compared with placebo (Figure A3–2).

Random effects meta-analysis of the incidence of adverse events comparing second-generation antipsychotics with placebo.
CI = confidence interval; REML = restricted maximum likelihood.
*Source.* Black et al. 2014; Pascual et al. 2008; Schulz et al. 2008; Zanarini et al. 2011b.

Likewise, withdrawals due to adverse events were numerically higher for participants receiving SGAs than for those receiving placebo (Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Pascual et al. 2008; Schulz et al. 2008; Zanarini and Frankenburg 2001; Zanarini et al. 2011b). A random-effects meta-analysis, however, did not show a statistically significant difference (Figure A3–3).

Random effects meta-analysis of withdrawal due to adverse events comparing second-generation antipsychotics with placebo.
CI = confidence interval; REML = restricted maximum likelihood.
*Source.* Bogenschutz and Nurnberg 2004; Linehan et al. 2008; Pascual et al. 2008; Schulz et al. 2008; Zanarini and Frankenburg 2001; Zanarini et al. 2011b.

The incidence of serious adverse events, when reported, was numerically lower for SGAs than for placebo. Sample sizes, however, were too small to detect rare but serious adverse events reliably.

Grading of the overall supporting body of research evidence for benefits of second-generation antipsychotics in borderline personality disorder

▫ Magnitude of effect: Low. There was a small benefit of SGAs on general psychopathology but no effect on other outcomes.

▫ Risk of bias: High. Of the RCTs on SGAs, two had a moderate risk of bias and seven had a high risk of bias, suggesting that the body of evidence has a high risk of bias.

▫ Applicability: Studies included individuals with a diagnosis of BPD, but many excluded individuals taking other medications or who had other co-occurring disorders, which are common among clinical populations. The symptom severity of patients in these trials was also less than is typically seen in clinical populations. Demographically, the study samples were primarily young adult White females. Some but not all studies included a mix of races and ethnicities. Medication dosages that were studied were generally consistent with clinical practice.

▫ Directness: Direct. Some of the outcomes, such as functioning, addressed patient-oriented outcomes, whereas others, such as BPD severity, addressed symptom-related outcomes that are also of importance to patients.

▫ Consistency: Inconsistent. In many of the studies, there was at least one outcome measure that showed a statistically significant effect. However, these effects were not consistent for specific SGAs or for SGAs as a group.

▫ Precision: Imprecise. For many of the outcomes, the optimal information size (i.e., number of participants in a meta-analysis) was not met and the certainty of evidence was downgraded for imprecision.

▫ Dose-response relationship: Insufficient information. Although two studies included treatment arms with two different dosages of medication, there was inconsistent evidence for a dose-response relationship.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

▫ Publication bias: Unable to be assessed. The relatively small number of studies of each SGA and the heterogeneity of study designs make it difficult to assess publication bias. However, publication bias seems possible because of the tendency for negative clinical trial results to go unpublished.

▫ Overall strength of research evidence: Low. The majority of the studies had a high risk of bias, some of the findings were inconsistent, and there were some limits on the applicability of the studies to typical clinical practice.

Grading of the overall supporting body of research evidence for harms of second-generation antipsychotics in borderline personality disorder

▫ Magnitude of effect: Low. Although study withdrawals due to adverse effects were comparable for SGAs and placebo, there was a small increase in adverse effects with SGAs and a very small increase in serious adverse effects with placebo.

▫ Risk of bias: High. Of the RCTs on SGAs, two had a moderate risk of bias and seven had a high risk of bias, suggesting that the body of evidence has a high risk of bias.

▫ Applicability: Studies included individuals with a diagnosis of BPD, but many excluded individuals taking other medications or who had other co-occurring disorders, which are common among clinical populations. Demographically, the study samples were primarily young adult White females. Some but not all studies included a mix of races and ethnicities. Medication dosages that were studied were generally consistent with clinical practice.

▫ Directness: Direct as well as indirect. Outcomes included adverse effects and serious adverse effects but also study withdrawal due to adverse effects.

▫ Consistency: Inconsistent. Findings were different for adverse effects, serious adverse effects, and study withdrawal due to adverse effects.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

Second-Generation Antipsychotic Versus Antidepressant

One industry-funded RCT (N = 45; Zanarini et al. 2004c), rated as having a moderate risk of bias, assessed differences in efficacy between olanzapine (2.5–7.5 mg/day), fluoxetine (10–30 mg/day), and a combination of fluoxetine and olanzapine. The study duration was 8 weeks. All trial participants were females between 18 years and 40 years of age; the majority were White. The severity of disease at baseline was not reported.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–26 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing SGA with second-generation antidepressant

				Anticipated absolute effects
Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with antidepressants	Difference in effect with SGA
Olanzapine vs. fluoxetine
Aggression
Assessed with MOAS Follow-up: mean 8 weeks	30 (one RCT: Zanarini et al. 2004c)	⨁⨁◯◯; LOW^a for greater effect of olanzapine	–	Mean score at endpoint = 7.83	Mean 4.3 lower (P = 0.003)
Depression
Assessed with MADRS Follow-up: mean 8 weeks	30 (one RCT: Zanarini et al. 2004c)	⨁⨁◯◯; LOW^a for greater effect of olanzapine	–	Mean score at endpoint = 6.2	Mean 1.0 lower (P < 0.001)
Olanzapine + fluoxetine vs. fluoxetine
Aggression
Assessed with MOAS Follow-up: mean 8 weeks	29 (one RCT: Zanarini et al. 2004c)	⨁⨁◯◯; LOW^a for greater effect of olanzapine + fluoxetine	–	Mean score at endpoint = 7.83	Mean 4.8 lower (P < 0.001)
Depression
Assessed with MADRS Follow-up: mean 8 weeks	29 (one RCT: Zanarini et al. 2004c)	⨁⨁◯◯; LOW^a for greater effect of olanzapine + fluoxetine	–	Mean score at endpoint = 6.2	Mean 1.8 lower (P = 0.02)
Withdrawals due to adverse events
Follow-up: mean 8 weeks	29 (one RCT: Zanarini et al. 2004c)	⨁◯◯◯; VERY LOW^a,b for similar risks	RR 0.94 (0.06–13.68)	71 per 1,000	4 fewer per 1,000 (67 fewer to 906 more)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI = confidence interval; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MADRS = Montgomery-Åsberg Depression Rating Scale; MOAS = Modified Overt Aggression Scale; RCT = randomized controlled trial; RR = risk ratio; SGA = second-generation antipsychotic.

^aSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^bUnclear how withdrawal due to adverse events was determined; downgraded one step for indirectness.

Severity of borderline personality disorder

The study did not report any relevant outcomes.

Severity of symptoms associated with borderline personality disorder

After 8 weeks, participants treated with olanzapine or a combination of olanzapine and fluoxetine had significantly greater improvements in aggression (MOAS) and depressive symptoms (Montgomery-Åsberg Depression Rating Scale) than participants treated with fluoxetine alone (Zanarini et al. 2004c).

Global impression and functioning

The study did not report any relevant outcomes.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report data on the incidence of adverse or serious adverse events. Only two participants (one in the fluoxetine and one in the olanzapine plus fluoxetine group) withdrew because of adverse events (Zanarini et al. 2004c).

Second-Generation Antipsychotics Versus Second-Generation Antipsychotics

One RCT (N = 51; Bozzatello et al. 2017) and one retrospective cohort study (N = 116; García-Carmona et al. 2021) compared SGAs with other SGAs.

The RCT, rated as having a high risk of bias, assessed differences in efficacy between asenapine (5–10 mg/day) and olanzapine (5–10 mg/day) (Bozzatello et al. 2017). The study duration was 12 weeks. All trial participants were between 18 years and 50 years of age; the majority were female (63%). Race was not reported.

The retrospective cohort study, rated as having a high risk of bias, compared the effectiveness of oral SGAs (not specified) and long-acting injectable SGAs (aripiprazole, paliperidone, risperidone) (García-Carmona et al. 2021). The study used data from 116 outpatients in Spain, with follow-up ranging from 1 month to 3 months.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–27 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing SGAs with SGAs

				Anticipated absolute effects
Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with olanzapine	Difference in effect with asenapine
Severity of BPD
Assessed with BPDSI Follow-up: mean 12 weeks	51 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 49.12	Mean 2.23 lower (ns)
Aggression
Assessed with MOAS Follow-up: mean 12 weeks	51 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 4.8	Mean 1.4 higher (ns)
Impulsiveness
Assessed with BIS Follow-up: mean 12 weeks	51 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 72.9	Mean 8.2 lower (ns)
Self-harm
Assessed with SHI Follow-up: mean 12 weeks	51 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 10	Mean 2 lower (ns)
Global impression
Assessed with CGI-S Follow-up: mean 12 weeks	51 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar effects	–	Mean score at endpoint = 3.9	Mean 0.2 lower (ns)
Incidence of adverse events
Assessed with DOTES Follow-up: mean 12 weeks	40 (one RCT: Bozzatello et al. 2017)	⨁◯◯◯; VERY LOW^a,b for similar risks	RR 1.38 (0.43–4.40)	263 per 1,000	100 more per 1,000 (150 fewer to 895 more)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

BIS = Barratt Impulsiveness Scale; BPD = borderline personality disorder; BPDSI = Borderline Personality Disorder Severity Index; CGI-S = Clinical Global Impression–Severity; CI = confidence interval; DOTES = Dosage Record and Treatment Emergent Symptom Scale; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MOAS = Modified Overt Aggression Scale; ns = not significant; RCT = randomized controlled trial; RR = risk ratio; SGA = second-generation antipsychotic; SHI = Self-Harm Inventory.

^aHigh attrition; downgraded one step for risk of bias.

^bSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.
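The footnotes above show how a rating is reached: each row starts from the rating assigned to randomized trials and is lowered by the stated number of steps. The Python sketch below illustrates that bookkeeping under two assumptions: evidence from RCTs starts at HIGH, and each downgrade step lowers the rating by one level, with VERY LOW as the floor. The function name `grade_certainty` is hypothetical, and actual GRADE ratings also reflect reviewer judgment rather than a purely mechanical rule.

```python
# Certainty levels ordered from lowest to highest.
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]


def grade_certainty(downgrade_steps, start="HIGH"):
    """Return the certainty level after applying a list of downgrade steps.

    Assumption: each step lowers the rating one level, never below VERY LOW.
    """
    index = LEVELS.index(start) - sum(downgrade_steps)
    return LEVELS[max(index, 0)]


# Footnote a (one step for risk of bias) plus footnote b (two steps for
# imprecision), as applied to every row of the table above.
print(grade_certainty([1, 2]))
```

With one step for high attrition and two steps for imprecision, the sketch returns VERY LOW, which matches the rating printed for each outcome in the asenapine versus olanzapine comparison.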

Severity of borderline personality disorder

After 12 weeks, the RCT reported no significant difference on the BPD Severity Index between the asenapine and olanzapine groups (Bozzatello et al. 2017).

Severity of symptoms associated with borderline personality disorder

After 12 weeks, the RCT reported no significant differences on the BIS, the Self-Harm Inventory, or the MOAS between the asenapine and olanzapine groups (Bozzatello et al. 2017). The retrospective cohort study reported no significant differences in suicidal behavior between individuals who received long-acting injectable antipsychotics and those who received oral antipsychotics (García-Carmona et al. 2021).

Global impression and functioning

After 12 weeks, the RCT reported no significant difference on the CGI–Severity scale between the asenapine and olanzapine groups (Bozzatello et al. 2017).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

In the RCT, the incidence of adverse events was nearly equal in both groups (five in the olanzapine group and four in the asenapine group). The study did not report data on the incidence of serious adverse events. Only four participants (two in each group) withdrew because of adverse events (Bozzatello et al. 2017).

Anticonvulsants Versus Placebo

Nine double-blinded RCTs evaluated the efficacy of three anticonvulsant medications (divalproex sodium, lamotrigine, topiramate) compared with placebo (Crawford et al. 2018; Frankenburg and Zanarini 2002; Hollander et al. 2001; Loew et al. 2006; Moen et al. 2012; Nickel et al. 2004, 2005; Reich et al. 2009; Tritt et al. 2005). Overall, these studies provided data on 523 participants.

Two studies were rated as having a low risk of bias (Loew et al. 2006; Tritt et al. 2005), three as having a moderate risk of bias (Crawford et al. 2018; Nickel et al. 2004, 2005), and four as having a high risk of bias (Frankenburg and Zanarini 2002; Hollander et al. 2001; Moen et al. 2012; Reich et al. 2009). Reasons for ratings of high risk of bias were lack of intention-to-treat analysis and high attrition.

Four trials employed fixed-dosage designs assessing lamotrigine (200 mg/day) (Tritt et al. 2005) or topiramate (200 mg/day and 250 mg/day) (Loew et al. 2006; Nickel et al. 2004, 2005); five trials used flexible-dosage designs for divalproex sodium (Frankenburg and Zanarini 2002; Hollander et al. 2001; Moen et al. 2012) or lamotrigine (Crawford et al. 2018; Reich et al. 2009). Follow-up durations ranged from 8 weeks to 52 weeks. Four trials were funded by the pharmaceutical industry (Frankenburg and Zanarini 2002; Hollander et al. 2001; Moen et al. 2012; Reich et al. 2009); the others reported no funding or were supported by public institutions.

The majority of trial participants were female and White, and mean ages ranged from 25 years to 38 years. Participants were moderately ill at baseline, with mean scores on the ZAN-BPD ranging from 11.3 to 20.2. Studies, in general, excluded patients with psychiatric comorbidities such as schizophrenia, MDD, alcohol or substance use disorder, and bipolar disorder. An exception was the trial by Frankenburg and Zanarini (2002), which included participants with BPD and bipolar disorder.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. Table A3–28 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing anticonvulsants with placebo

				Anticipated absolute effects
Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Effect with placebo	Difference in effect with anticonvulsants
Divalproex sodium
Severity of BPD
Assessed with BEST Follow-up: mean 12 weeks	15 (one RCT: Moen et al. 2012)	⨁◯◯◯; VERY LOW^a,b for no effect of divalproex sodium	–	Mean score at endpoint = 30.0	Mean 1.3 lower (ns)
Aggression
Assessed with MOAS; SCL-90-R subscale for anger and hostility Follow-up: range 10–24 weeks	46 (two RCTs: Frankenburg and Zanarini 2002; Hollander et al. 2001)	⨁◯◯◯; VERY LOW^a,c,d for effect of divalproex sodium	–	Mean score on MOAS = 3.2*	Mean 0.6 lower (P = 0.03)
Impulsiveness
Assessed with BIS-Motor Follow-up: mean 12 weeks	15 (one RCT: Moen et al. 2012)	⨁◯◯◯; VERY LOW^a,b for no effect of divalproex sodium	–	Mean score at endpoint = 18.2	Mean 5.7 higher (ns)
General psychopathology
Assessed with SCL-90-R, CGI-I Follow-up: range 10–12 weeks	31 (two RCTs: Hollander et al. 2001; Moen et al. 2012)	⨁◯◯◯; VERY LOW^a,d for no effect of divalproex sodium	–	Mean score at endpoint on SCL-90 = 114.2*	Mean 22.8 higher (ns)
Withdrawals due to adverse events
Follow-up: range 10–24 weeks	46 (two RCTs: Frankenburg and Zanarini 2002; Hollander et al. 2001)	⨁◯◯◯; VERY LOW^a,d for similar risks	RR 0.26 (0.03–2.35)	136 per 1,000*	101 fewer per 1,000 (132 fewer to 184 more; ns)
Lamotrigine
Severity of BPD
Assessed with ZAN-BPD Follow-up: range 12–52 weeks	304 (two RCTs: Crawford et al. 2018; Reich et al. 2009)	⨁⨁⨁◯; MODERATE^e for no effect of lamotrigine	–	Mean score at endpoint = 11.5*	Mean 0.5 lower (ns)
Affective lability
Assessed with ALS Follow-up: mean 12 weeks	28 (one RCT: Reich et al. 2009)	⨁◯◯◯; VERY LOW^b,f for effect of lamotrigine	–	Mean score at endpoint = 1.52	Mean 0.27 lower (P = 0.012)
Alcohol and substance use
Assessed with ASSIST Follow-up: mean 52 weeks	160 (one RCT: Crawford et al. 2018)	⨁⨁◯◯; LOW^b for no effect of lamotrigine	–	Mean score at endpoint = 23	Mean 4 higher (ns)
Anger
Assessed with STAXI Follow-up: mean 8 weeks	27 (one RCT: Tritt et al. 2005)	⨁⨁◯◯; LOW^b for effect of lamotrigine	–	NR	NR (four of five subscales significantly improved)
Functioning
Assessed with SFQ Follow-up: mean 52 weeks	276 (one RCT: Crawford et al. 2018)	⨁⨁⨁◯; MODERATE^e,g for no effect of lamotrigine	–	Mean score at endpoint = 12.3	Mean 0.1 higher (ns)
Incidence of adverse events
Follow-up: range 10–52 weeks	304 (two RCTs: Crawford et al. 2018; Reich et al. 2009)	⨁⨁◯◯; LOW^g for similar risks	RR 0.86 (0.71–1.03)	630 per 1,000*	88 fewer per 1,000 (183 fewer to 19 more; ns)
Incidence of serious adverse events
Follow-up: mean 52 weeks	276 (one RCT: Crawford et al. 2018)	⨁⨁◯◯; LOW^h for similar risks	RR 0.82 (0.52–1.31)	230 per 1,000	41 fewer per 1,000 (111 fewer to 71 more; ns)
Withdrawal due to adverse events
Follow-up: range 10–52 weeks	328 (three RCTs: Crawford et al. 2018; Reich et al. 2009; Tritt et al. 2005)	⨁◯◯◯; VERY LOW^h,i for similar risks	RR 3.79 (0.82–17.57)	12 per 1,000	35 more per 1,000 (2 fewer to 206 more; ns)
Topiramate
Anger
Assessed with STAXI Follow-up: mean 8 weeks	75 (two RCTs: Nickel et al. 2004; Nickel et al. 2005)	⨁⨁◯◯; LOW^d for effect of topiramate	–	NR	NR (four of five subscales significantly improved)
General psychopathology
Assessed with SCL-90 Follow-up: range 8–12 weeks	56 (one RCT: Loew et al. 2006)	⨁⨁◯◯; LOW^b for effect of topiramate	–	Mean score at endpoint = 70.1	Mean 5.9 lower (P < 0.001)
Withdrawal due to adverse events
Follow-up: mean 8 weeks	75 (two RCTs: Nickel et al. 2004; Nickel et al. 2005)	⨁◯◯◯; VERY LOW^d,j for similar risks	RR 1.95 (0.77–4.94)	0 per 1,000	0 fewer per 1,000 (0 fewer to 0 fewer)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

ALS = Affective Lability Scale; ASSIST = Alcohol, Smoking, and Substance Involvement Screening Test; BEST = Borderline Evaluation of Severity Over Time; BIS-Motor = Barratt Impulsiveness Scale-Motor; BPD = borderline personality disorder; CGI-I = Clinical Global Impressions–Improvement; CI = confidence interval; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MOAS = Modified Overt Aggression Scale; NR = not reported; ns = not significant; RCT = randomized controlled trial; RR = risk ratio; SCL-90 = Symptom Checklist-90; SCL-90-R = Symptom Checklist-90-Revised; SFQ = Social Functioning Questionnaire; STAXI = State-Trait Anger Expression Inventory; ZAN-BPD = Zanarini Rating Scale for Borderline Personality Disorder.

*Effect estimate from largest study or the study with the lowest risk of bias.

^aHigh attrition; downgraded one step for risk of bias.

^bSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^cConflicting results of two studies; downgraded one step for inconsistency.

^dSmall studies, do not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.

^eSample size probably does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded one step for imprecision.

^fTrial with high risk of bias; downgraded one step for risk of bias.

^gFew events; downgraded two steps for imprecision.

^hVery few events; downgraded two steps for imprecision.

^iProportions vary substantially; downgraded one step for inconsistency.

^jOne study does not report data on withdrawal due to adverse events; downgraded one step for outcomes reporting bias.

Severity of borderline personality disorder

Divalproex sodium

A small RCT (N = 15; Moen et al. 2012), rated as having a high risk of bias, assessed the efficacy of divalproex sodium ER compared with placebo in participants who were already receiving 12-week DBT, which included individual therapy sessions, a skills training group, and telephone coaching calls. The study reported no significant differences between participants receiving divalproex sodium ER or placebo on the BEST scale after 12 weeks of treatment.

Lamotrigine

The publicly funded Lamotrigine and Borderline Personality Disorder: Investigating Long-Term Effects (LABILE) trial (N = 276; Crawford et al. 2018), rated as having a moderate risk of bias, and a small, industry-funded RCT (N = 28; Reich et al. 2009), rated as having a high risk of bias, assessed the efficacy of lamotrigine (200–400 mg/day) compared with placebo on the ZAN-BPD. Both trials reported no significant differences between participants in the lamotrigine and the placebo groups after 12 weeks of treatment. The primary endpoint of the LABILE trial was at 52 weeks, which also yielded no significant difference on the ZAN-BPD between treatment groups (Crawford et al. 2018).

Topiramate

None of the included trials reported relevant outcomes.

Severity of symptoms associated with borderline personality disorder

Divalproex sodium

Two small RCTs, rated as having a high risk of bias, reported results regarding the efficacy of divalproex sodium (flexible dosage to achieve serum levels of 80 mg/L and 50–100 mg/L, respectively) to reduce aggression (Frankenburg and Zanarini 2002; Hollander et al. 2001). One trial (N = 30; Frankenburg and Zanarini 2002) reported significant improvements for divalproex sodium compared with placebo on the MOAS and the SCL-90-R subscale for anger and hostility after 24 weeks of treatment. This study enrolled participants with BPD and bipolar II disorder. The other trial (N = 16; Hollander et al. 2001) also favored divalproex sodium over placebo but found no significant differences on the Aggression Questionnaire and the MOAS after 10 weeks.

Another RCT (N = 15; Moen et al. 2012), rated as having a high risk of bias, reported no significant differences between participants on divalproex sodium ER or placebo on the BIS after 12 weeks of treatment.

Lamotrigine

Three trials assessed improvements of BPD-specific symptoms under lamotrigine treatment (Crawford et al. 2018; Reich et al. 2009; Tritt et al. 2005). The LABILE trial (N = 276; Crawford et al. 2018), rated as having a moderate risk of bias, reported no significant differences in alcohol or other substance use between participants treated with lamotrigine or placebo. In a Cochrane review of pharmacological treatments for BPD, the evidence for lamotrigine was assessed as being very uncertain in terms of effects on self-harm (Stoffers-Winterling et al. 2022).

An RCT in 27 female participants with BPD, rated as having a low risk of bias, showed significant improvements in anger as measured on four out of five subscales on the STAXI after 8 weeks of treatment (Tritt et al. 2005). The subscale assessing the tendency to repress anger did not improve significantly.

Likewise, a small RCT with 28 participants, rated as having a high risk of bias, reported significantly greater reductions on the Affective Lability Scale for the lamotrigine group compared with the placebo treatment group (Reich et al. 2009).

Topiramate

Two RCTs (N = 31 [Nickel et al. 2004] and N = 44 [Nickel et al. 2005]), rated as having a moderate risk of bias and conducted by the same author team using similar protocols, investigated the efficacy of topiramate (titrated from 50 mg/day to 250 mg/day) in reducing anger and aggression in females and males with BPD. After 8 weeks, both women and men experienced significant improvements in four out of five subscales of the STAXI. In both trials, the subscale assessing the tendency to repress anger did not improve significantly (Nickel et al. 2004, 2005).

Global impression and functioning

Divalproex sodium

Two very small RCTs (N = 16 and N = 15), rated as having a high risk of bias, reported no significant differences between divalproex sodium and placebo on the CGI–Improvement scale and the SCL-90-R after 10 weeks and 12 weeks of treatment (Hollander et al. 2001; Moen et al. 2012).

Lamotrigine

The LABILE trial (N = 276; Crawford et al. 2018) reported no significant differences on the Social Functioning Questionnaire between participants treated with lamotrigine or placebo after 52 weeks of treatment.

Topiramate

One RCT (N = 56; Loew et al. 2006), rated as having a low risk of bias, assessed the efficacy of topiramate (titrated from 50 mg/day to 200 mg/day) in females with BPD ages 18–35 years. After 10 weeks, participants in the topiramate group had significantly greater improvements on the Global Severity Index of the SCL-90-R, the Short Form-36, and the Inventory of Interpersonal Problems.

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

Divalproex sodium

None of the three included studies reported on the incidence of adverse events. Two trials reported similar proportions of withdrawals due to adverse events between the divalproex sodium and the placebo treatment groups (Frankenburg and Zanarini 2002; Hollander et al. 2001).

Lamotrigine

The incidence of adverse events, serious adverse events, and withdrawals due to adverse events was similar between lamotrigine and placebo treatment groups (Crawford et al. 2018; Reich et al. 2009; Tritt et al. 2005).

Topiramate

None of the trials reported the incidence of adverse events or serious adverse events. Two publications stated that no participants withdrew because of adverse events during 8 weeks of treatment (Nickel et al. 2004, 2005).

A meta-analysis of anticonvulsant medications as a class showed no significant difference between active treatment and placebo in withdrawals due to adverse events after 8–52 weeks of treatment (A3–4).

Random effects meta-analysis of withdrawal due to adverse events comparing anticonvulsant medications with placebo.
CI = confidence interval; REML = restricted maximum likelihood.
*Source.* Crawford et al. 2018; Frankenburg and Zanarini 2002; Reich et al. 2009; Tritt et al. 2005.
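
As a rough illustration of how a random-effects pooled risk ratio of this kind can be computed, the sketch below uses the DerSimonian-Laird estimator rather than the REML estimator named in the caption, because it has a simple closed form. The function names and the event counts are hypothetical and are not taken from the cited trials; the 0.5 continuity correction for zero-event arms is likewise an assumption of this sketch.

```python
import math

def log_risk_ratio(events_t, n_t, events_c, n_c, correction=0.5):
    """Log risk ratio and its variance for one trial.

    If either arm has zero events, `correction` is added to every cell of
    the 2x2 table (so each arm's total grows by 2 * correction).
    """
    if events_t == 0 or events_c == 0:
        events_t += correction
        events_c += correction
        n_t += 2 * correction
        n_c += 2 * correction
    log_rr = math.log((events_t / n_t) / (events_c / n_c))
    variance = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return log_rr, variance

def pool_random_effects(trials):
    """DerSimonian-Laird pooled risk ratio, 95% CI bounds, and tau^2."""
    effects = [log_risk_ratio(*trial) for trial in trials]
    y = [effect for effect, _ in effects]
    v = [variance for _, variance in effects]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(trials) - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se),
            tau2)

# Hypothetical counts: (events_treatment, n_treatment, events_control, n_control).
hypothetical_trials = [(5, 100, 4, 100), (2, 50, 3, 50), (0, 30, 1, 30)]
rr, lower, upper, tau2 = pool_random_effects(hypothetical_trials)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), tau^2 = {tau2:.3f}")
```

A confidence interval that spans 1 corresponds to the "no significant differences" wording used in the text.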

Grading of the overall supporting body of research evidence for benefits of divalproex in borderline personality disorder

▫ Magnitude of effect: Minimal. There was a very small benefit of divalproex on aggression but no effect on other outcomes.

▫ Risk of bias: High. Of the RCT studies of divalproex, both had a high risk of bias.

▫ Applicability: Studies were conducted in the United States and included individuals with a diagnosis of BPD but excluded individuals with co-occurring disorders or those who were suicidal. Demographically, the study samples were primarily young adult White females, but a mix of races and ethnicities were included. Medication dosages that were studied were smaller than in usual clinical practice, limiting the generalizability of the findings.

▫ Directness: Indirect. Outcomes in one study were not well delineated; in the other study, outcomes were either global or addressed aggressive behavior.

▫ Consistency: Consistent. Studies were generally consistent and, with the exception of aggressive behavior in one study, showed significant effects of divalproex.

▫ Precision: Imprecise. The optimal information size (i.e., the number of participants in a meta-analysis) was not met due to small samples, and the certainty of evidence was downgraded for imprecision.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted, but some may have been present due to the high risk of bias in the study design.

▫ Publication bias: Unable to be assessed. The small number of studies makes it difficult to assess publication bias. However, publication bias seems possible because of the tendency for negative clinical trial results to go unpublished.

▫ Overall strength of research evidence: Low. There is a high risk of bias in both studies, inconsistency of some of the findings, and limits on the applicability of the studies to typical clinical practice.

Grading of the overall supporting body of research evidence for harms of divalproex in borderline personality disorder

▫ Magnitude of effect: None noted. Study withdrawal rates due to adverse effects were comparable for placebo and divalproex in one study. No data on adverse effects were reported in the other study.

▫ Risk of bias: High. Of the RCT studies of divalproex, both had a high risk of bias.

▫ Directness: Indirect. Outcomes in one study were not well delineated; in the other study, outcomes were either global or addressed aggressive behavior.

▫ Consistency: Consistent. Studies were generally consistent and, with the exception of aggressive behavior in one study, showed significant effects of divalproex.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

Grading of the overall supporting body of research evidence for benefits of lamotrigine in borderline personality disorder

▫ Magnitude of effect: Minimal. There was a very small benefit of lamotrigine on affective lability and anger, in one small study each, but no effect on other outcomes. In one large study that assessed BPD severity and functioning, lamotrigine had no significant effect.

▫ Risk of bias: Moderate. Of the RCT studies of lamotrigine, the largest study had a moderate risk of bias, whereas the two smaller studies had a low and a high risk of bias.

▫ Applicability: Studies were conducted in the United States, the United Kingdom, Germany, and Austria. They included individuals with a diagnosis of BPD, but the smaller studies excluded individuals with co-occurring disorders or those who were suicidal. Demographically, the study samples were primarily young adult White females, but in the largest study 25% of participants were male and 11% non-White race. Medication dosages that were studied were comparable with those used in usual clinical practice.

▫ Directness: Direct. The primary outcome in the largest study was BPD severity, although the smaller studies had indirect measures of anger and affective lability as primary outcomes.

▫ Consistency: Inconsistent. The smaller studies showed some benefits on affective lability and anger, whereas the larger study showed no effect of lamotrigine on BPD severity, self-harm, or functioning.

▫ Precision: Imprecise. The optimal information size (i.e., number of participants in a meta-analysis) was not met due to small samples in two studies, and the certainty of evidence was downgraded for imprecision.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

▫ Overall strength of research evidence: Low. For most outcomes, data were only available from a single study. There was also inconsistency of some of the findings and variability in the risk of bias in the studies.

Grading of the overall supporting body of research evidence for harms of lamotrigine in borderline personality disorder

▫ Magnitude of effect: None detected. Rates of withdrawal due to adverse effects, as well as the incidence of adverse effects and serious adverse effects, were similar with lamotrigine and placebo.

▫ Risk of bias: Moderate. Of the RCT studies of lamotrigine, the largest study had a moderate risk of bias, whereas the two smaller studies had a low and a high risk of bias.

▫ Directness: Direct. The studies measured the incidence of adverse effects and serious adverse effects.

▫ Consistency: Consistent. The studies were consistent in showing a comparable incidence of adverse effects and serious adverse effects as well as similar rates of study withdrawal due to adverse effects.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

▫ Overall strength of research evidence: Low. Based on the variability in the risk of bias in the studies and imprecision, the overall strength of research evidence was rated as low.

Grading of the overall supporting body of research evidence for benefits of topiramate in borderline personality disorder

▫ Magnitude of effect: Minimal. There was a very small benefit of topiramate on general psychopathology in one small study and anger in two small studies.

▫ Risk of bias: Moderate. Of the RCT studies of topiramate, two had a moderate risk of bias and one had a low risk of bias.

▫ Applicability: Studies were conducted in Germany and Austria. They included individuals with a diagnosis of BPD, but the smaller studies excluded individuals with co-occurring disorders or those who were suicidal. Demographically, the study samples were primarily young adults, with only females in two studies and only males in the third study. No data were obtained on race or ethnicity. Medication dosages that were studied were comparable with those used in usual clinical practice.

▫ Directness: Indirect. The primary outcomes were symptom measures but not specific to BPD severity or functioning.

▫ Consistency: Consistent. The studies were consistent in showing some minimal benefits of topiramate.

▫ Precision: Imprecise. The optimal information size (i.e., number of participants in a meta-analysis) was not met due to small samples, and the certainty of evidence was downgraded for imprecision.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

▫ Overall strength of research evidence: Low. Two of the studies had a moderate risk of bias, results were downgraded for imprecision, and there were significant issues with applicability of the study samples.

Grading of the overall supporting body of research evidence for harms of topiramate in borderline personality disorder

▫ Magnitude of effect: None noted. No study withdrawals due to adverse effects were noted in the two studies that examined this outcome.

▫ Risk of bias: Moderate. Of the RCT studies of topiramate, two had a moderate risk of bias and one had a low risk of bias.

▫ Directness: Indirect. The primary outcome related to adverse effects was study withdrawals.

▫ Consistency: Consistent. The two studies that measured withdrawals due to adverse effects were consistent in showing no study withdrawals for this reason.

▫ Precision: Imprecise. The optimal information size (i.e., number of participants in a meta-analysis) was not met due to small samples, and the certainty of evidence was downgraded for imprecision.

▫ Dose-response relationship: Unable to be assessed. Studies did not include information on dose-response relationships.

▫ Confounding factors (including likely direction of effect): Not identified. No specific confounding effects were noted.

Second-Generation Antidepressants Versus Placebo

One industry-funded RCT (N = 25; Simpson et al. 2004), rated as having a high risk of bias, assessed differences in efficacy between fluoxetine (20–40 mg/day) and placebo. The study duration was 12 weeks. All trial participants were female; the majority were White. Participants in both treatment groups received individual DBT and were part of 2-hour weekly skills groups.

Detailed information on main study characteristics and treatment effects is presented in Appendix D. A3–29 presents certainty-of-evidence ratings.

Certainty-of-evidence ratings of outcomes comparing second-generation antidepressants with placebo

Outcomes	Participants, N (studies)	Certainty of the evidence (GRADE)	Relative effect (95% CI)	Anticipated absolute effects: effect with placebo	Anticipated absolute effects: difference in effect with second-generation antidepressants
Anger (assessed with STAXI; follow-up: mean 10 weeks)	25 (one RCT: Simpson et al. 2004)	⨁◯◯◯; VERY LOW^a,b for no effect of fluoxetine	–	Mean score at endpoint = 27.6	Mean 7.1 lower (ns)
Aggression (assessed with MOAS; follow-up: mean 10 weeks)	25 (one RCT: Simpson et al. 2004)	⨁◯◯◯; VERY LOW^a,b for no effect of fluoxetine	–	Mean score at endpoint = NR	NR (ns)
Functioning (assessed with GAF; follow-up: mean 10 weeks)	25 (one RCT: Simpson et al. 2004)	⨁◯◯◯; VERY LOW^a,b for no effect of fluoxetine	–	Mean score at endpoint = 59.3	Mean 0.6 higher (ns)

The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI = confidence interval; GAF = Global Assessment of Functioning; GRADE = Grading of Recommendations Assessment, Development, and Evaluation; MOAS = Modified Overt Aggression Scale; NR = not reported; ns = not significant; RCT = randomized controlled trial; STAXI = State-Trait Anger Expression Inventory.

^aNo intention-to-treat analysis; downgraded one step for risk of bias.

^bSmall study, does not meet optimal information size (i.e., number of participants in a meta-analysis); downgraded two steps for imprecision.
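
The arithmetic behind the VERY LOW ratings in this table can be sketched as follows. In the GRADE approach, evidence from randomized trials starts at high certainty and is lowered one level per downgrade step; footnotes a and b together subtract three steps. The function and variable names below are illustrative assumptions, not part of the guideline or of any GRADE software.

```python
# Illustrative sketch only: GRADE certainty levels ordered from lowest to highest.
LEVELS = ["VERY LOW", "LOW", "MODERATE", "HIGH"]

def grade_certainty(downgrades, start="HIGH"):
    """Return the certainty level after applying downgrade steps.

    `downgrades` maps a domain name to the number of steps subtracted.
    The rating cannot fall below VERY LOW.
    """
    index = LEVELS.index(start) - sum(downgrades.values())
    return LEVELS[max(index, 0)]

# Footnote a: one step for risk of bias; footnote b: two steps for imprecision.
fluoxetine_outcomes = {"risk of bias": 1, "imprecision": 2}
print(grade_certainty(fluoxetine_outcomes))  # prints "VERY LOW"
```

The same three-step reduction applies to each of the anger, aggression, and functioning outcomes because all three carry footnotes a and b.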

Severity of borderline personality disorder

The study did not report any relevant outcomes.

Severity of symptoms associated with borderline personality disorder

After a mean of 10 weeks, authors reported no significant difference between fluoxetine and placebo on the STAXI and the MOAS (Simpson et al. 2004).

Global impression and functioning

After 10 weeks, there were no significant differences between the two groups on the GAF scale (Simpson et al. 2004).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report any relevant adverse events.

Grading of the overall supporting body of research evidence for antidepressants in borderline personality disorder

Only a single study met inclusion criteria related to antidepressants in BPD and, thus, no grading of the body of research evidence is possible.

Repetitive Transcranial Magnetic Stimulation Versus Sham Treatment

One RCT (N = 9; Cailhol et al. 2014), rated as having a moderate risk of bias, assessed differences in efficacy between 10 sessions of repetitive transcranial magnetic stimulation (rTMS) and sham rTMS. The study duration was 3 months. The majority of trial participants were females between 20 years and 45 years of age, with race being unreported. The severity of disease at baseline was reported by the BPD Severity Index. The study was publicly funded. Detailed information on main study characteristics and treatment effects is presented in Appendix D.

Severity of borderline personality disorder

After 3 months, there were no significant differences on the BPD Severity Index between the rTMS and the sham rTMS groups (Cailhol et al. 2014).

Severity of symptoms associated with borderline personality disorder

The study did not report any relevant outcomes (Cailhol et al. 2014).

Global impression and functioning

After 3 months, differences on the SCL-90 and the Global Assessment Scale favored rTMS over sham treatment, but the differences did not reach statistical significance because of the small sample size (N = 9) (Cailhol et al. 2014).

Incidence of adverse events, serious adverse events, and withdrawal due to adverse events

The study did not report data on the incidence of adverse or serious adverse events. No participants withdrew due to adverse events (Cailhol et al. 2014).

Grading of the overall supporting body of research evidence for repetitive transcranial magnetic stimulation in borderline personality disorder

Only a single study met inclusion criteria related to rTMS in BPD and, thus, no grading of the body of research evidence is possible.

Statement 8 – Pharmacotherapy Review

APA recommends (1C) that a patient with borderline personality disorder receive a review and reconciliation of their medications at least every 6 months to assess the effectiveness of treatment and identify medications that warrant tapering or discontinuation.

Evidence for this statement comes from general principles of clinical care in psychiatric practice. In addition, medication reconciliation and de-prescribing, where indicated, are recommended best practices in hospital as well as outpatient settings (Institute for Safe Medication Practices 2023; The Joint Commission 2022). A detailed systematic review to support this statement is outside the scope of this guideline; however, less comprehensive searches of the literature did not yield any studies related to this recommendation in the context of BPD treatment. Consequently, the strength of research evidence is rated as low.

Grading of the Overall Supporting Body of Research Evidence for Pharmacotherapy Review in Patients With Borderline Personality Disorder

On the basis of the limitations of the evidence for pharmacotherapy review in patients with possible BPD, no grading of the body of research evidence is possible.

Published In

The American Psychiatric Association Practice Guideline for the Treatment of Patients With Borderline Personality Disorder

November 2024
