Peer-administered interventions (PAIs), delivered by nonprofessionals with a history of mental illness (1), have been gaining acceptance (2). PAIs can increase the availability, affordability, and scalability of psychotherapeutic treatments. PAIs are more commonly used in low- and middle-income countries than in high-income ones, but the fidelity of these interventions has rarely been assessed (1–3).
Treatment fidelity has been defined as the extent to which a treatment is delivered as intended and is composed of two elements: adherence and competence (4, 5). Adherence represents the degree to which the therapeutic techniques used are consistent with the treatment protocol, and competence represents the level of skill and judgment shown by therapists (4, 5). Fidelity assessment can increase confidence that the changes seen during treatment are due to the intervention, inform whether a provider is delivering therapy effectively, and determine whether therapist training needs to be modified (6).
Treatment manuals, validated rating scales, and supervision are recommended to optimize fidelity (7). However, fidelity assessment in clinical settings remains challenging because of the time required to perform it (4); its cost; and the effort required to develop rating systems, to train raters, and to establish rating reliability (8). Although the task-shifting of psychotherapy to peer facilitators could help address treatment gaps, the dearth of professionals available to assess fidelity and to provide psychotherapy supervision remains a barrier to widespread implementation of PAIs (3, 9).
Shifting the task of fidelity assessment and supervision to peers could partially mitigate these bottlenecks (9). However, peer-led supervision is only possible if peers can effectively evaluate intervention fidelity. Prior studies (2, 9) have suggested that peer-led fidelity evaluation is feasible in low-resource settings, but this task-shifting has not been investigated for PAIs for postpartum populations in high-income countries. The objective of this study was to determine whether individuals who have recovered from postpartum depression (i.e., peers) can effectively and reliably assess the fidelity of a peer-delivered online group cognitive-behavioral therapy (CBT) intervention, compared with an expert psychiatrist and with a trained graduate student.
Methods
The study took place from March 28, 2022, to May 30, 2022. Ethical approval was obtained from the Hamilton Integrated Research Ethics Board (approval 3781). Mothers who had recovered from postpartum depression were recruited and trained to deliver a structured 9-week online group CBT intervention to mothers with current postpartum depression (10). The intervention’s weekly 2-hour sessions consisted of teaching and practicing CBT skills, followed by discussing topics relevant to people with postpartum depression (e.g., sleep, support, transitions). For this study, two 9-week groups were held simultaneously and were each delivered by two randomly selected peer facilitators.
One expert psychiatrist (R.J.V.L., who developed the treatment used in this study), one psychiatry graduate student (Z.B.), and three peers who had previously delivered this intervention each independently viewed and rated video recordings of the peer-delivered CBT sessions (10). All study participants signed an online consent form.
Adherence and competence scales were developed by an expert psychologist (P.J.B.) and a psychiatrist (R.J.V.L.). The development of these scales was based on these experts’ experience in developing and delivering group CBT and in providing supervision (11), as well as on fidelity measures developed for another group CBT intervention for depression, Building Recovery by Improving Goals, Habits, and Thoughts (5).
Because the content of the intervention’s nine sessions varied, the adherence scale was composed of different items for each session and assessed topics such as agenda setting, content delivery, and homework review. Individual items were rated on a 3-point Likert scale ranging from 0, not covered at all, to 2, adequate coverage, or on a 4-point Likert scale ranging from 0, not covered at all, to 3, thorough coverage, with higher scores indicating greater adherence. Possible total adherence scores varied by session, from a maximum of 15 (sessions 4, 6, and 9) to a maximum of 31 (session 1).
The competence scale was composed of the same seven items for all nine sessions, with individual item scores ranging from 0, low competence, to 6, expert/high competence. Competence was assessed on structure and use of time, genuineness, empathy, collaboration, guided discovery, group participation, and emotional expression elicited. Possible competence scores for each session ranged from 0 to 42, with higher scores indicating greater competence.
The graduate student and peer facilitators were trained by the expert psychiatrist; training consisted of two 3-hour sessions. The first training session familiarized the trainees with the concept of fidelity and with the measures. During the second session, the trainees independently rated two previously recorded 2-hour peer-delivered sessions, and their ratings were compared and discussed.
After the training sessions, recorded group CBT sessions were sent to the expert psychiatrist, the graduate student, and the three peer facilitators weekly. One peer rated all nine sessions of one group; the two other peers rated all nine sessions of the other group. The expert psychiatrist and graduate student rated all 18 sessions. Each rater independently used the adherence and competence scales to rate the sessions and then sent their ratings to the research coordinator. Adherence and competence rating scores were standardized for each session and rater by dividing them by the total possible score for a given session. Interrater reliability was calculated for the expert psychiatrist versus the graduate student, for the expert psychiatrist versus the peer facilitators, and for the graduate student versus the peer facilitators.
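As an illustrative sketch of this standardization step (the function name and example values below are ours, not from the study), each raw session total is simply divided by that session’s maximum possible score:

```python
# A minimal sketch of the score standardization described above; the function
# name and example values are hypothetical. Adherence maxima vary by session
# (a maximum of 31 for session 1 and 15 for sessions 4, 6, and 9), whereas
# the competence maximum is 42 for every session (7 items x 0-6 points).
def standardize(raw_total: float, max_total: float) -> float:
    """Divide a raw session rating by the session's total possible score."""
    return raw_total / max_total

print(round(standardize(27, 31), 2))  # session 1 adherence: 0.87
print(round(standardize(36, 42), 2))  # competence, any session: 0.86
```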
Interrater reliability was calculated by using intraclass correlation coefficients (ICCs) for the three types of raters across sessions. Because not all peers rated the same sessions, a one-way random-effects model was used to calculate ICCs between the expert psychiatrist and the peer facilitators and between the graduate student and the peer facilitators. A two-way random-effects model was used to calculate ICCs between the expert psychiatrist and the graduate student because both rated all 18 sessions. We used nonparametric analyses to account for our small number of total raters (N=5) and sessions (N=18). Wilcoxon signed-rank tests were used to compare differences between the mean adherence and competence ratings of the three types of raters. Data were analyzed with SPSS Statistics, version 28, and statistical significance was set at p<0.05.
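For readers who wish to reproduce this style of analysis, the sketch below shows one way to compute the ICCs and Wilcoxon tests in Python (with the pingouin and SciPy libraries) rather than SPSS; the file name, column names, and rater labels are hypothetical, and the sketch collapses each session’s peer ratings to a single value for simplicity.

```python
# A minimal sketch of the reliability analysis described above; assumes a
# hypothetical long-format file with one standardized adherence rating per
# rater type per session. The study itself used SPSS Statistics, version 28.
import pandas as pd
import pingouin as pg
from scipy.stats import wilcoxon

df = pd.read_csv("ratings_long.csv")  # columns: session, rater, adherence

def icc_between(data, rater_a, rater_b, icc_type):
    """Return the ICC of the requested type for one pair of rater types."""
    pair = data[data["rater"].isin([rater_a, rater_b])]
    table = pg.intraclass_corr(data=pair, targets="session",
                               raters="rater", ratings="adherence")
    return table.set_index("Type").loc[icc_type, "ICC"]

# Two-way random-effects model (ICC2): psychiatrist vs. graduate student,
# who both rated all 18 sessions (a fully crossed design).
print(icc_between(df, "psychiatrist", "student", "ICC2"))

# One-way random-effects model (ICC1) for comparisons involving peers,
# because the three peers did not all rate the same sessions.
print(icc_between(df, "psychiatrist", "peer", "ICC1"))
print(icc_between(df, "student", "peer", "ICC1"))

# Wilcoxon signed-rank test comparing paired ratings across sessions.
wide = df.pivot_table(index="session", columns="rater", values="adherence")
stat, p = wilcoxon(wide["psychiatrist"], wide["peer"])
print(f"Wilcoxon W={stat:.1f}, p={p:.3f}")
```

The same calls would be repeated for the competence ratings.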
Results
Interrater reliability for adherence was excellent between the expert psychiatrist and the peer facilitators (ICC=0.98) and between the psychiatrist and the graduate student (ICC=0.91) and was good between the student and the peers (ICC=0.88).
For competence, interrater reliability was excellent between the psychiatrist and the peers (ICC=0.96) and between the student and the peers (ICC=0.92) and was good between the psychiatrist and the student (ICC=0.88).
No statistically significant differences in adherence or competence rating scores were found among the three types of raters. Mean adherence ratings for the three types of raters varied by only 0.02, and mean competence ratings varied by only 0.03 (Table 1). Score ranges for adherence and competence among the three types of raters were also narrow.
Discussion
The results of this small study suggest that, with sufficient training and practice, peer facilitators can rate the fidelity of peer-delivered group CBT for postpartum depression as effectively as an expert and a trained student rater. These results represent the first step toward training peer facilitators to provide feedback to other peer facilitators during expert-led supervision and toward eventually shifting the task of supervision from experts to peers.
Our results were consistent with those of Singla and colleagues (9), who compared peer and expert ratings of a peer-delivered behavioral intervention for perinatal depression. The structured scales used by Singla et al. to measure treatment-specific (e.g., homework assignment) and general skills (e.g., peer-client collaboration) were similar to our adherence and competence scales. In that study, peer providers used their ratings to guide the provision of supervision for select sessions later in the trial. Singla et al. found that, with training and practice, peers were able to rate sessions as reliably as experts, and they reported that the use of structured scales to rate therapy quality enabled effective supervision. In another study, Singla and colleagues (2) examined the interrater reliability of therapy quality ratings for an intervention for depression and alcohol use and found that rating consistency between the expert and peers was achieved after 8 months of practice.
In addition to the use of structured scales, thorough training is important for effective fidelity assessment. In our study, after two training sessions, the three peer facilitators evaluated fidelity as effectively as did the expert psychiatrist and the trained graduate student. In the studies by Singla and colleagues (2, 9), peers needed more than 6 months of practice and supplementary training to rate sessions as effectively as did the experts. This difference may have been caused by the large numbers of raters in those studies or may reflect varying education and experience levels among raters.
The primary limitation of this preliminary study was its small number of sessions and peer raters, which challenges the generalizability of our results to other peers and to future therapy sessions. In addition, use of the ICC method to calculate agreement among a small sample of raters may have overestimated interrater reliability. Although we believe that our scales, developed and revised by a perinatal psychiatrist and by a psychologist with extensive experience in scale development and validation, have face validity and content validity, we did not formally assess their criterion or construct validity, given the lack of a gold standard with which to compare our measures (5). Because of the limited data collected, the analysis was conducted only at the whole-scale level rather than for individual items on the scales. Moreover, our study was conducted in a high-income Western country, which may limit the generalizability of the results to other parts of the world. Debate is ongoing about whether a distinction should be made between adherence and competence or whether a broader term, such as therapy quality, might be more meaningful (12). We acknowledge that, in clinical practice, high adherence is of little use in the presence of low competence (i.e., doing the right things poorly), as is high competence in the presence of low adherence (i.e., doing the wrong things well). In the context of a treatment delivered by nonprofessionals (i.e., peers), we decided to distinguish between applying the right psychotherapeutic procedures (i.e., adherence) and implementing them skillfully (i.e., competence).
Conclusions
The results of this small study suggest that trained peer facilitators who have recovered from postpartum depression may effectively evaluate the fidelity of peer-delivered group CBT for postpartum depression. The next step is to conduct a larger study, with more peers and sessions, to confirm our preliminary interrater reliability results. Future work should also incorporate item-level analysis to obtain more specific insight into treatment integrity and should investigate potential drawbacks of distinguishing between adherence and competence. Additional studies are needed to support shifting the task of supervising peer facilitators from experts to trained peers. Finally, the criterion and construct validity of the scales used to measure adherence and competence require formal assessment.