This guideline was developed using a process intended to meet standards of the Institute of Medicine (2011) (now known as the National Academy of Medicine). The process is fully described in a document available on the APA Web site at: www.psychiatry.org/psychiatrists/practice/clinical-practice-guidelines/guideline-development-process.
Management of Potential Conflicts of Interest
Members of the Guideline Writing Group (GWG) are required to disclose all potential conflicts of interest before appointment, before and during guideline development, and on publication. If any potential conflicts are found or disclosed during the guideline development process, the member must recuse himself or herself from any related discussion and voting on a related recommendation. The members of both the GWG and the Systematic Review Group (SRG), as well as the two consultants, reported no conflicts of interest. The Disclosures section includes more detailed disclosure information for each GWG and SRG member and for the consultants involved in the guideline’s development.
Guideline Writing Group Composition
The GWG was initially composed of seven psychiatrists and one registered nurse with general research and clinical expertise. This non-topic-specific group was intended to provide diverse and balanced views on the guideline topic to minimize potential bias. For subject matter expertise, two experts on AUD were added, one of whom is board-certified in both internal medicine and addiction medicine and the other of whom is board-certified in psychiatry, with subspecialty certification in child and adolescent psychiatry. One consultant (J. M.) was also added to the GWG to provide input on quality measure considerations. An additional consultant (J. K.) assisted with drafting of guideline text. The vice-chair of the GWG (L. J. F.) provided methodological expertise on such topics as appraising the strength of research evidence. The GWG was also diverse and balanced with respect to other characteristics, such as geographical location and demographic background.
Mental Health America reviewed the draft and provided perspective from patients, families, and other care partners.
Systematic Review Methodology
The AHRQ’s systematic review, Pharmacotherapy for Adults With Alcohol-Use Disorders in Outpatient Settings (Jonas et al. 2014), served as the predominant source of information for this guideline. Both the AHRQ review and the guideline are based on a systematic search of available research evidence using MEDLINE (PubMed), Cochrane Library, PsycINFO, CINAHL, and EMBASE databases (Table 1). The search terms and limits used are available in Appendix A. Results were limited to English-language, adult (18 and older), and human-only studies. The search that informed the AHRQ review (Jonas et al. 2014) was from January 1, 1970 to October 11, 2013, and the subsequent search of the literature by APA staff was from September 1, 2013 through April 24, 2016. Literature from the updated search was screened by two reviewers (L. J. F. and S.-H. H.) according to APA’s general screening criteria: RCT, systematic review or meta-analysis, or observational study with a sample of at least 50 individuals; human; study of the effects of a specific intervention or psychiatric disorder or symptoms. Abstracts were then reviewed by one individual (L. J. F.), with verification by a second reviewer (S.-H. H.) to determine whether they met eligibility criteria.
Studies were included if subjects were adults (age 18 years or older) with AUD, including alcohol abuse or alcohol dependence as defined in DSM-IV-TR (American Psychiatric Association 2000), who received treatment with medications approved by the FDA for treating alcohol dependence (acamprosate, disulfiram, naltrexone) or with medications that have been used off-label or are under investigation for treatment of AUD (e.g., amitriptyline, aripiprazole, atomoxetine, baclofen, buspirone, citalopram, desipramine, escitalopram, fluoxetine, fluvoxamine, gabapentin, imipramine, nalmefene, olanzapine, ondansetron, paroxetine, prazosin, quetiapine, sertraline, topiramate, valproate, varenicline, viloxazine). Outcomes could include consumption-related outcomes (e.g., return to any drinking, return to heavy drinking, drinking days, heavy drinking days, drinks per drinking day, time to lapse or relapse), health outcomes (e.g., accidents, injuries, quality of life, function, mortality), and adverse events (including study withdrawal). Studies also needed to be published in English and to include at least 12 weeks of outpatient follow-up from the time of treatment initiation.
Exclusion criteria were studies of children and adolescents under 18 years of age, trials in which the purpose of pharmacotherapy was to treat alcohol withdrawal, trials with craving or cue reactivity as primary outcomes, studies that were conducted predominantly in inpatient settings or with follow-up of less than 12 weeks, and those that were published in languages other than English.
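Read together, these inclusion and exclusion criteria amount to a simple screening rule set. The sketch below is purely illustrative Python under that reading; the Study record, its field names, and the eligible function are invented for illustration, and screening for the review was performed by human reviewers, not by software.

```python
# Illustrative sketch only: a hypothetical Study record and eligibility check
# mirroring the inclusion/exclusion criteria above. Field names are invented;
# actual screening was done by human reviewers.
from dataclasses import dataclass

FDA_APPROVED = {"acamprosate", "disulfiram", "naltrexone"}

@dataclass
class Study:
    adult_subjects: bool            # all subjects age 18 years or older
    aud_diagnosis: bool             # DSM-IV-TR alcohol abuse or alcohol dependence
    medications: set                # medications studied
    treats_withdrawal: bool         # pharmacotherapy aimed at alcohol withdrawal
    craving_primary_outcome: bool   # craving or cue reactivity as a primary outcome
    predominantly_outpatient: bool  # not conducted predominantly in inpatient settings
    followup_weeks: int             # outpatient follow-up from treatment initiation
    english_language: bool

def eligible(study: Study, investigational_drugs: set) -> bool:
    """True if the study meets the inclusion criteria and triggers no exclusion."""
    relevant_drug = bool(study.medications & (FDA_APPROVED | investigational_drugs))
    return (
        study.adult_subjects
        and study.aud_diagnosis
        and relevant_drug
        and study.english_language
        and study.predominantly_outpatient
        and study.followup_weeks >= 12
        and not study.treats_withdrawal
        and not study.craving_primary_outcome
    )
```

The eligible outcome types (consumption-related outcomes, health outcomes, and adverse events) are omitted from the sketch for brevity.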
For each trial identified for inclusion from the updated search, risk of bias was determined (Agency for Healthcare Research and Quality 2014; Viswanathan et al. 2012) on the basis of information from each study that was extracted by one reviewer (L. J. F.) and checked for accuracy by another reviewer (S.-H. H.). In addition to specific information about each reported outcome, extracted information included citation; study design; treatment arms (including doses, sample sizes); co-intervention, if applicable; trial duration and follow-up duration, if applicable; country; setting; funding source; recruitment method; sample characteristics (mean age, percent nonwhite, percent female, percent with co-occurring condition); methods for randomization and allocation concealment; similarity of groups at baseline; overall and differential attrition; cross-overs or other contamination in group composition; adequacy of intervention fidelity; adequacy of adherence; appropriate masking of patients, outcome assessors, and care providers; validity and reliability of outcome measures; appropriateness of statistical methods and handling of missing data; appropriate methods for assessing harms (e.g., well-defined, pre-specified, well-described valid/reliable ascertainment); and adequate follow-up period for assessing harms.
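For readers who prefer to see the extraction template as a structured record, a hypothetical and abbreviated sketch follows; the field names are invented and mirror only a subset of the elements listed above, not the actual extraction form used for the review.

```python
# Hypothetical, abbreviated record mirroring some of the data-extraction elements
# described above; not the extraction form actually used for the review.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExtractedStudy:
    citation: str
    study_design: str
    treatment_arms: list                  # each arm with dose and sample size
    co_intervention: Optional[str]
    trial_duration_weeks: int
    followup_duration_weeks: Optional[int]
    country: str
    setting: str
    funding_source: str
    recruitment_method: str
    sample_characteristics: dict          # mean age, percent nonwhite, percent female, co-occurring conditions
    randomization_adequate: bool
    allocation_concealed: bool
    groups_similar_at_baseline: bool
    overall_attrition: float
    differential_attrition: float
    masking_adequate: bool                # patients, outcome assessors, care providers
    outcome_measures_valid_reliable: bool
    statistical_methods_appropriate: bool
    harms_assessment_adequate: bool
```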
Summary tables (see Appendices B and C) include specific details for each study identified for inclusion from the updated literature search and also include data on studies identified for inclusion in the AHRQ review. For studies from the AHRQ review, study details were obtained from tables published with the AHRQ review by one reviewer (S.-H. H.) and double-checked by a second reviewer (L. J. F.). Data on elements that were not included in the AHRQ review were extracted from the original articles as described above for articles from the updated search.
Available guidelines from other organizations were also reviewed (National Collaborating Centre for Mental Health 2011; Rolland et al. 2016; U.S. Department of Veterans Affairs, U.S. Department of Defense 2015).
Additional targeted searches were conducted in MEDLINE (PubMed) on alcohol biomarkers, patient preferences in AUD pharmacotherapy, and use of pharmacotherapy for AUD during pregnancy and while breastfeeding. The search terms, limits used, and dates of these searches are available in Appendix A. Results were limited to English-language, adult (18 and older), and human-only studies. These titles and abstracts were reviewed for relevance by one individual (L. J. F.).
Rating the Strength of Supporting Research Evidence
Strength of supporting research evidence describes the level of confidence that findings from scientific observation and testing of an effect of an intervention reflect the true effect. Confidence is enhanced by such factors as rigorous study design and minimal potential for study bias.
Ratings were determined, in accordance with the AHRQ’s Methods Guide for Effectiveness and Comparative Effectiveness Reviews (Agency for Healthcare Research and Quality 2014), by the methodologist (L. J. F.) and reviewed by members of the SRG and GWG. Available clinical trials were assessed across four primary domains: risk of bias, consistency of findings across studies, directness of the effect on a specific health outcome, and precision of the estimate of effect.
The ratings are defined as follows:
▫ High (denoted by the letter A) = High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
▫ Moderate (denoted by the letter B) = Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
▫ Low (denoted by the letter C) = Low confidence that the evidence reflects the true effect. Further research is likely to change our confidence in the estimate of effect and is likely to change the estimate.
The AHRQ has an additional category of insufficient for evidence that is unavailable or does not permit estimation of an effect. The APA uses the low rating when evidence is insufficient because there is low confidence in the conclusion and further research, if conducted, would likely change the estimated effect or confidence in the estimated effect.
Rating the Strength of Recommendations
Each guideline statement is separately rated to indicate strength of recommendation and strength of supporting research evidence. Strength of recommendation describes the level of confidence that potential benefits of an intervention outweigh potential harms. This level of confidence is informed by available evidence, which includes evidence from clinical trials as well as expert opinion and patient values and preferences. In contrast to the rating of strength of supporting research evidence (see the section “Rating the Strength of Supporting Research Evidence”), this rating is a consensus judgment of the authors of the guideline and is endorsed by the APA Board of Trustees.
There are two possible ratings: recommendation or suggestion. A recommendation (denoted by the numeral 1 after the guideline statement) indicates confidence that the benefits of the intervention clearly outweigh harms. A suggestion (denoted by the numeral 2 after the guideline statement) indicates greater uncertainty. Although the benefits of the statement are still viewed as outweighing the harms, the balance of benefits and harms is more difficult to judge, or either the benefits or the harms may be less clear. With a suggestion, patient values and preferences may be more variable, and this can influence the clinical decision that is ultimately made. These strengths of recommendation correspond to ratings of strong or weak (also termed conditional) as defined under the GRADE method for rating recommendations in clinical practice guidelines (described in publications such as Guyatt et al. 2008 and others available on the Web site of the GRADE Working Group at http://www.gradeworkinggroup.org/).
When a negative statement is made, ratings of strength of recommendation should be understood as meaning the inverse of the above (e.g., recommendation indicates confidence that harms clearly outweigh benefits).
The GWG determined ratings of strength of recommendation by a modified Delphi method using blind, iterative voting and discussion. So that GWG members could ask for clarification about the evidence, the wording of statements, or the process, the vice-chair of the GWG served as a resource and did not vote on statements. All other formally appointed GWG members, including the chair, voted.
In weighing potential benefits and harms, GWG members considered the strength of supporting research evidence, their own clinical experiences and opinions, and patient preferences. For a statement to be rated as a recommendation, at least eight of the nine voting members had to vote to recommend the intervention or assessment after no more than two rounds of voting; that is, at most one member could cast a vote other than “recommend.” On the basis of the discussion among the GWG members, adjustments to the wording of recommendations could be made between the voting rounds. If this level of consensus was not achieved, the GWG could agree to make a suggestion rather than a recommendation. No suggestion or statement was made if three or more members voted “no statement.” Differences of opinion within the group about ratings of strength of recommendation, if any, are described in the subsection “Balancing of Potential Benefits and Harms in Rating the Strength of the Guideline Statement” for each statement.
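The numerical thresholds in this procedure can be summarized in a brief sketch. The function below is illustrative only, assuming nine voting members and votes recorded as the strings "recommend", "suggest", and "no statement"; the GWG voted by a modified Delphi process, and no such software was used.

```python
# Illustrative sketch of the consensus thresholds described above; the vote labels
# and the function itself are hypothetical, not software used by the GWG.
def rate_statement(votes: list) -> str:
    """Apply the consensus rules to votes from the nine voting GWG members."""
    assert len(votes) == 9, "nine voting members (the vice-chair did not vote)"
    if votes.count("no statement") >= 3:
        return "no statement"          # three or more 'no statement' votes block any statement
    if votes.count("recommend") >= 8:  # at most one vote other than 'recommend'
        return "recommendation"
    return "suggestion possible"       # the GWG could agree to a suggestion instead

# Example: rate_statement(["recommend"] * 8 + ["suggest"]) returns "recommendation".
# Statement wording could be adjusted and a second voting round held before the final rating.
```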
Use of Guidelines to Enhance Quality of Care
Clinical practice guidelines can help enhance quality by synthesizing available research evidence and delineating recommendations for care on the basis of the available evidence. In some circumstances, practice guideline recommendations will be appropriate to use in developing quality measures. Guideline statements can also be used in other ways, such as educational activities or electronic clinical decision support, to enhance the quality of care that patients receive.
Typically, guideline recommendations that are chosen for development into quality measures will advance one or more aims of the Institute of Medicine's (2001) report on “Crossing the Quality Chasm” and the ongoing work guided by the multistakeholder-integrated AHRQ-led National Quality Strategy by facilitating care that is safe, effective, patient-centered, timely, efficient, and equitable. To achieve these aims, a broad range of quality measures (Watkins et al. 2015) is needed that spans the entire continuum of care (e.g., prevention, screening, assessment, treatment, continuing care), addresses the different levels of the health system hierarchy (e.g., system-wide, organization, program/department, individual clinicians), and includes measures of different types (e.g., process, outcome, patient-centered experience). Emphasis is also needed on factors that influence the dissemination and adoption of evidence-based practices (Drake et al. 2008; Greenhalgh et al. 2004; Horvitz-Lennon et al. 2009).
Measure development is complex and requires detailed development of specifications and pilot testing (Center for Health Policy/Center for Primary Care and Outcomes Research and Battelle Memorial Institute 2011; Fernandes-Taylor and Harris 2012; Iyer et al. 2016; Pincus et al. 2016; Watkins et al. 2011). Generally, however, measure development should be guided by the available evidence and focused on measures that are broadly relevant, feasible to implement, and meaningful to patients, clinicians, and policy makers. Often, quality measures will focus on gaps in care or on care processes and outcomes that have significant variability across specialties, health care settings, geographic areas, or patients’ demographic characteristics. Administrative databases, registries, and data from electronic health records can help to identify gaps in care and key domains that would benefit from performance improvements (Acevedo et al. 2015; Patel et al. 2015; Watkins et al. 2016). Nevertheless, for some guideline statements, evidence of practice gaps or variability will be based on anecdotal observations if the typical practices of psychiatrists and other health professionals are unknown. Variability in the use of guideline-recommended approaches may reflect appropriate differences that are tailored to the patient’s preferences, treatment of co-occurring illnesses, or other clinical circumstances that may not have been studied in the available research. On the other hand, variability may indicate a need to strengthen clinician knowledge or address other barriers to adoption of best practices (Drake et al. 2008; Greenhalgh et al. 2004; Horvitz-Lennon et al. 2009). When performance is compared among organizations, variability may reflect a need for quality improvement initiatives to improve overall outcomes but could also reflect case-mix differences such as socioeconomic factors or the prevalence of co-occurring illnesses.
When a guideline recommendation is considered for development into a quality measure, it must be possible to define the applicable patient group (i.e., the denominator) and the clinical action or outcome of interest that is measured (i.e., the numerator) in validated, clear, and quantifiable terms. Furthermore, the health system’s or clinician’s performance on the measure must be readily ascertained from chart review, patient-reported outcome measures, registries, or administrative data. Documentation of quality measures can be challenging, and, depending on the practice setting, can pose practical barriers to meaningful interpretation of quality measures based on guideline recommendations. For example, when recommendations relate to patient assessment or treatment selection, clinical judgment may need to be used to determine whether the clinician has addressed the factors that merit emphasis for an individual patient. In other circumstances, standardized instruments can facilitate quality measurement reporting, but it is difficult to assess the appropriateness of clinical judgment in a validated, standardized manner. Furthermore, utilization of standardized assessments remains low (Fortney et al. 2017), and clinical findings are not routinely documented in a standardized format. Many clinicians appropriately use free text prose to describe symptoms, response to treatment, discussions with family, plans of treatment, and other aspects of care and clinical decision making. Reviewing these free text records for measurement purposes would be impractical, and it would be inappropriate to hold clinicians accountable to such measures without significant increases in electronic medical record use and advances in natural language processing technology.
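As a purely illustrative example of this numerator/denominator structure, the sketch below computes a performance rate from hypothetical patient records; the eligibility flag and documented action are invented placeholders rather than a measure specified or endorsed by this guideline.

```python
# Hypothetical illustration of the numerator/denominator structure of a quality
# measure; the record fields are invented placeholders, not a specified measure.
import math

def performance_rate(records: list) -> float:
    """Rate = numerator / denominator for one clinician, program, or health system."""
    denominator = [r for r in records if r.get("eligible_for_measure")]          # applicable patient group
    numerator = [r for r in denominator if r.get("measured_action_documented")]  # clinical action or outcome of interest
    if not denominator:
        return math.nan  # measure is not reportable when there are no eligible patients
    return len(numerator) / len(denominator)

# Example: three eligible patients, two with the action documented -> 2/3 ≈ 0.67.
```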
Conceptually, quality measures can be developed for purposes of accountability, for internal or health system–based quality improvement, or both. Accountability measures require clinicians to report their rate of performance of a specified process, intermediate outcome, or outcome in a specified group of patients. Because these data are used to determine financial incentives or penalties based on performance, accountability measures must be scientifically validated, have a strong evidence base, and fill gaps in care. In contrast, internal or health system–based quality improvement measures are typically designed by and for individual providers, health systems, or payers. They typically focus on measurements that can suggest ways for clinicians or administrators to improve efficiency and delivery of services within a particular setting. Internal or health system–based quality improvement programs may or may not link performance with payment, and, in general, these measures are not subject to strict testing and validation requirements. Quality improvement activities, including performance measures derived from these guidelines, should yield improvements in quality of care to justify any clinician burden (e.g., documentation burden) or related administrative costs (e.g., for manual extraction of data from charts, for modifications of electronic medical record systems to capture required data elements). Possible unintended consequences of any derived measures would also need to be addressed in testing of a fully specified measure in a variety of practice settings. For example, highly specified measures may lead to overuse of standardized language that does not accurately reflect what has occurred in practice. If multiple discrete fields are used to capture information on a paper or electronic record form, data will be easily retrievable and reportable, but oversimplification is a possible unintended consequence of measurement. Just as guideline developers must balance the benefits and harms of a particular guideline recommendation, developers of performance measures must weigh the potential benefits, burdens, and unintended consequences in optimizing quality measure design and testing.
External Review
This guideline was made available for review in February 2017 by stakeholders, including the APA membership, scientific and clinical experts, allied organizations, and the public. In addition, a number of patient advocacy organizations were invited to provide input. Forty-eight individuals and 12 organizations submitted comments on the guideline (see the section “Individuals and Organizations That Submitted Comments” for a list of the names). Dr. Raymond Anton provided significant helpful input on the implementation section of Statement 3 (Use of Physiological Biomarkers). The chair and co-chair of the GWG reviewed and addressed all comments received; substantive issues were reviewed by the GWG.
Funding and Approval
This guideline development project was funded and supported by the APA without any involvement of industry or external funding. The guideline was submitted to the APA Assembly and the APA Board of Trustees and was approved on May 20, 2017, and July 16, 2017, respectively.