The Global Assessment of Functioning scale (GAF) (1) is a measure of illness severity that was incorporated into the DSM-III-R and DSM-IV multiaxial diagnostic systems. The GAF is easily and quickly administered and is perhaps the most commonly used assessment instrument for psychiatric patients. All psychometric measures involve a trade-off between the ease of administration that comes with brevity and the comprehensiveness that comes with more time-consuming evaluations. The GAF, like other global measures, combines several distinct domains that may vary independently (2,3). A GAF score should reflect a patient's most severe symptoms and social-occupational dysfunction. However, when symptom severity and functional impairment are not congruent, the GAF score is based on the domain (symptoms or functioning) with the greater impairment and gives no information about other features of illness. This can create problems when the GAF is used to assess severity or monitor change in only one aspect of illness. Some studies suggest that GAF ratings tend to reflect symptom severity (4–8), in which case functional impairment may contribute minimal information. Discordance between different aspects of illness can also make a GAF rating harder to assign and may affect the scale's reliability and validity. Consistent with these concerns, both relatively low (3,9–11) and high (5,8,9,12,13) interrater reliabilities have been reported, and the GAF tends to be a poor predictor of functional outcome (6,14).
To overcome these shortcomings, some researchers have developed multiple scales that provide global ratings for specific aspects of illness (4,15,16). One research group generated a second scale by extracting the GAF descriptors that refer only to functioning (the Social and Occupational Functioning Assessment Scale, or SOFAS) (3). The SOFAS retains the same format as the GAF, shows good predictive validity (5,12,17,18), and was added to DSM-IV-TR as a scale for further study. Unfortunately, the GAF was left unchanged, so there is considerable overlap between the GAF and the SOFAS and continued ambiguity about the meaning of GAF ratings. Many treatment facilities routinely use the GAF to measure treatment response and to evaluate psychiatric care programs (2). Given the widespread use of the GAF, its scores should be unambiguous.
The primary aim of this research was to improve the precision of routine clinical assessments that use DSM-IV rating scales. To achieve this goal, we split the GAF into its two principal domains. The first scale includes the GAF descriptors of functioning (SOFAS) (3). The second was created by removing the SOFAS items from the GAF, leaving only descriptors of symptoms (the Global Assessment of Psychopathology Scale, or GAPS). The two scores are compared, and, as instructed by DSM-IV, the GAF score is the more severe of the two ratings (that is, the lower of the SOFAS and GAPS scores). This study had two specific goals. The first was to determine the extent to which the GAPS and SOFAS contribute to GAF scores of patients with psychosis, and the second was to assess the concurrent validity of these subscales.
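The scoring rule just described can be stated as a small function. This is a minimal sketch. It assumes both subscales use the GAF's 1–100 metric, with higher scores meaning better status. The function name and the domain labels it returns are hypothetical and only illustrate which domain determines the GAF.

```python
def gaf_from_subscales(gaps: int, sofas: int) -> tuple[int, str]:
    """Return the GAF score and the domain that determined it (hypothetical helper).

    Assumes both subscales are scored 1-100 with higher = better, so the
    GAF is the lower (more severe) of the two ratings.
    """
    for name, score in (("GAPS", gaps), ("SOFAS", sofas)):
        if not 1 <= score <= 100:
            raise ValueError(f"{name} score must be between 1 and 100, got {score}")
    if gaps < sofas:
        return gaps, "symptoms"      # symptom severity drives the GAF
    if sofas < gaps:
        return sofas, "functioning"  # functional impairment drives the GAF
    return gaps, "both"              # equal ratings: either domain gives the GAF


# Hypothetical patient: severe positive symptoms, moderately impaired functioning.
print(gaf_from_subscales(gaps=30, sofas=45))  # (30, 'symptoms')
```

The returned label shows that, whenever the GAPS is lower, the SOFAS rating has no effect on the GAF score.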
Methods
Participants
Two patient samples were obtained. The first included inpatients from a ward for treatment-refractory psychosis, and the second comprised patients experiencing their first episode of psychosis. Patients with psychosis resulting from a general medical condition, substance use, or intellectual disability were excluded. Data from patients with treatment-refractory psychosis were collected as part of routine clinical practice. First-episode patients were recruited during a study of first-episode psychosis.
Patients with treatment-refractory psychosis.
The treatment-refractory sample was obtained from 314 consecutive admissions between 2001 and 2010. Eight patients had two admissions, and only the second was included in the analysis. Of the 306 patients, 18 (6%) were excluded (eight left the hospital before a full assessment could be completed, seven had a psychosis resulting from a general medical condition, two had intellectual disabilities, and one had an alcohol-induced psychosis). Each of the other 288 patients received a full multiaxial DSM-IV diagnosis obtained by consensus between at least two psychiatrists, one psychologist, a social worker, and a psychiatric nurse. Diagnoses were based on a comprehensive psychiatric, social work, and nursing assessment of each patient and a review of all available hospital records from previous admissions (19). The three diagnostic groups were nonaffective psychosis (150 with schizophrenia, 11 with psychosis not otherwise specified, and one with delusional disorder), schizoaffective disorder (104 patients), and mood disorder (15 with bipolar disorder and seven with depression with psychotic features).
First-episode patients.
A total of 124 patients were recruited from an early psychosis intervention program between 2001 and 2006. Five patients (4%) with a diagnosis of substance-induced psychosis were excluded. Each of the 119 remaining patients was given a diagnosis through a consensus process similar to the one used for the patients with treatment-refractory psychosis. Information used to make the diagnosis included a psychiatric interview, longitudinal assessment for one year, a Structured Clinical Interview for DSM-IV, and an interview with at least one family member. Diagnostic groups were nonaffective psychosis (58 with schizophrenia, six with schizophreniform disorder, eight with psychosis not otherwise specified, and two with delusional disorder), schizoaffective disorder (N=20), and mood disorder (16 with bipolar disorder and nine with depression with psychotic features). A six- to 12-month follow-up assessment (median=44 weeks) was completed for 87 patients; 32 refused to participate or were lost to follow-up.
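The participant counts for both samples can be checked with simple arithmetic. The sketch below adds up the reported exclusions and diagnostic subgroups again. The function and dictionary names are illustrative only; every count comes from this section.

```python
def analyzed_n(recruited, duplicates_removed, exclusions):
    """Return (eligible, analyzed) counts after dropping duplicates and exclusions."""
    eligible = recruited - duplicates_removed
    return eligible, eligible - sum(exclusions.values())


# Treatment-refractory sample: 314 admissions, 8 repeat admissions dropped.
tr_eligible, tr_n = analyzed_n(
    314, 8,
    {"left before assessment": 8, "general medical condition": 7,
     "intellectual disability": 2, "alcohol-induced psychosis": 1},
)
tr_diagnoses = {"nonaffective": 150 + 11 + 1, "schizoaffective": 104, "mood": 15 + 7}

# First-episode sample: 124 recruited, no duplicates, 5 substance-induced excluded.
fe_eligible, fe_n = analyzed_n(124, 0, {"substance-induced psychosis": 5})
fe_diagnoses = {"nonaffective": 58 + 6 + 8 + 2, "schizoaffective": 20, "mood": 16 + 9}
fe_followup = {"assessed": 87, "refused or lost": 32}

print(tr_eligible, tr_n, sum(tr_diagnoses.values()))  # 306 288 288
print(fe_n, sum(fe_diagnoses.values()), sum(fe_followup.values()))  # 119 119 119
# Exclusion rates as whole percentages of the eligible samples.
print(round(100 * (tr_eligible - tr_n) / tr_eligible), round(100 * (fe_eligible - fe_n) / fe_eligible))  # 6 4
```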
Approval for all data collection was obtained from the Clinical Research Ethics Board of the University of British Columbia.
Measures
We split the GAF into two scales (GAPS and SOFAS) in January 2001 and thereafter collected separate ratings for these two components of the GAF. The SOFAS included the GAF descriptors of functioning as described in DSM-IV (3). The GAPS was created by removing the SOFAS items from the GAF, which left only the descriptors of symptoms. Ratings were made at the time of referral and at discharge or follow-up.
DSM-IV instructs that the GAF rating reflect the most severe symptoms or functioning observed at the time of assessment. As would be the case with a standard full-GAF rating, the GAF score therefore reflected the lower of the two subscale scores. Illness severity at referral and at discharge or follow-up was also assessed for all patients with the Clinical Global Impression-severity subscale (CGI-severity) (20); clinical change was assessed at discharge or follow-up with the Clinical Global Impression-improvement subscale (CGI-improvement) (20). For patients with treatment-refractory psychosis, each rating (GAPS, SOFAS, CGI-severity, and CGI-improvement) reflected a consensus decision among at least two psychiatrists, a psychologist, a social worker, and a psychiatric nurse. Each rating was made after a detailed review of patient functioning during the present and all past episodes of illness. First-episode patients were rated by a research psychiatrist or psychologist trained in the use of these instruments. These assessments were based on information obtained during multiple patient and family interviews.
The Positive and Negative Syndrome Scale (PANSS) (21) was completed at referral for all 407 patients. It was completed again at discharge or follow-up for all 357 patients with a discharge or follow-up assessment (270 treatment-refractory and 87 first-episode patients). These assessments were made by the treating psychiatrist, who was trained in the use of the PANSS. Most of the patients in the first-episode sample were outpatients, whereas all patients in the treatment-refractory sample were inpatients. Because functioning means something quite different in these two settings, no single measure could capture the full range of possible functioning. For the treatment-refractory group, the Routine Assessment of Patient Progress (RAPP) (22) was completed at both admission and discharge by nursing staff trained in the use of this instrument. The three subscales of the RAPP (psychopathology, basic needs, and life skills) measure functioning over a one-week period. Because most first-episode participants were outpatients, a nursing assessment such as the RAPP could not be completed for them. Instead, the treating clinician rated their level of functioning with the Role Functioning Scale (RFS) (23). The RFS includes an assessment of work productivity, independent living, and usual social networks and therefore could not be used with the inpatient sample. The RFS was completed for all first-episode patients at referral and for 87 at follow-up. The PANSS, RAPP, and RFS were completed independently and blind to other assessments.
Statistical analyses
No data were missing for any of the demographic variables or for GAPS, SOFAS, CGI-severity, or PANSS scores at recruitment. An exploration of the distributions revealed significant kurtosis and skewness for the GAPS, SOFAS, and GAF scores, and transforming the data could not correct these abnormalities. The PANSS, RAPP, and RFS scores were also not normally distributed. Parametric statistics applied to non-normally distributed data are likely to misrepresent true associations, so nonparametric statistics were used in all analyses. The magnitude of associations between continuous variables was assessed with Spearman's rho (rs). Differences in the strength of these nonindependent correlations were tested with the methods described by Steiger (24), using a program provided at www.stat-help.com. Differences between men and women were tested with the Mann-Whitney U test (reported as z scores). Differences among the three diagnostic groups were tested with the Kruskal-Wallis test, followed by adjusted post hoc pairwise comparisons.
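Spearman's rho is the Pearson correlation computed on ranks, with tied values given the average of the ranks they occupy. GAPS and SOFAS scores are coarse and often tied, so tie handling matters. Below is a minimal standard-library sketch. The helper names are hypothetical, the scores in the example are invented, and in practice a statistics package would normally be used.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(list(x)), average_ranks(list(y))
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical GAPS and SOFAS scores for five patients (ties in both).
gaps = [25, 30, 30, 35, 40]
sofas = [31, 41, 45, 50, 50]
print(round(spearman_rho(gaps, sofas), 3))  # 0.947
```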
Results
Demographic and clinical characteristics
Demographic characteristics of the 288 patients with treatment-refractory psychosis and the 119 first-episode patients are shown in Table 1. In the treatment-refractory group, the median duration of hospital stay was 29 weeks, and most patients had improved substantially by discharge (Table 2). Eighteen patients had not been discharged at the time of this study, so their discharge ratings had not yet been completed. All assessments of patients with treatment-refractory psychosis were completed as part of standard clinical practice. All first-episode patients were acutely psychotic at referral, and most had improved substantially by follow-up (Table 2).
Age at presentation showed consistently low associations with all clinical measures in both the treatment-refractory and first-episode groups (all r=−.15 to .16). In the treatment-refractory sample, women were more severely ill at admission than men on three measures. On the GAPS, the means were 25 versus 26 out of a possible 100, with higher scores indicating less severe symptoms (z=2.1, p<.03). On the PANSS general psychopathology scale, the means were 49 versus 47 out of a possible 112, with higher scores indicating greater severity (z=2.3, p=.02). On the RAPP total score, the means were 22 versus 19 out of a possible 63, with higher scores indicating greater severity (z=2.9, p<.01). No significant gender differences were found among first-episode patients. Ratings of symptoms and functioning at admission did not differ significantly across diagnostic groups.
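The gender comparisons above are reported as Mann-Whitney z scores. The sketch below shows one common way to obtain such a z: the normal approximation to U, with a tie correction and no continuity correction. The study does not state which software or settings were used, so z values from this sketch need not match the reported ones. The helper names and the example scores are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mann_whitney_z(group_a, group_b):
    """Tie-corrected normal-approximation z for the Mann-Whitney U test.

    Positive z means values in group_a tend to be larger than in group_b.
    """
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over every set of t tied values.
    ties = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are tied; z is undefined")
    return (u_a - n1 * n2 / 2) / math.sqrt(var_u)


# Hypothetical GAPS scores (higher = less severe) for two small groups.
women = [22, 25, 25, 28, 30]
men = [24, 27, 29, 31, 33]
print(round(mann_whitney_z(women, men), 2))  # -1.15
```

Published reports often give |z| without a sign, as here. The sign only indicates which group ranks higher.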
Contribution of the GAPS and SOFAS to GAF scores
GAPS scores range from very severe (1–10) to no problems (90–100), and the presence of psychotic symptoms requires a rating of 40 or lower. All patients had psychotic symptoms at presentation, and therefore all GAPS scores were 40 or less. In addition, the lowest scores on this scale require persistent danger of severely hurting self or others, and virtually no patient met this criterion. Because of these features of the scale, GAPS scores were mostly between 11 and 40. SOFAS scores were more variable among first-episode patients, but patients in the treatment-refractory sample tended to score between 11 and 50 (Table 3).
GAPS and SOFAS scores were strongly and positively correlated in the treatment-refractory and first-episode samples. At admission or recruitment the correlations were r=.69 and r=.75, respectively, and at discharge or follow-up they were r=.71 and r=.82, respectively. Similar correlations were found for men and women and for each diagnostic group (all r=.63 to .83). At presentation, GAPS scores were lower than SOFAS scores for 273 patients (67%) in the total sample, the two scores were equal for 76 (19%), and SOFAS scores were lower for 58 (14%). The distribution remained skewed toward lower GAPS scores at discharge or follow-up: 198 (55%), 76 (21%), and 83 (23%) patients, respectively.
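The three-way comparison above (lower GAPS, equal, lower SOFAS) is a simple tabulation over paired ratings. The following is a minimal sketch with hypothetical helper names. It classifies invented rating pairs and then recomputes the reported whole-number percentages from the category counts given in the text.

```python
from collections import Counter


def compare_subscales(pairs):
    """Classify (GAPS, SOFAS) pairs by which subscale is lower (more severe)."""
    counts = Counter({"lower GAPS": 0, "equal": 0, "lower SOFAS": 0})
    for gaps, sofas in pairs:
        if gaps < sofas:
            counts["lower GAPS"] += 1
        elif gaps == sofas:
            counts["equal"] += 1
        else:
            counts["lower SOFAS"] += 1
    return counts


def as_percentages(counts):
    """Convert category counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Hypothetical ratings for three patients.
print(compare_subscales([(30, 45), (40, 40), (35, 25)]))

# Category counts reported in the text for the combined sample.
at_presentation = {"lower GAPS": 273, "equal": 76, "lower SOFAS": 58}
at_followup = {"lower GAPS": 198, "equal": 76, "lower SOFAS": 83}
print(as_percentages(at_presentation))  # {'lower GAPS': 67, 'equal': 19, 'lower SOFAS': 14}
print(as_percentages(at_followup))      # {'lower GAPS': 55, 'equal': 21, 'lower SOFAS': 23}
```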
The same pattern of differences between GAPS and SOFAS scores was found among men and women, but the treatment-refractory and first-episode groups differed. For first-episode patients, GAPS ratings were virtually always lower than or equal to SOFAS scores at recruitment (118 of 119 patients, 99%) and at follow-up (84 of 87 patients, 97%). This pattern was less marked in the treatment-refractory group at admission (231 of 288 patients, 80%) and discharge (190 of 270 patients, 70%).
Among first-episode patients, GAPS ratings were lower than or equal to SOFAS ratings for virtually all patients in each diagnostic group, but results varied more across diagnoses in the treatment-refractory group. In that group, GAPS scores were lower than or equal to SOFAS ratings for most patients with nonaffective psychosis (134 of 162 patients, 83%) or schizoaffective disorder (83 of 104 patients, 80%). This pattern was less marked among patients with mood disorders (13 of 22 patients, 59%). Of the 22 treatment-refractory patients with a mood disorder, half had a SOFAS score lower than (nine patients, 41%) or equal to (two patients, 9%) the GAPS score. This suggests that functional impairment ratings made a relatively large contribution to GAF scores among patients with mood disorders.
As noted above, the GAF score should be based on the features of illness that are most severe—either symptom severity or functional impairment. Lower scores on the GAPS than the SOFAS among most patients indicated that the GAF rating reflected symptom severity more often than functional impairment. This resulted in a very strong correlation between the GAF and the GAPS and a significantly weaker association between the GAF and the SOFAS (treatment-refractory group, .97 versus .76, z=16.9, p<.01; first-episode group, .99 versus .75, z=16.2, p<.01). This pattern of correlations was observed for each gender and across diagnostic groups.
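The GAF–GAPS and GAF–SOFAS correlations share the GAF, so they are not independent, which is why a test for dependent correlations is needed. Below is a minimal sketch of Steiger's (1980) Z for two correlations that share one variable, using Fisher's z transformation and the pooled-correlation covariance term. This textbook form is assumed, not confirmed, to match the program the authors used, and applying it to Spearman coefficients is itself an approximation. Pairing the rounded treatment-refractory values with the admission GAPS–SOFAS correlation (.69) is also an assumption, because the text does not state the time point. With these inputs the sketch gives roughly 16, close to but not identical with the reported z=16.9. Z is very sensitive to rounding of coefficients near 1, which may account for much of the gap.

```python
import math


def steiger_z(r_jk, r_jh, r_kh, n):
    """Steiger's (1980) Z for comparing two dependent correlations.

    r_jk and r_jh share variable j (here, the GAF); r_kh is the correlation
    between the two other variables (here, GAPS and SOFAS).
    """
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher z transforms
    r_bar = (r_jk + r_jh) / 2
    # Asymptotic covariance term of the two correlations, evaluated at r_bar.
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    # Covariance of the two Fisher z values.
    c = psi / (1 - r_bar**2) ** 2
    return (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)


# Rounded treatment-refractory values: GAF-GAPS, GAF-SOFAS, GAPS-SOFAS (assumed admission).
z = steiger_z(0.97, 0.76, 0.69, n=288)
print(f"z = {z:.1f}")
```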
Concurrent validity
Concurrent validity was determined by comparing GAPS and SOFAS ratings with other clinical measures completed at the same time. The GAPS and SOFAS were evaluated as measures of clinical change by correlating their change scores with change scores on other clinical measures. These comparisons were made separately for each patient group, and within each group the effects of gender and diagnosis were evaluated.
Patients with treatment-refractory psychosis.
Positive symptoms were more strongly correlated with GAPS scores than with SOFAS scores, and negative symptoms were more strongly associated with SOFAS ratings (Table 4). This pattern of associations was evident at both admission and discharge. Correlations with the functional part of the RAPP (life skills) tended to be greater with the SOFAS than with the GAPS, although this difference was significant only at discharge. Correlations with the GAPS were similar to those with the SOFAS for all other measures and did not significantly differ across gender or diagnosis. GAPS and SOFAS change scores were strongly correlated with each other (rs=.71) and with changes in other clinical measures (Table 5). The strength of correlations between the GAPS and other clinical measures did not significantly differ from the strength of those obtained with the SOFAS.
First-episode patients.
Positive symptoms were more strongly associated with GAPS ratings than with SOFAS ratings, and negative symptoms showed a trend toward a greater correlation with SOFAS ratings (Table 4). Compared with the GAPS, the SOFAS was significantly more strongly correlated with total role functioning on the RFS, both at presentation and at follow-up. In addition, the CGI-severity score was more strongly associated with GAPS than with SOFAS ratings. These associations did not differ significantly by gender. There were too few patients with schizoaffective and mood disorders to allow comparisons between diagnostic groups in the first-episode sample. Changes in GAPS and SOFAS scores between recruitment and follow-up were strongly correlated with each other (rs=.72) and with changes in other clinical measures (Table 5). Changes in GAPS scores were more strongly associated than changes in SOFAS scores with changes in CGI-severity scores. Conversely, SOFAS changes were more strongly associated with changes on the RFS.
Discussion
The range of symptom severity ratings (GAPS) was restricted among patients with psychosis, and virtually all ratings were between 11 and 40. Ratings of functional impairment (SOFAS) varied more among first-episode patients, although in the treatment-refractory sample they were largely restricted to between 11 and 50. The GAF is based on the most severe symptoms or functioning, and ratings indicated greater symptom severity than functional impairment for most patients. This was especially the case for acutely ill patients, first-episode patients, and those with a nonaffective psychosis. These observations suggest that GAF scores of patients with psychosis primarily reflect symptom severity and that functional impairment contributes relatively little information. This finding is consistent with previous research in which GAF scores were strongly related to symptom severity (5–8,24). It is noteworthy that the relative contribution of the two parts of the GAF may differ across illness severity and diagnosis, which would further add to the ambiguity of GAF scores. Despite its name, the GAF may therefore be better conceptualized as a measure of symptom severity than as a measure of functional impairment in patients with psychosis.
The assessment of concurrent validity indicated that the GAPS differed from the SOFAS in its pattern of associations with other clinical measures. The GAPS showed stronger correlations with the PANSS positive scale and with the RAPP psychopathology scale. This finding was not surprising, given that the GAPS descriptors for ratings between 11 and 40 are primarily positive symptoms of psychosis. The SOFAS was more strongly correlated with the PANSS negative scale and with measures of social functioning and self-care. These differences suggest that each scale can contribute distinct clinical information. It is, however, noteworthy that many of the associations with the GAPS were similar to those with the SOFAS. This suggests that the GAPS and SOFAS are, to some extent, general measures of illness severity. Changes in both GAPS and SOFAS ratings between presentation and discharge or follow-up were strongly related to changes on more comprehensive clinical measures. However, both measures tended to reflect general change in illness severity rather than specific changes in symptoms or functioning.
Interrater reliability was not assessed in this study. However, ratings for patients with treatment-refractory illness were based on a consensus among several mental health professionals, and assessments of first-episode patients were made by research clinicians. In both cases, clinicians trained in the use of the instruments made the ratings after a detailed review of past and current clinical status. Previous studies with trained raters reported high interrater reliability for the GAF (9,12), which suggests that reliability was likely acceptable in this study. In addition, our GAPS and SOFAS ratings were not made blind to other assessments, which may have artificially inflated associations between the various measures. This possibility was reduced by a process in which ratings were made after discussion and after all sources of information had been considered. Nevertheless, further research is needed to fully assess reliability. Finally, this study included only patients with psychosis, and the psychometric properties of the GAF may differ for patients without psychosis.
Conclusions
Overall, the findings from this study suggest that the clinical utility of the GAF would be improved if it were divided into two separate scales. The use of separate scales allows the assessment of two distinct aspects of illness while retaining the ease of administration enjoyed by the current version of the GAF.
Acknowledgments and disclosures
Funding for the early-psychosis part of the study was provided by grant NET-54013 from the Canadian Institutes of Health Research, by the British Columbia Mental Health and Addictions Services, and by the Michael Smith Foundation for Health Research. No funding source or any other organization had any role in the study design, collection of data, analysis of results, interpretation of findings, or the writing of the report.
Dr. Flynn reports receiving consulting fees from Novartis. Dr. MacEwan reports receiving consulting or advisory board fees from AstraZeneca, Eli Lilly, Janssen, and Novartis and receiving lecture fees from GlaxoSmithKline; he has also received grant support from AstraZeneca. Dr. Kopala reports receiving honoraria for advisory board membership and speaker's fees from Bristol-Myers Squibb, Janssen, Ortho, and Pfizer. Dr. Honer reports receiving consulting or advisory board fees from In Silico, Janssen, and Solvay and lecture fees from AstraZeneca. The other authors report no competing interests.