In Reply: The New York state legislature mandated an evaluation of the New York City pilot program on outpatient commitment to investigate whether the program reduced rehospitalization and rearrest and improved clinical outcomes and quality of life. Like many program evaluations, this one was conducted in the real-world setting of Bellevue Hospital and under time constraints imposed by the statute, which also provided for eliminating outpatient commitment unless the evaluation was completed and its results implemented by a certain date.
Site selection was limited to Bellevue, the only hospital authorized for the pilot program. The number of patients admitted to Bellevue who were eligible for outpatient commitment during the study period was not under our control, although we approached all eligible patients. However, we did control the assignment of patients to the study groups: eligible patients were randomly assigned to the outpatient commitment group or the control group, and both groups received enhanced treatment. Funding for the evaluation began in July 1995, and the results were to be submitted to the legislature by December 1998. These dates, together with the need to allow adequate follow-up time for the outcome measures, dictated the length of the enrollment period and hence the number of subjects.
Early in our investigation, we conducted a power analysis to estimate an adequate sample size, and we expected to have samples large enough to allow us to statistically detect moderate differences between the groups. In general, samples of 78 and 64 subjects provide 80 percent power to detect differences of a moderate (.47) effect size. Thus we believe that the sample was large enough to allow valid comparison of the two groups on all major outcome variables.
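The power figure cited above can be verified with a short calculation. The sketch below uses the normal approximation to the power of a two-sample comparison (exact t-test power differs slightly); the sample sizes (78 and 64) and effect size (.47) are those given in the letter, while the two-tailed .05 significance level is an assumption, since the letter does not state it.

```python
import math
from statistics import NormalDist

def two_sample_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-tailed two-sample comparison for a
    standardized effect size d, using the normal approximation."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)          # critical value, e.g. 1.96
    delta = d / math.sqrt(1 / n1 + 1 / n2)      # noncentrality parameter
    # Probability the test statistic falls beyond either critical value:
    return nd.cdf(delta - z_crit) + nd.cdf(-delta - z_crit)

power = two_sample_power(0.47, 78, 64)
print(round(power, 2))  # 0.8, matching the 80 percent figure cited
```

The second term of the sum covers rejection in the direction opposite the true effect and is negligible here; it is included only so the function is correct for small effects as well.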
Our findings indicated a difference between the outpatient commitment group and the control group in hospital use, but the difference was not statistically significant. The lack of significance is due to the magnitude of the difference, the variability within the groups, and the size of the sample. Nonsignificant results can be reported in a variety of ways. These include reporting the observed p value (or test statistic), constructing a 95 percent confidence interval around the group differences, or reporting the number of subjects that would have been required for the observed difference to reach significance at the .05 level. When any one of the three values is known, the other values can be computed.
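The third reporting option, the sample size at which the observed difference would have reached significance, follows from a simple rearrangement of the test statistic. The sketch below assumes a two-tailed normal-approximation test with equal group sizes; the observed effect size of .25 is purely hypothetical, used here only to illustrate the computation, and is not the value observed in the study.

```python
import math
from statistics import NormalDist

def n_per_group_for_significance(d_observed, alpha=0.05):
    """Smallest equal group size at which an observed standardized
    difference d_observed would just reach two-tailed significance."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Significance requires d / sqrt(2/n) >= z_crit, i.e. n >= 2*(z_crit/d)**2.
    return math.ceil(2 * (z_crit / d_observed) ** 2)

# Hypothetical observed effect size of .25, for illustration only:
print(n_per_group_for_significance(0.25))  # 123 subjects per group
```

Because the p value, the confidence interval, and this required sample size are all functions of the same observed difference and its standard error, reporting any one of them (with the group sizes) determines the others.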
We chose to report the size of the samples that would have permitted us to detect statistically significant differences. To clarify for readers just how close the results came to statistical significance, we noted that had the samples been twice as large, the results would have been statistically significant. In their letter, Dr. O'Reilly and Dr. Bishop observe that it is useful to have a measure of the strength of the trend toward statistical significance.
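The arithmetic behind the "twice as large" statement is that doubling both group sizes halves the variance of the group difference, multiplying the test statistic by the square root of 2. The z value below is hypothetical, chosen only to show how a nonsignificant result can cross the two-tailed .05 threshold (1.96) when the samples are doubled.

```python
import math

# Doubling both group sizes multiplies the test statistic by sqrt(2),
# because the standard error of the difference shrinks by that factor.
z_observed = 1.45                       # hypothetical nonsignificant statistic
z_doubled = z_observed * math.sqrt(2)   # statistic with both samples doubled
print(z_doubled > 1.96)                 # True: now beyond the .05 threshold
```

Any observed statistic above 1.96 / sqrt(2), roughly 1.39, would become significant under this doubling, which is one concrete way to convey how close a nonsignificant result was.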
We reported the facts as they were observed in this study. We conducted the study within the constraints often imposed in real-world research, and we made every effort to use the best scientific methods and analytical techniques available. Although the results were not as striking or conclusive as people on either side of the outpatient commitment debate would have liked, they are accurately represented in the article and, we believe, help inform the debate on these issues.