In single-arm, two-stage phase II clinical trials that evaluate the efficacy of cancer treatments using a response endpoint, one typically specifies a single reference response rate as the null hypothesis benchmark. Patients eligible for the trial are assumed to have this response rate on average under the null hypothesis. When patients come from subpopulations with different response rates, this single reference rate may not be appropriate for the particular mix of patients actually enrolled in the trial. As a result, the Type I and Type II error rates conditional on the mix of enrolled patients may differ considerably from the unconditional error rates used to design the trial. We describe a method for designing two-stage phase II studies that accounts for patient heterogeneity and effectively stabilizes the conditional Type I and Type II error rates over the range of patient mixes that are likely to arise. Use of the design requires good estimates of the expected response rate within each population stratum as well as of the stratum membership probabilities, but its properties are similar to, and often preferable to, those of the standard two-stage design even when the underlying assumptions do not hold exactly.
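
The dependence of the conditional Type I error on the enrolled patient mix can be illustrated with a small calculation. The sketch below is not the proposed design; it only evaluates a standard two-stage rule (stop for futility if stage 1 responses are at most `r1`, reject the null if total responses exceed `r`) under different stratum mixes. The design parameters, the two null response rates, and all function names are hypothetical choices made for illustration.

```python
from math import comb


def binom_pmf(n, p):
    """PMF of Binomial(n, p), indexed by the number of responses."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent counts with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def responses_pmf(counts, rates):
    """PMF of total responses when counts[s] patients have rate rates[s]."""
    pmf = [1.0]
    for m, p in zip(counts, rates):
        pmf = convolve(pmf, binom_pmf(m, p))
    return pmf


def reject_prob(n1, r1, n, r, rates, stage1_counts, stage2_counts):
    """P(reject H0) conditional on the stratum counts enrolled in each stage.

    Assumed rule: continue to stage 2 only if stage 1 responses > r1;
    reject H0 if total responses over both stages > r.
    """
    assert sum(stage1_counts) == n1 and sum(stage2_counts) == n - n1
    stage1 = responses_pmf(stage1_counts, rates)
    stage2 = responses_pmf(stage2_counts, rates)
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        needed = r - x1  # reject when stage 2 responses exceed this
        total += stage1[x1] * sum(stage2[max(needed + 1, 0):])
    return total


if __name__ == "__main__":
    # Hypothetical design and null response rates (not from the source).
    n1, r1, n, r = 13, 3, 43, 12
    null_rates = (0.10, 0.30)  # low- and high-response strata under H0
    for k1, k2 in [(13, 30), (10, 22), (7, 15), (3, 8), (0, 0)]:
        alpha = reject_prob(
            n1, r1, n, r, null_rates,
            (k1, n1 - k1), (k2, n - n1 - k2),
        )
        print(f"low-stratum patients: {k1 + k2:2d}/{n}  "
              f"conditional Type I error: {alpha:.4f}")
```

Because every patient is at their stratum's null rate in this calculation, any rejection is a false positive, so the printed values are conditional Type I error rates. They rise as the enrolled mix shifts toward the higher-response stratum, which is the instability that the design described here is meant to control.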