We discuss the design of observer agreement studies with binary assessments, with particular emphasis on the need for adequate sample size and the use of replicate observations. First, we present a method and tables for determining the sample size required to ensure a desired precision for the estimate of the probability of disagreement between two observers. Second, for studies that include replicate observations, we present a statistical model that allows the magnitudes of within- and between-observer variation to be estimated. We then derive sample sizes that guarantee a specified precision for these estimates, present tables of these sample sizes, and give examples of their use.
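
To make the idea of a sample size that ensures a desired precision concrete, the following is a minimal sketch. It assumes the simple normal (Wald) approximation to a binomial proportion, with one binary agree/disagree outcome per subject. It is not the method or the tables presented in the paper. The function name `sample_size_for_disagreement` and its parameters `p_guess`, `half_width`, and `confidence` are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_disagreement(p_guess: float,
                                 half_width: float,
                                 confidence: float = 0.95) -> int:
    """Number of subjects needed so that a normal-approximation confidence
    interval for the probability of disagreement has at most the requested
    half-width.

    Assumption (not from the paper): Wald interval, one pair of ratings
    per subject, independent subjects.
    """
    if not 0.0 < p_guess < 1.0:
        raise ValueError("p_guess must lie strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided standard normal quantile, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

    # Solve z * sqrt(p (1 - p) / n) <= half_width for n, then round up.
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / half_width ** 2)


if __name__ == "__main__":
    # Anticipated disagreement probability 0.10, half-width 0.05, 95% confidence.
    print(sample_size_for_disagreement(0.10, 0.05))
```

Under this approximation the required number of subjects is largest when the anticipated disagreement probability is 0.5 (385 subjects for a half-width of 0.05 at 95% confidence) and shrinks as the probability moves toward 0 or 1, so a conservative design can use 0.5 when no prior estimate is available.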