Current elbow-scoring systems are based on the observer-derived assessment of a variety of clinical and functional criteria, which are scored separately and then aggregated. The aggregate score is then assigned a categorical ranking that ranges from excellent to poor. The developers of different elbow-scoring systems have chosen different outcome criteria, assigned different weights to each criterion, and accorded different ranges of values to each categorical ranking. Five different elbow-scoring systems (the Mayo elbow-performance index and the systems of Broberg and Morrey, Ewald et al., The Hospital for Special Surgery, and Pritchard) were used to evaluate the same group of patients. The validity of the scoring systems was determined with use of visual analog scales for the assessment of pain and function, patient-derived and physician-derived ratings of the severity of impairment of the elbow, and two functional questionnaires completed by the patient (the Disabilities of the Arm, Shoulder and Hand questionnaire and the Modified American Shoulder and Elbow Surgeons patient self-evaluation form). The study sample consisted of sixty-nine patients who had sought treatment at one of two tertiary referral clinics because of problems related to the elbow. Pearson product-moment correlation coefficients were used to compare the raw aggregate scores, and kappa statistics were used to determine the level of agreement among the categorical rankings (excellent, good, fair, and poor). Examination of the five scoring systems revealed a remarkable lack of concordance with regard to the aspects of elbow function that were assessed. Good correlation was observed when the systems were compared on the basis of raw scores (Pearson product-moment correlation coefficients, 0.79 to 0.90), but only slight-to-moderate agreement was noted when the systems were compared on the basis of categorical rankings (quadratic weighted kappa coefficients, 0.18 to 0.49).
Validity testing showed the system of Ewald et al. and the Mayo elbow-performance index to be the most discriminating, the system of Pritchard to be the least discriminating, and the system of The Hospital for Special Surgery and the system of Broberg and Morrey to be intermediate. The scores determined with the elbow-scoring systems demonstrated only moderate correlation with the score for function on the visual analog scale (Pearson product-moment correlation coefficients, 0.44 to 0.66), whereas the scores derived from the two functional questionnaires completed by the patient demonstrated moderate-to-good correlation with that same score for function (Pearson product-moment correlation coefficients, 0.72 and 0.80).
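The two statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function names, the raw scores, and both sets of category cut-offs are hypothetical and are not taken from the study or from any published elbow-scoring system. The sketch computes a Pearson product-moment correlation coefficient on raw scores and a quadratic weighted kappa on the resulting four-level rankings, and shows one way that strongly correlated raw scores can still yield imperfect categorical agreement when two systems assign different score ranges to each ranking.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def quadratic_weighted_kappa(a, b, n_categories):
    """Quadratic weighted kappa for two lists of ranks coded 0..n_categories-1."""
    k = n_categories
    n = len(a)
    # Cross-tabulate the two sets of rankings.
    observed = [[0] * k for _ in range(k)]
    for i, j in zip(a, b):
        observed[i][j] += 1
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared distance between ranks.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - weighted_observed / weighted_expected


def categorize(score, lower_bounds):
    """Map a raw score to a rank: 0 = excellent, 1 = good, 2 = fair, 3 = poor.

    lower_bounds lists the minimum score for excellent, good, and fair,
    in descending order; anything below the last bound is ranked poor.
    """
    for rank, lower in enumerate(lower_bounds):
        if score >= lower:
            return rank
    return len(lower_bounds)


# Hypothetical raw aggregate scores for eight patients (illustrative only).
raw_scores = [96, 92, 88, 81, 77, 70, 64, 55]

# Two hypothetical systems that share raw scores but use different cut-offs.
cutoffs_a = (90, 75, 60)
cutoffs_b = (95, 80, 50)

ranks_a = [categorize(s, cutoffs_a) for s in raw_scores]
ranks_b = [categorize(s, cutoffs_b) for s in raw_scores]

r = pearson_r(raw_scores, raw_scores)
kappa = quadratic_weighted_kappa(ranks_a, ranks_b, n_categories=4)

print("Pearson r on raw scores:", round(r, 3))
print("Quadratic weighted kappa on rankings:", round(kappa, 3))
```

In this toy example the raw scores are identical, so the correlation is perfect, yet the rankings disagree for three of the eight patients purely because of the differing cut-offs. In the study the systems also differed in which criteria they assessed and how those criteria were weighted, so this sketch isolates only one of the possible sources of disagreement.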