Purpose: To assess the reliability of the S.T.O.N.E. (stone size [S], tract length [T], obstruction [O], number of involved calices [N], and essence or stone density [E]) nephrolithometry scoring system by testing its reproducibility between different observers.
Patients and methods: Preoperative images of 58 patients who underwent percutaneous nephrolithotomy (PCNL) were reviewed. Medical students, urology residents, one urology fellow, and one urology attending physician independently reviewed all images and scored the renal stones. Interobserver reliability of the total score and of each component was evaluated with the intraclass correlation coefficient (ICC) and the κ coefficient.
Results: Interobserver reliability was high for every component and for the total score (ICC for S, T, O, N, E, and total score: 0.80, 0.97, 0.89, 0.84, 0.91, and 0.87, respectively). κ values between the two medical students were 0.36, 1, 0.31, 0.45, 0.33, and 0.30 for the S, T, O, N, and E components and the total score, respectively. κ values between the two urology residents were 0.71, 1, 0.92, 0.79, 0.93, and 0.67 for the S, T, O, N, and E components and the total score, respectively. κ values between the urology fellow and the attending physician were 0.95, 1, 0.88, 0.94, 0.89, and 0.87 for the S, T, O, N, and E components and the total score, respectively. The P value was <0.05 for all scoring components, indicating that the estimated κ values were unlikely to be the result of chance.
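To illustrate how a pairwise κ value such as those reported above is obtained, the following is a minimal sketch of an unweighted Cohen's κ for two raters. It is an assumption that this variant matches the study's analysis, since the abstract does not state whether a weighted or unweighted κ was used. The function name `cohen_kappa` and the example scores are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: fraction of cases given identical scores.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every score category.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1.0:
        # Both raters used one identical category for every case.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical component scores for six cases (illustrative only).
scores_rater_1 = [1, 2, 2, 3, 1, 2]
scores_rater_2 = [1, 2, 3, 3, 1, 2]
kappa = cohen_kappa(scores_rater_1, scores_rater_2)
```

In this illustrative case the raters agree on five of six cases (observed agreement 5/6) and the chance agreement is 1/3, so κ = (5/6 − 1/3) / (1 − 1/3) = 0.75.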
Conclusions: The S.T.O.N.E. nephrolithometry scoring system has excellent interobserver reliability. The S and N metrics were the most challenging to quantify and the least reliable. Standardized protocols for measuring these components should be considered to improve the accuracy and reproducibility of the scoring system.