Introduction: Most cohort studies are limited by sampling and accrual bias. The ability to identify specific lesions from radiology text reports could eliminate these biases and benefit patient care, clinical research, and trial recruitment. This study derived and internally validated text search algorithms to identify four common urological lesions (solid renal masses, complex renal cysts, adrenal masses, and simple renal cysts) using radiology text reports.
Methods: A simple random sample of 10 000 abdominal ultrasound (US) and computed tomography (CT) reports was drawn from our hospital's data warehouse. Reports were manually reviewed to determine the true status of each of the four lesions. Using commonly available software, we built logistic regression models whose predictors were the status of text terms in the report, selected a priori. We used bootstrap sampling with 95th percentile thresholds to select variables for the final models, which were then converted into point systems. A second, independent random sample of 2855 reports, stratified by the number of points for each abnormality, was reviewed in a blinded fashion to measure the accuracy of each lesion's point system.
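The step from a fitted logistic model to a point system can be sketched as follows. This is a minimal illustration, not the study's algorithm: the term patterns, the coefficients, the intercept, and the rule of scaling by the smallest absolute coefficient and rounding are all assumptions chosen for the example.

```python
import math
import re

# Hypothetical text terms and coefficients for ONE lesion (illustration only;
# these are not the terms or estimates from the study).
INTERCEPT = -4.0
TERM_COEFFICIENTS = {
    r"\bsolid\b": 1.6,
    r"\bmass\b": 0.8,
    r"\bno\s+(?:renal\s+)?mass\b": -2.4,
}


def term_flags(report):
    """Return 1 or 0 for each term pattern, by presence in the report text."""
    text = report.lower()
    return {t: int(re.search(t, text) is not None) for t in TERM_COEFFICIENTS}


def to_points(coefficients):
    """Scale by the smallest absolute coefficient and round to integers
    (an assumed conversion rule, not necessarily the one used in the study)."""
    base = min(abs(c) for c in coefficients.values())
    return {t: round(c / base) for t, c in coefficients.items()}


def point_score(report, points):
    """Sum the points of every term found in the report."""
    flags = term_flags(report)
    return sum(points[t] * flags[t] for t in points)


def predicted_probability(report):
    """Logistic model probability from the unrounded coefficients."""
    flags = term_flags(report)
    lp = INTERCEPT + sum(c * flags[t] for t, c in TERM_COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-lp))


POINTS = to_points(TERM_COEFFICIENTS)
```

Under these assumed values, a report mentioning a solid mass scores higher, and receives a higher predicted probability, than a report stating that no renal mass was identified.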
Results: The prevalences of solid renal mass, complex renal cyst, adrenal mass, and simple renal cyst were 2.0%, 1.7%, 3.2%, and 20.0%, respectively. Each model contained between one and five text terms, with c-statistics ranging from 0.66 to 0.90. In the independent validation, the point systems accurately predicted the probability that a text report cited each of the four lesions.
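For readers unfamiliar with the c-statistic, it is the probability that a randomly chosen report with the lesion receives a higher score than a randomly chosen report without it, with ties counted as one half. A minimal sketch of that definition is shown below; the function name and the toy inputs are illustrative and do not come from the study data.

```python
def c_statistic(scores, labels):
    """Concordance between scores and binary labels (1 = lesion present).

    Compares every positive/negative pair; ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative report")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))
```

A value of 0.5 corresponds to scores that discriminate no better than chance, and 1.0 to perfect separation of reports with and without the lesion.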
Conclusions: Textual radiology reports can be analyzed using common statistical software to accurately determine the probability that important abnormalities of the kidneys or adrenal glands exist. These methods can be used for case identification or epidemiological studies.