A logistic regression model based on the national mammography database format to aid breast cancer diagnosis

Jagpreet Chhatwal; Oguzhan Alagoz; Mary J Lindstrom; Charles E Kahn Jr; Katherine A Shaffer; Elizabeth S Burnside

doi:10.2214/AJR.07.3345

AJR Am J Roentgenol. 2009 Apr;192(4):1117-27. doi: 10.2214/AJR.07.3345.

Authors

Jagpreet Chhatwal¹, Oguzhan Alagoz, Mary J Lindstrom, Charles E Kahn Jr, Katherine A Shaffer, Elizabeth S Burnside

Affiliation

¹ Department of Radiology, University of Wisconsin School of Medicine and Public Health, E3/311 Clinical Science Center, 600 Highland Ave., Madison, WI 53792-3252, USA.

Abstract

Objective: The purpose of our study was to create a breast cancer risk estimation model based on the descriptors of the National Mammography Database using logistic regression that can aid in decision making for the early detection of breast cancer.

Materials and methods: We created two logistic regression models based on the mammography features and demographic data for 62,219 consecutive mammography records from 48,744 studies in 18,269 [corrected] patients reported using the Breast Imaging Reporting and Data System (BI-RADS) lexicon and the National Mammography Database format between April 5, 1999 and February 9, 2004. State cancer registry outcomes matched with our data served as the reference standard. The probability of cancer was the outcome in both models. Model 2 was built using all variables in Model 1 plus radiologists' BI-RADS assessment categories. We used 10-fold cross-validation to train and test the models and calculated the area under the receiver operating characteristic curve (A_z) to measure performance. Both models were compared with the radiologists' BI-RADS assessments.
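The evaluation procedure described above (fit a logistic regression, score held-out records with 10-fold cross-validation, summarize with the area under the ROC curve) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic two-feature data, the plain gradient-descent fit, the function names, and the choice to pool out-of-fold probabilities before computing the area. It does not reproduce the study's variables, software, or fold assignment.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit coefficients by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validated_auc(X, y, k=10, seed=0):
    """Score every record with a model that never saw it, then compute the AUC."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            scores[i] = predict_proba(w, X[i])
    return auc(scores, y)

# Synthetic stand-in data (hypothetical): one informative feature, one noise feature.
rng = random.Random(42)
y = [1 if rng.random() < 0.3 else 0 for _ in range(300)]
X = [[rng.gauss(1.5 * yi, 1.0), rng.gauss(0.0, 1.0)] for yi in y]
az = cross_validated_auc(X, y)
print(f"10-fold cross-validated AUC: {az:.3f}")
```

Pooling the out-of-fold probabilities is only one way to summarize cross-validation; computing the area separately in each fold and averaging is an equally common alternative, and the abstract does not say which was used.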

Results: Radiologists achieved an A_z value of 0.939 ± 0.011. The A_z was 0.927 ± 0.015 for Model 1 and 0.963 ± 0.009 for Model 2. At 90% specificity, the sensitivity of Model 2 (90%) was significantly better (p < 0.001) than that of radiologists (82%) and Model 1 (83%). At 85% sensitivity, the specificity of Model 2 (96%) was significantly better (p < 0.001) than that of radiologists (88%) and Model 1 (87%).
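The two operating-point comparisons above (sensitivity at a fixed specificity, and specificity at a fixed sensitivity) can be read off a set of predicted probabilities by scanning candidate thresholds. The sketch below is illustrative only: the function names and the ten toy scores are hypothetical, and it does not cover the significance testing reported in the abstract.

```python
def operating_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every candidate threshold.

    A record is called positive when its score is >= the threshold.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    points = []
    # Candidates: each observed score, plus one threshold above the maximum.
    for t in sorted(set(scores)) + [float("inf")]:
        sensitivity = sum(s >= t for s in pos) / len(pos)
        specificity = sum(s < t for s in neg) / len(neg)
        points.append((t, sensitivity, specificity))
    return points

def sensitivity_at_specificity(scores, labels, target):
    """Best sensitivity among thresholds whose specificity is at least target."""
    return max((se for _, se, sp in operating_points(scores, labels) if sp >= target),
               default=0.0)

def specificity_at_sensitivity(scores, labels, target):
    """Best specificity among thresholds whose sensitivity is at least target."""
    return max((sp for _, se, sp in operating_points(scores, labels) if se >= target),
               default=0.0)

# Toy example (hypothetical scores): five malignant and five benign records.
scores = [0.95, 0.9, 0.7, 0.6, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(sensitivity_at_specificity(scores, labels, 0.8))
print(specificity_at_sensitivity(scores, labels, 0.8))
```

With real data the achievable specificities are discrete, so the function reports the best sensitivity at any threshold that meets or exceeds the target rather than interpolating along the ROC curve.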

Conclusion: Our logistic regression model can effectively discriminate between benign and malignant breast disease and can identify the most important features associated with breast cancer.

MeSH terms

Adolescent
Adult
Aged
Aged, 80 and over
Breast Neoplasms / diagnostic imaging*
Female
Humans
Logistic Models*
Mammography*
Middle Aged
Predictive Value of Tests
ROC Curve
Registries
Retrospective Studies
Risk Assessment / methods
Sensitivity and Specificity
United States
