Purpose: This study aims to evaluate the performance of four artificial intelligence-aided diagnostic systems in identifying and measuring four types of pulmonary nodules.
Methods: Four types of nodules were implanted in a commercial lung phantom. The phantom was scanned with multislice spiral computed tomography, after which four systems (A, B, C, D) were used to identify the nodules and measure their volumes.
Results: The relative volume error (RVE) of system A was the lowest for all nodule types except small ground glass nodules (SGGNs), for which system C had the smallest RVE, -0.13 (-0.56, 0.00). In the Bland-Altman test, only systems A and C passed the consistency test (P = 0.40). In terms of precision, the miss rate (MR) of system C was 0.00% for small solid nodules (SSNs), ground glass nodules (GGNs), and solid nodules (SNs), and 4.17% for SGGNs. System D had the highest MRs of all the systems for SGGNs, SSNs, and GGNs: 71.30%, 25.93%, and 47.22%, respectively. Receiver operating characteristic (ROC) curve analysis indicated that system A performed best in recognizing SSNs and GGNs, with areas under the curve (AUCs) of 0.91 and 0.68, respectively, whereas system C performed best for SGGNs (AUC = 0.91).
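The abstract does not define its evaluation metrics. The sketch below illustrates conventional definitions, assuming RVE = (measured volume - true volume) / true volume, MR = missed nodules / implanted nodules x 100%, Bland-Altman agreement as the mean difference (bias) with 95% limits of agreement (bias ± 1.96 SD), and AUC as the Mann-Whitney probability that a positive case scores above a negative one. The function names are hypothetical, the abstract does not state how its Bland-Altman P value was obtained, and the values in the usage lines are illustrative rather than study data.

```python
from statistics import mean, stdev


def relative_volume_error(measured, true):
    """Assumed definition: signed error relative to the known phantom volume."""
    return (measured - true) / true


def miss_rate(n_missed, n_implanted):
    """Percentage of implanted nodules that a system failed to detect."""
    return 100.0 * n_missed / n_implanted


def bland_altman_limits(a, b):
    """Bias and 95% limits of agreement between paired measurements a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def auc(pos_scores, neg_scores):
    """AUC as P(positive score > negative score); ties count as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative values only (not from the study).
print(relative_volume_error(90.0, 100.0))           # a 10% underestimate
print(miss_rate(3, 20))                              # 3 of 20 nodules missed
print(bland_altman_limits([10, 12, 14], [9, 12, 15]))
print(auc([0.9, 0.8], [0.1, 0.8]))
```

A negative RVE indicates underestimation of the true volume; the pairwise AUC loop is quadratic but keeps the rank interpretation explicit for small phantom datasets.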
Conclusion: Among the four nodule types, SGGNs are the most difficult to recognize, indicating that the accuracy and precision of artificial intelligence systems need to be improved. System A measured nodule volume most accurately. System C was the most precise in recognizing all four types of nodules, especially SGGNs.
Keywords: artificial intelligence; lung phantom; pulmonary nodules.
© 2020 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.