Purpose: To develop a new algorithm that measures the similarity between a query lung mass and a reference data set of lung masses for content-based medical image retrieval (CBMIR).
Methods: A lung mass data set of 746 mass regions of interest (ROIs) was assembled, of which 375 depicted malignant lesions and 371 depicted benign lesions. Each mass ROI was represented by a vector of 26 texture features. A kernel function was employed to map the original data from the input space to a feature space. In this feature space, a semisupervised distance metric was learned that uses a differential scatter discriminant criterion to represent semantic relevance and a regularization term to represent visual similarity. The learned distance metric measures the similarity between a query mass and the masses in the reference data set. Clustering accuracy was used to configure the parameters, and retrieval accuracy and classification accuracy were used as the performance assessment indices.
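The abstract does not give the exact objective, so the following is only a minimal sketch of one common way to learn a semisupervised metric of this kind, under stated assumptions. It assumes a Gaussian (RBF) kernel, a differential scatter objective tr(W^T (S_b - gamma S_w) W) computed from the labeled ROIs only, a kNN graph-Laplacian term over all ROIs standing in for the visual-similarity regularizer, and the projection W = Phi A with constraint W^T W = I, which becomes a generalized eigenproblem in the kernel matrix K. All function names and the parameters sigma, gamma, alpha, k, dim and eps are hypothetical, not the authors' values.

```python
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(d2, 0.0)

def rbf_kernel(X, Y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)); implicitly maps inputs to a feature space."""
    return np.exp(-sq_dists(X, Y) / (2.0 * sigma**2))

def knn_laplacian(X, k):
    """Laplacian L = D - S of a symmetric kNN graph in input space (assumed visual-similarity term)."""
    n = X.shape[0]
    d2 = sq_dists(X, X)
    np.fill_diagonal(d2, np.inf)                   # a sample is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]
    S = np.zeros((n, n))
    S[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    S = np.maximum(S, S.T)                         # symmetrize the graph
    return np.diag(S.sum(axis=1)) - S

def scatter_matrices(y):
    """P_b, P_w with S_b = Phi P_b Phi^T and S_w = Phi P_w Phi^T; label -1 marks an unlabeled ROI."""
    lab = (y >= 0).astype(float)
    E = np.zeros((len(y), len(y)))
    for c in np.unique(y[y >= 0]):
        idx = np.flatnonzero(y == c)
        E[np.ix_(idx, idx)] = 1.0 / len(idx)       # same-class block, weight 1/n_c
    P_b = E - np.outer(lab, lab) / lab.sum()       # between-class scatter (labeled only)
    P_w = np.diag(lab) - E                         # within-class scatter (labeled only)
    return P_b, P_w

def fit_metric(X, y, sigma=5.0, gamma=1.0, alpha=0.1, k=5, dim=1, eps=1e-3):
    """Coefficients A (W = Phi A) maximizing tr(A^T K (P_b - gamma P_w - alpha L) K A)
    subject to A^T (K + eps I) A = I, solved as a generalized eigenproblem."""
    K = rbf_kernel(X, X, sigma)
    P_b, P_w = scatter_matrices(y)
    M = K @ (P_b - gamma * P_w - alpha * knn_laplacian(X, k)) @ K
    C = np.linalg.cholesky(K + eps * np.eye(len(X)))   # K + eps I = C C^T
    Ci = np.linalg.inv(C)
    T = Ci @ M @ Ci.T                                  # reduce to a standard symmetric problem
    _, vecs = np.linalg.eigh((T + T.T) / 2.0)          # eigenvalues in ascending order
    return Ci.T @ vecs[:, ::-1][:, :dim]               # top-dim generalized eigenvectors

def embed(A, X_ref, Xq, sigma=5.0):
    """Project samples into the learned space: z = A^T k(x)."""
    return rbf_kernel(Xq, X_ref, sigma) @ A

def retrieve(A, X_ref, xq, sigma=5.0, top=5):
    """Indices of the `top` reference masses closest to the query under the learned metric."""
    d = np.linalg.norm(embed(A, X_ref, X_ref, sigma) - embed(A, X_ref, xq[None, :], sigma), axis=1)
    return np.argsort(d)[:top]

# Usage on synthetic 26-feature data (two classes, half of the ROIs unlabeled).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 26)), rng.normal(3, 1, (20, 26))])
true = np.repeat([0, 1], 20)
y = true.copy()
y[10:20] = -1
y[30:40] = -1
A = fit_metric(X, y)
print(retrieve(A, X, X[0]))
```

Restricting S_b and S_w to labeled ROIs while letting the Laplacian and the kernel constraint cover every ROI is what makes this sketch semisupervised; dim=1 here simply mirrors the c - 1 convention for two classes.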
Results: After parameter configuration, a mean clustering accuracy of 0.87 was achieved. In terms of retrieval accuracy, our algorithm performed better than other state-of-the-art retrieval algorithms when a leave-one-out validation method was applied to the testing data set. In terms of classification accuracy, our algorithm achieved an area under the receiver operating characteristic (ROC) curve of 0.941 ± 0.006. The running times for 346 query images with the proposed algorithm are 5.399 and 6.0 s, respectively.
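The three assessment indices can be sketched as follows, assuming a precomputed distance matrix D between ROIs under the learned metric. The abstract does not say which clustering algorithm or which classifier score was used, so this sketch assumes clustering accuracy is computed over the best one-to-one cluster-to-label mapping, retrieval accuracy is leave-one-out precision of the k retrieved masses, and the ROC score of a query is the fraction of its k retrieved masses labeled malignant (1). The function names and k are hypothetical.

```python
import itertools
import numpy as np

def clustering_accuracy(labels, clusters):
    """Best fraction of samples whose cluster maps to their true label, over all
    one-to-one cluster-to-label mappings (assumes #clusters <= #labels)."""
    labels, clusters = np.asarray(labels), np.asarray(clusters)
    cl_ids = np.unique(clusters)
    best = 0.0
    for perm in itertools.permutations(np.unique(labels), len(cl_ids)):
        mapping = dict(zip(cl_ids, perm))
        pred = np.array([mapping[c] for c in clusters])
        best = max(best, float(np.mean(pred == labels)))
    return best

def loo_neighbors(D, k):
    """k nearest references for every query, leaving the query itself out."""
    D = np.array(D, dtype=float)          # copy so the caller's matrix is untouched
    np.fill_diagonal(D, np.inf)
    return np.argsort(D, axis=1)[:, :k]

def loo_precision_at_k(D, labels, k=5):
    """Mean fraction of retrieved masses sharing the query's label (leave-one-out)."""
    labels = np.asarray(labels)
    return float(np.mean(labels[loo_neighbors(D, k)] == labels[:, None]))

def knn_malignancy_scores(D, labels, k=5):
    """Score per query = fraction of its k retrieved masses labeled malignant (1)."""
    labels = np.asarray(labels)
    return np.mean(labels[loo_neighbors(D, k)] == 1, axis=1)

def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    s, y = np.asarray(scores, dtype=float), np.asarray(y)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Toy example: four 1-D "masses" forming two well-separated groups.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
lab = np.array([0, 0, 1, 1])
D = np.abs(X - X.T)
print(loo_precision_at_k(D, lab, k=1), auc(knn_malignancy_scores(D, lab, k=1), lab))  # 1.0 1.0
```

Excluding each query from its own neighbor list is what makes the retrieval and ROC estimates leave-one-out rather than resubstitution figures.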
Conclusions: The results demonstrate that the proposed algorithm, which takes both semantic relevance and visual similarity into account in kernel space, outperforms the compared algorithms. The algorithm can be used in a CBMIR system to retrieve masses similar to a query mass, which can help doctors make better decisions.