One size does not fit all in evaluating model selection scores for image classification

Sci Rep. 2024 Dec 4;14(1):30239. doi: 10.1038/s41598-024-81752-w.

Abstract

Selecting pretrained models for image classification often involves computationally intensive finetuning. This study addresses a research gap in the standardized evaluation of transferability scores, which can simplify model selection by ranking pretrained models without exhaustive finetuning. The motivation is to reduce the computational burden of model selection through a consistent evaluation approach that guides practitioners in balancing accuracy and efficiency across tasks. This study evaluates 14 transferability scores on 11 benchmark datasets. It covers both Convolutional Neural Network (CNN) and Vision Transformer (ViT) models and keeps experimental conditions consistent to control for the variability that has affected previous research. Key findings reveal substantial variability in score effectiveness depending on dataset characteristics (e.g., fine-grained versus coarse-grained classes) and model architecture. ViT models generally show superior transferability, especially on fine-grained datasets. While no single score is best in all cases, some scores excel in specific contexts. In addition to predictive accuracy, the study evaluates computational efficiency and identifies scores suitable for resource-constrained scenarios. This research offers guidance on selecting appropriate transferability scores, helping practitioners optimize model selection strategies for efficient deployment in practice.
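
To make the setting concrete, the sketch below illustrates how a transferability score ranks pretrained models from a single forward pass over the target data, with no finetuning. It uses LEEP (Nguyen et al., 2020), one commonly studied score, purely as an example; it is not presented as this study's protocol, and the function and variable names are hypothetical.

```python
# Illustrative sketch (not the paper's exact protocol): score a pretrained
# classifier on a target dataset with LEEP, then rank candidate models.
import numpy as np

def leep_score(source_probs: np.ndarray, target_labels: np.ndarray) -> float:
    """LEEP: log expected empirical prediction.

    source_probs: (N, K_src) softmax outputs of the pretrained model's
                  original head on the target images (no finetuning needed).
    target_labels: (N,) integer target labels in {0, ..., K_tgt - 1}.
    """
    n, k_src = source_probs.shape
    k_tgt = int(target_labels.max()) + 1
    # Empirical joint distribution P(y, z) over target label y and source class z.
    joint = np.zeros((k_tgt, k_src))
    for y in range(k_tgt):
        joint[y] = source_probs[target_labels == y].sum(axis=0) / n
    # Conditional P(y | z); clip to guard against source classes with zero mass.
    cond = joint / np.clip(joint.sum(axis=0, keepdims=True), 1e-12, None)
    # Expected empirical prediction per sample, then average log-likelihood.
    eep = (source_probs @ cond.T)[np.arange(n), target_labels]
    return float(np.log(np.clip(eep, 1e-12, None)).mean())

# Hypothetical usage: rank candidate models by score, highest first.
# probs_by_model = {"resnet50": p1, "vit_b16": p2}  # softmax outputs on target data
# ranking = sorted(probs_by_model,
#                  key=lambda m: leep_score(probs_by_model[m], y), reverse=True)
```

A higher score is taken to predict better post-finetuning accuracy, so in practice one would compute such a score for every candidate model and finetune only the top-ranked ones.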

Keywords: Image classification; Model ranking; Model selection; Transfer learning; Transferability estimation.