ELFS: Enhancing Label-Free Coreset Selection via Clustering-based Pseudo-Labeling

Zheng, Haizhong; Tsai, Elisa; Lu, Yifu; Sun, Jiachen; Bartoldson, Brian R.; Kailkhura, Bhavya; Prakash, Atul

arXiv:2406.04273 (cs)

[Submitted on 6 Jun 2024]

Abstract: High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground-truth labels over the whole dataset, failing to reduce the human labeling burden. Meanwhile, SOTA label-free coreset selection methods deliver inferior performance due to poor geometry-based scores. In this paper, we introduce ELFS, a novel label-free coreset selection method. ELFS employs deep clustering to estimate data difficulty scores without ground-truth labels. Furthermore, ELFS uses a simple but effective double-end pruning method to mitigate bias on calculated scores, which further improves the performance on selected coresets. We evaluate ELFS on five vision benchmarks and show that ELFS consistently outperforms SOTA label-free baselines. For instance, at a 90% pruning rate, ELFS surpasses the best-performing baseline by 5.3% on CIFAR10 and 7.1% on CIFAR100. Moreover, ELFS even achieves comparable performance to supervised coreset selection at low pruning rates (e.g., 30% and 50%) on CIFAR10 and ImageNet-1K.
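
The abstract names double-end pruning but does not spell out the procedure, so the following is only a minimal sketch of one plausible reading: rank samples by a difficulty score, drop some of the hardest, then fill the labeling budget from what remains so that the easiest tail is dropped as well. The function name `double_end_prune`, the `hard_cut` parameter, and the convention that a higher score means a harder sample are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def double_end_prune(scores: Sequence[float], keep: int, hard_cut: int) -> List[int]:
    """Return indices of `keep` samples after trimming both ends of the ranking.

    Assumptions (hypothetical, not from the abstract):
    - a higher score means a harder sample;
    - `hard_cut` is the number of hardest samples dropped first;
    - the budget is then filled with the hardest remaining samples,
      which implicitly drops the easiest ones.
    """
    if keep < 0 or hard_cut < 0 or keep + hard_cut > len(scores):
        raise ValueError("keep + hard_cut must not exceed the number of samples")
    # Rank sample indices from hardest to easiest (sorted() is stable for ties).
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Skip the hard end, keep the next `keep` samples, drop the easy tail.
    return sorted(ranked[hard_cut:hard_cut + keep])


# Toy difficulty scores for ten unlabeled samples (made-up values).
toy_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.95, 0.05, 0.6, 0.4, 0.2]
selected = double_end_prune(toy_scores, keep=4, hard_cut=2)
```

In this toy run the two highest-scoring samples and the four lowest-scoring samples are excluded, and the four in between are returned for labeling. In a label-free setting the scores themselves would have to come from pseudo-labels (for example, cluster assignments) rather than ground truth; how ELFS computes them is not described in the abstract.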

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.04273 [cs.CV]
	(or arXiv:2406.04273v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.04273

Submission history

From: Haizhong Zheng
[v1] Thu, 6 Jun 2024 17:23:05 UTC (5,123 KB)
