When is an Embedding Model More Promising than Another?

Darrin, Maxime; Formont, Philippe; Ayed, Ismail Ben; Cheung, Jackie CK; Piantanida, Pablo

Computer Science > Machine Learning

arXiv:2406.07640 (cs)

[Submitted on 11 Jun 2024]


Authors: Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida


Abstract: Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.
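The abstract does not spell out how information sufficiency is estimated, so the following is only a rough illustration of the general idea of a task-agnostic, self-supervised ranking, not the authors' method. It swaps in a much simpler, hypothetical proxy: the average in-sample R² obtained when each coordinate of one model's embedding is linearly predicted (with a small ridge penalty) from another model's embedding of the same inputs. This captures the asymmetric intuition that U is "sufficient" for V when V can be recovered from U. Each model is then scored by its mean proxy value towards all other models. The function names `sufficiency` and `rank_embedders`, the `ridge` parameter, the linear-regression proxy and the toy data are all assumptions made for this sketch.

```python
import random
from statistics import fmean


def center(X):
    """Subtract the column mean from every column of a list-of-rows matrix."""
    d = len(X[0])
    means = [fmean(row[j] for row in X) for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def solve_multi(A, B):
    """Solve A @ X = B (A square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + B[i][:] for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]


def sufficiency(U, V, ridge=1e-3):
    """Hypothetical proxy for 'U is sufficient for V': mean in-sample R^2 of a
    ridge regression predicting each coordinate of V from U (not symmetric)."""
    Uc, Vc = center(U), center(V)
    du, dv = len(Uc[0]), len(Vc[0])
    # Normal equations: (U^T U + ridge * I) W = U^T V
    A = [[sum(r[i] * r[j] for r in Uc) + (ridge if i == j else 0.0)
          for j in range(du)] for i in range(du)]
    B = [[sum(ru[i] * rv[k] for ru, rv in zip(Uc, Vc)) for k in range(dv)]
         for i in range(du)]
    W = solve_multi(A, B)
    r2 = []
    for k in range(dv):
        ss_tot = sum(rv[k] ** 2 for rv in Vc)
        if ss_tot == 0.0:
            continue  # a constant target coordinate carries no information
        ss_res = sum((rv[k] - sum(ru[i] * W[i][k] for i in range(du))) ** 2
                     for ru, rv in zip(Uc, Vc))
        r2.append(1.0 - ss_res / ss_tot)
    return fmean(r2) if r2 else 0.0


def rank_embedders(embeddings, ridge=1e-3):
    """Score each model by its mean sufficiency towards every other model; highest first."""
    names = list(embeddings)
    scores = {
        a: fmean(sufficiency(embeddings[a], embeddings[b], ridge)
                 for b in names if b != a)
        for a in names
    }
    return sorted(names, key=scores.get, reverse=True), scores


# Toy example: "full" encodes two latent factors, the other models encode one each.
random.seed(0)
z = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
embs = {
    "full": [[a, b] for a, b in z],
    "only_z1": [[a] for a, _ in z],
    "only_z2": [[2.0 * b] for _, b in z],
}
ranking, scores = rank_embedders(embs)
print(ranking, {k: round(v, 3) for k, v in scores.items()})
```

In this toy setup, `full` can linearly reconstruct both single-factor embeddings but neither of them can reconstruct `full`, so it should be ranked first. In-sample R² is optimistic when the source embedding has many dimensions relative to the number of samples; a held-out split, or the estimators described in the paper itself, would be more appropriate for real embedders.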

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.07640 [cs.LG]
	(or arXiv:2406.07640v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.07640

Submission history

From: Maxime Darrin
[v1] Tue, 11 Jun 2024 18:13:46 UTC (6,279 KB)
