An Efficient Self-Supervised Cross-View Training For Sentence Embedding

Limkonchotiwat, Peerat; Ponwitayarat, Wuttikorn; Lowphansirikul, Lalita; Udomcharoenchaikit, Can; Chuangsuwanich, Ekapol; Nutanong, Sarana

arXiv:2311.03228 (cs)

[Submitted on 6 Nov 2023]

Abstract: Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a representation learning method such as contrastive learning. While this approach achieves impressive performance on larger PLMs, the performance rapidly degrades as the number of parameters decreases. In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs. To evaluate the effectiveness of SCT, we compare it to five baseline and state-of-the-art competitors on seven Semantic Textual Similarity (STS) benchmarks using five PLMs with the number of parameters ranging from 4M to 340M. The experimental results show that SCT outperforms the competitors for PLMs with fewer than 100M parameters in 18 of 21 cases.
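
For readers unfamiliar with the contrastive-learning baseline the abstract mentions, the following is a minimal sketch of a generic InfoNCE-style objective over two views of the same batch of sentences. It is not the SCT objective, which the abstract does not specify. The function names (`cosine`, `info_nce_loss`), the temperature value of 0.05, and the use of plain Python lists as stand-ins for PLM embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(view_a, view_b, temperature=0.05):
    """Mean InfoNCE loss over a batch (illustrative, not the paper's loss).

    view_a[i] and view_b[i] are assumed to be embeddings of the same
    sentence under two different views; every other row of view_b acts
    as an in-batch negative for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        # Temperature-scaled similarities of the anchor to every candidate.
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over the candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        # Cross-entropy with the matching index i as the target class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: matching views should score a much lower loss than
# views whose rows have been swapped.
aligned_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this sketch the loss falls toward zero when each sentence is closest to its own second view and grows when the views are mismatched, which is the behaviour a contrastive finetuning step optimises for.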

Comments:	Accepted to TACL. The code and pre-trained models are available at this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2311.03228 [cs.CL]
	(or arXiv:2311.03228v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.03228

Submission history

From: Ekapol Chuangsuwanich
[v1] Mon, 6 Nov 2023 16:12:25 UTC (499 KB)
