DistiLRR: Transferring Code Repair for Low-Resource Programming Languages

Wong, Kyle; Amayuelas, Alfonso; Pan, Liangming; Wang, William Yang

Computer Science > Machine Learning

arXiv:2406.14867 (cs)

[Submitted on 21 Jun 2024]

Abstract: Large language models (LLMs) have shown remarkable performance on code generation tasks. A recent application of LLMs for code generation is iterative code repair, where a model fixes an incorrect program by rationalizing about errors and generating a new program. However, code repair is primarily studied on high-resource languages like Python, and the framework's efficacy is under-explored on low-resource languages. To apply code repair for low-resource languages, we propose Distilling Low-Resource Repairs (DistiLRR), an approach that transfers the reasoning and code generation ability from a teacher model to a student model. Our results show that DistiLRR consistently outperforms baselines on low-resource languages, but has similar performance on high-resource languages. To investigate this behavior, we perform a further analysis and find that the correlation between rationale quality and code correctness is weaker than previously perceived. We hypothesize this weakness is magnified in low-resource settings where base models lack deep knowledge of a programming language, leading to wavering benefits of code repair between high-resource and low-resource languages.
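The iterative code repair loop mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `run_tests`, `iterative_repair`, `toy_repair`, and the `(program, error) -> (rationale, new_program)` signature are hypothetical names chosen here, and the toy repair function stands in for a real model call. The teacher-to-student distillation step is not shown.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interface (an assumption, not from the paper):
# a repair step maps (program, error message) to (rationale, new program).
RepairFn = Callable[[str, str], Tuple[str, str]]


def run_tests(program: str, test_code: str) -> Optional[str]:
    """Run the program and its tests; return None on success, else an error message.

    Uses exec for brevity; real use would need a sandbox for untrusted code.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)
        exec(test_code, namespace)
    except Exception as exc:  # covers AssertionError and SyntaxError too
        return f"{type(exc).__name__}: {exc}"
    return None


def iterative_repair(
    program: str,
    test_code: str,
    repair: RepairFn,
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Return (final_program, passed) after at most max_rounds repair attempts."""
    for _ in range(max_rounds):
        error = run_tests(program, test_code)
        if error is None:
            return program, True
        # The rationale explains the error; only the new program is executed.
        _rationale, program = repair(program, error)
    return program, run_tests(program, test_code) is None


def toy_repair(program: str, error: str) -> Tuple[str, str]:
    """Stand-in for a model call: patches one known bug."""
    return "use + instead of -", program.replace("a - b", "a + b")
```

In this sketch the rationale is generated but never checked, so a repair can succeed or fail regardless of whether its explanation is sound.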

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as: arXiv:2406.14867 [cs.LG] (or arXiv:2406.14867v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2406.14867

Submission history

From: Kyle Wong
[v1] Fri, 21 Jun 2024 05:05:39 UTC (796 KB)
