Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models

Tang, Tianyi; Luo, Wenyang; Huang, Haoyang; Zhang, Dongdong; Wang, Xiaolei; Zhao, Xin; Wei, Furu; Wen, Ji-Rong

arXiv:2402.16438 (cs)

[Submitted on 26 Feb 2024 (v1), last revised 6 Jun 2024 (this version, v2)]

Abstract: Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. It remains a challenging problem to explain the underlying mechanisms by which LLMs process multilingual texts. In this paper, we delve into the composition of Transformer architectures in LLMs to pinpoint language-specific regions. Specifically, we propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Based on LAPE, we conduct comprehensive experiments on several representative LLMs, such as LLaMA-2, BLOOM, and Mistral. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons, primarily situated in the models' top and bottom layers. Furthermore, we showcase the feasibility of "steering" the output language of LLMs by selectively activating or deactivating language-specific neurons. Our research provides important evidence for understanding and exploring the multilingual capabilities of LLMs.
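
To make the idea behind LAPE concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each neuron has an estimated probability of activating on text in each language, that these probabilities are normalized into a distribution over languages, and that a low entropy of that distribution marks the neuron as language-specific. The function names, the toy probabilities, and the `max_entropy` threshold are all hypothetical and chosen only for illustration.

```python
import math


def lape(activation_probs):
    """Entropy of one neuron's activation probabilities, normalized across languages.

    `activation_probs` holds one probability per language (an assumed input format).
    """
    total = sum(activation_probs)
    if total == 0:
        # A neuron that never fires is not specific to any language.
        return float("inf")
    entropy = 0.0
    for p in activation_probs:
        q = p / total  # share of this neuron's activation mass for one language
        if q > 0:
            entropy -= q * math.log(q)
    return entropy


def language_specific_neurons(probs_by_neuron, max_entropy):
    """Return the names of neurons whose entropy is at or below the threshold."""
    return [
        name
        for name, probs in probs_by_neuron.items()
        if lape(probs) <= max_entropy
    ]


# Hypothetical activation probabilities for three languages.
toy = {
    "neuron_a": [0.90, 0.01, 0.01],  # fires almost only for the first language
    "neuron_b": [0.30, 0.30, 0.30],  # fires equally often for every language
}
selected = language_specific_neurons(toy, max_entropy=0.5)
print(selected)
```

In this toy setting only `neuron_a` falls below the threshold, because nearly all of its activation mass sits on one language, while `neuron_b` reaches the maximum entropy for three languages, ln 3. A fixed threshold is used here for simplicity; selecting a small lowest-entropy fraction of neurons would be an equally plausible design under the same assumptions.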

Comments:	Accepted by ACL 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.16438 [cs.CL]
	(or arXiv:2402.16438v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.16438

Submission history

From: Tianyi Tang
[v1] Mon, 26 Feb 2024 09:36:05 UTC (273 KB)
[v2] Thu, 6 Jun 2024 08:14:46 UTC (281 KB)
