Learning Multiscale Transformer Models for Sequence Generation

Li, Bei; Zheng, Tong; Jing, Yi; Jiao, Chengbo; Xiao, Tong; Zhu, Jingbo

Computer Science > Computation and Language

arXiv:2206.09337 (cs)

[Submitted on 19 Jun 2022]

Abstract: Multiscale feature hierarchies have proven successful in computer vision. This has motivated researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism, for example by restricting the receptive field across heads or by extracting local fine-grained features via convolutions. However, most existing works model local features directly but ignore word-boundary information. This results in redundant and ambiguous attention distributions that lack interpretability. In this work, we define scales in terms of different linguistic units, including sub-words, words, and phrases. We build a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed Universal MultiScale Transformer (UMST) was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over a strong baseline on several test sets without sacrificing efficiency.
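
As a rough illustration of what "word-boundary information" can mean when the input is a sequence of sub-words, here is a minimal sketch that groups BPE tokens into words and builds a same-word mask. It assumes the subword-nmt convention in which a token ending in "@@" continues into the next token. The function names `word_ids_from_bpe` and `word_boundary_mask` are hypothetical, and this is not the UMST implementation from the paper, only one simple way such boundaries could be recovered.

```python
from typing import List


def word_ids_from_bpe(tokens: List[str], marker: str = "@@") -> List[int]:
    """Map each sub-word token to the index of the word it belongs to.

    Assumption: the subword-nmt convention, where a token ending in
    `marker` is continued by the following token.
    """
    ids = []
    word = 0
    for tok in tokens:
        ids.append(word)
        if not tok.endswith(marker):
            word += 1  # this token closes the current word
    return ids


def word_boundary_mask(tokens: List[str], marker: str = "@@") -> List[List[bool]]:
    """Return mask[i][j] == True when tokens i and j belong to the same word."""
    ids = word_ids_from_bpe(tokens, marker)
    return [[a == b for b in ids] for a in ids]


tokens = ["multi@@", "scale", "trans@@", "form@@", "er", "models"]
print(word_ids_from_bpe(tokens))  # [0, 0, 1, 1, 1, 2]
```

A mask like this could, for instance, restrict some attention heads to tokens inside the same word. How UMST actually relates the sub-word, word, and phrase scales is specified in the paper, not by this sketch.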

Comments:	accepted by ICML2022
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2206.09337 [cs.CL]
	(or arXiv:2206.09337v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2206.09337

Submission history

From: Li Bei
[v1] Sun, 19 Jun 2022 07:28:54 UTC (3,917 KB)