Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning

Mhaskar, Shivam Ratnakant; Shah, Nirmesh J.; Zaki, Mohammadi; Gudmalwar, Ashishkumar P.; Wasnik, Pankaj; Shah, Rajiv Ratn

arXiv:2403.15469 (cs)

[Submitted on 20 Mar 2024]

Abstract: A traditional Automatic Video Dubbing (AVD) pipeline consists of three key modules: Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS). Within AVD pipelines, isometric NMT algorithms regulate the length of the translated output text so that the dubbed audio stays synchronized with the video. Previous approaches have focused on aligning the number of characters and words in the source and target language texts of machine translation models. Our approach instead aligns the number of phonemes, as they are closely associated with speech duration. In this paper, we present an isometric NMT system trained with Reinforcement Learning (RL), with a focus on optimizing the alignment of phoneme counts in source and target language sentence pairs. To evaluate our models, we propose the Phoneme Count Compliance (PCC) score, a measure of length compliance. Our approach demonstrates an improvement of approximately 36% in the PCC score over state-of-the-art models on the English-Hindi language pair. Moreover, we propose a student-teacher architecture within our RL framework to maintain a trade-off between phoneme count and translation quality.
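To make the idea of phoneme-count compliance concrete, the following is a minimal sketch and not the paper's definition. The abstract does not state how the PCC score is computed, so the function names, the target-to-source ratio, and the 10% tolerance are all assumptions chosen for illustration. Phoneme counts are taken as given integers; producing them would require a grapheme-to-phoneme tool, which is outside this sketch.

```python
from typing import Iterable, Tuple


def phoneme_count_ratio(src_count: int, tgt_count: int) -> float:
    """Ratio of target to source phoneme counts (hypothetical helper)."""
    if src_count <= 0:
        raise ValueError("source phoneme count must be positive")
    return tgt_count / src_count


def is_compliant(src_count: int, tgt_count: int, tolerance: float = 0.1) -> bool:
    """True if the target count is within +/- tolerance of the source count.

    The 10% default is an assumption, not a value taken from the abstract.
    """
    if src_count <= 0:
        raise ValueError("source phoneme count must be positive")
    return abs(tgt_count - src_count) <= tolerance * src_count


def compliance_score(pairs: Iterable[Tuple[int, int]], tolerance: float = 0.1) -> float:
    """Percentage of (source, target) count pairs that are compliant."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(is_compliant(s, t, tolerance) for s, t in pairs)
    return 100.0 * hits / len(pairs)


# Illustrative (made-up) phoneme counts for four sentence pairs.
example_pairs = [(10, 10), (10, 15), (20, 21), (8, 4)]
score = compliance_score(example_pairs)
```

Under these assumptions, a ratio near 1.0 indicates that the translation should take roughly as long to speak as the source, and a ratio of this kind could plausibly serve as a reward signal during RL fine-tuning.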

Comments:	Accepted to NAACL 2024 Findings
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2403.15469 [cs.CL]
	(or arXiv:2403.15469v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.15469

Submission history

From: Mohammadi Zaki
[v1] Wed, 20 Mar 2024 08:52:40 UTC (8,730 KB)
