Enhancing Language Learning through Technology: Introducing a New English-Azerbaijani (Arabic Script) Parallel Corpus

Khiarak, Jalil Nourmohammadi; Ahmadi, Ammar; Saeed, Taher Ak-bari; Asgari-Chenaghlu, Meysam; Atabay, Toğrul; Karimi, Mohammad Reza Baghban; Ceferli, Ismail; Hasanvand, Farzad; Mousavi, Seyed Mahboub; Noshad, Morteza

arXiv:2407.05189 (cs)

[Submitted on 6 Jul 2024]

Abstract: This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus, designed to bridge the technological gap in language learning and machine translation (MT) for under-resourced languages. Consisting of 548,000 parallel sentences and approximately 9 million words per language, this dataset is derived from diverse sources such as news articles and holy texts, aiming to enhance natural language processing (NLP) applications and language education technology. This corpus marks a significant step forward in the realm of linguistic resources, particularly for Turkic languages, which have lagged in the neural machine translation (NMT) revolution. By presenting the first comprehensive case study for the English-Azerbaijani (Arabic Script) language pair, this work underscores the transformative potential of NMT in low-resource contexts. The development and utilization of this corpus not only facilitate the advancement of machine translation systems tailored for specific linguistic needs but also promote inclusive language learning through technology. The findings demonstrate the corpus's effectiveness in training deep learning MT systems and underscore its role as an essential asset for researchers and educators aiming to foster bilingual education and multilingual communication. This research paves the way for future explorations into NMT applications for languages lacking substantial digital resources, thereby enhancing global language education frameworks. The Python package of our code is available at this https URL, and we also have a website accessible at this https URL.

Comments:	This paper was accepted and published at the NeTTT 2024 conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.05189 [cs.CL]
	(or arXiv:2407.05189v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.05189

Submission history

From: Jalil Nourmohammadi Khiarak
[v1] Sat, 6 Jul 2024 21:23:20 UTC (330 KB)
