Showing 1–5 of 5 results for author: Robinson, N R

Searching in archive cs.
  1. arXiv:2405.05376  [pdf, other]

    cs.CL

    Kreyòl-MT: Building MT for Latin American, Caribbean and Colonial African Creole Languages

    Authors: Nathaniel R. Robinson, Raj Dabre, Ammon Shurtz, Rasul Dent, Onenamiyi Onesi, Claire Bizon Monroc, Loïc Grobol, Hasan Muhammad, Ashi Garg, Naome A. Etori, Vijay Murari Tiyyala, Olanrewaju Samuel, Matthew Dean Stutzman, Bismarck Bamfo Odoom, Sanjeev Khudanpur, Stephen D. Richardson, Kenton Murray

    Abstract: A majority of language technologies are tailored for a small number of high-resource languages, while relatively many low-resource languages are neglected. One such group, Creole languages, have long been marginalized in academic study, though their speakers could benefit from machine translation (MT). These languages are predominantly used in much of Latin America, Africa and the Caribbean. We pr…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: NAACL 2024

  2. arXiv:2403.13169  [pdf, other]

    cs.CL

    Wav2Gloss: Generating Interlinear Glossed Text from Speech

    Authors: Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

    Abstract: Thousands of the world's languages are in danger of extinction--a tremendous threat to cultural identities and human language diversity. Interlinear Glossed Text (IGT) is a form of linguistic annotation that can support documentation and resource creation for these languages' communities. IGT typically consists of (1) transcriptions, (2) morphological segmentation, (3) glosses, and (4) free transl…

    Submitted 5 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: ACL 2024 camera ready version

  3. arXiv:2402.01582  [pdf]

    cs.CL

    Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study

    Authors: Kalvin Chang, Nathaniel R. Robinson, Anna Cai, Ting Chen, Annie Zhang, David R. Mortensen

    Abstract: We describe a set of new methods to partially automate linguistic phylogenetic inference given (1) cognate sets with their respective protoforms and sound laws, (2) a mapping from phones to their articulatory features and (3) a typological database of sound changes. We train a neural network on these sound change data to weight articulatory distances between phones and predict intermediate sound c…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted to LChange 2023

  4. arXiv:2309.07423  [pdf, other]

    cs.CL

    ChatGPT MT: Competitive for High- (but not Low-) Resource Languages

    Authors: Nathaniel R. Robinson, Perez Ogayo, David R. Mortensen, Graham Neubig

    Abstract: Large language models (LLMs) implicitly learn to perform a range of language tasks, including machine translation (MT). Previous studies explore aspects of LLMs' MT capabilities. However, there exist a wide variety of languages for which recent LLM MT performance has never before been evaluated. Without published experimental evidence on the matter, it is difficult for speakers of the world's dive…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 27 pages, 9 figures, 14 tables

  5. arXiv:2209.06295  [pdf, other]

    cs.CL

    Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican

    Authors: Nathaniel R. Robinson, Cameron J. Hogan, Nancy Fulda, David R. Mortensen

    Abstract: Multilingual transfer techniques often improve low-resource machine translation (MT). Many of these techniques are applied without considering data characteristics. We show in the context of Haitian-to-English translation that transfer effectiveness is correlated with amount of training data and relationships between knowledge-sharing languages. Our experiments suggest that for some languages beyo…

    Submitted 13 September, 2022; originally announced September 2022.