
Showing 1–7 of 7 results for author: Tyers, F M

  1. arXiv:2310.06764 [pdf, other]

    cs.CL

    OmniLingo: Listening- and speaking-based language learning

    Authors: Francis M. Tyers, Nicholas Howell

    Abstract: In this demo paper we present OmniLingo, an architecture for distributing data for listening- and speaking-based language learning applications and a demonstration client built using the architecture. The architecture is based on the Interplanetary Filesystem (IPFS) and puts at the forefront user sovereignty over data.

    Submitted 10 October, 2023; originally announced October 2023.

  2. arXiv:2209.09742 [pdf, other]

    cs.CL

    Yet Another Format of Universal Dependencies for Korean

    Authors: Yige Chen, Eunkyul Leah Jo, Yundong Yao, KyungTae Lim, Miikka Silfverberg, Francis M. Tyers, Jungyeul Park

    Abstract: In this study, we propose a morpheme-based scheme for Korean dependency parsing and adopt the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation and the necessity of adopting the morpheme-based format, and develop scripts that convert between the original format used by Universal Dependencies and the proposed morpheme-based format automat…

    Submitted 20 September, 2022; originally announced September 2022.

    Comments: COLING2022, Poster

  3. arXiv:2205.03608 [pdf, other]

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  4. arXiv:2105.04674 [pdf]

    cs.CL cs.LG cs.SD eess.AS

    What shall we do with an hour of data? Speech recognition for the un- and under-served languages of Common Voice

    Authors: Francis M. Tyers, Josh Meyer

    Abstract: This technical report describes the methods and results of a three-week sprint to produce deployable speech recognition models for 31 under-served languages of the Common Voice project. We outline the preprocessing steps, hyperparameter selection, and resulting accuracy on official testing sets. In addition to this we evaluate the models on multiple tasks: closed-vocabulary speech recognition, pre…

    Submitted 10 May, 2021; originally announced May 2021.

  5. arXiv:2102.03662 [pdf, other]

    cs.CL cs.SD eess.AS

    A bandit approach to curriculum generation for automatic speech recognition

    Authors: Anastasia Kuznetsova, Anurag Kumar, Francis M. Tyers

    Abstract: The Automated Speech Recognition (ASR) task has been a challenging domain especially for low data scenarios with few audio examples. This is the main problem in training ASR systems on the data from low-resource or marginalized languages. In this paper we present an approach to mitigate the lack of training data by employing Automated Curriculum Learning in combination with an adversarial bandit a…

    Submitted 6 February, 2021; originally announced February 2021.

  6. arXiv:1912.06670 [pdf, other]

    cs.CL cs.LG

    Common Voice: A Massively-Multilingual Speech Corpus

    Authors: Rosana Ardila, Megan Branson, Kelly Davis, Michael Henretty, Michael Kohler, Josh Meyer, Reuben Morais, Lindsay Saunders, Francis M. Tyers, Gregor Weber

    Abstract: The Common Voice corpus is a massively-multilingual collection of transcribed speech intended for speech technology research and development. Common Voice is designed for Automatic Speech Recognition purposes but can be useful in other domains (e.g. language identification). To achieve scale and sustainability, the Common Voice project employs crowdsourcing for both data collection and data valida…

    Submitted 5 March, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

    Comments: Accepted to LREC 2020

  7. arXiv:1809.04022 [pdf, ps, other]

    cs.CL

    Can LSTM Learn to Capture Agreement? The Case of Basque

    Authors: Shauli Ravfogel, Francis M. Tyers, Yoav Goldberg

    Abstract: Sequential neural networks models are powerful tools in a variety of Natural Language Processing (NLP) tasks. The sequential nature of these models raises the questions: to what extent can these models implicitly learn hierarchical structures typical to human language, and what kind of grammatical phenomena can they acquire? We focus on the task of agreement prediction in Basque, as a case study…

    Submitted 26 November, 2018; v1 submitted 11 September, 2018; originally announced September 2018.

    Comments: Accepted to "Analyzing and interpreting neural networks for NLP" workshop at EMNLP 2018