
Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification

Authors: Pierre Lepagnol, Thomas Gerald, Sahar Ghannay, Christophe Servan, Sophie Rosset

Abstract: This study is part of the debate on the efficiency of large versus small language models for text classification by prompting. We assess the performance of small language models in zero-shot text classification, challenging the prevailing dominance of large models. Across 15 datasets, our investigation benchmarks language models from 77M to 40B parameters using different architectures and scoring functions. Our findings reveal that small models can effectively classify texts, getting on par with or surpassing their larger counterparts. We developed and shared a comprehensive open-source repository that encapsulates our methodologies. This research underscores the notion that bigger isn't always better, suggesting that resource-efficient small models may offer viable solutions for specific data classification challenges.

Submitted 17 April, 2024; originally announced April 2024.

Journal ref: LREC-COLING 2024, May 2024, TURIN, Italy

arXiv:2403.19727 [pdf, ps, other]

New Semantic Task for the French Spoken Language Understanding MEDIA Benchmark

Authors: Nadège Alavoine, Gaëlle Laperriere, Christophe Servan, Sahar Ghannay, Sophie Rosset

Abstract: Intent classification and slot-filling are essential tasks of Spoken Language Understanding (SLU). In most SLU systems, those tasks are realized by independent modules. For about fifteen years, models achieving both of them jointly and exploiting their mutual enhancement have been proposed. A multilingual module using a joint model was envisioned to create a touristic dialogue system for a European project, HumanE-AI-Net. A combination of multiple datasets, including the MEDIA dataset, was suggested for training this joint model. The MEDIA SLU dataset is a French dataset distributed since 2005 by ELRA, mainly used by the French research community and free for academic research since 2020. Unfortunately, it is annotated only in slots but not intents. An enhanced version of MEDIA annotated with intents has been built to extend its use to more tasks and use cases. This paper presents the semi-automatic methodology used to obtain this enhanced version. In addition, we present the first results of SLU experiments on this enhanced dataset using joint models for intent classification and slot-filling.

Submitted 28 March, 2024; originally announced March 2024.

Journal ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy

arXiv:2403.19726 [pdf, other]

A Benchmark Evaluation of Clinical Named Entity Recognition in French

Authors: Nesrine Bannour, Christophe Servan, Aurélie Névéol, Xavier Tannier

Abstract: Background: Transformer-based language models have shown strong performance on many Natural Language Processing (NLP) tasks. Masked Language Models (MLMs) attract sustained interest because they can be adapted to different languages and sub-domains through training or fine-tuning on specific corpora while remaining lighter than modern Large Language Models (LLMs). Recently, several MLMs have been released for the biomedical domain in French, and experiments suggest that they outperform standard French counterparts. However, no systematic evaluation comparing all models on the same corpora is available. Objective: This paper presents an evaluation of masked language models for biomedical French on the task of clinical named entity recognition. Material and methods: We evaluate biomedical models CamemBERT-bio and DrBERT and compare them to standard French models CamemBERT, FlauBERT and FrALBERT as well as multilingual mBERT using three publicly available corpora for clinical named entity recognition in French. The evaluation set-up relies on gold-standard corpora as released by the corpus developers. Results: Results suggest that CamemBERT-bio outperforms DrBERT consistently while FlauBERT offers competitive performance and FrALBERT achieves the lowest carbon footprint. Conclusion: This is the first benchmark evaluation of biomedical masked language models for French clinical entity recognition that compares model performance consistently on nested entity recognition using metrics covering performance and environmental impact.

Submitted 28 March, 2024; originally announced March 2024.

Journal ref: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024, Torino, Italy

arXiv:2403.18338 [pdf, other]

mALBERT: Is a Compact Multilingual BERT Model Still Worth It?

Authors: Christophe Servan, Sahar Ghannay, Sophie Rosset

Abstract: Within the current trend of Pretrained Language Models (PLMs), more and more criticisms emerge about the ethical and ecological impact of such models. In this article, considering these critical remarks, we propose to focus on smaller models, such as compact models like ALBERT, which are more ecologically virtuous than these PLMs. However, PLMs enable huge breakthroughs in Natural Language Processing tasks, such as Spoken and Natural Language Understanding, classification, and Question-Answering tasks. PLMs also have the advantage of being multilingual, and, as far as we know, a multilingual version of compact ALBERT models does not exist. Considering these facts, we propose the free release of the first version of a multilingual compact ALBERT model, pre-trained using Wikipedia data, which complies with the ethical aspect of such a language model. We also evaluate the model against classical multilingual PLMs in classical NLP tasks. Finally, this paper proposes a rare study on the subword tokenization impact on language performances.

Submitted 27 March, 2024; originally announced March 2024.

Comments: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Torino, Italy

arXiv:2310.14392 [pdf, other]

Effects of phylogeny on coexistence in model communities

Authors: Carlos A. Servan, Jose A. Capitan, Zachary R. Miller, Stefano Allesina

Abstract: Species' interactions are shaped by their traits. Thus, we expect traits -- in particular, trait (dis)similarity -- to play a central role in determining whether a particular set of species coexists. Traits are, in turn, the outcome of an eco-evolutionary process summarized by a phylogenetic tree. Therefore, the phylogenetic tree associated with a set of species should carry information about the dynamics and assembly properties of the community. Many studies have highlighted the potentially complex ways in which this phylogenetic information is translated into species' ecological properties. However, much less emphasis has been placed on developing clear, quantitative expectations for community properties under a particular hypothesis. To address this gap, we couple a simple model of trait evolution on a phylogenetic tree with Lotka-Volterra community dynamics. This allows us to derive properties of a community of coexisting species as a function of the number of traits, tree topology and the size of the species pool. Our analysis highlights how phylogenies, through traits, affect the coexistence of a set of species. Together, these results provide much-needed baseline expectations for the ways in which evolutionary history, summarized by phylogeny, is reflected in the size and structure of ecological communities.

Submitted 10 August, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

MSC Class: 92D40 (Primary); 60B20 (Secondary)

arXiv:2305.04153 [pdf, ps, other]

Isometric embeddings of Teichmüller spaces are covering constructions

Authors: Frederik Benirschke, Carlos A. Serván

Abstract: Pulling back complex structures along a branched covering induces a holomorphic isometric embedding of Teichmüller spaces. We show that for dimension at least $2$, all isometric embeddings arise from branched coverings. This generalizes a theorem of Royden. As a consequence we obtain that totally geodesic submanifolds of Teichmüller space, which are isometric to some Teichmüller space, are covering constructions. Another consequence is the classification of locally isometric embeddings of moduli spaces of Riemann surfaces.

Submitted 6 May, 2023; originally announced May 2023.

Comments: 17 pages

arXiv:2207.09157 [pdf, ps, other]

On the cross-lingual transferability of multilingual prototypical models across NLU tasks

Authors: Oralie Cattan, Christophe Servan, Sophie Rosset

Abstract: Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications when a sufficient number of training examples are available. In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages. Domain and language models are supposed to grow and change as the problem space evolves. On one hand, research on transfer learning has demonstrated the cross-lingual ability of multilingual Transformers-based models to learn semantically rich representations. On the other, in addition to the above approaches, meta-learning has enabled the development of task and language learning algorithms capable of far generalization. Through this context, this article proposes to investigate the cross-lingual transferability of using synergistically few-shot learning with prototypical neural networks and multilingual Transformers-based models. Experiments in natural language understanding tasks on MultiATIS++ corpus show that our approach substantially improves the observed transfer learning performances between the low and the high resource languages. More generally our approach confirms that the meaningful latent space learned in a given language can be generalized to unseen and under-resourced ones using meta-learning.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted to the ACL workshop METANLP 2021

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2207.09152 [pdf, ps, other]

Benchmarking Transformers-based models on French Spoken Language Understanding tasks

Authors: Oralie Cattan, Sahar Ghannay, Christophe Servan, Sophie Rosset

Abstract: In the last five years, the rise of the self-attentional Transformer-based architectures led to state-of-the-art performances over many natural language tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a substantial need for benchmarking methodologies ever upwards on under-resourced languages in data-scarce application conditions. Most pre-trained language models were massively studied using the English language and only a few of them were evaluated on French. In this paper, we propose a unified benchmark, focused on evaluating models quality and their ecological impact on two well-known French spoken language understanding tasks. Especially we benchmark thirteen well-established Transformer-based models on the two available spoken language understanding tasks for French: MEDIA and ATIS-FR. Within this framework, we show that compact models can reach comparable results to bigger ones while their ecological impact is considerably lower. However, this assumption is nuanced and depends on the considered compression method.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted paper at INTERSPEECH 2022

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2207.09150 [pdf, ps, other]

On the Usability of Transformers-based models for a French Question-Answering task

Authors: Oralie Cattan, Christophe Servan, Sophie Rosset

Abstract: For many tasks, state-of-the-art results have been achieved with Transformer-based architectures, resulting in a paradigmatic shift in practices from the use of task-specific architectures to the fine-tuning of pre-trained language models. The ongoing trend consists in training models with an ever-increasing amount of data and parameters, which requires considerable resources. It leads to a strong search to improve resource efficiency based on algorithmic and hardware improvements evaluated only for English. This raises questions about their usability when applied to small-scale learning problems, for which a limited amount of training data is available, especially for under-resourced languages tasks. The lack of appropriately sized corpora is a hindrance to applying data-driven and transfer learning-based approaches with strong instability cases. In this paper, we establish a state-of-the-art of the efforts dedicated to the usability of Transformer-based models and propose to evaluate these improvements on the question-answering performances of French language which have few resources. We address the instability relating to data scarcity by investigating various training strategies with data augmentation, hyperparameters optimization and cross-lingual transfer. We also introduce a new compact model for French FrALBERT which proves to be competitive in low-resource settings.

Submitted 19 July, 2022; originally announced July 2022.

Comments: French compact model paper: FrALBERT, Accepted to RANLP 2021

MSC Class: 68T50 ACM Class: I.2.7

arXiv:2207.01704 [pdf, other]

On the uniqueness of the Prym map

Authors: Carlos A. Serván

Abstract: The classical Prym construction associates to a smooth, genus $g$ complex curve $X$ equipped with a nonzero cohomology class $θ\in H^1(X,\mathbb{Z}/2\mathbb{Z})$, a principally polarized abelian variety (PPAV) $\mbox{Prym}(X,θ)$. Denote the moduli space of pairs $(X,θ)$ by $\mathcal{R}_g$, and let $\mathcal{A}_h$ be the moduli space of PPAVs of dimension $h$. The Prym construction globalizes to a holomorphic map of complex orbifolds $\mbox{Prym}: \mathcal{R}_g \to \mathcal{A}_{g-1}$. For $g\geq 4$ and $h \leq g-1$, we show that $\mbox{Prym}$ is the unique nonconstant holomorphic map of complex orbifolds $F:\mathcal{R}_g \to \mathcal{A}_h$. This solves a conjecture of Farb. A main component in our proof is a classification of homomorphisms $π_1^{\mbox{orb}}(\mathcal{R}_g) \to \mbox{Sp}(2h,\mathbb{Z})$ for $h \leq g-1$. This is achieved using arguments from geometric group theory and low-dimensional topology.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:1910.07481 [pdf, ps, other]

Using Whole Document Context in Neural Machine Translation

Authors: Valentin Macé, Christophe Servan

Abstract: In Machine Translation, considering the document as a whole can help to resolve ambiguities and inconsistencies. In this paper, we propose a simple yet promising approach to add contextual information in Neural Machine Translation. We present a method to add source context that captures the whole document with accurate boundaries, taking every word into account. We provide this additional information to a Transformer model and study the impact of our method on three language pairs. The proposed approach obtains promising results in the English-German, English-French and French-English document-level translation tasks. We observe interesting cross-sentential behaviors where the model learns to use document-level information to improve translation coherence.

Submitted 16 October, 2019; originally announced October 2019.

Comments: Accepted paper to IWSLT2019

arXiv:1907.05790 [pdf, other]

Qwant Research @DEFT 2019: Document matching and information retrieval using clinical cases

Authors: Estelle Maudet, Oralie Cattan, Maureen de Seyssel, Christophe Servan

Abstract: This paper reports on Qwant Research's contribution to tasks 2 and 3 of the DEFT 2019 challenge, focusing on French clinical cases analysis. Task 2 is a task on semantic similarity between clinical cases and discussions. For this task, we propose an approach based on language models and evaluate the impact on the results of different preprocessings and matching techniques. For task 3, we have developed an information extraction system yielding very encouraging results accuracy-wise. We have experimented with two different approaches, one based on the exclusive use of neural networks, the other based on a linguistic analysis.

Submitted 6 July, 2019; originally announced July 2019.

Comments: Article accepted at the workshop DEfi fouille de Texte (DEFT 2019). Article in French

Journal ref: DEFT 2019

arXiv:1903.11299 [pdf, other]

Image search using multilingual texts: a cross-modal learning approach between image and text

Authors: Maxime Portaz, Hicham Randrianarivo, Adrien Nivaggioli, Estelle Maudet, Christophe Servan, Sylvain Peyronnet

Abstract: Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space allows for shared semantics between words from different languages. In this paper, we propose to embed images and texts into a unique distributional vector space, making it possible to search images by using text queries expressing information needs related to the (visual) content of images, as well as using image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings we ensure that words from two different languages have close descriptors and thus are attached to similar images. We provide experimental evidence of the efficiency of our approach by evaluating it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7].

Submitted 14 May, 2019; v1 submitted 27 March, 2019; originally announced March 2019.

arXiv:1902.08278 [pdf, other]

doi 10.1103/PhysRevE.101.062302

Thresholding normally distributed data creates complex networks

Authors: George T. Cantwell, Yanchen Liu, Benjamin F. Maier, Alice C. Schwarze, Carlos A. Serván, Jordan Snyder, Guillaume St-Onge

Abstract: Network data sets are often constructed by some kind of thresholding procedure. The resulting networks frequently possess properties such as heavy-tailed degree distributions, clustering, large connected components and short average shortest path lengths. These properties are considered typical of complex networks and appear in many contexts, prompting consideration of their universality. Here we introduce a simple model for correlated relational data and study the network ensemble obtained by thresholding it. We find that some, but not all, of the properties associated with complex networks can be seen after thresholding the correlated data, even though the underlying data are not "complex". In particular, we observe heavy-tailed degree distributions, a large number of triangles, and short path lengths, while we do not observe non-vanishing clustering or community structure.

Submitted 29 May, 2020; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: incorporated referees' suggestions; to be published in Phys. Rev. E

Journal ref: Phys. Rev. E 101, 062302 (2020)

arXiv:1709.03814 [pdf, other]

SYSTRAN Purely Neural MT Engines for WMT2017

Authors: Yongchao Deng, Jungi Kim, Guillaume Klein, Catherine Kobus, Natalia Segal, Christophe Servan, Bo Wang, Dakun Zhang, Josep Crego, Jean Senellart

Abstract: This paper describes SYSTRAN's systems submitted to the WMT 2017 shared news translation task for English-German, in both translation directions. Our systems are built using OpenNMT, an open-source neural machine translation system, implementing sequence-to-sequence models with LSTM encoder/decoders and attention. We experimented using monolingual data automatically back-translated. Our resulting models are further hyper-specialised with an adaptation technique that finely tunes models according to the evaluation test sentences.

Submitted 12 September, 2017; originally announced September 2017.

Comments: Published in WMT 2017

arXiv:1612.06141 [pdf, other]

Domain specialization: a post-training domain adaptation for Neural Machine Translation

Authors: Christophe Servan, Josep Crego, Jean Senellart

Abstract: Domain adaptation is a key feature in Machine Translation. It generally encompasses terminology, domain and style adaptation, especially for human post-editing workflows in Computer Assisted Translation (CAT). With Neural Machine Translation (NMT), we introduce a new notion of domain adaptation that we call "specialization" and which is showing promising results both in the learning speed and in adaptation accuracy. In this paper, we propose to explore this approach under several perspectives.

Submitted 19 December, 2016; originally announced December 2016.

Comments: Submitted to EACL 2017 short paper

arXiv:1612.01744 [pdf, other]

Listen and Translate: A Proof of Concept for End-to-End Speech-to-Text Translation

Authors: Alexandre Berard, Olivier Pietquin, Christophe Servan, Laurent Besacier

Abstract: This paper proposes a first attempt to build an end-to-end speech-to-text translation system, which does not use source language transcription during learning or decoding. We propose a model for direct speech-to-text translation, which gives promising results on a small French-English synthetic corpus. Relaxing the need for source language transcription would drastically change the data collection methodology in speech translation, especially in under-resourced scenarios. For instance, in the former project DARPA TRANSTAC (speech translation from spoken Arabic dialects), a large effort was devoted to the collection of speech transcripts (and a prerequisite to obtain transcripts was often a detailed transcription guide for languages with little standardized spelling). Now, if end-to-end approaches for speech-to-text translation are successful, one might consider collecting data by asking bilingual speakers to directly utter speech in the source language from target language text utterances. Such an approach has the advantage to be applicable to any unwritten (source) language.

Submitted 6 December, 2016; originally announced December 2016.

Comments: accepted to NIPS workshop on End-to-end Learning for Speech and Audio Processing

arXiv:1610.05540 [pdf, ps, other]

SYSTRAN's Pure Neural Machine Translation Systems

Authors: Josep Crego, Jungi Kim, Guillaume Klein, Anabel Rebollo, Kathy Yang, Jean Senellart, Egor Akhanov, Patrice Brunelle, Aurelien Coquard, Yongchao Deng, Satoshi Enoue, Chiyo Geiss, Joshua Johanson, Ardas Khalsa, Raoum Khiari, Byeongil Ko, Catherine Kobus, Jean Lorieux, Leidiana Martins, Dang-Chuan Nguyen, Alexandra Priori, Thomas Riccardi, Natalia Segal, Christophe Servan, Cyril Tiquet, et al. (5 additional authors not shown)

Abstract: Since the first online demonstration of Neural Machine Translation (NMT) by LISA, NMT development has recently moved from laboratory to production systems as demonstrated by several entities announcing roll-out of NMT engines to replace their existing technologies. NMT systems have a large number of training configurations and the training process of such systems is usually very long, often a few weeks, so the role of experimentation is critical and important to share. In this work, we present our approach to production-ready systems simultaneously with release of online demonstrators covering a large variety of languages (12 languages, for 32 language pairs). We explore different practical choices: an efficient and evolutive open-source framework; data preparation; network architecture; additional implemented features; tuning for production; etc. We discuss evaluation methodology, present our first findings and finally outline further work. Our ultimate goal is to share our expertise to build competitive production systems for "generic" translation. We aim at contributing to set up a collaborative framework to speed-up adoption of the technology, foster further research efforts and enable the delivery and adoption to/by industry of use-case specific engines integrated in real production workflows. Mastering of the technology would allow us to build translation engines suited for particular needs, outperforming current simplest/uniform systems.

Submitted 18 October, 2016; originally announced October 2016.

arXiv:1610.01291 [pdf, ps, other]

Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

Authors: Christophe Servan, Alexandre Berard, Zied Elloumi, Hervé Blanchon, Laurent Besacier

Abstract: This paper presents an approach combining lexico-semantic resources and distributed representations of words applied to the evaluation in machine translation (MT). This study is made through the enrichment of a well-known MT evaluation metric: METEOR. This metric enables an approximate match (synonymy or morphological similarity) between an automatic and a reference translation. Our experiments are made in the framework of the Metrics task of WMT 2014. We show that distributed representations are a good alternative to lexico-semantic resources for MT evaluation and they can even bring interesting additional information. The augmented versions of METEOR, using vector representations, are made available on our GitHub page.

Submitted 5 October, 2016; originally announced October 2016.

Comments: accepted to COLING 2016 conference

Showing 1–19 of 19 results for author: Servan, C