Christian Boitet

Also published as: Ch. Boitet


Démo de AMALD-serveur et AMALD-corpus, dédiés à l’analyse morphologique de l’allemand (Demonstration of AMALD-serveur and AMALD-corpus, dedicated to the morphological analysis of German)
Christian Boitet | Vincent Berment | Jean-Philippe Guilbaud | Claire Lemaire
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 4 : Démonstrations et résumés d'articles internationaux

The AMALDarium project aims to offer, on the platform, (1) a high-quality, wide-coverage morphological analysis service for German (AMALD-serveur) that handles inflection, derivation, and compounding, as well as verbs with a separable particle, whether separated or agglutinated, (2) a high-quality reference corpus giving all possible results of morphological analysis, before filtering by a statistical or syntactic method, and (3) a platform (AMALD-éval) for organizing comparative evaluations, with a view to improving the performance of learning algorithms in morphology. Here we present an online demonstration of AMALD-serveur and AMALD-corpus only. The corpus is an anonymized and verified subset of a German corpus made up of texts on breast cancer, containing many technical compound words.


Towards an Automatic Classification of Illustrative Examples in a Large Japanese-French Dictionary Obtained by OCR
Christian Boitet | Mathieu Mangeot | Mutsuko Tomokiyo
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

We work on improving the Cesselin, a large open-source Japanese-French bilingual dictionary digitized by OCR, available on the web, and open to collaborative improvement online. Labelling its examples (about 226,000) would significantly enhance their usefulness for language learners. Examples are proverbs, idiomatic constructions, normal usage examples, and, for nouns, phrases containing a quantifier. Proverbs are easy to spot, but examples of the other types are not. To find a method for annotating them automatically, or at least semi-automatically, we studied many entries and hypothesized that the degree of lexical similarity between the results of MT into a third language might give good cues. To test that hypothesis, we sampled 500 examples and used Google Translate to translate both their Japanese expressions and their French translations into English. The hypothesis holds well, in particular for distinguishing examples of normal usage from idiomatic examples. Finally, we propose a detailed annotation procedure and discuss its future automation.
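
The similarity cue described in this abstract can be illustrated with a minimal sketch. It assumes the two English machine translations are already available as strings and uses Jaccard word overlap as the similarity measure. The measure, the 0.5 threshold, the direction of the decision (high overlap read as normal usage), the function names, and the example sentences are all illustrative assumptions, not the authors' procedure.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Return the set of lowercased word tokens in an English sentence."""
    return set(re.findall(r"[a-z']+", text.lower()))


def lexical_similarity(en_from_japanese: str, en_from_french: str) -> float:
    """Jaccard word overlap between two English MT outputs, from 0.0 to 1.0."""
    a, b = tokens(en_from_japanese), tokens(en_from_french)
    if not a and not b:
        return 1.0  # two empty outputs are treated as identical
    return len(a & b) / len(a | b)


def guess_label(similarity: float, threshold: float = 0.5) -> str:
    """Hypothetical decision rule: high overlap suggests a normal usage example."""
    return "normal usage" if similarity >= threshold else "idiomatic"


# Invented sentences standing in for the two MT outputs of one dictionary example.
score = lexical_similarity("He eats an apple", "He eats the apple")
print(score, guess_label(score))  # 0.6 normal usage
```

In practice the threshold would have to be calibrated on a labelled sample, such as the 500 examples mentioned in the abstract.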


Development of a classifiers/quantifiers dictionary towards French-Japanese MT
Mutsuko Tomokiyo | Mathieu Mangeot | Christian Boitet
Proceedings of Machine Translation Summit XVI: Research Track


An Aligned French-Chinese corpus of 10K segments from university educational material
Ruslan Kalitvianski | Lingxiao Wang | Valérie Bellynck | Christian Boitet
Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA2016)

This paper describes a corpus of nearly 10K French-Chinese aligned segments, produced by post-editing machine-translated computer science courseware. This corpus was built from 2013 to 2016 within the PROJECT_NAME project, by native Chinese students. The quality, as judged by native speakers, is adequate for understanding (far better than by reading only the original French) and for getting better marks. This corpus is annotated at segment level with a self-assessed quality score. It has been directly used as supplemental training data to build a statistical machine translation system dedicated to that sublanguage, and can be used to extract the specific bilingual terminology. To our knowledge, it is the first corpus of this kind to be released.

Corpus and dictionary development for classifiers/quantifiers towards a French-Japanese machine translation
Mutsuko Tomokiyo | Christian Boitet
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

Although quantifier/classifier expressions occur frequently in everyday communication and written documents, they are not described in classical bilingual paper dictionaries, nor in machine-readable dictionaries. The paper describes the development of a corpus and a dictionary for quantifiers/classifiers, and their usage in the framework of French-Japanese machine translation (MT). They often cause problems of lexical ambiguity and of set phrase recognition during analysis, in particular for a distant language pair like French and Japanese. To develop a dictionary aimed at resolving the ambiguity of expressions that include quantifiers and classifiers, which may be ambiguous with common nouns, we have annotated our corpus with UWs (interlingual lexemes) of UNL (Universal Networking Language) found in the UNL-jp dictionary. Potential classifiers/quantifiers are extracted from the corpus by the UNLexplorer web service. Keywords: classifiers, quantifiers, phraseology study, corpus annotation, UNL (Universal Networking Language), UWs dictionary, Tori Bank, French-Japanese machine translation (MT).

Héloïse, une plate-forme pour développer des systèmes de TA compatibles Ariane en réseau (Heloise, a platform for collaborative development of Ariane-compatible MT systems)
Vincent Berment | Christian Boitet | Guillaume de Malézieux
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations

In this demo, we show how to use Héloïse to develop MT systems.


Post-editing a chapter of a specialized textbook into 7 languages: importance of terminological proximity with English for productivity
Ritesh Shah | Christian Boitet | Pushpak Bhattacharyya | Mithun Padmakumar | Leonardo Zilio | Ruslan Kalitvianski | Mohammad Nasiruddin | Mutsuko Tomokiyo | Sandra Castellanos Páez
Proceedings of the 12th International Conference on Natural Language Processing


On-going Cooperative Research towards Developing Economy-Oriented Chinese-French SMT Systems with a New SMT Framework
Yidong Chen | Lingxiao Wang | Christian Boitet | Xiaodong Shi
Proceedings of TALN 2014 (Volume 2: Short Papers)

Jibiki-LINKS: a tool between traditional dictionaries and lexical networks for modelling lexical resources
Ying Zhang | Mathieu Mangeot | Valérie Bellynck | Christian Boitet
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing
Christian Boitet | M.G. Abbas Malik
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing


Online production of HQ parallel corpora and permanent task-based evaluation of multiple MT systems: both can be obtained through iMAGs with no added cost
Lingxiao Wang | Christian Boitet
Proceedings of the 2nd Workshop on Post-editing Technology and Practice

Urdu Hindi Machine Transliteration using SMT
M. G. Abbas Malik | Christian Boitet | Laurent Besacier | Pushpak Bhattacharyya
Proceedings of the 4th Workshop on South and Southeast Asian Natural Language Processing

An extended morphological analyzer of German handling verbal forms with separated separable particles (Un analyseur morphologique étendu de l’allemand traitant les formes verbales à particule séparée) [in French]
Jean-Philippe Guilbaud | Christian Boitet | Vincent Berment
Proceedings of TALN 2013 (Volume 2: Short Papers)


Proceedings of COLING 2012
Martin Kay | Christian Boitet
Proceedings of COLING 2012

Proceedings of COLING 2012: Posters
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Posters

Heloise — A Reengineering of Ariane-G5 SLLPs for Application to π-languages
Vincent Berment | Christian Boitet
Proceedings of COLING 2012: Posters

Proceedings of COLING 2012: Demonstration Papers
Martin Kay | Christian Boitet
Proceedings of COLING 2012: Demonstration Papers

Heloise — An Ariane-G5 Compatible Environment for Developing Expert MT Systems Online
Vincent Berment | Christian Boitet
Proceedings of COLING 2012: Demonstration Papers

An In-Context and Collaborative Software Localisation Model
Amel Fraisse | Christian Boitet | Valérie Bellynck
Proceedings of COLING 2012: Demonstration Papers

Collaborative Computer-Assisted Translation Applied to Pedagogical Documents and Literary Works
Ruslan Kalitvianski | Christian Boitet | Valérie Bellynck
Proceedings of COLING 2012: Demonstration Papers

Demo of iMAG Possibilities: MT-postediting, Translation Quality Evaluation, Parallel Corpus Production
Ling Xiao Wang | Ying Zhang | Christian Boitet | Valerie Bellynck
Proceedings of COLING 2012: Demonstration Papers


Operationalization of interactive multilingual gateways (iMAGs) in the Traouiero project
Christian Boitet | Valérie Bellynck | Achille Falaise | Nguyen Hong-Thai
Proceedings of Translating and the Computer 33

Learning-to-Translate Based on the S-SSTC Annotation Schema
Enya Kong Tang | Zaharin Yusoff | Christian Boitet
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Communautés Internet comme sources de préterminologie (Internet communities as sources of preterminology)
Mohammad Daoud | Christian Boitet
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article describes two experiments in building preliminary but large multilingual terminological resources (preterminologies) with the help of Internet communities, and draws on these experiments to target more refined terminological data coming from Internet communities and Web 2.0 applications. The first experiment is a contribution gateway for the Digital Silk Road (DSR) website. Visitors contribute to a dedicated multilingual lexical repository while they browse and read the archived books, because they are interested in the domain and tend to be polyglot. We collected 1400 lexical contributions in 4 months. The second experiment is based on the Arabic JeuxDeMots, where online players contribute to an Arabic lexical network. The experiment led to steady growth in the number of players and contributions, the latter containing missing terms and words from spoken dialects.


Multilingual Lexical Network from the Archives of the Digital Silk Road
Hans-Mohammad Daoud | Kyo Kageura | Christian Boitet | Asanobu Kitamoto | Mathieu Mangeot
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

Ontology driven content extraction using interlingual annotation of texts in the OMNIA project
Achille Falaise | David Rouquet | Didier Schwab | Hervé Blanchon | Christian Boitet
Proceedings of the 4th Workshop on Cross Lingual Information Access

Multilinguization and Personalization of NL-based Systems
Najeh Hajlaoui | Christian Boitet
Proceedings of the 4th Workshop on Cross Lingual Information Access

mwetoolkit: a Framework for Multiword Expression Identification
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the Multiword Expression Toolkit (mwetoolkit), an environment for type and language-independent MWE identification from corpora. The mwetoolkit provides a targeted list of MWE candidates, extracted and filtered according to a number of user-defined criteria and a set of standard statistical association measures. For generating corpus counts, the toolkit provides both a corpus indexation facility and a tool for integration with web search engines, while for evaluation, it provides validation and annotation facilities. The mwetoolkit also allows easy integration with a machine learning tool for the creation and application of supervised MWE extraction models if annotated data is available. In our experiment, the mwetoolkit was tested and evaluated in the context of MWE extraction in the biomedical domain. Our preliminary results show that the toolkit performs better than other approaches, especially concerning recall. Moreover, this first version can also be extended in several ways in order to improve the quality of the results.

Weak Translation Problems – a case study of Scriptural Translation
Muhammad Ghulam Abbas Malik | Christian Boitet | Pushpak Bhattacharyya | Laurent Besacier
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

General purpose, high quality and fully automatic MT is believed to be impossible. We are interested in scriptural translation problems, which are weak sub-problems of the general problem of translation. We introduce the characteristics of the weak problems of translation and of the scriptural translation problems, describe different computational approaches (finite-state, statistical and hybrid) to solve these problems, and report our results on several combinations of Indo-Pak languages and writing systems.

The iMAG concept: multilingual access gateway to an elected Web sites with incremental quality increase through collaborative post-edition of MT pretranslations
Christian Boitet | Cong Phap Huynh | Hong Thai Nguyen | Valérie Bellynck
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

We will demonstrate iMAGs (interactive Multilingual Access Gateways), in particular on a scientific laboratory web site and on the Greater Grenoble (La Métro) web site.

Finite-state Scriptural Translation
M. G. Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Coling 2010: Posters

Web-based and combined language models: a case study on noun compound identification
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Coling 2010: Posters

Multiword Expressions in the wild? The mwetoolkit comes in handy
Carlos Ramisch | Aline Villavicencio | Christian Boitet
Coling 2010: Demonstrations


A Hybrid Model for Urdu Hindi Transliteration
Abbas Malik | Laurent Besacier | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

A Web Service Enabling Gradable Post-edition of Pre-translations Produced by Existing Translation Tools: Practical Use to Provide High-quality Translation of an Online Encyclopedia
Hervé Blanchon | Christian Boitet | Cong-Phap Huynh
Beyond Translation Memories: New Tools for Translators Workshop


SECTra_w.1: an Online Collaborative System for Evaluating, Post-editing and Presenting MT Translation Corpora
Cong-Phap Huynh | Christian Boitet | Hervé Blanchon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

SECTra_w is a web-oriented system mainly dedicated to the evaluation of MT systems. After importing a source corpus, and possibly reference translations, one can call various MT systems, store their results, and have a collection of human judges perform subjective evaluation online (fluency, adequacy). It is also possible to perform objective, task-oriented evaluation by letting humans post-edit the MT results, using a web translation editor, and measuring an edit distance and/or the post-editing time. The post-edited results can be added to the set of reference translations, or constitute it if there were no references. SECTra_w makes it possible to show not only tables of figures as results of an evaluation campaign, but also the real data (source, MT outputs, references, post-edited outputs), and to make the post-editing effort perceptible by transforming the trace of the edit distance computation into an intuitive presentation, much like a “revision” presentation in Word. The system is written in Java under XWiki and uses the Ajax technique. It can handle large, multilingual and multimedia corpora: EuroParl, BTEC, ERIM (bilingual interpreted dialogues with audio and text), Unesco-B@bel, and a test corpus by France Telecom have been loaded together and used in tests.
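
The abstract does not say which edit distance SECTra_w computes. As a minimal sketch, assuming a word-level Levenshtein distance between an MT output and its post-edited version, the post-editing effort could be measured as follows. The function name and the example sentences are hypothetical.

```python
from typing import Sequence


def edit_distance(mt_output: Sequence[str], post_edited: Sequence[str]) -> int:
    """Word-level Levenshtein distance: the minimum number of word insertions,
    deletions, and substitutions that turn mt_output into post_edited."""
    # previous[j] holds the distance between the MT words processed so far
    # and the first j post-edited words.
    previous = list(range(len(post_edited) + 1))
    for i, mt_word in enumerate(mt_output, start=1):
        current = [i]  # deleting all i MT words yields the empty sequence
        for j, pe_word in enumerate(post_edited, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (mt_word != pe_word)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


# Invented example: the post-editor inserted one missing word.
mt = "the cat sat on mat".split()
pe = "the cat sat on the mat".split()
print(edit_distance(mt, pe))  # 1
```

Keeping the full table instead of only two rows would also make it possible to recover the sequence of edits, which is the kind of trace a revision-style display needs.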

Les architectures linguistiques et computationnelles en traduction automatique sont indépendantes
Christian Boitet
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Contrary to a widespread belief, the linguistic and computational architectures of machine translation systems are independent. The former concern the choice of intermediate representations, the latter the type of algorithms, programming, and resources used. It is thus possible to use "expert" or "empirical" computational methods to build various phases or modules of systems with a variety of linguistic architectures. We conclude by giving some guidance for choosing these architectures according to the translation situation and the available resources, in terms of dictionaries, corpora, and human skills.

Hindi Urdu Machine Transliteration using Finite-State Transducers
M G Abbas Malik | Christian Boitet | Pushpak Bhattacharyya
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)


BEYTrans: A Free Online Collaborative Wiki-Based CAT Environment Designed for Online Translation Communities
Youcef Bey | Kyo Kageura | Christian Boitet
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

Vers un méta-EDL complet, puis un EDL universel pour la TAO
Hong-Thai Nguyen | Christian Boitet
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

A "meta-EDL" (méta-Environnement de Développement Linguiciel, a meta-environment for lingware development) for computer-aided translation (TAO) makes it possible to remotely drive one or more EDLs in order to build heterogeneous TAO systems. Starting from CASH, a meta-EDL dedicated to Ariane-G5, and from WICALE 1.0, a first meta-EDL that is generic but has minimal functionality, we identify the problems raised by adding rich features such as local editing and navigation, and give a solution implemented in WICALE 2.0. We are now integrating into it a lexical database for "lexical pivot" systems such as UNL/U++. A longer-term goal is to move from such a generic, multifunctional meta-EDL to a "universal" EDL, which presupposes reengineering the compilers and engines of the specialized languages for linguistic programming (LSPL) supported by the various EDLs.

Pour l’évaluation externe des systèmes de TA par des méthodes fondées sur la tâche [For an external evaluation of MT systems by task-based methods]
Hervé Blanchon | Christian Boitet
Traitement Automatique des Langues, Volume 48, Numéro 1 : Principes de l'évaluation en Traitement Automatique des Langues [Principles of Evaluation in Natural Language Processing]


Data Management in QRLex, an Online Aid System for Volunteer Translators’
Youcef Bey | Kyo Kageura | Christian Boitet
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 4, December 2006

IWSLT-06: experiments with commercial MT systems and lessons from subjective evaluations
Christian Boitet | Youcef Bey | Mutsuko Tomokiyo | Wenjie Cao | Hervé Blanchon
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

Traduction automatisée fondée sur le dialogue et documents auto-explicatifs : bilan du projet LIDIA [Machine translation based on dialogues and self-explanatory documents: an assessment of the LIDIA project]
Hervé Blanchon | Christian Boitet | Ali Choumane
Traitement Automatique des Langues, Volume 47, Numéro 3 : Varia [Varia]


A Framework for Data Management for the Online Volunteer Translators’ Aid System QRLex
Youcef Bey | Kyo Kageura | Christian Boitet
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation


Collecting and Sharing Bilingual Spontaneous Speech Corpora: the ChinFaDial Experiment
Georges Fafiotte | Christian Boitet | Mark Seligman | Chengqing Zong
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

PolyphraZ: a Tool for the Management of Parallel Corpora
Najeh Hajlaoui | Christian Boitet
Proceedings of the Workshop on Multilingual Linguistic Resources

Towards fairer evaluations of commercial MT systems on basic travel expressions corpora
Herve Blanchon | Christian Boitet | Francis Brunet-Manquat | Mutsuko Tomokiyo | Agnes Hamon | Vo Trung Hung | Youcef Bey
Proceedings of the First International Workshop on Spoken Language Translation: Evaluation Campaign

Spoken dialogue translation systems evaluation: results, new trends, problems and proposals
Herve Blanchon | Christian Boitet | Laurent Besacier
Proceedings of the First International Workshop on Spoken Language Translation: Papers

PolyphraZ: a tool for the quantitative and subjective evaluation of parallel corpora
Najeh Hajlaoui | Christian Boitet
Proceedings of the First International Workshop on Spoken Language Translation: Papers

Deux premières étapes vers les documents auto-explicatifs
Hervé Blanchon | Christian Boitet
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Within the LIDIA project, we have shown that in many situations, Dialogue-Based MT (DBMT) for monolingual authors can offer a better solution for multitarget translation than translator aids or translation with revision, even if controlled languages are used. Our first experiments highlighted the need to preserve the "author's intentions" by means of "disambiguation annotations". These annotations make it possible to transform the source document into a Self-Explaining Document (SED). Here we present a solution for integrating these annotations into an XML document and making them visible and usable by a reader for a better understanding of the "true content" of the document. The concept of the Self-Explaining Document could profoundly change the way we understand documents that are important or written in a complex style. We will also show that an SED, translated into a target language L, could in turn be transformed, without human interaction, into an SED in language L if an analyzer and a disambiguator are available for that language. An SED could thus be used in a monolingual context, but also in a multilingual context without additional human work.


SYSTRAN new generation: the XML translation workflow
Jean Senellart | Christian Boitet | Laurent Romary
Proceedings of Machine Translation Summit IX: Papers

Customization of Machine Translation (MT) is a prerequisite for corporations to adopt the technology. It is therefore important but nonetheless challenging. Ongoing implementation proves that XML is an excellent exchange device between MT modules that efficiently enables interaction between the user and the processes to reach highly granulated structure-based customization. Accomplished through an innovative approach called the SYSTRAN Translation Stylesheet, this method is coherent with the current evolution of the “authoring process”. As a natural progression, the next stage in the customization process is the integration of MT in a multilingual tool kit designed for the “authoring process”.


La coédition langue↔UNL pour partager la révision entre les langues d’un document multilingue : un concept unificateur
Christian Boitet | Wang-Ju Tsai
Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Coediting a natural-language text and its representation in an interlingual form seems the best and simplest way to share the revision of the text across several languages. For various reasons, UNL graphs are the best candidates in this context. We are developing a prototype where, in the simplest sharing scenario, "naive" users interact directly with the text in their language (L0), and indirectly with the associated graph to correct errors. The modified graph is then sent to the UNL-L0 deconverter and the result is displayed. If it is satisfactory, the errors were probably due to the graph and not to the deconverter, and the graph is sent to the deconverters for other languages. Versions in some other languages known to the user can be displayed, so that the sharing of the improvement is visible and encouraging. As the new versions are added to the original multilingual document with appropriate tags and attributes, nothing is ever lost, and cooperative work on the same document becomes possible. On the internal side, links are established between elements of the text and of the graph using widely available resources such as an L0-English dictionary, or better an L0-UNL dictionary, a morphosyntactic analyzer for L0, and a canonical transformation from UNL graph to tree. A "best" correspondence can be established between the "UNL+L0 tree" and the "MS-L0 structure", a lattice, by using the dictionary and trying to align the tree with a trajectory with as few crossing links as possible. A central goal of this research is to merge the approaches of pivot MT, interactive MT, and multilingual text generation.

UNL Lexical Selection with Conceptual Vectors
Mathieu Lafourcade | Christian Boitet
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Coedition to Share Text Revision across Languages and Improve MT a Posteriori
Christian Boitet | Wang-Ju Tsai
COLING-02: Machine Translation in Asia

The PAPILLON Project: Cooperatively Building a Multilingual Lexical Data-base to Derive Open Source Dictionaries & Lexicons
Christian Boitet | Mathieu Mangeot | Gilles Sérasset
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)


Four technical and organizational keys to handle more languages and improve quality (on demand) in MT
Christian Boitet
Workshop on MT2010: Towards a Road Map for MT

Despite considerable investment over the past 50 years, only a small number of language pairs is covered by MT systems designed for information access, and even fewer are capable of quality translation or speech translation. To open the door toward MT of adequate quality for all languages (at least in principle), we propose four keys. On the technical side, we should (1) dramatically increase the use of learning techniques which have demonstrated their potential at the research level, and (2) use pivot architectures, the most universally usable pivot being UNL. On the organizational side, the keys are (3) the cooperative development of open source linguistic resources on the Web, and (4) the construction of systems where quality can be improved "on demand" by users, either a priori through interactive disambiguation, or a posteriori by correcting the pivot representation through any language, thereby unifying MT, computer-aided authoring, and multilingual generation.


On UNL as the future “html of the linguistic content” & the reuse of existing NLP components in UNL-related applications with the example of a UNL-French deconverter
Gilles Sérasset | Christian Boitet
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics


A research perspective on how to democratize machine translation and translation aids aiming at high quality final output
Christian Boitet
Proceedings of Machine Translation Summit VII

Machine Translation (MT) systems and Translation Aids (TA) aiming at cost-effective high quality final translation are not yet usable by small firms, departments and individuals, and handle only a few languages and language pairs. This is due to a variety of reasons, some of them not frequently mentioned. But commercial, technical and cultural reasons make it mandatory to find ways to democratize MT and TA. This goal could be attained by: (1) giving users, free of charge, TA client tools and server resources in exchange for the permission to store and refine on the server linguistic resources produced while using TA; (2) establishing a synergy between MT and TA, in particular by using them jointly in translation projects where translators codevelop the lexical resources specific to MT; (3) renouncing the illusion of fully automatic general purpose high quality MT (FAHQMT) and going for semi-automaticity (SAHQMT), where user participation, made possible by recent technical network-oriented advances, is used to solve ambiguities otherwise computationally unsolvable due to the impossibility, intractability or cost of accessing the necessary knowledge; (4) adopting a hybrid (symbolic & numerical) and "pivot" approach for MT, where pivot lexemes are UNL or UNL inspired English-oriented denotations of (sets of) interlingual acceptions or word/term senses, and the rest of the representation of utterances is either fully abstract and interlingual as in UNL, or, less ambitiously but more realistically, obtained by adding to an abstract English multilevel structure features underspecified in English but essential for other languages, including minority languages.

UNL-French deconversion as transfer & generation from an interlingua with possible quality enhancement through offline human interaction
Gilles Sérasset | Christian Boitet
Proceedings of Machine Translation Summit VII

We present the architecture of the UNL-French deconverter, which "generates" from the UNL interlingua by first "localizing" the UNL form for French, within UNL, and then applying slightly adapted but classical transfer and generation techniques, implemented in GETA's Ariane-G5 environment, supplemented by some UNL-specific tools. Online interaction can be used during deconversion to enhance output quality and is now used for development purposes. We show how interaction could be delayed and embedded in the postedition phase, which would then interact not directly with the output text, but indirectly with several components of the deconverter. Interacting online or offline can improve the quality not only of the utterance at hand, but also of the utterances processed later, as various preferences may be automatically changed to let the deconverter "learn".


Transforming Lattices into Non-deterministic Automata with Optional Null Arcs
Mark Seligman | Christian Boitet | Boubaker Meddeb-Hamrouni
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Transforming Lattices into Non-deterministic Automata with Optional Null Arcs
Mark Seligman | Christian Boitet | Boubaker Meddeb-Hamrouni
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2


Theory and practice of ambiguity labelling with a view to interactive disambiguation in text and speech MT
Christian Boitet | Mutsuko Tomokiyo
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics


Factors for success and failure in MT
Christian Boitet
Proceedings of Machine Translation Summit V


The “Whiteboard” Architecture: A Way to Integrate Heterogeneous Components of NLP Systems
Christian Boitet | Mark Seligman
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

Dialogue-Based MT and self-explaining documents as an alternative to MAHT and MT of controlled languages
Christian Boitet
Proceedings of the Second International Conference on Machine Translation: Ten years on

We argue that, in many situations, Dialogue-Based MT is likely to offer better solutions to translation needs than machine aids to translators or batch MT, even if controlled languages are used. Objections to DBMT have led us to introduce the new concept of “self-explaining document”, which might be used in monolingual as well as in multilingual contexts, and deeply change our way of understanding important or difficult written material.


Practical Speech Translation Systems will Integrate Human Expertise, Multimodal Communication, and Interactive Disambiguation
Christian Boitet
Proceedings of Machine Translation Summit IV


About these proceedings
Christian Boitet
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

Multilinguisation d’un éditeur de documents structurés. Application à un dictionnaire trilingue
Huy Khanh Phan | Christian Boitet
COLING 1992 Volume 3: The 14th International Conference on Computational Linguistics


Towards Personal MT: general design, dialogue structure, potential role of speech
Christian Boitet
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

Towards Personal MT: general design, dialogue structure, potential role of speech
Christian Boitet
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics


Representation Trees and String-Tree Correspondences
Ch. Boitet | Y. Zaharin
Coling Budapest 1988 Volume 1: International Conference on Computational Linguistics

Bernard Vauquois’ contribution to the theory and practice of building MT systems: a historical perspective
Christian Boitet
Proceedings of the Second Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages


TOWARD INTEGRATED DICTIONARIES FOR M(a)T: motivations and linguistic organisation
Ch. Boitet | N. Nedobejkine
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

Current machine translation systems developed with GETA’s methodology and software tools
Christian Boitet
Proceedings of Translating and the Computer 8: A profession on the move


Various Representations of Text Proposed for Eurotra
Christian Boitet | Nelson Verastegui | Daniel Bachut
Second Conference of the European Chapter of the Association for Computational Linguistics

A Case Study in Software Evolution: from Ariane-78.4 to Ariane-85
Christian Boitet | P. Guillaume | M. Quezel-Ambrunaz
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

On the Design of Expert Systems Grafted on MT Systems
R. Gerber | Christian Boitet
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

Automated Translation at Grenoble University
Bernard Vauquois | Christian Boitet
Computational Linguistics Formerly the American Journal of Computational Linguistics, Volume 11, Number 1, January-March 1985


Expert Systems and Other New Techniques in MT Systems
Christian Boitet | Rene Gerber
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics


Implementation and Conversational Environment of ARIANE 78.4, An Integrated System for Automated Translation and Human Revision
Ch. Boitet | P. Guillaume | M. Quezel-Ambrunaz
Coling 1982: Proceedings of the Ninth International Conference on Computational Linguistics


Present and Future Paradigms in the Automatized Translation of Natural Languages.
Ch. Boitet | P. Chatelin | P. Daun Fraga
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics

Russian-French at GETA: Outline of the Method and Detailed Example
Ch. Boitet | N. Nedobejkine
COLING 1980 Volume 1: The 8th International Conference on Computational Linguistics