-
Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining
Authors:
Asa Cooper Stickland,
Sailik Sengupta,
Jason Krone,
Saab Mansour,
He He
Abstract:
Advances in neural modeling have achieved state-of-the-art (SOTA) results on public natural language processing (NLP) benchmarks, at times surpassing human performance. However, there is a gap between public benchmarks and real-world applications where noise, such as typographical or grammatical mistakes, is abundant and can result in degraded performance. Unfortunately, work that evaluates the robustness of neural models on noisy data and proposes improvements is limited to the English language. Upon analyzing noise in different languages, we observe that noise types vary greatly across languages. Thus, existing investigations do not generalize trivially to multilingual settings. To benchmark the performance of pretrained multilingual language models, we construct noisy datasets covering five languages and four NLP tasks and observe a clear gap in the performance between clean and noisy data in the zero-shot cross-lingual setting. After investigating several ways to boost the robustness of multilingual models in this setting, we propose Robust Contrastive Pretraining (RCP). RCP combines data augmentation with a contrastive loss term at the pretraining stage and achieves large improvements on noisy (and original) test data across two sentence-level (+3.2%) and two sequence-labeling (+10 F1-score) multilingual classification tasks.
Submitted 10 February, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
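The abstract above describes pairing data augmentation with a contrastive loss term. The following is only a minimal sketch of that general idea, not the authors' implementation: it assumes an InfoNCE-style loss over cosine similarities, where each clean sentence embedding should match the embedding of its own noised copy rather than the noised copies of other sentences. The function names, the temperature value, and the toy embeddings are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(clean, noisy, temperature=0.1):
    """InfoNCE-style loss (an assumption, not taken from the paper).

    clean[i] and noisy[i] are embeddings of the same sentence before and
    after noise augmentation; every noisy[j] with j != i is a negative.
    """
    total = 0.0
    for i, anchor in enumerate(clean):
        logits = [cosine(anchor, cand) / temperature for cand in noisy]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denom = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching noisy copy.
        total += log_denom - logits[i]
    return total / len(clean)

# Toy embeddings: the noised copies stay close to their clean originals.
clean = [[1.0, 0.0], [0.0, 1.0]]
noisy_aligned = [[0.9, 0.1], [0.1, 0.9]]
noisy_swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = contrastive_loss(clean, noisy_aligned)
loss_swapped = contrastive_loss(clean, noisy_swapped)
```

In this sketch the loss is small when each noised embedding stays nearest to its own clean original and large when the pairs are mismatched, which is the pressure a contrastive term of this kind would add alongside the usual pretraining objective.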
-
Label Semantic Aware Pre-training for Few-shot Text Classification
Authors:
Aaron Mueller,
Jason Krone,
Salvatore Romeo,
Saab Mansour,
Elman Mansimov,
Yi Zhang,
Dan Roth
Abstract:
In text classification tasks, useful information is encoded in the label names. Label semantic aware systems have leveraged this information for improved text classification performance during fine-tuning and prediction. However, the use of label semantics during pre-training has not been extensively explored. We therefore propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems. LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains. As domain-general pre-training requires large amounts of data, we develop a filtering and labeling pipeline to automatically create sentence-label pairs from unlabeled text. We perform experiments on intent (ATIS, Snips, TOPv2) and topic classification (AG News, Yahoo! Answers). LSAP obtains significant accuracy improvements over state-of-the-art models for few-shot text classification while maintaining performance comparable to state of the art in high-resource settings.
Submitted 29 May, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Soft Layer Selection with Meta-Learning for Zero-Shot Cross-Lingual Transfer
Authors:
Weijia Xu,
Batool Haider,
Jason Krone,
Saab Mansour
Abstract:
Multilingual pre-trained contextual embedding models (Devlin et al., 2019) have achieved impressive performance on zero-shot cross-lingual transfer tasks. Finding the most effective strategy to fine-tune these models on high-resource languages so that they transfer well to the zero-shot languages is a non-trivial task. In this paper, we propose a novel meta-optimizer to soft-select which layers of the pre-trained model to freeze during fine-tuning. We train the meta-optimizer by simulating the zero-shot transfer scenario. Results on cross-lingual natural language inference show that our approach improves over the simple fine-tuning baseline and X-MAML (Nooralahzadeh et al., 2020).
Submitted 20 July, 2021;
originally announced July 2021.
-
On the Robustness of Intent Classification and Slot Labeling in Goal-oriented Dialog Systems to Real-world Noise
Authors:
Sailik Sengupta,
Jason Krone,
Saab Mansour
Abstract:
Intent Classification (IC) and Slot Labeling (SL) models, which form the basis of dialogue systems, often encounter noisy data in real-world environments. In this work, we investigate how robust IC/SL models are to noisy data. We collect and publicly release a test-suite for seven common noise types found in production human-to-bot conversations (abbreviations, casing, misspellings, morphological variants, paraphrases, punctuation and synonyms). On this test-suite, we show that common noise types substantially degrade the IC accuracy and SL F1 performance of state-of-the-art BERT-based IC/SL models. By leveraging cross-noise robustness transfer -- training on one noise type to improve robustness on another noise type -- we design aggregate data-augmentation approaches that increase the model performance across all seven noise types by +10.8% for IC accuracy and +15 points for SL F1 on average. To the best of our knowledge, this is the first work to present a single IC/SL model that is robust to a wide range of noise phenomena.
Submitted 1 November, 2021; v1 submitted 14 April, 2021;
originally announced April 2021.
-
Structured Prediction as Translation between Augmented Natural Languages
Authors:
Giovanni Paolini,
Ben Athiwaratkun,
Jason Krone,
Jie Ma,
Alessandro Achille,
Rishita Anubhai,
Cicero Nogueira dos Santos,
Bing Xiang,
Stefano Soatto
Abstract:
We propose a new framework, Translation between Augmented Natural Languages (TANL), to solve many structured prediction language tasks including joint entity and relation extraction, nested named entity recognition, relation classification, semantic role labeling, event extraction, coreference resolution, and dialogue state tracking. Instead of tackling the problem by training task-specific discriminative classifiers, we frame it as a translation task between augmented natural languages, from which the task-relevant information can be easily extracted. Our approach can match or outperform task-specific models on all tasks, and in particular, achieves new state-of-the-art results on joint entity and relation extraction (CoNLL04, ADE, NYT, and ACE2005 datasets), relation classification (FewRel and TACRED), and semantic role labeling (CoNLL-2005 and CoNLL-2012). We accomplish this while using the same architecture and hyperparameters for all tasks and even when training a single model to solve all tasks at the same time (multi-task learning). Finally, we show that our framework can also significantly improve the performance in a low-resource regime, thanks to better use of label semantics.
Submitted 2 December, 2021; v1 submitted 14 January, 2021;
originally announced January 2021.
-
Meta learning to classify intent and slot labels with noisy few shot examples
Authors:
Shang-Wen Li,
Jason Krone,
Shuyan Dong,
Yi Zhang,
Yaser Al-onaizan
Abstract:
Recently, deep learning has dominated many machine learning areas, including spoken language understanding (SLU). However, deep learning models are notorious for being data-hungry, and the heavily optimized models are usually sensitive to the quality of the training examples provided and the consistency between training and inference conditions. To improve the performance of SLU models on tasks with noisy and limited training resources, we propose a new SLU benchmarking task: few-shot robust SLU, where SLU comprises two core problems, intent classification (IC) and slot labeling (SL). We establish the task by defining few-shot splits on three public IC/SL datasets, ATIS, SNIPS, and TOP, and adding two types of natural noise (adaptation example missing/replacing and modality mismatch) to the splits. We further propose a novel noise-robust few-shot SLU model based on prototypical networks. We show that the model consistently outperforms the conventional fine-tuning baseline and another popular meta-learning method, Model-Agnostic Meta-Learning (MAML), achieving better IC accuracy and SL F1 and yielding smaller performance variation when noise is present.
Submitted 30 November, 2020;
originally announced December 2020.
-
Augmented Natural Language for Generative Sequence Labeling
Authors:
Ben Athiwaratkun,
Cicero Nogueira dos Santos,
Jason Krone,
Bing Xiang
Abstract:
We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative methods, our model naturally incorporates label semantics and shares knowledge across tasks. Our framework is general purpose, performing well on few-shot, low-resource, and high-resource tasks. We demonstrate these advantages on popular named entity recognition, slot labeling, and intent classification benchmarks. We set a new state-of-the-art for few-shot slot labeling, improving substantially upon the previous 5-shot ($75.0\% \rightarrow 90.9\%$) and 1-shot ($70.4\% \rightarrow 81.0\%$) state-of-the-art results. Furthermore, our model generates large improvements ($46.27\% \rightarrow 63.83\%$) in low-resource slot labeling over a BERT baseline by incorporating label semantics. We also maintain competitive results on high-resource tasks, performing within two points of the state-of-the-art on all tasks and setting a new state-of-the-art on the SNIPS dataset.
Submitted 15 September, 2020;
originally announced September 2020.
-
Learning to Classify Intents and Slot Labels Given a Handful of Examples
Authors:
Jason Krone,
Yi Zhang,
Mona Diab
Abstract:
Intent classification (IC) and slot filling (SF) are core components in most goal-oriented dialogue systems. Current IC/SF models perform poorly when the number of training examples per class is small. We propose a new few-shot learning task, few-shot IC/SF, to study and improve the performance of IC and SF models on classes not seen at training time in ultra low resource scenarios. We establish a few-shot IC/SF benchmark by defining few-shot splits for three public IC/SF datasets, ATIS, TOP, and Snips. We show that two popular few-shot learning algorithms, model agnostic meta learning (MAML) and prototypical networks, outperform a fine-tuning baseline on this benchmark. Prototypical networks achieve significant gains in IC performance on the ATIS and TOP datasets, while both prototypical networks and MAML outperform the baseline with respect to SF on all three datasets. In addition, we demonstrate that joint training as well as the use of pre-trained language models, ELMo and BERT in our case, are complementary to these few-shot learning methods and yield further gains.
Submitted 22 April, 2020;
originally announced April 2020.
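For readers unfamiliar with prototypical networks, which the abstract above applies to few-shot intent classification, the core classification rule can be sketched as follows. This is an illustrative sketch only, not the paper's code: it assumes utterances have already been encoded as fixed-size embeddings (the encoder is omitted), uses squared Euclidean distance, and the function names, intent labels, and toy vectors are hypothetical.

```python
from collections import defaultdict

def build_prototypes(support):
    """Average the support embeddings of each class.

    support: list of (embedding, label) pairs from the few-shot support set.
    Returns a dict mapping each label to its mean embedding (the prototype).
    """
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(column) / len(embeddings) for column in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }

def classify(query, prototypes):
    """Assign the query embedding to the class with the nearest prototype."""
    def squared_distance(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))

# Toy support set: two examples of one intent, one example of another.
support = [
    ([0.0, 0.0], "book_flight"),
    ([0.0, 2.0], "book_flight"),
    ([10.0, 10.0], "play_music"),
]
prototypes = build_prototypes(support)
prediction_near_flight = classify([1.0, 1.0], prototypes)
prediction_near_music = classify([9.0, 9.0], prototypes)
```

Because a new class only needs a handful of embeddings to form its prototype, this rule extends to classes not seen at training time without retraining a classification layer.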
-
Multi-task Learning for Continuous Control
Authors:
Himani Arora,
Rajath Kumar,
Jason Krone,
Chong Li
Abstract:
Reliable and effective multi-task learning is a prerequisite for the development of robotic agents that can quickly learn to accomplish related, everyday tasks. However, in the reinforcement learning domain, multi-task learning has not exhibited the same level of success as in other domains, such as computer vision. In addition, most reinforcement learning research on multi-task learning has been focused on discrete action spaces, which are not used for robotic control in the real world. In this work, we apply multi-task learning methods to continuous action spaces and benchmark their performance on a series of simulated continuous control tasks. Most notably, we show that multi-task learning outperforms our baselines and alternative knowledge sharing methods.
Submitted 3 February, 2018;
originally announced February 2018.
-
A Typeful Integration of SQL into Curry
Authors:
Michael Hanus,
Julia Krone
Abstract:
We present an extension of the declarative programming language Curry to support access to data stored in relational databases via SQL. Since Curry is statically typed, our emphasis in this SQL integration is on type safety. Our extension respects the type system of Curry so that run-time errors due to ill-typed data are avoided. This is obtained by preprocessing SQL statements at compile time and translating them into type-safe database access operations. As a consequence, the type checker of the Curry system can spot type errors in SQL statements at compile time. To generate appropriately typed access operations, the preprocessor uses an entity-relationship (ER) model describing the structure of the relational data. In addition to standard SQL, SQL statements embedded in Curry can include program expressions and also relationships specified in the ER model. The latter feature is useful to avoid the error-prone use of foreign keys. As a result, our SQL integration supports high-level and type-safe access to databases in Curry programs.
Submitted 3 January, 2017;
originally announced January 2017.