Showing 1–16 of 16 results for author: Kozareva, Z

Searching in archive cs.
  1. arXiv:2206.12116  [pdf, other]

    stat.ML cs.AI cs.LG

    Approximating 1-Wasserstein Distance with Trees

    Authors: Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

    Abstract: Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein…

    Submitted 24 June, 2022; originally announced June 2022.

  2. arXiv:2205.12495  [pdf, other]

    cs.CL

    ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection

    Authors: Badr AlKhamissi, Faisal Ladhak, Srini Iyer, Ves Stoyanov, Zornitsa Kozareva, Xian Li, Pascale Fung, Lambert Mathias, Asli Celikyilmaz, Mona Diab

    Abstract: Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. It is also difficult to collect a large-scale hate speech annotated dataset. In this work, we frame this problem as a few-shot learning task, and show significant gains with decomposing the task into its "constituent" parts…

    Submitted 20 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: Accepted at EMNLP 2022

    Journal ref: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2109-2120, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics

  3. arXiv:2205.01703  [pdf, other]

    cs.CL

    Improving In-Context Few-Shot Learning via Self-Supervised Training

    Authors: Mingda Chen, Jingfei Du, Ramakanth Pasunuru, Todor Mihaylov, Srini Iyer, Veselin Stoyanov, Zornitsa Kozareva

    Abstract: Self-supervised pretraining has made few-shot learning possible for many NLP tasks. But the pretraining objectives are not typically adapted specifically for in-context few-shot learning. In this paper, we propose to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage with the goal to teach the model to perform in-context few shot learning. We p…

    Submitted 6 June, 2022; v1 submitted 3 May, 2022; originally announced May 2022.

    Comments: NAACL 2022

  4. arXiv:2112.10684  [pdf, other]

    cs.CL cs.AI cs.LG

    Efficient Large Scale Language Modeling with Mixtures of Experts

    Authors: Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov

    Abstract: Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full-shot fine-tuning. With the exception of fine-tuning, we…

    Submitted 26 October, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: EMNLP 2022

  5. arXiv:2112.10668  [pdf, other]

    cs.CL cs.AI

    Few-shot Learning with Multilingual Language Models

    Authors: Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li

    Abstract: Large-scale generative language models such as GPT-3 are competitive few-shot learners. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual generative language models on a corpus covering a diverse set of languages, and study t…

    Submitted 10 November, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted to EMNLP 2022; 34 pages

  6. arXiv:2111.13654  [pdf, other]

    cs.CL cs.AI cs.LG

    Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs

    Authors: Peter Hase, Mona Diab, Asli Celikyilmaz, Xian Li, Zornitsa Kozareva, Veselin Stoyanov, Mohit Bansal, Srinivasan Iyer

    Abstract: Do language models have beliefs about the world? Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state. In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on meth…

    Submitted 26 November, 2021; originally announced November 2021.

    Comments: 19 pages

  7. arXiv:2109.03431  [pdf, other]

    cs.AI cs.LG

    Fixed Support Tree-Sliced Wasserstein Barycenter

    Authors: Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada

    Abstract: The Wasserstein barycenter has been widely studied in various fields, including natural language processing, and computer vision. However, it requires a high computational cost to solve the Wasserstein barycenter problem because the computation of the Wasserstein distance requires a quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the t…

    Submitted 11 February, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: AISTATS 2022

  8. arXiv:2004.05801  [pdf, other]

    cs.CL cs.LG

    ProFormer: Towards On-Device LSH Projection Based Transformers

    Authors: Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva

    Abstract: At the heart of text based neural models lay word representations, which are powerful but occupy a lot of memory making it challenging to deploy to devices with memory constraints such as mobile phones, watches and IoT. To surmount these challenges, we introduce ProFormer -- a projection based transformer architecture that is faster and lighter making it suitable to deploy to memory constraint dev…

    Submitted 23 April, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

    Comments: EACL 2021 - BEST PAPER AWARD, Honorable Mention

  9. arXiv:2003.00443  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

    Authors: Xin Eric Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

    Abstract: Recent research efforts enable study for natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize well in previously unseen environments. To close the gap between seen and unseen environments, we aim at learning a generalized navi…

    Submitted 20 July, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

    Comments: ECCV 2020

  10. arXiv:1912.06806  [pdf, other]

    cs.CL cs.IR cs.LG

    SemEval-2013 Task 2: Sentiment Analysis in Twitter

    Authors: Preslav Nakov, Zornitsa Kozareva, Alan Ritter, Sara Rosenthal, Veselin Stoyanov, Theresa Wilson

    Abstract: In recent years, sentiment analysis in social media has attracted a lot of research interest and has been used for a number of applications. Unfortunately, research has been hindered by the lack of suitable datasets, complicating the comparison between approaches. To address this issue, we have proposed SemEval-2013 Task 2: Sentiment Analysis in Twitter, which included two subtasks: A, an expressi…

    Submitted 14 December, 2019; originally announced December 2019.

    Comments: Sentiment analysis, microblog sentiment analysis, Twitter opinion mining, SMS

    MSC Class: 68T50; ACM Class: I.2.7

    Journal ref: SemEval-2013

  11. arXiv:1911.10422  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    SemEval-2010 Task 8: Multi-Way Classification of Semantic Relations Between Pairs of Nominals

    Authors: Iris Hendrickx, Su Nam Kim, Zornitsa Kozareva, Preslav Nakov, Diarmuid Ó Séaghdha, Sebastian Padó, Marco Pennacchiotti, Lorenza Romano, Stan Szpakowicz

    Abstract: In response to the continuing research interest in computational semantic analysis, we have proposed a new task for SemEval-2010: multi-way classification of mutually exclusive semantic relations between pairs of nominals. The task is designed to compare different approaches to the problem and to provide a standard testbed for future research. In this paper, we define the task, describe the creati…

    Submitted 23 November, 2019; originally announced November 2019.

    Comments: semantic relations, nominals

    MSC Class: 68T50; ACM Class: I.2.7

    Journal ref: SemEval-2010

  12. arXiv:1911.10421  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    SemEval-2013 Task 4: Free Paraphrases of Noun Compounds

    Authors: Iris Hendrickx, Preslav Nakov, Stan Szpakowicz, Zornitsa Kozareva, Diarmuid Ó Séaghdha, Tony Veale

    Abstract: In this paper, we describe SemEval-2013 Task 4: the definition, the data, the evaluation and the results. The task is to capture some of the meaning of English noun compounds via paraphrasing. Given a two-word noun compound, the participating system is asked to produce an explicitly ranked list of its free-form paraphrases. The list is automatically compared and evaluated against a similarly ranke…

    Submitted 23 November, 2019; originally announced November 2019.

    Comments: noun compounds, paraphrasing verbs, semantic interpretation, multi-word expressions, MWEs

    MSC Class: 68T50; ACM Class: I.2.7

    Journal ref: SemEval-2013

  13. arXiv:1908.05763  [pdf, other]

    cs.CL cs.LG

    On-Device Text Representations Robust To Misspellings via Projections

    Authors: Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva

    Abstract: Recently, there has been a strong interest in developing natural language applications that live on personal devices such as mobile phones, watches and IoT with the objective to preserve user privacy and have low memory. Advances in Locality-Sensitive Hashing (LSH)-based projection networks have demonstrated state-of-the-art performance in various classification tasks without explicit word (or wor…

    Submitted 23 April, 2021; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: EACL 2021

  14. arXiv:1906.01605  [pdf, other]

    cs.CL cs.AI cs.LG

    Transferable Neural Projection Representations

    Authors: Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva

    Abstract: Neural word representations are at the core of many state-of-the-art natural language processing models. A widely used approach is to pre-train, store and look up word or character embedding matrices. While useful, such representations occupy huge memory making it hard to deploy on-device and often do not generalize to unknown words due to vocabulary pruning. In this paper, we propose a skip-gra…

    Submitted 4 June, 2019; originally announced June 2019.

    Journal ref: Proc. of NAACL 2019

  15. arXiv:1711.03754  [pdf, other]

    cs.CL

    Neural Skill Transfer from Supervised Language Tasks to Reading Comprehension

    Authors: Todor Mihaylov, Zornitsa Kozareva, Anette Frank

    Abstract: Reading comprehension is a challenging task in natural language processing and requires a set of skills to be solved. While current approaches focus on solving the task as a whole, in this paper, we propose to use a neural network `skill' transfer approach. We transfer knowledge from several lower-level language tasks (skills) including textual entailment, named entity recognition, paraphrase dete…

    Submitted 10 November, 2017; originally announced November 2017.

  16. arXiv:1709.04071  [pdf, other]

    cs.LG cs.AI cs.CL

    Variational Reasoning for Question Answering with Knowledge Graph

    Authors: Yuyu Zhang, Hanjun Dai, Zornitsa Kozareva, Alexander J. Smola, Le Song

    Abstract: Knowledge graph (KG) is known to be helpful for the task of question answering (QA), since it provides well-structured relational information between entities, and allows one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noi…

    Submitted 27 November, 2017; v1 submitted 12 September, 2017; originally announced September 2017.