Showing 1–6 of 6 results for author: Louvan, S

Searching in archive cs.
  1. IndoNLI: A Natural Language Inference Dataset for Indonesian

    Authors: Rahmad Mahendra, Alham Fikri Aji, Samuel Louvan, Fahrurrozi Rahman, Clara Vania

    Abstract: We present IndoNLI, the first human-elicited NLI dataset for Indonesian. We adapt the data collection protocol for MNLI and collect nearly 18K sentence pairs annotated by crowd workers and experts. The expert-annotated data is used exclusively as a test set. It is designed to provide a challenging test-bed for Indonesian NLI by explicitly incorporating various linguistic phenomena such as numerica…

    Submitted 27 October, 2021; originally announced October 2021.

    Comments: Accepted at EMNLP 2021 main conference

    Journal ref: https://aclanthology.org/2021.emnlp-main.821/

  2. arXiv:2011.00564

    cs.CL

    Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey

    Authors: Samuel Louvan, Bernardo Magnini

    Abstract: In recent years, fostered by deep learning technologies and by the high demand for conversational AI, various approaches have been proposed that address the capacity to elicit and understand user's needs in task-oriented dialogue systems. We focus on two core tasks, slot filling (SF) and intent classification (IC), and survey how neural-based models have rapidly evolved to address natural language…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: COLING 2020

  3. arXiv:2009.03695

    cs.CL

    Simple is Better! Lightweight Data Augmentation for Low Resource Slot Filling and Intent Classification

    Authors: Samuel Louvan, Bernardo Magnini

    Abstract: Neural-based models have achieved outstanding performance on slot filling and intent classification, when fairly large in-domain training data are available. However, as new domains are frequently added, creating sizeable data is expensive. We show that lightweight augmentation, a set of augmentation methods involving word span and sentence level operations, alleviates data scarcity problems. Our…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Accepted at PACLIC 2020 - The 34th Pacific Asia Conference on Language, Information and Computation

  4. arXiv:1810.05334

    cs.CL

    IndoSum: A New Benchmark Dataset for Indonesian Text Summarization

    Authors: Kemal Kurniawan, Samuel Louvan

    Abstract: Automatic text summarization is generally considered as a challenging task in the NLP community. One of the challenges is the publicly available and large dataset that is relatively rare and difficult to construct. The problem is even worse for low-resource languages such as Indonesian. In this paper, we present IndoSum, a new benchmark dataset for Indonesian text summarization. The dataset consis…

    Submitted 19 March, 2019; v1 submitted 11 October, 2018; originally announced October 2018.

    Comments: Accepted in IALP 2018

  5. arXiv:1806.01523

    cs.CL

    Multi-Task Active Learning for Neural Semantic Role Labeling on Low Resource Conversational Corpus

    Authors: Fariz Ikhwantri, Samuel Louvan, Kemal Kurniawan, Bagas Abisena, Valdi Rachman, Alfan Farizki Wicaksono, Rahmad Mahendra

    Abstract: Most Semantic Role Labeling (SRL) approaches are supervised methods which require a significant amount of annotated corpus, and the annotation requires linguistic expertise. In this paper, we propose a Multi-Task Active Learning framework for Semantic Role Labeling with Entity Recognition (ER) as the auxiliary task to alleviate the need for extensive data and use additional information from ER to…

    Submitted 5 June, 2018; originally announced June 2018.

    Comments: ACL 2018 workshop on Deep Learning Approaches for Low-Resource NLP

  6. arXiv:1805.12291

    cs.CL

    Empirical Evaluation of Character-Based Model on Neural Named-Entity Recognition in Indonesian Conversational Texts

    Authors: Kemal Kurniawan, Samuel Louvan

    Abstract: Despite the long history of named-entity recognition (NER) task in the natural language processing community, previous work rarely studied the task on conversational texts. Such texts are challenging because they contain a lot of word variations which increase the number of out-of-vocabulary (OOV) words. The high number of OOV words poses a difficulty for word-based neural models. Meanwhile, there…

    Submitted 19 September, 2018; v1 submitted 30 May, 2018; originally announced May 2018.

    Comments: Accepted in EMNLP 2018 Workshop on Noisy User-generated Text (W-NUT)