Showing 1–23 of 23 results for author: Astudillo, R F

Searching in archive cs.
  1. arXiv:2408.16961  [pdf, other]

    cs.HC cs.AI

    The Future of Open Human Feedback

    Authors: Shachar Don-Yehiya, Ben Burtenshaw, Ramon Fernandez Astudillo, Cailean Osborne, Mimansa Jaiswal, Tzu-Sheng Kuo, Wenting Zhao, Idan Shenfeld, Andi Peng, Mikhail Yurochkin, Atoosa Kasirzadeh, Yangsibo Huang, Tatsunori Hashimoto, Yacine Jernite, Daniel Vila-Suero, Omri Abend, Jennifer Ding, Sara Hooker, Hannah Rose Kirk, Leshem Choshen

    Abstract: Human feedback on conversations with large language models (LLMs) is central to how these systems learn about the world, improve their capabilities, and are steered toward desirable and safe behaviors. However, this feedback is mostly collected by frontier AI labs and kept behind closed doors. In this work, we bring together interdisciplinary experts to assess the opportunities and challenges t…

    Submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2403.00827  [pdf, other]

    cs.CL cs.AI cs.LG

    Self-Refinement of Language Models from External Proxy Metrics Feedback

    Authors: Keshav Ramji, Young-Suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos

    Abstract: It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response. In document-grounded response generation, for example, agent responses are expected to be relevant to a user's query while also being grounded in a given document. In this paper, we introduce Proxy Metric-based Self-Refinement (ProMiSe), which enables an LLM to refine its own initial re…

    Submitted 27 February, 2024; originally announced March 2024.

  3. arXiv:2402.11770  [pdf, other]

    cs.CL

    Structured Chain-of-Thought Prompting for Few-Shot Generation of Content-Grounded QA Conversations

    Authors: Md Arafat Sultan, Jatin Ganhotra, Ramón Fernandez Astudillo

    Abstract: We introduce a structured chain-of-thought (SCoT) prompting approach to generating content-grounded multi-turn question-answer conversations using a pre-trained large language model (LLM). At the core of our proposal is a structured breakdown of the complex task into a number of states in a state machine, so that actions corresponding to various subtasks, e.g., content reading and utterance genera…

    Submitted 19 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  4. arXiv:2402.02479  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC

    BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback

    Authors: Gaurav Pandey, Yatin Nandwani, Tahira Naseem, Mayank Mishra, Guangxuan Xu, Dinesh Raghu, Sachindra Joshi, Asim Munawar, Ramón Fernandez Astudillo

    Abstract: Distribution matching methods for language model alignment such as Generation with Distributional Control (GDC) and Distributional Policy Gradient (DPG) have not received the same level of attention in reinforcement learning from human feedback (RLHF) as contrastive methods such as Sequence Likelihood Calibration (SLiC), Direct Preference Optimization (DPO) and its variants. We identify high varia…

    Submitted 10 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: Accepted at ICML 2024 (main conference)

  5. arXiv:2310.13961  [pdf, other]

    cs.CL cs.AI

    Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs

    Authors: Young-Suk Lee, Md Arafat Sultan, Yousef El-Kurdi, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos, Ramón Fernandez Astudillo

    Abstract: Using in-context learning (ICL) for data generation, techniques such as Self-Instruct (Wang et al., 2023) or the follow-up Alpaca (Taori et al., 2023) can train strong conversational agents with only a small amount of human supervision. One limitation of these approaches is that they resort to very large language models (around 175B parameters) that are also proprietary and non-public. Here we exp…

    Submitted 21 October, 2023; originally announced October 2023.

    Journal ref: EMNLP 2023

  6. arXiv:2305.17273  [pdf, other]

    cs.CL cs.AI

    Slide, Constrain, Parse, Repeat: Synchronous Sliding Windows for Document AMR Parsing

    Authors: Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos

    Abstract: The sliding window approach provides an elegant way to handle contexts of sizes larger than the Transformer's input window, for tasks like language modeling. Here we extend this approach to the sequence-to-sequence task of document parsing. For this, we exploit recent progress in transition-based parsing to implement a parser with synchronous sliding windows over source and target. We develop an o…

    Submitted 26 May, 2023; originally announced May 2023.

  7. arXiv:2304.12272  [pdf, other]

    cs.CL cs.AI

    AMR Parsing with Instruction Fine-tuned Pre-trained Language Models

    Authors: Young-Suk Lee, Ramón Fernandez Astudillo, Radu Florian, Tahira Naseem, Salim Roukos

    Abstract: Language models instruction fine-tuned on a collection of instruction-annotated datasets (FLAN) have proven highly effective at improving model performance and generalization to unseen tasks. However, a majority of standard parsing tasks, including abstract meaning representation (AMR), universal dependency (UD), and semantic role labeling (SRL), have been excluded from the FLAN collections for both model t…

    Submitted 24 April, 2023; originally announced April 2023.

  8. arXiv:2205.01464  [pdf, other]

    cs.CL

    Inducing and Using Alignments for Transition-based AMR Parsing

    Authors: Andrew Drozdov, Jiawei Zhou, Radu Florian, Andrew McCallum, Tahira Naseem, Yoon Kim, Ramon Fernandez Astudillo

    Abstract: Transition-based parsers for Abstract Meaning Representation (AMR) rely on node-to-word alignments. These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints. Parsers also train on a point-estimate of the alignment pipeline, neglecting the uncertainty due to the in…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: Accepted at NAACL 2022

  9. arXiv:2112.08513  [pdf, other]

    cs.CL

    DocAMR: Multi-Sentence AMR Representation and Evaluation

    Authors: Tahira Naseem, Austin Blodgett, Sadhana Kumaravel, Tim O'Gorman, Young-Suk Lee, Jeffrey Flanigan, Ramón Fernandez Astudillo, Radu Florian, Salim Roukos, Nathan Schneider

    Abstract: Despite extensive research on parsing of English sentences into Abstract Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks well-defined representation and evaluation. Taking advantage of a super-sentential level of coreference annotation from previous work, we introduce a simple algorithm…

    Submitted 6 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

    MSC Class: I.2.7

  10. arXiv:2112.07877  [pdf, other]

    cs.CL

    Learning to Transpile AMR into SPARQL

    Authors: Mihaela Bornea, Ramon Fernandez Astudillo, Tahira Naseem, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Pavan Kapanipathi, Radu Florian, Salim Roukos

    Abstract: We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA). This allows us to delegate part of the semantic representation to a strongly pre-trained semantic parser, while learning transpiling with a small amount of paired data. We depart from recent work relating AMR and SPARQL constructs, but rather than applying…

    Submitted 8 December, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

  11. arXiv:2112.07790  [pdf, ps, other]

    cs.CL cs.AI

    Maximum Bayes Smatch Ensemble Distillation for AMR Parsing

    Authors: Young-Suk Lee, Ramon Fernandez Astudillo, Thanh Lam Hoang, Tahira Naseem, Radu Florian, Salim Roukos

    Abstract: AMR parsing has experienced an unprecedented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for most recent high-performing parsers, the effect of self-learning and silver data augmentation seems to be fading. In this pa…

    Submitted 2 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Journal ref: NAACL-HLT 2022

  12. arXiv:2110.15534  [pdf, other]

    cs.CL

    Structure-aware Fine-tuning of Sequence-to-sequence Transformers for Transition-based AMR Parsing

    Authors: Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Young-Suk Lee, Radu Florian, Salim Roukos

    Abstract: Predicting linearized Abstract Meaning Representation (AMR) graphs using pre-trained sequence-to-sequence Transformer models has recently led to large improvements on AMR parsing benchmarks. These parsers are simple and avoid explicit modeling of structure but lack desirable properties such as graph well-formedness guarantees or built-in graph-sentence alignments. In this work we explore the integ…

    Submitted 29 October, 2021; originally announced October 2021.

    Comments: EMNLP 2021 main conference

  13. arXiv:2110.09131  [pdf, other]

    cs.CL cs.AI

    Ensembling Graph Predictions for AMR Parsing

    Authors: Hoang Thanh Lam, Gabriele Picco, Yufang Hou, Young-Suk Lee, Lam M. Nguyen, Dzung T. Phan, Vanessa López, Ramon Fernandez Astudillo

    Abstract: In many machine learning tasks, models are trained to predict structured data such as graphs. For example, in natural language processing, it is very common to parse texts into dependency trees or abstract meaning representation (AMR) graphs. On the other hand, ensemble methods combine predictions from multiple models to create a new one that is more robust and accurate than individual predictions.…

    Submitted 24 January, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Published at NeurIPS 2021

  14. arXiv:2108.00104  [pdf, other]

    cs.CL

    Structural Guidance for Transformer Language Models

    Authors: Peng Qian, Tahira Naseem, Roger Levy, Ramón Fernandez Astudillo

    Abstract: Transformer-based language models pre-trained on large amounts of text data have proven remarkably successful in learning generic transferable linguistic representations. Here we study whether structural guidance leads to more human-like systematic linguistic generalization in Transformer language models without resorting to pre-training on very large amounts of data. We explore two general ideas.…

    Submitted 30 July, 2021; originally announced August 2021.

    Comments: To be issued as paper revision for ACL 2021

  15. arXiv:2104.14674  [pdf, other]

    cs.CL

    AMR Parsing with Action-Pointer Transformer

    Authors: Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Radu Florian

    Abstract: Abstract Meaning Representation parsing is a sentence-to-graph prediction task where target nodes are not explicitly aligned to sentence tokens. However, since graph nodes are semantically based on one or more sentence tokens, implicit alignments can be derived. Transition-based parsers operate over the sentence from left to right, capturing this inductive bias via alignments at the cost of limite…

    Submitted 18 May, 2021; v1 submitted 29 April, 2021; originally announced April 2021.

    Comments: Accepted at NAACL 2021

  16. arXiv:2104.07474  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD

    EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition

    Authors: Murali Karthick Baskar, Lukáš Burget, Shinji Watanabe, Ramon Fernandez Astudillo, Jan "Honza" Černocký

    Abstract: Self-supervised ASR-TTS models suffer in out-of-domain data conditions. Here we propose an enhanced ASR-TTS (EAT) model that incorporates two main features: 1) The ASR$\rightarrow$TTS direction is equipped with a language model reward to penalize the ASR hypotheses before forwarding them to TTS. 2) In the TTS$\rightarrow$ASR direction, a hyper-parameter is introduced to scale the attention context f…

    Submitted 13 April, 2021; originally announced April 2021.

  17. arXiv:2102.02189  [pdf, other]

    cs.CL cs.AI

    Bootstrapping Multilingual AMR with Contextual Word Alignments

    Authors: Janaki Sheth, Young-Suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Radu Florian, Salim Roukos, Todd Ward

    Abstract: We develop high performance multilingual Abstract Meaning Representation (AMR) systems by projecting English AMR annotations to other languages with weak supervision. We achieve this goal by bootstrapping transformer-based multilingual word embeddings, in particular those from cross-lingual RoBERTa (XLM-R large). We develop a novel technique for foreign-text-to-English AMR alignment, using the contex…

    Submitted 3 February, 2021; originally announced February 2021.

    Journal ref: EACL 2021

  18. arXiv:2010.10673  [pdf, other]

    cs.CL

    Pushing the Limits of AMR Parsing with Self-Learning

    Authors: Young-Suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Revanth Gangi Reddy, Radu Florian, Salim Roukos

    Abstract: Abstract Meaning Representation (AMR) parsing has experienced a notable growth in performance in the last two years, due both to the impact of transfer learning and the development of novel architectures specific to AMR. At the same time, self-learning techniques have helped push the performance boundaries of other natural language processing applications, such as machine translation or question a…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted to Findings of EMNLP 2020, open review https://openreview.net/forum?id=4q5-oJgLiO, code https://github.com/IBM/transition-amr-parser

  19. arXiv:2010.10669  [pdf, other]

    cs.CL

    Transition-based Parsing with Stack-Transformers

    Authors: Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian

    Abstract: Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local state modeling of contextualized features, e.g. Bi-LSTM parsers. Given the success of Transformer architectures in recent parsing systems, this work explores mod…

    Submitted 20 October, 2020; originally announced October 2020.

    Comments: Accepted to Findings of EMNLP 2020, open review https://openreview.net/forum?id=b36spsuUAde, code https://github.com/IBM/transition-amr-parser

  20. arXiv:2005.09123  [pdf, ps, other]

    cs.CL cs.LG

    GPT-too: A language-model-first approach for AMR-to-text generation

    Authors: Manuel Mager, Ramon Fernandez Astudillo, Tahira Naseem, Md Arafat Sultan, Young-Suk Lee, Radu Florian, Salim Roukos

    Abstract: Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs. Existing approaches to generating text from AMR have focused on training sequence-to-sequence or graph-to-sequence models on AMR annotated data only. In this paper, we propose an alternative approach that combines a strong pre-trained language model with cycle consistency-based re-scoring. Despite the simplicity of t…

    Submitted 27 May, 2020; v1 submitted 18 May, 2020; originally announced May 2020.

    Comments: Paper accepted to the Annual Meeting of the Association for Computational Linguistics (ACL 2020)

  21. arXiv:1609.02082  [pdf, ps, other]

    cs.LG cs.CL cs.SD

    An improved uncertainty decoding scheme with weighted samples for DNN-HMM hybrid systems

    Authors: Christian Huemmer, Ramón Fernández Astudillo, Walter Kellermann

    Abstract: In this paper, we advance a recently proposed uncertainty decoding scheme for DNN-HMM (deep neural network - hidden Markov model) hybrid systems. This numerical sampling concept averages DNN outputs produced by a finite set of feature samples (drawn from a probabilistic distortion model) to approximate the posterior likelihoods of the context-dependent HMM states. As the main innovation, we propose a…

    Submitted 4 August, 2016; originally announced September 2016.

    Comments: 5 pages

  22. arXiv:1602.02068  [pdf, other]

    cs.CL cs.LG stat.ML

    From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

    Authors: André F. T. Martins, Ramón Fernandez Astudillo

    Abstract: We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network trained with backpropagation. Then, we propose a new smooth and convex loss function which is the sparsemax analogue of the logistic loss. We reveal an unexpect…

    Submitted 8 February, 2016; v1 submitted 5 February, 2016; originally announced February 2016.

    Comments: Minor corrections

  23. arXiv:1508.02096  [pdf, other]

    cs.CL

    Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation

    Authors: Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso

    Abstract: We introduce a model for constructing vector representations of words by composing characters using bidirectional LSTMs. Relative to traditional word representation models that have independent vectors for each word type, our model requires only a single vector per character type and a fixed set of parameters for the compositional model. Despite the compactness of this model and, more importantly,…

    Submitted 23 May, 2016; v1 submitted 9 August, 2015; originally announced August 2015.