-
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset
Authors:
Allen Roush,
Yusuf Shabazz,
Arvind Balaji,
Peter Zhang,
Stefano Mezza,
Markus Zhang,
Sanjay Basu,
Sriram Vishwanath,
Mehdi Fatemi,
Ravid Shwartz-Ziv
Abstract:
We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: https://huggingface.co/datasets/Yusuf5/OpenCaselist
Submitted 5 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Building Proactive Voice Assistants: When and How (not) to Interact
Authors:
O. Miksik,
I. Munasinghe,
J. Asensio-Cubero,
S. Reddy Bethi,
S-T. Huang,
S. Zylfo,
X. Liu,
T. Nica,
A. Mitrocsak,
S. Mezza,
R. Beard,
R. Shi,
R. Ng,
P. Mediano,
Z. Fountas,
S-H. Lee,
J. Medvesek,
H. Zhuang,
Y. Rogers,
P. Swietojanski
Abstract:
Voice assistants have recently achieved remarkable commercial success. However, the current generation of these devices is typically capable of only reactive interactions. In other words, interactions have to be initiated by the user, which somewhat limits their usability and user experience. We propose that the next generation of such devices should be able to proactively provide the right information in the right way at the right time, without being prompted by the user. However, achieving this is not straightforward, since the device risks interrupting what the user is doing too often, becoming distracting or even annoying. Furthermore, it could unwittingly reveal sensitive or private information to third parties. In this report, we discuss the challenges of developing proactively initiated interactions, and suggest a framework for when it is appropriate for the device to intervene. To validate our design assumptions, we describe, first, how we built a functioning prototype and, second, a user study conducted to assess users' reactions and reflections in the presence of a proactive voice assistant. This pre-print summarises the state, ideas and progress towards a proactive device as of autumn 2018.
Submitted 4 May, 2020;
originally announced May 2020.
-
ISO-Standard Domain-Independent Dialogue Act Tagging for Conversational Agents
Authors:
Stefano Mezza,
Alessandra Cervone,
Giuliano Tortoreto,
Evgeny A. Stepanov,
Giuseppe Riccardi
Abstract:
Dialogue Act (DA) tagging is crucial for spoken language understanding systems, as it provides a general representation of speakers' intents, not bound to a particular dialogue system. Unfortunately, publicly available data sets with DA annotation are all based on different annotation schemes and are thus incompatible with each other. Moreover, their schemes often do not cover all aspects necessary for open-domain human-machine interaction. In this paper, we propose a methodology to map several publicly available corpora to a subset of the ISO standard, in order to create a large task-independent training corpus for DA classification. We show the feasibility of using this corpus to train a domain-independent DA tagger, testing it on out-of-domain conversational data, and argue the importance of training on multiple corpora to achieve robustness across different DA categories.
Submitted 12 June, 2018;
originally announced June 2018.