
FabricQA-Extractor: A Question Answering System to Extract Information from Documents using Natural Language Questions

Authors: Qiming Wang, Raul Castro Fernandez

Abstract: Reading comprehension models answer questions posed in natural language when provided with a short passage of text. They present an opportunity to address a long-standing challenge in data management: the extraction of structured data from unstructured text. Consequently, several approaches are using these models to perform information extraction. However, these modern approaches leave an opportunity behind because they do not exploit the relational structure of the target extraction table. In this paper, we introduce a new model, Relation Coherence, that exploits knowledge of the relational structure to improve the extraction quality. We incorporate the Relation Coherence model as part of FabricQA-Extractor, an end-to-end system we built from scratch to conduct large scale extraction tasks over millions of documents. We demonstrate on two datasets with millions of passages that Relation Coherence boosts extraction performance and evaluate FabricQA-Extractor on large scale datasets.

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.04092

Programmable Dataflows: Abstraction and Programming Model for Data Sharing

Authors: Siyuan Xia, Chris Zhu, Tapan Srivastava, Bridget Fahey, Raul Castro Fernandez

Abstract: Data sharing is central to a wide variety of applications such as fraud detection, ad matching, and research. The lack of data sharing abstractions makes the solution to each data sharing problem bespoke and cost-intensive, hampering value generation. In this paper, we first introduce a data sharing model to represent every data sharing problem with a sequence of dataflows. From the model, we distill an abstraction, the contract, which agents use to communicate the intent of a dataflow and evaluate its consequences, before the dataflow takes place. This helps agents move towards a common sharing goal without violating any regulatory and privacy constraints. Then, we design and implement the contract programming model (CPM), which allows agents to program data sharing applications catered to each problem's needs. Contracts permit data sharing, but their interactive nature may introduce inefficiencies. To mitigate those inefficiencies, we extend the CPM so that it can save intermediate outputs of dataflows, and skip computation if a dataflow tries to access data that it does not have access to. In our evaluation, we show that 1) the contract abstraction is general enough to represent a wide range of sharing problems, 2) we can write programs for complex data sharing problems and exhibit qualitative improvements over other alternate technologies, and 3) quantitatively, our optimizations make sharing programs written with the CPM efficient.

Submitted 7 August, 2024; originally announced August 2024.

arXiv:2408.01580

Controlling Dataflows with a Bolt-on Data Escrow

Authors: Zhiru Zhu, Raul Castro Fernandez

Abstract: The data-driven economy has created tremendous value in our society. Individuals share their data with platforms in exchange for services such as search, social networks, and health recommendations. Platforms use the data to provide those services and create other revenue-generating opportunities, e.g., selling the data to data brokers. With the ever-expanding data economy comes the growing concern about potential data misuse. While most platforms give individuals certain control over their data (i.e., what data is being shared), individuals do not know how the data will be used once shared; they cannot control the purpose. In this paper, we introduce a data escrow design that permits individuals to observe all dataflows - not just what is shared but for what purpose. Rather than data flowing to the platform, the platform delegates their computation to the escrow, where individuals can observe and manage their data. To make the data escrow practical, we design and implement a prototype that works alongside the Apple ecosystem; specifically, we retrofit the Apple SDKs with a programming interface to enable delegated computation. Our solution does not depend on Apple's software and can be applied to other platforms, but building for Apple lets us study the main hypothesis of our work: whether such a data escrow solution is a feasible alternative to today's data governance. We show that our escrow prototype implementation is efficient, and we analyze the dataflows in real-world apps and show that the escrow's programming interface supports implementing a wide range of dataflows.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2408.00253

Saving Money for Analytical Workloads in the Cloud

Authors: Tapan Srivastava, Raul Castro Fernandez

Abstract: As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it processes. We observe that analytical queries are either compute- or IO-bound and each query type executes cheaper in a different pricing model. We exploit this opportunity and propose methods to build cheaper execution plans across pricing models that complete within user-defined runtime constraints. We implement these methods and produce execution plans spanning multiple pricing models that reduce the monetary cost for workloads by as much as 56%. We reduce individual query costs by as much as 90%. The prices chosen by cloud vendors for cloud services also impact savings opportunities. To study this effect, we simulate our proposed methods with different cloud prices and observe that multi-cloud savings are robust to changes in cloud vendor prices. These results indicate the massive opportunity to save money by executing workloads across multiple pricing models.

Submitted 31 July, 2024; originally announced August 2024.

Comments: 12 pages; VLDB 2024

arXiv:2407.17914

Modelling Multimodal Integration in Human Concept Processing with Vision-and-Language Models

Authors: Anna Bavaresco, Marianne de Heer Kloots, Sandro Pezzelle, Raquel Fernández

Abstract: Representations from deep neural networks (DNNs) have proven remarkably predictive of neural activity involved in both visual and linguistic processing. Despite these successes, most studies to date concern unimodal DNNs, encoding either visual or textual input but not both. Yet, there is growing evidence that human meaning representations integrate linguistic and sensory-motor information. Here we investigate whether the integration of multimodal information operated by current vision-and-language DNN models (VLMs) leads to representations that are more aligned with human brain activity than those obtained by language-only and vision-only DNNs. We focus on fMRI responses recorded while participants read concept words in the context of either a full sentence or an accompanying picture. Our results reveal that VLM representations correlate more strongly than language- and vision-only DNNs with activations in brain areas functionally related to language processing. A comparison between different types of visuo-linguistic architectures shows that recent generative VLMs tend to be less brain-aligned than previous architectures with lower performance on downstream applications. Moreover, through an additional analysis comparing brain vs. behavioural alignment across multiple VLMs, we show that -- with one remarkable exception -- representations that strongly align with behavioural judgments do not correlate highly with brain responses. This indicates that brain similarity does not go hand in hand with behavioural similarity, and vice versa.

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.04559

Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition

Authors: Aditya K Surikuchi, Raquel Fernández, Sandro Pezzelle

Abstract: Visual storytelling consists in generating a natural language story given a temporally ordered sequence of images. This task is not only challenging for models, but also very difficult to evaluate with automatic metrics since there is no consensus about what makes a story 'good'. In this paper, we introduce a novel method that measures story quality in terms of human likeness regarding three key aspects highlighted in previous work: visual grounding, coherence, and repetitiveness. We then use this method to evaluate the stories generated by several models, showing that the foundation model LLaVA obtains the best result, but only slightly so compared to TAPM, a 50-times smaller visual storytelling model. Upgrading the visual and language components of TAPM results in a model that yields competitive performance with a relatively low number of parameters. Finally, we carry out a human evaluation study, whose results suggest that a 'good' story may require more than a human-like level of visual grounding, coherence, and repetition.

Submitted 29 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.18403

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

Abstract: There is an increasing trend towards evaluating NLP models with LLM-generated judgments instead of human judgments. In the absence of a comparison against human data, this raises concerns about the validity of these evaluations; in case they are conducted with proprietary models, this also raises concerns over reproducibility. We provide JUDGE-BENCH, a collection of 20 NLP datasets with human annotations, and comprehensively evaluate 11 current LLMs, covering both open-weight and proprietary models, for their ability to replicate the annotations. Our evaluations show that each LLM exhibits a large variance across datasets in its correlation to human judgments. We conclude that LLMs are not yet ready to systematically replace human judges in NLP.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.13663

Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation

Authors: Jirui Qi, Gabriele Sarti, Raquel Fernández, Arianna Bisazza

Abstract: Ensuring the verifiability of model answers is a fundamental challenge for retrieval-augmented generation (RAG) in the question answering (QA) domain. Recently, self-citation prompting was proposed to make large language models (LLMs) generate citations to supporting documents along with their answers. However, self-citing LLMs often struggle to match the required format, refer to non-existent sources, and fail to faithfully reflect LLMs' context usage throughout the generation. In this work, we present MIRAGE -- Model Internals-based RAG Explanations -- a plug-and-play approach using model internals for faithful answer attribution in RAG applications. MIRAGE detects context-sensitive answer tokens and pairs them with retrieved documents contributing to their prediction via saliency methods. We evaluate our proposed approach on a multilingual extractive QA dataset, finding high agreement with human answer attribution. On open-ended QA, MIRAGE achieves citation quality and efficiency comparable to self-citation while also allowing for a finer-grained control of attribution parameters. Our qualitative evaluation highlights the faithfulness of MIRAGE's attributions and underscores the promising application of model internals for RAG answer attribution.

Submitted 1 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

Comments: Under review. Code and data released at https://github.com/Betswish/MIRAGE

arXiv:2406.07243

MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs

Authors: Vera Neplenbroek, Arianna Bisazza, Raquel Fernández

Abstract: Generative large language models (LLMs) have been shown to exhibit harmful biases and stereotypes. While safety fine-tuning typically takes place in English, if at all, these models are being used by speakers of many different languages. There is existing evidence that the performance of these models is inconsistent across languages and that they discriminate based on demographic factors of the user. Motivated by this, we investigate whether the social stereotypes exhibited by LLMs differ as a function of the language used to prompt them, while controlling for cultural differences and task accuracy. To this end, we present MBBQ (Multilingual Bias Benchmark for Question-answering), a carefully curated version of the English BBQ dataset extended to Dutch, Spanish, and Turkish, which measures stereotypes commonly held across these languages. We further complement MBBQ with a parallel control dataset to measure task performance on the question-answering task independently of bias. Our results based on several open-source and proprietary LLMs confirm that some non-English languages suffer from bias more than English, even when controlling for cultural shifts. Moreover, we observe significant cross-lingual differences in bias behaviour for all except the most accurate models. With the release of MBBQ, we hope to encourage further research on bias in multilingual settings. The dataset and code are available at https://github.com/Veranep/MBBQ.

Submitted 17 July, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to COLM 2024

arXiv:2406.05547

Exploring the Benefits of Tokenization of Discrete Acoustic Units

Authors: Avihu Dekel, Raul Fernandez

Abstract: Tokenization algorithms that merge the units of a base vocabulary into larger, variable-rate units have become standard in natural language processing tasks. This idea, however, has been mostly overlooked when the vocabulary consists of phonemes or Discrete Acoustic Units (DAUs), an audio-based representation that is playing an increasingly important role due to the success of discrete language-modeling techniques. In this paper, we showcase the advantages of tokenization of phonetic units and of DAUs on three prediction tasks: grapheme-to-phoneme, grapheme-to-DAUs, and unsupervised speech generation using DAU language modeling. We demonstrate that tokenization yields significant improvements in terms of performance, as well as training and inference speed, across all three tasks. We also offer theoretical insights to provide some explanation for the superior performance observed.

Submitted 8 June, 2024; originally announced June 2024.

Comments: Interspeech 2024

arXiv:2405.20846

Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models

Authors: A. Bavaresco, A. Testoni, R. Fernández

Abstract: Image-based advertisements are complex multimodal stimuli that often contain unusual visual elements and figurative language. Previous research on automatic ad understanding has reported impressive zero-shot accuracy of contrastive vision-and-language models (VLMs) on an ad-explanation retrieval task. Here, we examine the original task setup and show that contrastive VLMs can solve it by exploiting grounding heuristics. To control for this confound, we introduce TRADE, a new evaluation test set with adversarial grounded explanations. While these explanations look implausible to humans, we show that they "fool" four different contrastive VLMs. Our findings highlight the need for an improved operationalisation of automatic ad understanding that truly evaluates VLMs' multimodal reasoning abilities. We make our code and TRADE available at https://github.com/dmg-illc/trade .

Submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted to the main conference ACL 2024

arXiv:2405.08546

Analysing Cross-Speaker Convergence in Face-to-Face Dialogue through the Lens of Automatically Detected Shared Linguistic Constructions

Authors: Esam Ghaleb, Marlou Rasenberg, Wim Pouw, Ivan Toni, Judith Holler, Aslı Özyürek, Raquel Fernández

Abstract: Conversation requires a substantial amount of coordination between dialogue participants, from managing turn taking to negotiating mutual understanding. Part of this coordination effort surfaces as the reuse of linguistic behaviour across speakers, a process often referred to as alignment. While the presence of linguistic alignment is well documented in the literature, several questions remain open, including the extent to which patterns of reuse across speakers have an impact on the emergence of labelling conventions for novel referents. In this study, we put forward a methodology for automatically detecting shared lemmatised constructions -- expressions with a common lexical core used by both speakers within a dialogue -- and apply it to a referential communication corpus where participants aim to identify novel objects for which no established labels exist. Our analyses uncover the usage patterns of shared constructions in interaction and reveal that features such as their frequency and the amount of different constructions used for a referent are associated with the degree of object labelling convergence the participants exhibit after social interaction. More generally, the present study shows that automatically detected shared constructions offer a useful level of analysis to investigate the dynamics of reference negotiation in dialogue.

Submitted 14 May, 2024; originally announced May 2024.

Comments: Accepted for publication at the 46th Proceedings of the Annual Meeting of the Cognitive Science Society

arXiv:2404.18798

Multi-Agent Synchronization Tasks

Authors: Rolando Fernandez, Garrett Warnell, Derrik E. Asher, Peter Stone

Abstract: In multi-agent reinforcement learning (MARL), coordination plays a crucial role in enhancing agents' performance beyond what they could achieve through cooperation alone. The interdependence of agents' actions, coupled with the need for communication, leads to a domain where effective coordination is crucial. In this paper, we introduce and define $\textit{Multi-Agent Synchronization Tasks}$ (MSTs), a novel subset of multi-agent tasks. We describe one MST, that we call $\textit{Synchronized Predator-Prey}$, offering a detailed description that will serve as the basis for evaluating a selection of recent state-of-the-art (SOTA) MARL algorithms explicitly designed to address coordination challenges through the use of communication strategies. Furthermore, we present empirical evidence that reveals the limitations of the algorithms assessed to solve MSTs, demonstrating their inability to scale effectively beyond 2-agent coordination tasks in scenarios where communication is a requisite component. Finally, the results raise questions about the applicability of recent SOTA approaches for complex coordination tasks (i.e. MSTs) and prompt further exploration into the underlying causes of their limitations in this context.

Submitted 29 April, 2024; originally announced April 2024.

Comments: Adaptive Learning Agents Workshop at AAMAS 2024

arXiv:2404.14952

Leveraging Speech for Gesture Detection in Multimodal Communication

Authors: Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, Wim Pouw, Ivan Toni, Peter Uhrig, Anna Wilson, Judith Holler, Aslı Özyürek, Raquel Fernández

Abstract: Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability, neglecting the integration of speech and vision signals to detect gestures that co-occur with speech. This work addresses this gap by focusing on co-speech gesture detection, emphasising the synchrony between speech and co-speech hand gestures. We address three main challenges: the variability of gesture forms, the temporal misalignment between gesture and speech onsets, and differences in sampling rate between modalities. We investigate extended speech time windows and employ separate backbone models for each modality to address the temporal misalignment and sampling rate differences. We utilize Transformer encoders in cross-modal and early fusion techniques to effectively align and integrate speech and skeletal sequences. The study results show that combining visual and speech information significantly enhances gesture detection performance. Our findings indicate that expanding the speech buffer beyond visual time segments improves performance and that multimodal integration using cross-modal and early fusion techniques outperforms baseline methods using unimodal and late fusion methods. Additionally, we find a correlation between the models' gesture prediction confidence and low-level speech frequency features potentially associated with gestures. Overall, the study provides a better understanding and detection methods for co-speech gestures, facilitating the analysis of multimodal communication.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.10250

AniFrame: A Programming Language for 2D Drawing and Frame-Based Animation

Authors: Mark Edward M. Gonzales, Hans Oswald A. Ibrahim, Elyssia Barrie H. Ong, Ryan Austin Fernandez

Abstract: Creative coding is an experimentation-heavy activity that requires translating high-level visual ideas into code. However, most languages and libraries for creative coding may not be adequately intuitive for beginners. In this paper, we present AniFrame, a domain-specific language for drawing and animation. Designed for novice programmers, it (i) features animation-specific data types, operations, and built-in functions to simplify the creation and animation of composite objects, (ii) allows for fine-grained control over animation sequences through explicit specification of the target object and the start and end frames, (iii) reduces the learning curve through a Python-like syntax, type inferencing, and a minimal set of control structures and keywords that map closely to their semantic intent, and (iv) promotes computational expressivity through support for common mathematical operations, built-in trigonometric functions, and user-defined recursion. Our usability test demonstrates AniFrame's potential to enhance readability and writability for multiple creative coding use cases. AniFrame is open-source, and its implementation and reference are available at https://github.com/memgonzales/aniframe-language.

Submitted 15 April, 2024; originally announced April 2024.

Comments: Accepted for paper presentation at the 24th Philippine Computing Science Congress (PCSC 2024), held in Laguna, Philippines

ACM Class: D.3.2; J.5

arXiv:2403.11209

Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations

Authors: Claudio Pinhanez, Raul Fernandez, Marcelo Grave, Julio Nogima, Ron Hoory

Abstract: Representations of AI agents in user interfaces and robotics are predominantly White, not only in terms of facial and skin features, but also in the synthetic voices they use. In this paper we explore some unexpected challenges in the representation of race we found in the process of developing a U.S. English Text-to-Speech (TTS) system aimed to sound like an educated, professional, regional accent-free African American woman. The paper starts by presenting the results of focus groups with African American IT professionals where guidelines and challenges for the creation of a representative and appropriate TTS system were discussed and gathered, followed by a discussion about some of the technical difficulties faced by the TTS system developers. We then describe two studies with U.S. English speakers where the participants were not able to attribute the correct race to the African American TTS voice while overwhelmingly correctly recognizing the race of a White TTS system of similar quality. A focus group with African American IT workers not only confirmed the representativeness of the African American voice we built, but also suggested that the surprising recognition results may have been caused by the inability or the latent prejudice from non-African Americans to associate educated, non-vernacular, professionally-sounding voices to African American people.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Full version including appendixes

arXiv:2403.08810

doi 10.1016/j.engappai.2023.107149

Comparison of edge computing methods in Internet of Things architectures for efficient estimation of indoor environmental parameters with Machine Learning

Authors: Jose-Carlos Gamazo-Real, Raul Torres Fernandez, Adrian Murillo Armas

Abstract: The large increase in the number of Internet of Things (IoT) devices has revolutionised the way data is processed, which added to the current trend from cloud to edge computing has resulted in the need for efficient and reliable data processing near the data sources using energy-efficient devices. Two methods based on low-cost edge-IoT architectures are proposed to implement lightweight Machine Learning (ML) models that estimate indoor environmental quality (IEQ) parameters, such as Artificial Neural Networks of Multilayer Perceptron type. Their implementation is based on centralised and distributed parallel IoT architectures, connected via wireless, which share commercial off-the-shelf modules for data acquisition and sensing, such as sensors for temperature, humidity, illuminance, CO2, and other gases. The centralised method uses a Graphics Processing Unit and the Message Queuing Telemetry Transport protocol, but the distributed method utilises low performance ARM-based devices and the Message Passing Interface protocol. Although multiple IEQ parameters are measured, the training and testing of ML models is accomplished with experiments focused on small temperature and illuminance datasets to reduce data processing load, obtained from sudden spikes, square profiles and sawteeth test cases. The results show a high estimation performance with F-score and Accuracy values close to 0.95, and an almost theoretical Speedup with a reduction in power consumption close to 37% in the distributed parallel approach. In addition, similar or slightly better performance is achieved compared to equivalent IoT architectures from related research, but error reduction of 35 to 76% is accomplished with an adequate balance between performance and energy efficiency.

Submitted 7 February, 2024; originally announced March 2024.

Journal ref: Engineering Applications of Artificial Intelligence, 2023, vol. 126, Part D, no. 107149, pp. 1-27, ISSN 0952-1976

arXiv:2402.16102 [pdf, other]

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

Authors: Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz

Abstract: With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; the second as an indication of human label variation. We discuss their merits and limitations, and take the position that both are crucial for trustworthy and fair NLP systems, but that exploiting a single predictive distribution is limiting. We recommend tools and highlight exciting directions towards models with disentangled representations of uncertainty about predictions and uncertainty about human labels.

Submitted 25 February, 2024; originally announced February 2024.

Comments: EACL 2024 main

arXiv:2402.06509 [pdf, other]

Asking the Right Question at the Right Time: Human and Model Uncertainty Guidance to Ask Clarification Questions

Authors: Alberto Testoni, Raquel Fernández

Abstract: Clarification questions are an essential dialogue tool to signal misunderstanding, ambiguities, and under-specification in language use. While humans are able to resolve uncertainty by asking questions since childhood, modern dialogue systems struggle to generate effective questions. To make progress in this direction, in this work we take a collaborative dialogue task as a testbed and study how model uncertainty relates to human uncertainty -- an as yet under-explored problem. We show that model uncertainty does not mirror human clarification-seeking behavior, which suggests that using human clarification questions as supervision for deciding when to ask may not be the most effective way to resolve model uncertainty. To address this issue, we propose an approach to generating clarification questions based on model uncertainty estimation, compare it to several alternatives, and show that it leads to significant improvements in terms of task success. Our findings highlight the importance of equipping dialogue systems with the ability to assess their own uncertainty and exploit it in interaction.

Submitted 9 February, 2024; originally announced February 2024.

Comments: Accepted at EACL 2024

arXiv:2402.01352 [pdf, other]

Describing Images $\textit{Fast and Slow}$: Quantifying and Predicting the Variation in Human Signals during Visuo-Linguistic Processes

Authors: Ece Takmaz, Sandro Pezzelle, Raquel Fernández

Abstract: There is an intricate relation between the properties of an image and how humans behave while describing the image. This behavior shows ample variation, as manifested in human signals such as eye movements and when humans start to describe the image. Despite the value of such signals of visuo-linguistic variation, they are virtually disregarded in the training of current pretrained models, which motivates further investigation. Using a corpus of Dutch image descriptions with concurrently collected eye-tracking data, we explore the nature of the variation in visuo-linguistic signals, and find that they correlate with each other. Given this result, we hypothesize that variation stems partly from the properties of the images, and explore whether image representations encoded by pretrained vision encoders can capture such variation. Our results indicate that pretrained models do so to a weak-to-moderate degree, suggesting that the models lack biases about what makes a stimulus complex for humans and what leads to variations in human outputs.

Submitted 2 February, 2024; originally announced February 2024.

Comments: To appear in EACL 2024

arXiv:2311.01460 [pdf, ps, other]

Implicit Chain of Thought Reasoning via Knowledge Distillation

Authors: Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber

Abstract: To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. The implicit reasoning steps are distilled from a teacher model trained on explicit chain-of-thought reasoning, and instead of doing reasoning "horizontally" by producing intermediate words one-by-one, we distill it such that the reasoning happens "vertically" among the hidden states in different layers. We conduct experiments on a multi-digit multiplication task and a grade school math problem dataset and find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.17843 [pdf, other]

A Data-Centric Online Market for Machine Learning: From Discovery to Pricing

Authors: Minbiao Han, Jonathan Light, Steven Xia, Sainyam Galhotra, Raul Castro Fernandez, Haifeng Xu

Abstract: Data fuels machine learning (ML) - rich and high-quality training data is essential to the success of ML. However, to transform ML from the race among a few large corporations to an accessible technology that serves numerous normal users' data analysis requests, there still exist important challenges. One gap we observed is that many ML users can benefit from new data that other data owners possess, whereas these data owners sit on piles of data without knowing who can benefit from it. This gap creates the opportunity for building an online market that can automatically connect supply with demand. While online matching markets are prevalent (e.g., ride-hailing systems), designing a data-centric market for ML exhibits many unprecedented challenges. This paper develops new techniques to tackle two core challenges in designing such a market: (a) to efficiently match demand with supply, we design an algorithm to automatically discover useful data for any ML task from a pool of thousands of datasets, achieving high-quality matching between ML models and data; (b) to encourage market participation of ML users without much ML expertise, we design a new pricing mechanism for selling data-augmented ML models. Furthermore, our market is designed to be API-compatible with existing online ML markets like Vertex AI and Sagemaker, making it easy to use while providing better results due to joint data and model search. We envision that the synergy of our data and model discovery algorithm and pricing mechanism will be an important step towards building a new data-centric online market that serves ML users effectively.

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.17770 [pdf, other]

GROOViST: A Metric for Grounding Objects in Visual Storytelling

Authors: Aditya K Surikuchi, Sandro Pezzelle, Raquel Fernández

Abstract: A proper evaluation of stories generated for a sequence of images -- the task commonly referred to as visual storytelling -- must consider multiple aspects, such as coherence, grammatical correctness, and visual grounding. In this work, we focus on evaluating the degree of grounding, that is, the extent to which a story is about the entities shown in the images. We analyze current metrics, both designed for this purpose and for general vision-text alignment. Given their observed shortcomings, we propose a novel evaluation tool, GROOViST, that accounts for cross-modal dependencies, temporal misalignments (the fact that the order in which entities appear in the story and the image sequence may not match), and human intuitions on visual grounding. An additional advantage of GROOViST is its modular design, where the contribution of each component can be assessed and interpreted individually.

Submitted 26 October, 2023; originally announced October 2023.

Comments: In EMNLP 2023 main conference proceedings (to appear)

arXiv:2310.15061 [pdf, other]

The BLA Benchmark: Investigating Basic Language Abilities of Pre-Trained Multimodal Models

Authors: Xinyi Chen, Raquel Fernández, Sandro Pezzelle

Abstract: Despite the impressive performance achieved by pre-trained language-and-vision models in downstream tasks, it remains an open question whether this reflects a proper understanding of image-text interaction. In this work, we explore to what extent they handle basic linguistic constructions -- active-passive voice, coordination, and relative clauses -- that even preschool children can typically master. We present BLA, a novel, automatically constructed benchmark to evaluate multimodal models on these Basic Language Abilities. We show that different types of Transformer-based systems, such as CLIP, ViLBERT, and BLIP2, generally struggle with BLA in a zero-shot setting, in line with previous findings. Our experiments, in particular, show that most of the tested models only marginally benefit when fine-tuned or prompted with construction-specific samples. Yet, the generative BLIP2 shows promising trends, especially in an in-context learning setting. This opens the door to using BLA not only as an evaluation benchmark but also to improve models' basic language abilities.

Submitted 23 October, 2023; originally announced October 2023.

Comments: This is the camera-ready version of the paper that will be published in the Proceedings of EMNLP 2023 (Singapore, 6-10 December 2023)

arXiv:2310.13676 [pdf, other]

Information Value: Measuring Utterance Predictability as Distance from Plausible Alternatives

Authors: Mario Giulianelli, Sarenne Wallbridge, Raquel Fernández

Abstract: We present information value, a measure which quantifies the predictability of an utterance relative to a set of plausible alternatives. We introduce a method to obtain interpretable estimates of information value using neural text generators, and exploit their psychometric predictive power to investigate the dimensions of predictability that drive human comprehension behaviour. Information value is a stronger predictor of utterance acceptability in written and spoken dialogue than aggregates of token-level surprisal and it is complementary to surprisal for predicting eye-tracked reading times.

Submitted 20 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (Main, Long paper)

arXiv:2310.13104 [pdf, other]

Making Differential Privacy Easier to Use for Data Controllers and Data Analysts using a Privacy Risk Indicator and an Escrow-Based Platform

Authors: Zhiru Zhu, Raul Castro Fernandez

Abstract: Differential privacy (DP) enables private data analysis but is hard to use in practice. For data controllers who decide what output to release, choosing the amount of noise to add to the output is a non-trivial task because of the difficulty of interpreting the privacy parameter $ε$. For data analysts who submit queries, it is hard to understand the impact of the noise introduced by DP on their tasks. To address these two challenges: 1) we define a privacy risk indicator that indicates the impact of choosing $ε$ on individuals' privacy and use that to design an algorithm to choose $ε$ and release output based on controllers' privacy preferences; 2) we introduce a utility signaling protocol that helps analysts interpret the impact of DP on their downstream tasks. We implement the algorithm and the protocol inside a new platform built on top of a data escrow, which allows controllers to control dataflows while maintaining high performance. We demonstrate our contributions through an IRB-approved user study, extensive experimental evaluations, and comparison with other DP platforms. All in all, our work contributes to making DP easier to use by lowering adoption barriers.

Submitted 2 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.10378 [pdf, other]

Cross-Lingual Consistency of Factual Knowledge in Multilingual Language Models

Authors: Jirui Qi, Raquel Fernández, Arianna Bisazza

Abstract: Multilingual large-scale Pretrained Language Models (PLMs) have been shown to store considerable amounts of factual knowledge, but large variations are observed across languages. With the ultimate goal of ensuring that users with different language backgrounds obtain consistent feedback from the same model, we study the cross-lingual consistency (CLC) of factual knowledge in various multilingual PLMs. To this end, we propose a Ranking-based Consistency (RankC) metric to evaluate knowledge consistency across languages independently from accuracy. Using this metric, we conduct an in-depth analysis of the determining factors for CLC, both at model level and at language-pair level. Among other results, we find that increasing model size leads to higher factual probing accuracy in most languages, but does not improve cross-lingual consistency. Finally, we conduct a case study on CLC when new factual associations are inserted in the PLMs via model editing. Results on a small sample of facts inserted in English reveal a clear pattern whereby the new piece of knowledge transfers only to languages with which English has a high RankC score.

Submitted 9 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: Accepted at EMNLP2023 main conference. All code and data are released at https://github.com/Betswish/Cross-Lingual-Consistency

arXiv:2309.11210 [pdf, other]

Speak While You Think: Streaming Speech Synthesis During Text Generation

Authors: Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory

Abstract: Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose LLM2Speech, an architecture to synthesize speech while text is being generated by an LLM which yields significant latency reduction. LLM2Speech mimics the predictions of a non-streaming teacher model while limiting the exposure to future context in order to enable streaming. It exploits the hidden embeddings of the LLM, a by-product of the text generation that contains informative semantic context. Experimental results show that LLM2Speech maintains the teacher's quality while reducing the latency to enable natural conversations.

Submitted 20 September, 2023; originally announced September 2023.

Comments: Under review for ICASSP 2024

arXiv:2308.10680 [pdf, other]

doi 10.1109/WACV57701.2024.00396

Co-Speech Gesture Detection through Multi-Phase Sequence Labeling

Authors: Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, Wim Pouw, Peter Uhrig, Judith Holler, Ivan Toni, Aslı Özyürek, Raquel Fernández

Abstract: Gestures are integral components of face-to-face communication. They unfold over time, often following predictable movement phases of preparation, stroke, and retraction. Yet, the prevalent approach to automatic gesture detection treats the problem as binary classification, classifying a segment as either containing a gesture or not, thus failing to capture its inherently sequential and contextual nature. To address this, we introduce a novel framework that reframes the task as a multi-phase sequence labeling problem rather than binary classification. Our model processes sequences of skeletal movements over time windows, uses Transformer encoders to learn contextual embeddings, and leverages Conditional Random Fields to perform sequence labeling. We evaluate our proposal on a large dataset of diverse co-speech gestures in task-oriented face-to-face dialogues. The results consistently demonstrate that our method significantly outperforms strong baseline models in detecting gesture strokes. Furthermore, applying Transformer encoders to learn contextual embeddings from movement sequences substantially improves gesture unit detection. These results highlight our framework's capacity to capture the fine-grained dynamics of co-speech gesture phases, paving the way for more nuanced and accurate gesture detection and analysis.

Submitted 23 April, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.05055 [pdf, other]

Enhancement of Direct LEO Satellite-to-Smartphone Communications by Distributed Beamforming

Authors: Zhuoao Xu, Gaojie Chen, Ryan Fernandez, Yue Gao, Rahim Tafazolli

Abstract: The low earth orbit (LEO) satellite network is undergoing rapid development with the maturing of satellite communications and rocket launch technologies, and the demand for a global coverage network. However, current satellite communication networks are constrained by limited transmitting signal power, resulting in the use of large-size and energy-consuming ground terminals to provide additional gain. This paper proposes a novel technology called distributed beamforming to address such challenges and support direct communications from LEO satellites to smartphones. The proposed distributed beamforming technique is based on the superposition of electromagnetic (EM) waves and aims to enhance the received signal strength. Furthermore, we utilize EM wave superposition to increase the link budget and provide the coverage pattern formed by the distributed antenna array, which will be affected by the array structure and the transmitter parameters. In addition, the impact of Doppler frequency shift and time misalignment on the performance of distributed beamforming is investigated. Numerical results show that the enhancement of the received power depends on the angle formed by those radiated beams and can be up to the square of the number of beams; namely, a maximum enhancement of 6 dB could be obtained by using two satellites and a maximum of 12 dB increase through four satellites, which provide a clear guideline for the design of distributed beamforming for future satellite communications.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 11 pages, 14 figures, 2 tables

arXiv:2308.04818 [pdf, other]

Enhancement of Satellite-to-Phone Link Budget by Using Distributed Beamforming

Authors: Zhuoao Xu, Yue Gao, Gaojie Chen, Ryan Fernandez, Vedaprabhu Basavarajappa, Rahim Tafazolli

Abstract: Small satellites in Low Earth Orbit (LEO) attract much attention from both industry and academia. The latest production and launch technologies constantly drive the development of LEO constellations. However, the wideband signal, except text messages, cannot be transmitted directly from an LEO satellite to a standard mobile cellular phone due to the insufficient link budget. The current LEO constellation network has to use an extra ground device to receive the signal from the satellite first and then forward the signal to the User Equipment (UE). To achieve direct network communications between LEO satellites and UE, we propose a novel distributed beamforming technology based on the superposition of electromagnetic (EM) waves radiated from multiple satellites that can significantly enhance the link budget in this paper. EM full-wave simulation and Monte Carlo simulation results are provided to verify the effectiveness of the proposed method. The simulation results show a nearly 6 dB enhancement using two radiation sources and an almost 12 dB enhancement using four sources. The received power enhancement could be doubled compared to the diversity gain in Multiple-Input and Single-Output (MISO). Furthermore, other practical application challenges, such as the synchronization and Doppler effect, are also presented.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 8 pages, 6 figures, 1 table

arXiv:2307.15703 [pdf, other]

Uncertainty in Natural Language Generation: From Theory to Applications

Authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

Abstract: Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more.

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2307.13163 [pdf, other]

Advancing Robot Autonomy for Long-Horizon Tasks

Authors: Isabel M. Rayas Fernández

Abstract: Autonomous robots have real-world applications in diverse fields, such as mobile manipulation and environmental exploration, and many such tasks benefit from a hands-off approach in terms of human user involvement over a long task horizon. However, the level of autonomy achievable by a deployment is limited in part by the problem definition or task specification required by the system. Task specifications often require technical, low-level information that is unintuitive to describe and may result in generic solutions, burdening the user technically both before and after task completion. In this thesis, we aim to advance task specification abstraction toward the goal of increasing robot autonomy in real-world scenarios. We do so by tackling problems that address several different angles of this goal. First, we develop a way for the automatic discovery of optimal transition points between subtasks in the context of constrained mobile manipulation, removing the need for the human to hand-specify these in the task specification. We further propose a way to automatically describe constraints on robot motion by using demonstrated data as opposed to manually-defined constraints. Then, within the context of environmental exploration, we propose a flexible task specification framework, requiring just a set of quantiles of interest from the user that allows the robot to directly suggest locations in the environment for the user to study. We next systematically study the effect of including a robot team in the task specification and show that multirobot teams have the ability to improve performance under certain specification conditions, including enabling inter-robot communication. Finally, we propose methods for a communication protocol that autonomously selects useful but limited information to share with the other robots.

Submitted 24 July, 2023; originally announced July 2023.

Comments: PhD dissertation. 160 pages

arXiv:2307.00432 [pdf, other]

Saibot: A Differentially Private Data Search Platform

Authors: Zezhou Huang, Jiaxiang Liu, Daniel Alabi, Raul Castro Fernandez, Eugene Wu

Abstract: Recent data search platforms use ML task-based utility measures rather than metadata-based keywords, to search large dataset corpora. Requesters submit a training dataset and these platforms search for augmentations (join or union compatible datasets) that, when used to augment the requester's dataset, most improve model (e.g., linear regression) performance. Although effective, providers that manage personally identifiable data demand differential privacy (DP) guarantees before granting these platforms data access. Unfortunately, making data search differentially private is nontrivial, as a single search can involve training and evaluating datasets hundreds or thousands of times, quickly depleting privacy budgets. We present Saibot, a differentially private data search platform that employs Factorized Privacy Mechanism (FPM), a novel DP mechanism, to calculate sufficient semi-ring statistics for ML over different combinations of datasets. These statistics are privatized once, and can be freely reused for the search. This allows Saibot to scale to arbitrary numbers of datasets and requests, while minimizing the amount that DP noise affects search results. We optimize the sensitivity of FPM for common augmentation operations, and analyze its properties with respect to linear regression. Specifically, we develop an unbiased estimator for many-to-many joins, prove its bounds, and develop an optimization to redistribute DP noise to minimize the impact on the model.
Our evaluation on a real-world dataset corpus of 329 datasets demonstrates that Saibot can return augmentations that achieve model accuracy within 50 to 90% of non-private search, while the leading alternative DP mechanisms (TPM, APM, shuffling) are several orders of magnitude worse.

Submitted 1 July, 2023; originally announced July 2023.

Journal ref: VLDB 2023

arXiv:2306.17747 [pdf, other]

Discriminatory or Samaritan -- which AI is needed for humanity? An Evolutionary Game Theory Analysis of Hybrid Human-AI populations

Authors: Tim Booker, Manuel Miranda, Jesús A. Moreno López, José María Ramos Fernández, Max Reddel, Valeria Widler, Filippo Zimmaro, Alberto Antonioni, The Anh Han

Abstract: As artificial intelligence (AI) systems are increasingly embedded in our lives, their presence leads to interactions that shape our behaviour, decision-making, and social interactions. Existing theoretical research has primarily focused on human-to-human interactions, overlooking the unique dynamics triggered by the presence of AI. In this paper, resorting to methods from evolutionary game theory, we study how different forms of AI influence the evolution of cooperation in a human population playing the one-shot Prisoner's Dilemma game in both well-mixed and structured populations. We found that Samaritan AI agents that help everyone unconditionally, including defectors, can promote higher levels of cooperation in humans than Discriminatory AI that only help those considered worthy/cooperative, especially in slow-moving societies where change is viewed with caution or resistance (small intensities of selection). Intuitively, in fast-moving societies (high intensities of selection), Discriminatory AIs promote higher levels of cooperation than Samaritan AIs.

Submitted 3 July, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

Comments: This work is the result of the Complexity72h 2023 workshop

arXiv:2306.02543 [pdf, other]

Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm

Authors: Boxin Zhao, Boxiang Lyu, Raul Castro Fernandez, Mladen Kolar

Abstract: High-quality machine learning models are dependent on access to high-quality training data. When the data are not already available, it is tedious and costly to obtain them. Data markets help with identifying valuable training data: model consumers pay to train a model, the market uses that budget to identify data and train the model (the budget allocation problem), and finally the market compensates data providers according to their data contribution (revenue allocation problem). For example, a bank could pay the data market to access data from other financial institutions to train a fraud detection model. Compensating data contributors requires understanding data's contribution to the model; recent efforts to solve this revenue allocation problem based on the Shapley value are inefficient to lead to practical data markets. In this paper, we introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time. The new algorithm employs an adaptive sampling process that selects data from those providers who are contributing the most to the model. Better data means that the algorithm accesses those providers more often, and more frequent accesses corresponds to higher compensation. Furthermore, the algorithm can be deployed in both centralized and federated scenarios, boosting its applicability. We provide theoretical guarantees for the algorithm that show the budget is used efficiently and the properties of revenue allocation are similar to Shapley's.
Finally, we conduct an empirical evaluation to show the performance of the algorithm in practical scenarios and when compared to other baselines. Overall, we believe that the new algorithm paves the way for the implementation of practical data markets.

Submitted 4 June, 2023; originally announced June 2023.

Comments: Published on International Conference on Machine Learning (ICML) 2023

arXiv:2306.00751 [pdf, other]

Differentiable Tree Operations Promote Compositional Generalization

Authors: Paul Soulos, Edward Hu, Kate McCurdy, Yunmo Chen, Roland Fernandez, Paul Smolensky, Jianfeng Gao

Abstract: In the context of structure-to-structure transformation tasks, learning sequences of discrete symbolic operations poses significant challenges due to their non-differentiability. To facilitate the learning of these symbolic sequences, we introduce a differentiable tree interpreter that compiles high-level symbolic tree operations into subsymbolic matrix operations on tensors. We present a novel Differentiable Tree Machine (DTM) architecture that integrates our interpreter with an external memory and an agent that learns to sequentially select tree operations to execute the target transformation in an end-to-end manner. With respect to out-of-distribution compositional generalization on synthetic semantic parsing and language generation tasks, DTM achieves 100% while existing baselines such as Transformer, Tree Transformer, LSTM, and Tree2Tree LSTM achieve less than 30%. DTM remains highly interpretable in addition to its perfect performance.

Submitted 1 June, 2023; originally announced June 2023.

Comments: ICML 2023. Code available at https://github.com/psoulos/dtm

arXiv:2305.19933 [pdf, other]

Speaking the Language of Your Listener: Audience-Aware Adaptation via Plug-and-Play Theory of Mind

Authors: Ece Takmaz, Nicolo' Brandizzi, Mario Giulianelli, Sandro Pezzelle, Raquel Fernández

Abstract: Dialogue participants may have varying levels of knowledge about the topic under discussion. In such cases, it is essential for speakers to adapt their utterances by taking their audience into account. Yet, it is an open question how such adaptation can be modelled in computational agents. In this paper, we model a visually grounded referential game between a knowledgeable speaker and a listener with more limited visual and linguistic experience. Inspired by psycholinguistic theories, we endow our speaker with the ability to adapt its referring expressions via a simulation module that monitors the effectiveness of planned utterances from the listener's perspective. We propose an adaptation mechanism building on plug-and-play approaches to controlled language generation, where utterance generation is steered on the fly by the simulator without finetuning the speaker's underlying language model. Our results and analyses show that our approach is effective: the speaker's utterances become closer to the listener's domain of expertise, which leads to higher communicative success.

Submitted 31 May, 2023; originally announced May 2023.

Comments: To appear in Findings of ACL 2023

arXiv:2305.12050 [pdf, other]

AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

Authors: Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, Nachiappan Nagappan, Peter C. Rigby

Abstract: Generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present CodeCompose, an AI-assisted code authoring tool developed and deployed at Meta internally. CodeCompose is based on the InCoder LLM that merges generative capabilities with bi-directionality. We have scaled up CodeCompose to serve tens of thousands of developers at Meta, across 9 programming languages and several coding surfaces. We present our experience in making design decisions about the model and system architecture for CodeCompose that addresses these challenges. To release a LLM model at this scale, we needed to first ensure that it is sufficiently accurate. In a random sample of 20K source code files, depending on the language, we are able to reproduce hidden lines between 40% and 58% of the time, an improvement of 1.4x and 4.1x over a model trained only on public data. We gradually rolled CodeCompose out to developers. At the time of this writing, 16K developers have used it with 8% of their code coming directly from CodeCompose. To triangulate our numerical findings, we conduct a thematic analysis on the feedback from 70 developers. We find that 91.5% of the feedback is positive, with the most common themes being discovering APIs, dealing with boilerplate code, and accelerating coding. Meta continues to integrate this feedback into CodeCompose.

Submitted 16 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.11993 [pdf, other]

Interpretable Word Sense Representations via Definition Generation: The Case of Semantic Change Analysis

Authors: Mario Giulianelli, Iris Luden, Raquel Fernandez, Andrey Kutuzov

Abstract: We propose using automatically generated natural language definitions of contextualised word usages as interpretable word and word sense representations. Given a collection of usage examples for a target word, and the corresponding data-driven usage clusters (i.e., word senses), a definition is generated for each usage with a specialised Flan-T5 language model, and the most prototypical definition in a usage cluster is chosen as the sense label. We demonstrate how the resulting sense labels can make existing approaches to semantic change analysis more interpretable, and how they can allow users -- historical linguists, lexicographers, or social scientists -- to explore and intuitively explain diachronic trajectories of word meaning. Semantic change analysis is only one of many possible applications of the `definitions as representations' paradigm. Beyond being human-readable, contextualised definitions also outperform token or usage sentence embeddings in word-in-context semantic similarity judgements, making them a new promising type of lexical representation for NLP.

Submitted 25 July, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.11707 [pdf, other]

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Authors: Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

Abstract: In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty. Code available at https://github.com/dmg-illc/nlg-uncertainty-probes.

Submitted 20 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Camera ready version for EMNLP 2023

arXiv:2305.10419 [pdf, other]

Kitana: Efficient Data Augmentation Search for AutoML

Authors: Zezhou Huang, Pranav Subramaniam, Raul Castro Fernandez, Eugene Wu

Abstract: AutoML services provide a way for non-expert users to benefit from high-quality ML models without worrying about model design and deployment, in exchange for a charge per hour ($21.252 for VertexAI). However, existing AutoML services are model-centric, in that they are limited to extracting features and searching for models from initial training data-they are only as effective as the initial training data quality. With the increasing volume of tabular data available, there is a huge opportunity for data augmentation. For instance, vertical augmentation adds predictive features, while horizontal augmentation adds examples. This augmented training data yields potentially much better AutoML models at a lower cost. However, existing systems either forgo the augmentation opportunities that provide poor models, or apply expensive augmentation searching techniques that drain users' budgets. Kitana is a data-centric AutoML system that also searches for new tabular datasets that can augment the tabular training data with new features and/or examples. Kitana manages a corpus of datasets, exposes an AutoML interface to users and searches for augmentation with datasets in the corpus to improve AutoML performance. To accelerate search, Kitana applies aggressive pre-computation to train a factorized proxy model and evaluate each candidate augmentation within 0.1s. Kitana also uses a cost model to limit the time spent on augmentation search, supports expressive data access controls, and performs request caching to benefit from past similar requests.
Using a corpus of 518 open-source datasets, Kitana produces higher quality models than existing AutoML systems in orders of magnitude less time. Across different user requests, Kitana increases the model R2 from 0.16 to 0.66 while reducing the cost by >100x compared to the naive factorized learning and SOTA data augmentation search.

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.03842 [pdf, other]

Data Station: Delegated, Trustworthy, and Auditable Computation to Enable Data-Sharing Consortia with a Data Escrow

Authors: Siyuan Xia, Zhiru Zhu, Chris Zhu, Jinjin Zhao, Kyle Chard, Aaron J. Elmore, Ian Foster, Michael Franklin, Sanjay Krishnan, Raul Castro Fernandez

Abstract: Pooling and sharing data increases and distributes its value. But since data cannot be revoked once shared, scenarios that require controlled release of data for regulatory, privacy, and legal reasons default to not sharing. Because selectively controlling what data to release is difficult, the few data-sharing consortia that exist are often built around data-sharing agreements resulting from long and tedious one-off negotiations. We introduce Data Station, a data escrow designed to enable the formation of data-sharing consortia. Data owners share data with the escrow knowing it will not be released without their consent. Data users delegate their computation to the escrow. The data escrow relies on delegated computation to execute queries without releasing the data first. Data Station leverages hardware enclaves to generate trust among participants, and exploits the centralization of data and computation to generate an audit log. We evaluate Data Station on machine learning and data-sharing applications while running on an untrusted intermediary. In addition to important qualitative advantages, we show that Data Station: i) outperforms federated learning baselines in accuracy and runtime for the machine learning application; ii) is orders of magnitude faster than alternative secure data-sharing frameworks; and iii) introduces small overhead on the critical path.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.09068 [pdf, other]

METAM: Goal-Oriented Data Discovery

Authors: Sainyam Galhotra, Yue Gong, Raul Castro Fernandez

Abstract: Data is a central component of machine learning and causal inference tasks. The availability of large amounts of data from sources such as open data repositories, data lakes and data marketplaces creates an opportunity to augment data and boost those tasks' performance. However, augmentation techniques rely on a user manually discovering and shortlisting useful candidate augmentations. Existing solutions do not leverage the synergy between discovery and augmentation, thus under exploiting data. In this paper, we introduce METAM, a novel goal-oriented framework that queries the downstream task with a candidate dataset, forming a feedback loop that automatically steers the discovery and augmentation process. To select candidates efficiently, METAM leverages properties of the: i) data, ii) utility function, and iii) solution set size. We show METAM's theoretical guarantees and demonstrate those empirically on a broad set of tasks. All in all, we demonstrate the promise of goal-oriented data discovery to modern data science applications.

Submitted 18 April, 2023; originally announced April 2023.

Comments: ICDE 2023 paper

arXiv:2304.06873 [pdf, other]

Reducing Network Load via Message Utility Estimation for Decentralized Multirobot Teams

Authors: Isabel M. Rayas Fernández, Christopher E. Denniston, Gaurav S. Sukhatme

Abstract: We are motivated by quantile estimation of algae concentration in lakes and how decentralized multirobot teams can effectively tackle this problem. We find that multirobot teams improve performance in this task over single robots, and communication-enabled teams further over communication-deprived teams; however, real robots are resource-constrained, and communication networks cannot support arbitrary message loads, making naive, constant information-sharing but also complex modeling and decision-making infeasible. With this in mind, we propose online, locally computable metrics for determining the utility of transmitting a given message to the other team members and a decision-theoretic approach that chooses to transmit only the most useful messages, using a decentralized and independent framework for maintaining beliefs of other teammates. We validate our approach in simulation on a real-world aquatic dataset, and we show that restricting communication via a utility estimation method based on the expected impact of a message on future teammate behavior results in a 42% decrease in network load while simultaneously decreasing quantile estimation error by 1.84%.

Submitted 6 July, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 7 pages, 1 table, 7 figures

arXiv:2303.03539 [pdf, other]

A Study on Multirobot Quantile Estimation in Natural Environments

Authors: Isabel M. Rayas Fernández, Christopher E. Denniston, Gaurav S. Sukhatme

Abstract: Quantiles of a natural phenomena can provide scientists with an important understanding of different spreads of concentrations. When there are several available robots, it may be advantageous to pool resources in a collaborative way to improve performance. A multirobot team can be difficult to practically bring together and coordinate. To this end, we present a study across several axes of the impact of using multiple robots to estimate quantiles of a distribution of interest using an informative path planning formulation. We measure quantile estimation accuracy with increasing team size to understand what benefits result from a multirobot approach in a drone exploration task of analyzing the algae concentration in lakes. We additionally perform an analysis on several parameters, including the spread of robot initial positions, the planning budget, and inter-robot communication, and find that while using more robots generally results in lower estimation error, this benefit is achieved under certain conditions. We present our findings in the context of real field robotic applications and discuss the implications of the results and interesting directions for future work.

Submitted 6 July, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 7 pages, 2 tables, 7 figures

arXiv:2302.05033 [pdf, other]

doi: 10.1109/3ICT56508.2022.9990696

Short-Term Aggregated Residential Load Forecasting using BiLSTM and CNN-BiLSTM

Authors: Bharat Bohara, Raymond I. Fernandez, Vysali Gollapudi, Xingpeng Li

Abstract: Higher penetration of renewable and smart home technologies at the residential level challenges grid stability as utility-customer interactions add complexity to power system operations. In response, short-term residential load forecasting has become an increasing area of focus. However, forecasting at the residential level is challenging due to the higher uncertainties involved. Recently deep neural networks have been leveraged to address this issue. This paper investigates the capabilities of a bidirectional long short-term memory (BiLSTM) and a convolutional neural network-based BiLSTM (CNN-BiLSTM) to provide a day ahead (24 hr.) forecasting at an hourly resolution while minimizing the root mean squared error (RMSE) between the actual and predicted load demand. Using a publicly available dataset consisting of 38 homes, the BiLSTM and CNN-BiLSTM models are trained to forecast the aggregated active power demand for each hour within a 24 hr. span, given the previous 24 hr. load data. The BiLSTM model achieved the lowest RMSE of 1.4842 for the overall daily forecast. In addition, standard LSTM and CNN-LSTM models are trained and compared with the BiLSTM architecture. The RMSE of BiLSTM is 5.60%, 2.85% and 2.60% lower than the LSTM, CNN-LSTM and CNN-BiLSTM models respectively. The source code of this work is available at https://github.com/Varat7v2/STLF-BiLSTM-CNNBiLSTM.git.

Submitted 9 February, 2023; originally announced February 2023.

Comments: This article has been accepted for publication in 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). This preprint is for personal use - that is solely for the purpose of research, but republication/redistribution requires IEEE permission. Please check IEEE website for more information

Journal ref: 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)

arXiv:2301.03560 [pdf, other]

Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach

Authors: Qiming Wang, Raul Castro Fernandez

Abstract: Most deployed data discovery systems, such as Google Datasets, and open data portals only support keyword search. Keyword search is geared towards general audiences but limits the types of queries the systems can answer. We propose a new system that lets users write natural language questions directly. A major barrier to using this learned data discovery system is it needs expensive-to-collect training data, thus limiting its utility. In this paper, we introduce a self-supervised approach to assemble training datasets and train learned discovery systems without human intervention. It requires addressing several challenges, including the design of self-supervised strategies for data discovery, table representation strategies to feed to the models, and relevance models that work well with the synthetically generated questions. We combine all the above contributions into a system, Solo, that solves the problem end to end. The evaluation results demonstrate the new techniques outperform state-of-the-art approaches on well-known benchmarks. All in all, the technique is a stepping stone towards building learned discovery systems. The code is open-sourced at https://github.com/TheDataStation/solo

Submitted 17 October, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: To appear at Sigmod 2024

arXiv:2210.16133 [pdf, other]

Stop Measuring Calibration When Humans Disagree

Authors: Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández

Abstract: Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - class frequency, ranking and entropy.

Submitted 30 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022

arXiv:2210.08321 [pdf, other]

Construction Repetition Reduces Information Rate in Dialogue

Authors: Mario Giulianelli, Arabella Sinclair, Raquel Fernández

Abstract: Speakers repeat constructions frequently in dialogue. Due to their peculiar information-theoretic properties, repetitions can be thought of as a strategy for cost-effective communication. In this study, we focus on the repetition of lexicalised constructions -- i.e., recurring multi-word units -- in English open-domain spoken dialogues. We hypothesise that speakers use construction repetition to mitigate information rate, leading to an overall decrease in utterance information content over the course of a dialogue. We conduct a quantitative analysis, measuring the information content of constructions and that of their containing utterances, estimating information content with an adaptive neural language model. We observe that construction usage lowers the information content of utterances. This facilitating effect (i) increases throughout dialogues, (ii) is boosted by repetition, (iii) grows as a function of repetition frequency and density, and (iv) is stronger for repetitions of referential constructions.

Submitted 15 October, 2022; originally announced October 2022.

Comments: In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2022)

Showing 1–50 of 106 results for author: Fernandez, R