
doi 10.1145/3678957.3678959

Everything We Hear: Towards Tackling Misinformation in Podcasts

Authors: Sachin Pathiyan Cherumanal, Ujwal Gadiraju, Damiano Spina

Abstract: Advances in generative AI, the proliferation of large multimodal models (LMMs), and democratized open access to these technologies have direct implications for the production and diffusion of misinformation. In this prequel, we address tackling misinformation in the unique and increasingly popular context of podcasts. The rise of podcasts as a popular medium for disseminating information across diverse topics necessitates a proactive strategy to combat the spread of misinformation. Inspired by the proven effectiveness of auditory alerts in contexts like collision alerts for drivers and error pings in mobile phones, our work envisions the application of auditory alerts as an effective tool to tackle misinformation in podcasts. We propose the integration of suitable auditory alerts to notify listeners of potential misinformation within the podcasts they are listening to, in real-time and without hampering listening experiences. We identify several opportunities and challenges in this path and aim to provoke novel conversations around instruments, methods, and measures to tackle misinformation in podcasts.

Submitted 1 August, 2024; originally announced August 2024.

Comments: Accepted at ACM ICMI'24 (Third Place Blue Sky Paper)

arXiv:2405.12480 [pdf, other]

doi 10.1145/3640471.3680245

Towards Detecting and Mitigating Cognitive Bias in Spoken Conversational Search

Authors: Kaixin Ji, Sachin Pathiyan Cherumanal, Johanne R. Trippas, Danula Hettiachchi, Flora D. Salim, Falk Scholer, Damiano Spina

Abstract: Instruments such as eye-tracking devices have contributed to understanding how users interact with screen-based search engines. However, user-system interactions in audio-only channels -- as is the case for Spoken Conversational Search (SCS) -- are harder to characterize, given the lack of instruments to effectively and precisely capture interactions. Furthermore, in this era of information overload, cognitive bias can significantly impact how we seek and consume information -- especially in the context of controversial topics or multiple viewpoints. This paper draws upon insights from multiple disciplines (including information seeking, psychology, cognitive science, and wearable sensors) to provoke novel conversations in the community. To this end, we discuss future opportunities and propose a framework including multimodal instruments and methods for experimental designs and settings. We demonstrate preliminary results as an example. We also outline the challenges and offer suggestions for adopting this multimodal approach, including ethical considerations, to assist future researchers and practitioners in exploring cognitive biases in SCS.

Submitted 6 August, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: Extended version of MobileHCI'24 LBW paper

arXiv:2405.03303 [pdf, other]

doi 10.1145/3626772.3657768

Explainability for Transparent Conversational Information-Seeking

Authors: Weronika Łajewska, Damiano Spina, Johanne Trippas, Krisztian Balog

Abstract: The increasing reliance on digital information necessitates advancements in conversational search systems, particularly in terms of information transparency. While prior research in conversational information-seeking has concentrated on improving retrieval techniques, the challenge remains in generating responses useful from a user perspective. This study explores different methods of explaining the responses, hypothesizing that transparency about the source of the information, system confidence, and limitations can enhance users' ability to objectively assess the response. By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user. We design a user study to answer questions concerning the impact of (1) the quality of explanations enhancing the response on its usefulness and (2) ways of presenting explanations to users. The analysis of the collected data reveals lower user ratings for noisy explanations, although these scores seem insensitive to the quality of the response. Inconclusive results on the explanations presentation format suggest that it may not be a critical factor in this setting.

Submitted 6 May, 2024; originally announced May 2024.

Comments: This is the author's version of the work. The definitive version is published in: 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14-18, 2024, Washington, DC, USA

arXiv:2405.00322 [pdf, other]

doi 10.1145/3626772.3657793

Characterizing Information Seeking Processes with Multiple Physiological Signals

Authors: Kaixin Ji, Danula Hettiachchi, Flora D. Salim, Falk Scholer, Damiano Spina

Abstract: Information access systems are getting complex, and our understanding of user behavior during information seeking processes is mainly drawn from qualitative methods, such as observational studies or surveys. Leveraging the advances in sensing technologies, our study aims to characterize user behaviors with physiological signals, particularly in relation to cognitive load, affective arousal, and valence. We conduct a controlled lab study with 26 participants, and collect data including Electrodermal Activities, Photoplethysmogram, Electroencephalogram, and Pupillary Responses. This study examines informational search with four stages: the realization of Information Need (IN), Query Formulation (QF), Query Submission (QS), and Relevance Judgment (RJ). We also include different interaction modalities to represent modern systems, e.g., QS by text-typing or verbalizing, and RJ with text or audio information. We analyze the physiological signals across these stages and report outcomes of pairwise non-parametric repeated-measure statistical tests. The results show that participants experience significantly higher cognitive loads at IN with a subtle increase in alertness, while QF requires higher attention. QS involves more demanding cognitive loads than QF. Affective responses are more pronounced at RJ than QS or IN, suggesting greater interest and engagement as knowledge gaps are resolved. To the best of our knowledge, this is the first study that explores user behaviors in a search process employing a more nuanced quantitative analysis of physiological signals.
Our findings offer valuable insights into user behavior and emotional responses in information seeking processes. We believe our proposed methodology can inform the characterization of more complex processes, such as conversational information seeking.

Submitted 7 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

ACM Class: H.5; H.3.3; C.3

Journal ref: In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, Washington, DC, USA. ACM, New York, NY, USA, 12 pages

arXiv:2401.07216 [pdf, other]

doi 10.1145/3627508.3638309

Walert: Putting Conversational Search Knowledge into Action by Building and Evaluating a Large Language Model-Powered Chatbot

Authors: Sachin Pathiyan Cherumanal, Lin Tian, Futoon M. Abushaqra, Angel Felipe Magnossao de Paula, Kaixin Ji, Danula Hettiachchi, Johanne R. Trippas, Halil Ali, Falk Scholer, Damiano Spina

Abstract: Creating and deploying customized applications is crucial for operational success and enriching user experiences in the rapidly evolving modern business world. A prominent facet of modern user experiences is the integration of chatbots or voice assistants. The rapid evolution of Large Language Models (LLMs) has provided a powerful tool to build conversational applications. We present Walert, a customized LLM-based conversational agent able to answer frequently asked questions about computer science degrees and programs at RMIT University. Our demo aims to showcase how conversational information-seeking researchers can effectively communicate the benefits of using best practices to stakeholders interested in developing and deploying LLM-based chatbots. These practices are well-known in our community but often overlooked by practitioners who may not have access to this knowledge. The methodology and resources used in this demo serve as a bridge to facilitate knowledge transfer from experts, address industry professionals' practical needs, and foster a collaborative environment. The data and code of the demo are available at https://github.com/rmit-ir/walert.

Submitted 14 January, 2024; originally announced January 2024.

Comments: Accepted at 2024 ACM SIGIR CHIIR

arXiv:2308.13755 [pdf, other]

doi 10.1007/s10618-023-00963-3

i-Align: an interpretable knowledge graph alignment model

Authors: Bayu Distiawan Trisedya, Flora D Salim, Jeffrey Chan, Damiano Spina, Falk Scholer, Mark Sanderson

Abstract: Knowledge graphs (KGs) are becoming essential resources for many downstream applications. However, their incompleteness may limit their potential. Thus, continuous curation is needed to mitigate this problem. One of the strategies to address this problem is KG alignment, i.e., forming a more complete KG by merging two or more KGs. This paper proposes i-Align, an interpretable KG alignment model. Unlike the existing KG alignment models, i-Align provides an explanation for each alignment prediction while maintaining high alignment performance. Experts can use the explanation to check the correctness of the alignment prediction. Thus, the high quality of a KG can be maintained during the curation process (e.g., the merging process of two KGs). To this end, a novel Transformer-based Graph Encoder (Trans-GE) is proposed as a key component of i-Align for aggregating information from entities' neighbors (structures). Trans-GE uses Edge-gated Attention that combines the adjacency matrix and the self-attention matrix to learn a gating mechanism to control the information aggregation from the neighboring entities. It also uses historical embeddings, allowing Trans-GE to be trained over mini-batches, or smaller sub-graphs, to address the scalability issue when encoding a large KG. Another component of i-Align is a Transformer encoder for aggregating entities' attributes. This way, i-Align can generate explanations in the form of a set of the most influential attributes/neighbors based on attention weights.
Extensive experiments are conducted to show the power of i-Align. The experiments include several aspects, such as the model's effectiveness for aligning KGs, the quality of the generated explanations, and its practicality for aligning large KGs. The results show the effectiveness of i-Align in these aspects.

Submitted 25 August, 2023; originally announced August 2023.

Comments: Data Min Knowl Disc (2023)

arXiv:2308.10220 [pdf, other]

doi 10.1145/3583780.3614841

Designing and Evaluating Presentation Strategies for Fact-Checked Content

Authors: Danula Hettiachchi, Kaixin Ji, Jenny Kennedy, Anthony McCosker, Flora D. Salim, Mark Sanderson, Falk Scholer, Damiano Spina

Abstract: With the rapid growth of online misinformation, it is crucial to have reliable fact-checking methods. Recent research on finding check-worthy claims and automated fact-checking has made significant advancements. However, limited guidance exists regarding the presentation of fact-checked content to effectively convey verified information to users. We address this research gap by exploring the critical design elements in fact-checking reports and investigating whether credibility and presentation-based design improvements can enhance users' ability to interpret the report accurately. We co-developed potential content presentation strategies through a workshop involving fact-checking professionals, communication experts, and researchers. The workshop examined the significance and utility of elements such as veracity indicators and explored the feasibility of incorporating interactive components for enhanced information disclosure. Building on the workshop outcomes, we conducted an online experiment involving 76 crowd workers to assess the efficacy of different design strategies. The results indicate that proposed strategies significantly improve users' ability to accurately interpret the verdict of fact-checking articles. Our findings underscore the critical role of effective presentation of fact reports in addressing the spread of misinformation. By adopting appropriate design enhancements, the effectiveness of fact-checking reports can be maximized, enabling users to make informed judgments.

Submitted 23 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: Accepted to the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23)

ACM Class: H.3.3; H.5.2

arXiv:2307.03385 [pdf, other]

AI-UPV at EXIST 2023 -- Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime

Authors: Angel Felipe Magnossão de Paula, Giulia Rizzi, Elisabetta Fersini, Damiano Spina

Abstract: With the increasing influence of social media platforms, it has become crucial to develop automated systems capable of detecting instances of sexism and other disrespectful and hateful behaviors to promote a more inclusive and respectful online environment. Nevertheless, these tasks are considerably challenging considering different hate categories and the author's intentions, especially under the learning with disagreements regime. This paper describes AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach aims at addressing the task of sexism identification and characterization under the learning with disagreements paradigm by training directly from the data with disagreements, without using any aggregated label. Yet, performances considering both soft and hard evaluations are reported. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, our system is articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft and hard label evaluation. This work describes the participation in all three EXIST tasks; considering a soft evaluation, it obtained fourth place in Task 2 at EXIST and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79.
The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 15 pages, 9 tables, 1 figure, conference

arXiv:2307.03377 [pdf, ps, other]

Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection

Authors: Angel Felipe Magnossão de Paula, Paolo Rosso, Damiano Spina

Abstract: This paper proposes a novel approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution unfeasible in cases where data are unavailable or expensive to gather. Therefore another solution, based on the sharing of information between tasks, has been developed: Multi-Task Learning (MTL). Despite the recent developments regarding MTL, the problem of negative transfer has still to be solved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. This paper proposes a new approach to mitigate the negative transfer problem based on the task awareness concept. The proposed approach diminishes negative transfer and improves performance over the classic MTL solution. Moreover, the proposed approach has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state-of-the-art both in EXIST-2021 and HatEval-2019 benchmarks.

Submitted 7 July, 2023; originally announced July 2023.

Comments: 8 pages, 2 figures, 5 tables, IJCNN 2023 conference

arXiv:2304.13488 [pdf, other]

doi 10.1145/3539618.3591981

Examining the Impact of Uncontrolled Variables on Physiological Signals in User Studies for Information Processing Activities

Authors: Kaixin Ji, Damiano Spina, Danula Hettiachchi, Flora Dilys Salim, Falk Scholer

Abstract: Physiological signals can potentially be applied as objective measures to understand the behavior and engagement of users interacting with information access systems. However, the signals are highly sensitive, and many controls are required in laboratory user studies. To investigate the extent to which controlled or uncontrolled (i.e., confounding) variables such as task sequence or duration influence the observed signals, we conducted a pilot study where each participant completed four types of information-processing activities (READ, LISTEN, SPEAK, and WRITE). Meanwhile, we collected data on blood volume pulse, electrodermal activity, and pupil responses. We then used machine learning approaches as a mechanism to examine the influence of controlled and uncontrolled variables that commonly arise in user studies. Task duration was found to have a substantial effect on the model performance, suggesting it represents individual differences rather than giving insight into the target variables. This work contributes to our understanding of such variables in using physiological signals in information retrieval user studies.

Submitted 26 April, 2023; originally announced April 2023.

Comments: Accepted to the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23)

arXiv:2208.03443 [pdf, other]

Imagining Future Digital Assistants at Work: A Study of Task Management Needs

Authors: Yonchanok Khaokaew, Indigo Holcombe-James, Mohammad Saiedur Rahaman, Jonathan Liono, Johanne R. Trippas, Damiano Spina, Nicholas Belkin, Peter Bailey, Paul N. Bennett, Yongli Ren, Mark Sanderson, Falk Scholer, Ryen W. White, Flora D. Salim

Abstract: Digital Assistants (DAs) can support workers in the workplace and beyond. However, target user needs are not fully understood, and the functions that workers would ideally want a DA to support require further study. A richer understanding of worker needs could help inform the design of future DAs. We investigate user needs of future workplace DAs using data from a user study of 40 workers over a four-week period. Our qualitative analysis confirms existing research and generates new insight on the role of DAs in managing people's time, tasks, and information. Placing these insights in relation to quantitative analysis of self-reported task data, we highlight how different occupation roles require DAs to take varied approaches to these domains and the effect of task characteristics on the imagined features. Our findings have implications for the design of future DAs in work settings, and we offer some recommendations for reduction to practice.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 59 pages

arXiv:2202.07858 [pdf, ps, other]

ITTC @ TREC 2021 Clinical Trials Track

Authors: Thinh Hung Truong, Yulia Otmakhova, Rahmad Mahendra, Timothy Baldwin, Jey Han Lau, Trevor Cohn, Lawrence Cavedon, Damiano Spina, Karin Verspoor

Abstract: This paper describes the submissions of the Natural Language Processing (NLP) team from the Australian Research Council Industrial Transformation Training Centre (ITTC) for Cognitive Computing in Medical Technologies to the TREC 2021 Clinical Trials Track. The task focuses on the problem of matching eligible clinical trials to topics constituting a summary of a patient's admission notes. We explore different ways of representing trials and topics using NLP techniques, and then use a common retrieval model to generate the ranked list of relevant trials for each topic. The results from all our submitted runs are well above the median scores for all topics, but there is still plenty of scope for improvement.

Submitted 15 February, 2022; originally announced February 2022.

Comments: 7 pages

arXiv:2108.10442 [pdf, other]

doi 10.1145/3459637.3482099

Evaluating Fairness in Argument Retrieval

Authors: Sachin Pathiyan Cherumanal, Damiano Spina, Falk Scholer, W. Bruce Croft

Abstract: Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user's information need on a controversial topic (e.g., climate change). The effectiveness of such argument retrieval systems is typically evaluated based on topical relevance and argument quality, without taking into account the often differing number of documents shown for the argument stances (PRO or CON). Therefore, systems may retrieve relevant passages, but with a biased exposure of arguments. In this work, we analyze a range of non-stochastic fairness-aware ranking and diversity metrics to evaluate the extent to which argument stances are fairly exposed in argument retrieval systems. Using the official runs of the argument retrieval task Touché at CLEF 2020, as well as synthetic data to control the amount and order of argument stances in the rankings, we show that systems with the best effectiveness in terms of topical relevance are not necessarily the most fair or the most diverse in terms of argument stance. The relationships we found between (un)fairness and diversity metrics shed light on how to evaluate group fairness -- in addition to topical relevance -- in argument retrieval settings.

Submitted 19 September, 2021; v1 submitted 23 August, 2021; originally announced August 2021.

Comments: Accepted at CIKM 2021

arXiv:2108.01222 [pdf, other]

doi 10.1016/j.ipm.2021.102710

The Many Dimensions of Truthfulness: Crowdsourcing Misinformation Assessments on a Multidimensional Scale

Authors: Michael Soprano, Kevin Roitero, David La Barbera, Davide Ceolin, Damiano Spina, Stefano Mizzaro, Gianluca Demartini

Abstract: Recent work has demonstrated the viability of using crowdsourcing as a tool for evaluating the truthfulness of public statements. Under certain conditions such as: (1) having a balanced set of workers with different backgrounds and cognitive abilities; (2) using an adequate set of mechanisms to control the quality of the collected data; and (3) using a coarse-grained assessment scale, the crowd can provide reliable identification of fake news. However, fake news is a subtle matter: statements can be just biased ("cherrypicked"), imprecise, wrong, etc. and the unidimensional truth scale used in existing work cannot account for such differences. In this paper we propose a multidimensional notion of truthfulness and we ask the crowd workers to assess seven different dimensions of truthfulness selected based on existing literature: Correctness, Neutrality, Comprehensibility, Precision, Completeness, Speaker's Trustworthiness, and Informativeness. We deploy a set of quality control mechanisms to ensure that the thousands of assessments collected on 180 publicly available fact-checked statements distributed over two datasets are of adequate quality, including a custom search engine used by the crowd workers to find web pages supporting their truthfulness assessments.
A comprehensive analysis of crowdsourced judgments shows that: (1) the crowdsourced assessments are reliable when compared to an expert-provided gold standard; (2) the proposed dimensions of truthfulness capture independent pieces of information; (3) the crowdsourcing task can be easily learned by the workers; and (4) the resulting assessments provide a useful basis for a more complete estimation of statement truthfulness.

Submitted 23 August, 2021; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: 33 pages; Paper accepted at Information Processing & Management on July 28, 2021; IP&M Special Issue on Dis/Misinformation Mining from Social Media

MSC Class: 68P20

ACM Class: H.3

Journal ref: Information Processing & Management, Volume 58, Issue 6, November 2021, 102710

arXiv:2107.11755 [pdf, other]

doi 10.1007/s00779-021-01604-6

Can the Crowd Judge Truthfulness? A Longitudinal Study on Recent Misinformation about COVID-19

Authors: Kevin Roitero, Michael Soprano, Beatrice Portelli, Massimiliano De Luise, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini

Abstract: Recently, the misinformation problem has been addressed with a crowdsourcing-based approach: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of non-experts is exploited. We study whether crowdsourcing is an effective and reliable method to assess truthfulness during a pandemic, targeting statements related to COVID-19, thus addressing (mis)information that is both related to a sensitive and personal issue and very recent as compared to when the judgment is done. In our experiments, crowd workers are asked to assess the truthfulness of statements, and to provide evidence for the assessments. Besides showing that the crowd is able to accurately judge the truthfulness of the statements, we report results on workers' behavior, agreement among workers, and the effect of aggregation functions, of scale transformations, and of workers' background and bias. We perform a longitudinal study by re-launching the task multiple times with both novice and experienced workers, deriving important insights on how the behavior and quality change over time. Our results show that: workers are able to detect and objectively categorize online (mis)information related to COVID-19; both crowdsourced and expert judgments can be transformed and aggregated to improve quality; worker background and other signals (e.g., source of information, behavior) impact the quality of the data. The longitudinal study demonstrates that the time-span has a major effect on the quality of the judgments, for both novice and experienced workers.
Finally, we provide an extensive failure analysis of the statements misjudged by the crowd-workers.

Submitted 19 September, 2021; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: 31 pages; Preprint of an article accepted in Personal and Ubiquitous Computing (Special Issue on Intelligent Systems for Tackling Online Harms, 2021). arXiv admin note: substantial text overlap with arXiv:2008.05701

MSC Class: 68P20 ACM Class: H.3

arXiv:2008.05701 [pdf, other]

doi 10.1145/3340531.3412048

The COVID-19 Infodemic: Can the Crowd Judge Recent Misinformation Objectively?

Authors: Kevin Roitero, Michael Soprano, Beatrice Portelli, Damiano Spina, Vincenzo Della Mea, Giuseppe Serra, Stefano Mizzaro, Gianluca Demartini

Abstract: Misinformation is an ever increasing problem that is difficult to solve for the research community and has a negative impact on the society at large. Very recently, the problem has been addressed with a crowdsourcing-based approach to scale up labeling efforts: to assess the truthfulness of a statement, instead of relying on a few experts, a crowd of (non-expert) judges is exploited. We follow the same approach to study whether crowdsourcing is an effective and reliable method to assess statements truthfulness during a pandemic. We specifically target statements related to the COVID-19 health emergency, that is still ongoing at the time of the study and has arguably caused an increase of the amount of misinformation that is spreading online (a phenomenon for which the term "infodemic" has been used). By doing so, we are able to address (mis)information that is both related to a sensitive and personal issue like health and very recent as compared to when the judgment is done: two issues that have not been analyzed in related work. In our experiment, crowd workers are asked to assess the truthfulness of statements, as well as to provide evidence for the assessments as a URL and a text justification. Besides showing that the crowd is able to accurately judge the truthfulness of the statements, we also report results on many different aspects, including: agreement among workers, the effect of different aggregation functions, of scales transformations, and of workers background / bias. We also analyze workers behavior, in terms of queries submitted, URLs found / selected, text justifications, and other behavioral data like clicks and mouse actions collected by means of an ad hoc logger.

Submitted 13 August, 2020; originally announced August 2020.

Comments: 10 pages; Preprint of the full paper accepted at CIKM 2020

MSC Class: 68P20 ACM Class: H.3

arXiv:2005.06915 [pdf, other]

doi 10.1145/3397271.3401112

Can The Crowd Identify Misinformation Objectively? The Effects of Judgment Scale and Assessor's Background

Authors: Kevin Roitero, Michael Soprano, Shaoyang Fan, Damiano Spina, Stefano Mizzaro, Gianluca Demartini

Abstract: Truthfulness judgments are a fundamental step in the process of fighting misinformation, as they are crucial to train and evaluate classifiers that automatically distinguish true and false statements. Usually such judgments are made by experts, like journalists for political statements or medical doctors for medical statements. In this paper, we follow a different approach and rely on (non-expert) crowd workers. This of course leads to the following research question: Can crowdsourcing be reliably used to assess the truthfulness of information and to create large-scale labeled collections for information credibility systems? To address this issue, we present the results of an extensive study based on crowdsourcing: we collect thousands of truthfulness assessments over two datasets, and we compare expert judgments with crowd judgments, expressed on scales with various granularity levels. We also measure the political bias and the cognitive background of the workers, and quantify their effect on the reliability of the data provided by the crowd.

Submitted 24 June, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: Preprint of the full paper accepted at SIGIR 2020

MSC Class: 68P20 ACM Class: H.3

arXiv:1910.13166 [pdf, other]

Towards a Model for Spoken Conversational Search

Authors: Johanne R. Trippas, Damiano Spina, Paul Thomas, Mark Sanderson, Hideo Joho, Lawrence Cavedon

Abstract: Conversation is the natural mode for information exchange in daily life, a spoken conversational interaction for search input and output is a logical format for information seeking. However, the conceptualisation of user-system interactions or information exchange in spoken conversational search (SCS) has not been explored. The first step in conceptualising SCS is to understand the conversational moves used in an audio-only communication channel for search. This paper explores conversational actions for the task of search. We define a qualitative methodology for creating conversational datasets, propose analysis protocols, and develop the SCSdata. Furthermore, we use the SCSdata to create the first annotation schema for SCS: the SCoSAS, enabling us to investigate interactivity in SCS. We further establish that SCS needs to incorporate interactivity and pro-activity to overcome the complexity that the information seeking process in an audio-only channel poses. In summary, this exploratory study unpacks the breadth of SCS. Our results highlight the need for integrating discourse in future SCS models and contributes the advancement in the formalisation of SCS models and the design of SCS systems.

Submitted 29 October, 2019; v1 submitted 29 October, 2019; originally announced October 2019.

Comments: Paper accepted at Information Processing & Management on October 29, 2019, Spoken Conversational Search, Information Seeking

arXiv:1807.04317 [pdf, other]

doi 10.1145/3234944.3234958

A Formal Account of Effectiveness Evaluation and Ranking Fusion

Authors: Enrique Amigó, Fernando Giner, Stefano Mizzaro, Damiano Spina

Abstract: This paper proposes a theoretical framework which models the information provided by retrieval systems in terms of Information Theory. The proposed framework allows to formalize: (i) system effectiveness as an information theoretic similarity between system outputs and human assessments, and (ii) ranking fusion as an information quantity measure. As a result, the proposed effectiveness metric improves popular metrics in terms of formal constraints. In addition, our empirical experiments suggest that it captures quality aspects from traditional metrics, while the reverse is not true. Our work also advances the understanding of theoretical foundations of the empirically known phenomenon of effectiveness increase when combining retrieval system outputs in an unsupervised manner.

Submitted 14 September, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

Comments: ICTIR'18 paper, extended version with formal proofs (10 pages)

MSC Class: 68P20 ACM Class: H.3.3

arXiv:1806.03957 [pdf, other]

doi 10.1007/978-3-030-28577-7_12

Prosody Modifications for Question-Answering in Voice-Only Settings

Authors: Aleksandr Chuklin, Aliaksei Severyn, Johanne Trippas, Enrique Alfonseca, Hanna Silen, Damiano Spina

Abstract: Many popular form factors of digital assistants---such as Amazon Echo, Apple Homepod, or Google Home---enable the user to hold a conversation with these systems based only on the speech modality. The lack of a screen presents unique challenges. To satisfy the information need of a user, the presentation of the answer needs to be optimized for such voice-only interactions. In this paper, we propose a task of evaluating the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup where we evaluate the quality of our proposed modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of the user to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that some of these prosodic modifications lead to better comprehension at the expense of only slightly degraded naturalness of the audio.

Submitted 2 October, 2019; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: Shorter version of this paper was accepted to CLEF'2019, Lugano, Switzerland. The final authenticated version is available online at https://doi.org/10.1007/978-3-030-28577-7_12

ACM Class: H.3.3; H.5.2

Journal ref: Lecture Notes in Computer Science, vol 11696 CLEF 2019

arXiv:1805.02334 [pdf, ps, other]

doi 10.1145/3209978.3210024

An Axiomatic Analysis of Diversity Evaluation Metrics: Introducing the Rank-Biased Utility Metric

Authors: Enrique Amigó, Damiano Spina, Jorge Carrillo-de-Albornoz

Abstract: Many evaluation metrics have been defined to evaluate the effectiveness ad-hoc retrieval and search result diversification systems. However, it is often unclear which evaluation metric should be used to analyze the performance of retrieval systems given a specific task. Axiomatic analysis is an informative mechanism to understand the fundamentals of metrics and their suitability for particular scenarios. In this paper, we define a constraint-based axiomatic framework to study the suitability of existing metrics in search result diversification scenarios. The analysis informed the definition of Rank-Biased Utility (RBU) -- an adaptation of the well-known Rank-Biased Precision metric -- that takes into account redundancy and the user effort associated to the inspection of documents in the ranking. Our experiments over standard diversity evaluation campaigns show that the proposed metric captures quality criteria reflected by different metrics, being suitable in the absence of knowledge about particular features of the scenario under study.

Submitted 19 August, 2018; v1 submitted 6 May, 2018; originally announced May 2018.

Comments: Original version: 10 pages. Preprint of full paper to appear at SIGIR'18: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, July 8-12, 2018, Ann Arbor, MI, USA. ACM, New York, NY, USA

MSC Class: 68P20 ACM Class: H.3.3

arXiv:1403.1451 [pdf, ps, other]

Real-Time Classification of Twitter Trends

Authors: Arkaitz Zubiaga, Damiano Spina, Raquel Martínez, Víctor Fresno

Abstract: Social media users give rise to social trends as they share about common interests, which can be triggered by different reasons. In this work, we explore the types of triggers that spark trends on Twitter, introducing a typology with following four types: 'news', 'ongoing events', 'memes', and 'commemoratives'. While previous research has analyzed trending topics in a long term, we look at the earliest tweets that produce a trend, with the aim of categorizing trends early on. This would allow to provide a filtered subset of trends to end users. We analyze and experiment with a set of straightforward language-independent features based on the social spread of trends to categorize them into the introduced typology. Our method provides an efficient way to accurately categorize trending topics without need of external data, enabling news organizations to discover breaking news in real-time, or to quickly identify viral memes that might enrich marketing decisions, among others. The analysis of social features also reveals patterns associated with each type of trend, such as tweets about ongoing events being shorter as many were likely sent from mobile devices, or memes having more retweets originating from a few trend-setters.

Submitted 6 March, 2014; originally announced March 2014.

Comments: Pre-print of article accepted for publication in Journal of the American Society for Information Science and Technology copyright @ 2013 (American Society for Information Science and Technology)

arXiv:1204.3731 [pdf, other]

Towards Real-Time Summarization of Scheduled Events from Twitter Streams

Authors: Arkaitz Zubiaga, Damiano Spina, Enrique Amigó, Julio Gonzalo

Abstract: This paper explores the real-time summarization of scheduled events such as soccer games from torrential flows of Twitter streams. We propose and evaluate an approach that substantially shrinks the stream of tweets in real-time, and consists of two steps: (i) sub-event detection, which determines if something new has occurred, and (ii) tweet selection, which picks a representative tweet to describe each sub-event. We compare the summaries generated in three languages for all the soccer games in "Copa America 2011" to reference live reports offered by Yahoo! Sports journalists. We show that simple text analysis methods which do not involve external knowledge lead to summaries that cover 84% of the sub-events on average, and 100% of key types of sub-events (such as goals in soccer). Our approach should be straightforwardly applicable to other kinds of scheduled events such as other sports, award ceremonies, keynote talks, TV shows, etc.

Submitted 17 April, 2012; originally announced April 2012.

Showing 1–23 of 23 results for author: Spina, D