Human-Computer Interaction
- [1] arXiv:2407.12786 [pdf, html, other]
Title: Cube2Pipes : Investigating Hybrid Gameplay Using AR and a Tangible 3D Puzzle
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
We present our game, Cube2Pipes, as an attempt to investigate a unique gameplay design where we use a tangible 3D spatial puzzle, in the form of a 2x2 Rubik's Cube, as an interface to a tabletop mobile augmented reality (AR) game. The game interface adapts to user movement and interaction with both virtual and tangible elements via computer-vision-based tracking. This game can be seen as an instance of generic interactive hybrid systems as it involves interaction with both virtual and real, tangible elements. We present a thorough user evaluation of various aspects of the gameplay in order to answer the question of whether hybrid gameplay involving both real and virtual interfaces and elements is more captivating to, and preferred by, users than standard (baseline) gameplay with only virtual elements. We use multiple industry-standard user study questionnaires to try to answer this question. We also try to determine whether the game facilitates understanding of the spatial moves required to solve a Rubik's Cube, and the efficacy of a tangible puzzle interface to a tabletop AR game.
- [2] arXiv:2407.12787 [pdf, html, other]
Title: GameVibe: A Multimodal Affective Game Corpus
Authors: Matthew Barthet, Maria Kaselimi, Kosmas Pinitas, Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis
Comments: 11 pages, 5 figures, 1 table
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
As online video and streaming platforms continue to grow, affective computing research has undergone a shift towards more complex studies involving multiple modalities. However, there is still a lack of readily available datasets with high-quality audiovisual stimuli. In this paper, we present GameVibe, a novel affect corpus which consists of multimodal audiovisual stimuli, including in-game behavioural observations and third-person affect labels for viewer engagement. The corpus consists of videos from a diverse set of publicly available gameplay sessions across 30 games, with particular attention to ensure high-quality stimuli with good audiovisual and gameplay diversity. Furthermore, we present an analysis on the reliability of the annotators in terms of inter-annotator agreement.
- [3] arXiv:2407.12795 [pdf, other]
Title: The need of a self for self-driving cars a theoretical model applying homeostasis to self driving
Comments: Pre-print draft
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
This paper explores the concept of creating a "self" for self-driving cars through a homeostatic architecture designed to enhance their autonomy, safety, and efficiency. The proposed system integrates inward-focused sensors to monitor the car's internal state, such as the condition of its metal bodywork, wheels, engine, and battery, establishing a baseline homeostatic state representing optimal functionality. Outward-facing sensors, like cameras and LIDAR, are then interpreted via their impact on the car's homeostatic state by quantifying deviations from homeostasis. This contrasts with the approach of trying to make cars "see" reality in a similar way to humans and identify elements in their reality in the same way humans do. Virtual environments would be leveraged to accelerate training. Additionally, cars are programmed to communicate and share experiences via blockchain technology, learning from each other's mistakes while maintaining individualized training models. A dedicated language for self-driving cars is proposed to enable nuanced interpretation and response to environmental data. This architecture allows self-driving cars to dynamically adjust their behavior based on internal and external feedback, promoting cooperation and continuous improvement. The study concludes by discussing the broader implications for AI development, potential real-world applications, and future research directions.
- [4] arXiv:2407.12800 [pdf, other]
Title: Digital Storytelling for Competence Development in Games
Subjects: Human-Computer Interaction (cs.HC); Graphics (cs.GR)
The acquisition of complex knowledge and competences raises difficult challenges for the supporting tools within the corporate environment, for which digital storytelling presents a potential solution. Traditionally, a driving goal of digital storytelling is the generation of dramatic stories with human significance, but for learning purposes, the need for drama is complemented by the requirement of achieving particular learning outcomes. This paper presents a narrative engine that uses emergent storytelling to support the development of complex competences in the learning domains of project management and innovation. The approach is based on an adaptation of the Fabula model combined with cases representing situated contexts associated with particular competences. These cases are then triggered to influence the unfolding of the story such that a learner encounters dramatic points in the narrative where the associated competences need to be used. In addition to the description of the approach and corresponding narrative engine, an illustration is presented of how the competence 'conflict management' influences a story.
- [5] arXiv:2407.12804 [pdf, html, other]
Title: Modulating Language Model Experiences through Frictions
Authors: Katherine M. Collins, Valerie Chen, Ilia Sucholutsky, Hannah Rose Kirk, Malak Sadek, Holli Sargeant, Ameet Talwalkar, Adrian Weller, Umang Bhatt
Comments: Pre-print, under review
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Language models are transforming the ways that their users engage with the world. Despite impressive capabilities, over-consumption of language model outputs risks propagating unchecked errors in the short-term and damaging human capabilities for critical thinking in the long-term, particularly in knowledge-based tasks. How can we develop scaffolding around language models to curate more appropriate use? We propose selective frictions for language model experiences, inspired by behavioral science interventions, to dampen misuse. Frictions involve small modifications to a user's experience, e.g., the addition of a button impeding model access and reminding a user of their expertise relative to the model. Through a user study with real humans, we observe shifts in user behavior from the imposition of a friction over LLMs in the context of a multi-topic question-answering task as a representative task that people may use LLMs for, e.g., in education and information retrieval. We find that frictions modulate over-reliance by driving down users' click rates while minimally affecting accuracy for those topics. Yet, frictions may have unintended effects. We find marked differences in users' click behaviors even on topics where frictions were not provisioned. Our contributions motivate further study of human-AI behavioral interaction to inform more effective and appropriate LLM use.
- [6] arXiv:2407.12807 [pdf, html, other]
Title: Vision Controlled Sensorized Prosthetic Hand
Authors: Md Abdul Baset Sarker, Juan Pablo S. Sola, Aaron Jones, Evan Laing, Ernesto Sola-Thomas, Masudul H. Imtiaz
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
This paper presents a sensorized vision-enabled prosthetic hand aimed at replicating a natural hand's performance, functionality, appearance, and comfort. The design goal was to create an accessible substitution with a user-friendly interface requiring little to no training. Our mechanical hand uses a camera and embedded processors to perform most of these tasks. The interfaced pressure sensor is used to get pressure feedback and ensure a safe grasp of the object; an accelerometer is used to detect gestures and release the object. Unlike current EMG-based designs, the prototyped hand does not require personalized training. The details of the design, trade-offs, results, and lessons informing the next iteration are presented in this paper.
- [7] arXiv:2407.12809 [pdf, html, other]
Title: Simplify, Consolidate, Intervene: Facilitating Institutional Support with Mental Models of Learning Management System Use
Authors: Taha Hassan, Bob Edmison, Daron Williams, Larry Cox II, Matthew Louvet, Bart Knijnenburg, D. Scott McCrickard
Comments: CSCW 2024 (accepted for publication)
Subjects: Human-Computer Interaction (cs.HC)
Measuring instructors' adoption of learning management system (LMS) tools is a critical first step in evaluating the efficacy of online teaching and learning at scale. Existing models for LMS adoption are often qualitative, learner-centered, and difficult to leverage towards institutional support. We propose depth-of-use (DOU): an intuitive measurement model for faculty's utilization of a university-wide LMS and their needs for institutional support. We hypothesis-test the relationship between DOU and course attributes like modality, participation, logistics, and outcomes. In a large-scale analysis of metadata from 30000+ courses offered at Virginia Tech over two years, we find that a pervasive need for scale, interoperability and ubiquitous access drives LMS adoption by university instructors. We then demonstrate how DOU can help faculty members identify the opportunity-cost of transition from legacy apps to LMS tools. We also describe how DOU can help instructional designers and IT organizational leadership evaluate the impact of their support allocation, faculty development and LMS evangelism initiatives.
- [8] arXiv:2407.13067 [pdf, html, other]
Title: Large Language Model Agents for Improving Engagement with Behavior Change Interventions: Application to Digital Mindfulness
Authors: Harsh Kumar, Suhyeon Yoo, Angela Zavaleta Bernuy, Jiakai Shi, Huayin Luo, Joseph Williams, Anastasia Kuzminykh, Ashton Anderson, Rachel Kornfield
Comments: Under review
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
Although engagement in self-directed wellness exercises typically declines over time, integrating social support such as coaching can sustain it. However, traditional forms of support are often inaccessible due to high costs and complex coordination. Large Language Models (LLMs) show promise in providing human-like dialogues that could emulate social support. Yet, in-depth, in situ investigations of LLMs to support behavior change remain underexplored. We conducted two randomized experiments to assess the impact of LLM agents on user engagement with mindfulness exercises. The first, a single-session study, involved 502 crowdworkers; the second, a three-week study, included 54 participants. We explored two types of LLM agents: one providing information and another facilitating self-reflection. Both agents enhanced users' intentions to practice mindfulness. However, only the information-providing LLM, featuring a friendly persona, significantly improved engagement with the exercises. Our findings suggest that specific LLM agents may bridge the social support gap in digital health interventions.
- [9] arXiv:2407.13107 [pdf, html, other]
Title: DITTO: A Visual Digital Twin for Interventions and Temporal Treatment Outcomes in Head and Neck Cancer
Authors: Andrew Wentzel, Serageldin Attia, Xinhua Zhang, Guadalupe Canahuate, Clifton David Fuller, G. Elisabeta Marai
Subjects: Human-Computer Interaction (cs.HC)
Digital twin models are of high interest to Head and Neck Cancer (HNC) oncologists, who have to navigate a series of complex treatment decisions that weigh the efficacy of tumor control against toxicity and mortality risks. Evaluating individual risk profiles necessitates a deeper understanding of the interplay between different factors such as patient health, spatial tumor location and spread, and risk of subsequent toxicities that cannot be adequately captured through simple heuristics. To support clinicians in better understanding tradeoffs when deciding on treatment courses, we developed DITTO, a digital-twin and visual computing system that allows clinicians to analyze detailed risk profiles for each patient, and decide on a treatment plan. DITTO relies on a sequential Deep Reinforcement Learning digital twin (DT) to deliver personalized risk of both long-term and short-term disease outcome and toxicity risk for HNC patients. Based on a participatory collaborative design alongside oncologists, we also implement several visual explainability methods to promote clinical trust and encourage healthy skepticism when using our system. We evaluate the efficacy of DITTO through quantitative evaluation of performance and case studies with qualitative feedback. Finally, we discuss design lessons for developing clinical visual XAI applications for clinical end users.
- [10] arXiv:2407.13131 [pdf, html, other]
Title: Reimagining Communities through Transnational Bengali Decolonial Discourse with YouTube Content Creators
Comments: accepted at CSCW 2024
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
Colonialism--the policies and practices wherein a foreign body imposes its ways of life on local communities--has historically impacted how collectives perceive themselves in relation to others. One way colonialism has impacted how people see themselves is through nationalism, where nationalism is often understood through shared language, culture, religion, and geopolitical borders. The way colonialism has shaped people's experiences with nationalism has shaped historical conflicts between members of different nation-states for a long time. While recent social computing research has studied how colonially marginalized people can engage in discourse to decolonize or re-imagine and reclaim themselves and their communities on their own terms--what is less understood is how technology can better support decolonial discourses in an effort to re-imagine nationalism. To understand this phenomenon, this research draws on a semi-structured interview study with YouTubers who make videos about culturally Bengali people whose lives were upended as a product of colonization and are now dispersed across Bangladesh, India, and Pakistan. This research seeks to understand people's motivations and strategies for engaging in video-mediated decolonial discourse in transnational contexts. We discuss how our work demonstrates the potential of the sociomateriality of decolonial discourse online and extends an invitation to foreground complexities of nationalism in social computing research.
- [11] arXiv:2407.13166 [pdf, html, other]
Title: Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction
Authors: Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim
Comments: Accepted to LLM4Eval @ SIGIR 2024 - The First Workshop on Large Language Models (LLMs) for Evaluation in Information Retrieval
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored how and why users ask follow-up queries to continue conversations with conversational search engines and how the follow-up queries signal users' satisfaction. From qualitative analysis of 250 conversational turns from an in-lab user evaluation of Naver Cue:, a commercial conversational search engine, we propose a taxonomy of 18 users' follow-up query patterns from conversational search, comprising two major axes: (1) users' motivations behind continuing conversations (N = 7) and (2) actions of follow-up queries (N = 11). Compared to the existing literature on query reformulations, we uncovered a new set of motivations and actions behind follow-up queries, including asking for subjective opinions or providing natural language feedback on the engine's responses. To analyze conversational search logs with our taxonomy in a scalable and efficient manner, we built an LLM-powered classifier (73% accuracy). With our classifier, we analyzed 2,061 conversational tuples collected from real-world usage logs of Cue: and examined how the conversation patterns from our taxonomy correlate with satisfaction. Our initial findings suggest some signals of dissatisfaction, such as Clarifying Queries, Excluding Condition, and Substituting Condition with follow-up queries. We envision our approach could contribute to automated evaluation of conversational search experience by providing satisfaction signals and grounds for realistic user simulations.
- [12] arXiv:2407.13408 [pdf, html, other]
Title: DISCOVER: A Data-driven Interactive System for Comprehensive Observation, Visualization, and ExploRation of Human Behaviour
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Understanding human behavior is a fundamental goal of social sciences, yet its analysis presents significant challenges. Conventional methodologies employed for the study of behavior, characterized by labor-intensive data collection processes and intricate analyses, frequently hinder comprehensive exploration due to their time and resource demands. In response to these challenges, computational models have proven to be promising tools that help researchers analyze large amounts of data by automatically identifying important behavioral indicators, such as social signals. However, the widespread adoption of such state-of-the-art computational models is impeded by their inherent complexity and the substantial computational resources necessary to run them, thereby constraining accessibility for researchers without technical expertise and adequate equipment. To address these barriers, we introduce DISCOVER -- a modular and flexible, yet user-friendly software framework specifically developed to streamline computational-driven data exploration for human behavior analysis. Our primary objective is to democratize access to advanced computational methodologies, thereby enabling researchers across disciplines to engage in detailed behavioral analysis without the need for extensive technical proficiency. In this paper, we demonstrate the capabilities of DISCOVER using four exemplary data exploration workflows that build on each other: Interactive Semantic Content Exploration, Visual Inspection, Aided Annotation, and Multimodal Scene Search. By illustrating these workflows, we aim to emphasize the versatility and accessibility of DISCOVER as a comprehensive framework and propose a set of blueprints that can serve as a general starting point for exploratory data analysis.
- [13] arXiv:2407.13515 [pdf, html, other]
Title: CookAR: Affordance Augmentations in Wearable AR to Support Kitchen Tool Interactions for People with Low Vision
Authors: Jaewook Lee, Andrew D. Tjahjadi, Jiho Kim, Junpu Yu, Minji Park, Jiawen Zhang, Jon E. Froehlich, Yapeng Tian, Yuhang Zhao
Subjects: Human-Computer Interaction (cs.HC)
Cooking is a central activity of daily living, supporting independence and both mental and physical health. However, prior work has highlighted key barriers for people with low vision (LV) to cook, particularly around safely interacting with cooking tools, such as sharp knives or hot pans. Drawing on recent advancements in computer vision (CV) and robotics, we present CookAR, a head-mounted AR system with real-time object affordance augmentations to support safe and efficient interactions with kitchen tools. To design and implement CookAR, we manually collected and annotated the first egocentric dataset of kitchen tool affordances, fine-tuned an affordance segmentation model, and leveraged a stereo camera attached to an AR headset to generate the visual augmentations. To validate CookAR, we conducted a technical performance evaluation and a three-part qualitative lab study with ten LV participants. Our technical evaluation demonstrates that our fine-tuned model outperforms the base model on our class-specific dataset, while our user study indicates a preference for affordance augmentations over the traditional whole object augmentations. Code is available at: this https URL
- [14] arXiv:2407.13598 [pdf, html, other]
Title: KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration
Comments: 9 pages, 9 figures, accepted by IEEE VIS 2024
Subjects: Human-Computer Interaction (cs.HC)
The increasing reliance on Large Language Models (LLMs) for health information seeking can pose severe risks due to the potential for misinformation and the complexity of these topics. This paper introduces KNOWNET, a visualization system that integrates LLMs with Knowledge Graphs (KG) to provide enhanced accuracy and structured exploration. Specifically, for enhanced accuracy, KNOWNET extracts triples (e.g., entities and their relations) from LLM outputs and maps them into the validated information and supported evidence in external KGs. For structured exploration, KNOWNET provides next-step recommendations based on the neighborhood of the currently explored entities in KGs, aiming to guide a comprehensive understanding without overlooking critical aspects. To enable reasoning with both the structured data in KGs and the unstructured outputs from LLMs, KNOWNET conceptualizes the understanding of a subject as the gradual construction of a graph visualization. A progressive graph visualization is introduced to monitor past inquiries, and bridge the current query with the exploration history and next-step recommendations. We demonstrate the effectiveness of our system via use cases and expert interviews.
- [15] arXiv:2407.13701 [pdf, other]
Title: Cannabis Impairment Monitoring Using Objective Eye Tracking Analytics
Subjects: Human-Computer Interaction (cs.HC); Neurons and Cognition (q-bio.NC)
The continuing growth in cannabis legalization necessitates the development of rapid, objective methods for assessing impairment to ensure public and occupational safety. Traditional measurement techniques are subjective, time-consuming, and do not directly measure physical impairment. This study introduces objective metrics derived from eye-tracking analytics to address these limitations.
We employed a head-mounted display to present 20 subjects with smooth pursuit performance, horizontal saccade, and simple reaction time tasks. Individual and group performance was compared before and after cannabis use. Results demonstrated significant changes in oculomotor control post-cannabis consumption, with smooth pursuit performance showing the most substantial signal.
The objective eye-tracking data was used to develop supervised learning models, achieving a classification accuracy of 89% for distinguishing between sober and impaired states when normalized against baseline measures. Eye-tracking is the optimal candidate for a portable, rapid, and objective tool for assessing cannabis impairment, offering significant improvements over current subjective and indirect methods.
New submissions for Friday, 19 July 2024 (showing 15 of 15 entries)
- [16] arXiv:2407.12801 (cross-list from cs.CY) [pdf, other]
Title: Evaluation of LLMs Biases Towards Elite Universities: A Persona-Based Exploration
Comments: 14 pages, 4 Figures
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Elite universities are a dream destination for not just students but also top employers who get a supply of amazing talents. When we hear about top universities, the first thing that comes to mind is their academic rigor, prestigious reputation, and highly successful alumni. However, society at large is not just represented by a few elite universities, but several others. We have seen several examples where many, even without formal education, built big businesses. There are various instances in which several people, however talented, couldn't make it to top elite universities because of several resource constraints. For recruitment of candidates, we do see candidates from a few elite universities well represented in top technology companies. However, we found during our study that LLMs go overboard in representing that. Why is it a problem, though? LLMs are now becoming mainstream and may play a role in evaluating candidates' relevance in the recruitment process across industries. Our study investigates whether LLMs are biased toward elite universities like Stanford University, Harvard University, University of California, Berkeley, and MIT. Our research compares the performance of three popular large language models by adopting a novel persona-based approach and compares the predicted educational backgrounds of professionals in the technology industry with actual data collected from LinkedIn. Specifically, we examined GPT-3.5, Gemini, and Claude 3 Sonnet predictions for job positions such as VP Product, Director of Product, Product Manager, VP Engineering, Director of Engineering, and Software Engineer at Microsoft, Meta, and Google. We noticed biases in LLMs' prediction of educational backgrounds. We are confident that our research will propel the study of LLM biases and our suggested strategies could mitigate biases in LLM-based use cases and applications.
- [17] arXiv:2407.12847 (cross-list from cs.CL) [pdf, html, other]
Title: Aligning Model Evaluations with Human Preferences: Mitigating Token Count Bias in Language Model Assessments
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
The SLAM paper demonstrated that on-device Small Language Models (SLMs) are a viable and cost-effective alternative to API-based Large Language Models (LLMs), such as OpenAI's GPT-4, offering comparable performance and stability. However, SLAM also identified discrepancies between human preferences and traditional auto-evaluators. This follow-up paper explores methods to align LLM evaluator preferences with human evaluations by addressing biases, particularly toward higher token counts. We employed Bayesian statistics and a t-test to quantify this bias and developed a recalibration procedure to adjust the GPTScorer. Our recalibration significantly improves the alignment of the LLM evaluator with human evaluations across multiple use cases. For instance, Spearman's rank correlation score in the Recommendation use case improved from -27.27 to 44.55. These results highlight the importance of accounting for biases in automated evaluations to ensure fair and accurate model assessments. The recalibration process enhances the reliability of automated evaluators, leading to better AI models that align with human values and expectations. This study provides a robust methodology for future research into bias correction and emphasizes the feasibility and benefits of developing human-aligned AI evaluation systems.
- [18] arXiv:2407.12861 (cross-list from cs.CL) [pdf, html, other]
Title: CiteME: Can Language Models Accurately Cite Scientific Claims?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research question: Given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. Using CiteME reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We narrow this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers, which achieves an accuracy of 35.3% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.
- [19] arXiv:2407.12884 (cross-list from cs.LG) [pdf, html, other]
Title: SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification
Comments: To be published in Proc. IEEE VIS 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Human-Computer Interaction (cs.HC)
Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.
- [20] arXiv:2407.12896 (cross-list from cs.CY) [pdf, html, other]
Title: A Survey of Scam Exposure, Victimization, Types, Vectors, and Reporting in 12 Countries
Comments: To appear in the Journal of Online Trust and Safety
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
Scams are a widespread issue with severe consequences for both victims and perpetrators, but existing data collection is fragmented, precluding global and comparative local understanding. The present study addresses this gap through a nationally representative survey (n = 8,369) on scam exposure, victimization, types, vectors, and reporting in 12 countries: Belgium, Egypt, France, Hungary, Indonesia, Mexico, Romania, Slovakia, South Africa, South Korea, Sweden, and the United Kingdom. We analyze 6 survey questions to build a detailed quantitative picture of the scams landscape in each country, and compare across countries to identify global patterns. We find, first, that residents of less affluent countries suffer financial loss from scams more often. Second, we find that the internet plays a key role in scams across the globe, and that GNI per-capita is strongly associated with specific scam types and contact vectors. Third, we find widespread under-reporting, with residents of less affluent countries being less likely to know how to report a scam. Our findings contribute valuable insights for researchers, practitioners, and policymakers in the online fraud and scam prevention space.
- [21] arXiv:2407.13195 (cross-list from cs.LG) [pdf, other]
Title: Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation
Comments: 41 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Information Theory (cs.IT); Machine Learning (stat.ML)
Foundation models often struggle with uncertainty when faced with new situations in online decision-making, necessitating scalable and efficient exploration to resolve this uncertainty. We introduce GPT-HyperAgent, an augmentation of GPT with HyperAgent for uncertainty-aware, scalable exploration in contextual bandits, a fundamental online decision problem involving natural language input. We prove that HyperAgent achieves fast incremental uncertainty estimation with $\tilde{O}(\log T)$ per-step computational complexity over $T$ periods under the linear realizable assumption. Our analysis demonstrates that HyperAgent's regret order matches that of exact Thompson sampling in linear contextual bandits, closing a significant theoretical gap in scalable exploration. Empirical results in real-world contextual bandit tasks, such as automated content moderation with human feedback, validate the practical effectiveness of GPT-HyperAgent for safety-critical decisions. Our code is open-sourced at this https URL.
- [22] arXiv:2407.13240 (cross-list from cs.CR) [pdf, html, other]
-
Title: Intelligo ut Confido: Understanding, Trust and User Experience in Verifiable Receipt-Free E-Voting (long version)
Subjects: Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
Voting protocols seek to provide integrity and vote privacy in elections. To achieve integrity, procedures have been proposed that allow voters to verify their vote; however, this impacts both the user experience and privacy. In particular, vote verification can lead to vote-buying or coercion if an attacker can obtain documentation, i.e. a receipt, of the cast vote. Thus, some voting protocols go further and provide mechanisms to prevent such receipts. To be effective, this so-called receipt-freeness depends on voters being able to understand and use these mechanisms. In this paper, we present a study with 300 participants which aims to evaluate the voters' experience of the receipt-freeness procedures in the e-voting protocol Selene in the context of vote-buying. This constitutes the first user study dealing with vote-buying in e-voting. While the usability and trust factors were rated low in the experiments, we found a positive correlation between trust and understanding.
- [23] arXiv:2407.13266 (cross-list from cs.SD) [pdf, html, other]
-
Title: How Private is Low-Frequency Speech Audio in the Wild? An Analysis of Verbal Intelligibility by Humans and Machines
Comments: This manuscript has been accepted by Interspeech 2024
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
Low-frequency audio has been proposed as a promising privacy-preserving modality to study social dynamics in real-world settings. To this end, researchers have developed wearable devices that can record audio at frequencies as low as 1250 Hz to mitigate the automatic extraction of the verbal content of speech that may contain private details. This paper investigates the validity of this hypothesis, examining the degree to which low-frequency speech ensures verbal privacy. It includes simulating a potential privacy attack in various noise environments. Further, it explores the trade-off between the performance of voice activity detection, which is fundamental for understanding social behavior, and privacy-preservation. The evaluation incorporates subjective human intelligibility and automatic speech recognition performance, comprehensively analyzing the delicate balance between effective social behavior analysis and preserving verbal privacy.
- [24] arXiv:2407.13415 (cross-list from cs.CR) [pdf, html, other]
-
Title: Empirical Analysis of Sri Lankan Mobile Health Ecosystem: A Precursor to an Effective Stakeholder Engagement
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Software Engineering (cs.SE)
Sri Lanka recently passed its first privacy legislation covering a wide range of sectors, including health. As a precursor for effective stakeholder engagement in the health domain to understand the most effective way to implement legislation in healthcare, we have analyzed 41 popular mobile apps and web portals. We found that 78% of the tested systems have third-party domains receiving sensitive health data with minimal visibility to the consumers. We discuss how this will create potential issues in preparing for the new privacy legislation.
Cross submissions for Friday, 19 July 2024 (showing 9 of 9 entries)
- [25] arXiv:2209.09943 (replaced) [pdf, html, other]
-
Title: Adversarial Bi-Regressor Network for Domain Adaptive Regression
Comments: 7 pages, 5 figures; IJCAI 2022; tested in the SPAWC2021 dataset for indoor localization
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV)
Domain adaptation (DA) aims to transfer the knowledge of a well-labeled source domain to facilitate unlabeled target learning. When turning to specific tasks such as indoor (Wi-Fi) localization, it is essential to learn a cross-domain regressor to mitigate the domain shift. This paper proposes a novel method, the Adversarial Bi-Regressor Network (ABRNet), to seek a more effective cross-domain regression model. Specifically, a discrepant bi-regressor architecture is developed to maximize the difference between the two regressors and thereby discover uncertain target instances far from the source distribution, and then an adversarial training mechanism is adopted between the feature extractor and the dual regressors to produce domain-invariant representations. To further bridge the large domain gap, a domain-specific augmentation module is designed to synthesize two intermediate domains, one source-similar and one target-similar, to gradually eliminate the original domain mismatch. Empirical studies on two cross-domain regression benchmarks illustrate the power of our method in solving the domain adaptive regression (DAR) problem.
- [26] arXiv:2305.00510 (replaced) [pdf, html, other]
-
Title: Towards AI-Architecture Liberty: A Comprehensive Survey on Design and Generation of Virtual Architecture by Deep Learning
Comments: 36 pages, 9 figures, and 5 tables
Subjects: Human-Computer Interaction (cs.HC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
3D shape generation techniques leveraging deep learning have garnered significant interest from both the computer vision and architectural design communities, promising to enrich the content in the virtual environment. However, research on virtual architectural design remains limited, particularly regarding designer-AI collaboration and deep learning-assisted design. In our survey, we reviewed 149 related articles (81.2% of articles published between 2019 and 2023) covering architectural design, 3D shape techniques, and virtual environments. Through scrutinizing the literature, we first identify the principles of virtual architecture and illuminate its current production challenges, including datasets, multimodality, design intuition, and generative frameworks. We then introduce the latest approaches to designing and generating virtual buildings leveraging 3D shape generation and summarize four characteristics of various approaches to virtual architecture. Based on our analysis, we expound on four research agendas, including agency, communication, user consideration, and integrating tools. Additionally, we highlight four important enablers of ubiquitous interaction with immersive systems in deep learning-assisted architectural generation. Our work contributes to fostering understanding between designers and deep learning techniques, broadening access to designer-AI collaboration. We advocate for interdisciplinary efforts to address this timely research topic, facilitating content designing and generation in the virtual environment.
- [27] arXiv:2306.13509 (replaced) [pdf, html, other]
-
Title: Exploring AI-enhanced Shared Control for an Assistive Robotic Arm
Comments: Springer LNCS 14517: Engineering Interactive Computer Systems - EICS 2023 International Workshops and Doctoral Consortium (Workshop on Engineering Interactive Systems Embedding AI Technologies)
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Assistive technologies, and in particular assistive robotic arms, have the potential to enable people with motor impairments to live a self-determined life. More and more of these systems have become available for end users in recent years, such as the Kinova Jaco robotic arm. However, they mostly require complex manual control, which can overwhelm users. As a result, researchers have explored ways to let such robots act autonomously. However, at least for this specific group of users, such an approach has been shown to be futile. Here, users want to stay in control to achieve a higher level of personal autonomy, to which an autonomous robot runs counter. In our research, we explore how Artificial Intelligence (AI) can be integrated into a shared control paradigm. In particular, we focus on the consequential requirements for the interface between human and robot and how we can keep humans in the loop while still significantly reducing the mental load and required motor skills.
- [28] arXiv:2405.18887 (replaced) [pdf, html, other]
-
Title: 4Doodle: Two-handed Gestures for Immersive Sketching of Architectural Models
Comments: 9 pages; 15 Figures
Subjects: Human-Computer Interaction (cs.HC)
Three-dimensional immersive sketching for content creation and modeling has been studied for some time. However, research in this domain has mainly focused on CAVE-like scenarios. These setups can be expensive and offer a narrow interaction space. Building more affordable setups using head-mounted displays is possible, allowing greater immersion and a larger space for users' physical movements. This paper presents a fully immersive environment using bi-manual gestures to sketch and create content freely in the virtual world. This approach can be applied to many scenarios, allowing people to express their ideas or review existing designs. To cope with the known motor difficulties and inaccuracy of freehand 3D sketching, we explore proxy geometry and a laser-like metaphor to draw content directly from models and create content surfaces. Our current prototype offers 24 cubic meters for movement, limited by the room size. It features infinite virtual drawing space through pan and scale techniques and is larger than the typical 6-sided CAVE at a fraction of the cost. In a preliminary study conducted with architects and engineers, our system showed clear promise as a tool for sketching and 3D content creation in virtual reality with a strong emphasis on bi-manual gestures.
- [29] arXiv:2407.12192 (replaced) [pdf, html, other]
-
Title: Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts
Subjects: Human-Computer Interaction (cs.HC)
Recent advancements in Large Language Models (LLMs) and Prompt Engineering have made chatbot customization more accessible, significantly reducing barriers to tasks that previously required programming skills. However, prompt evaluation, especially at the dataset scale, remains complex due to the need to assess prompts across thousands of test instances within a dataset. Our study, based on a comprehensive literature review and pilot study, summarizes five critical challenges in prompt evaluation. In response, we introduce a feature-oriented workflow for systematic prompt evaluation. In the context of text summarization, our workflow advocates evaluation with summary characteristics (feature metrics) such as complexity, formality, or naturalness, instead of using traditional quality metrics like ROUGE. This design choice enables a more user-friendly evaluation of prompts, as it guides users in sorting through the ambiguity inherent in natural language. To support this workflow, we introduce Awesum, a visual analytics system that facilitates identifying optimal prompt refinements for text summarization through interactive visualizations, featuring a novel Prompt Comparator design that employs a BubbleSet-inspired design enhanced by dimensionality reduction techniques. We evaluate the effectiveness and general applicability of the system with practitioners from various domains and find that (1) our design helps non-technical people overcome the learning curve of conducting a systematic evaluation of summarization prompts, and (2) our feature-oriented workflow has the potential to generalize to other NLG and image-generation tasks. For future work, we advocate moving towards feature-oriented evaluation of LLM prompts and discuss unsolved challenges in terms of human-agent interaction.
- [30] arXiv:2311.08644 (replaced) [pdf, html, other]
-
Title: Interpretable by Design: Wrapper Boxes Combine Neural Performance with Faithful Attribution of Model Decisions to Training Data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Can we preserve the accuracy of neural models while also providing faithful explanations? We present wrapper boxes, a general approach to generate faithful, example-based explanations for model predictions while maintaining predictive performance. After training a neural model as usual, its learned feature representation is input to a classic, interpretable model to perform the actual prediction. This simple strategy is surprisingly effective, with results largely comparable to those of the original neural model, as shown across three large pre-trained language models, two datasets of varying scale, four classic models, and four evaluation metrics. Moreover, because these classic models are interpretable by design, the subset of training examples that determine classic model predictions can be shown directly to users.
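The wrapper-box idea described in this abstract can be illustrated with a minimal sketch, assuming k-nearest neighbours as the classic interpretable model. The function name `knn_predict`, the toy 2-D feature vectors, and the labels are hypothetical stand-ins and are not taken from the paper; in practice the features would come from a trained neural encoder.

```python
import math
from collections import Counter


def knn_predict(train_feats, train_labels, query_feat, k=3):
    """Predict a label by majority vote over the k nearest training features.

    Returns the predicted label together with the indices of the training
    examples that produced it, so they can be shown to a user as evidence.
    """
    # Rank training examples by Euclidean distance to the query in feature space.
    order = sorted(
        range(len(train_feats)),
        key=lambda i: math.dist(train_feats[i], query_feat),
    )
    neighbors = order[:k]
    # Majority vote among the k nearest neighbours decides the prediction.
    votes = Counter(train_labels[i] for i in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors


# Hypothetical features standing in for a neural model's learned representation.
train_feats = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 0.9), (0.9, 1.1)]
train_labels = ["neg", "neg", "neg", "pos", "pos"]

label, evidence = knn_predict(train_feats, train_labels, (0.95, 1.0), k=3)
print(label, [train_labels[i] for i in evidence])
```

Because the prediction is a vote over specific training examples, the returned indices are exactly the examples that determined it, which is the kind of example-based attribution the abstract refers to.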
- [31] arXiv:2404.17113 (replaced) [pdf, html, other]
-
Title: MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao
Subjects: Machine Learning (cs.LG); Human-Computer Interaction (cs.HC)
Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems struggle to meet the demands of practical applications. Therefore, we organize the MER series of competitions to promote the development of this field. Last year, we launched MER2023, focusing on three interesting topics: multi-label learning, noise robustness, and semi-supervised learning. In this year's MER2024, besides expanding the dataset size, we further introduce a new track around open-vocabulary emotion recognition. The motivation for this track is that existing datasets usually fix the label space and use majority voting to enhance annotator consistency. However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels. In this track, we encourage participants to generate any number of labels in any category, aiming to describe emotional states as accurately as possible. Our baseline code relies on MERTools and is available at: this https URL.
- [32] arXiv:2405.06078 (replaced) [pdf, html, other]
-
Title: Collaborative Design for Job-Seekers with Autism: A Conceptual Framework for Future Research
Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC)
The success of employment is highly related to a job seeker's capability of communicating and collaborating with others. While leveraging one's network during the job-seeking process is intuitive to the neurotypical, this can be challenging for people with autism. Recent empirical findings have started to show how facilitating collaboration between people with autism and their social surroundings through new design can improve their chances of employment. This work aims to provide actionable guidelines and conceptual frameworks that future researchers and practitioners can apply to improve collaborative design for job-seekers with autism. Built upon the literature on past technological interventions built for supporting job-seekers with autism, we define three major research challenges of (1) communication support, (2) employment stage-wise support, and (3) group work support. For each challenge, we review the current state-of-the-art practices and possible future solutions. We then suggest future designs that can provide breakthroughs from the interdisciplinary lens of human-AI collaboration, health services, group work, accessibility computing, and natural language processing.
- [33] arXiv:2406.10273 (replaced) [pdf, html, other]
-
Title: Beyond Words: On Large Language Models Actionability in Mission-Critical Risk Analysis
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Human-Computer Interaction (cs.HC)
Context. Risk analysis assesses potential risks in specific scenarios. Risk analysis principles are context-less; the same methodology can be applied to a risk connected to health and information technology security. Risk analysis requires a vast knowledge of national and international regulations and standards and is time- and effort-intensive. A large language model can summarize information in less time than a human and can be fine-tuned to specific tasks.
Aim. Our empirical study aims to investigate the effectiveness of Retrieval-Augmented Generation and fine-tuned LLMs in risk analysis. To our knowledge, no prior study has explored their capabilities in risk analysis.
Method. We manually curated 193 unique scenarios leading to 1283 representative samples from over 50 mission-critical analyses archived by the industrial context team in the last five years. We compared the base GPT-3.5 and GPT-4 models versus their Retrieval-Augmented Generation and fine-tuned counterparts. We employ two human experts as competitors of the models and three other human experts to review the models and the former human experts' analysis. The reviewers analyzed 5,000 scenario analyses.
Results and Conclusions. Human experts demonstrated higher accuracy, but LLMs are quicker and more actionable. Moreover, our findings show that RAG-assisted LLMs have the lowest hallucination rates, effectively uncovering hidden risks and complementing human expertise. Thus, the choice of model depends on specific needs, with FTMs for accuracy, RAG for hidden risks discovery, and base models for comprehensiveness and actionability. Therefore, experts can leverage LLMs as an effective complementing companion in risk analysis within a condensed timeframe. They can also save costs by averting unnecessary expenses associated with implementing unwarranted countermeasures.
- [34] arXiv:2407.00463 (replaced) [pdf, html, other]
-
Title: Open-Source Conversational AI with SpeechBrain 1.0
Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Xuechen Liu, Sangeet Sagar, Jarod Duret, Salima Mdhaffar, Gaelle Laperriere, Mickael Rouvier, Renato De Mori, Yannick Esteve
Comments: Submitted to JMLR (Machine Learning Open Source Software)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.