A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech

Authors: Gaurav Verma, Rynaa Grover, Jiawei Zhou, Binny Mathew, Jordan Kraemer, Munmun De Choudhury, Srijan Kumar

Abstract: Violence-provoking speech -- speech that implicitly or explicitly promotes violence against the members of the targeted community -- contributed to a massive surge in anti-Asian crimes during the pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-Asian violence-provoking speech. Using data from ~420k Twitter posts spanning a 3-year duration (January 1, 2020 to February 1, 2023), we develop a codebook to characterize anti-Asian violence-provoking speech and collect a community-crowdsourced dataset to facilitate its large-scale detection using state-of-the-art classifiers. We contrast the capabilities of natural language processing classifiers, ranging from BERT-based to LLM-based classifiers, in detecting violence-provoking speech with their capabilities to detect anti-Asian hateful speech. In contrast to prior work that has demonstrated the effectiveness of such classifiers in detecting hateful speech ($F_1 = 0.89$), our work shows that accurate and reliable detection of violence-provoking speech is a challenging task ($F_1 = 0.69$). We discuss the implications of our findings, particularly the need for proactive interventions to support Asian communities during public health crises. The resources related to the study are available at https://claws-lab.github.io/violence-provoking-speech/.

Submitted 21 July, 2024; originally announced July 2024.

Comments: Accepted to ACL 2024 Main

arXiv:2407.02662 [pdf, other]

Supporters and Skeptics: LLM-based Analysis of Engagement with Mental Health (Mis)Information Content on Video-sharing Platforms

Authors: Viet Cuong Nguyen, Mini Jain, Abhijat Chauhan, Heather Jaime Soled, Santiago Alvarez Lesmes, Zihang Li, Michael L. Birnbaum, Sunny X. Tang, Srijan Kumar, Munmun De Choudhury

Abstract: Over one in five adults in the US lives with a mental illness. In the face of a shortage of mental health professionals and offline resources, online short-form video content has grown to serve as a crucial conduit for disseminating mental health help and resources. However, the ease of content creation and access also contributes to the spread of misinformation, posing risks to accurate diagnosis and treatment. Detecting and understanding engagement with such content is crucial to mitigating its harmful effects on public health. We perform the first quantitative study of the phenomenon using YouTube Shorts and Bitchute as the sites of study. We contribute MentalMisinfo, a novel labeled mental health misinformation (MHMisinfo) dataset of 739 videos (639 from YouTube and 100 from Bitchute) and 135,372 comments in total, using an expert-driven annotation schema. We first find that few-shot in-context learning with large language models (LLMs) is effective in detecting MHMisinfo videos. Next, we discover distinct and potentially alarming linguistic patterns in how audiences engage with MHMisinfo videos through commentary on both video-sharing platforms. Across the two platforms, comments could exacerbate prevailing stigma, with some groups showing heightened susceptibility to and alignment with MHMisinfo. We discuss technical and public health-driven adaptive solutions to tackling the "epidemic" of mental health misinformation online.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 12 pages, in submission to ICWSM

arXiv:2404.14548 [pdf, ps, other]

Advancing a Consent-Forward Paradigm for Digital Mental Health Data

Authors: Sachin R. Pendse, Logan Stapleton, Neha Kumar, Munmun De Choudhury, Stevie Chancellor

Abstract: The field of digital mental health is advancing at a rapid pace. Passively collected data from user engagements with digital tools and services continue to contribute new insights into mental health and illness. As the field of digital mental health grows, a concerning norm has been established -- digital service users are given little say over how their data is collected, shared, or used to generate revenue for private companies. Given a long history of service user exclusion from data collection practices, we propose an alternative approach that is attentive to this history: the consent-forward paradigm. This paradigm embeds principles of affirmative consent in the design of digital mental health tools and services, strengthening trust through designing around individual choices and needs, and proactively protecting users from unexpected harm. In this perspective, we outline practical steps to implement this paradigm, toward ensuring that people searching for care have the safest experiences possible.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 15 pages with 2 tables

arXiv:2401.14362 [pdf, ps, other]

The Typing Cure: Experiences with Large Language Model Chatbots for Mental Health Support

Authors: Inhwa Song, Sachin R. Pendse, Neha Kumar, Munmun De Choudhury

Abstract: People experiencing severe distress increasingly use Large Language Model (LLM) chatbots as mental health support tools. Discussions on social media have described how engagements were lifesaving for some, but evidence suggests that general-purpose LLM chatbots also have notable risks that could endanger the welfare of users if not designed responsibly. In this study, we investigate the lived experiences of people who have used LLM chatbots for mental health support. We build on interviews with 21 individuals from globally diverse backgrounds to analyze how users create unique support roles for their chatbots, fill in gaps in everyday care, and navigate associated cultural limitations when seeking support from chatbots. We ground our analysis in psychotherapy literature around effective support, and introduce the concept of therapeutic alignment, or aligning AI with therapeutic values for mental health contexts. Our study offers recommendations for how designers can approach the ethical and effective use of LLM chatbots and other AI mental health support tools in mental health care.

Submitted 6 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: The first two authors contributed equally to this work; typos corrected

arXiv:2311.14693 [pdf, other]

Benefits and Harms of Large Language Models in Digital Mental Health

Authors: Munmun De Choudhury, Sachin R. Pendse, Neha Kumar

Abstract: The past decade has been transformative for mental health research and practice. The ability to harness large repositories of data, whether from electronic health records (EHR), mobile devices, or social media, has revealed a potential for valuable insights into patient experiences, promising early, proactive interventions, as well as personalized treatment plans. Recent developments in generative artificial intelligence, particularly large language models (LLMs), show promise in leading digital mental health to uncharted territory. Patients are arriving at doctors' appointments with information sourced from chatbots, state-of-the-art LLMs are being incorporated in medical software and EHR systems, and chatbots from an ever-increasing number of startups promise to serve as AI companions, friends, and partners. This article presents contemporary perspectives on the opportunities and risks posed by LLMs in the design, development, and implementation of digital mental health tools. We adopt an ecological framework and draw on the affordances offered by LLMs to discuss four application areas -- care-seeking behaviors from individuals in need of care, community care provision, institutional and medical care provision, and larger care ecologies at the societal level. We engage in a thoughtful consideration of whether and how LLM-based technologies could or should be employed for enhancing mental health. The benefits and harms our article surfaces could serve to help shape future research, advocacy, and regulatory efforts focused on creating more responsible, user-friendly, equitable, and secure LLM-based tools for mental health treatment and intervention.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.13132 [pdf, other]

Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries

Authors: Yiqiao Jin, Mohit Chandra, Gaurav Verma, Yibo Hu, Munmun De Choudhury, Srijan Kumar

Abstract: Large language models (LLMs) are transforming the ways the general public accesses and consumes information. Their influence is particularly pronounced in pivotal sectors like healthcare, where lay individuals are increasingly appropriating LLMs as conversational agents for everyday queries. While LLMs demonstrate impressive language understanding and generation proficiencies, concerns regarding their safety remain paramount in these high-stakes domains. Moreover, the development of LLMs is disproportionately focused on English. It remains unclear how these LLMs perform in the context of non-English languages, a gap that is critical for ensuring equity in the real-world use of these systems. This paper provides a framework to investigate the effectiveness of LLMs as multi-lingual dialogue systems for healthcare queries. Our empirically-derived framework XlingEval focuses on three fundamental criteria for evaluating LLM responses to naturalistic human-authored health-related questions: correctness, consistency, and verifiability. Through extensive experiments on four major global languages, including English, Spanish, Chinese, and Hindi, spanning three expert-annotated large health Q&A datasets, and through an amalgamation of algorithmic and human-evaluation strategies, we found a pronounced disparity in LLM responses across these languages, indicating a need for enhanced cross-lingual capabilities. We further propose XlingHealth, a cross-lingual benchmark for examining the multilingual capabilities of LLMs in the healthcare context. Our findings underscore the pressing need to bolster the cross-lingual capacities of these models, and to provide an equitable information ecosystem accessible to all.

Submitted 23 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 18 pages, 7 figures

arXiv:2310.10928 [pdf, ps, other]

Using Audio Data to Facilitate Depression Risk Assessment in Primary Health Care

Authors: Adam Valen Levinson, Abhay Goyal, Roger Ho Chun Man, Roy Ka-Wei Lee, Koustuv Saha, Nimay Parekh, Frederick L. Altice, Lam Yin Cheung, Munmun De Choudhury, Navin Kumar

Abstract: Telehealth is a valuable tool for primary health care (PHC), where depression is a common condition. PHC is the first point of contact for most people with depression, but about 25% of diagnoses made by PHC physicians are inaccurate. Many other barriers also hinder depression detection and treatment in PHC. Artificial intelligence (AI) may help reduce depression misdiagnosis in PHC and improve overall diagnosis and treatment outcomes. Telehealth consultations often have video issues, such as poor connectivity or dropped calls. Audio-only telehealth is often more practical for lower-income patients who may lack stable internet connections. Thus, our study focused on using audio data to predict depression risk. The objectives were to: 1) Collect audio data from 24 people (12 with depression and 12 without mental health or major health condition diagnoses); 2) Build a machine learning model to predict depression risk. TPOT, an autoML tool, was used to select the best machine learning algorithm, which was the K-nearest neighbors classifier. The selected model had high performance in classifying depression risk (Precision: 0.98, Recall: 0.93, F1-Score: 0.96). These findings may lead to a range of tools to help screen for and treat depression. By developing tools to detect depression risk, patients can be routed to AI-driven chatbots for initial screenings. Partnerships with a range of stakeholders are crucial to implementing these solutions. Moreover, ethical considerations, especially around data privacy and potential biases in AI models, need to be at the forefront of any AI-driven intervention in mental health care.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.08483 [pdf, other]

Understanding the Humans Behind Online Misinformation: An Observational Study Through the Lens of the COVID-19 Pandemic

Authors: Mohit Chandra, Anush Mattapalli, Munmun De Choudhury

Abstract: The proliferation of online misinformation has emerged as one of the biggest threats to society. Considerable efforts have focused on building misinformation detection models, yet the perils of misinformation still abound. Mitigating online misinformation and its ramifications requires a holistic approach that encompasses not only an understanding of its intricate landscape in relation to the complex issue and topic-rich information ecosystem online, but also the psychological drivers of individuals behind it. Adopting a time series analytic technique and robust causal inference-based design, we conduct a large-scale observational study analyzing over 32 million COVID-19 tweets and 16 million historical timeline tweets. We focus on understanding the behavior and psychology of users disseminating misinformation during COVID-19 and its relationship with the historical inclinations towards sharing misinformation on non-COVID domains before the pandemic. Our analysis underscores the intricacies inherent to cross-domain misinformation, and highlights that users' historical inclination toward sharing misinformation is positively associated with their present behavior pertaining to misinformation sharing on emergent topics and beyond. This work may serve as a valuable foundation for designing user-centric inoculation strategies and ecologically-grounded agile interventions for effectively tackling online misinformation.

Submitted 18 January, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2307.12402 [pdf, ps, other]

ChatGPT and Bard Responses to Polarizing Questions

Authors: Abhay Goyal, Muhammad Siddique, Nimay Parekh, Zach Schwitzky, Clara Broekaert, Connor Michelotti, Allie Wong, Lam Yin Cheung, Robin O Hanlon, Munmun De Choudhury, Roy Ka-Wei Lee, Navin Kumar

Abstract: Recent developments in natural language processing have demonstrated the potential of large language models (LLMs) to improve a range of educational and learning outcomes. Of recent chatbots based on LLMs, ChatGPT and Bard have made it clear that artificial intelligence (AI) technology will have significant implications for the way we obtain and search for information. However, these tools sometimes produce text that is convincing, but often incorrect, known as hallucinations. As such, their use can distort scientific facts and spread misinformation. To counter polarizing responses on these tools, it is critical to provide an overview of such responses so stakeholders can determine which topics tend to produce more contentious responses -- key to developing targeted regulatory policy and interventions. In addition, there currently exists no annotated dataset of ChatGPT and Bard responses around possibly polarizing topics, central to the above aims. We address the indicated issues through the following contribution: Focusing on highly polarizing topics in the US, we created and described a dataset of ChatGPT and Bard responses. Broadly, our results indicated a left-leaning bias for both ChatGPT and Bard, with Bard more likely to provide responses around polarizing topics. Bard seemed to have fewer guardrails around controversial topics, and appeared more willing to provide comprehensive and somewhat human-like responses. Bard may thus be more likely to be abused by malicious actors. Stakeholders may utilize our findings to mitigate misinformative and/or polarizing responses from LLMs.

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2304.07417 [pdf, other]

Understanding and Mitigating Mental Health Misinformation on Video Sharing Platforms

Authors: Viet Cuong Nguyen, Michael Birnbaum, Munmun De Choudhury

Abstract: Despite the ever-strong demand for mental health care globally, access to traditional mental health services remains severely limited, expensive, and stifled by stigma and systemic barriers. Thus, over the last few years, young people have increasingly turned to content on video-sharing platforms (VSPs) like TikTok and YouTube to help them navigate their mental health journey. However, navigating towards trustworthy information relating to mental health on these platforms is challenging, given the uncontrollable and unregulated growth of dedicated mental health content and content creators catering to a wide array of mental health conditions on these platforms. In this paper, we attempt to define what constitutes "mental health misinformation" through examples. In addition, we suggest some open questions to answer and challenges to tackle regarding this important and timely research topic.

Submitted 14 April, 2023; originally announced April 2023.

Comments: 5 pages, 1 figure

arXiv:2302.00799 [pdf, other]

doi 10.1145/3579467

Charting the Sociotechnical Gap in Explainable AI: A Framework to Address the Gap in XAI

Authors: Upol Ehsan, Koustuv Saha, Munmun De Choudhury, Mark O. Riedl

Abstract: Explainable AI (XAI) systems are sociotechnical in nature; thus, they are subject to the sociotechnical gap -- the divide between the technical affordances and the social needs. However, charting this gap is challenging. In the context of XAI, we argue that charting the gap improves our problem understanding, which can reflexively provide actionable insights to improve explainability. Utilizing two case studies in distinct domains, we empirically derive a framework that facilitates systematic charting of the sociotechnical gap by connecting AI guidelines in the context of XAI and elucidating how to use them to address the gap. We apply the framework to a third case in a new domain, showcasing its affordances. Finally, we discuss conceptual implications of the framework, share practical considerations in its operationalization, and offer guidance on transferring it to new contexts. By making conceptual and practical contributions to understanding the sociotechnical gap in XAI, the framework expands the XAI design space.

Submitted 1 February, 2023; originally announced February 2023.

Comments: Published at ACM CSCW 2023

Journal ref: ACM CSCW 2023

arXiv:2206.10594 [pdf]

How is Vaping Framed on Online Knowledge Dissemination Platforms?

Authors: Keyu Chen, Yiwen Shi, Jun Luo, Joyce Jiang, Shweta Yadav, Munmun De Choudhury, Ashiqur R. KhudaBukhsh, Marzieh Babaeianjelodar, Frederick Altice, Navin Kumar

Abstract: We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venues for those looking to transition from smoking to vaping. Other platforms (Reddit, wikiHow) are more for vaping hobbyists and may not sufficiently dissuade youth vaping. Conversely, Wikipedia may exaggerate vaping harms, dissuading smokers from transitioning. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to design informational tools to reinforce or mitigate vaping (mis)perceptions online.

Submitted 22 July, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: arXiv admin note: text overlap with arXiv:2206.07765, arXiv:2206.09024

arXiv:2206.09024 [pdf]

Partisan US News Media Representations of Syrian Refugees

Authors: Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Kamila Janmohamed, Rupak Sarkar, Ingmar Weber, Thomas Davidson, Munmun De Choudhury, Jonathan Huang, Shweta Yadav, Ashique Khudabukhsh, Preslav Ivanov Nakov, Chris Bauch, Orestis Papakyriakopoulos, Kaveh Khoshnood, Navin Kumar

Abstract: We investigate how representations of Syrian refugees (2011-2021) differ across US partisan news outlets. We analyze 47,388 articles from the online US media about Syrian refugees to detail differences in reporting between left- and right-leaning media. We use various NLP techniques to understand these differences. Our polarization and question answering results indicated that left-leaning media tended to represent refugees as child victims, welcome in the US, and right-leaning media cast refugees as Islamic terrorists. We noted similar results with our sentiment and offensive speech scores over time, which detail possibly unfavorable representations of refugees in right-leaning media. A strength of our work is how the different techniques we have applied validate each other. Based on our results, we provide several recommendations. Stakeholders may utilize our findings to intervene around refugee representations, and design communications campaigns that improve the way society sees refugees and possibly aid refugee outcomes.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2206.07765 [pdf]

US News and Social Media Framing around Vaping

Authors: Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Rohan Aanegola, Lam Yin Cheung, Preslav Ivanov Nakov, Shweta Yadav, Angus Bancroft, Ashiqur R. KhudaBukhsh, Munmun De Choudhury, Frederick L. Altice, Navin Kumar

Abstract: In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news media framing of vaping shifted over time in line with emergent regulatory trends, such as flavored vaping bans, with little discussion around vaping as a smoking cessation tool. We found that social media discussions were far more varied, with transitions toward vaping both as a public health harm and as a smoking cessation tool. Our cloze test, dynamic topic model, and question answering showed similar patterns, where social media, but not news media, characterizes vaping as a combustible cigarette substitute. We use n-grams to detail that social media data first centered on vaping as a smoking cessation tool, and in 2019 moved toward narratives around vaping regulation, similar to news media frames. Overall, social media tracks the evolution of vaping as a social practice, while news media reflects more risk-based concerns. A strength of our work is how the different techniques we have applied validate each other. Stakeholders may utilize our findings to intervene around the framing of vaping, and may design communications campaigns that improve the way society sees vaping, thus possibly aiding smoking cessation and reducing youth vaping.

Submitted 22 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

arXiv:2205.09744 [pdf, other]

Overcoming Language Disparity in Online Content Classification with Multimodal Learning

Authors: Gaurav Verma, Rohit Mujumdar, Zijie J. Wang, Munmun De Choudhury, Srijan Kumar

Abstract: Advances in Natural Language Processing (NLP) have revolutionized the way researchers and practitioners address crucial societal problems. Large language models are now the standard to develop state-of-the-art solutions for text detection and classification tasks. However, the development of advanced computational techniques and resources is disproportionately focused on the English language, sidelining a majority of the languages spoken globally. While existing research has developed better multilingual and monolingual language models to bridge this language disparity between English and non-English languages, we explore the promise of incorporating the information contained in images via multimodal machine learning. Our comparative analyses on three detection tasks focusing on crisis information, fake news, and emotion recognition, as well as five high-resource non-English languages, demonstrate that: (a) detection frameworks based on pre-trained large language models like BERT and multilingual-BERT systematically perform better on the English language compared against non-English languages, and (b) including images via multimodal learning bridges this performance gap. We situate our findings with respect to existing work on the pitfalls of large language models, and discuss their theoretical and practical implications. Resources for this paper are available at https://multimodality-language-disparity.github.io/.

Submitted 19 May, 2022; originally announced May 2022.

Comments: Accepted for publication at ICWSM 2022 as a full paper

arXiv:2201.03074 [pdf, other]

A Survey of Passive Sensing in the Workplace

Authors: Subigya Nepal, Gonzalo J. Martinez, Arvind Pillai, Koustuv Saha, Shayan Mirjafari, Vedant Das Swain, Xuhai Xu, Pino G. Audia, Munmun De Choudhury, Anind K. Dey, Aaron Striegel, Andrew T. Campbell

Abstract: As emerging technologies increasingly integrate into all facets of our lives, the workplace stands at the forefront of potential transformative changes. A notable development in this realm is the advent of passive sensing technology, designed to enhance both cognitive and physical capabilities by monitoring human behavior. This paper reviews current research on the application of passive sensing technology in the workplace, focusing on its impact on employee wellbeing and productivity. Additionally, we explore unresolved issues and outline prospective pathways for the incorporation of passive sensing in future workplaces.

Submitted 30 March, 2024; v1 submitted 9 January, 2022; originally announced January 2022.

Comments: Added references and other minor revisions. Also updated to include relevant works published after 2022

ACM Class: H.5.0

arXiv:2109.05322 [pdf, other]

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Authors: Mai ElSherief, Caleb Ziems, David Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, Munmun De Choudhury, Diyi Yang

Abstract: Hate speech has grown significantly on social media, causing serious consequences for victims of all demographics. Despite much attention being paid to characterize and detect discriminatory speech, most work has focused on explicit or overt hate speech, failing to address a more pervasive form based on coded or indirect language. To fill this gap, this work introduces a theoretically-justified taxonomy of implicit hate speech and a benchmark corpus with fine-grained labels for each message and its implication. We present systematic analyses of our dataset using contemporary baselines to detect and explain implicit hate speech, and we discuss key features that challenge existing models. This dataset will continue to serve as a useful benchmark for understanding this multifaceted issue.

Submitted 11 September, 2021; originally announced September 2021.

Comments: EMNLP 2021 main conference

arXiv:2006.13259 [pdf]

Computational Support for Substance Use Disorder Prevention, Detection, Treatment, and Recovery

Authors: Lana Yarosh, Suzanne Bakken, Alan Borning, Munmun De Choudhury, Cliff Lampe, Elizabeth Mynatt, Stephen Schueller, Tiffany Veinot

Abstract: Substance Use Disorders (SUDs) involve the misuse of any or several of a wide array of substances, such as alcohol, opioids, marijuana, and methamphetamine. SUDs are characterized by an inability to decrease use despite severe social, economic, and health-related consequences to the individual. A 2017 national survey identified that 1 in 12 US adults have or have had a substance use disorder. The National Institute on Drug Abuse estimates that SUDs relating to alcohol, prescription opioids, and illicit drug use cost the United States over $520 billion annually due to crime, lost work productivity, and health care expenses. Most recently, the US Department of Health and Human Services has declared the national opioid crisis a public health emergency to address the growing number of opioid overdose deaths in the United States. In this interdisciplinary workshop, we explored how computational support - digital systems, algorithms, and sociotechnical approaches (which consider how technology and people interact as complex systems) - may enhance and enable innovative interventions for prevention, detection, treatment, and long-term recovery from SUDs. The Computing Community Consortium (CCC) sponsored a two-day workshop titled "Computational Support for Substance Use Disorder Prevention, Detection, Treatment, and Recovery" on November 14-15, 2019 in Washington, DC. As outcomes from this visioning process, we identified three broad opportunity areas for computational support in the SUD context: 1. Detecting and mitigating risk of SUD relapse, 2. Establishing and empowering social support networks, and 3. Collecting and sharing data meaningfully across ecologies of formal and informal care.

Submitted 23 June, 2020; originally announced June 2020.

Comments: A Computing Community Consortium (CCC) workshop report, 28 pages

Report number: ccc2020report_3

arXiv:2006.08364 [pdf, other]

Jointly Predicting Job Performance, Personality, Cognitive Ability, Affect, and Well-Being

Authors: Pablo Robles-Granda, Suwen Lin, Xian Wu, Sidney D'Mello, Gonzalo J. Martinez, Koustuv Saha, Kari Nies, Gloria Mark, Andrew T. Campbell, Munmun De Choudhury, Anind D. Dey, Julie Gregg, Ted Grover, Stephen M. Mattingly, Shayan Mirjafari, Edward Moskal, Aaron Striegel, Nitesh V. Chawla

Abstract: Assessment of job performance, personalized health and psychometric measures are domains where data-driven and ubiquitous computing exhibits the potential of a profound impact in the future. Existing techniques use data extracted from questionnaires, sensors (wearable, computer, etc.), or other traits, to assess well-being and cognitive attributes of individuals. However, these techniques can neither predict an individual's well-being and psychological traits in a global manner nor consider the challenges associated with processing the available data, which is incomplete and noisy. In this paper, we create a benchmark for predictive analysis of individuals from a perspective that integrates: physical and physiological behavior, psychological states and traits, and job performance. We design data mining techniques as a benchmark and use real, noisy, and incomplete data derived from wearable sensors to predict 19 constructs based on 12 standardized well-validated tests. The study included 757 participants who were knowledge workers in organizations across the USA with varied work roles. We developed a data mining framework to extract the meaningful predictors for each of the 19 variables under consideration. Our model is the first benchmark that combines these various instrument-derived variables in a single framework to understand people's behavior by leveraging real uncurated data from wearable, mobile, and social media sources. We verify our approach experimentally using the data obtained from our longitudinal study. The results show that our framework is consistently reliable and capable of predicting the variables under study better than the baselines when prediction is restricted to the noisy, incomplete data.

Submitted 10 June, 2020; originally announced June 2020.

arXiv:2005.11228 [pdf, other]

Leveraging WiFi Network Logs to Infer Student Collocation and its Relationship with Academic Performance

Authors: V. Das Swain, H. Kwon, S. Sargolzaei, B. Saket, M. Bin Morshed, K. Tran, D. Patel, Y. Tian, J. Philipose, Y. Cui, T. Plötz, M. De Choudhury, G. D. Abowd

Abstract: A comprehensive understanding of collocation can help understand performance outcomes. For university cohorts, this needs data that describes large groups over a long period. Harnessing user devices to infer this, while tempting, is challenged by privacy concerns, power consumption, and maintenance issues. Alternatively, embedding new sensors in the environment is limited by the expense of covering the entire campus. We investigate the feasibility of leveraging WiFi association logs for this purpose. While these provide coarse approximations of location, these are easily obtainable and depict multiple users on campus over a semester. We explore how these coarse collocations are related to individual performance. Specifically, we inspect the association between individual performance and the collocation behaviors of project group members. We study 163 students (in 54 project groups) over 14 weeks. After describing how we determine collocation with the WiFi logs, we present a study to analyze how collocation within groups relates to a student's final score. We find collocation behaviors show a significant correlation (Pearson's r = 0.24) with performance -- better than either peer feedback or individual behaviors like attendance. Finally, we discuss how repurposing WiFi logs can facilitate applications for domains like mental wellbeing and physical health.

Submitted 5 May, 2021; v1 submitted 22 May, 2020; originally announced May 2020.

Comments: 25 pages, 10 figures, 5 tables

ACM Class: J.4

arXiv:1712.01411 [pdf, other]

#anorexia, #anarexia, #anarexyia: Characterizing Online Community Practices with Orthographic Variation

Authors: Ian Stewart, Stevie Chancellor, Munmun De Choudhury, Jacob Eisenstein

Abstract: Distinctive linguistic practices help communities build solidarity and differentiate themselves from outsiders. In an online community, one such practice is variation in orthography, which includes spelling, punctuation, and capitalization. Using a dataset of over two million Instagram posts, we investigate orthographic variation in a community that shares pro-eating disorder (pro-ED) content. We find that not only does orthographic variation grow more frequent over time, it also becomes more profound or deep, with variants becoming increasingly distant from the original: as, for example, #anarexyia is more distant than #anarexia from the original spelling #anorexia. These changes are driven by newcomers, who adopt the most extreme linguistic practices as they enter the community. Moreover, this behavior correlates with engagement: the newcomers who adopt deeper orthographic variants tend to remain active for longer in the community, and the posts that contain deeper variation receive more positive feedback in the form of "likes." Previous work has linked community membership change with language change, and our work casts this connection in a new light, with newcomers driving an evolving practice, rather than adapting to it. We also demonstrate the utility of orthographic variation as a new lens to study sociolinguistic change in online communities, particularly when the change results from an exogenous force such as a content ban.

Submitted 4 December, 2017; originally announced December 2017.

arXiv:1605.08844 [pdf]

doi 10.1145/2486227.2486249

Smart Societies: From Citizens as Sensors to Collective Action

Authors: Andrés Monroy-Hernández, Shelly Farnham, Emre Kıcıman, Scott Counts, Munmun De Choudhury

Abstract: Social media has become globally ubiquitous, transforming how people are networked and mobilized. This forum explores research and applications of these new networked publics at individual, organizational, and societal levels.

Submitted 28 May, 2016; originally announced May 2016.

Journal ref: interactions 20, 4 (July 2013)

arXiv:1603.07933 [pdf, other]

Quote RTs on Twitter: Usage of the New Feature for Political Discourse

Authors: Kiran Garimella, Ingmar Weber, Munmun De Choudhury

Abstract: Social media platforms provide several social interactional features. Due to the large scale reach of social media, these interactional features help enable various types of political discourse. Constructive and diversified discourse is important for sustaining healthy communities and reducing the impact of echo chambers. In this paper, we empirically examine the role of a newly introduced Twitter feature, 'quote retweets' (or 'quote RTs') in political discourse, specifically whether it has led to improved, civil, and balanced exchange. Quote RTs allow users to quote the tweet they retweet, while adding a short comment. Our analysis using content, network and crowd labeled data indicates that the feature has increased political discourse and its diffusion, compared to existing features. We discuss the implications of our findings in understanding and reducing online polarization.

Submitted 25 March, 2016; originally announced March 2016.

Comments: Accepted as short paper at ACM WebScience 2016

arXiv:1507.01291 [pdf]

doi 10.1145/2441776.2441938

The New War Correspondents: the Rise of Civic Media Curation in Urban Warfare

Authors: Andrés Monroy-Hernández, danah boyd, Emre Kiciman, Munmun De Choudhury, Scott Counts

Abstract: In this paper we examine the information sharing practices of people living in cities amid armed conflict. We describe the volume and frequency of microblogging activity on Twitter from four cities afflicted by the Mexican Drug War, showing how citizens use social media to alert one another and to comment on the violence that plagues their communities. We then investigate the emergence of civic media "curators," individuals who act as "war correspondents" by aggregating and disseminating information to large numbers of people on social media. We conclude by outlining the implications of our observations for the design of civic media systems in wartime.

Submitted 5 July, 2015; originally announced July 2015.

Comments: In Proceedings of the 2013 conference on Computer supported cooperative work (CSCW 2013). ACM, New York, NY, USA, 1443-1452

arXiv:1507.01287 [pdf]

doi 10.1145/2556288.2557197

"Narco" Emotions: Affect and Desensitization in Social Media during the Mexican Drug War

Authors: Munmun De Choudhury, Andrés Monroy-Hernández, Gloria Mark

Abstract: Social media platforms have emerged as prominent information sharing ecosystems in the context of a variety of recent crises, ranging from mass emergencies, to wars and political conflicts. We study affective responses in social media and how they might indicate desensitization to violence experienced in communities embroiled in an armed conflict. Specifically, we examine three established affect measures: negative affect, activation, and dominance as observed on Twitter in relation to a number of statistics on protracted violence in four major cities afflicted by the Mexican Drug War. During a two year period (Aug 2010-Dec 2012), while violence was on the rise in these regions, our findings show a decline in negative emotional expression as well as a rise in emotional arousal and dominance in Twitter posts: aspects known to be psychological markers of desensitization. We discuss the implications of our work for behavioral health, facilitating rehabilitation efforts in communities enmeshed in an acute and persistent urban warfare, and the impact on civic engagement.

Submitted 5 July, 2015; originally announced July 2015.

Comments: Best paper award at the 32nd annual ACM conference on Human factors in computing systems (CHI '14). ACM, New York, NY, USA, pages 3563-3572

Journal ref: In Proceedings of the 32nd annual ACM conference on Human factors in computing systems (CHI 2014). ACM, New York, NY, USA, pages 3563-3572

arXiv:1006.1702 [pdf, other]

"Birds of a Feather": Does User Homophily Impact Information Diffusion in Social Media?

Authors: Munmun De Choudhury, Hari Sundaram, Ajita John, Doree Duncan Seligmann, Aisling Kelliher

Abstract: This article investigates the impact of user homophily on the social process of information diffusion in online social media. Over several decades, social scientists have been interested in the idea that similarity breeds connection: precisely known as "homophily". Homophily has been extensively studied in the social sciences and refers to the idea that users in a social system tend to bond more with ones who are similar to them than to ones who are dissimilar. The key observation is that homophily structures the ego-networks of individuals and impacts their communication behavior. It is therefore likely to affect the mechanisms by which information propagates among them. To this effect, we investigate the interplay between homophily along diverse user attributes and the information diffusion process on social media. In our approach, we first extract diffusion characteristics---corresponding to the baseline social graph as well as graphs filtered on different user attributes (e.g. location, activity). Second, we propose a Dynamic Bayesian Network based framework to predict diffusion characteristics at a future time. Third, the impact of attribute homophily is quantified by the ability of the predicted characteristics in explaining actual diffusion, and external variables, including trends in search and news. Experimental results on a large Twitter dataset demonstrate that choice of the homophilous attribute can impact the prediction of information diffusion, given a specific metric and a topic. In most cases, attribute homophily is able to explain the actual diffusion and external trends by ~15-25% over cases when homophily is not considered.

Submitted 9 June, 2010; originally announced June 2010.

Comments: 31 pages, 10 figures, 3 tables

ACM Class: H.1.2; H.2.8; H.3.3; H.3.5; H.4.3; H.5.4; I.2.6; J.4

Showing 1–26 of 26 results for author: De Choudhury, M