Showing 1–23 of 23 results for author: KhudaBukhsh, A R

  1. arXiv:2408.08411  [pdf, other]

    cs.CL

    Rater Cohesion and Quality from a Vicarious Perspective

    Authors: Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan

    Abstract: Human feedback is essential for building human-centered AI systems across domains where disagreement is prevalent, such as AI safety, content moderation, or sentiment analysis. Many disagreements, particularly in politically charged settings, arise because raters have opposing values or beliefs. Vicarious annotation is a method for breaking down disagreement by asking raters how they think others…

    Submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2403.13272  [pdf, other]

    cs.CY cs.CL cs.SI

    Community Needs and Assets: A Computational Analysis of Community Conversations

    Authors: Md Towhidul Absar Chowdhury, Naveen Sharma, Ashiqur R. KhudaBukhsh

    Abstract: A community needs assessment is a tool used by non-profits and government agencies to quantify the strengths and issues of a community, allowing them to allocate their resources better. Such approaches are transitioning towards leveraging social media conversations to analyze the needs of communities and the assets already present within them. However, manual analysis of exponentially increasing s…

    Submitted 19 March, 2024; originally announced March 2024.

  3. arXiv:2402.13528  [pdf, other]

    cs.CY cs.CL cs.LG cs.SI

    Infrastructure Ombudsman: Mining Future Failure Concerns from Structural Disaster Response

    Authors: Md Towhidul Absar Chowdhury, Soumyajit Datta, Naveen Sharma, Ashiqur R. KhudaBukhsh

    Abstract: Current research concentrates on studying discussions on social media related to structural failures to improve disaster response strategies. However, detecting social web posts discussing concerns about anticipatory failures is under-explored. If such concerns are channeled to the appropriate authorities, it can aid in the prevention and mitigation of potential infrastructural failures. In this p…

    Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  4. arXiv:2310.07078  [pdf, other]

    cs.LG cs.AI cs.CL

    Auditing and Robustifying COVID-19 Misinformation Datasets via Anticontent Sampling

    Authors: Clay H. Yoo, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes two key contributions. First, it argues that highly specialized rare content classifiers trained on small data typically have limited exposure to the richness and topical diversity of the negative class (dubbed anticontent) as observed in the wild. As a result, these classifiers' strong performance observed on the test set may not translate into real-world settings. In the context…

    Submitted 5 August, 2023; originally announced October 2023.

    Comments: This paper has been accepted at AAAI 2023 (Robust and Safe AI track)

  5. arXiv:2309.06415  [pdf, other]

    cs.CL cs.CY

    Down the Toxicity Rabbit Hole: A Novel Framework to Bias Audit Large Language Models

    Authors: Arka Dutta, Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh

    Abstract: This paper makes three contributions. First, it presents a generalizable, novel framework dubbed "toxicity rabbit hole" that iteratively elicits toxic content from a wide suite of large language models. Spanning a set of 1,266 identity groups, we first conduct a bias audit of PaLM 2 guardrails presenting key insights. Next, we report generalizability across several other models. Th…

    Submitted 30 March, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

  6. arXiv:2307.10200  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Disentangling Societal Inequality from Model Biases: Gender Inequality in Divorce Court Proceedings

    Authors: Sujan Dutta, Parth Srivastava, Vaishnavi Solunke, Swaprava Nath, Ashiqur R. KhudaBukhsh

    Abstract: Divorce is the legal dissolution of a marriage by a court. Since this is usually an unpleasant outcome of a marital union, each party may have reasons to call the decision to quit which is generally documented in detail in the court proceedings. Via a substantial corpus of 17,306 court proceedings, this paper investigates gender inequality through the lens of divorce court proceedings. While emerg…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: This paper is accepted at IJCAI 2023 (AI for good track)

  7. arXiv:2307.10189  [pdf, other]

    cs.IR cs.CL cs.SI

    Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning

    Authors: Tharindu Cyril Weerasooriya, Sarah Luger, Saloni Poddar, Ashiqur R. KhudaBukhsh, Christopher M. Homan

    Abstract: Human-annotated data plays a critical role in the fairness of AI systems, including those that deal with life-altering decisions or moderating human-created web/social media content. Conventionally, annotator disagreements are resolved before any learning takes place. However, researchers are increasingly identifying annotator disagreement as pervasive and meaningful. They also question the perfor…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted for Publication at ACL 2023

  8. arXiv:2307.03764  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    For Women, Life, Freedom: A Participatory AI-Based Social Web Analysis of a Watershed Moment in Iran's Gender Struggles

    Authors: Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh

    Abstract: In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI sy…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted at IJCAI 2023 (AI for good track)

  9. arXiv:2301.12534  [pdf, other]

    cs.CL cs.CY cs.LG

    Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive

    Authors: Tharindu Cyril Weerasooriya, Sujan Dutta, Tharindu Ranasinghe, Marcos Zampieri, Christopher M. Homan, Ashiqur R. KhudaBukhsh

    Abstract: Offensive speech detection is a key component of content moderation. However, what is offensive can be highly subjective. This paper investigates how machine and human moderators disagree on what is offensive when it comes to real-world social web political discourse. We show that (1) there is extensive disagreement among the moderators (humans and machines); and (2) human and large-language-model…

    Submitted 9 November, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: Accepted to appear at EMNLP 2023

  10. arXiv:2206.10594  [pdf]

    cs.SI

    How is Vaping Framed on Online Knowledge Dissemination Platforms?

    Authors: Keyu Chen, Yiwen Shi, Jun Luo, Joyce Jiang, Shweta Yadav, Munmun De Choudhury, Ashiqur R. KhudaBukhsh, Marzieh Babaeianjelodar, Frederick Altice, Navin Kumar

    Abstract: We analyze 1,888 articles and 1,119,453 vaping posts to study how vaping is framed across multiple knowledge dissemination platforms (Wikipedia, Quora, Medium, Reddit, Stack Exchange, wikiHow). We use various NLP techniques to understand these differences. For example, n-grams, emotion recognition, and question answering results indicate that Medium, Quora, and Stack Exchange are appropriate venue…

    Submitted 22 July, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2206.07765, arXiv:2206.09024

  11. arXiv:2206.07765  [pdf]

    cs.SI

    US News and Social Media Framing around Vaping

    Authors: Keyu Chen, Marzieh Babaeianjelodar, Yiwen Shi, Rohan Aanegola, Lam Yin Cheung, Preslav Ivanov Nakov, Shweta Yadav, Angus Bancroft, Ashiqur R. KhudaBukhsh, Munmun De Choudhury, Frederick L. Altice, Navin Kumar

    Abstract: In this paper, we investigate how vaping is framed differently (2008-2021) between US news and social media. We analyze 15,711 news articles and 1,231,379 Facebook posts about vaping to study the differences in framing between media varieties. We use word embeddings to provide two-dimensional visualizations of the semantic changes around vaping for news and for social media. We detail that news me…

    Submitted 22 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

  12. arXiv:2203.04837  [pdf, other]

    eess.AS cs.CL cs.CY

    'Beach' to 'Bitch': Inadvertent Unsafe Transcription of Kids' Content on YouTube

    Authors: Krithika Ramesh, Ashiqur R. KhudaBukhsh, Sumeet Kumar

    Abstract: Over the last few years, YouTube Kids has emerged as one of the highly competitive alternatives to television for children's entertainment. Consequently, YouTube Kids' content should receive an additional level of scrutiny to ensure children's safety. While research on detecting offensive or inappropriate content for kids is gaining momentum, little or no current work exists that investigates to w…

    Submitted 17 February, 2022; originally announced March 2022.

    Comments: This paper got accepted at AAAI 2022, AI for Social Impact track

  13. arXiv:2106.12044  [pdf, other]

    cs.SI cs.CY

    Empathy and Hope: Resource Transfer to Model Inter-country Social Media Dynamics

    Authors: Clay H. Yoo, Shriphani Palakodety, Rupak Sarkar, Ashiqur R. KhudaBukhsh

    Abstract: The ongoing COVID-19 pandemic resulted in significant ramifications for international relations ranging from travel restrictions, global ceasefires, and international vaccine production and sharing agreements. Amidst a wave of infections in India that resulted in a systemic breakdown of healthcare infrastructure, a social welfare organization based in Pakistan offered to procure medical-grade oxyg…

    Submitted 17 June, 2021; originally announced June 2021.

  14. arXiv:2104.05611  [pdf, other]

    cs.SI cs.CY

    Exploring Polarization of Users Behavior on Twitter During the 2019 South American Protests

    Authors: Ramon Villa-Cox, Helen Zeng, Ashiqur R. KhudaBukhsh, Kathleen M. Carley

    Abstract: Research across different disciplines has documented the expanding polarization in social media. However, much of it focused on the US political system or its culturally controversial topics. In this work, we explore polarization on Twitter in a different context, namely the protest that paralyzed several countries in the South American region in 2019. By leveraging users' endorsement of politicia…

    Submitted 5 April, 2021; originally announced April 2021.

  15. arXiv:2102.09103  [pdf, other]

    cs.CY

    Gender Bias, Social Bias and Representation: 70 Years of B$^H$ollywood

    Authors: Kunal Khadilkar, Ashiqur R. KhudaBukhsh, Tom M. Mitchell

    Abstract: With an outreach in more than 90 countries, a market share of 2.1 billion dollars and a target audience base of at least 1.2 billion people, Bollywood, aka the Mumbai film industry, is a formidable entertainment force. While the number of lives Bollywood can potentially touch is massive, no comprehensive NLP study on the evolution of social and gender biases in Bollywood dialogues exists. Via a su…

    Submitted 17 February, 2021; originally announced February 2021.

  16. arXiv:2101.10112  [pdf, other]

    cs.CY cs.CL

    Fringe News Networks: Dynamics of US News Viewership following the 2020 Presidential Election

    Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, Tom M. Mitchell

    Abstract: The growing political polarization of the American electorate over the last several decades has been widely studied and documented. During the administration of President Donald Trump, charges of "fake news" made social and news media not only the means but, to an unprecedented extent, the topic of political communication. Using data from before the November 3rd, 2020 US Presidential election, rec…

    Submitted 21 January, 2021; originally announced January 2021.

  17. arXiv:2011.10280  [pdf, ps, other]

    cs.CL

    Are Chess Discussions Racist? An Adversarial Hate Speech Data Set

    Authors: Rupak Sarkar, Ashiqur R. KhudaBukhsh

    Abstract: On June 28, 2020, while presenting a chess podcast on Grandmaster Hikaru Nakamura, Antonio Radić's YouTube handle got blocked because it contained "harmful and dangerous" content. YouTube did not give further specific reason, and the channel got reinstated within 24 hours. However, Radić speculated that given the current political situation, a referral to "black against white", albeit in the conte…

    Submitted 20 November, 2020; originally announced November 2020.

  18. arXiv:2010.02339  [pdf, ps, other]

    cs.CL cs.CY

    We Don't Speak the Same Language: Interpreting Polarization through Machine Translation

    Authors: Ashiqur R. KhudaBukhsh, Rupak Sarkar, Mark S. Kamlet, Tom M. Mitchell

    Abstract: Polarization among US political parties, media and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that offers a fresh perspective on interpreting polarization through the lens of machine translation. With a novel proposition that two sub-comm…

    Submitted 18 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

  19. arXiv:2008.13347  [pdf, other]

    cs.CL cs.CY cs.LG

    Discovering Bilingual Lexicons in Polyglot Word Embeddings

    Authors: Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Tom M. Mitchell

    Abstract: Bilingual lexicons and phrase tables are critical resources for modern Machine Translation systems. Although recent results show that without any seed lexicon or parallel data, highly accurate bilingual lexicons can be learned using unsupervised methods, such methods rely on the existence of large, clean monolingual corpora. In this work, we utilize a single Skip-gram model trained on a multilingu…

    Submitted 30 August, 2020; originally announced August 2020.

  20. arXiv:2001.11258  [pdf, ps, other]

    cs.CL cs.CY cs.LG

    Harnessing Code Switching to Transcend the Linguistic Barrier

    Authors: Ashiqur R. KhudaBukhsh, Shriphani Palakodety, Jaime G. Carbonell

    Abstract: Code mixing (or code switching) is a common phenomenon observed in social-media content generated by a linguistically diverse user-base. Studies show that in the Indian sub-continent, a substantial fraction of social media posts exhibit code switching. While the difficulties posed by code mixed documents to further downstream analyses are well-understood, lending visibility to code mixed documents…

    Submitted 15 June, 2020; v1 submitted 30 January, 2020; originally announced January 2020.

  21. arXiv:2001.01697  [pdf, other]

    cs.CY cs.LG

    Social Media Attributions in the Context of Water Crisis

    Authors: Rupak Sarkar, Hirak Sarkar, Sayantan Mahinder, Ashiqur R. KhudaBukhsh

    Abstract: Attribution of natural disasters/collective misfortune is a widely-studied political science problem. However, such studies are typically survey-centric or rely on a handful of experts to weigh in on the matter. In this paper, we explore how we can use social media data and an AI-driven approach to complement traditional surveys and automatically extract attribution factors. We focus on the most-r…

    Submitted 6 January, 2020; originally announced January 2020.

  22. arXiv:1910.03206  [pdf, ps, other]

    cs.CY cs.CL cs.IR cs.LG

    Voice for the Voiceless: Active Sampling to Detect Comments Supporting the Rohingyas

    Authors: Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell

    Abstract: The Rohingya refugee crisis is one of the biggest humanitarian crises of modern times with more than 600,000 Rohingyas rendered homeless according to the United Nations High Commissioner for Refugees. While it has received sustained press attention globally, no comprehensive research has been performed on social media pertaining to this large evolving crisis. In this work, we construct a substanti…

    Submitted 6 January, 2020; v1 submitted 8 October, 2019; originally announced October 2019.

  23. arXiv:1909.12940  [pdf, ps, other]

    cs.CY cs.CL cs.LG

    Hope Speech Detection: A Computational Analysis of the Voice of Peace

    Authors: Shriphani Palakodety, Ashiqur R. KhudaBukhsh, Jaime G. Carbonell

    Abstract: The recent Pulwama terror attack (February 14, 2019, Pulwama, Kashmir) triggered a chain of escalating events between India and Pakistan adding another episode to their 70-year-old dispute over Kashmir. The present era of ubiquitous social media has never seen nuclear powers closer to war. In this paper, we analyze this evolving international crisis via a substantial corpus constructed using comm…

    Submitted 24 February, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Minor edits