Showing 1–17 of 17 results for author: Ignat, O

  1. arXiv:2407.02623  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Vision-Language Models

    Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea

    Abstract: Unequal representation across cultures and socioeconomic groups in AI is a significant and challenging problem, often leading to uneven model performance. As a step toward addressing this issue, we formulate translated non-English, geographic, and socioeconomic integrated prompts and evaluate their impact on VL model performance for data from different countries and income groups. Our findings sho…

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    ACM Class: K.4; I.2.7; I.2.8

  2. arXiv:2406.05967  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark

    Authors: David Romero, Chenyang Lyu, Haryo Akbarianto Wibowo, Teresa Lynn, Injy Hamed, Aditya Nanda Kishore, Aishik Mandal, Alina Dragonetti, Artem Abzaliev, Atnafu Lambebo Tonja, Bontu Fufa Balcha, Chenxi Whitehouse, Christian Salamea, Dan John Velasco, David Ifeoluwa Adelani, David Le Meur, Emilio Villa-Cueva, Fajri Koto, Fauzan Farooqui, Frederico Belcavello, Ganzorig Batnasan, Gisela Vallejo, Grainne Caulfield, Guido Ivetta, Haiyue Song, et al. (50 additional authors not shown)

    Abstract: Visual Question Answering (VQA) is an important task in multimodal AI, and it is often used to test the ability of vision-language models to understand and reason on knowledge present in both visual and textual data. However, most of the current VQA models use datasets that are primarily focused on English and a few major world languages, with images that are typically Western-centric. While recen…

    Submitted 9 June, 2024; originally announced June 2024.

  3. arXiv:2404.12938  [pdf, other]

    cs.CL cs.AI

    MAiDE-up: Multilingual Deception Detection of GPT-generated Hotel Reviews

    Authors: Oana Ignat, Xiaomeng Xu, Rada Mihalcea

    Abstract: Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily o…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  4. arXiv:2404.12933  [pdf, other]

    cs.CL cs.AI

    Cross-cultural Inspiration Detection and Analysis in Real and LLM-generated Social Media Data

    Authors: Oana Ignat, Gayathri Ganesh Lakshmy, Rada Mihalcea

    Abstract: Inspiration is linked to various positive outcomes, such as increased creativity, productivity, and happiness. Although inspiration has great potential, there has been limited effort toward identifying content that is inspiring, as opposed to just engaging or positive. Additionally, most research has concentrated on Western data, with little attention paid to other cultures. This work is the first…

    Submitted 18 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  5. arXiv:2403.16909  [pdf, other]

    cs.AI cs.CL cs.CY

    Towards Algorithmic Fidelity: Mental Health Representation across Demographics in Synthetic vs. Human-generated Data

    Authors: Shinka Mori, Oana Ignat, Andrew Lee, Rada Mihalcea

    Abstract: Synthetic data generation has the potential to impact applications and domains with scarce data. However, before such data is used for sensitive tasks such as mental health, we need an understanding of how different demographics are represented in it. In our paper, we analyze the potential of producing synthetic data using GPT-3 by exploring the various stressors it attributes to different race an…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 14 pages, 16 figures

  6. arXiv:2403.07687  [pdf, other]

    cs.CV cs.AI cs.CL

    Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

    Authors: Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

    Abstract: Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted at COLING 2024

  7. arXiv:2311.05746  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Bridging the Digital Divide: Performance Variation across Socio-Economic Factors in Vision-Language Models

    Authors: Joan Nwatu, Oana Ignat, Rada Mihalcea

    Abstract: Despite the impressive performance of current AI models reported across various tasks, performance reports often do not include evaluations of how these models perform on the specific groups that will be impacted by these technologies. Among the minority groups under-represented in AI, data from low-income households are often overlooked in data collection and model evaluation. We evaluate the per…

    Submitted 9 November, 2023; originally announced November 2023.

    Journal ref: EMNLP 2023

  8. arXiv:2311.02536  [pdf, other]

    cs.CV

    Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models

    Authors: Jingru Yi, Burak Uzkent, Oana Ignat, Zili Li, Amanmeet Garg, Xiang Yu, Linda Liu

    Abstract: Grounding-based vision and language models have been successfully applied to low-level vision tasks, aiming to precisely locate objects referred in captions. The effectiveness of grounding representation learning heavily relies on the scale of the training dataset. Despite being a useful data enrichment strategy, data augmentation has received minimal attention in existing vision and language task…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV2024

  9. arXiv:2309.06219  [pdf, other]

    cs.CV cs.CL cs.CY cs.IR

    Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

    Authors: Oana Ignat, Santiago Castro, Weiji Li, Rada Mihalcea

    Abstract: We introduce the task of automatic human action co-occurrence identification, i.e., determine whether two human actions can co-occur in the same interval of time. We create and make publicly available the ACE (Action Co-occurrencE) dataset, consisting of a large graph of ~12k co-occurring pairs of visual actions and their corresponding video clips. We describe graph link prediction models that lev…

    Submitted 18 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  10. arXiv:2305.18786  [pdf, other]

    cs.CV cs.CL

    Scalable Performance Analysis for Vision-Language Models

    Authors: Santiago Castro, Oana Ignat, Rada Mihalcea

    Abstract: Joint vision-language models have shown great performance over a diverse set of tasks. However, little is known about their limitations, as the high dimensional space learned by these models makes it difficult to identify semantic errors. Recent work has addressed this problem by designing highly controlled probing task benchmarks. Our paper introduces a more scalable solution that relies on alrea…

    Submitted 31 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Camera-ready version for *SEM 2023

  11. arXiv:2305.12544  [pdf, other]

    cs.CL cs.AI

    Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

    Authors: Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

    Abstract: Recent progress in large language models (LLMs) has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that "it's all been solved." Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on. Has it all be…

    Submitted 15 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at COLING 2024

  12. arXiv:2202.13274  [pdf, other]

    cs.CL

    OCR Improves Machine Translation for Low-Resource Languages

    Authors: Oana Ignat, Jean Maillard, Vishrav Chaudhary, Francisco Guzmán

    Abstract: We aim to investigate the performance of current OCR systems on low resource languages and low resource scripts. We introduce and make publicly available a novel benchmark, OCR4MT, consisting of real and synthetic data, enriched with noise, for 60 low-resource languages in low resource scripts. We evaluate state-of-the-art OCR systems on our benchmark and analyse most common errors. We show that O…

    Submitted 13 March, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

    Comments: Accepted at ACL Findings 2022

  13. arXiv:2202.08138  [pdf, other]

    cs.CV cs.CL

    When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

    Authors: Oana Ignat, Santiago Castro, Yuhang Zhou, Jiajun Bao, Dandan Shan, Rada Mihalcea

    Abstract: We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effec…

    Submitted 21 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:1906.04236

  14. arXiv:2109.02747  [pdf, other]

    cs.CV cs.CL

    WhyAct: Identifying Action Reasons in Lifestyle Vlogs

    Authors: Oana Ignat, Santiago Castro, Hanwen Miao, Weiji Li, Rada Mihalcea

    Abstract: We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information…

    Submitted 9 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021

  15. arXiv:2109.02734  [pdf, other]

    cs.CL

    Detecting Inspiring Content on Social Media

    Authors: Oana Ignat, Y-Lan Boureau, Jane A. Yu, Alon Halevy

    Abstract: Inspiration moves a person to see new possibilities and transforms the way they perceive their own potential. Inspiration has received little attention in psychology, and has not been researched before in the NLP community. To the best of our knowledge, this work is the first to study inspiration through machine learning methods. We aim to automatically detect inspiring content from social media d…

    Submitted 29 May, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: accepted at ACII 2021

  16. arXiv:2104.04182  [pdf, other]

    cs.CV

    FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework

    Authors: Santiago Castro, Ruoyao Wang, Pingxuan Huang, Ian Stewart, Oana Ignat, Nan Liu, Jonathan C. Stroud, Rada Mihalcea

    Abstract: We propose fill-in-the-blanks as a video understanding evaluation framework and introduce FIBER -- a novel dataset consisting of 28,000 videos and descriptions in support of this evaluation framework. The fill-in-the-blanks setting tests a model's understanding of a video by requiring it to predict a masked noun phrase in the caption of the video, given the video and the surrounding text. The FIBE…

    Submitted 22 March, 2022; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted at ACL 2022 Main conference. Camera-ready version

  17. Identifying Visible Actions in Lifestyle Vlogs

    Authors: Oana Ignat, Laura Burdick, Jia Deng, Rada Mihalcea

    Abstract: We consider the task of identifying human actions visible in online videos. We focus on the widely spread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify if actions mentioned in the speech description of a video are visually present. We construct a dataset with crowdsourced manual annotations of visible actions,…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL 2019