
Showing 1–13 of 13 results for author: Bhiwandiwalla, A

  1. arXiv:2408.15993  [pdf, other]

    cs.CV cs.LG physics.ao-ph

    ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution

    Authors: Sungduk Yu, Brian L. White, Anahita Bhiwandiwalla, Musashi Hinck, Matthew Lyle Olson, Tung Nguyen, Vasudev Lal

    Abstract: Detecting and attributing temperature increases due to climate change is crucial for understanding global warming and guiding adaptation strategies. The complexity of distinguishing human-induced climate signals from natural variability has challenged traditional detection and attribution (D&A) approaches, which seek to identify specific "fingerprints" in climate response variables. Deep learning…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2407.02333  [pdf, other]

    cs.CL cs.CV

    Why do LLaVA Vision-Language Models Reply to Images in English?

    Authors: Musashi Hinck, Carolin Holtermann, Matthew Lyle Olson, Florian Schneider, Sungduk Yu, Anahita Bhiwandiwalla, Anne Lauscher, Shaoyen Tseng, Vasudev Lal

    Abstract: We uncover a surprising multilingual bias occurring in a popular class of multimodal vision-language models (VLMs). Including an image in the query to a LLaVA-style VLM significantly increases the likelihood of the model returning an English response, regardless of the language of the query. This paper investigates the causes of this loss with a two-pronged approach that combines extensive ablatio…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Pre-print

  3. arXiv:2405.20152  [pdf, other]

    cs.CV

    Uncovering Bias in Large Vision-Language Models at Scale with Counterfactuals

    Authors: Phillip Howard, Kathleen C. Fraser, Anahita Bhiwandiwalla, Svetlana Kiritchenko

    Abstract: With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2404.03118  [pdf, other]

    cs.CV

    LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models

    Authors: Gabriela Ben Melech Stan, Estelle Aflalo, Raanan Yehezkel Rohekar, Anahita Bhiwandiwalla, Shao-Yen Tseng, Matthew Lyle Olson, Yaniv Gurwicz, Chenfei Wu, Nan Duan, Vasudev Lal

    Abstract: In the rapidly evolving landscape of artificial intelligence, multi-modal large language models are emerging as a significant area of interest. These models, which combine various forms of data input, are becoming increasingly popular. However, understanding their internal mechanisms remains a complex task. Numerous advancements have been made in the field of explainability tools and mechanisms, y…

    Submitted 24 June, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  5. arXiv:2404.00166  [pdf, other]

    cs.CV cs.AI

    Uncovering Bias in Large Vision-Language Models with Counterfactuals

    Authors: Phillip Howard, Anahita Bhiwandiwalla, Kathleen C. Fraser, Svetlana Kiritchenko

    Abstract: With the advent of Large Language Models (LLMs) possessing increasingly impressive capabilities, a number of Large Vision-Language Models (LVLMs) have been proposed to augment LLMs with visual inputs. Such models condition generated text on both an input image and a text prompt, enabling a variety of use cases such as visual question answering and multimodal chat. While prior studies have examined…

    Submitted 7 June, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: Accepted to the CVPR 2024 Responsible Generative AI (ReGenAI) Workshop

  6. arXiv:2312.00825  [pdf, other]

    cs.CV cs.AI

    SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

    Authors: Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal

    Abstract: While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be…

    Submitted 9 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: Accepted to CVPR 2024. arXiv admin note: text overlap with arXiv:2310.02988

  7. arXiv:2310.04914  [pdf, other]

    cs.CV cs.AI cs.CL

    Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

    Authors: Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal

    Abstract: Foundational multimodal models pre-trained on large-scale image-text pairs or video-text pairs or both have shown strong generalization abilities on downstream tasks. However, unlike image-text models, pretraining video-text models is not always feasible due to the difficulty in collecting large-scale clean and aligned data, and exponential computational costs involved in the pretraining phase. The…

    Submitted 24 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

  8. arXiv:2306.00103  [pdf, other]

    cs.CV cs.CL cs.LG

    ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

    Authors: Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

    Abstract: Two-Tower Vision-Language (VL) models have shown promising improvements on various downstream VL tasks. Although the most advanced work improves performance by building bridges between encoders, it suffers from ineffective layer-by-layer utilization of uni-modal representations and cannot flexibly exploit different levels of uni-modal semantic knowledge. In this work, we propose ManagerTower, a no…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023 Main Conference, Oral

  9. arXiv:2001.05674  [pdf, other]

    cs.LG

    Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks

    Authors: Léopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Mehran Nekuii, Oguz H Elibol, Hanlin Tang

    Abstract: Training with a larger number of parameters while keeping fast iterations is an increasingly adopted strategy and trend for developing better performing Deep Neural Network (DNN) models. This necessitates increased memory footprint and computational requirements for training. Here we introduce a novel methodology for training deep neural networks using 8-bit floating point (FP8) numbers. Reduced bit…

    Submitted 16 January, 2020; originally announced January 2020.

  10. arXiv:1910.03085  [pdf, other]

    physics.space-ph astro-ph.IM cs.LG

    Correlation of Auroral Dynamics and GNSS Scintillation with an Autoencoder

    Authors: Kara Lamb, Garima Malhotra, Athanasios Vlontzos, Edward Wagstaff, Atılım Günes Baydin, Anahita Bhiwandiwalla, Yarin Gal, Alfredo Kalaitzis, Anthony Reina, Asti Bhatt

    Abstract: High energy particles originating from solar activity travel along the Earth's magnetic field and interact with the atmosphere around the higher latitudes. These interactions often manifest as aurora in the form of visible light in the Earth's ionosphere. These interactions also result in irregularities in the electron density, which cause disruptions in the amplitude and phase of the radio si…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: Four first authors contributed equally; Paper accepted in Machine Learning for the Physical Sciences workshop of NeurIPS 2019; Camera Ready Version to Follow

  11. arXiv:1910.01570  [pdf, other]

    cs.LG stat.ML

    Prediction of GNSS Phase Scintillations: A Machine Learning Approach

    Authors: Kara Lamb, Garima Malhotra, Athanasios Vlontzos, Edward Wagstaff, Atılım Günes Baydin, Anahita Bhiwandiwalla, Yarin Gal, Alfredo Kalaitzis, Anthony Reina, Asti Bhatt

    Abstract: A Global Navigation Satellite System (GNSS) uses a constellation of satellites around the earth for accurate navigation, timing, and positioning. Natural phenomena like space weather introduce irregularities in the Earth's ionosphere, disrupting the propagation of the radio signals that GNSS relies upon. Such disruptions affect both the amplitude and the phase of the propagated waves. No physics-b…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: First 4 authors contributed equally; Paper accepted in Machine Learning for the Physical Sciences workshop of NeurIPS 2019; Camera Ready Version to Follow

  12. arXiv:1901.03762  [pdf, other]

    cs.CV

    Using Scene Graph Context to Improve Image Generation

    Authors: Subarna Tripathi, Anahita Bhiwandiwalla, Alexei Bastidas, Hanlin Tang

    Abstract: Generating realistic images from scene graphs asks neural networks to be able to reason about object relationships and compositionality. As a relatively new task, how to properly ensure the generated images comply with scene graphs or how to measure task performance remains an open question. In this paper, we propose to harness scene graph context to improve image generation from scene graphs. We…

    Submitted 15 January, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

    Comments: arXiv admin note: text overlap with arXiv:1804.01622 by other authors

  13. arXiv:1801.08058  [pdf, other]

    cs.DC cs.LG

    Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

    Authors: Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb

    Abstract: The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call "direct optimization", requires deep changes within each framework to improve the tr…

    Submitted 29 January, 2018; v1 submitted 24 January, 2018; originally announced January 2018.