Search | arXiv e-print repository

Many-Shot In-Context Learning in Multimodal Foundation Models

Authors: Yixing Jiang, Jeremy Irvin, Ji Hun Wang, Muhammad Ahmed Chaudhry, Jonathan H. Chen, Andrew Y. Ng

Abstract: Large language models are well-known to be effective at few-shot in-context learning (ICL). Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows, presenting an opportunity to explore their capability to perform ICL with many more demonstrating examples. In this work, we evaluate the performance of multimodal foundation models scaling from few-shot t… ▽ More Large language models are well-known to be effective at few-shot in-context learning (ICL). Recent advancements in multimodal foundation models have enabled unprecedentedly long context windows, presenting an opportunity to explore their capability to perform ICL with many more demonstrating examples. In this work, we evaluate the performance of multimodal foundation models scaling from few-shot to many-shot ICL. We benchmark GPT-4o and Gemini 1.5 Pro across 10 datasets spanning multiple domains (natural imagery, medical imagery, remote sensing, and molecular imagery) and tasks (multi-class, multi-label, and fine-grained classification). We observe that many-shot ICL, including up to almost 2,000 multimodal demonstrating examples, leads to substantial improvements compared to few-shot (<100 examples) ICL across all of the datasets. Further, Gemini 1.5 Pro performance continues to improve log-linearly up to the maximum number of tested examples on many datasets. Given the high inference costs associated with the long prompts required for many-shot ICL, we also explore the impact of batching multiple queries in a single API call. We show that batching up to 50 queries can lead to performance improvements under zero-shot and many-shot ICL, with substantial gains in the zero-shot setting on multiple datasets, while drastically reducing per-query cost and latency. Finally, we measure ICL data efficiency of the models, or the rate at which the models learn from more demonstrating examples. We find that while GPT-4o and Gemini 1.5 Pro achieve similar zero-shot performance across the datasets, Gemini 1.5 Pro exhibits higher ICL data efficiency than GPT-4o on most datasets. Our results suggest that many-shot ICL could enable users to efficiently adapt multimodal foundation models to new applications and domains. Our codebase is publicly available at https://github.com/stanfordmlgroup/ManyICL . △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2401.14486 [pdf, other]

CloudTracks: A Dataset for Localizing Ship Tracks in Satellite Images of Clouds

Authors: Muhammad Ahmed Chaudhry, Lyna Kim, Jeremy Irvin, Yuzu Ido, Sonia Chu, Jared Thomas Isobe, Andrew Y. Ng, Duncan Watson-Parris

Abstract: Clouds play a significant role in global temperature regulation through their effect on planetary albedo. Anthropogenic emissions of aerosols can alter the albedo of clouds, but the extent of this effect, and its consequent impact on temperature change, remains uncertain. Human-induced clouds caused by ship aerosol emissions, commonly referred to as ship tracks, provide visible manifestations of t… ▽ More Clouds play a significant role in global temperature regulation through their effect on planetary albedo. Anthropogenic emissions of aerosols can alter the albedo of clouds, but the extent of this effect, and its consequent impact on temperature change, remains uncertain. Human-induced clouds caused by ship aerosol emissions, commonly referred to as ship tracks, provide visible manifestations of this effect distinct from adjacent cloud regions and therefore serve as a useful sandbox to study human-induced clouds. However, the lack of large-scale ship track data makes it difficult to deduce their general effects on cloud formation. Towards developing automated approaches to localize ship tracks at scale, we present CloudTracks, a dataset containing 3,560 satellite images labeled with more than 12,000 ship track instance annotations. We train semantic segmentation and instance segmentation model baselines on our dataset and find that our best model substantially outperforms previous state-of-the-art for ship track localization (61.29 vs. 48.65 IoU). We also find that the best instance segmentation model is able to identify the number of ship tracks in each image more accurately than the previous state-of-the-art (1.64 vs. 4.99 MAE). However, we identify cases where the best model struggles to accurately localize and count ship tracks, so we believe CloudTracks will stimulate novel machine learning approaches to better detect elongated and overlapping features in satellite images. We release our dataset openly at {zenodo.org/records/10042922}. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 11 pages, 5 figures, submitted to Journal of Machine Learning Research

arXiv:2206.03220 [pdf]

A Transparency Index Framework for AI in Education

Authors: Muhammad Ali Chaudhry, Mutlu Cukurova, Rose Luckin

Abstract: Numerous AI ethics checklists and frameworks have been proposed focusing on different dimensions of ethical AI such as fairness, explainability, and safety. Yet, no such work has been done on developing transparent AI systems for real-world educational scenarios. This paper presents a Transparency Index framework that has been iteratively co-designed with different stakeholders of AI in education,… ▽ More Numerous AI ethics checklists and frameworks have been proposed focusing on different dimensions of ethical AI such as fairness, explainability, and safety. Yet, no such work has been done on developing transparent AI systems for real-world educational scenarios. This paper presents a Transparency Index framework that has been iteratively co-designed with different stakeholders of AI in education, including educators, ed-tech experts, and AI practitioners. We map the requirements of transparency for different categories of stakeholders of AI in education and demonstrate that transparency considerations are embedded in the entire AI development process from the data collection stage until the AI system is deployed in the real world and iteratively improved. We also demonstrate how transparency enables the implementation of other ethical AI dimensions in Education like interpretability, accountability, and safety. In conclusion, we discuss the directions for future research in this newly emerging field. The main contribution of this study is that it highlights the importance of transparency in developing AI-powered educational technologies and proposes an index framework for its conceptualization for AI in education. △ Less

Submitted 9 May, 2022; originally announced June 2022.

Comments: 17 pages, 4 Figures, 2 Tables

arXiv:0707.0860 [pdf, ps, other]

On the Minimum Number of Transmissions in Single-Hop Wireless Coding Networks

Authors: Salim Y. El Rouayheb, Mohammad Asad R. Chaudhry, Alex Sprintson

Abstract: The advent of network coding presents promising opportunities in many areas of communication and networking. It has been recently shown that network coding technique can significantly increase the overall throughput of wireless networks by taking advantage of their broadcast nature. In wireless networks, each transmitted packet is broadcasted within a certain area and can be overheard by the nei… ▽ More The advent of network coding presents promising opportunities in many areas of communication and networking. It has been recently shown that network coding technique can significantly increase the overall throughput of wireless networks by taking advantage of their broadcast nature. In wireless networks, each transmitted packet is broadcasted within a certain area and can be overheard by the neighboring nodes. When a node needs to transmit packets, it employs the opportunistic coding approach that uses the knowledge of what the node's neighbors have heard in order to reduce the number of transmissions. With this approach, each transmitted packet is a linear combination of the original packets over a certain finite field. In this paper, we focus on the fundamental problem of finding the optimal encoding for the broadcasted packets that minimizes the overall number of transmissions. We show that this problem is NP-complete over GF(2) and establish several fundamental properties of the optimal solution. We also propose a simple heuristic solution for the problem based on graph coloring and present some empirical results for random settings. △ Less

Submitted 5 July, 2007; originally announced July 2007.

Comments: 6 pages

Showing 1–4 of 4 results for author: Chaudhry, M A