Skip to main content

Showing 1–50 of 173 results for author: Wang, E

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12366  [pdf, other

    cs.CV cs.AI cs.CL cs.RO

    NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

    Authors: Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

    Abstract: Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a burgeoning initiative to harness LLMs for instruction following robotic navigation. Such a trend underscores the potential of LLMs to generalize navigational reasoning and diverse language understanding. However, a significant discrepancy in agent performance is observed when integrating LLMs in the Vision-and-… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.11988  [pdf, other

    cs.CL

    Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

    Authors: Shafiuddin Rehan Ahmed, Zhiyong Eric Wang, George Arthur Baker, Kevin Stowe, James H. Martin

    Abstract: The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two iss… ▽ More

    Submitted 5 June, 2024; originally announced July 2024.

    Comments: Short Paper, ACL 2024

  3. arXiv:2407.08650  [pdf, other

    physics.med-ph cs.CV

    Latent Spaces Enable Transformer-Based Dose Prediction in Complex Radiotherapy Plans

    Authors: Edward Wang, Ryan Au, Pencilla Lang, Sarah A. Mattonen

    Abstract: Evidence is accumulating in favour of using stereotactic ablative body radiotherapy (SABR) to treat multiple cancer lesions in the lung. Multi-lesion lung SABR plans are complex and require significant resources to create. In this work, we propose a novel two-stage latent transformer framework (LDFormer) for dose prediction of lung SABR plans with varying numbers of lesions. In the first stage, pa… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  4. arXiv:2407.08219  [pdf, other

    cs.CL cs.HC

    Generating Contextually-Relevant Navigation Instructions for Blind and Low Vision People

    Authors: Zain Merchant, Abrar Anwar, Emily Wang, Souti Chattopadhyay, Jesse Thomason

    Abstract: Navigating unfamiliar environments presents significant challenges for blind and low-vision (BLV) individuals. In this work, we construct a dataset of images and goals across different scenarios such as searching through kitchens or navigating outdoors. We then investigate how grounded instruction generation methods can provide contextually-relevant navigational guidance to users in these instance… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted as RO-MAN 2024 Late Breaking Report

  5. arXiv:2406.19263  [pdf, other

    cs.CL cs.CV

    Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

    Authors: Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang

    Abstract: Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring task: screen reading based on user-indicated points, which we name the Screen Point-and-Read (SPR) task. This task is predominantly handled by rigid acce… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  6. arXiv:2406.15762  [pdf, other

    cs.LG stat.ML

    Rethinking the Diffusion Models for Numerical Tabular Data Imputation from the Perspective of Wasserstein Gradient Flow

    Authors: Zhichao Chen, Haoxuan Li, Fangyikang Wang, Odin Zhang, Hu Xu, Xiaoyu Jiang, Zhihuan Song, Eric H. Wang

    Abstract: Diffusion models (DMs) have gained attention in Missing Data Imputation (MDI), but there remain two long-neglected issues to be addressed: (1). Inaccurate Imputation, which arises from inherently sample-diversification-pursuing generative process of DMs. (2). Difficult Training, which stems from intricate design required for the mask matrix in model training stage. To address these concerns within… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  7. arXiv:2406.15575  [pdf, ps, other

    cs.LG cs.AI stat.ML

    Sketch-GNN: Scalable Graph Neural Networks with Sublinear Training Complexity

    Authors: Mucong Ding, Tahseen Rabbani, Bang An, Evan Z Wang, Furong Huang

    Abstract: Graph Neural Networks (GNNs) are widely applied to graph learning problems such as node classification. When scaling up the underlying graphs of GNNs to a larger size, we are forced to either train on the complete graph and keep the full graph adjacency and node embeddings in memory (which is often infeasible) or mini-batch sample the graph (which results in exponentially growing computational com… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2022

  8. arXiv:2406.14288  [pdf, other

    cs.LG cs.AI

    Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective

    Authors: Yunfei Liu, Jintang Li, Yuehe Chen, Ruofan Wu, Ericbk Wang, Jing Zhou, Sheng Tian, Shuheng Shen, Xing Fu, Changhua Meng, Weiqiang Wang, Liang Chen

    Abstract: Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes in a graph into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on graph augmentations and contrastive schemes, which may potentially in… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: KDD 2024 research track. Code available at https://github.com/EdisonLeeeee/MAGI

  9. arXiv:2406.12831  [pdf, other

    cs.CV cs.AI cs.MM

    VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

    Authors: Jing Gu, Yuwei Fang, Ivan Skorokhodov, Peter Wonka, Xinya Du, Sergey Tulyakov, Xin Eric Wang

    Abstract: Video editing stands as a cornerstone of digital media, from entertainment and education to professional communication. However, previous methods often overlook the necessity of comprehensively understanding both global and local contexts, leading to inaccurate and inconsistency edits in the spatiotemporal dimension, especially for long videos. In this paper, we introduce VIA, a unified spatiotemp… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 13 pages, 11 figures

  10. arXiv:2406.09305  [pdf, other

    cs.CV

    Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

    Authors: Yufan Zhou, Ruiyi Zhang, Kaizhi Zheng, Nanxuan Zhao, Jiuxiang Gu, Zichao Wang, Xin Eric Wang, Tong Sun

    Abstract: In subject-driven text-to-image generation, recent works have achieved superior performance by training the model on synthetic datasets containing numerous image pairs. Trained on these datasets, generative models can produce text-aligned images for specific subject from arbitrary testing image in a zero-shot manner. They even outperform methods which require additional fine-tuning on testing imag… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  11. arXiv:2406.08689  [pdf, other

    cs.CR cs.AI

    Security of AI Agents

    Authors: Yifeng He, Ethan Wang, Yuyang Rong, Zifei Cheng, Hao Chen

    Abstract: The study and development of AI agents have been boosted by large language models. AI agents can function as intelligent assistants and complete tasks on behalf of their users with access to tools and the ability to execute commands in their environments, Through studying and experiencing the workflow of typical AI agents, we have raised several concerns regarding their security. These potential v… ▽ More

    Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  12. arXiv:2406.08407  [pdf, other

    cs.CV cs.AI cs.CL

    MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

    Authors: Xuehai He, Weixi Feng, Kaizhi Zheng, Yujie Lu, Wanrong Zhu, Jiachen Li, Yue Fan, Jianfeng Wang, Linjie Li, Zhengyuan Yang, Kevin Lin, William Yang Wang, Lijuan Wang, Xin Eric Wang

    Abstract: Multimodal Language Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate rich representations of real-world dynamics and causalities. To this end, we introduce MMWorld, a new benchmark for multi-discipline, multi-faceted multi… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.06559  [pdf, other

    cs.CL cs.AI cs.LG

    Harnessing Business and Media Insights with Large Language Models

    Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya , et al. (8 additional authors not shown)

    Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

  14. arXiv:2406.06316  [pdf, other

    cs.CL cs.AI cs.CE cs.LG

    Tx-LLM: A Large Language Model for Therapeutics

    Authors: Juan Manuel Zambrano Chaves, Eric Wang, Tao Tu, Eeshit Dhaval Vaishnav, Byron Lee, S. Sara Mahdavi, Christopher Semturs, David Fleet, Vivek Natarajan, Shekoofeh Azizi

    Abstract: Developing therapeutics is a lengthy and expensive process that requires the satisfaction of many different criteria, and AI models capable of expediting the process would be invaluable. However, the majority of current AI approaches address only a narrowly defined set of tasks, often circumscribed within a particular domain. To bridge this gap, we introduce Tx-LLM, a generalist large language mod… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  15. arXiv:2405.20421  [pdf, other

    cs.AI

    Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA

    Authors: Qianqi Yan, Xuehai He, Xiang Yue, Xin Eric Wang

    Abstract: Large Multimodal Models (LMMs) have shown remarkable progress in medical Visual Question Answering (Med-VQA), achieving high accuracy on existing benchmarks. However, their reliability under robust evaluation is questionable. This study reveals that when subjected to simple probing evaluation, state-of-the-art models perform worse than random guessing on medical diagnosis questions. To address thi… ▽ More

    Submitted 21 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  16. arXiv:2405.07387  [pdf, other

    cs.LG

    Semantic Loss Functions for Neuro-Symbolic Structured Prediction

    Authors: Kareem Ahmed, Stefano Teso, Paolo Morettin, Luca Di Liello, Pierfrancesco Ardino, Jacopo Gobbi, Yitao Liang, Eric Wang, Kai-Wei Chang, Andrea Passerini, Guy Van den Broeck

    Abstract: Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g. a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which inject… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Preprint of Ch. 22 "Semantic Loss Functions for Neuro-Symbolic Structured Prediction" in "Compendium of Neurosymbolic Artificial Intelligence", https://ebooks.iospress.nl/ISBN/978-1-64368-406-2. arXiv admin note: substantial text overlap with arXiv:2201.11250, arXiv:2007.13197

  17. arXiv:2405.04834  [pdf, other

    cs.CV

    FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

    Authors: Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang

    Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexibl… ▽ More

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  18. arXiv:2405.03162  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng , et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  19. arXiv:2404.13220  [pdf

    cs.CR cs.LG

    Security and Privacy Product Inclusion

    Authors: Dave Kleidermacher, Emmanuel Arriaga, Eric Wang, Sebastian Porst, Roger Piqueras Jover

    Abstract: In this paper, we explore the challenges of ensuring security and privacy for users from diverse demographic backgrounds. We propose a threat modeling approach to identify potential risks and countermeasures for product inclusion in security and privacy. We discuss various factors that can affect a user's ability to achieve a high level of security and privacy, including low-income demographics, p… ▽ More

    Submitted 23 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.05717  [pdf, other

    cs.CV cs.AI

    SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

    Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

    Abstract: Effective editing of personal content holds a pivotal role in enabling individuals to express their creativity, weaving captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. Therefore, in this work, we introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by the reference, w… ▽ More

    Submitted 6 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 18 pages, 16 figures, 3 tables

  21. arXiv:2403.12945  [pdf, other

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park , et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  22. arXiv:2403.09974  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery

    Authors: Enguang Wang, Zhimao Peng, Zhengyuan Xie, Fei Yang, Xialei Liu, Ming-Ming Cheng

    Abstract: Given unlabelled datasets containing both old and new categories, generalized category discovery (GCD) aims to accurately discover new classes while correctly classifying old classes, leveraging the class concepts learned from labeled samples. Current GCD methods only use a single visual modality of information, resulting in poor classification of visually similar classes. As a different modality,… ▽ More

    Submitted 10 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  23. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2403.03956  [pdf, other

    cs.IR cs.CL

    Backtracing: Retrieving the Cause of the Query

    Authors: Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya Demszky

    Abstract: Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they do not directly assist content creators -- such as lecturers who want to improve their content -- identify segments that _caused_ a user to ask those questions. We introduce the task of backtracing,… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/rosewang2008/backtracing; EACL 2024 Findings, Long Paper

  25. arXiv:2402.19441  [pdf

    cs.GR

    3D Gaussian Model for Animation and Texturing

    Authors: Xiangzhi Eric Wang, Zackary P. T. Sin

    Abstract: 3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We pro… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  26. arXiv:2402.19233  [pdf, other

    cs.CY

    Shared lightweight autonomous vehicles for urban food deliveries: A simulation study

    Authors: Ainhoa Genua Cerviño, Naroa Coretti Sanchez, Elaine Liu Wang, Arnaud Grignard, Kent Larson

    Abstract: In recent years, the rapid growth of on-demand deliveries, especially in food deliveries, has spurred the exploration of innovative mobility solutions. In this context, lightweight autonomous vehicles have emerged as a potential alternative. However, their fleet-level behavior remains largely unexplored. To address this gap, we have developed an agent-based model and an environmental impact study… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 17 pages, 25 including abstract, 16 figures, journal paper

  27. arXiv:2402.15690  [pdf, other

    cs.CL cs.AI

    Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology

    Authors: Zhenhua Wang, Wei Xie, Baosheng Wang, Enze Wang, Zhiwen Gui, Shuoyoucheng Ma, Kai Chen

    Abstract: Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-makin… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  28. arXiv:2402.14815  [pdf

    cs.CY cs.AI cs.CV cs.LG

    Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging

    Authors: Yuzhe Yang, Yujia Liu, Xin Liu, Avanti Gulhane, Domenico Mastrodicasa, Wei Wu, Edward J Wang, Dushyant W Sahani, Shwetak Patel

    Abstract: Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically margin… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Code and data are available at https://github.com/YyzHarry/vlm-fairness

  29. arXiv:2402.09171  [pdf, other

    cs.SE

    Automated Unit Test Improvement using Large Language Models at Meta

    Authors: Nadia Alshahwan, Jubin Chheda, Anastasia Finegenova, Beliz Gokkaya, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: This paper describes Meta's TestGen-LLM tool, which uses LLMs to automatically improve existing human-written tests. TestGen-LLM verifies that its generated test classes successfully clear a set of filters that assure measurable improvement over the original test suite, thereby eliminating problems due to LLM hallucination. We describe the deployment of TestGen-LLM at Meta test-a-thons for the Ins… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, 32nd ACM Symposium on the Foundations of Software Engineering (FSE 24)

  30. arXiv:2402.06111  [pdf, other

    cs.SE

    Observation-based unit test generation at Meta

    Authors: Nadia Alshahwan, Mark Harman, Alexandru Marginean, Rotem Tal, Eddy Wang

    Abstract: TestGen automatically generates unit tests, carved from serialized observations of complex objects, observed during app execution. We describe the development and deployment of TestGen at Meta. In particular, we focus on the scalability challenges overcome during development in order to deploy observation-based test carving at scale in industry. So far, TestGen has landed 518 tests into production… ▽ More

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 12 pages, 8 figures, FSE 2024, Mon 15 - Fri 19 July 2024, Porto de Galinhas, Brazil

  31. arXiv:2402.05111  [pdf, other

    cs.CL cs.AI

    Edu-ConvoKit: An Open-Source Library for Education Conversation Data

    Authors: Rose E. Wang, Dorottya Demszky

    Abstract: We introduce Edu-ConvoKit, an open-source library designed to handle pre-processing, annotation and analysis of conversation data in education. Resources for analyzing education conversation data are scarce, making the research challenging to perform and therefore hard to access. We address these challenges with Edu-ConvoKit. Edu-ConvoKit is open-source (https://github.com/stanfordnlp/edu-convokit… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: https://github.com/stanfordnlp/edu-convokit https://edu-convokit.readthedocs.io/en/latest/

  32. arXiv:2402.04380  [pdf, other

    cs.SE

    Assured LLM-Based Software Engineering

    Authors: Nadia Alshahwan, Mark Harman, Inna Harper, Alexandru Marginean, Shubho Sengupta, Eddy Wang

    Abstract: In this paper we address the following question: How can we use Large Language Models (LLMs) to improve code independently of a human, while ensuring that the improved code - does not regress the properties of the original code? - improves the original in a verifiable and measurable way? To address this question, we advocate Assured LLM-Based Software Engineering; a generate-and-test approac… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 6 pages, 1 figure, InteNSE 24: ACM International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, April, 2024, Lisbon, Portugal

  33. arXiv:2402.03396  [pdf, other

    cs.SE cs.AI cs.CL cs.CR cs.LG

    UniTSyn: A Large-Scale Dataset Capable of Enhancing the Prowess of Large Language Models for Program Testing

    Authors: Yifeng He, Jiabo Huang, Yuyang Rong, Yiwen Guo, Ethan Wang, Hao Chen

    Abstract: The remarkable capability of large language models (LLMs) in generating high-quality code has drawn increasing attention in the software testing community. However, existing code LLMs often demonstrate unsatisfactory capabilities in generating accurate and complete tests since they were trained on code snippets collected without differentiating between code for testing purposes and other code. In… ▽ More

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  34. arXiv:2402.00319  [pdf, other

    cs.CV cs.AI

    SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling

    Authors: Eileen Wang, Soyeon Caren Han, Josiah Poon

    Abstract: Visual storytelling aims to automatically generate a coherent story based on a given image sequence. Unlike tasks like image captioning, visual stories should contain factual descriptions, worldviews, and human social commonsense to put disjointed elements together to form a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and using tax… ▽ More

    Submitted 31 January, 2024; originally announced February 2024.

  35. arXiv:2401.16645  [pdf, other

    cs.LG

    Speeding up and reducing memory usage for scientific machine learning via mixed precision

    Authors: Joel Hayford, Jacob Goldman-Wetzler, Eric Wang, Lu Lu

    Abstract: Scientific machine learning (SciML) has emerged as a versatile approach to address complex computational science and engineering problems. Within this field, physics-informed neural networks (PINNs) and deep operator networks (DeepONets) stand out as the leading techniques for solving partial differential equations by incorporating both physical equations and experimental data. However, training P… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 25 pages, 7 figures

  36. arXiv:2401.15847  [pdf, other

    cs.CV cs.AI cs.CL

    Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

    Authors: Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Xinze Guan, Xin Eric Wang

    Abstract: Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanced multimodal AI applications, such as agents that understand complex scenes and navigate through webpages, the skill of multipanel visual reasoning i… ▽ More

    Submitted 27 June, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: ACL 2024

  37. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2310.10648  [pdf, other

    cs.CL cs.AI

    Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes

    Authors: Rose E. Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, Dorottya Demszky

    Abstract: Scaling high-quality tutoring remains a major challenge in education. Due to growing demand, many platforms employ novice tutors who, unlike experienced educators, struggle to address student mistakes and thus fail to seize prime learning opportunities. Our work explores the potential of large language models (LLMs) to close the novice-expert knowledge gap in remediating math mistakes. We contribu… ▽ More

    Submitted 6 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: NAACL 2024. Code: https://github.com/rosewang2008/bridge

  39. arXiv:2310.10637  [pdf, other

    cs.CL

    "Mistakes Help Us Grow": Facilitating and Evaluating Growth Mindset Supportive Language in Classrooms

    Authors: Kunal Handa, Margaret Clapper, Jessica Boyle, Rose E Wang, Diyi Yang, David S Yeager, Dorottya Demszky

    Abstract: Teachers' growth mindset supportive language (GMSL)--rhetoric emphasizing that one's skills can be improved over time--has been shown to significantly reduce disparities in academic achievement and enhance students' learning outcomes. Although teachers espouse growth mindset principles, most find it difficult to adopt GMSL in their practice due the lack of effective coaching in this area. We explo… ▽ More

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  40. arXiv:2310.05872  [pdf, other

    cs.CV cs.AI cs.CL

    ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

    Authors: Kaiwen Zhou, Kwonjoon Lee, Teruhisa Misu, Xin Eric Wang

    Abstract: In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) on visual commonsense reasoning (VCR) problems. We find that VLMs and LLMs-based decision pipelines are good at different kinds of VCR problems. Pre-trained VLMs exhibit strong performance for problems involving understanding the literal visual content, which we no… ▽ More

    Submitted 17 May, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  41. arXiv:2310.03903  [pdf, other

    cs.CL cs.MA

    LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

    Authors: Saaket Agashe, Yue Fan, Anthony Reyna, Xin Eric Wang

    Abstract: The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by Large Language Models (LLMs) make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain. This benchmark evalua… ▽ More

    Submitted 2 April, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

  42. arXiv:2310.03675  [pdf, other

    cs.LG

    Hadamard Domain Training with Integers for Class Incremental Quantized Learning

    Authors: Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi

    Abstract: Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge p… ▽ More

    Submitted 5 October, 2023; originally announced October 2023.

  43. arXiv:2310.02239  [pdf, other

    cs.CV cs.AI

    MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens

    Authors: Kaizhi Zheng, Xuehai He, Xin Eric Wang

    Abstract: The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding. However, the simultaneous generation of images with coherent texts is still underdeveloped. Addressing this, we introduce a novel interleaved vision-and-language generation method, centered around the concept of ``generative vokens". These vokens serve as pivotal elements c… ▽ More

    Submitted 15 March, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: 23 pages, 10 figures

  44. arXiv:2309.03214  [pdf, other

    cs.AR cs.SE

    Test Primitive:A Straightforward Method To Decouple March

    Authors: Yindong Xiao, Shanshan Lu, Ensheng Wang, Ruiqi Zhu, Zhijian Dai

    Abstract: The academic community has made outstanding achievements in researching the March algorithm. However, the current fault modeling method, which centers on fault primitives, cannot be directly applied to analyzing the March algorithm. This paper proposes a new test primitive. The test primitives, which decouple the cell states from sensitization and detection operations, describe the common features… ▽ More

    Submitted 29 August, 2023; originally announced September 2023.

  45. arXiv:2308.11521   

    cs.CL cs.AI cs.LG

    Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

    Authors: Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang

    Abstract: Large language models (LLMs), such as ChatGPT, have emerged with astonishing capabilities approaching artificial general intelligence. While providing convenience for various societal needs, LLMs have also lowered the cost of generating harmful content. Consequently, LLM developers have deployed semantic-level defenses to recognize and reject prompts that may lead to inappropriate content. Unfortu… ▽ More

    Submitted 24 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Serious errors were found in the experiment, which may lead to the overturning of the overall conclusions of the paper

  46. arXiv:2308.07702  [pdf, other

    cs.CL

    Better Zero-Shot Reasoning with Role-Play Prompting

    Authors: Aobo Kong, Shiwan Zhao, Hao Chen, Qicheng Li, Yong Qin, Ruiqi Sun, Xin Zhou, Enzhi Wang, Xiaohang Dong

    Abstract: Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced nove… ▽ More

    Submitted 13 March, 2024; v1 submitted 15 August, 2023; originally announced August 2023.

    Comments: NAACL 2024, Main Conference

  47. arXiv:2308.07502  [pdf, other

    cs.HC cs.CV

    SpecTracle: Wearable Facial Motion Tracking from Unobtrusive Peripheral Cameras

    Authors: Yinan Xuan, Varun Viswanath, Sunny Chu, Owen Bartolf, Jessica Echterhoff, Edward Wang

    Abstract: Facial motion tracking in head-mounted displays (HMD) has the potential to enable immersive "face-to-face" interaction in a virtual environment. However, current works on facial tracking are not suitable for unobtrusive augmented reality (AR) glasses or do not have the ability to track arbitrary facial movements. In this work, we demonstrate a novel system called SpecTracle that tracks a user's fa… ▽ More

    Submitted 14 August, 2023; originally announced August 2023.

  48. arXiv:2307.15082  [pdf, other

    cs.HC cs.MA eess.SY

    Survey of Human Models for Verification of Human-Machine Systems

    Authors: Timothy E. Wang, Alessandro Pinto

    Abstract: We survey the landscape of human operator modeling ranging from the early cognitive models developed in artificial intelligence to more recent formal task models developed for model-checking of human machine interactions. We review human performance modeling and human factors studies in the context of aviation, and models of how the pilot interacts with automation in the cockpit. The purpose of th… ▽ More

    Submitted 25 July, 2023; originally announced July 2023.

  49. arXiv:2306.09343  [pdf, other

    cs.CL cs.AI

    SIGHT: A Large Annotated Dataset on Student Insights Gathered from Higher Education Transcripts

    Authors: Rose E. Wang, Pawan Wirawarn, Noah Goodman, Dorottya Demszky

    Abstract: Lectures are a learning experience for both students and teachers. Students learn from teachers about the subject material, while teachers learn from students about how to refine their instruction. However, online student feedback is unstructured and abundant, making it challenging for teachers to learn and improve. We take a step towards tackling this challenge. First, we contribute a dataset for… ▽ More

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: First two authors contributed equally. In the Proceedings of Innovative Use of NLP for Building Educational Applications 2023. The code and data are open-sourced here: https://github.com/rosewang2008/sight

  50. arXiv:2306.04879  [pdf, other

    cs.LG

    Augmenting Hessians with Inter-Layer Dependencies for Mixed-Precision Post-Training Quantization

    Authors: Clemens JS Schaefer, Navid Lambert-Shirzad, Xiaofan Zhang, Chiachen Chou, Tom Jablin, Jian Li, Elfie Guo, Caitlin Stanton, Siddharth Joshi, Yu Emma Wang

    Abstract: Efficiently serving neural network models with low latency is becoming more challenging due to increasing model complexity and parameter count. Model quantization offers a solution which simultaneously reduces memory footprint and compute requirements. However, aggressive quantization may lead to an unacceptable loss in model accuracy owing to differences in sensitivity to numerical imperfection a… ▽ More

    Submitted 7 June, 2023; originally announced June 2023.