Showing 1–50 of 137 results for author: Cai, M

Searching in archive cs.
  1. arXiv:2408.14762  [pdf, other]

    cs.LG cs.SI

    Explainable Hierarchical Urban Representation Learning for Commuting Flow Prediction

    Authors: Mingfei Cai, Yanbo Pang, Yoshihide Sekimoto

    Abstract: Commuting flow prediction is an essential task for municipal operations in the real world. Previous studies have revealed that it is feasible to estimate the commuting origin-destination (OD) demand within a city using multiple auxiliary data. However, most existing methods are not suitable to deal with a similar task at a large scale, namely within a prefecture or the whole nation, owing to the i…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 11 pages, 6 figures

  2. arXiv:2408.13787  [pdf, other]

    cs.LG cs.DC

    Mask-Encoded Sparsification: Mitigating Biased Gradients in Communication-Efficient Split Learning

    Authors: Wenxuan Zhou, Zhihao Qu, Shen-Huan Lyu, Miao Cai, Baoliu Ye

    Abstract: This paper introduces a novel framework designed to achieve a high compression ratio in Split Learning (SL) scenarios where resource-constrained devices are involved in large-scale model training. Our investigations demonstrate that compressing feature maps within SL leads to biased gradients that can negatively impact the convergence rates and diminish the generalization capabilities of the resul…

    Submitted 25 August, 2024; originally announced August 2024.

    Journal ref: Proceedings of the 27th European Conference on Artificial Intelligence, 2024

  3. arXiv:2408.11611  [pdf, other]

    cs.IR cs.LG

    DTN: Deep Multiple Task-specific Feature Interactions Network for Multi-Task Recommendation

    Authors: Yaowen Bi, Yuteng Lian, Jie Cui, Jun Liu, Peijian Wang, Guanghui Li, Xuejun Chen, Jinglin Zhao, Hao Wen, Jing Zhang, Zhaoqi Zhang, Wenzhuo Song, Yang Sun, Weiwei Zhang, Mingchen Cai, Guanxing Zhang

    Abstract: Neural-based multi-task learning (MTL) has been successfully applied to many recommendation applications. However, these MTL models (e.g., MMoE, PLE) did not consider feature interaction during the optimization, which is crucial for capturing complex high-order features and has been widely used in ranking models for real-world recommender systems. Moreover, through feature importance analysis acro…

    Submitted 23 August, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

  4. arXiv:2408.10635  [pdf, other]

    cs.AI cs.CL

    Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search

    Authors: Jonathan Light, Min Cai, Weiqin Chen, Guanzhi Wang, Xiusi Chen, Wei Cheng, Yisong Yue, Ziniu Hu

    Abstract: In this paper, we propose a new method Strategist that utilizes LLMs to acquire new skills for playing multi-agent games through a self-improvement process. Our method gathers quality feedback through self-play simulations with Monte Carlo tree search and LLM-based reflection, which can then be used to learn high-level strategic skills such as how to evaluate states that guide the low-level execut…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: website: https://llm-strategist.github.io

  5. arXiv:2408.05101  [pdf, other]

    cs.CL cs.AI

    MooER: LLM-based Speech Recognition and Translation Models from Moore Threads

    Authors: Junhao Xu, Zhenlin Liang, Yi Liu, Yichao Hu, Jian Li, Yajun Zheng, Meng Cai, Hua Wang

    Abstract: In this paper, we present MooER, a LLM-based large-scale automatic speech recognition (ASR) / automatic speech translation (AST) model of Moore Threads. A 5000h pseudo labeled dataset containing open source and self collected speech data is used for training. We achieve performance comparable to other open source models trained with up to hundreds of thousands of hours of labeled speech data. Mean…

    Submitted 9 August, 2024; originally announced August 2024.

  6. arXiv:2407.13596  [pdf, other]

    cs.CV

    EarthMarker: Visual Prompt Learning for Region-level and Point-level Remote Sensing Imagery Comprehension

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Jun Li, Yin Zhuang, Xuerui Mao

    Abstract: Recent advances in visual prompting in the natural image area have allowed users to interact with artificial intelligence (AI) tools through various visual marks such as box, point, and free-form shapes. However, due to the significant difference between the natural and remote sensing (RS) images, existing visual prompting models face challenges in RS scenarios. Moreover, RS MLLMs mainly focus on…

    Submitted 20 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  7. arXiv:2407.10972  [pdf, other]

    cs.CV cs.AI cs.LG

    VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation

    Authors: Bocheng Zou, Mu Cai, Jianrui Zhang, Yong Jae Lee

    Abstract: In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the world using geometry primitives such as polygons. Vector graphics (VG), on the other hand, offer a textual representation of visual content, which can be more c…

    Submitted 29 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Project Page: https://vgbench.github.io

  8. arXiv:2406.20095  [pdf, other]

    cs.RO cs.AI cs.CL cs.CV cs.LG

    LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

    Authors: Xiang Li, Cristina Mata, Jongwoo Park, Kumara Kahatapitiya, Yoo Sung Jang, Jinghuan Shang, Kanchana Ranasinghe, Ryan Burgert, Mu Cai, Yong Jae Lee, Michael S. Ryoo

    Abstract: Large Language Models (LLMs) equipped with extensive world knowledge and strong reasoning skills can tackle diverse tasks across domains, often by posing them as conversation-style instruction-response pairs. In this paper, we propose LLaRA: Large Language and Robotics Assistant, a framework which formulates robot action policy as conversations, and provides improved responses when trained with au…

    Submitted 28 June, 2024; originally announced June 2024.

  9. arXiv:2406.18020  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  10. arXiv:2406.13793  [pdf, other]

    cs.HC

    Exploring the Optimal Time Window for Predicting Cognitive Load Using Physiological Sensor Data

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Learning analytics has begun to use physiological signals because these have been linked with learners' cognitive and affective states. These signals, when interpreted through machine learning techniques, offer a nuanced understanding of the temporal dynamics of student learning experiences and processes. However, there is a lack of clear guidance on the optimal time window to use for analyzing ph…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: Presented at PhysioCHI: Towards Best Practices for Integrating Physiological Signals in HCI, May 11, 2024, Honolulu, HI, USA

  11. arXiv:2406.09400  [pdf, other]

    cs.CV cs.LG

    Yo'LLaVA: Your Personalized Language and Vision Assistant

    Authors: Thao Nguyen, Haotian Liu, Yuheng Li, Mu Cai, Utkarsh Ojha, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) have shown remarkable capabilities across a variety of tasks (e.g., image captioning, visual question answering). While broad, their knowledge remains generic (e.g., recognizing a dog), and they are unable to handle personalized subjects (e.g., recognizing a user's pet dog). Human reasoning, in contrast, typically operates within the context of specific subjects in o…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Project page: https://thaoshibe.github.io/YoLLaVA

  12. arXiv:2406.02721  [pdf, other]

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro…

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  13. arXiv:2405.20718  [pdf, other]

    cs.IR cs.AI

    Popularity-Aware Alignment and Contrast for Mitigating Popularity Bias

    Authors: Miaomiao Cai, Lei Chen, Yifan Wang, Haoyue Bai, Peijie Sun, Le Wu, Min Zhang, Meng Wang

    Abstract: Collaborative Filtering (CF) typically suffers from the significant challenge of popularity bias due to the uneven distribution of items in real-world datasets. This bias leads to a significant accuracy gap between popular and unpopular items. It not only hinders accurate user preference understanding but also exacerbates the Matthew effect in recommendation systems. To alleviate popularity bias,…

    Submitted 11 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by KDD 2024

  14. arXiv:2405.17430  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Matryoshka Multimodal Models

    Authors: Mu Cai, Jianwei Yang, Jianfeng Gao, Yong Jae Lee

    Abstract: Large Multimodal Models (LMMs) such as LLaVA have shown strong performance in visual-linguistic reasoning. These models first embed images into a fixed large number of visual tokens and then feed them into a Large Language Model (LLM). However, this design causes an excessive number of tokens for dense visual scenarios such as high-resolution images and videos, leading to great inefficiency. While…

    Submitted 29 July, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://matryoshka-mm.github.io/

  15. Multimodality Invariant Learning for Multimedia-Based New Item Recommendation

    Authors: Haoyue Bai, Le Wu, Min Hou, Miaomiao Cai, Zhuangzhuang He, Yuyang Zhou, Richang Hong, Meng Wang

    Abstract: Multimedia-based recommendation provides personalized item suggestions by learning the content preferences of users. With the proliferation of digital devices and APPs, a huge number of new items are created rapidly over time. How to quickly provide recommendations for new items at the inference time is challenging. What's worse, real-world items exhibit varying degrees of modality missing (e.g., m…

    Submitted 28 April, 2024; originally announced May 2024.

  16. arXiv:2405.10818  [pdf]

    cs.SI

    Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System

    Authors: Jiawei Feng, Mengsi Cai, Fangze Dai, Tianci Bu, Xiaoyu Zhang, Huijun Zheng, Xin Lu

    Abstract: In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2304.10428 by other authors

  17. arXiv:2405.06670  [pdf, other]

    cs.LO cs.LG

    TLINet: Differentiable Neural Network Temporal Logic Inference

    Authors: Danyang Li, Mingyu Cai, Cristian-Ioan Vasile, Roberto Tron

    Abstract: There has been a growing interest in extracting formal descriptions of the system behaviors from data. Signal Temporal Logic (STL) is an expressive formal language used to describe spatial-temporal properties with interpretability. This paper introduces TLINet, a neural-symbolic framework for learning STL formulas. The computation in TLINet is differentiable, enabling the usage of off-the-shelf gr…

    Submitted 14 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  18. arXiv:2405.05543  [pdf, ps, other]

    cs.HC

    Predicting Cognitive Load Using Sensor Data in a Literacy Game

    Authors: Minghao Cai, Carrie Demmans Epp

    Abstract: Educational games are being increasingly used to support self-paced learning. However, educators and system designers often face challenges in monitoring student affect and cognitive load. Existing assessments in game-based learning environments (GBLEs) tend to focus more on outcomes rather than processes, potentially overlooking key aspects of the learning journey that include learner affect and…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This work has been accepted by the 17th International Conference on Educational Data Mining

  19. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  20. arXiv:2403.19770  [pdf, other]

    cs.RO cs.AI cs.LG

    Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks

    Authors: Mingyu Cai, Karankumar Patel, Soshi Iba, Songpo Li

    Abstract: In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation to improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user's intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. The framework presents an intention estimation technique at hie…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  21. arXiv:2403.18348  [pdf, other]

    cs.IR

    Sequential Recommendation with Latent Relations based on Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

    Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeli…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  22. arXiv:2403.18325  [pdf, other]

    cs.IR

    Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

    Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the for…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024

  23. arXiv:2403.15388  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

    Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

    Abstract: Large Multimodal Models (LMMs) have shown significant visual reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically take in a fixed and large amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which…

    Submitted 22 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://llava-prumerge.github.io/

  24. arXiv:2403.14125  [pdf, other]

    stat.ML cs.LG

    Learning causal graphs using variable grouping according to ancestral relationship

    Authors: Ming Cai, Hisayuki Hara

    Abstract: Several causal discovery algorithms have been proposed. However, when the sample size is small relative to the number of variables, the accuracy of estimating causal graphs using existing methods decreases. And some methods are not feasible when the sample size is smaller than the number of variables. To circumvent these problems, some researchers proposed causal structure learning algorithms usin…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  25. arXiv:2403.04369  [pdf, other]

    cs.AI cs.CL

    From Graph to Word Bag: Introducing Domain Knowledge to Confusing Charge Prediction

    Authors: Ang Li, Qiangchao Chen, Yiquan Wu, Ming Cai, Xiang Zhou, Fei Wu, Kun Kuang

    Abstract: Confusing charge prediction is a challenging task in legal AI, which involves predicting confusing charges based on fact descriptions. While existing charge prediction methods have shown impressive performance, they face significant challenges when dealing with confusing charges, such as Snatch and Robbery. In the legal domain, constituent elements play a pivotal role in distinguishing confusing c…

    Submitted 24 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  26. arXiv:2403.04366  [pdf, other]

    cs.AI

    Enhancing Court View Generation with Knowledge Injection and Guidance

    Authors: Ang Li, Yiquan Wu, Yifei Liu, Fei Wu, Ming Cai, Kun Kuang

    Abstract: Court View Generation (CVG) is a challenging task in the field of Legal Artificial Intelligence (LegalAI), which aims to generate court views based on the plaintiff claims and the fact descriptions. While Pretrained Language Models (PLMs) have showcased their prowess in natural language generation, their application to the complex, knowledge-intensive domain of CVG often reveals inherent limitatio…

    Submitted 7 March, 2024; originally announced March 2024.

  27. arXiv:2403.03790  [pdf, other]

    cs.CV

    Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Guoqiang Lei, Yin Zhuang, Xuerui Mao

    Abstract: Ship detection needs to identify ship locations from remote sensing (RS) scenes. Due to different imaging payloads, various appearances of ships, and complicated background interference from the bird's eye view, it is difficult to set up a unified paradigm for achieving multi-source ship detection. To address this challenge, in this article, leveraging the large language models (LLMs)'s powerful g…

    Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  28. arXiv:2403.03730  [pdf, other]

    cs.CV cs.AI cs.LG

    Learning 3D object-centric representation through prediction

    Authors: John Day, Tushar Arora, Jirui Liu, Li Erran Li, Ming Bo Cai

    Abstract: As part of human core knowledge, the representation of objects is the building block of mental representation that supports high-level concepts and symbolic reasoning. While humans develop the ability of perceiving objects situated in 3D environments without supervision, models that learn the same set of abilities with similar constraints faced by human infants are lacking. Towards this end, we de…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 11 figures. Project webpage can be found at https://jday54.github.io/opple_site/

    ACM Class: I.2.10; I.4.8; I.4.6; I.4.10; I.2.6

  29. arXiv:2402.18166  [pdf, other]

    cs.IR

    Sequence-level Semantic Representation Fusion for Recommender Systems

    Authors: Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao

    Abstract: With the rapid development of recommender systems, there is increasing side information that can be employed to improve the recommendation performance. Specially, we focus on the utilization of the associated textual data of items (eg product title) and study how text features can be effectively fused with ID features in sequential recommendation. However, there exists distinct data charact…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  30. arXiv:2402.13254  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    CounterCurate: Enhancing Physical and Semantic Visio-Linguistic Compositional Reasoning via Counterfactual Examples

    Authors: Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee

    Abstract: We propose CounterCurate, a framework to comprehensively improve the visio-linguistic compositional reasoning capability for both contrastive and generative multimodal models. In particular, we identify two critical under-explored problems: the neglect of the physically grounded reasoning (counting and position understanding) and the potential of using highly capable text and image generation mode…

    Submitted 12 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 15 pages, 6 figures, 12 tables, Project Page: https://countercurate.github.io/

  31. arXiv:2402.13194  [pdf, other]

    quant-ph cs.IT

    Quantum Wiretap Channel Coding Assisted by Noisy Correlation

    Authors: Minglai Cai, Andreas Winter

    Abstract: We consider the private classical capacity of a quantum wiretap channel, where the users (sender Alice, receiver Bob, and eavesdropper Eve) have access to the resource of a shared quantum state, additionally to their channel inputs and outputs. An extreme case is maximal entanglement or a secret key between Alice and Bob, both of which would allow for onetime padding the message. But here both the…

    Submitted 11 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Journal ref: Proc. ISIT 2024, Athens (Greece), 7-12 July 2024

  32. arXiv:2401.16822  [pdf, other]

    cs.CV

    EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

    Authors: Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao

    Abstract: Multi-modal large language models (MLLMs) have demonstrated remarkable success in vision and visual-language tasks within the natural image domain. Owing to the significant diversities between the natural and remote sensing (RS) images, the development of MLLMs in the RS domain is still in the infant stage. To fill the gap, a pioneer MLLM named EarthGPT integrating various multi-sensor RS interpre…

    Submitted 8 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  33. arXiv:2401.04997  [pdf, other]

    cs.IR

    Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis

    Authors: Lanling Xu, Junjie Zhang, Bingqian Li, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Recently, large language models such as ChatGPT have showcased remarkable abilities in solving general tasks, demonstrating the potential for applications in recommender systems. To assess how effectively LLMs can be used in recommendation tasks, our study primarily focuses on employing LLMs as recommender systems through prompt engineering. We propose a general framework for utilizing LLMs in…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: 40 pages, under review

  34. arXiv:2312.10897  [pdf, other]

    cs.CL cs.AI cs.LG

    Generalized Category Discovery with Large Language Models in the Loop

    Authors: Wenbin An, Wenkai Shi, Feng Tian, Haonan Lin, QianYing Wang, Yaqiang Wu, Mingxiang Cai, Luyan Wang, Yan Chen, Haiping Zhu, Ping Chen

    Abstract: Generalized Category Discovery (GCD) is a crucial task that aims to recognize both known and novel categories from a set of unlabeled data by utilizing a few labeled data with only known categories. Due to the lack of supervision and category information, current methods usually perform poorly on novel categories and struggle to reveal semantic meanings of the discovered clusters, which limits the…

    Submitted 26 May, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted by ACL 2024 Findings, code and data are available at https://github.com/Lackel/LOOP

  35. arXiv:2312.08153  [pdf, other]

    physics.comp-ph cs.LG

    ρ-Diffusion: A diffusion-based density estimation framework for computational physics

    Authors: Maxwell X. Cai, Kin Long Kelvin Lee

    Abstract: In physics, density ρ(·) is a fundamentally important scalar function to model, since it describes a scalar field or a probability density function that governs a physical process. Modeling ρ(·) typically scales poorly with parameter space, however, and quickly becomes prohibitively difficult and computationally expensive. One promising avenue to bypass this is to leverage the capabili…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 6 pages, 2 figures, accepted for publication at the NeurIPS 2023 workshop "Machine Learning and the Physical Sciences"

  36. arXiv:2312.00784  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

    Authors: Mu Cai, Haotian Liu, Dennis Park, Siva Karthik Mustikovela, Gregory P. Meyer, Yuning Chai, Yong Jae Lee

    Abstract: While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary visual…

    Submitted 26 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: Accepted to CVPR2024. Project page: https://vip-llava.github.io/

  37. arXiv:2311.17318  [pdf]

    cs.CY

    Impact of Indoor Mobility Behavior on the Respiratory Infectious Diseases Transmission Trends

    Authors: Ziwei Cui, Ming Cai, Zheng Zhu, Gongbo Chen, Yao Xiao

    Abstract: The importance of indoor human mobility in the transmission dynamics of respiratory infectious diseases has been acknowledged. Previous studies have predominantly addressed a single type of mobility behavior such as queueing and a series of behaviors under specific scenarios. However, these studies ignore the abstraction of mobility behavior in various scenes and the critical examination of how th…

    Submitted 28 November, 2023; originally announced November 2023.

  38. arXiv:2311.01487  [pdf, other]

    cs.CV cs.CL

    What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

    Authors: Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen

    Abstract: Visual instruction tuning is an essential approach to improving the zero-shot generalization capability of Multi-modal Large Language Models (MLLMs). A surge of visual instruction datasets with various focuses and characteristics have been proposed recently, enabling MLLMs to achieve surprising results on evaluation benchmarks. To develop more capable MLLMs, in this paper, we aim to investigate a…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Work in progress

  39. arXiv:2310.20398  [pdf, other]

    astro-ph.EP astro-ph.IM cs.LG physics.comp-ph

    A hybrid approach for solving the gravitational N-body problem with Artificial Neural Networks

    Authors: Veronica Saz Ulibarrena, Philipp Horn, Simon Portegies Zwart, Elena Sellentin, Barry Koren, Maxwell X. Cai

    Abstract: Simulating the evolution of the gravitational N-body problem becomes extremely computationally expensive as N increases since the problem complexity scales quadratically with the number of bodies. We study the use of Artificial Neural Networks (ANNs) to replace expensive parts of the integration of planetary systems. Neural networks that include physical knowledge have grown in popularity in the l…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted for publication in the Journal of Computational Physics

  40. arXiv:2310.13610  [pdf, other]

    cs.CL cs.AI

    Make Your Decision Convincing! A Unified Two-Stage Framework: Self-Attribution and Decision-Making

    Authors: Yanrui Du, Sendong Zhao, Haochun Wang, Yuhan Chen, Rui Bai, Zewen Qiang, Muzhen Cai, Bing Qin

    Abstract: Explaining black-box model behavior with natural language has achieved impressive results in various NLP tasks. Recent research has explored the utilization of subsequences from the input text as a rationale, providing users with evidence to support the model decision. Although existing frameworks excel in generating high-quality rationales while achieving high task performance, they neglect to ac…

    Submitted 20 October, 2023; originally announced October 2023.

  41. arXiv:2310.05036  [pdf, other]

    cs.AI cs.CL

    AvalonBench: Evaluating LLMs Playing the Game of Avalon

    Authors: Jonathan Light, Min Cai, Sheng Shen, Ziniu Hu

    Abstract: In this paper, we explore the potential of Large Language Models (LLMs) Agents in playing the strategic social deduction game, Resistance Avalon. Players in Avalon are challenged not only to make informed decisions based on dynamically evolving game phases, but also to engage in discussions where they must deceive, deduce, and negotiate with other players. These characteristics make Avalon a compe…

    Submitted 8 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  42. arXiv:2310.05035

    cs.CL cs.AI

    Self-Convinced Prompting: Few-Shot Question Answering with Repeated Introspection

    Authors: Haodi Zhang, Min Cai, Xinhe Zhang, Chen Jason Zhang, Rui Mao, Kaishun Wu

    Abstract: While large language models (LLMs) such as ChatGPT and PaLM have demonstrated remarkable performance in various language understanding and generation tasks, their capabilities in complex reasoning and intricate knowledge utilization still fall short of human-level proficiency. Recent studies have established the effectiveness of prompts in steering LLMs towards generating desired outputs. Building…

    Submitted 10 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

  43. arXiv:2310.04610

    cs.AI cs.LG

    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies

    Authors: Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, Masahiro Tanaka, Xiaoxia Wu, Jeff Rasley, Ammar Ahmad Awan, Connor Holmes, Martin Cai, Adam Ghanem, Zhongzhu Zhou, Yuxiong He, Pete Luferenko, Divya Kumar, Jonathan Weyn, Ruixiong Zhang, Sylwester Klocek, Volodymyr Vragov, Mohammed AlQuraishi, Gustaf Ahdritz, Christina Floristean, Cristina Negri, et al. (67 additional authors not shown)

    Abstract: In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present DeepSpeed4Science initiative (deepspeed4science.ai) which aims to build unique…

    Submitted 11 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  44. arXiv:2309.12530

    cs.CV

    A Sentence Speaks a Thousand Images: Domain Generalization through Distilling CLIP with Language Guidance

    Authors: Zeyi Huang, Andy Zhou, Zijian Lin, Mu Cai, Haohan Wang, Yong Jae Lee

    Abstract: Domain generalization studies the problem of training a model with samples from several domains (or distributions) and then testing the model with samples from a new, unseen domain. In this paper, we propose a novel approach for domain generalization that leverages recent advances in large vision-language models, specifically a CLIP teacher model, to train a smaller model that generalizes to unsee…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: To appear at ICCV 2023

  45. arXiv:2309.10313

    cs.CL cs.AI cs.LG

    Investigating the Catastrophic Forgetting in Multimodal Large Language Models

    Authors: Yuexiang Zhai, Shengbang Tong, Xiao Li, Mu Cai, Qing Qu, Yong Jae Lee, Yi Ma

    Abstract: Following the success of GPT4, there has been a surge in interest in multimodal large language model (MLLM) research. This line of research focuses on developing general-purpose LLMs through fine-tuning pre-trained LLMs and vision models. However, catastrophic forgetting, a notorious phenomenon where the fine-tuned model fails to retain similar performance compared to the pre-trained model, still…

    Submitted 5 December, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

  46. arXiv:2309.04198

    cs.CL

    Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain

    Authors: Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei Cao, Bing Qin

    Abstract: Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) as Domain-Specific LLMs via feeding specific-domain data. However, these privatization efforts often ignored a critical aspect: Dual Logic Ability, which is a core reasoning ability for LLMs. The dual logic ability of LLMs ensures that they can maintain a consistent stance when confronted with both posit…

    Submitted 23 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

  47. arXiv:2309.04175

    cs.CL cs.AI

    Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

    Authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited domain knowledge. Such shortcomings pose potential risks in the utilization of LLMs within medical contexts. To address this challenge, we propose knowledge-tunin…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 11 pages, 5 figures

  48. arXiv:2309.04174

    cs.CL cs.AI

    Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

    Authors: Haochun Wang, Sendong Zhao, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu

    Abstract: Prompt-based classification adapts tasks to a cloze question format utilizing the [MASK] token and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Mean…

    Submitted 29 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: Accepted by AAAI 2024, 11 pages, 3 figures

  49. arXiv:2308.12033

    cs.CL cs.AI

    PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine

    Authors: Chenrui Zhang, Lin Liu, Jinpeng Wang, Chuyuan Wang, Xiao Sun, Hongyu Wang, Mingchen Cai

    Abstract: As an effective tool for eliciting the power of Large Language Models (LLMs), prompting has recently demonstrated unprecedented abilities across a variety of complex tasks. To further improve the performance, prompt ensemble has attracted substantial interest for tackling the hallucination and instability of LLMs. However, existing methods usually adopt a two-stage paradigm, which requires a pre-p…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: 8 pages, 4 figures

  50. arXiv:2308.09370

    cs.CL cs.SD eess.AS

    TrOMR:Transformer-Based Polyphonic Optical Music Recognition

    Authors: Yixuan Li, Huaping Liu, Qiang Jin, Miaomiao Cai, Peng Li

    Abstract: Optical Music Recognition (OMR) is an important technology in music and has been researched for a long time. Previous approaches for OMR are usually based on CNN for image understanding and RNN for music symbol classification. In this paper, we propose a transformer-based approach with excellent global perceptual capability for end-to-end polyphonic OMR, called TrOMR. We also introduce a novel con…

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)