Skip to main content

Showing 1–50 of 1,943 results for author: Zhang, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.12967  [pdf, ps, other

    cs.DS cs.LG math.ST stat.ML

    Rényi-infinity constrained sampling with $d^3$ membership queries

    Authors: Yunbum Kook, Matthew S. Zhang

    Abstract: Uniform sampling over a convex body is a fundamental algorithmic problem, yet the convergence in KL or Rényi divergence of most samplers remains poorly understood. In this work, we propose a constrained proximal sampler, a principled and simple algorithm that possesses elegant convergence guarantees. Leveraging the uniform ergodicity of this sampler, we show that it converges in the Rényi-infinity… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 30 pages

  2. arXiv:2407.12023  [pdf, other

    cs.CL cs.AI

    CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models

    Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Zhi-Long Ji, Jin-Feng Bai, Zhen-Ru Pan, Fan-Hu Zeng, Jian Xu, Jia-Xin Zhang, Cheng-Lin Liu

    Abstract: Due to the rapid advancements in multimodal large language models, evaluating their multimodal mathematical capabilities continues to receive wide attention. Despite the datasets like MathVista proposed benchmarks for assessing mathematical capabilities in multimodal scenarios, there is still a lack of corresponding evaluation tools and datasets for fine-grained assessment in the context of K12 ed… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2407.11681  [pdf, other

    cs.CL

    MINI-LLM: Memory-Efficient Structured Pruning for Large Language Models

    Authors: Hongrong Cheng, Miao Zhang, Javen Qinfeng Shi

    Abstract: As Large Language Models (LLMs) grow dramatically in size, there is an increasing trend in compressing and speeding up these models. Previous studies have highlighted the usefulness of gradients for importance scoring in neural network compressing, especially in pruning medium-size networks. However, the substantial memory requirements involved in calculating gradients with backpropagation impede… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 13 pages

  4. arXiv:2407.11467  [pdf, other

    cs.HC

    TEXasGAN: Tactile Texture Exploration and Synthesis System Using Generative Adversarial Network

    Authors: Mingxin Zhang, Shun Terui, Yasutoshi Makino, Hiroyuki Shinoda

    Abstract: To create more realistic experiences in human-virtual object interactions, texture rendering has become a research hotspot in recent years. Different frequency components of designed vibrations can activate texture-related sensations due to similar receptors. However, designing specific vibrations for numerous real-world materials is impractical. Therefore, this study proposes a human-in-the-loop… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  5. arXiv:2407.11219  [pdf, other

    cs.CV eess.IV

    TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration

    Authors: Nian Wu, Jiarui Xing, Miaomiao Zhang

    Abstract: This paper presents a novel approach, termed {\em Temporal Latent Residual Network (TLRN)}, to predict a sequence of deformation fields in time-series image registration. The challenge of registering time-series images often lies in the occurrence of large motions, especially when images differ significantly from a reference (e.g., the start of a cardiac cycle compared to the peak stretching phase… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 10 pages. Accepted by MICCAI 2024

  6. arXiv:2407.09771  [pdf, other

    cs.CR cs.DB

    Protecting Data Buyer Privacy in Data Markets

    Authors: Minxing Zhang, Jian Pei

    Abstract: Data markets serve as crucial platforms facilitating data discovery, exchange, sharing, and integration among data users and providers. However, the paramount concern of privacy has predominantly centered on protecting privacy of data owners and third parties, neglecting the challenges associated with protecting the privacy of data buyers. In this article, we address this gap by modeling the intri… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  7. arXiv:2407.09709  [pdf, other

    cs.LG cs.CL

    GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

    Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang

    Abstract: Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  8. arXiv:2407.08555  [pdf, other

    eess.IV cs.CV

    SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

    Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

    Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods especially for single-stage methods, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a singular vertebra. For multi-stage methods, ver… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Under review

  9. arXiv:2407.08454  [pdf, other

    cs.CL

    Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks

    Authors: Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang

    Abstract: How to efficiently serve Large Language Models (LLMs) has become a pressing issue because of their huge computational cost in their autoregressive generation process. To mitigate computational costs, LLMs often employ the KV Cache technique to improve the generation speed. While improving the computational efficiency, the storage requirements of the KV cache are substantial, particularly in long-c… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.07520  [pdf, other

    cs.CV

    IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection

    Authors: Mingjin Zhang, Yuchun Wang, Jie Guo, Yunsong Li, Xinbo Gao, Jing Zhang

    Abstract: The recent Segment Anything Model (SAM) is a significant advancement in natural image segmentation, exhibiting potent zero-shot performance suitable for various downstream image segmentation tasks. However, directly utilizing the pretrained SAM for Infrared Small Target Detection (IRSTD) task falls short in achieving satisfying performance due to a notable domain gap between natural and infrared i… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures, to be published in ECCV2024

  11. arXiv:2407.07357  [pdf, ps, other

    cs.LG q-bio.MN

    A deep graph model for the signed interaction prediction in biological network

    Authors: Shuyi Jin, Mengji Zhang, Meijie Wang, Lun Yu

    Abstract: In pharmaceutical research, the strategy of drug repurposing accelerates the development of new therapies while reducing R&D costs. Network pharmacology lays the theoretical groundwork for identifying new drug indications, and deep graph models have become essential for their precision in mapping complex biological networks. Our study introduces an advanced graph model that utilizes graph convolut… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  12. arXiv:2407.07327  [pdf, other

    cs.AI

    Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

    Authors: Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

    Abstract: Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained st… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: under review by journal

  13. arXiv:2407.06628  [pdf, other

    cs.CV

    Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition

    Authors: Mingfang Zhang, Yifei Huang, Ruicong Liu, Yoichi Sato

    Abstract: Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs can capture accurate motion signals while being robust to lighting variation and occlusion. While these characteristics are intuitively valuable to help egocentric action recognition, the potential of IMUs remains under-explored. In this work, we present a novel method for action recognition that integrates motio… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  14. arXiv:2407.06191  [pdf, other

    cs.CV

    Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

    Authors: Zhangyang Qi, Yunhan Yang, Mengchen Zhang, Long Xing, Xiaoyang Wu, Tong Wu, Dahua Lin, Xihui Liu, Jiaqi Wang, Hengshuang Zhao

    Abstract: Recent advances in 3D AIGC have shown promise in directly creating 3D objects from text and images, offering significant cost savings in animation and product design. However, detailed edit and customization of 3D assets remains a long-standing challenge. Specifically, 3D Generation methods lack the ability to follow finely detailed instructions as precisely as their 2D image creation counterparts… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project Page: https://tailor3d-2024.github.io/

  15. arXiv:2407.06188  [pdf, other

    cs.CV

    CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

    Authors: Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

    Abstract: Crowd Motion Generation is essential in entertainment industries such as animation and games as well as in strategic fields like urban simulation and planning. This new task requires an intricate integration of control and generation to realistically synthesize crowd dynamics under specific spatial and semantic constraints, whose challenges are yet to be fully explored. On the one hand, existing h… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Project page: https://gxyes.github.io/projects/CrowdMoGen.html

  16. arXiv:2407.06153  [pdf, other

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  17. arXiv:2407.06136  [pdf, other

    cs.CV

    Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning

    Authors: Xiaojie Li, Yibo Yang, Jianlong Wu, Bernard Ghanem, Liqiang Nie, Min Zhang

    Abstract: Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples while preserving the knowledge of previously learned classes. Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially, prone to overfitting to the current session. Existing dynamic strateg… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/xiaojieli0903/Mamba-FSCIL

  18. arXiv:2407.06048  [pdf, other

    cs.CL cs.CV

    Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation

    Authors: Alan Wu, Ye Yuan, Ming Zhang

    Abstract: Visually impaired people are a large group who can only use braille for reading and writing. However, the lack of special educational resources is the bottleneck for educating them. Educational equity is a reflection of the level of social civilization, cultural equality, and individual dignity. Facilitating and improving lifelong learning channels for the visually impaired is of great significanc… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper is submitted to NeurIPS 2024 High School Project Track

  19. arXiv:2407.05510  [pdf, other

    cs.AR cs.ET cs.LG

    SCATTER: Algorithm-Circuit Co-Sparse Photonic Accelerator with Thermal-Tolerant, Power-Efficient In-situ Light Redistribution

    Authors: Ziang Yin, Nicholas Gangi, Meng Zhang, Jeff Zhang, Rena Huang, Jiaqi Gu

    Abstract: Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads. However, limited reconfigurability, high electrical-optical conversion cost, and thermal sensitivity limit the deployment of current optical analog computing engines to support power-restricted, performance-sensitive AI workloads at scale. Sparsity provides a great… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  20. arXiv:2407.05310  [pdf, other

    eess.SP cs.NE cs.SD eess.AS

    Ternary Spike-based Neuromorphic Signal Processing System

    Authors: Shuai Wang, Dehao Zhang, Ammar Belatreche, Yichen Xiao, Hongyu Qing, Wenjie We, Malu Zhang, Yang Yang

    Abstract: Deep Neural Networks (DNNs) have been successfully implemented across various signal processing fields, resulting in significant enhancements in performance. However, DNNs generally require substantial computational resources, leading to significant economic costs and posing challenges for their deployment on resource-constrained edge devices. In this study, we take advantage of spiking neural net… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  21. arXiv:2407.05282  [pdf, other

    cs.CV

    UltraEdit: Instruction-based Fine-Grained Image Editing at Scale

    Authors: Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang

    Abstract: This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the drawbacks in existing image editing datasets like InstructPix2Pix and MagicBrush, and provide a systematic approach to producing massive and high-quality image editing samples. UltraEdit offers several distinct a… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 32 pages, 14 figures

  22. arXiv:2407.05047  [pdf, other

    cs.AI

    MFE-ETP: A Comprehensive Evaluation Benchmark for Multi-modal Foundation Models on Embodied Task Planning

    Authors: Min Zhang, Jianye Hao, Xian Fu, Peilong Han, Hao Zhang, Lei Shi, Hongyao Tang, Yan Zheng

    Abstract: In recent years, Multi-modal Foundation Models (MFMs) and Embodied Artificial Intelligence (EAI) have been advancing side by side at an unprecedented pace. The integration of the two has garnered significant attention from the AI research community. In this work, we attempt to provide an in-depth and comprehensive evaluation of the performance of MFM s on embodied task planning, aiming to shed lig… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  23. arXiv:2407.05021  [pdf, other

    cs.CV

    Incremental Multiview Point Cloud Registration

    Authors: Xiaoya Cheng, Yu Liu, Maojun Zhang, Shen Yan

    Abstract: In this paper, we present a novel approach for multiview point cloud registration. Different from previous researches that typically employ a global scheme for multiview registration, we propose to adopt an incremental pipeline to progressively align scans into a canonical coordinate system. Specifically, drawing inspiration from image-based 3D reconstruction, our approach first builds a sparse sc… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  24. arXiv:2407.04248  [pdf, other

    stat.ML cs.LG

    Machine Learning for Complex Systems with Abnormal Pattern by Exception Maximization Outlier Detection Method

    Authors: Zhikun Zhang, Yiting Duan, Xiangjun Wang, Mingyuan Zhang

    Abstract: This paper proposes a novel fast online methodology for outlier detection called the exception maximization outlier detection method(EMODM), which employs probabilistic models and statistical algorithms to detect abnormal patterns from the outputs of complex systems. The EMODM is based on a two-state Gaussian mixture model and demonstrates strong performance in probability anomaly detection workin… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  25. arXiv:2407.03884  [pdf, other

    cs.CL cs.AI

    Planning with Large Language Models for Conversational Agents

    Authors: Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

    Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal during user uncooperation, such as persuasive dialogue. Existing research cannot be un… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  26. arXiv:2407.03515  [pdf, other

    stat.ML cs.LG

    Feature-Specific Coefficients of Determination in Tree Ensembles

    Authors: Zhongli Jiang, Dabao Zhang, Min Zhang

    Abstract: Tree ensemble methods provide promising predictions with models difficult to interpret. Recent introduction of Shapley values for individualized feature contributions, accompanied with several fast computing algorithms for predicted values, shows intriguing results. However, individualizing coefficients of determination, aka $R^2$, for each feature is challenged by the underlying quadratic losses,… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  27. arXiv:2407.03125  [pdf, other

    cs.LG cs.AI

    Foundations and Frontiers of Graph Learning Theory

    Authors: Yu Huang, Min Zhou, Menglin Yang, Zhen Wang, Muhan Zhang, Jie Wang, Hong Xie, Hao Wang, Defu Lian, Enhong Chen

    Abstract: Recent advancements in graph learning have revolutionized the way to understand and analyze data with complex structures. Notably, Graph Neural Networks (GNNs), i.e. neural network architectures designed for learning graph representations, have become a popular paradigm. With these models being usually characterized by intuition-driven design or highly intricate components, placing them within the… ▽ More

    Submitted 7 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 35pages,273references. Github link: https://github.com/minehly/awesome-paper-for-graph-learning-theory

  28. arXiv:2407.02894  [pdf, other

    cs.CL cs.AI

    Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation

    Authors: Zhibin Lan, Liqiang Niu, Fandong Meng, Jie Zhou, Min Zhang, Jinsong Su

    Abstract: In-image machine translation (IIMT) aims to translate an image containing texts in source language into an image containing translations in target language. In this regard, conventional cascaded methods suffer from issues such as error propagation, massive parameters, and difficulties in deployment and retaining visual characteristics of the input image. Thus, constructing end-to-end models has be… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Findings

  29. arXiv:2407.02614  [pdf, other

    cs.HC

    AcuVR: Enhancing Acupuncture Training Workflow with Virtual Reality

    Authors: Menghe Zhang, Chen Chen, Matin Yarmand, Anish Rajeshkumar, Nadir Weibel

    Abstract: Acupuncture is a widely adopted medical practice that involves inserting thin needles into specific points on the body to alleviate pain and treat various health conditions. Current learning practices heavily rely on 2D atlases and practice on peers, which are notably less intuitive and pose risks, particularly in sensitive areas such as the eyes. To address these challenges, we introduce AcuVR, a… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 10 pages

    ACM Class: J.3; J.4; H.5

  30. arXiv:2407.02229  [pdf, other

    cs.CV

    LaMoD: Latent Motion Diffusion Model For Myocardial Strain Generation

    Authors: Jiarui Xing, Nivetha Jayakumar, Nian Wu, Yu Wang, Frederick H. Epstein, Miaomiao Zhang

    Abstract: Motion and deformation analysis of cardiac magnetic resonance (CMR) imaging videos is crucial for assessing myocardial strain of patients with abnormal heart functions. Recent advances in deep learning-based image registration algorithms have shown promising results in predicting motion fields from routinely acquired CMR sequences. However, their accuracy often diminishes in regions with subtle ap… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  31. arXiv:2407.02043  [pdf, other

    cs.CL

    Concise and Precise Context Compression for Tool-Using Language Models

    Authors: Yang Xu, Yunlong Feng, Honglin Mu, Yutai Hou, Yitong Li, Xinghao Wang, Wanjun Zhong, Zhongyang Li, Dandan Tu, Qingfu Zhu, Min Zhang, Wanxiang Che

    Abstract: Through reading the documentation in the context, tool-using language models can dynamically extend their capability using external tools. The cost is that we have to input lengthy documentation every time the model needs to use the tool, occupying the input window as well as slowing down the decoding process. Given the progress in general-purpose compression, soft context compression is a suita… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  32. arXiv:2407.01284  [pdf, other

    cs.AI cs.CL cs.CV cs.LG cs.SC

    We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

    Authors: Runqi Qiao, Qiuna Tan, Guanting Dong, Minhui Wu, Chong Sun, Xiaoshuai Song, Zhuoma GongQue, Shanglin Lei, Zhe Wei, Miaoxuan Zhang, Runfeng Qiao, Yifan Zhang, Xiao Zong, Yida Xu, Muxi Diao, Zhimin Bao, Chen Li, Honggang Zhang

    Abstract: Visual mathematical reasoning, as a fundamental visual reasoning ability, has received widespread attention from the Large Multimodal Models (LMMs) community. Existing benchmarks, such as MathVista and MathVerse, focus more on the result-oriented performance but neglect the underlying principles in knowledge acquisition and generalization. Inspired by human-like mathematical reasoning, we introduc… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress

  33. arXiv:2407.01274  [pdf

    cs.CY cs.CL

    Leveraging Large Language Models for Actionable Course Evaluation Student Feedback to Lecturers

    Authors: Mike Zhang, Euan D Lindsay, Frederik Bode Thorbensen, Danny Bøgsted Poulsen, Johannes Bjerva

    Abstract: End of semester student evaluations of teaching are the dominant mechanism for providing feedback to academics on their teaching practice. For large classes, however, the volume of feedback makes these tools impractical for this purpose. This paper explores the use of open-source generative AI to synthesise factual, actionable and appropriate summaries of student feedback from these survey respons… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted to SEFI 2024

  34. arXiv:2407.01213  [pdf, other

    cs.SI

    EMIF: Evidence-aware Multi-source Information Fusion Network for Explainable Fake News Detection

    Authors: Qingxing Dong, Mengyi Zhang, Shiyuan Wu, Xiaozhen Wu

    Abstract: Extensive research on automatic fake news detection has been conducted due to the significant detrimental effects of fake news proliferation. Most existing approaches rely on a single source of evidence, such as comments or relevant news, to derive explanatory evidence for decision-making, demonstrating exceptional performance. However, their single evidence source suffers from two critical drawba… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  35. arXiv:2407.00468  [pdf, other

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  36. arXiv:2407.00382  [pdf, other

    math.NA cs.AI cs.CE cs.LG

    Towards Universal Mesh Movement Networks

    Authors: Mingrui Zhang, Chunyang Wang, Stephan Kramer, Joseph G. Wallwork, Siyi Li, Jiancheng Liu, Xiang Chen, Matthew D. Piggott

    Abstract: Solving complex Partial Differential Equations (PDEs) accurately and efficiently is an essential and challenging problem in all scientific and engineering disciplines. Mesh movement methods provide the capability to improve the accuracy of the numerical solution without increasing the overall mesh degree of freedom count. Conventional sophisticated mesh movement methods are extremely expensive and… ▽ More

    Submitted 1 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

  37. arXiv:2407.00079  [pdf, other

    cs.DC cs.AI cs.AR

    Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

    Authors: Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu

    Abstract: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which balances ma… ▽ More

    Submitted 9 July, 2024; v1 submitted 23 June, 2024; originally announced July 2024.

    Comments: 23 pages, 13 figures

  38. Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment

    Authors: Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-Seng Chua, Shuicheng Yan

    Abstract: While pre-training large-scale video-language models (VLMs) has shown remarkable potential for various downstream video-language tasks, existing VLMs can still suffer from certain commonly seen limitations, e.g., coarse-grained cross-modal aligning , under-modeling of temporal dynamics, detached video-language view. In this work, we target enhancing VLMs with a fine-grained structural spatio-tempo… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE TPAMI 2024

    Journal ref: [J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  39. arXiv:2406.18846  [pdf, other

    cs.CE

    AFBench: A Large-scale Benchmark for Airfoil Design

    Authors: Jian Liu, Jianyu Wu, Hairun Xie, Guoqing Zhang, Jing Wang, Wei Liu, Wanli Ouyang, Junjun Jiang, Xianming Liu, Shixiang Tang, Miao Zhang

    Abstract: Data-driven generative models have emerged as promising approaches towards achieving efficient mechanical inverse design. However, due to prohibitively high cost in time and money, there is still lack of open-source and large-scale benchmarks in this field. It is mainly the case for airfoil inverse design, which requires to generate and edit diverse geometric-qualified and aerodynamic-qualified ai… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Submitted to NeurIPS 2024 Dataset & Benchmark Track

  40. arXiv:2406.18820  [pdf, other

    cs.DC cs.LG

    Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training

    Authors: Xinyu Lian, Sam Ade Jacobs, Lev Kurilenko, Masahiro Tanaka, Stas Bekman, Olatunji Ruwase, Minjia Zhang

    Abstract: Existing checkpointing approaches seem ill-suited for distributed training even though hardware limitations make model parallelism, i.e., sharding model state across multiple accelerators, a requirement for model scaling. Consolidating distributed model state into a single checkpoint unacceptably slows down training, and is impractical at extreme scales. Distributed checkpoints, in contrast, are t… ▽ More

    Submitted 27 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  41. arXiv:2406.18414  [pdf, other

    cs.CV cs.AI

    BiTrack: Bidirectional Offline 3D Multi-Object Tracking Using Camera-LiDAR Data

    Authors: Kemiao Huang, Meiying Zhang, Qi Hao

    Abstract: Compared with real-time multi-object tracking (MOT), offline multi-object tracking (OMOT) has the advantages to perform 2D-3D detection fusion, erroneous link correction, and full track optimization but has to deal with the challenges from bounding box misalignment and track evaluation, editing, and refinement. This paper proposes "BiTrack", a 3D OMOT framework that includes modules of 2D-3D detec… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  42. arXiv:2406.18129  [pdf, other

    cs.CV cs.LG

    CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection

    Authors: Meiying Zhang, Weiyuan Peng, Guangyao Ding, Chenyang Lei, Chunlin Ji, Qi Hao

    Abstract: Simulation data can be accurately labeled and have been expected to improve the performance of data-driven algorithms, including object detection. However, due to the various domain inconsistencies from simulation to reality (sim-to-real), cross-domain object detection algorithms usually suffer from dramatic performance drops. While numerous unsupervised domain adaptation (UDA) methods have been d… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  43. arXiv:2406.18088  [pdf, other

    cs.CL cs.AI cs.SD eess.AS

    LLM-Driven Multimodal Opinion Expression Identification

    Authors: Bonian Jia, Huiyao Chen, Yueheng Sun, Meishan Zhang, Min Zhang

    Abstract: Opinion Expression Identification (OEI) is essential in NLP for applications ranging from voice assistants to depression diagnosis. This study extends OEI to encompass multimodal inputs, underlining the significance of auditory cues in delivering emotional subtleties beyond the capabilities of text. We introduce a novel multimodal OEI (MOEI) task, integrating text and speech to mirror real-world s… ▽ More

    Submitted 29 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 Figures, Accept by Interspeech 2024

  44. arXiv:2406.17534  [pdf, other

    cs.CL

    Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification

    Authors: Huiyao Chen, Yu Zhao, Zulong Chen, Mengjia Wang, Liangyue Li, Meishan Zhang, Min Zhang

    Abstract: Hierarchical text classification (HTC) is an important task with broad applications, while few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely-ambiguous labels. In this work, we intro… ▽ More

    Submitted 29 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 17 pages

  45. arXiv:2406.17276  [pdf, other

    cs.CL

    OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure

    Authors: Jikai Wang, Yi Su, Juntao Li, Qingrong Xia, Zi Ye, Xinyu Duan, Zhefeng Wang, Min Zhang

    Abstract: Autoregressive language models demonstrate excellent performance in various scenarios. However, the inference efficiency is limited by its one-step-one-word generation mode, which has become a pressing problem recently as the models become increasingly larger. Speculative decoding employs a "draft and then verify" mechanism to allow multiple tokens to be generated in one step, realizing lossless a… ▽ More

    Submitted 16 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  46. arXiv:2406.17095  [pdf, other

    cs.CL

    Attention Instruction: Amplifying Attention in the Middle via Prompting

    Authors: Meiru Zhang, Zaiqiao Meng, Nigel Collier

    Abstract: The context window of large language models has been extended to 128k tokens or more. However, language models still suffer from position bias and have difficulty in accessing and using the middle part of the context due to the lack of attention. We examine the relative position awareness of LLMs and the feasibility of mitigating disproportional attention through prompting. We augment the original… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  47. arXiv:2406.15968  [pdf, other

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  48. arXiv:2406.14683  [pdf, other

    cs.LG cs.CL

    TAGLAS: An atlas of text-attributed graph datasets in the era of large graph and language models

    Authors: Jiarui Feng, Hao Liu, Lecheng Kong, Yixin Chen, Muhan Zhang

    Abstract: In this report, we present TAGLAS, an atlas of text-attributed graph (TAG) datasets and benchmarks. TAGs are graphs with node and edge features represented in text, which have recently gained wide applicability in training graph-language or graph foundation models. In TAGLAS, we collect and integrate more than 23 TAG datasets with domains ranging from citation graphs to molecule graphs and tasks f… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  49. arXiv:2406.14657  [pdf, other

    cs.CL cs.AI cs.LG

    OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

    Authors: Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Shwartz-Ziv

    Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable r… ▽ More

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted for Publication to ARGMIN 2024 at ACL2024

  50. arXiv:2406.14192  [pdf, other

    cs.CL cs.AI

    Timo: Towards Better Temporal Reasoning for Language Models

    Authors: Zhaochen Su, Jun Zhang, Tong Zhu, Xiaoye Qu, Juntao Li, Min Zhang, Yu Cheng

    Abstract: Reasoning about time is essential for Large Language Models (LLMs) to understand the world. Previous works focus on solving specific tasks, primarily on time-sensitive question answering. While these methods have proven effective, they cannot generalize to a wider spectrum of temporal reasoning tasks. Therefore, we propose a crucial question: Can we build a universal framework to handle a variety… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review