Skip to main content

Showing 1–50 of 608 results for author: Song, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.13703  [pdf, ps, other

    cs.IT cs.LG eess.SP

    Energy-Efficient Channel Decoding for Wireless Federated Learning: Convergence Analysis and Adaptive Design

    Authors: Linping Qu, Yuyi Mao, Shenghui Song, Chi-Ying Tsui

    Abstract: One of the most critical challenges for deploying distributed learning, such as federated learning (FL), in wireless networks is the limited battery capacity of mobile devices. While it is a common belief that the major energy consumption of mobile devices comes from the uplink data transmission, this paper presents a novel finding, namely the channel decoding operation also contributes significan… ▽ More

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  2. arXiv:2407.12680  [pdf, other

    cs.CY cs.CL

    Reducing Biases towards Minoritized Populations in Medical Curricular Content via Artificial Intelligence for Fairer Health Outcomes

    Authors: Chiman Salavati, Shannon Song, Willmar Sosa Diaz, Scott A. Hale, Roberto E. Montenegro, Fabricio Murai, Shiri Dori-Hacohen

    Abstract: Biased information (recently termed bisinformation) continues to be taught in medical curricula, often long after having been debunked. In this paper, we introduce BRICC, a firstin-class initiative that seeks to mitigate medical bisinformation using machine learning to systematically identify and flag text with potential biases, for subsequent review in an expert-in-the-loop fashion, thus greatly… ▽ More

    Submitted 21 May, 2024; originally announced July 2024.

    Comments: Under review

  3. arXiv:2407.12622  [pdf, other

    cs.CV

    Rethinking the Architecture Design for Efficient Generic Event Boundary Detection

    Authors: Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang

    Abstract: Generic event boundary detection (GEBD), inspired by human visual cognitive behaviors of consistently segmenting videos into meaningful temporal chunks, finds utility in various applications such as video editing and. In this paper, we demonstrate that SOTA GEBD models often prioritize final performance over model complexity, resulting in low inference speed and hindering efficient deployment in r… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ACM MM 2024

  4. arXiv:2407.12019  [pdf, other

    cs.CL cs.AI

    DIM: Dynamic Integration of Multimodal Entity Linking with Large Language Model

    Authors: Shezheng Song, Shasha Li, Jie Yu, Shan Zhao, Xiaopeng Li, Jun Ma, Xiaodong Liu, Zhuo Li, Xiaoguang Mao

    Abstract: Our study delves into Multimodal Entity Linking, aligning the mention in multimodal information with entities in knowledge base. Existing methods are still facing challenges like ambiguous entity representations and limited image information utilization. Thus, we propose dynamic entity extraction using ChatGPT, which dynamically extracts entities and enhances datasets. We also propose a method: Dy… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

    Comments: Published on PRCV24

  5. arXiv:2407.11435  [pdf, other

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 25 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  6. arXiv:2407.10353  [pdf, other

    cs.RO

    UMI on Legs: Making Manipulation Policies Mobile with Manipulation-Centric Whole-body Controllers

    Authors: Huy Ha, Yihuai Gao, Zipeng Fu, Jie Tan, Shuran Song

    Abstract: We introduce UMI-on-Legs, a new framework that combines real-world and simulation data for quadruped manipulation systems. We scale task-centric data collection in the real world using a hand-held gripper (UMI), providing a cheap way to demonstrate task-relevant manipulation skills without a robot. Simultaneously, we scale robot-centric data in simulation by training whole-body controller for task… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages, 7 figures, website: https://umi-on-legs.github.io/

    ACM Class: I.2.9

  7. arXiv:2407.08770  [pdf, other

    cs.AI

    Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

    Authors: Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

    Abstract: Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current methods for detoxification or preventing jailbreaking usually in… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 14 figures

    MSC Class: 68T50 (Primary) 68T07; 62M45 (Secondary) ACM Class: I.2.7

  8. arXiv:2407.08458  [pdf, other

    cs.LG cs.NI eess.SP

    Joint Optimization of Age of Information and Energy Consumption in NR-V2X System based on Deep Reinforcement Learning

    Authors: Shulin Song, Zheng Zhang, Qiong Wu, Qiang Fan, Pingyi Fan

    Abstract: Autonomous driving may be the most important application scenario of next generation, the development of wireless access technologies enabling reliable and low-latency vehicle communication becomes crucial. To address this, 3GPP has developed Vehicle-to-Everything (V2X) specifications based on 5G New Radio (NR) technology, where Mode 2 Side-Link (SL) communication resembles Mode 4 in LTE-V2X, allo… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by sensors. The source code has been released at: https://github.com/qiongwu86/Joint-Optimization-of-AoI-and-Energy-Consumption-in-NR-V2X-System-based-on-DRL

  9. arXiv:2407.08273   

    cs.CL

    RB-SQL: A Retrieval-based LLM Framework for Text-to-SQL

    Authors: Zhenhe Wu, Zhongqiu Li, Jie Zhang, Mengxiang Li, Yu Zhao, Ruiyu Fang, Zhongjiang He, Xuelong Li, Zhoujun Li, Shuangyong Song

    Abstract: Large language models (LLMs) with in-context learning have significantly improved the performance of text-to-SQL task. Previous works generally focus on using exclusive SQL generation prompt to improve the LLMs' reasoning ability. However, they are mostly hard to handle large databases with numerous tables and columns, and usually ignore the significance of pre-processing database and extracting v… ▽ More

    Submitted 12 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Further improvement and modification are needed.

  10. arXiv:2407.08109  [pdf, other

    cs.CV cs.AI cs.LG

    Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter

    Authors: Suqi Song, Chenxu Zhang, Peng Zhang, Pengkun Li, Fenglong Song, Lei Zhang

    Abstract: Urban waterlogging poses a major risk to public safety and infrastructure. Conventional methods using water-level sensors need high-maintenance to hardly achieve full coverage. Recent advances employ surveillance camera imagery and deep learning for detection, yet these struggle amidst scarce data and adverse environmental conditions. In this paper, we establish a challenging Urban Waterlogging Be… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  11. arXiv:2407.04709  [pdf, other

    eess.SP cs.AI cs.CV

    Efficient 4D Radar Data Auto-labeling Method using LiDAR-based Object Detection Network

    Authors: Min-Hyeok Sun, Dong-Hee Paek, Seung-Hyun Song, Seung-Hyun Kong

    Abstract: Focusing on the strength of 4D (4-Dimensional) radar, research about robust 3D object detection networks in adverse weather conditions has gained attention. To train such networks, datasets that contain large amounts of 4D radar data and ground truth labels are essential. However, the existing 4D radar datasets (e.g., K-Radar) lack sufficient sensor data and labels, which hinders the advancement i… ▽ More

    Submitted 13 May, 2024; originally announced July 2024.

    Comments: Accept at IEEE IVS 2024

  12. arXiv:2407.03197  [pdf, other

    cs.CV

    DyFADet: Dynamic Feature Aggregation for Temporal Action Detection

    Authors: Le Yang, Ziwei Zheng, Yizeng Han, Hao Cheng, Shiji Song, Gao Huang, Fan Li

    Abstract: Recent proposed neural network-based Temporal Action Detection (TAD) models are inherently limited to extracting the discriminative representations and modeling action instances with various lengths from complex scenes by shared-weights detection heads. Inspired by the successes in dynamic neural networks, in this paper, we build a novel dynamic feature aggregation (DFA) module that can simultaneo… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  13. arXiv:2407.02877  [pdf, other

    cs.IT eess.SP

    Resource Allocation Design for Next-Generation Multiple Access: A Tutorial Overview

    Authors: Zhiqiang Wei, Dongfang Xu, Shuangyang Li, Shenghui Song, Derrick Wing Kwan Ng, Giuseppe Caire

    Abstract: Multiple access is the cornerstone technology for each generation of wireless cellular networks and resource allocation design plays a crucial role in multiple access. In this paper, we present a comprehensive tutorial overview for junior researchers in this field, aiming to offer a foundational guide for resource allocation design in the context of next-generation multiple access (NGMA). Initiall… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 69 pages, 10 figures, 5 tables

  14. arXiv:2407.02783  [pdf, ps, other

    cs.CL cs.AI

    52B to 1T: Lessons Learned via Tele-FLM Series

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: For the Tele-FLM-52B tech report, see also 2404.16645

  15. arXiv:2407.01479  [pdf, other

    cs.RO cs.LG

    EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning

    Authors: Jingyun Yang, Zi-ang Cao, Congyue Deng, Rika Antonova, Shuran Song, Jeannette Bohg

    Abstract: Building effective imitation learning methods that enable robots to learn from limited data and still generalize across diverse real-world environments is a long-standing problem in robot learning. We propose EquiBot, a robust, data-efficient, and generalizable approach for robot manipulation task learning. Our approach combines SIM(3)-equivariant neural network architectures with diffusion models… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally

  16. arXiv:2407.01178  [pdf, other

    cs.CL cs.AI cs.LG

    $\text{Memory}^3$: Language Modeling with Explicit Memory

    Authors: Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, Linpeng Tang, Weinan E

    Abstract: The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowled… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    MSC Class: 68T50 ACM Class: I.2.7

  17. arXiv:2407.00696  [pdf, other

    cs.LG

    Graph in Graph Neural Network

    Authors: Jiongshu Wang, Jing Yang, Jiankang Deng, Hatice Gunes, Siyang Song

    Abstract: Existing Graph Neural Networks (GNNs) are limited to process graphs each of whose vertices is represented by a vector or a single value, limited their representing capability to describe complex objects. In this paper, we propose the first GNN (called Graph in Graph Neural (GIG) Network) which can process graph-style data (called GIG sample) whose vertices are further represented by graphs. Given… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 68T05

  18. arXiv:2407.00668  [pdf, other

    cs.CL

    HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

    Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

    Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-so… ▽ More

    Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  19. arXiv:2406.19756  [pdf, other

    cs.CV cs.AI

    Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

    Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Technical report

  20. arXiv:2406.19464  [pdf, other

    cs.RO cs.AI cs.CV cs.SD eess.AS

    ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

    Authors: Zeyi Liu, Cheng Chi, Eric Cousineau, Naveen Kuppuswamy, Benjamin Burchfiel, Shuran Song

    Abstract: Audio signals provide rich information for the robot interaction and object properties through contact. These information can surprisingly ease the learning of contact-rich robot manipulation skills, especially when the visual information alone is ambiguous or incomplete. However, the usage of audio data in robot manipulation has been constrained to teleoperated demonstrations collected by either… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.18156  [pdf, other

    cs.LG cs.DC cs.NI eess.SP

    FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization

    Authors: Linping Qu, Shenghui Song, Chi-Ying Tsui

    Abstract: Federated learning (FL) is a powerful machine learning paradigm which leverages the data as well as the computational resources of clients, while protecting clients' data privacy. However, the substantial model size and frequent aggregation between the server and clients result in significant communication overhead, making it challenging to deploy FL in resource-limited wireless networks. In this… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2406.16962  [pdf, other

    cs.LG cs.AI cs.CL

    MetaGreen: Meta-Learning Inspired Transformer Selection for Green Semantic Communication

    Authors: Shubhabrata Mukherjee, Cory Beard, Sejun Song

    Abstract: Semantic Communication can transform the way we transmit information, prioritizing meaningful and effective content over individual symbols or bits. This evolution promises significant benefits, including reduced latency, lower bandwidth usage, and higher throughput compared to traditional communication. However, the development of Semantic Communication faces a crucial challenge: the need for uni… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2310.07592

  23. arXiv:2406.16866  [pdf, other

    cs.CV

    Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

    Authors: Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S. -H. Gary Chan, Hongyang Zhang

    Abstract: Referring expression comprehension (REC) involves localizing a target instance based on a textual description. Recent advancements in REC have been driven by large multimodal models (LMMs) like CogVLM, which achieved 92.44% accuracy on RefCOCO. However, this study questions whether existing benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg, capture LMMs' comprehensive capabilities. We begin with… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  24. arXiv:2406.16862  [pdf, other

    cs.RO cs.CV

    Dreamitate: Real-World Visuomotor Policy Learning via Video Generation

    Authors: Junbang Liang, Ruoshi Liu, Ege Ozguroglu, Sruthi Sudhakar, Achal Dave, Pavel Tokmakov, Shuran Song, Carl Vondrick

    Abstract: A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets of internet videos. In this paper, we propose a visuomotor policy learning framework that fine-tunes a video diffusion model on human demonstrations o… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Project page: https://dreamitate.cs.columbia.edu/

  25. arXiv:2406.16502  [pdf, other

    cs.CV

    LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery

    Authors: Xiaowen Ma, Rongrong Lian, Zhenkai Wu, Hongbo Guo, Mengting Ma, Sensen Wu, Zhenhong Du, Siyang Song, Wei Zhang

    Abstract: Remote sensing images usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully investigate the above issues, and thus their performances on remote sensing image segmentation are limited. In this paper, we propose our LOGCAN++, a semantic segmentation model customized for remote sensin… ▽ More

    Submitted 1 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: Under Review

  26. arXiv:2406.15320  [pdf, other

    cs.CV

    Rethinking Remote Sensing Change Detection With A Mask View

    Authors: Xiaowen Ma, Zhenkai Wu, Rongrong Lian, Wei Zhang, Siyang Song

    Abstract: Remote sensing change detection aims to compare two or more images recorded for the same area but taken at different time stamps to quantitatively and qualitatively assess changes in geographical entities and environmental factors. Mainstream models usually built on pixel-by-pixel change detection paradigms, which cannot tolerate the diversity of changes due to complex scenes and variation in imag… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Under review

  27. arXiv:2406.13165  [pdf, other

    eess.IV cs.AI cs.CV cs.RO

    Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

    Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe moveme… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  28. arXiv:2406.11794  [pdf, other

    cs.LG cs.CL

    DataComp-LM: In search of the next generation of training sets for language models

    Authors: Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Gadre, Hritik Bansal, Etash Guha, Sedrick Keh, Kushal Arora, Saurabh Garg, Rui Xin, Niklas Muennighoff, Reinhard Heckel, Jean Mercat, Mayee Chen, Suchin Gururangan, Mitchell Wortsman, Alon Albalak, Yonatan Bitton, Marianna Nezhurina, Amro Abbas, Cheng-Yu Hsieh, Dhruba Ghosh, Josh Gardner , et al. (34 additional authors not shown)

    Abstract: We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with dat… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://www.datacomp.ai/dclm/

  29. Nurgle: Exacerbating Resource Consumption in Blockchain State Storage via MPT Manipulation

    Authors: Zheyuan He, Zihao Li, Ao Qiao, Xiapu Luo, Xiaosong Zhang, Ting Chen, Shuwei Song, Dijun Liu, Weina Niu

    Abstract: Blockchains, with intricate architectures, encompass various components, e.g., consensus network, smart contracts, decentralized applications, and auxiliary services. While offering numerous advantages, these components expose various attack surfaces, leading to severe threats to blockchains. In this study, we unveil a novel attack surface, i.e., the state storage, in blockchains. The state storag… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  30. arXiv:2406.09745  [pdf, other

    cs.LG

    How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis

    Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Shuangyong Song, Weizhan Zhang, Chen Li

    Abstract: Domain generalization aims to learn invariance across multiple training domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution match… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  31. arXiv:2406.08474  [pdf, other

    cs.CV cs.AI cs.LG

    Real2Code: Reconstruct Articulated Objects via Code Generation

    Authors: Zhao Mandi, Yijia Weng, Dominik Bauer, Shuran Song

    Abstract: We present Real2Code, a novel approach to reconstructing articulated objects via code generation. Given visual observations of an object, we first reconstruct its part geometry using an image segmentation model and a shape completion model. We then represent the object parts with oriented bounding boxes, which are input to a fine-tuned large language model (LLM) to predict joint articulation as co… ▽ More

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  32. arXiv:2406.07006  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAWImage Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  33. arXiv:2406.05654  [pdf, other

    cs.CL cs.IR

    DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation

    Authors: Shuting Wang, Jiongnan Liu, Shiren Song, Jiehan Cheng, Yuqi Fu, Peidong Guo, Kun Fang, Yutao Zhu, Zhicheng Dou

    Abstract: Retrieval-Augmented Generation (RAG) offers a promising solution to address various limitations of Large Language Models (LLMs), such as hallucination and difficulties in keeping up with real-time updates. This approach is particularly critical in expert and domain-specific applications where LLMs struggle to cover expert knowledge. Therefore, evaluating RAG models in such scenarios is crucial, ye… ▽ More

    Submitted 16 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  34. arXiv:2406.05478  [pdf, other

    cs.CV cs.AI

    Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis

    Authors: Zanlin Ni, Yulin Wang, Renping Zhou, Jiayi Guo, Jinyi Hu, Zhiyuan Liu, Shiji Song, Yuan Yao, Gao Huang

    Abstract: The field of image synthesis is currently flourishing due to the advancements in diffusion models. While diffusion models have been successful, their computational intensity has prompted the pursuit of more efficient alternatives. As a representative work, non-autoregressive Transformers (NATs) have been recognized for their rapid generation. However, a major drawback of these models is their infe… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR2024

  35. arXiv:2406.05223  [pdf, other

    cs.LG cs.AI

    CorDA: Context-Oriented Decomposition Adaptation of Large Language Models

    Authors: Yibo Yang, Xiaojie Li, Zhongzhu Zhou, Shuaiwen Leon Song, Jianlong Wu, Liqiang Nie, Bernard Ghanem

    Abstract: Current parameter-efficient fine-tuning (PEFT) methods build adapters without considering the context of downstream task to learn, or the context of important knowledge to maintain. As a result, there is often a performance gap compared to full-parameter finetuning, and meanwhile the finetuned model suffers from catastrophic forgetting of the pre-trained world knowledge. In this paper, we propose… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  36. arXiv:2406.04295  [pdf, other

    cs.CV

    Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

    Authors: Jiayi Guo, Junhao Zhao, Chunjiang Ge, Chaoqun Du, Zanlin Ni, Shiji Song, Humphrey Shi, Gao Huang

    Abstract: Test-time adaptation (TTA) aims to enhance the performance of source-domain pretrained models when tested on unknown shifted target domains. Traditional TTA methods primarily adapt model weights based on target data streams, making model performance sensitive to the amount and order of target data. Recently, diffusion-driven TTA methods have demonstrated strong performance by using an unconditiona… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: GitHub: https://github.com/SHI-Labs/Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment

  37. arXiv:2406.02395  [pdf, other

    cs.LG cs.CV

    GrootVL: Tree Topology is All You Need in State Space Model

    Authors: Yicheng Xiao, Lin Song, Shaoli Huang, Jiangshan Wang, Siyu Song, Yixiao Ge, Xiu Li, Ying Shan

    Abstract: The state space models, employing recursively propagated features, demonstrate strong representation capabilities comparable to Transformer models and superior efficiency. However, constrained by the inherent geometric constraints of sequences, it still falls short in modeling long-range dependencies. To address this issue, we propose the GrootVL network, which first dynamically generates a tree t… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: The code is available at https://github.com/EasonXiao-888/GrootVL

  38. arXiv:2406.00977  [pdf, other

    cs.CV cs.AI

    Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

    Authors: Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

    Abstract: Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images. However, increasing input resolution poses two main challenges: 1) It extends the context length required by the language model, leading to inefficiencies and hitting… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  39. arXiv:2406.00262  [pdf, other

    cs.LG cs.AI

    Contrastive Learning Via Equivariant Representation

    Authors: Sifan Song, Jinfeng Wang, Qiaochu Zhao, Xiang Li, Dufan Wu, Angelos Stefanidis, Jionglong Su, S. Kevin Zhou, Quanzheng Li

    Abstract: Invariant-based Contrastive Learning (ICL) methods have achieved impressive performance across various domains. However, the absence of latent space representation for distortion (augmentation)-related information in the latent space makes ICL sub-optimal regarding training efficiency and robustness in downstream tasks. Recent studies suggest that introducing equivariance into Contrastive Learning… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  40. arXiv:2406.00017  [pdf, other

    cs.CL cs.AI cs.MM

    PTA: Enhancing Multimodal Sentiment Analysis through Pipelined Prediction and Translation-based Alignment

    Authors: Shezheng Song, Shasha Li, Shan Zhao, Chengyu Wang, Xiaopeng Li, Jie Yu, Qian Wan, Jun Ma, Tianwei Yan, Wentao Ma, Xiaoguang Mao

    Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to understand opinions in a granular manner, advancing human-computer interaction and other fields. Traditionally, MABSA methods use a joint prediction approach to identify aspects and sentiments simultaneously. However, we argue that joint models are not always superior. Our analysis shows that joint models struggle to align relevant text to… ▽ More

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced June 2024.

    Comments: Code will be released upon publication

  41. arXiv:2405.16933  [pdf, other

    cs.CL cs.IR

    Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning

    Authors: Xun Liang, Simin Niu, Zhiyu li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, Chenyang Xi

    Abstract: Retrieval-Augmented Generation (RAG) offers a cost-effective approach to injecting real-time knowledge into large language models (LLMs). Nevertheless, constructing and validating high-quality knowledge repositories require considerable effort. We propose a pre-retrieval framework named Pseudo-Graph Retrieval-Augmented Generation (PG-RAG), which conceptualizes LLMs as students by providing them wi… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  42. arXiv:2405.16605  [pdf, other

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

  43. arXiv:2405.15738  [pdf, other

    cs.CV

    ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

    Authors: Chunjiang Ge, Sijie Cheng, Ziming Wang, Jiale Yuan, Yuan Gao, Jun Song, Shiji Song, Gao Huang, Bo Zheng

    Abstract: High-resolution Large Multimodal Models (LMMs) encounter the challenges of excessive visual tokens and quadratic visual complexity. Current high-resolution LMMs address the quadratic complexity while still generating excessive visual tokens. However, the redundancy in visual tokens is the key problem as it leads to more substantial compute. To mitigate this issue, we propose ConvLLaVA, which emplo… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages

  44. arXiv:2405.11874  [pdf, other

    cs.CL

    xFinder: Robust and Pinpoint Answer Extraction for Large Language Models

    Authors: Qingchen Yu, Zifan Zheng, Shichao Song, Zhiyu Li, Feiyu Xiong, Bo Tang, Ding Chen

    Abstract: The continuous advancement of large language models (LLMs) has brought increasing attention to the critical issue of developing fair and reliable methods for evaluating their performance. Particularly, the emergence of subjective or non-subjective cheating phenomena, such as test set leakage and prompt format overfitting, poses significant challenges to the reliable evaluation of LLMs. Since evalu… ▽ More

    Submitted 23 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 37 Pages

  45. arXiv:2405.10096  [pdf, other

    cs.LG cs.CR cs.DC

    The Effect of Quantization in Federated Learning: A Rényi Differential Privacy Perspective

    Authors: Tianqu Kang, Lumin Liu, Hengtao He, Jun Zhang, S. H. Song, Khaled B. Letaief

    Abstract: Federated Learning (FL) is an emerging paradigm that holds great promise for privacy-preserving machine learning using distributed data. To enhance privacy, FL can be combined with Differential Privacy (DP), which involves adding Gaussian noise to the model weights. However, FL faces a significant challenge in terms of large communication overhead when transmitting these model weights. To address… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 6 pages, 5 figures, submitted to 2024 IEEE MeditCom

  46. arXiv:2405.09514  [pdf, other

    eess.SP cs.IT cs.LG

    Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck

    Authors: Hongru Li, Jiawei Shao, Hengtao He, Shenghui Song, Jun Zhang, Khaled B. Letaief

    Abstract: Task-oriented communication aims to extract and transmit task-relevant information to significantly reduce the communication overhead and transmission latency. However, the unpredictable distribution shifts between training and test data, including domain shift and semantic shift, can dramatically undermine the system performance. In order to tackle these challenges, it is crucial to ensure that t… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 13 pages, 8 figures, submitted to IEEE for potential publication

  47. arXiv:2405.09359  [pdf, other

    cs.RO cs.HC

    Visual Attention Based Cognitive Human-Robot Collaboration for Pedicle Screw Placement in Robot-Assisted Orthopedic Surgery

    Authors: Chen Chen, Qikai Zou, Yuhang Song, Shiji Song, Xiang Li

    Abstract: Current orthopedic robotic systems largely focus on navigation, aiding surgeons in positioning a guiding tube but still requiring manual drilling and screw placement. The automation of this task not only demands high precision and safety due to the intricate physical interactions between the surgical tool and bone but also poses significant risks when executed without adequate human oversight. As… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 7 pages, 8 figures, submitted to IROS 2024

  48. arXiv:2405.08768  [pdf, other

    cs.CV cs.AI cs.LG

    EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

    Authors: Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Gao Huang

    Abstract: The superior performance of modern visual backbones usually comes with a costly training procedure. We contribute to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns with… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Journal version of arXiv:2211.09703 (ICCV 2023). Code is available at: https://github.com/LeapLabTHU/EfficientTrain

  49. Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms

    Authors: Guanlin Mo, Shihong Song, Hu Ding

    Abstract: DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space ({\em e.g.,} clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most of existing accelerating techniques for DBSCAN are only available for low-dimension… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

  50. arXiv:2405.04032  [pdf, other

    cs.CR cs.AI

    Locally Differentially Private In-Context Learning

    Authors: Chunyan Zheng, Keke Sun, Wenhao Zhao, Haibo Zhou, Lixin Jiang, Shaoyang Song, Chunlai Zhou

    Abstract: Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability. An important application in deploying large language models is to augment LLMs with a private database for some specific task. The main problem with this promising commercial use is that LLMs have been shown to memorize their training data and their prompt data are vulnerable to membership inference at… ▽ More

    Submitted 8 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: This paper was published at LREC-Coling 2024