Showing 1–50 of 7,726 results for author: li, Y

Searching in archive cs.
  1. arXiv:2408.17377  [pdf, other]

    cs.CL cs.AI

    NDP: Next Distribution Prediction as a More Broad Target

    Authors: Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu

    Abstract: Large language models (LLMs) trained on next-token prediction (NTP) paradigm have demonstrated powerful capabilities. However, the existing NTP paradigm contains several limitations, particularly related to planned task complications and error propagation during inference. In our work, we extend the critique of NTP, highlighting its limitation also due to training with a narrow objective: the pred…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 8 pages,5 figures

  2. arXiv:2408.17209  [pdf, other]

    cs.DB

    Updateable Data-Driven Cardinality Estimator with Bounded Q-error

    Authors: Yingze Li, Xianglong Liu, Hongzhi Wang, Kaixin Zhang, Zixuan Wang

    Abstract: Modern Cardinality Estimators struggle with data updates. This research tackles this challenge within single-table. We introduce ICE, an Index-based Cardinality Estimator, the first data-driven estimator that enables instant, tuple-leveled updates. ICE has learned two key lessons from the multidimensional index and applied them to solve cardinality estimation in dynamic scenarios: (1) Index poss…

    Submitted 30 August, 2024; originally announced August 2024.

  3. arXiv:2408.17003  [pdf, other]

    cs.CR cs.AI

    Safety Layers of Aligned Large Language Models: The Key to LLM Security

    Authors: Shen Li, Liuyi Yao, Lan Zhang, Yaliang Li

    Abstract: Aligned LLMs are highly secure, capable of recognizing and refusing to answer malicious questions. However, the role of internal parameters in maintaining this security is not well understood, further these models are vulnerable to security degradation when fine-tuned with non-malicious backdoor data or normal data. To address these challenges, our work uncovers the mechanism behind security in al…

    Submitted 30 August, 2024; originally announced August 2024.

  4. arXiv:2408.16975  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Technical Report of HelixFold3 for Biomolecular Structure Prediction

    Authors: Lihang Liu, Shanzhuo Zhang, Yang Xue, Xianbin Ye, Kunrui Zhu, Yuxin Li, Yang Liu, Xiaonan Zhang, Xiaomin Fang

    Abstract: The AlphaFold series has transformed protein structure prediction with remarkable accuracy, often matching experimental methods. AlphaFold2, AlphaFold-Multimer, and the latest AlphaFold3 represent significant strides in predicting single protein chains, protein complexes, and biomolecular structures. While AlphaFold2 and AlphaFold-Multimer are open-sourced, facilitating rapid and reliable predicti…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.16756  [pdf, other]

    cs.CL

    How Far Can Cantonese NLP Go? Benchmarking Cantonese Capabilities of Large Language Models

    Authors: Jiyue Jiang, Liheng Chen, Pengan Chen, Sheng Wang, Qinghang Bao, Lingpeng Kong, Yu Li, Chuan Wu

    Abstract: The rapid evolution of large language models (LLMs) has transformed the competitive landscape in natural language processing (NLP), particularly for English and other data-rich languages. However, underrepresented languages like Cantonese, spoken by over 85 million people, face significant development gaps, which is particularly concerning given the economic significance of the Guangdong-Hong Kong…

    Submitted 29 August, 2024; originally announced August 2024.

  6. arXiv:2408.16582  [pdf, other]

    cs.CV cs.CR

    FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection

    Authors: Yangxiang Zhang, Yuezun Li, Ao Luo, Jiaran Zhou, Junyu Dong

    Abstract: With the rise in popularity of portable devices, the spread of falsified media on social platforms has become rampant. This necessitates the timely identification of authentic content. However, most advanced detection methods are computationally heavy, hindering their real-time application. In this paper, we describe an efficient two-stream architecture for real-time image manipulation detection.…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: BMVC 2024

  7. arXiv:2408.16530  [pdf, other]

    cs.CV

    A Comprehensive Review of 3D Object Detection in Autonomous Driving: Technological Advances and Future Directions

    Authors: Yu Wang, Shaohua Wang, Yicheng Li, Mingchun Liu

    Abstract: In recent years, 3D object perception has become a crucial component in the development of autonomous driving systems, providing essential environmental awareness. However, as perception tasks in autonomous driving evolve, their variants have increased, leading to diverse insights from industry and academia. Currently, there is a lack of comprehensive surveys that collect and summarize these perce…

    Submitted 27 August, 2024; originally announced August 2024.

  8. arXiv:2408.16390  [pdf, other]

    cs.CL

    MQM-Chat: Multidimensional Quality Metrics for Chat Translation

    Authors: Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Kentaro Inui

    Abstract: The complexities of chats pose significant challenges for machine translation models. Recognizing the need for a precise evaluation metric to address the issues of chat translation, this study introduces Multidimensional Quality Metrics for Chat Translation (MQM-Chat). Through the experiments of five models using MQM-Chat, we observed that all models generated certain fundamental errors, while eac…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16293  [pdf, other]

    cs.CL cs.AI cs.LG

    Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems

    Authors: Tian Ye, Zicheng Xu, Yuanzhi Li, Zeyuan Allen-Zhu

    Abstract: Language models have demonstrated remarkable performance in solving reasoning tasks; however, even the strongest models still occasionally make reasoning mistakes. Recently, there has been active research aimed at improving reasoning accuracy, particularly by using pretrained language models to "self-correct" their mistakes via multi-round prompting. In this paper, we follow this line of work but…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.20311

  10. arXiv:2408.15868  [pdf, other]

    cs.CV cs.AI

    GenDDS: Generating Diverse Driving Video Scenarios with Prompt-to-Video Generative Model

    Authors: Yongjie Fu, Yunlong Li, Xuan Di

    Abstract: Autonomous driving training requires a diverse range of datasets encompassing various traffic conditions, weather scenarios, and road types. Traditional data augmentation methods often struggle to generate datasets that represent rare occurrences. To address this challenge, we propose GenDDS, a novel approach for generating driving scenarios generation by leveraging the capabilities of Stable Diff…

    Submitted 28 August, 2024; originally announced August 2024.

  11. arXiv:2408.15772  [pdf, other]

    cs.IT

    220 GHz Urban Microcell Channel Measurement and Characterization on a University Campus

    Authors: Yuanbo Li, Yiqin Wang, Yejian Lyu, Ziming Yu, Chong Han

    Abstract: Owning abundant bandwidth resources, the Terahertz (THz) band (0.1-10 THz) is envisioned as a key technology to realize ultra-high-speed communications in 6G and beyond wireless networks. To realize reliable THz communications in urban microcell (UMi) environments, propagation analysis and channel characterization are still insufficient. In this paper, channel measurement campaigns are conducted i…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 1 table

  12. arXiv:2408.15600  [pdf, other]

    cs.LG cs.DC

    Exploring Selective Layer Fine-Tuning in Federated Learning

    Authors: Yuchang Sun, Yuexiang Xie, Bolin Ding, Yaliang Li, Jun Zhang

    Abstract: Federated learning (FL) has emerged as a promising paradigm for fine-tuning foundation models using distributed data in a privacy-preserving manner. Under limited computational resources, clients often find it more practical to fine-tune a selected subset of layers, rather than the entire model, based on their task-specific data. In this study, we provide a thorough theoretical exploration of sele…

    Submitted 28 August, 2024; originally announced August 2024.

  13. arXiv:2408.15581  [pdf, other]

    cs.NI cs.IT

    An eBPF-Based Trace-Driven Emulation Method for Satellite Networks

    Authors: Weibiao Tian, Ye Li, Jinwei Zhao, Sheng Wu, Jianping Pan

    Abstract: System-level performance evaluation over satellite networks often requires a simulated or emulated environment for reproducibility and low cost. However, the existing tools may not meet the needs for scenarios such as the low-earth orbit (LEO) satellite networks. To address the problem, this paper proposes and implements a trace-driven emulation method based on Linux's eBPF technology. Building a…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 4 pages, 4 figures

  14. arXiv:2408.15563  [pdf, other]

    cs.DB

    Order-preserving pattern mining with forgetting mechanism

    Authors: Yan Li, Chenyu Ma, Rong Gao, Youxi Wu, Jinyan Li, Wenjian Wang, Xindong Wu

    Abstract: Order-preserving pattern (OPP) mining is a type of sequential pattern mining method in which a group of ranks of time series is used to represent an OPP. This approach can discover frequent trends in time series. Existing OPP mining algorithms consider data points at different time to be equally important; however, newer data usually have a more significant impact, while older data have a weaker i…

    Submitted 28 August, 2024; originally announced August 2024.

  15. arXiv:2408.15543  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC

    An Investigation of Warning Erroneous Chat Translations in Cross-lingual Communication

    Authors: Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Kentaro Inui

    Abstract: The complexities of chats pose significant challenges for machine translation models. Recognizing the need for a precise evaluation metric to address the issues of chat translation, this study introduces Multidimensional Quality Metrics for Chat Translation (MQM-Chat). Through the experiments of five models using MQM-Chat, we observed that all models generated certain fundamental errors, while eac… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Journal ref: IJCNLP-AACL 2023 Student Research Workshop

  16. arXiv:2408.15406  [pdf, other]

    cs.SI cs.AI cs.CL

    Intertwined Biases Across Social Media Spheres: Unpacking Correlations in Media Bias Dimensions

    Authors: Yifan Liu, Yike Li, Dong Wang

    Abstract: Media bias significantly shapes public perception by reinforcing stereotypes and exacerbating societal divisions. Prior research has often focused on isolated media bias dimensions such as political bias or racial bias, neglecting the complex interrelationships among various bias dimensions across different topic domains. Moreover, we observe that models trained on existing media…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted to ASONAM 2024

    ACM Class: I.2.7

  17. arXiv:2408.15388  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    Panoptic Perception for Autonomous Driving: A Survey

    Authors: Yunge Li, Lanyu Xu

    Abstract: Panoptic perception represents a forefront advancement in autonomous driving technology, unifying multiple perception tasks into a singular, cohesive framework to facilitate a thorough understanding of the vehicle's surroundings. This survey reviews typical panoptic perception models for their unique inputs and architectures and compares them to performance, responsiveness, and resource utilizatio…

    Submitted 27 August, 2024; originally announced August 2024.

  18. arXiv:2408.14843  [pdf, other]

    cs.LG cs.NE eess.SP

    Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging

    Authors: Yuanhao Li, Badong Chen, Zhongxu Hu, Keita Suzuki, Wenjun Bai, Yasuharu Koike, Okito Yamashita

    Abstract: Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a pote…

    Submitted 27 August, 2024; originally announced August 2024.

  19. arXiv:2408.14754  [pdf, other]

    physics.med-ph cs.AI cs.CV physics.ins-det

    Sequential-Scanning Dual-Energy CT Imaging Using High Temporal Resolution Image Reconstruction and Error-Compensated Material Basis Image Generation

    Authors: Qiaoxin Li, Ruifeng Chen, Peng Wang, Guotao Quan, Yanfeng Du, Dong Liang, Yinsheng Li

    Abstract: Dual-energy computed tomography (DECT) has been widely used to obtain quantitative elemental composition of imaged subjects for personalized and precise medical diagnosis. Compared with DECT leveraging advanced X-ray source and/or detector technologies, the use of the sequential-scanning data acquisition scheme to implement DECT may make a broader impact on clinical practice because this scheme re…

    Submitted 26 August, 2024; originally announced August 2024.

  20. arXiv:2408.14728  [pdf, other]

    cs.LG cs.AI cs.CR

    TART: Boosting Clean Accuracy Through Tangent Direction Guided Adversarial Training

    Authors: Bongsoo Yi, Rongjie Lai, Yao Li

    Abstract: Adversarial training has been shown to be successful in enhancing the robustness of deep neural networks against adversarial attacks. However, this robustness is accompanied by a significant decline in accuracy on clean data. In this paper, we propose a novel method, called Tangent Direction Guided Adversarial Training (TART), that leverages the tangent space of the data manifold to ameliorate the…

    Submitted 26 August, 2024; originally announced August 2024.

  21. arXiv:2408.14603  [pdf, other]

    cs.LG stat.ML

    Biased Dueling Bandits with Stochastic Delayed Feedback

    Authors: Bongsoo Yi, Yue Kang, Yao Li

    Abstract: The dueling bandit problem, an essential variation of the traditional multi-armed bandit problem, has become significantly prominent recently due to its broad applications in online advertising, recommendation systems, information retrieval, and more. However, in many real-world applications, the feedback for actions is often subject to unavoidable delays and is not immediately available to the ag…

    Submitted 26 August, 2024; originally announced August 2024.

  22. arXiv:2408.14600  [pdf, other]

    cs.CV

    PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection

    Authors: Yidi Li, Jiahao Wen, Bin Ren, Wenhao Li, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: The integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection. However, this combination often struggles with capturing semantic information effectively. Moreover, relying solely on point features within regions of interest can lead to information loss and limitations in local feature representation. To tackle these challenges, we propose a novel two…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 3D Object Detection

  23. arXiv:2408.14585  [pdf, other]

    cs.CV cs.SD eess.AS

    Global-Local Distillation Network-Based Audio-Visual Speaker Tracking with Incomplete Modalities

    Authors: Yidi Li, Yihan Li, Yixin Guo, Bin Ren, Zhenhuan Xu, Hao Guo, Hong Liu, Nicu Sebe

    Abstract: In speaker tracking research, integrating and complementing multi-modal data is a crucial strategy for improving the accuracy and robustness of tracking systems. However, tracking with incomplete modalities remains a challenging issue due to noisy observations caused by occlusion, acoustic noise, and sensor failures. Especially when there is missing data in multiple modalities, the performance of…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Audio-Visual Speaker Tracking with Incomplete Modalities

  24. arXiv:2408.14568  [pdf, other]

    cs.CL cs.AI

    Improving Clinical Note Generation from Complex Doctor-Patient Conversation

    Authors: Yizhan Li, Sifan Wu, Christopher Smith, Thomas Lo, Bang Liu

    Abstract: Writing clinical notes and documenting medical exams is a critical task for healthcare professionals, serving as a vital component of patient care documentation. However, manually writing these notes is time-consuming and can impact the amount of time clinicians can spend on direct patient interaction and other tasks. Consequently, the development of automated clinical note generation systems has…

    Submitted 26 August, 2024; originally announced August 2024.

  25. arXiv:2408.14493  [pdf]

    cs.LG eess.SY

    Extraction of Typical Operating Scenarios of New Power System Based on Deep Time Series Aggregation

    Authors: Zhaoyang Qu, Zhenming Zhang, Nan Qu, Yuguang Zhou, Yang Li, Tao Jiang, Min Li, Chao Long

    Abstract: Extracting typical operational scenarios is essential for making flexible decisions in the dispatch of a new power system. This study proposed a novel deep time series aggregation scheme (DTSAs) to generate typical operational scenarios, considering the large amount of historical operational snapshot data. Specifically, DTSAs analyze the intrinsic mechanisms of different scheduling operational sce…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by CAAI Transactions on Intelligence Technology

  26. arXiv:2408.14453  [pdf]

    cs.LG eess.IV eess.SP

    Reconstructing physiological signals from fMRI across the adult lifespan

    Authors: Shiyu Wang, Ziyuan Xu, Yamin Li, Mara Mather, Roza G. Bayrak, Catie Chang

    Abstract: Interactions between the brain and body are of fundamental importance for human behavior and health. Functional magnetic resonance imaging (fMRI) captures whole-brain activity noninvasively, and modeling how fMRI signals interact with physiological dynamics of the body can provide new insight into brain function and offer potential biomarkers of disease. However, physiological recordings are not a…

    Submitted 26 August, 2024; originally announced August 2024.

  27. arXiv:2408.14393  [pdf, other]

    cs.IR cs.LG

    CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence

    Authors: Chaochao Chen, Jiaming Zhang, Yizhao Zhang, Li Zhang, Lingjuan Lyu, Yuyuan Li, Biao Gong, Chenggang Yan

    Abstract: With increasing privacy concerns in artificial intelligence, regulations have mandated the right to be forgotten, granting individuals the right to withdraw their data from models. Machine unlearning has emerged as a potential solution to enable selective forgetting in models, particularly in recommender systems where historical data contains sensitive user information. Despite recent advances in…

    Submitted 26 August, 2024; originally announced August 2024.

  28. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  29. arXiv:2408.14144  [pdf, other]

    cs.LG cs.DC

    Neighborhood and Global Perturbations Supported SAM in Federated Learning: From Local Tweaks To Global Awareness

    Authors: Boyuan Li, Zihao Peng, Yafei Li, Mingliang Xu, Shengbo Chen, Baofeng Ji, Cong Shen

    Abstract: Federated Learning (FL) can be coordinated under the orchestration of a central server to collaboratively build a privacy-preserving model without the need for data exchange. However, participant data heterogeneity leads to local optima divergence, subsequently affecting convergence outcomes. Recent research has focused on global sharpness-aware minimization (SAM) and dynamic regularization techni…

    Submitted 29 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  30. arXiv:2408.14122  [pdf, other]

    cs.CR

    FG-SAT: Efficient Flow Graph for Encrypted Traffic Classification under Environment Shifts

    Authors: Susu Cui, Xueying Han, Dongqi Han, Zhiliang Wang, Weihang Wang, Yun Li, Bo Jiang, Baoxu Liu, Zhigang Lu

    Abstract: Encrypted traffic classification plays a critical role in network security and management. Currently, mining deep patterns from side-channel contents and plaintext fields through neural networks is a major solution. However, existing methods have two major limitations: (1) They fail to recognize the critical link between transport layer mechanisms and applications, missing the opportunity to learn…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

  31. arXiv:2408.14008  [pdf, other]

    cs.CV cs.AI

    LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models

    Authors: Qihang Ge, Wei Sun, Yu Zhang, Yunhao Li, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai

    Abstract: The explosive growth of videos on streaming media platforms has underscored the urgent need for effective video quality assessment (VQA) algorithms to monitor and perceptually optimize the quality of streaming videos. However, VQA remains an extremely challenging task due to the diverse video content and the complex spatial and temporal distortions, thus necessitating more advanced methods to addr…

    Submitted 26 August, 2024; originally announced August 2024.

  32. arXiv:2408.13987  [pdf, other]

    cs.CL cs.AI

    Focused Large Language Models are Stable Many-Shot Learners

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Yueqi Zhang, Chuyi Tan, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: In-Context Learning (ICL) enables large language models (LLMs) to achieve rapid task adaptation by learning from demonstrations. With the increase in available context length of LLMs, recent experiments have shown that the performance of ICL does not necessarily scale well in many-shot (demonstration) settings. We theoretically and experimentally confirm that the reason lies in more demonstrations…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 15 pages

  33. arXiv:2408.13986  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    AgentMove: Predicting Human Mobility Anywhere Using Large Language Model based Agentic Framework

    Authors: Jie Feng, Yuwei Du, Jie Zhao, Yong Li

    Abstract: Human mobility prediction plays a crucial role in various real-world applications. Although deep learning based models have shown promising results over the past decade, their reliance on extensive private mobility data for training and their inability to perform zero-shot predictions, have hindered further advancements. Recently, attempts have been made to apply large language models (LLMs) to mo…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 13 pages

  34. arXiv:2408.13980  [pdf, other]

    cs.CV

    FusionSAM: Latent Space driven Segment Anything Model for Multimodal Fusion and Segmentation

    Authors: Daixun Li, Weiying Xie, Mingxiang Cao, Yunke Wang, Jiaqing Zhang, Yunsong Li, Leyuan Fang, Chang Xu

    Abstract: Multimodal image fusion and segmentation enhance scene understanding in autonomous driving by integrating data from various sensors. However, current models struggle to efficiently segment densely packed elements in such scenes, due to the absence of comprehensive fusion features that can guide mid-process fine-tuning and focus attention on relevant areas. The Segment Anything Model (SAM) has emer…

    Submitted 25 August, 2024; originally announced August 2024.

  35. arXiv:2408.13977  [pdf, other]

    cs.HC

    Say Your Reason: Extract Contextual Rules In Situ for Context-aware Service Recommendation

    Authors: Yuxuan Li, Jiahui Li, Lihang Pan, Chun Yu, Yuanchun Shi

    Abstract: This paper introduces SayRea, an interactive system that facilitates the extraction of contextual rules for personalized context-aware service recommendations in mobile scenarios. The system monitors a user's execution of registered services on their smartphones (via accessibility service) and proactively requests a single-sentence reason from the user. By utilizing a Large Language Model (LLM), S…

    Submitted 25 August, 2024; originally announced August 2024.

  36. arXiv:2408.13759  [pdf, other]

    cs.RO

    MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion

    Authors: Qi Liu, Jingxiang Guo, Sixu Lin, Shuaikang Ma, Jinxuan Zhu, Yanjie Li

    Abstract: This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped ro…

    Submitted 25 August, 2024; originally announced August 2024.

  37. arXiv:2408.13750  [pdf, other]

    cs.AI cs.MA

    Multi-Agent Target Assignment and Path Finding for Intelligent Warehouse: A Cooperative Multi-Agent Deep Reinforcement Learning Perspective

    Authors: Qi Liu, Jianqi Gao, Dongjie Zhu, Xizheng Pang, Pengbin Chen, Jingxiang Guo, Yanjie Li

    Abstract: Multi-agent target assignment and path planning (TAPF) are two key problems in intelligent warehouse. However, most literature only addresses one of these two problems separately. In this study, we propose a method to simultaneously solve target assignment and path planning from a perspective of cooperative multi-agent deep reinforcement learning (RL). To the best of our knowledge, this is the fir…

    Submitted 25 August, 2024; originally announced August 2024.

  38. arXiv:2408.13738  [pdf, other]

    cs.CL

    Poor-Supervised Evaluation for SuperLLM via Mutual Consistency

    Authors: Peiwen Yuan, Shaoxiong Feng, Yiwei Li, Xinglin Wang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: The guidance from capability evaluations has greatly propelled the progress of both human society and Artificial Intelligence. However, as LLMs evolve, it becomes challenging to construct evaluation benchmarks for them with accurate labels on hard tasks that approach the boundaries of human capabilities. To credibly conduct evaluation without accurate labels (denoted as poor-supervised evaluation)…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: ACL findings

  39. arXiv:2408.13728  [pdf, other]

    cs.CV

    3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

    Authors: Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li

    Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in hyperspectral image (HSI) classification field, ViT-based methods also show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data. Its self-attention mechanism, which exhibits…

    Submitted 25 August, 2024; originally announced August 2024.

  40. arXiv:2408.13705  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval

    Authors: Lifeng Zhou, Yuke Li, Rui Deng, Yuting Yang, Haoqi Zhu

    Abstract: The success of speech-image retrieval relies on establishing an effective alignment between speech and image. Existing methods often model cross-modal interaction through simple cosine similarity of the global feature of each modality, which fall short in capturing fine-grained details within modalities. To address this issue, we introduce an effective framework and a novel learning task named cro…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.13119

  41. arXiv:2408.13499  [pdf, other]

    cs.CV

    R2G: Reasoning to Ground in 3D Scenes

    Authors: Yixuan Li, Zan Wang, Wei Liang

    Abstract: We propose Reasoning to Ground (R2G), a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner. In contrast to prior works, R2G explicitly models the 3D scene with a semantic concept-based scene graph; recurrently simulates the attention transferring across object entities; thus makes the process of grounding the target objects with the highest probability int…

    Submitted 24 August, 2024; originally announced August 2024.

  42. arXiv:2408.13471  [pdf, other]

    cs.LG cs.AI

    Disentangled Generative Graph Representation Learning

    Authors: Xinyue Hu, Zhibin Duan, Xinyang Liu, Yuxin Li, Bo Chen, Mingyuan Zhou

    Abstract: Recently, generative graph models have shown promising results in learning graph representations through self-supervised methods. However, most existing generative graph representation learning (GRL) approaches rely on random masking across the entire graph, which overlooks the entanglement of learned representations. This oversight results in non-robustness and a lack of explainability. Furthermo…

    Submitted 24 August, 2024; originally announced August 2024.

  43. arXiv:2408.13457  [pdf, other]

    cs.CL cs.AI

    Make Every Penny Count: Difficulty-Adaptive Self-Consistency for Cost-Efficient Reasoning

    Authors: Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, Kan Li

    Abstract: Self-consistency (SC), a widely used decoding strategy for chain-of-thought reasoning, shows significant gains across various multi-step reasoning tasks but comes with a high cost due to multiple sampling with the preset size. Its variants, Adaptive self-consistency (ASC) and Early-stopping self-consistency (ESC), dynamically adjust the number of samples based on the posterior distribution of a se…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: Preprint

  44. arXiv:2408.13385  [pdf, other]

    cs.CV

    MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

    Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li

    Abstract: Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems. Unsupervised Few-Shot Learning (U-FSL) seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases. In this work, we first quantitatively assess the impacts of Masked Image Modelin…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ACMMM 2024 (Oral)

  45. arXiv:2408.13373  [pdf, other]

    cs.CV

    Learning Unknowns from Unknowns: Diversified Negative Prototypes Generator for Few-Shot Open-Set Recognition

    Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li

    Abstract: Few-shot open-set recognition (FSOR) is a challenging task that requires a model to recognize known classes and identify unknown classes with limited labeled data. Existing approaches, particularly Negative-Prototype-Based methods, generate negative prototypes based solely on known class data. However, as the unknown space is infinite while the known space is limited, these methods suffer from lim…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ACMMM 2024

  46. arXiv:2408.13252  [pdf, other]

    cs.CV

    LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

    Authors: Shuai Yang, Jing Tan, Mengchen Zhang, Tong Wu, Yixuan Li, Gordon Wetzstein, Ziwei Liu, Dahua Lin

    Abstract: 3D immersive scene generation is a challenging yet critical task in computer vision and graphics. A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ panorama representation to represent large FOV scene environments. However,…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Project page: https://ys-imtech.github.io/projects/LayerPano3D/

  47. arXiv:2408.13119  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Coarse-to-fine Alignment Makes Better Speech-image Retrieval

    Authors: Lifeng Zhou, Yuke Li

    Abstract: In this paper, we propose a novel framework for speech-image retrieval. We utilize speech-image contrastive (SIC) learning tasks to align speech and image representations at a coarse level and speech-image matching (SIM) learning tasks to further refine the fine-grained cross-modal alignment. SIC and SIM learning tasks are jointly trained in a unified manner. To optimize the learning process, we u…

    Submitted 14 August, 2024; originally announced August 2024.

  48. arXiv:2408.12897  [pdf, other]

    eess.IV cs.CV

    When Diffusion MRI Meets Diffusion Model: A Novel Deep Generative Model for Diffusion MRI Generation

    Authors: Xi Zhu, Wei Zhang, Yijie Li, Lauren J. O'Donnell, Fan Zhang

    Abstract: Diffusion MRI (dMRI) is an advanced imaging technique characterizing tissue microstructure and white matter structural connectivity of the human brain. The demand for high-quality dMRI data is growing, driven by the need for better resolution and improved tissue contrast. However, acquiring high-quality dMRI data is expensive and time-consuming. In this context, deep generative modeling emerges as…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 11 pages, 3 figures

  49. arXiv:2408.12821  [pdf, other]

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i…

    Submitted 22 August, 2024; originally announced August 2024.

  50. arXiv:2408.12803  [pdf, other]

    cs.LG cs.AI cs.IR

    Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth

    Authors: Yuxiang Wei, Zhaoxin Qiu, Yingjie Li, Yuke Sun, Xiaoling Li

    Abstract: As a key component in boosting online user growth, uplift modeling aims to measure individual user responses (e.g., whether to play the game) to various treatments, such as gaming bonuses, thereby enhancing business outcomes. However, previous research typically considers a single-task, single-treatment setting, where only one treatment exists and the overall treatment effect is measured by a sing…

    Submitted 22 August, 2024; originally announced August 2024.