Showing 1–50 of 8,471 results for author: Cheng

Searching in archive cs.
  1. arXiv:2408.17214  [pdf, other]

    cs.IR

    Efficient Multi-task Prompt Tuning for Recommendation

    Authors: Ting Bai, Le Huang, Yue Yu, Cheng Yang, Cheng Hou, Zhe Zhao, Chuan Shi

    Abstract: With the expansion of business scenarios, real recommender systems are facing challenges in dealing with the constantly emerging new tasks in multi-task learning frameworks. In this paper, we attempt to improve the generalization ability of multi-task recommendations when dealing with new tasks. We find that joint training will enhance the performance of the new task but always negatively impact e…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.17150  [pdf, other]

    cs.CV cs.AI

    Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning

    Authors: Xiaoye Qu, Jiashuo Sun, Wei Wei, Yu Cheng

    Abstract: Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination problems referring to generating inconsistent outputs with the image content. To mitigate hallucinations, previous studies mainly focus on retraining LVLMs with custom datasets. Although effective, they inherently come with add…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 tables, 7 figures

  3. arXiv:2408.17052  [pdf, other]

    cs.CV

    Can We Leave Deepfake Data Behind in Training Deepfake Detector?

    Authors: Jikang Cheng, Zhiyuan Yan, Ying Zhang, Yuhao Luo, Zhongyuan Wang, Chen Li

    Abstract: The generalization ability of deepfake detectors is vital for their applications in real-world scenarios. One effective solution to enhance this ability is to train the models with manually-blended data, which we termed "blendfake", encouraging models to learn generic forgery artifacts like blending boundary. Interestingly, current SoTA methods utilize blendfake without incorporating any deepfake…

    Submitted 30 August, 2024; originally announced August 2024.

  4. arXiv:2408.16886  [pdf, other]

    eess.IV cs.CV

    LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation

    Authors: Juntao Jiang, Mengmeng Wang, Huizhong Tian, Lingbo Cheng, Yong Liu

    Abstract: Despite the progress made by large models in computer vision, optimization challenges, the complexity of transformer models, computational limitations, and the requirements of practical applications call for simpler designs in model architecture for medical image segmentation, especially in mobile medical devices that require lightweight and deployable models with real-time performance. However,…

    Submitted 29 August, 2024; originally announced August 2024.

  5. arXiv:2408.16859  [pdf, other]

    eess.IV cs.CV

    Comparative Analysis of Transfer Learning Models for Breast Cancer Classification

    Authors: Sania Eskandari, Ali Eslamian, Qiang Cheng

    Abstract: The classification of histopathological images is crucial for the early and precise detection of breast cancer. This study investigates the efficiency of deep learning models in distinguishing between Invasive Ductal Carcinoma (IDC) and non-IDC in histopathology slides. We conducted a thorough comparison examination of eight sophisticated models: ResNet-50, DenseNet-121, ResNeXt-50, Vision Transfo…

    Submitted 29 August, 2024; originally announced August 2024.

  6. arXiv:2408.16774  [pdf]

    cs.IT eess.SP

    Optimal UCA Design for OAM Based Wireless Backhaul Transmission

    Authors: Haiyue Jing, Wenchi Cheng, Wei Zhang, Hailin Zhang

    Abstract: Orbital angular momentum (OAM), which is considered a novel way to achieve high capacity, has attracted much attention recently. OAM signals emitted by uniform circular array (UCA) are widely regarded to go through the Bessel-form channels. However, the channel gains corresponding to the Bessel-form channels are with low signal-to-noise-ratio (SNR) on OAM-modes and it is difficult to achie…

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2408.16532  [pdf, other]

    eess.AS cs.LG cs.MM cs.SD eess.SP

    WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

    Authors: Shengpeng Ji, Ziyue Jiang, Xize Cheng, Yifu Chen, Minghui Fang, Jialong Zuo, Qian Yang, Ruiqi Li, Ziang Zhang, Xiaoda Yang, Rongjie Huang, Yidi Jiang, Qian Chen, Siqi Zheng, Wen Wang, Zhou Zhao

    Abstract: Language models have been effectively applied to modeling natural signals, such as images, video, speech, and audio. A crucial component of these models is the codec tokenizer, which compresses high-dimensional natural signals into lower-dimensional discrete tokens. In this paper, we introduce WavTokenizer, which offers several advantages over previous SOTA acoustic codec models in the audio domai…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Work in progress. arXiv admin note: text overlap with arXiv:2402.12208

  8. arXiv:2408.16500  [pdf, other]

    cs.CV

    CogVLM2: Visual Language Models for Image and Video Understanding

    Authors: Wenyi Hong, Weihan Wang, Ming Ding, Wenmeng Yu, Qingsong Lv, Yan Wang, Yean Cheng, Shiyu Huang, Junhui Ji, Zhao Xue, Lei Zhao, Zhuoyi Yang, Xiaotao Gu, Xiaohan Zhang, Guanyu Feng, Da Yin, Zihan Wang, Ji Qi, Xixuan Song, Peng Zhang, Debing Liu, Bin Xu, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2…

    Submitted 29 August, 2024; originally announced August 2024.

  9. arXiv:2408.16467  [pdf, other]

    cs.NE cs.CV

    Spiking Diffusion Models

    Authors: Jiahang Cao, Hanzhong Guo, Ziqing Wang, Deming Zhou, Hao Cheng, Qiang Zhang, Renjing Xu

    Abstract: Recent years have witnessed Spiking Neural Networks (SNNs) gaining attention for their ultra-low energy consumption and high biological plausibility compared with traditional Artificial Neural Networks (ANNs). Despite their distinguished properties, the application of SNNs in the computationally intensive field of image generation is still under exploration. In this paper, we propose the Spiking D…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Artificial Intelligence

  10. arXiv:2408.16268  [pdf, other]

    cs.CV

    UDD: Dataset Distillation via Mining Underutilized Regions

    Authors: Shiguang Wang, Zhongyu Zhang, Jian Cheng

    Abstract: Dataset distillation synthesizes a small dataset such that a model trained on this set approximates the performance of the original dataset. Recent studies on dataset distillation focused primarily on the design of the optimization process, with methods such as gradient matching, feature alignment, and training trajectory matching. However, little attention has been given to the issue of underutil…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: PRCV2024

  11. arXiv:2408.16238  [pdf, other]

    cs.IR

    Efficient Transfer Learning Framework for Cross-Domain Click-Through Rate Prediction

    Authors: Qi Liu, Xingyuan Tang, Jianqiang Huang, Xiangqian Yu, Haoran Jin, Jin Chen, Yuanhao Pu, Defu Lian, Tan Qu, Zhe Wang, Jia Cheng, Jun Lei

    Abstract: Natural content and advertisement coexist in industrial recommendation systems but differ in data distribution. Concretely, traffic related to the advertisement is considerably sparser compared to that of natural content, which motivates the development of transferring knowledge from the richer source natural content domain to the sparser advertising domain. The challenges include the inefficienci…

    Submitted 28 August, 2024; originally announced August 2024.

  12. arXiv:2408.16236  [pdf, other]

    cs.CV

    Neural Spectral Decomposition for Dataset Distillation

    Authors: Shaolei Yang, Shen Cheng, Mingbo Hong, Haoqiang Fan, Xing Wei, Shuaicheng Liu

    Abstract: In this paper, we propose Neural Spectrum Decomposition, a generic decomposition framework for dataset distillation. Unlike previous methods, we consider the entire dataset as a high-dimensional observation that is low-rank across all dimensions. We aim to discover the low-rank representation of the entire dataset and perform distillation efficiently. Toward this end, we learn a set of spectrum te…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  13. arXiv:2408.16233  [pdf, other]

    cs.CV

    PSE-Net: Channel Pruning for Convolutional Neural Networks with Parallel-subnets Estimator

    Authors: Shiguang Wang, Tao Xie, Haijun Liu, Xingcheng Zhang, Jian Cheng

    Abstract: Channel Pruning is one of the most widespread techniques used to compress deep neural networks while maintaining their performances. Currently, a typical pruning algorithm leverages neural architecture search to directly find networks with a configurable width, the key step of which is to identify representative subnet for various pruning ratios by training a supernet. However, current methods mai…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, Neural Networks

  14. arXiv:2408.16030  [pdf]

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da…

    Submitted 28 August, 2024; originally announced August 2024.

  15. arXiv:2408.15778  [pdf, other]

    cs.AI cs.CL

    LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

    Authors: Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains under…

    Submitted 28 August, 2024; originally announced August 2024.

  16. arXiv:2408.15688  [pdf, other]

    cs.IR

    PDSR: A Privacy-Preserving Diversified Service Recommendation Method on Distributed Data

    Authors: Lina Wang, Huan Yang, Yiran Shen, Chao Liu, Lianyong Qi, Xiuzhen Cheng, Feng Li

    Abstract: The last decade has witnessed a tremendous growth of service computing, while efficient service recommendation methods are desired to recommend high-quality services to users. It is well known that collaborative filtering is one of the most popular methods for service recommendation based on QoS, and many existing proposals focus on improving recommendation accuracy, i.e., recommending high-qualit…

    Submitted 28 August, 2024; originally announced August 2024.

  17. MMDRFuse: Distilled Mini-Model with Dynamic Refresh for Multi-Modality Image Fusion

    Authors: Yanglin Deng, Tianyang Xu, Chunyang Cheng, Xiao-Jun Wu, Josef Kittler

    Abstract: In recent years, Multi-Modality Image Fusion (MMIF) has been applied to many fields, which has attracted many scholars to endeavour to improve the fusion performance. However, the prevailing focus has predominantly been on the architecture design, rather than the training strategies. As a low-level vision task, image fusion is supposed to quickly deliver output images for observation and supportin…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 10 pages, 8 figures, accepted by ACM International Conference on Multimedia 2024 (Oral)

    Journal ref: 10 pages, 8 figures, accepted by ACM International Conference on Multimedia 2024 (Oral)

  18. arXiv:2408.15632  [pdf, other]

    eess.SY cs.AI

    Structural Optimization of Lightweight Bipedal Robot via SERL

    Authors: Yi Cheng, Chenxi Han, Yuheng Min, Linqi Ye, Houde Liu, Hang Liu

    Abstract: Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming and labor-intensive, lack theoretical guidance, and struggle to obtain optimal design results within vast design spaces, thus failing to fully exploit the inherent…

    Submitted 28 August, 2024; originally announced August 2024.

  19. arXiv:2408.15555  [pdf, other]

    eess.IV cs.CV cs.LG

    Latent Relationship Mining of Glaucoma Biomarkers: a TRI-LSTM based Deep Learning

    Authors: Cheng Huang, Junhao Shen, Qiuyu Luo, Karanjit Kooner, Tsengdar Lee, Yishen Liu, Jia Zhang

    Abstract: In recent years, a significant amount of research has been conducted on applying deep learning methods for glaucoma classification and detection. However, the explainability of those established machine learning models remains a big concern. In this research, in contrast, we learn from cognitive science concepts and study how ophthalmologists judge glaucoma detection. Simulating experts' efforts,…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 images

  20. arXiv:2408.15371  [pdf, other]

    cs.IR cs.LG

    Temporal Graph Neural Network-Powered Paper Recommendation on Dynamic Citation Networks

    Authors: Junhao Shen, Mohammad Ausaf Ali Haqqani, Beichen Hu, Cheng Huang, Xihao Xie, Tsengdar Lee, Jia Zhang

    Abstract: Due to the rapid growth of scientific publications, identifying all related reference articles in the literature has become increasingly challenging yet highly demanding. Existing methods primarily assess candidate publications from a static perspective, focusing on the content of articles and their structural information, such as citation relationships. There is a lack of research regarding how t…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures, accepted by SDU@AAAI-2024. The AAAI Workshop on Scientific Document Understanding (2024)

  21. arXiv:2408.15339  [pdf, other]

    cs.LG cs.CL

    UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function

    Authors: Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng

    Abstract: An LLM is pretrained on trillions of tokens, but the pretrained LLM may still generate undesired responses. To solve this problem, alignment techniques such as RLHF, DPO and KTO are proposed. However, these alignment techniques have limitations. For example, RLHF requires training the reward model and policy separately, which is complex, time-consuming, memory intensive and unstable during trainin…

    Submitted 27 August, 2024; originally announced August 2024.

  22. arXiv:2408.15287  [pdf, other]

    quant-ph cs.LG

    Quantum-Powered Personalized Learning

    Authors: Yifan Zhou, Chong Cheng Xu, Mingi Song, Yew Kee Wong

    Abstract: This paper explores the transformative potential of quantum computing in the realm of personalized learning. Traditional machine learning models and GPU-based approaches have long been utilized to tailor educational experiences to individual student needs. However, these methods face significant challenges in terms of scalability, computational efficiency, and real-time adaptation to the dynamic n…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures

  23. arXiv:2408.15273  [pdf]

    eess.SP cs.IT

    Concentric UCAs Based Low-Order OAM for High Capacity in Radio Vortex Wireless Communications

    Authors: Haiyue Jing, Wenchi Cheng, Zan Li, Hailin Zhang

    Abstract: Due to its potential for boosting the capacity of wireless communications, the Radio vOrtex Wireless COMMunication (RowComm) over orthogonal states/modes of Orbital Angular Momentum (OAM) has received much attention in recent years. Uniform circular array (UCA), as an efficient and convenient antenna structure, can transmit/receive multiple OAM beams with different OAM-modes simultaneously when the tran…

    Submitted 12 August, 2024; originally announced August 2024.

  24. arXiv:2408.15242  [pdf, other]

    cs.CV

    Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty

    Authors: Saining Zhang, Baijun Ye, Xiaoxue Chen, Yuantao Chen, Zongzheng Zhang, Cheng Peng, Yongliang Shi, Hao Zhao

    Abstract: Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuiti…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: BMVC2024 Project Page: https://sainingzhang.github.io/project/uc-gs/ Code: https://github.com/SainingZhang/uc-gs/

  25. arXiv:2408.15165  [pdf, other]

    cs.LG cond-mat.mtrl-sci physics.chem-ph physics.comp-ph

    Latent Ewald summation for machine learning of long-range interactions

    Authors: Bingqing Cheng

    Abstract: Machine learning interatomic potentials (MLIPs) often neglect long-range interactions, such as electrostatic and dispersion forces. In this work, we introduce a straightforward and efficient method to account for long-range interactions by learning a latent variable from local atomic descriptors and applying an Ewald summation to this variable. We demonstrate that in systems including charged, pol…

    Submitted 27 August, 2024; originally announced August 2024.

  26. arXiv:2408.15057  [pdf]

    cs.LG

    Subgroup Analysis via Model-based Rule Forest

    Authors: I-Ling Cheng, Chan Hsu, Chantung Ku, Pei-Ju Lee, Yihuang Kang

    Abstract: Machine learning models are often criticized for their black-box nature, raising concerns about their applicability in critical decision-making scenarios. Consequently, there is a growing demand for interpretable models in such contexts. In this study, we introduce Model-based Deep Rule Forests (mobDRF), an interpretable representation learning algorithm designed to extract transparent models from…

    Submitted 27 August, 2024; originally announced August 2024.

  27. arXiv:2408.14853  [pdf, other]

    cs.CL cs.AI cs.CR

    Detecting AI Flaws: Target-Driven Attacks on Internal Faults in Language Models

    Authors: Yuhao Du, Zhuo Li, Pengyu Cheng, Xiang Wan, Anningzhe Gao

    Abstract: Large Language Models (LLMs) have become a focal point in the rapidly evolving field of artificial intelligence. However, a critical concern is the presence of toxic content within the pre-training corpus of these models, which can lead to the generation of inappropriate outputs. Investigating methods for detecting internal faults in LLMs can help us understand their limitations and improve their…

    Submitted 27 August, 2024; originally announced August 2024.

  28. arXiv:2408.14831  [pdf, other]

    cs.LG cs.DC cs.NI

    DRL-Based Federated Self-Supervised Learning for Task Offloading and Resource Allocation in ISAC-Enabled Vehicle Edge Computing

    Authors: Xueying Gu, Qiong Wu, Pingyi Fan, Nan Cheng, Wen Chen, Khaled B. Letaief

    Abstract: Intelligent Transportation Systems (ITS) leverage Integrated Sensing and Communications (ISAC) to enhance data exchange between vehicles and infrastructure in the Internet of Vehicles (IoV). This integration inevitably increases computing demands, risking real-time system stability. Vehicle Edge Computing (VEC) addresses this by offloading tasks to Road Side Unit (RSU), ensuring timely services. O…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: This paper has been submitted to Digital Communications and Networks. The source code has been released at: https://github.com/qiongwu86/Federated-SSL-task-offloading-and-resource-allocation

  29. arXiv:2408.14812  [pdf, other]

    cs.CV

    HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling

    Authors: Yubin Wang, Xinyang Jiang, De Cheng, Wenli Sun, Dongsheng Li, Cairong Zhao

    Abstract: Prompt learning has become a prevalent strategy for adapting vision-language foundation models (VLMs) such as CLIP to downstream tasks. With the emergence of large language models (LLMs), recent studies have explored the potential of using category-related descriptions to enhance prompt effectiveness. However, conventional descriptions lack explicit structured information necessary to represent th…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures, 7 tables. arXiv admin note: substantial text overlap with arXiv:2312.06323

  30. arXiv:2408.14770  [pdf, other]

    cs.CV

    Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification

    Authors: Sirui Li, Li Lin, Yijin Huang, Pujin Cheng, Xiaoying Tang

    Abstract: In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models. Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning. However, their limited medical-specific pretraining hinders their performance…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE ISBI 2024

  31. arXiv:2408.14757  [pdf, other]

    cs.CV cs.LG

    Learning effective pruning at initialization from iterative pruning

    Authors: Shengkai Liu, Yaofeng Cheng, Fusheng Zha, Wei Guo, Lining Sun, Zhenshan Bing, Chenguang Yang

    Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods still have a large accuracy gap with iterative pruning, especially at high sparsity levels. This raises an intriguing question: can we get inspiration from iterative pruning to improve the PaI performance? In the…

    Submitted 26 August, 2024; originally announced August 2024.

  32. How to build trust in answers given by Generative AI for specific, and vague, financial questions

    Authors: Alex Zarifis, Xusen Cheng

    Abstract: Purpose: Generative artificial intelligence (GenAI) has progressed in its ability and has seen explosive growth in adoption. However, the consumer's perspective on its use, particularly in specific scenarios such as financial advice, is unclear. This research develops a model of how to build trust in the advice given by GenAI when answering financial questions. Design/methodology/approach: The mod…

    Submitted 26 August, 2024; originally announced August 2024.

    Journal ref: Journal of Electronic Business & Digital Economics, pp.1-15

  33. arXiv:2408.14513  [pdf, other]

    cs.LG cs.AI

    Variational autoencoder-based neural network model compression

    Authors: Liang Cheng, Peiyuan Guan, Amir Taherkordi, Lei Liu, Dapeng Lan

    Abstract: Variational Autoencoders (VAEs), as a form of deep generative model, have been widely used in recent years, and shown great performance in a number of different domains, including image generation and anomaly detection. This paper aims to explore neural network model compression method based on VAE. The experiment uses different neural network models for MNIST recognition as compression…

    Submitted 25 August, 2024; originally announced August 2024.

  34. arXiv:2408.14467  [pdf, other]

    cs.CL

    Explicit Inductive Inference using Large Language Models

    Authors: Tianyang Liu, Tianyi Li, Liang Cheng, Mark Steedman

    Abstract: Large Language Models (LLMs) are reported to hold undesirable attestation bias on inference tasks: when asked to predict if a premise P entails a hypothesis H, instead of considering H's conditional truthfulness entailed by P, LLMs tend to use the out-of-context truth label of H as a fragile proxy. In this paper, we propose a pipeline that exploits this bias to do explicit inductive inference. Our…

    Submitted 26 August, 2024; originally announced August 2024.

  35. arXiv:2408.14419  [pdf, other]

    cs.AI cs.CL cs.CV

    CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

    Authors: Shubham Bharti, Shiyun Cheng, Jihyun Rho, Martina Rao, Xiaojin Zhu

    Abstract: We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualizing charts. Given a chart, a language model needs to not only correctly comprehend the chart (the FACT question) but also judge if the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We d…

    Submitted 26 August, 2024; originally announced August 2024.

  36. arXiv:2408.14074  [pdf, other]

    cs.SE

    Abstraction Engineering

    Authors: Nelly Bencomo, Jordi Cabot, Marsha Chechik, Betty H. C. Cheng, Benoit Combemale, Andrzej Wąsowski, Steffen Zschaler

    Abstract: Modern software-based systems operate under rapidly changing conditions and face ever-increasing uncertainty. In response, systems are increasingly adaptive and reliant on artificial-intelligence methods. In addition to the ubiquity of software with respect to users and application areas (e.g., transportation, smart grids, medicine, etc.), these high-impact software systems necessarily draw from m…

    Submitted 26 August, 2024; originally announced August 2024.

  37. arXiv:2408.13963  [pdf, other]

    cs.CV

    Shifted Window Fourier Transform And Retention For Image Captioning

    Authors: Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

    Abstract: Image Captioning is an important Language and Vision task that finds application in a variety of contexts, ranging from healthcare to autonomous vehicles. As many real-world applications rely on devices with limited resources, much effort in the field was put into the development of lighter and faster models. However, many of the current optimizations focus on the Transformer architecture in contr…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Pre-print version of paper accepted for ICONIP 2024

  38. arXiv:2408.13959  [pdf, other]

    cs.CL

    Bidirectional Awareness Induction in Autoregressive Seq2Seq Models

    Authors: Jia Cheng Hu, Roberto Cavicchioli, Alessandro Capotondi

    Abstract: Autoregressive Sequence-To-Sequence models are the foundation of many Deep Learning achievements in major research fields such as Vision and Natural Language Processing. Despite that, they still present significant limitations. For instance, when errors occur in the early steps of the prediction, the whole output is severely affected. Such reliance on previously predicted tokens and the inherent c…

    Submitted 25 August, 2024; originally announced August 2024.

  39. arXiv:2408.13899  [pdf, other]

    cs.DB

    $\boldsymbol{Steiner}$-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes

    Authors: Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, Wei Wang

    Abstract: Graph-based indexes have been widely employed to accelerate approximate similarity search of high-dimensional vectors. However, the performance of graph indexes to answer different queries varies vastly, leading to an unstable quality of service for downstream applications. This necessitates an effective measure to test query hardness on graph indexes. Nonetheless, popular distance-based hardness…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: Accepted by PVLDB Volume 17 (presented at 2025)

  40. arXiv:2408.13830  [pdf]

    cs.CV

    Multi-SIGATnet: A multimodal schizophrenia MRI classification algorithm using sparse interaction mechanisms and graph attention networks

    Authors: Yuhong Jiao, Jiaqing Miao, Jinnan Gong, Hui He, Ping Liang, Cheng Luo, Ying Tan

    Abstract: Schizophrenia is a serious psychiatric disorder. Its pathogenesis is not completely clear, making it difficult to treat patients precisely. Because of the complicated non-Euclidean network structure of the human brain, learning critical information from brain networks remains difficult. To effectively capture the topological information of brain neural networks, a novel multimodal graph attention…

    Submitted 25 August, 2024; originally announced August 2024.

  41. arXiv:2408.13790  [pdf, other]

    cs.CV

    CV-MOS: A Cross-View Model for Motion Segmentation

    Authors: Xiaoyu Tang, Zeyu Chen, Jintao Cheng, Xieyuanli Chen, Jin Wu, Bohuan Xue

    Abstract: In autonomous driving, accurately distinguishing between static and moving objects is crucial for the autonomous driving system. When performing the motion object segmentation (MOS) task, effectively leveraging motion information from objects becomes a primary challenge in improving the recognition of moving objects. Previous methods either utilized range view (RV) or bird's eye view (BEV) residua…

    Submitted 25 August, 2024; originally announced August 2024.

  42. arXiv:2408.13597  [pdf, other]

    cs.CR cs.SE

    Automated Software Vulnerability Patching using Large Language Models

    Authors: Yu Nong, Haoran Yang, Long Cheng, Hongxin Hu, Haipeng Cai

    Abstract: Timely and effective vulnerability patching is essential for cybersecurity defense, for which various approaches have been proposed yet still struggle to generate valid and correct patches for real-world vulnerabilities. In this paper, we leverage the power and merits of pre-trained large language models (LLMs) to enable automated vulnerability patching using no test input/exploit evidence and wit…

    Submitted 24 August, 2024; originally announced August 2024.

  43. arXiv:2408.13546  [pdf, other]

    eess.SP cs.AI

    Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics

    Authors: Zonghui Yang, Shijian Gao, Xiang Cheng, Liuqing Yang

    Abstract: Integrated sensing and communication (ISAC) technology plays a crucial role in vehicular networks. However, the communication channel within this context exhibits time-varying characteristics, and potential targets may move rapidly, resulting in double dynamics. These present significant challenges for real-time ISAC precoding design that have not been thoroughly explored. While optimization-base…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 13 pages, 17 figures, 4 tables

  44. arXiv:2408.13479  [pdf, other]

    quant-ph cs.LG q-bio.BM

    Quantum-machine-assisted Drug Discovery: Survey and Perspective

    Authors: Yidong Zhou, Jintai Chen, Jinglei Cheng, Gopal Karemore, Marinka Zitnik, Frederic T. Chong, Junyu Liu, Tianfan Fu, Zhiding Liang

    Abstract: Drug discovery and development is a highly complex and costly endeavor, typically requiring over a decade and substantial financial investment to bring a new drug to market. Traditional computer-aided drug design (CADD) has made significant progress in accelerating this process, but the development of quantum computing offers potential due to its unique capabilities. This paper discusses the integ…

    Submitted 27 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 27 pages, 10 figures

  45. arXiv:2408.13454  [pdf, other]

    cs.CV

    AdaOcc: Adaptive-Resolution Occupancy Prediction

    Authors: Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

    Abstract: Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computationa…

    Submitted 23 August, 2024; originally announced August 2024.

  46. arXiv:2408.13432  [pdf, other]

    cs.CL

    Integrating Multi-Head Convolutional Encoders with Cross-Attention for Improved SPARQL Query Translation

    Authors: Yi-Hui Chen, Eric Jui-Lin Lu, Kwan-Ho Cheng

    Abstract: The main task of the KGQA system (Knowledge Graph Question Answering) is to convert user input questions into query syntax (such as SPARQL). With the rise of modern popular encoders and decoders like Transformer and ConvS2S, many scholars have shifted the research direction of SPARQL generation to the Neural Machine Translation (NMT) architecture or the generative AI field of Text-to-SPARQL. In NM…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 24 pages, 20 figures, using the engrXiv template; the full version has been submitted to ACM Transactions on Information Systems and is currently under review. (2024)

  47. arXiv:2408.13115  [pdf, ps, other]

    stat.ML cs.LG math.PR stat.CO

    Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias

    Authors: Yifan Chen, Xiaoou Cheng, Jonathan Niles-Weed, Jonathan Weare

    Abstract: The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or…

    Submitted 19 August, 2024; originally announced August 2024.

  48. arXiv:2408.12980  [pdf, other]

    cs.CL cs.LG

    MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries

    Authors: Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, Hadi Amiri, Leo Anthony Celi

    Abstract: Medical decisions directly impact individuals' health and well-being. Extracting decision spans from clinical notes plays a crucial role in understanding medical decision-making processes. In this paper, we develop a new dataset called "MedDec", which contains clinical notes of eleven different phenotypes (diseases) annotated by ten types of medical decisions. We introduce the task of medical deci…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: In Findings of the Association for Computational Linguistics ACL 2024

  49. arXiv:2408.12815  [pdf, other]

    cs.CV cs.AI

    Staircase Cascaded Fusion of Lightweight Local Pattern Recognition and Long-Range Dependencies for Structural Crack Segmentation

    Authors: Hui Liu, Chen Jia, Fan Shi, Xu Cheng, Mianzhao Wang, Shengyong Chen

    Abstract: Detecting cracks with pixel-level precision for key structures is a significant challenge, as existing methods struggle to effectively integrate local textures and pixel dependencies of cracks. Furthermore, these methods often possess numerous parameters and substantial computational requirements, complicating deployment on edge devices. In this paper, we propose a staircase cascaded fusion crack…

    Submitted 22 August, 2024; originally announced August 2024.

  50. arXiv:2408.12687  [pdf, other]

    cs.HC

    Bridging the gap between natural user expression with complex automation programming in smart homes

    Authors: Yingtian Shi, Xiaoyi Liu, Chun Yu, Tianao Yang, Cheng Gao, Chen Liang, Yuanchun Shi

    Abstract: A long-standing challenge in end-user programming (EUP) is to trade off between natural user expression and the complexity of programming tasks. As large language models (LLMs) are empowered to handle semantic inference and natural language understanding, it remains under-explored how such capabilities can facilitate end-users to configure complex automation more naturally and easily. We propose A…

    Submitted 22 August, 2024; originally announced August 2024.