Zum Hauptinhalt springen

Showing 1–50 of 1,051 results for author: Huang, C

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.17154  [pdf, other

    cs.CV

    Self-supervised Anomaly Detection Pretraining Enhances Long-tail ECG Diagnosis

    Authors: Aofan Jiang, Chaoqin Huang, Qing Cao, Yuchen Xu, Zi Zeng, Kang Chen, Ya Zhang, Yanfeng Wang

    Abstract: Current computer-aided ECG diagnostic systems struggle with the underdetection of rare but critical cardiac anomalies due to the imbalanced nature of ECG datasets. This study introduces a novel approach using self-supervised anomaly detection pretraining to address this limitation. The anomaly detection model is specifically designed to detect and localize subtle deviations from normal cardiac pat… ▽ More

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2404.04935

  2. arXiv:2408.16251  [pdf, other

    cs.IT eess.SP

    Neural Network-Assisted Hybrid Model Based Message Passing for Parametric Holographic MIMO Near Field Channel Estimation

    Authors: Zhengdao Yuan, Yabo Guo, Dawei Gao, Qinghua Guo, Zhongyong Wang, Chongwen Huang, Ming Jin, Kai-Kit Wong

    Abstract: Holographic multiple-input and multiple-output (HMIMO) is a promising technology with the potential to achieve high energy and spectral efficiencies, enhance system capacity and diversity, etc. In this work, we address the challenge of HMIMO near field (NF) channel estimation, which is complicated by the intricate model introduced by the dyadic Green's function. Despite its complexity, the channel… ▽ More

    Submitted 29 August, 2024; originally announced August 2024.

  3. arXiv:2408.16030  [pdf

    cs.SD cs.AI cs.LG eess.AS

    A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds

    Authors: Ying-Chieh Hsu, Stanley Yung-Chuan Liu, Chao-Jung Huang, Chi-Wei Wu, Ren-Kai Cheng, Jane Yung-Jen Hsu, Shang-Ran Huang, Yuan-Ren Cheng, Fu-Shun Hsu

    Abstract: This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The da… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

  4. arXiv:2408.15555  [pdf, other

    eess.IV cs.CV cs.LG

    Latent Relationship Mining of Glaucoma Biomarkers: a TRI-LSTM based Deep Learning

    Authors: Cheng Huang, Junhao Shen, Qiuyu Luo, Karanjit Kooner, Tsengdar Lee, Yishen Liu, Jia Zhang

    Abstract: In recently years, a significant amount of research has been conducted on applying deep learning methods for glaucoma classification and detection. However, the explainability of those established machine learning models remains a big concern. In this research, in contrast, we learn from cognitive science concept and study how ophthalmologists judge glaucoma detection. Simulating experts' efforts,… ▽ More

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 images

  5. arXiv:2408.15371  [pdf, other

    cs.IR cs.LG

    Temporal Graph Neural Network-Powered Paper Recommendation on Dynamic Citation Networks

    Authors: Junhao Shen, Mohammad Ausaf Ali Haqqani, Beichen Hu, Cheng Huang, Xihao Xie, Tsengdar Lee, Jia Zhang

    Abstract: Due to the rapid growth of scientific publications, identifying all related reference articles in the literature has become increasingly challenging yet highly demanding. Existing methods primarily assess candidate publications from a static perspective, focusing on the content of articles and their structural information, such as citation relationships. There is a lack of research regarding how t… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures, accepted by SDU@AAAI-2024. The AAAI Workshop on Scientific Document Understanding (2024)

  6. arXiv:2408.15339  [pdf, other

    cs.LG cs.CL

    UNA: Unifying Alignments of RLHF/PPO, DPO and KTO by a Generalized Implicit Reward Function

    Authors: Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng

    Abstract: An LLM is pretrained on trillions of tokens, but the pretrained LLM may still generate undesired responses. To solve this problem, alignment techniques such as RLHF, DPO and KTO are proposed. However, these alignment techniques have limitations. For example, RLHF requires training the reward model and policy separately, which is complex, time-consuming, memory intensive and unstable during trainin… ▽ More

    Submitted 27 August, 2024; originally announced August 2024.

  7. arXiv:2408.13891  [pdf, other

    cs.CL eess.AS

    SpeechCaps: Advancing Instruction-Based Universal Speech Models with Multi-Talker Speaking Style Captioning

    Authors: Chien-yu Huang, Min-Han Shih, Ke-Han Lu, Chi-Yuan Hsiao, Hung-yi Lee

    Abstract: Instruction-based speech processing is becoming popular. Studies show that training with multiple tasks boosts performance, but collecting diverse, large-scale tasks and datasets is expensive. Thus, it is highly desirable to design a fundamental task that benefits other downstream tasks. This paper introduces a multi-talker speaking style captioning task to enhance the understanding of speaker and… ▽ More

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: SynData4GenAI 2024

  8. arXiv:2408.12928  [pdf, other

    cs.CV

    ParGo: Bridging Vision-Language with Partial and Global Views

    Authors: An-Lan Wang, Bin Shan, Wei Shi, Kun-Yu Lin, Xiang Fei, Guozhi Tang, Lei Liao, Jingqun Tang, Can Huang, Wei-Shi Zheng

    Abstract: This work presents ParGo, a novel Partial-Global projector designed to connect the vision and language modalities for Multimodal Large Language Models (MLLMs). Unlike previous works that rely on global attention-based projectors, our ParGo bridges the representation gap between the separately pre-trained vision encoders and the LLMs by integrating global and partial views, which alleviates the ove… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

  9. arXiv:2408.12658  [pdf, other

    cs.SD cs.AI cs.LG eess.AS

    Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music

    Authors: Nithya Shikarpur, Krishna Maneesha Dendukuri, Yusong Wu, Antoine Caillon, Cheng-Zhi Anna Huang

    Abstract: Hindustani music is a performance-driven oral tradition that exhibits the rendition of rich melodic patterns. In this paper, we focus on generative modeling of singers' vocal melodies extracted from audio recordings, as the voice is musically prominent within the tradition. Prior generative work in Hindustani music models melodies as coarse discrete symbols which fails to capture the rich expressi… ▽ More

    Submitted 26 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted at International Society for Music Information Retrieval (ISMIR) 2024

  10. arXiv:2408.11446  [pdf, other

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

  11. arXiv:2408.11417  [pdf

    cs.DC

    Stream-K++: Adaptive GPU GEMM Kernel Scheduling and Selection using Bloom Filters

    Authors: Harisankar Sadasivan, Muhammad Osama, Maksim Podkorytov, Carlus Huang, Jun Liu

    Abstract: General matrix multiplication (GEMM) operations are crucial in various computational fields. As GPU architectures evolve, optimizing GEMM performance becomes increasingly important. This paper introduces Stream-K++, an enhancement to the promising Stream-K GEMM scheduling algorithm. We expand Stream-K's scheduling policies from three to seven and implement an efficient solution selection mechanism… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    ACM Class: D.2; I.2

  12. arXiv:2408.11355  [pdf, other

    cs.GT

    Technical Report: Coopetition in Heterogeneous Cross-Silo Federated Learning

    Authors: Chao Huang, Justin Dachille, Xin Liu

    Abstract: In cross-silo federated learning (FL), companies collaboratively train a shared global model without sharing heterogeneous data. Prior related work focused on algorithm development to tackle data heterogeneity. However, the dual problem of coopetition, i.e., FL collaboration and market competition, remains under-explored. This paper studies the FL coopetition using a dynamic two-period game model.… ▽ More

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Technical report; main paper accepted to ECAI 2024

  13. arXiv:2408.10738  [pdf, other

    cs.CR

    PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection

    Authors: Tri Cao, Chengyu Huang, Yuexin Li, Huilin Wang, Amy He, Nay Oo, Bryan Hooi

    Abstract: Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also encounter notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowl… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.10700  [pdf, other

    cs.LG cs.AI

    AnyGraph: Graph Foundation Model in the Wild

    Authors: Lianghao Xia, Chao Huang

    Abstract: The growing ubiquity of relational data structured as graphs has underscored the need for graph learning models with exceptional generalization capabilities. However, current approaches often struggle to effectively extract generalizable insights, frequently requiring extensive fine-tuning and limiting their versatility. Graph foundation models offer a transformative solution, with the potential t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  15. arXiv:2408.10269  [pdf, other

    cs.LG cs.AI cs.CY

    OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction

    Authors: Zhonghang Li, Long Xia, Lei Shi, Yong Xu, Dawei Yin, Chao Huang

    Abstract: Accurate traffic forecasting is crucial for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing models often face limitations in generalization, struggling with zero-shot prediction on unseen regions and cities, as well as diminished long-term accuracy. This is primarily due to the inherent challenges in… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 12 pages

  16. arXiv:2408.10092  [pdf, other

    cs.NI cs.DC

    Selecting Relay Nodes Based on Evaluation Results to Enhance P2P Broadcasting Efficiency

    Authors: Chunlin Huang

    Abstract: The existence of node failures is inevitable in distributed systems, thus many P2P broadcasting networks adopt highly robust Flooding-based broadcast algorithms. High redundancy inevitably leads to high network resource consumption, and it may constrain the data transmission rate of the network. To address excessive network resource consumption, many studies have explored broadcasting mechanisms i… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  17. arXiv:2408.09468  [pdf, other

    cs.RO

    Towards Safe and Robust Autonomous Vehicle Platooning: A Self-Organizing Cooperative Control Framework

    Authors: Chengkai Xu, Zihao Deng, Jiaqi Liu, Chao Huang, Peng Hang

    Abstract: In the emerging hybrid traffic flow environment, which includes both human-driven vehicles (HDVs) and autonomous vehicles (AVs), ensuring safe and robust decision-making and control is crucial for the effective operation of autonomous vehicle platooning. Current systems for cooperative adaptive cruise control and lane changing are inadequate in responding to real-world emergency situations, limiti… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  18. arXiv:2408.08821  [pdf, other

    cs.IR cs.AI

    EasyRec: Simple yet Effective Language Models for Recommendation

    Authors: Xubin Ren, Chao Huang

    Abstract: Deep neural networks have become a powerful technique for learning representations from user-item interaction data in collaborative filtering (CF) for recommender systems. However, many existing methods heavily rely on unique user and item IDs, which limits their ability to perform well in practical zero-shot learning scenarios where sufficient training data may be unavailable. Inspired by the suc… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  19. arXiv:2408.08592  [pdf, other

    cs.RO

    Case Study: Runtime Safety Verification of Neural Network Controlled System

    Authors: Frank Yang, Sinong Simon Zhan, Yixuan Wang, Chao Huang, Qi Zhu

    Abstract: Neural networks are increasingly used in safety-critical applications such as robotics and autonomous vehicles. However, the deployment of neural-network-controlled systems (NNCSs) raises significant safety concerns. Many recent advances overlook critical aspects of verifying control and ensuring safety in real-time scenarios. This paper presents a case study on using POLAR-Express, a state-of-the… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures, submitted to Runtime Verification 2024

  20. arXiv:2408.07875  [pdf, other

    cs.LG stat.ML

    Incremental Structure Discovery of Classification via Sequential Monte Carlo

    Authors: Changze Huang, Di Wang

    Abstract: Gaussian Processes (GPs) provide a powerful framework for making predictions and understanding uncertainty for classification with kernels and Bayesian non-parametric learning. Building such models typically requires strong prior knowledge to define preselect kernels, which could be ineffective for online applications of classification that sequentially process data because features of data may sh… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  21. arXiv:2408.04998  [pdf, other

    cs.CL cs.AI

    ProFuser: Progressive Fusion of Large Language Models

    Authors: Tianyuan Shi, Fanqi Wan, Canbin Huang, Xiaojun Quan, Chenliang Li, Ming Yan, Ji Zhang

    Abstract: While fusing the capacities and advantages of various large language models (LLMs) offers a pathway to construct more powerful and versatile models, a fundamental challenge is to properly select advantageous model during the training. Existing fusion methods primarily focus on the training mode that uses cross entropy on ground truth in a teacher-forcing setup to measure a model's advantage, which… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

  22. arXiv:2408.02501  [pdf, ps, other

    cs.IT eess.SP

    Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control

    Authors: Chong Huang, Gaojie Chen, Pei Xiao, Jonathon A. Chambers, Wei Huang

    Abstract: The space-air-ground integrated network (SAGIN) has become a crucial research direction in future wireless communications due to its ubiquitous coverage, rapid and flexible deployment, and multi-layer cooperation capabilities. However, integrating hierarchical federated learning (HFL) with edge computing and SAGINs remains a complex open issue to be resolved. This paper proposes a novel framework… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted for publication in IEEE Journal on Selected Areas in Communications

  23. arXiv:2407.21693  [pdf, other

    cs.AI

    TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

    Authors: Ming Zhang, Caishuang Huang, Yilong Wu, Shichun Liu, Huiyuan Zheng, Yurui Dong, Yujiong Shen, Shihan Dou, Jun Zhao, Junjie Ye, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif… ▽ More

    Submitted 7 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  24. arXiv:2407.21301  [pdf, ps, other

    cs.IT eess.SP

    Integrated Sensing and Communication in IRS-assisted High-Mobility Systems: Design, Analysis and Optimization

    Authors: Xingyu Peng, Qin Tao, Xiaoling Hu, Richeng Jin, Chongwen Huang, Xiaoming Chen

    Abstract: In this paper, we investigate integrated sensing and communication (ISAC) in high-mobility systems with the aid of an intelligent reflecting surface (IRS). To exploit the benefits of Delay-Doppler (DD) spread caused by high mobility, orthogonal time frequency space (OTFS)-based frame structure and transmission framework are proposed. {In such a framework,} we first design a low-complexity ratio-ba… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 pages, 9 figures

  25. arXiv:2407.17663  [pdf, other

    cs.CR cs.LG

    Explaining the Model, Protecting Your Data: Revealing and Mitigating the Data Privacy Risks of Post-Hoc Model Explanations via Membership Inference

    Authors: Catherine Huang, Martin Pawelczyk, Himabindu Lakkaraju

    Abstract: Predictive machine learning models are becoming increasingly deployed in high-stakes contexts involving sensitive personal data; in these contexts, there is a trade-off between model explainability and data privacy. In this work, we push the boundaries of this trade-off: with a focus on foundation models for image classification fine-tuning, we reveal unforeseen privacy risks of post-hoc model exp… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Workshop on the Next Generation of AI Safety

  26. arXiv:2407.16364  [pdf, other

    cs.CV

    Harmonizing Visual Text Comprehension and Generation

    Authors: Zhen Zhao, Jingqun Tang, Binghong Wu, Chunhui Lin, Shu Wei, Hao Liu, Xin Tan, Zhizhong Zhang, Can Huang, Yuan Xie

    Abstract: In this work, we present TextHarmony, a unified and versatile multimodal generative model proficient in comprehending and generating visual text. Simultaneously generating images and texts typically results in performance degradation due to the inherent inconsistency between vision and language modalities. To overcome this challenge, existing approaches resort to modality-specific data for supervi… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  27. arXiv:2407.15782  [pdf, ps, other

    cs.IT eess.SP

    Reconfigurable Intelligent Surface Empowered Full Duplex Systems: Opportunities and Challenges

    Authors: Chong Huang, Yun Wen, Long Zhang, Gaojie Chen, Zhen Gao, Pei Xiao

    Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology in wireless communications. Simultaneously transmitting and reflecting RIS (STAR-RISs) in particular have garnered significant attention due to their dual capabilities of simultaneous transmission and reflection, underscoring their potential applications in critical scenarios within the forthcoming sixth-generation (… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted for publication in IEEE Communications Standards Magazine

  28. arXiv:2407.13083  [pdf, other

    cs.SD cs.CV eess.AS

    Modeling and Driving Human Body Soundfields through Acoustic Primitives

    Authors: Chao Huang, Dejan Markovic, Chenliang Xu, Alexander Richard

    Abstract: While rendering and animation of photorealistic 3D human body models have matured and reached an impressive quality over the past years, modeling the spatial audio associated with such full body models has been largely ignored so far. In this work, we present a framework that allows for high-quality spatial audio generation, capable of rendering the full 3D soundfield generated by a human body, in… ▽ More

    Submitted 20 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project Page: https://wikichao.github.io/Acoustic-Primitives/

  29. arXiv:2407.09709  [pdf, other

    cs.LG cs.CL

    GOFA: A Generative One-For-All Model for Joint Graph Language Modeling

    Authors: Lecheng Kong, Jiarui Feng, Hao Liu, Chengsong Huang, Jiaxin Huang, Yixin Chen, Muhan Zhang

    Abstract: Foundation models, such as Large Language Models (LLMs) or Large Vision Models (LVMs), have emerged as one of the most powerful tools in the respective fields. However, unlike text and image data, graph data do not have a definitive structure, posing great challenges to developing a Graph Foundation Model (GFM). For example, current attempts at designing general graph models either transform graph… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  30. arXiv:2407.09250  [pdf

    cs.NI cs.LG

    FedsLLM: Federated Split Learning for Large Language Models over Communication Networks

    Authors: Kai Zhao, Zhaohui Yang, Chongwen Huang, Xiaoming Chen, Zhaoyang Zhang

    Abstract: Addressing the challenges of deploying large language models in wireless communication networks, this paper combines low-rank adaptation technology (LoRA) with the splitfed learning framework to propose the federated split learning for large language models (FedsLLM) framework. The method introduced in this paper utilizes LoRA technology to reduce processing loads by dividing the network into clie… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.07723  [pdf, other

    cs.IT cs.AI

    Understanding is Compression

    Authors: Ziguang Li, Chao Huang, Xuliang Wang, Haibo Hu, Cole Wyeth, Dongbo Bu, Quan Yu, Wen Gao, Xingwu Liu, Ming Li

    Abstract: Modern data compression methods are slowly reaching their limits after 80 years of research, millions of papers, and wide range of applications. Yet, the extravagant 6G communication speed requirement raises a major open question for revolutionary new ideas of data compression. We have previously shown all understanding or learning are compression, under reasonable assumptions. Large language mo… ▽ More

    Submitted 20 August, 2024; v1 submitted 23 June, 2024; originally announced July 2024.

  32. arXiv:2407.06153  [pdf, other

    cs.SE cs.CL

    What's Wrong with Your Code Generated by Large Language Models? An Extensive Study

    Authors: Shihan Dou, Haoxiang Jia, Shenxi Wu, Huiyuan Zheng, Weikang Zhou, Muling Wu, Mingxu Chai, Jessica Fan, Caishuang Huang, Yunbo Tao, Yan Liu, Enyu Zhou, Ming Zhang, Yuhao Zhou, Yueming Wu, Rui Zheng, Ming Wen, Rongxiang Weng, Jingang Wang, Xunliang Cai, Tao Gui, Xipeng Qiu, Qi Zhang, Xuanjing Huang

    Abstract: The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality datasets and leveraging diverse training technologies. However, there is a notable lack of comprehensive studies examining the limitations and boundar… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 17 pages, 7 figures

  33. arXiv:2407.06094  [pdf, ps, other

    cs.RO

    ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions

    Authors: Micol Spitale, Maria Teresa Parreira, Maia Stiber, Minja Axelsson, Neval Kara, Garima Kankariya, Chien-Ming Huang, Malte Jung, Wendy Ju, Hatice Gunes

    Abstract: Despite the recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives is still an open challenge. This is due to multiple reasons among which are their frequent mistakes, such as interrupting people or having delayed responses, as well as their limited ability to understand human speech, i.e., failure in tasks like transcribing speech to t… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  34. arXiv:2407.05873  [pdf, other

    eess.SP cs.IT

    Receiver Selection and Transmit Beamforming for Multi-static Integrated Sensing and Communications

    Authors: Dan Wang, Yuanming Tian, Chuan Huang, Hao Chen, Xiaodong Xu, Ping Zhang

    Abstract: Next-generation wireless networks are expected to develop a novel paradigm of integrated sensing and communications (ISAC) to enable both the high-accuracy sensing and high-speed communications. However, conventional mono-static ISAC systems, which simultaneously transmit and receive at the same equipment, may suffer from severe self-interference, and thus significantly degrade the system performa… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  35. arXiv:2407.05840  [pdf, other

    cs.ET physics.optics

    A 103-TOPS/mm$^2$ Integrated Photonic Computing Engine Enabling Next-Generation Reservoir Computing

    Authors: Dongliang Wang, Yikun Nie, Gaolei Hu, Hon Ki Tsang, Chaoran Huang

    Abstract: Reservoir computing (RC) is a leading machine learning algorithm for information processing due to its rich expressiveness. A new RC paradigm has recently emerged, showcasing superior performance and delivering more interpretable results with shorter training data sets and training times, representing the next generation of RC computing. This work presents the first realization of a high-speed nex… ▽ More

    Submitted 31 May, 2024; originally announced July 2024.

  36. arXiv:2407.05249  [pdf, ps, other

    cs.IT eess.SP

    RIS-assisted Coverage Enhancement in mmWave Integrated Sensing and Communication Networks

    Authors: Xu Gan, Chongwen Huang, Zhaohui Yang, Xiaoming Chen, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Yong Liang Guan, Merouane Debbah

    Abstract: Integrated sensing and communication (ISAC) has emerged as a promising technology to facilitate high-rate communications and super-resolution sensing, particularly operating in the millimeter wave (mmWave) band. However, the vulnerability of mmWave signals to blockages severely impairs ISAC capabilities and coverage. To tackle this, an efficient and low-cost solution is to deploy distributed recon… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  37. arXiv:2407.03475  [pdf, other

    cs.LG

    How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

    Authors: Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind

    Abstract: Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Technical report

  38. arXiv:2407.03169  [pdf, other

    cs.CL cs.SD eess.AS

    Investigating Decoder-only Large Language Models for Speech-to-text Translation

    Authors: Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri

    Abstract: Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating decoder-only LLMs to the task of speech-to-text translation (S2TT). We propose a decoder-only architecture that enables the LLM to directly consume the encoded sp… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted to Interspeech 2024

  39. arXiv:2407.03007  [pdf, other

    cs.CL cs.AI

    What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

    Authors: Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang

    Abstract: Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, a… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 19 pages, 9 figures

  40. arXiv:2407.02680  [pdf, other

    cs.SE

    KGym: A Platform and Dataset to Benchmark Large Language Models on Linux Kernel Crash Resolution

    Authors: Alex Mathai, Chenxi Huang, Petros Maniatis, Aleksandr Nogikh, Franjo Ivancic, Junfeng Yang, Baishakhi Ray

    Abstract: Large Language Models (LLMs) are consistently improving at increasingly realistic software engineering (SE) tasks. In real-world software stacks, significant SE effort is spent developing foundational system software like the Linux kernel. Unlike application-level software, a systems codebase like Linux is multilingual (low-level C/Assembly/Bash/Rust); gigantic (>20 million lines); critical (impac… ▽ More

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  41. arXiv:2407.01976  [pdf, other

    cs.CL cs.AI cs.MM

    A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

    Authors: Jinghui Lu, Haiyang Yu, Yanjie Wang, Yongjie Ye, Jingqun Tang, Ziwei Yang, Binghong Wu, Qi Liu, Hao Feng, Han Wang, Hao Liu, Can Huang

    Abstract: Recently, many studies have demonstrated that exclusively incorporating OCR-derived text and spatial layouts with large language models (LLMs) can be highly effective for document understanding tasks. However, existing methods that integrate spatial layouts with text have limitations, such as producing overly long text sequences or failing to fully leverage the autoregressive traits of LLMs. In th… ▽ More

    Submitted 24 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  42. arXiv:2407.01926  [pdf

    physics.med-ph cs.CV

    Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior

    Authors: Chaoxing Huang, Ziqiang Yu, Zijian Gao, Qiuyi Shen, Queenie Chan, Vincent Wai-Sun Wong, Winnie Chiu-Wing Chu, Weitian Chen

    Abstract: This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results s… ▽ More

    Submitted 25 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  43. arXiv:2407.01081  [pdf, other

    cs.CV cs.CL

    CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

    Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che

    Abstract: Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision- Language Understanding Evaluation (CVLUE… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  44. arXiv:2406.18118  [pdf, other

    cs.CR cs.CL

    SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

    Authors: Caishuang Huang, Wanxu Zhao, Rui Zheng, Huijie Lv, Shihan Dou, Sixian Li, Xiao Wang, Enyu Zhou, Junjie Ye, Yuming Yang, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., efforts to bypass security protocols) often suffer from limited adaptability, restricted general capability, and high cost. To address these challenges, w… ▽ More

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  45. arXiv:2406.14307  [pdf, other

    q-bio.GN cs.CL cs.CV

    QuST-LLM: Integrating Large Language Models for Comprehensive Spatial Transcriptomics Analysis

    Authors: Chao Hui Huang

    Abstract: In this paper, we introduce QuST-LLM, an innovative extension of QuPath that utilizes the capabilities of large language models (LLMs) to analyze and interpret spatial transcriptomics (ST) data. In addition to simplifying the intricate and high-dimensional nature of ST data by offering a comprehensive workflow that includes data loading, region selection, gene expression analysis, and functional a… ▽ More

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 7 figures

  46. arXiv:2406.14171  [pdf, other

    cs.AI cs.CL

    Ranking LLMs by compression

    Authors: Peijia Guo, Ziguang Li, Haibo Hu, Chao Huang, Ming Li, Rui Zhang

    Abstract: We conceptualize the process of understanding as information compression, and propose a method for ranking large language models (LLMs) based on lossless data compression. We demonstrate the equivalence of compression length under arithmetic coding with cumulative negative log probabilities when using a large language model as a prior, that is, the pre-training phase of the model is essentially th… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 tables

  47. arXiv:2406.14163  [pdf, other

    cs.DB stat.ME

    A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics

    Authors: Cynthia A. Huang

    Abstract: Ex-post harmonisation is one of many data preprocessing processes used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting provenance and ensuring the quality of multi-source datasets is vital for ensuring trustworthy scientific research and encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistica… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  48. arXiv:2406.12787  [pdf, other

    cs.CL cs.HC

    Generating Educational Materials with Different Levels of Readability using LLMs

    Authors: Chieh-Yang Huang, Jing Wei, Ting-Hao 'Kenneth' Huang

    Abstract: This study introduces the leveled-text generation task, aiming to rewrite educational materials to specific readability levels while preserving meaning. We assess the capability of GPT-3.5, LLaMA-2 70B, and Mixtral 8x7B, to generate content at various readability levels through zero-shot and few-shot prompting. Evaluating 100 processed educational materials reveals that few-shot prompting signific… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: In2Writing 2024

  49. arXiv:2406.11781  [pdf, other

    cs.IR

    DiffMM: Multi-Modal Diffusion Model for Recommendation

    Authors: Yangqin Jiang, Lianghao Xia, Wei Wei, Da Luo, Kangyi Lin, Chao Huang

    Abstract: The rise of online multi-modal sharing platforms like TikTok and YouTube has enabled personalized recommender systems to incorporate multiple modalities (such as visual, textual, and acoustic) into user representations. However, addressing the challenge of data sparsity in these systems remains a key issue. To address this limitation, recent research has introduced self-supervised learning techniq… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.11208  [pdf

    cs.NI

    Privacy-preserving Pseudonym Schemes for Personalized 3D Avatars in Mobile Social Metaverses

    Authors: Cheng Su, Xiaofeng Luo, Zhenmou Liu, Jiawen Kang, Min Hao, Zehui Xiong, Zhaohui Yang, Chongwen Huang

    Abstract: The emergence of mobile social metaverses, a novel paradigm bridging physical and virtual realms, has led to the widespread adoption of avatars as digital representations for Social Metaverse Users (SMUs) within virtual spaces. Equipped with immersive devices, SMUs leverage Edge Servers (ESs) to deploy their avatars and engage with other SMUs in virtual spaces. To enhance immersion, SMUs incline t… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 6pages, 4 figures