
Showing 1–50 of 3,599 results for author: Huang, Z

  1. arXiv:2408.17380  [pdf, other]

    cs.AI cs.LG

    Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control

    Authors: Zihao Sheng, Zilin Huang, Sikai Chen

    Abstract: Model-based reinforcement learning (RL) is anticipated to exhibit higher sample efficiency compared to model-free RL by utilizing a virtual environment model. However, it is challenging to obtain sufficiently accurate representations of the environmental dynamics due to uncertainties in complex systems and environments. An inaccurate environment model may degrade the sample efficiency and performa…

    Submitted 30 August, 2024; originally announced August 2024.

  2. arXiv:2408.17081  [pdf, other]

    cs.CV

    Stochastic Layer-Wise Shuffle: A Good Practice to Improve Vision Mamba Training

    Authors: Zizheng Huang, Haoxing Chen, Jiaqi Li, Jun Lan, Huijia Zhu, Weiqiang Wang, Limin Wang

    Abstract: Recent Vision Mamba models not only have much lower complexity for processing higher resolution images and longer videos but also the competitive performance with Vision Transformers (ViTs). However, they are stuck into overfitting and thus only present up to base size (about 80M). It is still unclear how vanilla Vision Mamba (Vim) can be efficiently scaled up to larger sizes, which is essentially…

    Submitted 30 August, 2024; originally announced August 2024.

  3. arXiv:2408.16398  [pdf, other]

    astro-ph.CO

    Pair Counting without Binning -- A New Approach to Correlation Functions in Clustering Statistics

    Authors: Shiyu Yue, Longlong Feng, Wenjie Ju, Jun Pan, Zhiqi Huang, Feng Fang, Zhuoyang Li, Yan-Chuan Cai, Weishan Zhu

    Abstract: This paper presents a novel perspective on correlation functions in the clustering analysis of the large-scale structure of the universe. We first recognise that pair counting in bins of radial separation is equivalent to evaluating counts-in-cells (CIC), which can be modelled using a filtered density field with a binning-window function. This insight leads to an in situ expression for the two-poi…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: 17 pages, 12 figures, submitted to MNRAS

  4. arXiv:2408.15529  [pdf, other]

    quant-ph cond-mat.str-el physics.chem-ph physics.comp-ph

    Quasi-Lindblad pseudomode theory for open quantum systems

    Authors: Gunhee Park, Zhen Huang, Yuanran Zhu, Chao Yang, Garnet Kin-Lic Chan, Lin Lin

    Abstract: We introduce a new framework to study the dynamics of open quantum systems with linearly coupled Gaussian baths. Our approach replaces the continuous bath with an auxiliary discrete set of pseudomodes with dissipative dynamics, but we further relax the complete positivity requirement in the Lindblad master equation and formulate a quasi-Lindblad pseudomode theory. We show that this quasi-Lindblad…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 13 pages, 6 figures (main text); 8 pages, 1 figure (Supplementary Material)

  5. arXiv:2408.14881  [pdf, other]

    astro-ph.CO gr-qc hep-lat hep-ph hep-th

    MEET-U project II: Curvature perturbations from kinetic preheating after $α$-attractor inflation

    Authors: Zhiqi Huang, Xichang Ouyang, Yu Cui, Jianqi Liu, Yanhong Yao, Zehong Qiu, Guangyao Yu, Lu Huang, Zhuoyang Li, Chi-Fong Wong

    Abstract: Preheating at the end of inflation is a violent nonlinear process that efficiently transfers the energy of the inflaton to a second field, the preheat field. When the preheat field is light during inflation and its background value modulates the preheating process, the superhorizon isocurvature perturbations of the preheat field may be converted to curvature perturbations that leave an imprint on…

    Submitted 27 August, 2024; originally announced August 2024.

    Report number: SYSU-SPA-2024

    MSC Class: 83F05

    ACM Class: J.2

  6. arXiv:2408.14765  [pdf, other]

    cs.CV

    CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis

    Authors: Weijia Li, Jun He, Junyan Ye, Huaping Zhong, Zhimeng Zheng, Zilong Huang, Dahua Lin, Conghui He

    Abstract: Satellite-to-street view synthesis aims at generating a realistic street-view image from its corresponding satellite-view image. Although stable diffusion models have exhibit remarkable performance in a variety of image generation applications, their reliance on similar-view inputs to control the generated structure or texture restricts their application to the challenging cross-view synthesis tas…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 21 pages, 11 figures

  7. arXiv:2408.14354  [pdf, other]

    cs.SE cs.AI cs.CL

    SWE-bench-java: A GitHub Issue Resolving Benchmark for Java

    Authors: Daoguang Zan, Zhirong Huang, Ailun Yu, Shaoxin Lin, Yifan Shi, Wei Liu, Dong Chen, Zongshuai Qi, Hao Yu, Lei Yu, Dezhi Ran, Muhan Zeng, Bo Shen, Pan Bian, Guangtai Liang, Bei Guan, Pengjie Huang, Tao Xie, Yongji Wang, Qianxiang Wang

    Abstract: GitHub issue resolving is a critical task in software engineering, recently gaining significant attention in both industry and academia. Within this task, SWE-bench has been released to evaluate issue resolving capabilities of large language models (LLMs), but has so far only focused on Python version. However, supporting more programming languages is also important, as there is a strong demand in…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: This work is in progress

  8. arXiv:2408.14282  [pdf, other]

    quant-ph cond-mat.mes-hall

    All-microwave readout, spectroscopy, and dynamic polarization of individual nuclear spins in a crystal

    Authors: J. Travesedo, J. O'Sullivan, L. Pallegoix, Z. W. Huang, P. Hogan, P. Goldner, T. Chaneliere, S. Bertaina, D. Esteve, P. Abgrall, D. Vion, E. Flurin, P. Bertet

    Abstract: Pushing the sensitivity of nuclear magnetic resonance spectroscopy to the single spin level would have a major impact in chemistry and biology and is the goal of intense research efforts. Individual nuclear spins have been detected via their hyperfine coupling to an individual electronic paramagnetic system, itself measured by optical or electrical means. These methods are however only applicable…

    Submitted 26 August, 2024; originally announced August 2024.

  9. arXiv:2408.13890  [pdf, other]

    cs.CV

    Making Large Language Models Better Planners with Reasoning-Decision Alignment

    Authors: Zhijian Huang, Tao Tang, Shaoxiang Chen, Sihao Lin, Zequn Jie, Lin Ma, Guangrun Wang, Xiaodan Liang

    Abstract: Data-driven approaches for autonomous driving (AD) have been widely adopted in the past decade but are confronted with dataset bias and uninterpretability. Inspired by the knowledge-driven nature of human driving, recent approaches explore the potential of large language models (LLMs) to improve understanding and decision-making in traffic scenarios. They find that the pretrain-finetune paradigm o…

    Submitted 25 August, 2024; originally announced August 2024.

  10. arXiv:2408.13671  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci physics.optics

    Ultrafast Charge Transfer Dynamics at the MoS$_2$/Au Interface Observed via Optical Spectroscopy under Ambient Conditions

    Authors: Tao Yang, Zhipeng Huang, Stephan Sleziona, Eckart Hasselbrink, Peter Kratzer, Marika Schleberger, R. Kramer Campen, Yujin Tong

    Abstract: To take advantage of the exceptional properties of atomically thin transition metal dichalcogenides (TMDC) for advanced devices and catalysts, integration with metallic surfaces is an efficacious approach for facilitating charge carrier injection and extraction from TMDC monolayers. Light-matter interactions predominantly occur at the K point in TMDC monolayers, making the charge carrier dynamics…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures and supplemental material

  11. arXiv:2408.13385  [pdf, other]

    cs.CV

    MICM: Rethinking Unsupervised Pretraining for Enhanced Few-shot Learning

    Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Zhimeng Huang, Yuhua Li, Ruixuan Li

    Abstract: Humans exhibit a remarkable ability to learn quickly from a limited number of labeled samples, a capability that starkly contrasts with that of current machine learning systems. Unsupervised Few-Shot Learning (U-FSL) seeks to bridge this divide by reducing reliance on annotated datasets during initial training phases. In this work, we first quantitatively assess the impacts of Masked Image Modelin…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: ACMMM 2024 (Oral)

  12. arXiv:2408.13008  [pdf, other]

    cs.LG

    Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

    Authors: Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang

    Abstract: This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss. The proposed approach presents a novel framework to identify and improve a model's recognition on challengi…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: UK Speech 2024, Submitted to SLT 2024

  13. arXiv:2408.12821  [pdf, other]

    cs.CV cs.AI

    Examining the Commitments and Difficulties Inherent in Multimodal Foundation Models for Street View Imagery

    Authors: Zhenyuan Yang, Xuhui Lin, Qinyi He, Ziye Huang, Zhengliang Liu, Hanqi Jiang, Peng Shu, Zihao Wu, Yiwei Li, Stephen Law, Gengchen Mai, Tianming Liu, Tao Yang

    Abstract: The emergence of Large Language Models (LLMs) and multimodal foundation models (FMs) has generated heightened interest in their applications that integrate vision and language. This paper investigates the capabilities of ChatGPT-4V and Gemini Pro for Street View Imagery, Built Environment, and Interior by evaluating their performance across various tasks. The assessments include street furniture i…

    Submitted 22 August, 2024; originally announced August 2024.

  14. arXiv:2408.12734  [pdf, other]

    cs.AI cs.CY cs.SD eess.AS stat.ML

    Towards measuring fairness in speech recognition: Fair-Speech dataset

    Authors: Irina-Elena Veliche, Zhuangqun Huang, Vineeth Ayyat Kochaniyan, Fuchun Peng, Ozlem Kalinli, Michael L. Seltzer

    Abstract: The current public datasets for speech recognition (ASR) tend not to focus specifically on the fairness aspect, such as performance across different demographic groups. This paper introduces a novel dataset, Fair-Speech, a publicly released corpus to help researchers evaluate their ASR models for accuracy across a diverse set of self-reported demographic information, such as age, gender, ethnicity…

    Submitted 22 August, 2024; originally announced August 2024.

  15. arXiv:2408.12534  [pdf, other]

    eess.IV cs.AI cs.CV

    Automatic Organ and Pan-cancer Segmentation in Abdomen CT: the FLARE 2023 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Ershuai Wang, Qin Zhou, Ziyan Huang, Pengju Lyu, Jian He, Bo Wang

    Abstract: Organ and cancer segmentation in abdomen Computed Tomography (CT) scans is the prerequisite for precise cancer diagnosis and treatment. Most existing benchmarks and algorithms are tailored to specific cancer types, limiting their ability to provide comprehensive cancer analysis. This work presents the first international competition on abdominal organ and pan-cancer segmentation by providing a lar…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024 FLARE Challenge Summary

  16. arXiv:2408.12524  [pdf, ps, other]

    cs.DS cs.GT

    Stochastic Online Correlated Selection

    Authors: Ziyun Chen, Zhiyi Huang, Enze Sun

    Abstract: We study Stochastic Online Correlated Selection (SOCS), a family of online rounding algorithms for Non-IID Stochastic Online Submodular Welfare Maximization and special cases such as Online Stochastic Matching, Stochastic AdWords, and Stochastic Display Ads. At each step, the algorithm sees an online item's type and fractional allocation, then immediately allocates it to an agent. We propose a met…

    Submitted 22 August, 2024; originally announced August 2024.

  17. arXiv:2408.12260  [pdf, ps, other]

    hep-th

    All Five-point Kaluza-Klein Correlators and Hidden 8d Symmetry in $\rm AdS_5\times S^3$

    Authors: Zhongjie Huang, Bo Wang, Ellis Ye Yuan, Jiarong Zhang

    Abstract: We systematically compute five-point correlators of chiral primary operators with arbitrary Kaluza-Klein charges at tree-level in $\mathrm{AdS}_5\times\mathrm{S}^3$, and obtain a unified formula. This result serves as the first concrete confirmation for the existence of the hidden eight-dimensional symmetries at the level of five points.

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 5 pages, 2 figures and 3 appendices + ancillary files for the results

  18. arXiv:2408.12171  [pdf, other]

    cs.LG

    Recent Advances on Machine Learning for Computational Fluid Dynamics: A Survey

    Authors: Haixin Wang, Yadi Cao, Zijie Huang, Yuxuan Liu, Peiyan Hu, Xiao Luo, Zezheng Song, Wanjia Zhao, Jilin Liu, Jinan Sun, Shikun Zhang, Long Wei, Yue Wang, Tailin Wu, Zhi-Ming Ma, Yizhou Sun

    Abstract: This paper explores the recent advancements in enhancing Computational Fluid Dynamics (CFD) tasks through Machine Learning (ML) techniques. We begin by introducing fundamental concepts, traditional methods, and benchmark datasets, then examine the various roles ML plays in improving CFD. The literature systematically reviews papers in recent five years and introduces a novel classification for for…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 22 pages, 6 figures

  19. arXiv:2408.11296  [pdf, other]

    cs.SE cs.CL

    RePair: Automated Program Repair with Process-based Feedback

    Authors: Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

    Abstract: The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while simultaneously diminishing the financial burden of manual repairs. Commercial-scale language models (LM) have taken APR to unprecedent…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 15 pages, 13 figures

    Journal ref: ACL 2024 Findings

  20. arXiv:2408.10486  [pdf, ps, other]

    cs.SE

    Revisiting Evolutionary Program Repair via Code Language Model

    Authors: Yunan Wang, Tingyu Guo, Zilong Huang, Yuan Yuan

    Abstract: Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excels in code generation, making them suitable for APR applications. Despite this progress, a significant limitation remain…

    Submitted 26 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  21. arXiv:2408.10444  [pdf, other]

    astro-ph.IM astro-ph.CO

    In-Flight Performance of Spider's 280 GHz Receivers

    Authors: Elle C. Shaw, P. A. R. Ade, S. Akers, M. Amiri, J. Austermann, J. Beall, D. T. Becker, S. J. Benton, A. S. Bergman, J. J. Bock, J. R. Bond, S. A. Bryan, H. C. Chiang, C. R. Contaldi, R. S. Domagalski, O. Doré, S. M. Duff, A. J. Duivenvoorden, H. K. Eriksen, M. Farhang, J. P. Filippini, L. M. Fissel, A. A. Fraisse, K. Freese, M. Galloway, et al. (62 additional authors not shown)

    Abstract: SPIDER is a balloon-borne instrument designed to map the cosmic microwave background at degree-angular scales in the presence of Galactic foregrounds. SPIDER has mapped a large sky area in the Southern Hemisphere using more than 2000 transition-edge sensors (TESs) during two NASA Long Duration Balloon flights above the Antarctic continent. During its first flight in January 2015, SPIDER observed i…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to SPIE Astronomical Telescopes + Instrumentation 2024, JATIS

  22. arXiv:2408.10410  [pdf, other]

    eess.SP

    Stream-Based Ground Segmentation for Real-Time LiDAR Point Cloud Processing on FPGA

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: This paper presents a novel and fast approach for ground plane segmentation in a LiDAR point cloud, specifically optimized for processing speed and hardware efficiency on FPGA hardware platforms. Our approach leverages a channel-based segmentation method with an advanced angular data repair technique and a cross-eight-way flood-fill algorithm. This innovative approach significantly reduces the num…

    Submitted 19 August, 2024; originally announced August 2024.

  23. arXiv:2408.10404  [pdf, other]

    cs.CV eess.IV eess.SP

    Parallel Processing of Point Cloud Ground Segmentation for Mechanical and Solid-State LiDARs

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: In this study, we introduce a novel parallel processing framework for real-time point cloud ground segmentation on FPGA platforms, aimed at adapting LiDAR algorithms to the evolving landscape from mechanical to solid-state LiDAR (SSL) technologies. Focusing on the ground segmentation task, we explore parallel processing techniques on existing approaches and adapt them to real-world SSL data handli…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 5 pages

  24. arXiv:2408.10072  [pdf, other]

    cs.CV cs.AI

    FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant

    Authors: Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang

    Abstract: The rapid advancement of deepfake technologies has sparked widespread public concern, particularly as face forgery poses a serious threat to public information security. However, the unknown and diverse forgery techniques, varied facial features and complex environmental factors pose significant challenges for face forgery analysis. Existing datasets lack descriptions of these aspects, making it d…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 17 pages, 18 figures; project page: https://ffaa-vl.github.io

  25. arXiv:2408.09651  [pdf, other]

    cs.IR cs.AI

    Data-driven Conditional Instrumental Variables for Debiasing Recommender Systems

    Authors: Zhirong Huang, Shichao Zhang, Debo Cheng, Jiuyong Li, Lin Liu, Guangquan Lu

    Abstract: In recommender systems, latent variables can cause user-item interaction data to deviate from true user preferences. This biased data is then used to train recommendation models, further amplifying the bias and ultimately compromising both recommendation accuracy and user satisfaction. Instrumental Variable (IV) methods are effective tools for addressing the confounding bias introduced by latent v…

    Submitted 18 August, 2024; originally announced August 2024.

  26. arXiv:2408.09646  [pdf, other]

    cs.IR cs.AI

    Debiased Contrastive Representation Learning for Mitigating Dual Biases in Recommender Systems

    Authors: Zhirong Huang, Shichao Zhang, Debo Cheng, Jiuyong Li, Lin Liu, Guixian Zhang

    Abstract: In recommender systems, popularity and conformity biases undermine recommender effectiveness by disproportionately favouring popular items, leading to their over-representation in recommendation lists and causing an unbalanced distribution of user-item historical data. We construct a causal graph to address both biases and describe the abstract data generation mechanism. Then, we use it as a guide…

    Submitted 18 August, 2024; originally announced August 2024.

  27. arXiv:2408.09384  [pdf, other]

    cs.CV cs.MM

    FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model

    Authors: Ziyu Yao, Xuxin Cheng, Zhiqi Huang

    Abstract: Talking head generation is a significant research topic that still faces numerous challenges. Previous works often adopt generative adversarial networks or regression models, which are plagued by generation quality and average facial shape problem. Although diffusion models show impressive generative ability, their exploration in talking head generation remains unsatisfactory. This is because they…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  28. arXiv:2408.09251  [pdf, other]

    cs.RO cs.AI cs.LG

    V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

    Authors: Junwei You, Haotian Shi, Zhuoyu Jiang, Zilin Huang, Rui Gan, Keshu Wu, Xi Cheng, Xiaopeng Li, Bin Ran

    Abstract: Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with large vision-language models (VLMs). V2X-VLM is designed to enhance situ…

    Submitted 17 August, 2024; originally announced August 2024.

  29. arXiv:2408.09191  [pdf, other]

    cs.CV

    GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

    Authors: Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

    Abstract: For interacting with mobile objects in unfamiliar environments, simultaneously locating, mapping, and tracking the 3D poses of multiple objects are crucially required. This paper proposes a Tracklet Graph and Query Graph-based framework, i.e., GSLAMOT, to address this challenge. GSLAMOT utilizes camera and LiDAR multimodal information as inputs and divides the representation of the dynamic scene i…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 9 figures, ACM MM 2024

  30. arXiv:2408.09144  [pdf, other]

    cs.CV

    SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: Sparse view NeRF is challenging because limited input images lead to an under constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these arti…

    Submitted 17 August, 2024; originally announced August 2024.

  31. arXiv:2408.08796  [pdf, ps, other]

    cs.IT eess.SP

    Multi-Antenna Broadband Backscatter Communications

    Authors: Hao Chen, Zhizhi Huang, Ying-Chang Liang, Robert Schober

    Abstract: Backscatter communication offers a promising solution to connect massive Internet-of-Things (IoT) devices with low cost and high energy efficiency. Nevertheless, its inherently passive nature limits transmission reliability, thereby hindering improvements in communication range and data rate. To overcome these challenges, we introduce a bistatic broadband backscatter communication (BBBC) system, w…

    Submitted 16 August, 2024; originally announced August 2024.

  32. arXiv:2408.08412  [pdf, other]

    cs.CV

    Penny-Wise and Pound-Foolish in Deepfake Detection

    Authors: Yabin Wang, Zhiwu Huang, Su Zhou, Adam Prugel-Bennett, Xiaopeng Hong

    Abstract: The diffusion of deepfake technologies has sparked serious concerns about its potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advancement, many current approaches prioritize short-term gains at expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models solely with a penny-wise o…

    Submitted 15 August, 2024; originally announced August 2024.

  33. arXiv:2408.06793  [pdf, other]

    cs.CL

    Layerwise Recurrent Router for Mixture-of-Experts

    Authors: Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu

    Abstract: The scaling of large language models (LLMs) has revolutionized their capabilities in various tasks, yet this growth must be matched with efficient computational strategies. The Mixture-of-Experts (MoE) architecture stands out for its ability to scale model size without significantly increasing training costs. Despite their advantages, current MoE models often display parameter inefficiency. For in…

    Submitted 13 August, 2024; originally announced August 2024.

  34. arXiv:2408.06174  [pdf]

    cond-mat.supr-con

    Emergent superconductivity and pair density wave at antiphase boundaries of charge density wave order in kagome metals

    Authors: Xianghe Han, Hui Chen, Hengxin Tan, Zhongyi Cao, Zihao Huang, Yuhan Ye, Zhen Zhao, Chengmin Shen, Haitao Yang, Binghai Yan, Ziqiang Wang, Hong-Jun Gao

    Abstract: Central to the layered kagome lattice superconductors AV3Sb5 (A = K, Cs, Rb) is a cascade of novel quantum states triggered by an unconventional charge density wave (CDW) order. The three-dimensional (3D) order involves a 2x2x2 phase coherent stacking of 2x2 charge density modulations in the kagome plane at low temperatures, exhibiting a CDW energy gap and evidence for time-reversal symmetry break…

    Submitted 12 August, 2024; originally announced August 2024.

  35. arXiv:2408.05945  [pdf, other]

    cs.CV

    MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection

    Authors: Zitian Wang, Zehao Huang, Yulu Gao, Naiyan Wang, Si Liu

    Abstract: The rise of autonomous vehicles has significantly increased the demand for robust 3D object detection systems. While cameras and LiDAR sensors each offer unique advantages--cameras provide rich texture information and LiDAR offers precise 3D spatial data--relying on a single modality often leads to performance limitations. This paper introduces MV2DFusion, a multi-modal detection framework that in…

    Submitted 12 August, 2024; originally announced August 2024.

  36. arXiv:2408.05674  [pdf, other]

    cs.CV

    PS-TTL: Prototype-based Soft-labels and Test-Time Learning for Few-shot Object Detection

    Authors: Yingjie Gao, Yanan Zhang, Ziyue Huang, Nanqing Liu, Di Huang

    Abstract: In recent years, Few-Shot Object Detection (FSOD) has gained widespread attention and made significant progress due to its ability to build models with a good generalization power using extremely limited annotated data. The fine-tuning based paradigm is currently dominating this field, where detectors are initially pre-trained on base classes with sufficient samples and then fine-tuned on novel on…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  37. arXiv:2408.05468  [pdf, ps, other]

    math.RA math.RT

    Auslander-type conditions and weakly Gorenstein algebras

    Authors: Zhaoyong Huang

    Abstract: Let $R$ be an Artin algebra. Under certain Auslander-type conditions, we give some equivalent characterizations of (weakly) Gorenstein algebras in terms of the properties of Gorenstein projective modules and modules satisfying Auslander-type conditions. As applications, we provide some support for several homological conjectures. In particular, we prove that if $R$ is left quasi Auslander, then…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 16 pages; accepted for publication in Bulletin of the London Mathematical Society

    MSC Class: 16E65; 16E10; 18G25

  38. arXiv:2408.04968  [pdf, other]

    physics.optics

    One-dimensional spin-flipping topological edge state laser

    Authors: Jhih-Sheng Wu, Zhen-Ting Huang, Meng-Ting Han, Yen-Hsun Chen, Tien-Chang Lu

    Abstract: Topological edge states manifest spin-momentum-locking propagation as a primary consequence of topological crystals. However, experimental studies on spin manipulation and the resulting propagation of these states are lacking. Here, we demonstrate experimentally spin manipulation of topological edge states by the boundary conditions of the one-dimensional path. Armchair boundaries at the endpoints…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures

  39. arXiv:2408.04863  [pdf, other]

    cs.SE

    Coding-PTMs: How to Find Optimal Code Pre-trained Models for Code Embedding in Vulnerability Detection?

    Authors: Yu Zhao, Lina Gong, Zhiqiu Huang, Yongwei Wang, Mingqiang Wei, Fei Wu

    Abstract: Vulnerability detection is garnering increasing attention in software engineering, since code vulnerabilities possibly pose significant security. Recently, reusing various code pre-trained models has become common for code embedding without providing reasonable justifications in vulnerability detection. The premise for casually utilizing pre-trained models (PTMs) is that the code embeddings genera…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE 2024

  40. arXiv:2408.03601  [pdf, other]

    cs.RO

    DRAMA: An Efficient End-to-end Motion Planner for Autonomous Driving with Mamba

    Authors: Chengran Yuan, Zhanqi Zhang, Jiawei Sun, Shuo Sun, Zefan Huang, Christina Dao Wen Lee, Dongen Li, Yuhang Han, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

    Abstract: Motion planning is a challenging task to generate safe and feasible trajectories in highly dynamic and complex environments, forming a core capability for autonomous vehicles. In this paper, we propose DRAMA, the first Mamba-based end-to-end motion planner for autonomous vehicles. DRAMA fuses camera, LiDAR Bird's Eye View images in the feature space, as well as ego status information, to generate…

    Submitted 14 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  41. arXiv:2408.03511  [pdf, other]

    cs.CV cs.CL

    MoExtend: Tuning New Experts for Modality and Task Extension

    Authors: Shanshan Zhong, Shanghua Gao, Zhongzhan Huang, Wushao Wen, Marinka Zitnik, Pan Zhou

    Abstract: Large language models (LLMs) excel in various tasks but are primarily trained on text data, limiting their application scope. Expanding LLM capabilities to include vision-language understanding is vital, yet training them on multimodal data from scratch is challenging and costly. Existing instruction tuning methods, e.g., LLAVA, often connects a pretrained CLIP vision encoder and LLMs via fully fi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024 - SRW

  42. arXiv:2408.03361  [pdf, other]

    eess.IV cs.CV

    GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI

    Authors: Pengcheng Chen, Jin Ye, Guoan Wang, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao

    Abstract: Large Vision-Language Models (LVLMs) are capable of handling diverse data types such as imaging, text, and physiological signals, and can be applied in various fields. In the medical field, LVLMs have a high potential to offer substantial assistance for diagnosis and treatment. Before that, it is crucial to develop benchmarks to evaluate LVLMs' effectiveness in various medical applications. Curren…

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  43. arXiv:2408.03355  [pdf, other]

    cs.CV

    FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning

    Authors: Zhi Chen, Zecheng Zhao, Yadan Luo, Zi Huang

    Abstract: Conventional text-guided single-image editing approaches require a two-step process, including fine-tuning the target text embedding for over 1K iterations and the generative model for another 1.5K iterations. Although it ensures that the resulting image closely aligns with both the input image and the target text, this process often requires 7 minutes per image, posing a challenge for practical a…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Technical Report

  44. arXiv:2408.03120  [pdf, other]

    cs.CV

    Benchmarking In-the-wild Multimodal Disease Recognition and A Versatile Baseline

    Authors: Tianqi Wei, Zhi Chen, Zi Huang, Xin Yu

    Abstract: Existing plant disease classification models have achieved remarkable performance in recognizing in-laboratory diseased images. However, their performance often significantly degrades in classifying in-the-wild images. Furthermore, we observed that in-the-wild plant images may exhibit similar appearances across various diseases (i.e., small inter-class discrepancy) while the same diseases may look…

    Submitted 6 August, 2024; originally announced August 2024.

  45. arXiv:2408.02503  [pdf, other]

    cs.CL

    UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

    Authors: Zhaowei Li, Wei Wang, YiQing Cai, Xu Qi, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang

    Abstract: Significant advancements have recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental q…

    Submitted 5 August, 2024; originally announced August 2024.

  46. arXiv:2408.01479  [pdf, ps, other]

    math.CO

    On the two problems in Ramsey achievement games

    Authors: Zhong Huang, Yusuke Kobayashi, Yaping Mao, Bo Ning, Xiumin Wang

    Abstract: Let $p,q$ be two integers with $p\geq q$. Given a finite graph $F$ with no isolated vertices, the generalized Ramsey achievement game of $F$ on the complete graph $K_n$, denoted by $(p,q;K_n,F,+)$, is played by two players called Alice and Bob. In each round, Alice first chooses $p$ uncolored edges $e_1,e_2,...,e_p$ and colors them blue, then Bob chooses $q$ uncolored edges $f_1,f_2,...,f_q$ and co…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 13 pages

  47. arXiv:2408.01292  [pdf]

    eess.IV cs.AI cs.CV

    3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks

    Authors: Xiaoshuang Li, Mingyuan Meng, Zimo Huang, Lei Bi, Eduardo Delamare, Dagan Feng, Bin Sheng, Jinman Kim

    Abstract: Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misalignment detection and classification. Reconstructing 3D structures directly from 2D PX has recent…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: accepted by MICCAI 2024

  48. arXiv:2408.00969  [pdf, other]

    cs.CV

    Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach

    Authors: Yabin Zhu, Qianwu Wang, Chenglong Li, Jin Tang, Zhixiang Huang

    Abstract: The complementary benefits from visible and thermal infrared data are widely utilized in various computer vision tasks, such as visual tracking, semantic segmentation and object detection, but rarely explored in Multiple Object Tracking (MOT). In this work, we contribute a large-scale Visible-Thermal video benchmark for MOT, called VT-MOT. VT-MOT has the following main advantages. 1) The data is la…

    Submitted 1 August, 2024; originally announced August 2024.

  49. arXiv:2408.00793  [pdf]

    physics.chem-ph cs.LG

    From 2015 to 2023: How Machine Learning Aids Natural Product Analysis

    Authors: Suwen Shi, Ziwei Huang, Xingxin Gu, Xu Lin, Chaoying Zhong, Junjie Hang, Jianli Lin, Claire Chenwen Zhong, Lin Zhang, Yu Li, Junjie Huang

    Abstract: In recent years, conventional chemistry techniques have faced significant challenges due to their inherent limitations, struggling to cope with the increasing complexity and volume of data generated in contemporary research endeavors. Computational methodologies represent robust tools in the field of chemistry, offering the capacity to harness potent machine-learning models to yield insightful ana…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 19 pages, 4 figures

  50. arXiv:2408.00728  [pdf, other]

    cs.CL cs.CR cs.LG

    CERT-ED: Certifiably Robust Text Classification for Edit Distance

    Authors: Zhuoqun Huang, Neil G Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein

    Abstract: With the growing integration of AI in daily life, ensuring the robustness of systems to inference-time attacks is crucial. Among the approaches for certifying robustness to such adversarial examples, randomized smoothing has emerged as highly promising due to its nature as a wrapper around arbitrary black-box models. Previous work on randomized smoothing in natural language processing has primaril…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 22 pages, 3 figures, 12 tables. Includes 11 pages of appendices