Skip to main content

Showing 1–50 of 252 results for author: Yue, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.11840  [pdf, other

    cs.CV

    MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification

    Authors: Zhuoxiao Li, Shanliang Yao, Yijie Chu, Angel F. Garcia-Fernandez, Yong Yue, Eng Gee Lim, Xiaohui Zhu

    Abstract: In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. Additionally, unreliable densification processes and th… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: https://mvgsplatting.github.io

  2. arXiv:2407.08770  [pdf, other

    cs.AI

    Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing

    Authors: Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang

    Abstract: Large Language Models (LLMs) have demonstrated great potential as generalist assistants, showcasing powerful task understanding and problem-solving capabilities. To deploy LLMs as AI assistants, it is crucial that these models exhibit desirable behavioral traits, such as non-toxicity and resilience against jailbreak attempts. Current methods for detoxification or preventing jailbreaking usually in… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 14 figures

    MSC Class: 68T50 (Primary) 68T07; 62M45 (Secondary) ACM Class: I.2.7

  3. arXiv:2407.03993  [pdf, other

    cs.CL

    A Survey on Natural Language Counterfactual Generation

    Authors: Yongjie Wang, Xiaoqi Qiu, Yu Yue, Xu Guo, Zhiwei Zeng, Yuhong Feng, Zhiqi Shen

    Abstract: Natural Language Counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class. The generated counterfactuals provide insight into the reasoning behind a model's predictions by highlighting which words significantly influence the outcomes. Additionally, they can be used to detect model fairness issues or augment the training d… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: A survey paper

    MSC Class: 68T50 ACM Class: I.2.7

  4. arXiv:2407.02052  [pdf, other

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the submitted system to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024

  5. arXiv:2407.01887  [pdf, other

    cs.LG cs.AI cs.CL

    Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents

    Authors: Fanzeng Xia, Hao Liu, Yisong Yue, Tongxin Li

    Abstract: In-context decision-making is an important capability of artificial general intelligence, which Large Language Models (LLMs) have effectively demonstrated in various scenarios. However, LLMs often face challenges when dealing with numerical contexts, and limited attention has been paid to evaluating their performance through preference feedback generated by the environment. This paper investigates… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.17530  [pdf, other

    cs.CV cs.RO

    Point Tree Transformer for Point Cloud Registration

    Authors: Meiling Wang, Guangyan Chen, Yi Yang, Li Yuan, Yufeng Yue

    Abstract: Point cloud registration is a fundamental task in the fields of computer vision and robotics. Recent developments in transformer-based methods have demonstrated enhanced performance in this domain. However, the standard attention mechanism utilized in these methods often integrates many low-relevance points, thereby struggling to prioritize its attention weights on sparse yet meaningful points. Th… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  7. arXiv:2406.15788  [pdf, other

    cs.LG

    Distributionally Robust Constrained Reinforcement Learning under Strong Duality

    Authors: Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, Yisong Yue

    Abstract: We study the problem of Distributionally Robust Constrained RL (DRC-RL), where the goal is to maximize the expected reward subject to environmental distribution shifts and constraints. This setting captures situations where training and testing environments differ, and policies must satisfy constraints motivated by safety or limited budgets. Despite significant progress toward algorithm design for… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Accepted at the Reinforcement Learning Conference (RLC) 2024; 28 pages, 4 figures

  8. arXiv:2406.14699  [pdf, other

    cs.LG math.OC stat.ML

    Preferential Multi-Objective Bayesian Optimization

    Authors: Raul Astudillo, Kejun Li, Maegan Tucker, Chu Xin Cheng, Aaron D. Ames, Yisong Yue

    Abstract: Preferential Bayesian optimization (PBO) is a framework for optimizing a decision-maker's latent preferences over available design choices. While preferences often involve multiple conflicting objectives, existing work in PBO assumes that preferences can be encoded by a single objective function. For example, in robotic assistive devices, technicians often attempt to maximize user comfort while si… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.13103  [pdf, other

    cs.AI cs.LG

    A Generic Method for Fine-grained Category Discovery in Natural Language Texts

    Authors: Chang Tian, Matthew B. Blaschko, Wenpeng Yin, Mingzhe Xing, Yinliang Yue, Marie-Francine Moens

    Abstract: Fine-grained category discovery using only coarse-grained supervision is a cost-effective yet challenging task. Previous training methods focus on aligning query samples with positive samples and distancing them from negatives. They often neglect intra-category and inter-category semantic similarities of fine-grained categories when navigating sample distributions in the embedding space. Furthermo… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: preprint

  10. arXiv:2406.11193  [pdf, other

    cs.CL

    MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model

    Authors: Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu

    Abstract: Projecting visual features into word embedding space has become a significant fusion strategy adopted by Multimodal Large Language Models (MLLMs). However, its internal mechanisms have yet to be explored. Inspired by multilingual research, we identify domain-specific neurons in multimodal large language models. Specifically, we investigate the distribution of domain-specific neurons and the mechan… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.08009  [pdf, other

    cs.CV cs.AI cs.RO

    OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Jianyu Dou, Yi Yang, Yufeng Yue

    Abstract: In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 7figures. Project Url: https://openobj.github.io/

  12. arXiv:2406.03816  [pdf, other

    cs.CL

    ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

    Authors: Dan Zhang, Sining Zhoubian, Yisong Yue, Yuxiao Dong, Jie Tang

    Abstract: Recent methodologies in LLM self-training mostly rely on LLM generating responses and filtering those with correct output answers as training data. This approach often yields a low-quality fine-tuning training set (e.g., incorrect plans or intermediate reasoning). In this paper, we develop a reinforced self-training approach, called ReST-MCTS*, based on integrating process reward guidance with tre… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 29 pages

  13. arXiv:2406.03044  [pdf, other

    cs.LG q-bio.NC

    Population Transformer: Learning Population-level Representations of Intracranial Activity

    Authors: Geeling Chau, Christopher Wang, Sabera Talukder, Vighnesh Subramaniam, Saraswati Soedarmadji, Yisong Yue, Boris Katz, Andrei Barbu

    Abstract: We present a self-supervised framework that learns population-level codes for intracranial neural recordings at scale, unlocking the benefits of representation learning for a key neuroscience recording modality. The Population Transformer (PopT) lowers the amount of data required for decoding experiments, while increasing accuracy, even on never-before-seen subjects and tasks. We address two key c… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 17 pages, 10 figures, submitted to NeurIPS 2024

  14. arXiv:2406.02721  [pdf, other

    cs.CL cs.AI

    Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller

    Authors: Min Cai, Yuchen Zhang, Shichang Zhang, Fan Yin, Difan Zou, Yisong Yue, Ziniu Hu

    Abstract: We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed in suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment concerning the model's hidden states, directly influencing the auto-regressive generation pro… ▽ More

    Submitted 18 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: 41 pages, 12 figures, 41 tables; Website: https://llm-self-control.github.io/

  15. arXiv:2405.20561  [pdf, other

    cs.CR cs.SE

    All Your Tokens are Belong to Us: Demystifying Address Verification Vulnerabilities in Solidity Smart Contracts

    Authors: Tianle Sun, Ningyu He, Jiang Xiao, Yinliang Yue, Xiapu Luo, Haoyu Wang

    Abstract: In Ethereum, the practice of verifying the validity of the passed addresses is a common practice, which is a crucial step to ensure the secure execution of smart contracts. Vulnerabilities in the process of address verification can lead to great security issues, and anecdotal evidence has been reported by our community. However, this type of vulnerability has not been well studied. To fill the voi… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by USENIX Security 2024

  16. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on Foundation Model for Spatial Data Intelligence: China's 2024 White Paper on Strategic Development of Spatial Data Intelligence

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Tao Qin, Jingyuan Wang, Chuang Tao, Zhengchao Chen, Hao Lu, Jiayi Li, Hongyang Chen, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun , et al. (9 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 29 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  17. arXiv:2405.19647  [pdf, other

    cs.LG

    FTS: A Framework to Find a Faithful TimeSieve

    Authors: Songning Lai, Ninghui Feng, Haochen Sui, Ze Ma, Hao Wang, Zichen Song, Hang Zhao, Yutao Yue

    Abstract: The field of time series forecasting has garnered significant attention in recent years, prompting the development of advanced models like TimeSieve, which demonstrates impressive performance. However, an analysis reveals certain unfaithfulness issues, including high sensitivity to random seeds and minute input noise perturbations. Recognizing these challenges, we embark on a quest to define the c… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Journal ref: IJCAI2024 workshop

  18. arXiv:2405.18782  [pdf, other

    eess.IV cs.CV stat.ML

    Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

    Authors: Zihui Wu, Yu Sun, Yifan Chen, Bingliang Zhang, Yisong Yue, Katherine L. Bouman

    Abstract: Diffusion models (DMs) have recently shown outstanding capability in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior de… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  19. arXiv:2405.16464  [pdf, other

    cs.RO cs.CV

    Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge

    Authors: Tianchen Deng, Yi Zhou, Wenhua Wu, Mingrui Li, Jingwei Huang, Shuhong Liu, Yanzeng Song, Hao Zuo, Yanbo Wang, Yutao Yue, Hesheng Wang, Weidong Chen

    Abstract: This technical report presents the 1st winning model for UG2+, a task in CVPR 2024 UAV Tracking and Pose-Estimation Challenge. This challenge faces difficulties in drone detection, UAV-type classification and 2D/3D trajectory estimation in extreme weather conditions with multi-modal sensor information, including stereo vision, various Lidars, Radars, and audio arrays. Leveraging this information… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 workshop. The 1st winning model in CVPR 2024 UG2+ challenge. The code and configuration of our method are available at https://github.com/dtc111111/Multi-Modal-UAV

  20. arXiv:2405.14260  [pdf, other

    cs.LG cs.AI

    Graph Sparsification via Mixture of Graphs

    Authors: Guibin Zhang, Xiangguo Sun, Yanwei Yue, Kun Wang, Tianlong Chen, Shirui Pan

    Abstract: Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely o… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  21. arXiv:2405.13448  [pdf, other

    cs.CL

    Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning

    Authors: Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang

    Abstract: The process of instruction tuning aligns pre-trained large language models (LLMs) with open-domain instructions and human-preferred responses. While several studies have explored autonomous approaches to distilling and annotating instructions from more powerful proprietary LLMs, such as ChatGPT, they often neglect the impact of task distributions and the varying difficulty of instructions of the t… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  22. arXiv:2405.12821  [pdf, other

    cs.RO cs.CV

    Talk2Radar: Bridging Natural Language with 4D mmWave Radar for 3D Referring Expression Comprehension

    Authors: Runwei Guan, Ruixiao Zhang, Ningwei Ouyang, Jianan Liu, Ka Lok Man, Xiaohao Cai, Ming Xu, Jeremy Smith, Eng Gee Lim, Yutao Yue, Hui Xiong

    Abstract: Embodied perception is essential for intelligent vehicles and robots, enabling more natural interaction and task execution. However, these advancements currently embrace vision level, rarely focusing on using 3D modeling sensors, which limits the full understanding of surrounding objects with multi-granular characteristics. Recently, as a promising automotive sensor with affordable cost, 4D Millim… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures

  23. arXiv:2405.11446  [pdf, other

    cs.CL cs.LG

    MAML-en-LLM: Model Agnostic Meta-Training of LLMs for Improved In-Context Learning

    Authors: Sanchit Sinha, Yuguang Yue, Victor Soto, Mayank Kulkarni, Jianhua Lu, Aidong Zhang

    Abstract: Adapting large language models (LLMs) to unseen tasks with in-context training samples without fine-tuning remains an important research problem. To learn a robust LLM that adapts well to unseen tasks, multiple meta-training approaches have been proposed such as MetaICL and MetaICT, which involve meta-training pre-trained LLMs on a wide variety of diverse tasks. These meta-training approaches esse… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: KDD 2024, 11 pages(9 main, 2 ref, 1 App) Openreview https://openreview.net/forum?id=JwecLNhWDy&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DKDD.org%2F2024%2FResearch_Track%2FAuthors%23your-submissions)

  24. arXiv:2405.08768  [pdf, other

    cs.CV cs.AI cs.LG

    EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training

    Authors: Yulin Wang, Yang Yue, Rui Lu, Yizeng Han, Shiji Song, Gao Huang

    Abstract: The superior performance of modern visual backbones usually comes with a costly training procedure. We contribute to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using easier-to-harder data. Specifically, we reformulate the training curriculum as a soft-selection function, which uncovers progressively more difficult patterns with… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Journal version of arXiv:2211.09703 (ICCV 2023). Code is available at: https://github.com/LeapLabTHU/EfficientTrain

  25. arXiv:2405.04332  [pdf, other

    cs.CR

    WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

    Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

    Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion accompanies security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive se… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Just accepted by the Automated Software Engineering Journal

  26. arXiv:2404.18279  [pdf, other

    cs.CV

    Out-of-distribution Detection in Medical Image Analysis: A survey

    Authors: Zesheng Hong, Yubiao Yue, Yubin Chen, Lele Cong, Huanjie Lin, Yuanmei Luo, Mini Han Wang, Weidong Wang, Jialong Xu, Xiaoqi Yang, Hechang Chen, Zhenzhang Li, Sihong Xie

    Abstract: Computer-aided diagnostics has benefited from the development of deep learning-based computer vision techniques in these years. Traditional supervised deep learning methods assume that the test sample is drawn from the identical distribution as the training data. However, it is possible to encounter out-of-distribution samples in real-world clinical scenarios, which may cause silent failure in dee… ▽ More

    Submitted 3 July, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 23 pages, 3 figures

  27. Understanding Hyperbolic Metric Learning through Hard Negative Sampling

    Authors: Yun Yue, Fangzhou Lin, Guanyi Mou, Ziming Zhang

    Abstract: In recent years, there has been a growing trend of incorporating hyperbolic geometry methods into computer vision. While these methods have achieved state-of-the-art performance on various metric learning tasks using hyperbolic distance measurements, the underlying theoretical analysis supporting this superior performance remains under-exploited. In this study, we investigate the effects of integr… ▽ More

    Submitted 2 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: published in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2024

  28. arXiv:2404.10342  [pdf, other

    cs.CV cs.MM

    Referring Flexible Image Restoration

    Authors: Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

    Abstract: In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 15 pages, 19 figures

  29. arXiv:2404.07790  [pdf, other

    cs.CV

    VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

    Authors: Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

    Abstract: Image dehazing poses significant challenges in environmental perception. Recent research mainly focus on deep learning-based methods with single modality, while they may result in severe information loss especially in dense-haze scenarios. The infrared image exhibits robustness to the haze, however, existing methods have primarily treated the infrared modality as auxiliary information, failing to… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  30. arXiv:2404.07770  [pdf, other

    cs.CV

    Joint Conditional Diffusion Model for Image Restoration with Mixed Degradations

    Authors: Yufeng Yue, Meng Yu, Luojie Yang, Yi Yang

    Abstract: Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously. Blind image decomposition was proposed to tackle this issue, however, its effectiveness heavily relies on the accurate estimation of each component. Although diffusion-based models exhibit strong generative abilities in image restoration tasks, they may generate irrele… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  31. arXiv:2404.05187  [pdf, other

    cs.CV cs.GR cs.RO

    LGSDF: Continual Global Learning of Signed Distance Fields Aided by Local Updating

    Authors: Yufeng Yue, Yinan Deng, Jiahui Wang, Yi Yang

    Abstract: Implicit reconstruction of ESDF (Euclidean Signed Distance Field) involves training a neural network to regress the signed distance from any point to the nearest obstacle, which has the advantages of lightweight storage and continuous querying. However, existing algorithms usually rely on conflicting raw observations as training data, resulting in poor map performance. In this paper, we propose LG… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  32. arXiv:2403.15658  [pdf, other

    cs.RO

    Data-Driven Predictive Control for Robust Exoskeleton Locomotion

    Authors: Kejun Li, Jeeseop Kim, Xiaobin Xiong, Kaveh Akbari Hamed, Yisong Yue, Aaron D. Ames

    Abstract: Exoskeleton locomotion must be robust while being adaptive to different users with and without payloads. To address these challenges, this work introduces a data-driven predictive control (DDPC) framework to synthesize walking gaits for lower-body exoskeletons, employing Hankel matrices and a state transition matrix for its data-driven model. The proposed approach leverages DDPC through a multi-la… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  33. arXiv:2403.12686  [pdf, other

    cs.CV cs.MM cs.RO

    WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

    Authors: Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

    Abstract: The perception of waterways based on human intent is significant for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human prompts. WaterVG encompasses prompts describing multiple targets, with annotations at the… ▽ More

    Submitted 4 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 10 pages, 10 figures

  34. arXiv:2403.09412  [pdf, other

    cs.CV cs.AI cs.RO

    OpenGraph: Open-Vocabulary Hierarchical 3D Graph Representation in Large-Scale Outdoor Environments

    Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Xinyu Tian, Guangyan Chen, Yi Yang, Yufeng Yue

    Abstract: Environment representations endowed with sophisticated semantics are pivotal for facilitating seamless interaction between robots and humans, enabling them to effectively carry out various tasks. Open-vocabulary maps, powered by Visual-Language models (VLMs), possess inherent advantages, including zero-shot learning and support for open-set classes. However, existing open-vocabulary maps are prima… ▽ More

    Submitted 28 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  35. arXiv:2403.03849  [pdf, other

    eess.IV cs.CV cs.LG

    MedMamba: Vision Mamba for Medical Image Classification

    Authors: Yubiao Yue, Zhenzhang Li

    Abstract: Since the era of deep learning, convolutional neural networks (CNNs) and vision transformers (ViTs) have been extensively studied and widely used in medical image classification tasks. Unfortunately, CNN's limitations in modeling long-range dependencies result in poor classification performances. In contrast, ViTs are hampered by the quadratic computational complexity of their self-attention mecha… ▽ More

    Submitted 10 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  36. arXiv:2403.01248  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

    Authors: Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi

    Abstract: This paper introduces SceneCraft, a Large Language Model (LLM) Agent converting text descriptions into Blender-executable Python scripts which render complex scenes with up to a hundred 3D assets. This process requires complex spatial planning and arrangement. We tackle these challenges through a combination of advanced abstraction, strategic planning, and library learning. SceneCraft first models… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  37. arXiv:2402.18546  [pdf, other

    cs.LG

    Generalizability Under Sensor Failure: Tokenization + Transformers Enable More Robust Latent Spaces

    Authors: Geeling Chau, Yujin An, Ahamed Raffey Iqbal, Soon-Jo Chung, Yisong Yue, Sabera Talukder

    Abstract: A major goal in neuroscience is to discover neural data representations that generalize. This goal is challenged by variability along recording sessions (e.g. environment), subjects (e.g. varying neural structures), and sensors (e.g. sensor noise), among others. Recent work has begun to address generalization across sessions and subjects, but few study robustness to sensor failure which is highly… ▽ More

    Submitted 19 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 9 pages, 8 figures

  38. arXiv:2402.16412  [pdf, other

    cs.LG

    TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis

    Authors: Sabera Talukder, Yisong Yue, Georgia Gkioxari

    Abstract: The field of general time series analysis has recently begun to explore unified modeling, where a common architectural backbone can be retrained on a specific task for a specific dataset. In this work, we approach unification from a complementary vantage point: unification across tasks and domains. To this end, we explore the impact of discrete, learnt, time series data representations that enable… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  39. arXiv:2402.14285  [pdf, other

    cs.SD cs.LG eess.AS

    Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion

    Authors: Yujia Huang, Adishree Ghatare, Yuanzhe Liu, Ziniu Hu, Qinsheng Zhang, Chandramouli S Sastry, Siddharth Gururani, Sageev Oore, Yisong Yue

    Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression, many of which are non-differentiable which pose a challenge when using them for guided diffusion. We propose \oursfull (\ours), a novel gui… ▽ More

    Submitted 2 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: ICML 2024 (Oral)

  40. arXiv:2402.12065  [pdf, other

    cs.LG cs.AI cs.CL

    WKVQuant: Quantizing Weight and Key/Value Cache for Large Language Models Gains More

    Authors: Yuxuan Yue, Zhihang Yuan, Haojie Duanmu, Sifan Zhou, Jianlong Wu, Liqiang Nie

    Abstract: Large Language Models (LLMs) face significant deployment challenges due to their substantial memory requirements and the computational demands of auto-regressive text generation process. This paper addresses these challenges by focusing on the quantization of LLMs, a technique that reduces memory consumption by converting model parameters and activations into low-bit integers. We critically analyz… ▽ More

    Submitted 20 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Frist work to exclusively quantize weight and Key/Value cache for large language models

  41. arXiv:2402.10130  [pdf, other

    cs.LG cs.AI cs.CV

    Is Continual Learning Ready for Real-world Challenges?

    Authors: Theodora Kontogianni, Yuanwen Yue, Siyu Tang, Konrad Schindler

    Abstract: Despite continual learning's long and well-established academic history, its application in real-world scenarios remains rather limited. This paper contends that this gap is attributable to a misalignment between the actual challenges of continual learning and the evaluation protocols in use, rendering proposed solutions ineffective for addressing the complexities of real-world setups. We validate… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  42. arXiv:2402.01242  [pdf, other

    cs.LG

    Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

    Authors: Guibin Zhang, Yanwei Yue, Kun Wang, Junfeng Fang, Yongduo Sui, Kai Wang, Yuxuan Liang, Dawei Cheng, Shirui Pan, Tianlong Chen

    Abstract: Graph Neural Networks (GNNs) excel in various graph learning tasks but face computational challenges when applied to large-scale graphs. A promising solution is to remove non-essential edges to reduce the computational overheads in GNN. Previous literature generally falls into two categories: topology-guided and semantic-guided. The former maintains certain graph topological properties yet often u… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  43. arXiv:2401.11353  [pdf, other

    cs.LG

    Distributionally Robust Policy Evaluation under General Covariate Shift in Contextual Bandits

    Authors: Yihong Guo, Hao Liu, Yisong Yue, Anqi Liu

    Abstract: We introduce a distributionally robust approach that enhances the reliability of offline policy evaluation in contextual bandits under general covariate shifts. Our method aims to deliver robust policy evaluation results in the presence of discrepancies in both context and policy distribution between logging and target data. Central to our methodology is the application of robust regression, a dis… ▽ More

    Submitted 20 January, 2024; originally announced January 2024.

  44. arXiv:2401.07950  [pdf, other

    cs.CL

    SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning

    Authors: Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, Jie Tang

    Abstract: Large Language Models (LLMs) have shown promise in assisting scientific discovery. However, such applications are currently limited by LLMs' deficiencies in understanding intricate scientific concepts, deriving symbolic equations, and solving advanced numerical calculations. To bridge these gaps, we introduce SciGLM, a suite of scientific language models able to conduct college-level scientific re… ▽ More

    Submitted 12 March, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: 21 pages

  45. arXiv:2312.12817  [pdf

    cs.CY

    Research on the Development of Blockchain-based Distributed Intelligent Healthcare Industry -- A Policy Analysis Perspective

    Authors: Yang Yue, Joseph Z. Shyu

    Abstract: As a pivotal innovation in digital infrastructure, blockchain ledger technology catalyzes the development of nascent business paradigms and applications globally. Utilizing Rothwell and Zegveld's taxonomy of twelve innovation policy tools, this study offers a nuanced comparison of domestic blockchain policies, dissecting supply, environment, and demand-driven policy dimensions to distill prevailin… ▽ More

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 6 pages, 4 figures

  46. arXiv:2312.08851  [pdf, other

    cs.CV cs.CE cs.RO

    Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities

    Authors: Runwei Guan, Haocheng Zhao, Shanliang Yao, Ka Lok Man, Xiaohui Zhu, Limin Yu, Yong Yue, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

    Abstract: Urban water-surface robust perception serves as the foundation for intelligent monitoring of aquatic environments and the autonomous navigation and operation of unmanned vessels, especially in the context of waterway safety. It is worth noting that current multi-sensor fusion and multi-task learning models consume substantial power and heavily rely on high-power GPUs for inference. This contribute… ▽ More

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 18 pages, 9 figures

  47. arXiv:2312.04861  [pdf, other

    cs.CV cs.AI

    Exploring Radar Data Representations in Autonomous Driving: A Comprehensive Review

    Authors: Shanliang Yao, Runwei Guan, Zitian Peng, Chenhang Xu, Yilu Shi, Weiping Ding, Eng Gee Lim, Yong Yue, Hyungjoon Seo, Ka Lok Man, Jieming Ma, Xiaohui Zhu, Yutao Yue

    Abstract: With the rapid advancements of sensor technology and deep learning, autonomous driving systems are providing safe and efficient access to intelligent vehicles as well as intelligent transportation. Among these equipped sensors, the radar sensor plays a crucial role in providing robust perception information in diverse environmental conditions. This review focuses on exploring different radar data… ▽ More

    Submitted 19 April, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: 24 pages, 10 figures, 5 tables. arXiv admin note: text overlap with arXiv:2304.10410

  48. arXiv:2312.04055  [pdf

    cs.LG

    Jointly spatial-temporal representation learning for individual trajectories

    Authors: Fei Huang, Jianrong Lv, Yang Yue

    Abstract: Individual trajectories, rich in human-environment interaction information across space and time, serve as vital inputs for geospatial foundation models (GeoFMs). However, existing attempts at learning trajectory representations have overlooked the implicit spatial-temporal dependency within trajectories, failing to encode such dependency in a deep learning-friendly format. That poses a challenge… ▽ More

    Submitted 11 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 27 pages, 3 tables, 7 figures

  49. arXiv:2312.02419  [pdf, other

    cs.RO

    Human Demonstrations are Generalizable Knowledge for Robots

    Authors: Te Cui, Guangyan Chen, Tianxing Zhou, Zicai Peng, Mengxiao Hu, Haoyang Lu, Haizhou Li, Meiling Wang, Yi Yang, Yufeng Yue

    Abstract: Learning from human demonstrations is an emerging trend for designing intelligent robotic systems. However, previous methods typically regard videos as instructions, simply dividing them into action sequences for robotic repetition, which poses obstacles to generalization to diverse tasks or object instances. In this paper, we propose a different perspective, considering human demonstration videos… ▽ More

    Submitted 12 May, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  50. arXiv:2312.01658  [pdf, other

    cs.LG cs.DC math.OC

    AGD: an Auto-switchable Optimizer using Stepwise Gradient Difference for Preconditioning Matrix

    Authors: Yun Yue, Zhiling Ye, Jiadi Jiang, Yongchao Liu, Ke Zhang

    Abstract: Adaptive optimizers, such as Adam, have achieved remarkable success in deep learning. A key component of these optimizers is the so-called preconditioning matrix, providing enhanced gradient information and regulating the step size of each gradient direction. In this paper, we propose a novel approach to designing the preconditioning matrix by utilizing the gradient difference between two successi… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 21 pages. Accepted as a conference paper at NeurIPS '23