Showing 1–50 of 456 results for author: Gao, L

Searching in archive cs.
  1. arXiv:2407.13168  [pdf, other]

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  2. arXiv:2407.12684  [pdf, other]

    cs.CV

    4Dynamic: Text-to-4D Generation with Hybrid Priors

    Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

    Abstract: Due to the fascinating generative performance of text-to-image diffusion models, growing text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges…

    Submitted 17 July, 2024; originally announced July 2024.

  3. arXiv:2407.12292  [pdf, other]

    cs.CV cs.AI

    Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

    Authors: Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

    Abstract: Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose $\bf{G}$eneralized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  4. arXiv:2407.08084  [pdf, other]

    cs.RO

    Decentralized Adaptive Aerospace Transportation of Unknown Loads Using A Team of Robots

    Authors: Longsen Gao, Kevin Aubert, David Saldana, Claus Danielson, Rafael Fierro

    Abstract: Transportation missions in aerospace are limited to the capability of each aerospace robot and the properties of the target transported object, such as mass, inertia, and grasping locations. We present a novel decentralized adaptive controller design for multiple robots that can be implemented in different kinds of aerospace robots. Our controller adapts to unknown objects in different gravity env…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by DARS2024 Conference. The permission for the preprint version on arXiv has been approved through the DARS2024 Committee and Springer Press

  5. arXiv:2407.05700  [pdf, other]

    cs.CL cs.AI cs.SE

    InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct

    Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Yewen Pu, Dawei Yin, Xing Hu, Yunji Chen

    Abstract: Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities by fine-tuning on the data generated from powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction tuning. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation…

    Submitted 8 July, 2024; originally announced July 2024.

  6. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) model is required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  7. arXiv:2407.00949  [pdf, ps, other]

    cs.CV eess.IV

    SpectralKAN: Kolmogorov-Arnold Network for Hyperspectral Images Change Detection

    Authors: Yanheng Wang, Xiaohan Yu, Yongsheng Gao, Jianjun Sha, Jian Wang, Lianru Gao, Yonggang Zhang, Xianhui Rong

    Abstract: It has been verified that deep learning methods, including convolutional neural networks (CNNs), graph neural networks (GNNs), and transformers, can accurately extract features from hyperspectral images (HSIs). These algorithms perform exceptionally well on HSIs change detection (HSIs-CD). However, the downside of these impressive results is the enormous number of parameters, FLOPs, GPU memory, tr…

    Submitted 1 July, 2024; originally announced July 2024.

  8. arXiv:2406.15485  [pdf, other]

    cs.CL cs.CV

    SegHist: A General Segmentation-based Framework for Chinese Historical Document Text Line Detection

    Authors: Xingjian Hu, Baole Wei, Liangcai Gao, Jun Wang

    Abstract: Text line detection is a key task in historical document analysis facing many challenges of arbitrary-shaped text lines, dense texts, and text lines with high aspect ratios, etc. In this paper, we propose a general framework for historical document text detection (SegHist), enabling existing segmentation-based text detection methods to effectively address the challenges, especially text lines with…

    Submitted 8 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by ICDAR2024 (poster)

  9. arXiv:2406.07595  [pdf, other]

    cs.CR cs.AI cs.SE

    VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

    Authors: Yu Liu, Lang Gao, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen

    Abstract: Large Language Models (LLMs) have training corpora containing large amounts of program code, greatly improving the model's code comprehension and generation capabilities. However, sound comprehensive research on detecting program vulnerabilities, a more specific task related to code, and evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challe…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  10. arXiv:2406.07060  [pdf]

    cs.CL cs.AI cs.LG

    Reading Miscue Detection in Primary School through Automatic Speech Recognition

    Authors: Lingyun Gao, Cristian Tejedor-Garcia, Helmer Strik, Catia Cucchiarini

    Abstract: Automatic reading diagnosis systems can benefit both teachers for more efficient scoring of reading exercises and students for accessing reading exercises with feedback more easily. However, there are limited studies on Automatic Speech Recognition (ASR) for child speech in languages other than English, and limited research on ASR-based reading diagnosis systems. This study investigates how effici…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Proc. INTERSPEECH 2024, 1-5 September 2024. Kos Island, Greece

  11. arXiv:2406.04584  [pdf, other]

    cs.LG cs.AI cs.CV

    CLoG: Benchmarking Continual Learning of Image Generation Models

    Authors: Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

    Abstract: Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advo…

    Submitted 6 June, 2024; originally announced June 2024.

  12. arXiv:2406.04093  [pdf, other]

    cs.LG cs.AI

    Scaling and evaluating sparse autoencoders

    Authors: Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu

    Abstract: Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders need to be very large to recover all relevant features. However, studying the properties of autoencoder scaling is difficult due to the need to balance reconstr…

    Submitted 6 June, 2024; originally announced June 2024.

  13. arXiv:2406.03879  [pdf, other]

    cs.LG cs.CV

    Decay Pruning Method: Smooth Pruning With a Self-Rectifying Procedure

    Authors: Minghao Yang, Linlin Gao, Pengyuan Li, Wenbo Li, Yihong Dong, Zhiying Cui

    Abstract: Current structured pruning methods often result in considerable accuracy drops due to abrupt network changes and loss of information from pruned structures. To address these issues, we introduce the Decay Pruning Method (DPM), a novel smooth pruning approach with a self-rectifying mechanism. DPM consists of two key components: (i) Smooth Pruning: It converts conventional single-step pruning into m…

    Submitted 6 June, 2024; originally announced June 2024.

  14. arXiv:2406.02430  [pdf, other]

    eess.AS cs.SD

    Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

    Authors: Philip Anastassiou, Jiawei Chen, Jitong Chen, Yuanzhe Chen, Zhuo Chen, Ziyi Chen, Jian Cong, Lelai Deng, Chuang Ding, Lu Gao, Mingqing Gong, Peisong Huang, Qingqing Huang, Zhiying Huang, Yuanyuan Huo, Dongya Jia, Chumin Li, Feiya Li, Hui Li, Jiaxin Li, Xiaoyang Li, Xingxing Li, Lin Liu, Shouda Liu, Sichao Liu, et al. (21 additional authors not shown)

    Abstract: We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech in-context learning, achieving performance in speaker similarity and naturalness that matches ground truth human speech in both objective and sub…

    Submitted 4 June, 2024; originally announced June 2024.

  15. arXiv:2406.00114  [pdf, other]

    cs.RO cs.NE

    Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory planning

    Authors: Bao Liu, Tianbao Liu, Zhongshuo Hu, Fei Ye, Lei Gao

    Abstract: The advancement of industrialization has spurred the development of innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) notable for its robustness, parallelism, simplicity, and efficiency. While LSO excels in single-objective optimization, its multi-objective variants face challenges such as poor initialization, local optima entrapment, and so on. This study proposes Dynam…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

  16. arXiv:2405.20657  [pdf, other]

    cs.CL

    DORY: Deliberative Prompt Recovery for LLM

    Authors: Lirong Gao, Ru Peng, Yiming Zhang, Junbo Zhao

    Abstract: Prompt recovery in large language models (LLMs) is crucial for understanding how LLMs work and addressing concerns regarding privacy, copyright, etc. The trend towards inference-only APIs complicates this task by restricting access to essential outputs for recovery. To tackle this challenge, we extract prompt-related information from limited outputs and identify a strong(negative) correlation betw…

    Submitted 7 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Findings of ACL 2024

  17. arXiv:2405.15356  [pdf, other]

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  18. arXiv:2405.14782  [pdf, other]

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.14132  [pdf, other]

    cs.LG

    Text-to-Model: Text-Conditioned Neural Network Diffusion for Train-Once-for-All Personalization

    Authors: Zexi Li, Lingzhi Gao, Chao Wu

    Abstract: Generative artificial intelligence (GenAI) has made significant progress in understanding world knowledge and generating content from human languages across various modalities, like text-to-text large language models, text-to-image stable diffusion, and text-to-video Sora. While in this paper, we investigate the capability of GenAI for text-to-model generation, to see whether GenAI can comprehend…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Preprint

  20. arXiv:2405.12710   

    cs.CV

    Text-Video Retrieval with Global-Local Semantic Consistent Learning

    Authors: Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Hengtao Shen

    Abstract: Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state-of-the-art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, l…

    Submitted 15 July, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: The author has withdrawn this paper due to a critical definitional error in concept learning for global/local-interaction learning during training. This error led to an alignment issue with the definition of the text-video retrieval task, causing an unfair comparison with state-of-the-art (SOTA) methods. Consequently, this hindered the accurate evaluation of the paper's contributions

  21. arXiv:2405.09883  [pdf, other]

    cs.CV

    RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception

    Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye

    Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within…

    Submitted 4 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: ECCV 2024. Extended version. 33 pages, 21 figures, 13 tables. https://github.com/xiaosu-zhu/RoScenes

  22. arXiv:2405.09032  [pdf, other]

    cs.CV

    ICAL: Implicit Character-Aided Learning for Enhanced Handwritten Mathematical Expression Recognition

    Authors: Jianhua Zhu, Liangcai Gao, Wenqi Zhao

    Abstract: Significant progress has been made in the field of handwritten mathematical expression recognition, while existing encoder-decoder methods are usually difficult to model global information in \LaTeX. Therefore, this paper introduces a novel approach, Implicit Character-Aided Learning (ICAL), to mine the global expression information and enhance handwritten mathematical expression recognition. Spec…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by ICDAR 2024

  23. arXiv:2405.06461  [pdf, other]

    cs.GR

    SketchDream: Sketch-based Text-to-3D Generation and Editing

    Authors: Feng-Lin Liu, Hongbo Fu, Yu-Kun Lai, Lin Gao

    Abstract: Existing text-based 3D generation methods generate attractive results but lack detailed geometry control. Sketches, known for their conciseness and expressiveness, have contributed to intuitive 3D modeling but are confined to producing texture-less mesh models within predefined categories. Integrating sketch and text simultaneously for 3D generation promises enhanced control over geometry and appe…

    Submitted 14 May, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

  24. arXiv:2405.04900  [pdf, other]

    cs.CV

    Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences

    Authors: Cheng Song, Lu Lu, Zhen Ke, Long Gao, Shuai Ding

    Abstract: Emotion recognition is an important part of affective computing. Extracting emotional cues from human gaits yields benefits such as natural interaction, a nonintrusive nature, and remote detection. Recently, the introduction of self-supervised learning techniques offers a practical solution to the issues arising from the scarcity of labeled data in the field of gait-based emotion recognition. Howe…

    Submitted 8 May, 2024; originally announced May 2024.

  25. arXiv:2405.01525  [pdf, other]

    cs.CL cs.AI

    FLAME: Factuality-Aware Alignment for Large Language Models

    Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

    Abstract: Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e. hallucination). In this paper, we study how to make the LLM al…

    Submitted 2 May, 2024; originally announced May 2024.

  26. arXiv:2404.18695  [pdf, other]

    cs.CV

    Dual-Modal Prompting for Sketch-Based Image Retrieval

    Authors: Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang

    Abstract: Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images. In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval. Our key innovation lies in the realization that solely addressing this cross-category…

    Submitted 29 April, 2024; originally announced April 2024.

  27. arXiv:2404.16054  [pdf, other]

    cs.HC cs.AI

    LlamaTouch: A Faithful and Scalable Testbed for Mobile UI Automation Task Evaluation

    Authors: Li Zhang, Shihe Wang, Xianqing Jia, Zhihan Zheng, Yunhe Yan, Longxi Gao, Yuanchun Li, Mengwei Xu

    Abstract: The emergent large language/multimodal models facilitate the evolution of mobile agents, especially in the task of mobile UI automation. However, existing evaluation approaches, which rely on human validation or established datasets to compare agent-predicted actions with predefined ones, are unscalable and unfaithful. To overcome these limitations, this paper presents LlamaTouch, a testbed for on…

    Submitted 12 April, 2024; originally announced April 2024.

  28. arXiv:2404.11111  [pdf, other]

    cs.CV

    CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation

    Authors: Lianyu Hu, Wei Feng, Liqing Gao, Zekang Liu, Liang Wan

    Abstract: In sign language, the conveyance of human body trajectories predominantly relies upon the coordinated movements of hands and facial expressions across successive frames. Despite the recent advancements of sign language understanding methods, they often solely focus on individual frames, inevitably overlooking the inter-frame correlations that are essential for effectively modeling human body traje…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.03202

  29. arXiv:2404.10384  [pdf, other]

    cs.CL cs.AI cs.IR

    Reasoning on Efficient Knowledge Paths: Knowledge Graph Guides Large Language Model for Domain Question Answering

    Authors: Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao

    Abstract: Large language models (LLMs), such as GPT3.5, GPT4 and LLAMA2 perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific evaluations, these LLMs often suffer from hallucination problems due to insufficient training of relevant corpus. Furthermore, fine-tuning large models may face problems such as the LLMs are not open source or the construction of high-…

    Submitted 16 April, 2024; originally announced April 2024.

  30. arXiv:2404.09412  [pdf, other]

    cs.CV

    DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading

    Authors: Tong Wu, Jia-Mu Sun, Yu-Kun Lai, Yuewen Ma, Leif Kobbelt, Lin Gao

    Abstract: Reconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. However, Gaussian splatting utilizes a single Spherical Harmoni…

    Submitted 6 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  31. arXiv:2404.08226  [pdf, other]

    cs.CV

    Improving Continuous Sign Language Recognition with Adapted Image Models

    Authors: Lianyu Hu, Tongkai Shi, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: The increase of web-scale weakly labelled image-text pairs have greatly facilitated the development of large-scale vision-language models (e.g., CLIP), which have shown impressive generalization performance over a series of downstream tasks. However, the massive model size and scarcity of available data limit their applications to fine-tune the whole model in downstream tasks. Besides, fully fine-…

    Submitted 11 April, 2024; originally announced April 2024.

  32. arXiv:2404.05220  [pdf, other]

    cs.CV

    StylizedGS: Controllable Stylization for 3D Gaussian Splatting

    Authors: Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao

    Abstract: With the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user e…

    Submitted 8 April, 2024; originally announced April 2024.

  33. arXiv:2404.00842  [pdf, other]

    cs.CV

    An N-Point Linear Solver for Line and Motion Estimation with Event Cameras

    Authors: Ling Gao, Daniel Gehrig, Hang Su, Davide Scaramuzza, Laurent Kneip

    Abstract: Event cameras respond primarily to edges--formed by strong gradients--and are thus particularly well-suited for line-based motion estimation. Recent work has shown that events generated by a single line each satisfy a polynomial constraint which describes a manifold in the space-time volume. Multiple such constraints can be solved simultaneously to recover the partial linear velocity and line para…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  34. arXiv:2403.16034  [pdf, other]

    cs.CV

    V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception

    Authors: Hao Xiang, Zhaoliang Zheng, Xin Xia, Runsheng Xu, Letian Gao, Zewei Zhou, Xu Han, Xinkai Ji, Mingxi Li, Zonglin Meng, Li Jin, Mingyue Lei, Zhaoyang Ma, Zihang He, Haoxuan Ma, Yunshuang Yuan, Yingqian Zhao, Jiaqi Ma

    Abstract: Recent advancements in Vehicle-to-Everything (V2X) technologies have enabled autonomous vehicles to share sensing information to see through occlusions, greatly boosting the perception capability. However, there are no real-world datasets to facilitate the real V2X cooperative perception research -- existing datasets either only support Vehicle-to-Infrastructure cooperation or Vehicle-to-Vehicle c…

    Submitted 24 March, 2024; originally announced March 2024.

  35. arXiv:2403.14338  [pdf, ps, other]

    quant-ph cs.IT math-ph

    Optimal Second-Order Rates for Quantum Information Decoupling

    Authors: Yu-Chen Shen, Li Gao, Hao-Chung Cheng

    Abstract: In this paper, we consider the standard quantum information decoupling, in which Alice aims to decouple her system from the environment by local operations and discarding some of her systems. To achieve an $\varepsilon$-decoupling with trace distance as the error criterion, we establish a near-optimal one-shot characterization for the largest dimension of the remainder system in terms of the condi…

    Submitted 21 March, 2024; originally announced March 2024.

  36. arXiv:2403.12519  [pdf, other]

    cs.CV

    Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition

    Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Wei Feng

    Abstract: Skeleton-aware sign language recognition (SLR) has gained popularity due to its ability to remain unaffected by background information and its lower computational requirements. Current methods utilize spatial graph modules and temporal modules to capture spatial and temporal features, respectively. However, their spatial graph modules are typically built on fixed graph structures such as graph con…

    Submitted 19 March, 2024; originally announced March 2024.

  37. arXiv:2403.11535  [pdf, other]

    cs.CV

    EchoReel: Enhancing Action Generation of Existing Video Diffusion Models

    Authors: Jianzhi Liu, Junchen Zhu, Lianli Gao, Jingkuan Song

    Abstract: Recent large-scale video datasets have facilitated the generation of diverse open-domain videos of Video Diffusion Models (VDMs). Nonetheless, the efficacy of VDMs in assimilating complex knowledge from these datasets remains constrained by their inherent scale, leading to suboptimal comprehension and synthesis of numerous actions. In this paper, we introduce EchoReel, a novel approach to augment…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 22 pages, 10 figures

  38. arXiv:2403.11134  [pdf, other]

    cs.CV cs.GR

    Recent Advances in 3D Gaussian Splatting

    Authors: Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

    Abstract: The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished b…

    Submitted 13 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  39. arXiv:2403.08994  [pdf, other]

    cs.CL

    Ethos: Rectifying Language Models in Orthogonal Parameter Space

    Authors: Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, Murali Annavaram

    Abstract: Language models (LMs) have greatly propelled the research on natural language processing. However, LMs also raise concerns regarding the generation of biased or toxic content and the potential disclosure of private information from the training dataset. In this work, we present a new efficient approach, Ethos, that rectifies LMs to mitigate toxicity and bias in outputs and avoid privacy leakage. E…

    Submitted 1 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  40. arXiv:2403.08350  [pdf, other]

    cs.CV

    CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model

    Authors: Cheng Chen, Junchen Zhu, Xu Luo, Hengtao Shen, Lianli Gao, Jingkuan Song

    Abstract: Instruction tuning represents a prevalent strategy employed by Multimodal Large Language Models (MLLMs) to align with human instructions and adapt to new tasks. Nevertheless, MLLMs encounter the challenge of adapting to users' evolving knowledge and demands. Therefore, how to retain existing skills while acquiring new knowledge needs to be investigated. In this paper, we present a comprehensive be…

    Submitted 13 March, 2024; originally announced March 2024.

  41. arXiv:2403.01414  [pdf, other]

    cs.CV

    Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes

    Authors: Yujie Lu, Long Wan, Nayu Ding, Yulong Wang, Shuhan Shen, Shen Cai, Lin Gao

    Abstract: Neural implicit representation of geometric shapes has witnessed considerable advancements in recent years. However, common distance field based implicit representations, specifically signed distance field (SDF) for watertight shapes or unsigned distance field (UDF) for arbitrary shapes, routinely suffer from degradation of reconstruction accuracy when converting to explicit surface points and mes…

    Submitted 1 April, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR 2024

  42. arXiv:2403.00840  [pdf]

    cs.CL cs.AI

    EyeGPT: Ophthalmic Assistant with Large Language Models

    Authors: Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Le Gao, Mingpu Xu, Yue Wu, Yinwen Li, Danli Shi, Mingguang He

    Abstract: Artificial intelligence (AI) has gained significant attention in healthcare consultation due to its potential to improve clinical workflow and enhance medical communication. However, owing to the complex nature of medical information, large language models (LLM) trained with general world knowledge might not possess the capability to tackle medical-related tasks at an expert level. Here, we introd…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 47 pages, 4 figures, 1 table, 2 supplementary figures and 9 supplementary tables

  43. arXiv:2402.17152  [pdf, other]

    cs.LG cs.IR

    Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations

    Authors: Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, Yu Shi

    Abstract: Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Despite being trained on huge volume of data with thousands of features, most Deep Learning Recommendation Models (DLRMs) in industry fail to scale with compute. Inspired by success achieved by Transformers in…

    Submitted 5 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: 26 pages, 13 figures. ICML'24. Code available at https://github.com/facebookresearch/generative-recommenders

  44. arXiv:2402.05403  [pdf, other]

    cs.CL cs.AI

    In-Context Principle Learning from Mistakes

    Authors: Tianjun Zhang, Aman Madaan, Luyu Gao, Steven Zheng, Swaroop Mishra, Yiming Yang, Niket Tandon, Uri Alon

    Abstract: In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm, by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): Firs…

    Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  45. arXiv:2402.04796  [pdf, other]

    cs.GR cs.CV

    Mesh-based Gaussian Splatting for Real-time Large-scale Deformation

    Authors: Lin Gao, Jie Yang, Bo-Tao Zhang, Jia-Mu Sun, Yu-Jie Yuan, Hongbo Fu, Yu-Kun Lai

    Abstract: Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in the real-time fashion.…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 11 pages, 7 figures

  46. arXiv:2401.09195  [pdf, other]

    cs.CV

    Training-Free Semantic Video Composition via Pre-trained Diffusion Model

    Authors: Jiaqi Guo, Sitong Su, Junchen Zhu, Lianli Gao, Jingkuan Song

    Abstract: The video composition task aims to integrate specified foregrounds and backgrounds from different videos into a harmonious composite. Current approaches, predominantly trained on videos with adjusted foreground color and lighting, struggle to address deep semantic disparities beyond superficial adjustments, such as domain gaps. Therefore, we propose a training-free pipeline employing a pre-trained…

    Submitted 17 January, 2024; originally announced January 2024.

  47. arXiv:2401.04906  [pdf]

    cs.IT

    Deep Learning Based Resource Allocation for Full-duplex Device-to-Device Communication

    Authors: Xinxin Zhang, Lei Gao

    Abstract: Device-to-device (D2D) technology is one of the key research areas in 5G/6G networks, and full-duplex (FD) D2D will further enhance its spectral efficiency (SE). In recent years, deep learning approaches have shown remarkable performance in D2D resource allocation tasks. However, most schemes only model the channel state information (CSI) as an independent feature, neglecting the spatial relations…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 7 pages, 5 figures, 5 tables

  48. arXiv:2401.01055  [pdf, other]

    cs.CL cs.AI

    LLaMA Beyond English: An Empirical Study on Language Capability Transfer

    Authors: Jun Zhao, Zhihao Zhang, Luhui Gao, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: In recent times, substantial advancements have been witnessed in large language models (LLMs), exemplified by ChatGPT, showcasing remarkable proficiency across a range of complex tasks. However, many mainstream LLMs (e.g. LLaMA) are pretrained on English-dominant corpus, which limits their performance in other non-English languages. In this paper, we focus on how to effectively transfer the capabi…

    Submitted 12 January, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  49. arXiv:2401.00268  [pdf, other]

    cs.CV

    COMMA: Co-Articulated Multi-Modal Learning

    Authors: Lianyu Hu, Liqing Gao, Zekang Liu, Chi-Man Pun, Wei Feng

    Abstract: Pretrained large-scale vision-language models such as CLIP have demonstrated excellent generalizability over a series of downstream tasks. However, they are sensitive to the variation of input text prompts and need a selection of prompt templates to achieve satisfactory performance. Recently, various methods have been proposed to dynamically learn the prompts as the textual inputs to avoid the req…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted to AAAI2024. Code is available at https://github.com/hulianyuyy/COMMA

  50. arXiv:2312.17425  [pdf, other]

    cs.CV cs.AI

    ALF: Adaptive Label Finetuning for Scene Graph Generation

    Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song

    Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine…

    Submitted 23 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.