Showing 201–250 of 1,033 results for author: Gu, J

  1. arXiv:2312.02813  [pdf, other]

    cs.CV cs.AI

    BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models

    Authors: Fengyuan Shi, Jiaxi Gu, Hang Xu, Songcen Xu, Wei Zhang, Limin Wang

    Abstract: Diffusion models have made tremendous progress in text-driven image and video generation. Now text-to-image foundation models are widely applied to various downstream image synthesis tasks, such as controllable image generation and image editing, while downstream video synthesis tasks are less explored for several reasons. First, it requires huge memory and computation overhead to train a video ge…

    Submitted 9 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024. Project page: https://bivdiff.github.io; GitHub repository: https://github.com/MCG-NJU/BIVDiff

  2. arXiv:2312.02554  [pdf, other]

    cs.LG cs.CL

    ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference

    Authors: Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

    Abstract: Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most preference learning methods, such as RLHF and DPO, depend on pairwise preference data, which inadequately address scenarios where human feedback is point-wise, lead…

    Submitted 26 February, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  3. arXiv:2312.02496  [pdf]

    cs.CL cs.AI

    MKA: A Scalable Medical Knowledge Assisted Mechanism for Generative Models on Medical Conversation Tasks

    Authors: Ke Liang, Sifan Wu, Jiayi Gu

    Abstract: Using natural language processing (NLP) technologies to develop medical chatbots makes the diagnosis of the patient more convenient and efficient, which is a typical application in healthcare AI. Because of its importance, much research has been carried out. Recently, neural generative models have shown impressive ability as the core of chatbots, while they cannot scale well when directly…

    Submitted 4 December, 2023; originally announced December 2023.

  4. arXiv:2312.02207  [pdf, other]

    cs.CV

    TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation

    Authors: Xiaojun Jia, Jindong Gu, Yihao Huang, Simeng Qin, Qing Guo, Yang Liu, Xiaochun Cao

    Abstract: Transferability of adversarial examples on image classification has been systematically explored, which generates adversarial examples in black-box mode. However, the transferability of adversarial examples on semantic segmentation has been largely overlooked. In this paper, we propose an effective two-stage adversarial attack strategy to improve the transferability of adversarial examples on sema…

    Submitted 2 December, 2023; originally announced December 2023.

  5. arXiv:2312.01040  [pdf, other]

    cs.CL

    From Beginner to Expert: Modeling Medical Knowledge into General LLMs

    Authors: Qiang Li, Xiaoyan Yang, Haowen Wang, Qin Wang, Lei Liu, Junjie Wang, Yang Zhang, Mingyuan Chu, Sen Hu, Yicheng Chen, Yue Shen, Cong Fan, Wangshu Zhang, Teng Xu, Jinjie Gu, Jing Zheng, Guannan Zhang Ant Group

    Abstract: Recently, large language model (LLM) based artificial intelligence (AI) systems have demonstrated remarkable capabilities in natural language understanding and generation. However, these models face a significant challenge when it comes to sensitive applications, such as reasoning over medical knowledge and answering medical questions in a physician-like manner. Prior studies attempted to overcome…

    Submitted 7 January, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Developed by Ant Group for PubMedQA leaderboard

  6. Thermodynamic bounds on the asymmetry of cross-correlations with dynamical activity and entropy production

    Authors: Jie Gu

    Abstract: Entropy production and dynamical activity are two complementary aspects in nonequilibrium physics. The asymmetry of cross-correlation, serving as a distinctive feature of nonequilibrium, also finds widespread utility. In this Letter, we establish two thermodynamic bounds on the normalized asymmetry of cross-correlation in terms of dynamical activity and entropy production rate. These bounds demons…

    Submitted 5 April, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

    Comments: 6 pages, 2 figures

    Journal ref: Phys. Rev. E 109 (2024), L042101

  7. arXiv:2311.18495  [pdf, other]

    cs.LG cs.CV

    Improving Adversarial Transferability via Model Alignment

    Authors: Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu

    Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measu…

    Submitted 17 July, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at the European Conference on Computer Vision (ECCV) 2024. Code: https://github.com/averyma/model-alignment

  8. arXiv:2311.18257  [pdf, other]

    cs.CV cs.LG

    Diffusion Models Without Attention

    Authors: Jing Nathan Yan, Jiatao Gu, Alexander M. Rush

    Abstract: In recent advancements in high-fidelity image generation, Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a key player. However, their application at high resolutions presents significant computational challenges. Current methods, such as patchifying, expedite processes in UNet and Transformer architectures but at the expense of representational capacity. Addressing this, we intro…

    Submitted 30 November, 2023; originally announced November 2023.

  9. arXiv:2311.18021  [pdf, other]

    cs.CV cs.AI cs.LG

    Understanding and Improving In-Context Learning on Vision-language Models

    Authors: Shuo Chen, Zhen Han, Bailan He, Mark Buckley, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Recently, in-context learning (ICL) on large language models (LLMs) has received great attention, and this technique can also be applied to vision-language models (VLMs) built upon LLMs. These VLMs can respond to queries by conditioning responses on a series of multimodal demonstrations, which comprise images, queries, and answers. Though ICL has been extensively studied on LLMs, its research on V…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 8 pages, 10 figures

  10. arXiv:2311.17600  [pdf, other]

    cs.CV

    MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models

    Authors: Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao

    Abstract: The security concerns surrounding Large Language Models (LLMs) have been extensively explored, yet the safety of Multimodal Large Language Models (MLLMs) remains understudied. In this paper, we observe that Multimodal Large Language Models (MLLMs) can be easily compromised by query-relevant images, as if the text query itself were malicious. To address this, we introduce MM-SafetyBench, a comprehe…

    Submitted 19 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  11. arXiv:2311.17338  [pdf, other]

    cs.CV cs.AI

    MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing

    Authors: Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Qingping Zheng, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

    Abstract: The diffusion model is widely leveraged for either video generation or video editing. As each field has its task-specific problems, it is difficult to merely develop a single diffusion for completing both tasks simultaneously. Video diffusion solely relying on the text prompt can be adapted to unify the two tasks. However, it lacks a high capability of aligning heterogeneous modalities between tex…

    Submitted 15 July, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

  12. arXiv:2311.17216  [pdf, other]

    cs.CV

    Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

    Authors: Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Diffusion-based models have gained significant popularity for text-to-image generation due to their exceptional image-generation capabilities. A risk with these models is the potential generation of inappropriate content, such as biased or harmful images. However, the underlying reasons for generating such undesired content from the perspective of the diffusion model's internal representation rema…

    Submitted 28 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted to CVPR 2024

  13. arXiv:2311.16214  [pdf, other]

    quant-ph cs.AR cs.ET cs.LG

    DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting

    Authors: Hanrui Wang, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jonathan Baker, Frederic T. Chong, Song Han

    Abstract: Quantum hardware suffers from high error rates and noise, which makes directly running applications on them ineffective. Quantum Error Correction (QEC) is a critical technique towards fault tolerance which encodes the quantum information distributively in multiple data qubits and uses syndrome qubits to check parity. Minimum-Weight-Perfect-Matching (MWPM) is a popular QEC decoder that takes the sy…

    Submitted 22 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 13 pages, 19 figures

  14. arXiv:2311.16190  [pdf, other]

    quant-ph cs.AR cs.ET

    Q-Pilot: Field Programmable Qubit Array Compilation with Flying Ancillas

    Authors: Hanrui Wang, Daniel Bochen Tan, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jason Cong, Song Han

    Abstract: Neutral atom arrays have become a promising platform for quantum computing, especially the field programmable qubit array (FPQA) endowed with the unique capability of atom movement. This feature allows dynamic alterations in qubit connectivity during runtime, which can reduce the cost of executing long-range gates and improve parallelism. However, this added flexibility introduces new challenges i…

    Submitted 6 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 10 pages, 16 figures; Published as a conference paper at DAC 2024

  15. arXiv:2311.16082  [pdf, other]

    quant-ph cs.AI cs.AR cs.ET cs.LG

    Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers

    Authors: Hanrui Wang, Pengyu Liu, Kevin Shao, Dantong Li, Jiaqi Gu, David Z. Pan, Yongshan Ding, Song Han

    Abstract: Quantum computing has the potential to solve problems that are intractable for classical systems, yet the high error rates in contemporary quantum devices often exceed tolerable limits for useful algorithm execution. Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their stat…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to ICCAD 2023, FAST ML for Science Workshop; 7 pages, 8 figures

  16. arXiv:2311.16035  [pdf, other]

    quant-ph cs.AI cs.AR cs.LG

    RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

    Authors: Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han

    Abstract: Quantum state preparation, a crucial subroutine in quantum computing, involves generating a target quantum state from initialized qubits. Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP). AD employs a predefined procedure to decompose the target state into a series of gates, whereas VQSP iterativel…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Accepted to FASTML @ ICCAD 2023. 14 pages, 20 figures

  17. arXiv:2311.15529  [pdf, other]

    cs.CV

    Efficient Dataset Distillation via Minimax Diffusion

    Authors: Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, Yiran Chen

    Abstract: Dataset distillation reduces the storage and computational consumption of training a network by generating a small surrogate dataset that encapsulates rich information of the original large-scale one. However, previous distillation methods heavily rely on the sample-wise iterative optimization scheme. As the images-per-class (IPC) setting or image resolution grows larger, the necessary computation…

    Submitted 25 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

    Comments: CVPR 2024

  18. arXiv:2311.15123  [pdf, other]

    quant-ph cs.AR cs.DC

    Atomique: A Quantum Compiler for Reconfigurable Neutral Atom Arrays

    Authors: Hanrui Wang, Pengyu Liu, Daniel Bochen Tan, Yilian Liu, Jiaqi Gu, David Z. Pan, Jason Cong, Umut A. Acar, Song Han

    Abstract: The neutral atom array has gained prominence in quantum computing for its scalability and operation fidelity. Previous works focus on fixed atom arrays (FAAs) that require extensive SWAP operations for long-range interactions. This work explores a novel architecture reconfigurable atom arrays (RAAs), also known as field programmable qubit arrays (FPQAs), which allows for coherent atom movements du…

    Submitted 2 May, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

    Comments: 17 pages, 26 figures; Published as a conference paper at ISCA 2024

  19. arXiv:2311.14977  [pdf]

    cs.CV cs.MM

    Incorporating granularity bias as the margin into contrastive loss for video captioning

    Authors: Jiayang Gu, Fengming Yao

    Abstract: Video captioning models easily suffer from long-tail distribution of phrases, which makes captioning models prone to generate vague sentences instead of accurate ones. However, existing debiasing strategies tend to export external knowledge to build dependency trees of words or refine frequency distribution by complex losses and extra input features, which lack interpretability and are hard to tra…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 6 pages, 2 figures

  20. arXiv:2311.14943  [pdf, ps, other]

    physics.plasm-ph physics.acc-ph

    Generation of polarized electron beams through self-injection in the interaction of a laser with a pre-polarized plasma

    Authors: L. R. Yin, X. F. Li, Y. J. Gu, N. Cao, Q. Kong, M. Buescher, S. M. Weng, M. Chen, Z. M. Sheng

    Abstract: Polarized electron beam production via laser wakefield acceleration in pre-polarized plasma is investigated by particle-in-cell simulations. The evolution of the electron beam polarization is studied based on the Thomas-Bargmann-Michel-Telegdi equation for the transverse and longitudinal self-injection, and the depolarization process is found to be influenced by the injection schemes. In the case…

    Submitted 25 November, 2023; originally announced November 2023.

    Comments: 7 pages, 4 figures

  21. arXiv:2311.14837  [pdf, other]

    cs.CV cs.IR

    Benchmarking Robustness of Text-Image Composed Retrieval

    Authors: Shitong Sun, Jindong Gu, Shaogang Gong

    Abstract: Text-image composed retrieval aims to retrieve the target image through the composed query, which is specified in the form of an image plus some text that describes desired modifications to the input image. It has recently attracted attention due to its ability to leverage both information-rich images and concise language to precisely express the requirements for target images. However, the robust…

    Submitted 30 November, 2023; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Accepted by R0-FoMo: Workshop on Robustness of Few-shot and Zero-shot Learning in Foundation Models at NeurIPS 2023

  22. arXiv:2311.14323  [pdf, other]

    cs.CV

    Binarized 3D Whole-body Human Mesh Recovery

    Authors: Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan, Linghe Kong, Xiaokang Yang

    Abstract: 3D whole-body human mesh recovery aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models have achieved accurate estimation in this task, they require enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited edge devices. In this work, we propose a Binarized Dual Residual Network (…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: The code will be available at https://github.com/ZHITENGLI/BiDRN

  23. arXiv:2311.14282  [pdf, other]

    cs.CV

    Image Super-Resolution with Text Prompt Diffusion

    Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang

    Abstract: Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal meth…

    Submitted 12 March, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: Code is available at https://github.com/zhengchen1999/PromptSR

  24. arXiv:2311.12919  [pdf, other]

    cs.CV cs.AI

    SPOT! Revisiting Video-Language Models for Event Understanding

    Authors: Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp

    Abstract: Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only…

    Submitted 1 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  25. arXiv:2311.11515  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Absence of metallicity and bias-dependent resistivity in low-carrier-density EuCd2As2

    Authors: Yuxiang Wang, Jianwen Ma, Jian Yuan, Wenbin Wu, Yong Zhang, Yicheng Mou, Jiaming Gu, Peihong Cheng, Wu Shi, Xiang Yuan, Jinglei Zhang, Yanfeng Guo, Cheng Zhang

    Abstract: EuCd2As2 was theoretically predicted to be a minimal model of Weyl semimetals with a single pair of Weyl points in the ferromagnet state. However, the heavily p-doped EuCd2As2 crystals in previous experiments prevent direct identification of the semimetal hypothesis. Here we present a comprehensive magneto-transport study of high-quality EuCd2As2 crystals with ultralow bulk carrier density (10^13…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 13 pages, 4 figures

    Journal ref: SCIENCE CHINA Physics, Mechanics & Astronomy, 67(4) 247311 (2024)

  26. arXiv:2311.10951  [pdf, other]

    astro-ph.IM

    Detecting Cosmic 21 cm Global Signal Using an Improved Polynomial Fitting Algorithm

    Authors: Tianyang Liu, Junhua Gu, Quan Guo, Huanyuan Shan, Qian Zheng, Jingying Wang

    Abstract: Detecting the cosmic 21 cm signal from Epoch of Reionization (EoR) has always been a difficult task. Although the Galactic foreground can be regarded as a smooth power-law spectrum, due to the chromaticity of the antenna, additional structure will be introduced into the global spectrum, making the polynomial fitting algorithm perform poorly. In this paper, we introduce an improved polynomial fitti…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 14 pages, 15 figures, Accepted for publication in MNRAS

  27. arXiv:2311.08719  [pdf, other]

    cs.CL

    Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory

    Authors: Lei Liu, Xiaoyan Yang, Yue Shen, Binbin Hu, Zhiqiang Zhang, Jinjie Gu, Guannan Zhang

    Abstract: Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, which basically relies on iterative recalling and reasoning of history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when recalling the same history for differen…

    Submitted 15 November, 2023; originally announced November 2023.

  28. arXiv:2311.08263  [pdf, other]

    cs.CL

    Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster

    Authors: Hongxuan Zhang, Zhining Liu, Yao Zhao, Jiaqi Zheng, Chenyi Zhuang, Jinjie Gu, Guihai Chen

    Abstract: In this work, we propose FastCoT, a model-agnostic framework based on parallel decoding without any further training of an auxiliary model or modification to the LLM itself. FastCoT uses a size-varying context window whose size changes with position to conduct parallel decoding and auto-regressive decoding simultaneously, thus fully utilizing GPU computation resources. In FastCoT, the parallel dec…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

  29. Heavy quark dominance in orbital excitation of singly and doubly heavy baryons

    Authors: Zhen-Yu Li, Guo-Liang Yu, Zhi-Gang Wang, Jian-Zhong Gu

    Abstract: A mechanism of the heavy quark dominance in the orbital excitation is proposed in this paper which is testified to be reasonable for singly and doubly heavy baryons. In the relativistic quark model, an analysis of the Hamiltonian figures out the mechanism that the excitation mode with lower energy levels is always associated with the heavy quark(s), and the splitting of the energy levels is suppre…

    Submitted 1 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 12 pages, 7 figures, 5 tables

    Journal ref: Eur.Phys.J.C84,106(2024)

  30. arXiv:2311.07885  [pdf, other]

    cs.CV cs.AI cs.GR

    One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion

    Authors: Minghua Liu, Ruoxi Shi, Linghao Chen, Zhuoyang Zhang, Chao Xu, Xinyue Wei, Hansheng Chen, Chong Zeng, Jiayuan Gu, Hao Su

    Abstract: Recent advancements in open-world 3D object generation have been remarkable, with image-to-3D methods offering superior fine-grained control over their text-to-3D counterparts. However, most existing models fall short in simultaneously providing rapid generation speeds and high fidelity to input images - two features essential for practical applications. In this paper, we present One-2-3-45++, an…

    Submitted 13 November, 2023; originally announced November 2023.

  31. arXiv:2311.07663  [pdf, other]

    hep-ph hep-ex

    Probing positivity at the LHC with exclusive photon-fusion processes

    Authors: Jiayin Gu, Chi Shu

    Abstract: By tagging one or two intact protons in the forward direction, it is possible to select and measure exclusive photon-fusion processes at the LHC. The same processes can also be measured in heavy ion collisions, and are often denoted as ultraperipheral collisions (UPC) processes. Such measurements open up the possibility of probing certain dimension-8 operators and their positivity bounds at the LH…

    Submitted 18 January, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 17 pages, 4 figures. v2: references added, minor corrections

  32. arXiv:2311.07596  [pdf, ps, other]

    cs.SI cs.LG eess.SP

    Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes

    Authors: Jinhao Gu, Ángel F. García-Fernández, Robert E. Firth, Lennart Svensson

    Abstract: This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes. The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric, which is a metric for sets, to graphs. The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatche…

    Submitted 27 August, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted in IEEE Transactions on Signal Processing. The code is available at https://github.com/JinhaoGu/The-graph-GOSPA-metric

  33. arXiv:2311.04400  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    LRM: Large Reconstruction Model for Single Image to 3D

    Authors: Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan

    Abstract: We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural rad…

    Submitted 9 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  34. arXiv:2311.01977  [pdf, other]

    cs.RO cs.AI

    RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches

    Authors: Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan Vuong, Ted Xiao

    Abstract: Generalization remains one of the most important desiderata for robust robot learning systems. While recently proposed approaches show promise in generalization to novel objects, semantic concepts, or visual distribution shifts, generalization to new tasks remains challenging. For example, a language-conditioned policy trained on pick-and-place tasks will not be able to generalize to a folding tas…

    Submitted 6 November, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: Evaluation videos can be found at https://rt-trajectory.github.io/

  35. arXiv:2311.01288  [pdf, other]

    cs.DC physics.plasm-ph

    Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting

    Authors: Junmin Gu, Paul Lin, Kesheng Wu, Seung-Hoe Ku, C. S. Chang, R. Michael Churchill, Jong Choi, Norbert Podhorszki, Scott Klasky

    Abstract: This work starts an in situ processing capability to study a certain diffusion process in magnetic confinement fusion. This diffusion process involves plasma particles that are likely to escape confinement. Such particles carry a significant amount of energy from the burning plasma inside the tokamak to the diverter and damaging the diverter plate. This study requires in situ processing because of…

    Submitted 2 November, 2023; originally announced November 2023.

  36. arXiv:2310.17626  [pdf, ps, other]

    cs.CV

    A Survey on Transferability of Adversarial Examples across Deep Neural Networks

    Authors: Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

    Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models in…

    Submitted 1 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR)

  37. arXiv:2310.16790  [pdf, other]

    cs.CL cs.AI cs.LG

    Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances

    Authors: Zhendong Chu, Ruiyi Zhang, Tong Yu, Rajiv Jain, Vlad I Morariu, Jiuxiang Gu, Ani Nenkova

    Abstract: To achieve state-of-the-art performance, one still needs to train NER models on large-scale, high-quality annotated data, an asset that is both costly and time-intensive to accumulate. In contrast, real-world applications often resort to massive low-quality labeled data through non-expert annotators via crowdsourcing and external knowledge bases via distant supervision as a cost-effective alternat…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: 14 pages

  38. arXiv:2310.16400  [pdf, other]

    cs.CV cs.AI

    Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

    Authors: Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu

    Abstract: Latent Diffusion Models (LDMs) are renowned for their powerful capabilities in image and video synthesis. Yet, video editing methods suffer from insufficient pre-training data or video-by-video re-training cost. In addressing this gap, we propose FLDM (Fused Latent Diffusion Model), a training-free framework to achieve text-guided video editing by applying off-the-shelf image editing methods in vi…

    Submitted 25 October, 2023; originally announced October 2023.

  39. arXiv:2310.16301  [pdf, other]

    cs.CL

    Is ChatGPT a Good Multi-Party Conversation Solver?

    Authors: Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted. In this paper, we delve into the potential of generative LLMs such as ChatGPT and…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by Findings of EMNLP 2023

  40. arXiv:2310.15444  [pdf, other]

    cs.CV

    Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks

    Authors: Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai, Xiaochun Cao

    Abstract: Adversarial training has shown promise in building robust models against adversarial examples. A major drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. To overcome this limitation, adversarial training based on single-step attacks has been explored. Previous work improves the single-step adversarial training from different perspec…

    Submitted 23 October, 2023; originally announced October 2023.

  41. arXiv:2310.15111  [pdf, other]

    cs.CV cs.LG

    Matryoshka Diffusion Models

    Authors: Jiatao Gu, Shuangfei Zhai, Yizhe Zhang, Josh Susskind, Navdeep Jaitly

    Abstract: Diffusion models are the de facto approach for generating high-quality images and videos, but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion M…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: 28 pages, 18 figures

  42. arXiv:2310.15069  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Second-order group knockoffs with applications to GWAS

    Authors: Benjamin B Chu, Jiaqi Gu, Zhaomeng Chen, Tim Morrison, Emmanuel Candes, Zihuai He, Chiara Sabatti

    Abstract: Conditional testing via the knockoff framework allows one to identify -- among large number of possible explanatory variables -- those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome wide association studies (GWAS), which have the goal of identifying…

    Submitted 3 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: 46 pages, 10 figures, 2 tables, 3 algorithms

  43. arXiv:2310.15052  [pdf, other]

    cs.CV

    DREAM+: Efficient Dataset Distillation by Bidirectional Representative Matching

    Authors: Yanqing Liu, Jianyang Gu, Kai Wang, Zheng Zhu, Kaipeng Zhang, Wei Jiang, Yang You

    Abstract: Dataset distillation plays a crucial role in creating compact datasets with similar training performance compared with original large-scale ones. This is essential for addressing the challenges of data storage and training costs. Prevalent methods facilitate knowledge transfer by matching the gradients, embedding distributions, or training trajectories of synthetic images with those of the sampled…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: This is an extension of the ICCV conference version

  44. CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

    Authors: Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyunk Baek, Byungseok Roh

    Abstract: A large-scale image-text pair dataset has greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data in chest X-ray by exp…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by MICCAI 2023

  45. arXiv:2310.13263  [pdf, other]

    cs.CV

    UE4-NeRF:Neural Radiance Field for Real-Time Rendering of Large-Scale Scene

    Authors: Jiaming Gu, Minchao Jiang, Hongsheng Li, Xiaoyuan Lu, Guangming Zhu, Syed Afaq Ali Shah, Liang Zhang, Mohammed Bennamoun

    Abstract: Neural Radiance Fields (NeRF) is a novel implicit 3D reconstruction method that shows immense potential and has been gaining increasing attention. It enables the reconstruction of 3D scenes solely from a set of photographs. However, its real-time rendering capability, especially for interactive real-time rendering of large-scale scenes, still has significant limitations. To address these challenge…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  46. arXiv:2310.11716  [pdf, other]

    cs.CL

    Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

    Authors: Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Heng Huang, Jiuxiang Gu, Tianyi Zhou

    Abstract: Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, the output control and alignment with the input of LLMs can be refined through instruction tuning. However, as highlighted in several studies, low-quality data in the training set are usually detrimental to instruction tuning, resulting in inconsistent or even m…

    Submitted 18 October, 2023; originally announced October 2023.

  47. arXiv:2310.10322  [pdf, other]

    cs.CL

    Untying the Reversal Curse via Bidirectional Language Model Editing

    Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu

    Abstract: Recent studies have demonstrated that large language models (LLMs) store massive factual knowledge within their parameters. But existing LLMs are prone to hallucinate unintended text due to false or outdated knowledge. Since retraining LLMs is resource intensive, there has been a growing interest in the concept of model editing. Despite the emergence of benchmarks and approaches, these unidirectio…

    Submitted 16 October, 2023; originally announced October 2023.

  48. arXiv:2310.10123  [pdf, other]

    cs.CV

    AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion

    Authors: Yitong Jiang, Zhaoyang Zhang, Tianfan Xue, Jinwei Gu

    Abstract: We present AutoDIR, an innovative all-in-one image restoration system incorporating latent diffusion. AutoDIR excels in its ability to automatically identify and restore images suffering from a range of unknown degradations. AutoDIR offers intuitive open-vocabulary image editing, empowering users to customize and enhance images according to their preferences. Specifically, AutoDIR consists of two…

    Submitted 28 May, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  49. arXiv:2310.09493  [pdf, other]

    stat.ME stat.AP

    Summary Statistics Knockoffs Inference with Family-wise Error Rate Control

    Authors: Catherine Xinrui Yu, Jiaqi Gu, Zhaomeng Chen, Zihuai He

    Abstract: Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to sele…

    Submitted 14 October, 2023; originally announced October 2023.

    Comments: 35 pages

  50. arXiv:2310.08866  [pdf, other]

    cs.LG cs.AI

    Adaptivity and Modularity for Efficient Generalization Over Task Complexity

    Authors: Samira Abnar, Omid Saremi, Laurent Dinh, Shantel Wilson, Miguel Angel Bautista, Chen Huang, Vimal Thilak, Etai Littwin, Jiatao Gu, Josh Susskind, Samy Bengio

    Abstract: Can transformers generalize efficiently on problems that require dealing with examples with different levels of difficulty? We introduce a new task tailored to assess generalization over different complexities and present results that indicate that standard transformers face challenges in solving these tasks. These tasks are variations of pointer value retrieval previously introduced by Zhang et a…

    Submitted 13 October, 2023; originally announced October 2023.