Showing 101–150 of 1,033 results for author: Gu, J

  1. arXiv:2404.05783  [pdf, other]

    cs.CY cs.AI cs.CL cs.CV

    Responsible Generative AI: What to Generate and What Not

    Authors: Jindong Gu

    Abstract: In recent years, generative AI (GenAI), like large language models and text-to-image models, has received significant attention across various domains. However, ensuring the responsible generation of content by these models is crucial for their real-world applicability. This raises an interesting question: What should responsible GenAI generate, and what should it not? To answer the quest…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 74 pages, 10 figures

  2. arXiv:2404.05717  [pdf, other]

    cs.CV cs.AI

    SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

    Authors: Jing Gu, Yilin Wang, Nanxuan Zhao, Wei Xiong, Qing Liu, Zhifei Zhang, He Zhang, Jianming Zhang, HyunJoon Jung, Xin Eric Wang

    Abstract: Effective editing of personal content holds a pivotal role in enabling individuals to express their creativity, weaving captivating narratives within their visual stories, and elevate the overall quality and impact of their visual content. Therefore, in this work, we introduce SwapAnything, a novel framework that can swap any objects in an image with personalized concepts given by the reference, w…

    Submitted 6 May, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 18 pages, 16 figures, 3 tables

  3. arXiv:2404.03422  [pdf, other]

    stat.ME

    Empirical Bayes for the Reluctant Frequentist

    Authors: Roger Koenker, Jiaying Gu

    Abstract: Empirical Bayes methods offer valuable tools for a large class of compound decision problems. In this tutorial we describe some basic principles of the empirical Bayes paradigm stressing their frequentist interpretation. Emphasis is placed on recent developments of nonparametric maximum likelihood methods for estimating mixture models. A more extensive introductory treatment will eventually be ava…

    Submitted 4 April, 2024; originally announced April 2024.

  4. arXiv:2404.03411  [pdf, ps, other]

    cs.LG cs.CL cs.CR

    Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks?

    Authors: Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu

    Abstract: Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs. Besides, some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates the performance reproductio…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: technical report

  5. arXiv:2404.03198  [pdf, other]

    stat.ME

    Delaunay Weighted Two-sample Test for High-dimensional Data by Incorporating Geometric Information

    Authors: Jiaqi Gu, Ruoxu Tan, Guosheng Yin

    Abstract: Two-sample hypothesis testing is a fundamental problem with various applications, which faces new challenges in the high-dimensional context. To mitigate the issue of the curse of dimensionality, high-dimensional data are typically assumed to lie on a low-dimensional manifold. To incorporate geometric information in the data, we propose to apply the Delaunay triangulation and develop the Delaunay w…

    Submitted 4 April, 2024; originally announced April 2024.

    MSC Class: 62G10; 62G20

  6. arXiv:2404.03109  [pdf, other]

    cs.CV

    Many-to-many Image Generation with Auto-regressive Diffusion Models

    Authors: Ying Shen, Yizhe Zhang, Shuangfei Zhai, Lifu Huang, Joshua M. Susskind, Jiatao Gu

    Abstract: Recent advancements in image generation have made significant progress, yet existing models present limitations in perceiving and generating an arbitrary number of interrelated images within a broad context. This limitation becomes increasingly critical as the demand for multi-image scenarios, such as multi-view images and visual narratives, grows with the expansion of multimedia platforms. This p…

    Submitted 3 April, 2024; originally announced April 2024.

  7. arXiv:2404.02697  [pdf, other]

    cs.CV

    Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution

    Authors: Fengyuan Liu, Haochen Luo, Yiming Li, Philip Torr, Jindong Gu

    Abstract: Recent progress in visual generative models enables the generation of high-quality images. To prevent the misuse of generated images, it is important to identify the origin model that generates them. In this work, we study the origin attribution of generated images in a practical setting where only a few images generated by a source model are available and the source model cannot be accessed. The…

    Submitted 18 July, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  8. arXiv:2403.20248  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Gate-tunable quantum acoustoelectric transport in graphene

    Authors: Yicheng Mou, Haonan Chen, Jiaqi Liu, Qing Lan, Jiayu Wang, Chuanxin Zhang, Yuxiang Wang, Jiaming Gu, Tuoyu Zhao, Xue Jiang, Wu Shi, Cheng Zhang

    Abstract: Transport probes the motion of quasiparticles in response to external excitations. Apart from the well-known electric and thermoelectric transport, acoustoelectric transport induced by traveling acoustic waves has been rarely explored. Here, by adopting a hybrid nanodevices integrated with piezoelectric substrates, we establish a simple design of acoustoelectric transport with gate tunability. We…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 16 pages, 5 figures

    Journal ref: Nano Letters 24(15), 4625-4632 (2024)

  9. arXiv:2403.19275  [pdf, other]

    cs.CL cs.AI

    Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent

    Authors: Junkai Zhou, Liang Pang, Ya Jing, Jia Gu, Huawei Shen, Xueqi Cheng

    Abstract: Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, there are still two key problems in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on current actions, which reduces the personalization and anthropomorphis…

    Submitted 2 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  10. arXiv:2403.16446  [pdf, other]

    cs.CL

    Towards Automatic Evaluation for LLMs' Clinical Capabilities: Metric, Data, and Algorithm

    Authors: Lei Liu, Xiaoyan Yang, Fangzhou Li, Chenfei Chi, Yue Shen, Shiwei Lyu, Ming Zhang, Xiaowei Ma, Xiangguo Lyu, Liya Ma, Zhiqiang Zhang, Wei Xue, Yiran Huang, Jinjie Gu

    Abstract: Large language models (LLMs) are gaining increasing interests to improve clinical efficiency for medical diagnosis, owing to their unprecedented performance in modelling natural language. Ensuring the safe and reliable clinical applications, the evaluation of LLMs indeed becomes critical for better mitigating the potential risks, e.g., hallucinations. However, current evaluation methods heavily re…

    Submitted 25 March, 2024; originally announced March 2024.

  11. arXiv:2403.16028  [pdf, other]

    cs.CV cs.LG

    Exploring the Impact of Dataset Bias on Dataset Distillation

    Authors: Yao Lu, Jianyang Gu, Xuguang Chen, Saeed Vahidian, Qi Xuan

    Abstract: Dataset Distillation (DD) is a promising technique to synthesize a smaller dataset that preserves essential information from the original dataset. This synthetic dataset can serve as a substitute for the original large-scale one, and help alleviate the training workload. However, current DD methods typically operate under the assumption that the dataset is unbiased, overlooking potential bias issu…

    Submitted 24 March, 2024; originally announced March 2024.

  12. arXiv:2403.14806  [pdf, other]

    cs.ET physics.app-ph physics.optics

    Photonic-Electronic Integrated Circuits for High-Performance Computing and AI Accelerators

    Authors: Shupeng Ning, Hanqing Zhu, Chenghao Feng, Jiaqi Gu, Zhixing Jiang, Zhoufeng Ying, Jason Midkiff, Sourabh Jain, May H. Hlaing, David Z. Pan, Ray T. Chen

    Abstract: In recent decades, the demand for computational power has surged, particularly with the rapid expansion of artificial intelligence (AI). As we navigate the post-Moore's law era, the limitations of traditional electrical digital computing, including process bottlenecks and power consumption issues, are propelling the search for alternative computing paradigms. Among various emerging technologies, i…

    Submitted 11 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.13448  [pdf, other]

    hep-ex astro-ph.CO

    Improved modelling for dark photon detection with dish antennas

    Authors: Jordan Gué, Aurélien Hees, Peter Wolf, Etienne Savalle, Laurent Chevalier, Pierre Brun

    Abstract: A vector dark matter candidate, also known as dark photon, would induce an oscillating electric field through kinetic mixing. One detection strategy uses a spherical reflector to focus the induced emission at its center of curvature. On one hand, we investigate the effects of diffraction in this type of experiment from an analytical standpoint, making use of the Kirchhoff integral theorem in the l…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 12+3 pages, 6 figures

  14. arXiv:2403.12800  [pdf, other]

    cs.CV

    Learning Neural Volumetric Pose Features for Camera Localization

    Authors: Jingyu Lin, Jiaqi Gu, Bojian Wu, Lubin Fan, Renjie Chen, Ligang Liu, Jieping Ye

    Abstract: We introduce a novel neural volumetric pose feature, termed PoseMap, designed to enhance camera localization by encapsulating the information between images and the associated camera poses. Our framework leverages an Absolute Pose Regression (APR) architecture, together with an augmented NeRF module. This integration not only facilitates the generation of novel views to enrich the training dataset…

    Submitted 11 July, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted at ECCV 2024. Project page: https://gujiaqivadin.github.io/posemap/

  15. arXiv:2403.12693  [pdf, other]

    cs.CV

    As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?

    Authors: Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr

    Abstract: Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems. While pre-training offers clear advantages for downstream learning, it also endows downstream models with shared adversarial vulnerabilities that can be easily identified through the open-sourced foundation model. In this work, we expose such vulnerab…

    Submitted 19 March, 2024; originally announced March 2024.

  16. arXiv:2403.12386  [pdf]

    cs.CL cs.AI

    Pipelined Biomedical Event Extraction Rivaling Joint Learning

    Authors: Pengchao Wu, Xuefeng Li, Jinghang Gu, Longhua Qian, Guodong Zhou

    Abstract: Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules o…

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.12032  [pdf, other]

    cs.CV cs.GR

    Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

    Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

    Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denois…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

  18. arXiv:2403.10146  [pdf, other]

    cs.SD cs.IR eess.AS

    Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

    Authors: Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

    Abstract: Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention. Existing methods typically aggregate information from each modality into a single vector for matching, but this sacrifices local details and can hardly capture intricate relationships within and between modalities. Furthermore, current ATR d…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 5 pages, accepted to ICASSP2024

  19. arXiv:2403.09766  [pdf, other]

    cs.CV

    An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models

    Authors: Haochen Luo, Jindong Gu, Fengyuan Liu, Philip Torr

    Abstract: Different from traditional task-specific vision models, recent large VLMs can readily adapt to different vision tasks by simply using different textual instructions, i.e., prompts. However, a well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations. Furthermore, the concern is exacerbated by the phenomenon that the same…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024

  20. arXiv:2403.08253  [pdf, ps, other]

    math.NA

    Explicit radial basis function Runge-Kutta methods

    Authors: Jiaxi Gu, Xinjuan Chen, Jae-Hun Jung

    Abstract: The aim of this paper is to design the explicit radial basis function (RBF) Runge-Kutta methods for the initial value problem. We construct the two-, three- and four-stage RBF Runge-Kutta methods based on the Gaussian RBF Euler method with the shape parameter, where the analysis of the local truncation error shows that the s-stage RBF Runge-Kutta method could formally achieve order s+1. The proof…

    Submitted 13 March, 2024; originally announced March 2024.

  21. arXiv:2403.06779  [pdf, other]

    q-fin.ST

    From Factor Models to Deep Learning: Machine Learning in Reshaping Empirical Asset Pricing

    Authors: Junyi Ye, Bhaskar Goswami, Jingyi Gu, Ajim Uddin, Guiling Wang

    Abstract: This paper comprehensively reviews the application of machine learning (ML) and AI in finance, specifically in the context of asset pricing. It starts by summarizing the traditional asset pricing models and examining their limitations in capturing the complexities of financial markets. It explores how 1) ML models, including supervised, unsupervised, semi-supervised, and reinforcement learning, pr…

    Submitted 11 March, 2024; originally announced March 2024.

  22. arXiv:2403.06485  [pdf, other]

    cs.SE cs.CL cs.LG

    Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach

    Authors: Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan Yu, Rui Tan, Zengyin Yang, Michael R. Lyu

    Abstract: Due to the scale and complexity of cloud systems, a system failure would trigger an "alert storm", i.e., massive correlated alerts. Although these alerts can be traced back to a few root causes, the overwhelming number makes it infeasible for manual handling. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typic…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  23. arXiv:2403.06259  [pdf, other]

    cs.CL cs.AI cs.DB cs.IR cs.LG

    Editing Conceptual Knowledge for Large Language Models

    Authors: Xiaohan Wang, Shengyu Mao, Ningyu Zhang, Shumin Deng, Yunzhi Yao, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

    Abstract: Recently, there has been a growing interest in knowledge editing for Large Language Models (LLMs). Current approaches and evaluations merely explore the instance-level editing, while whether LLMs possess the capability to modify concepts remains unclear. This paper pioneers the investigation of editing conceptual knowledge for LLMs, by constructing a novel benchmark dataset ConceptEdit and establi…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Work in progress. Code: https://github.com/zjunlp/EasyEdit Dataset: https://huggingface.co/datasets/zjunlp/ConceptEdit

  24. arXiv:2403.05247  [pdf, other]

    cs.CV eess.IV

    Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds

    Authors: Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao

    Abstract: Adversarial attack methods based on point manipulation for 3D point cloud classification have revealed the fragility of 3D models, yet the adversarial examples they produce are easily perceived or defended against. The trade-off between the imperceptibility and adversarial strength leads most point attack methods to inevitably introduce easily detectable outlier points upon a successful attack. An…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  25. arXiv:2403.04732  [pdf, other]

    cs.AI cs.CL cs.CV

    How Far Are We from Intelligent Visual Deductive Reasoning?

    Authors: Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly

    Abstract: Vision-Language Models (VLMs) such as GPT-4V have recently demonstrated incredible strides on diverse vision language tasks. We dig into vision-based deductive reasoning, a more sophisticated but less explored realm, and find previously unexposed blindspots in the current SOTA VLMs. Specifically, we leverage Raven's Progressive Matrices (RPMs), to assess VLMs' abilities to perform multi-hop relati…

    Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: ICLR 2024 AGI workshop. https://github.com/apple/ml-rpm-bench

  26. arXiv:2403.04528  [pdf, other]

    hep-th math-ph

    Resurgent Wilson loops in refined topological string

    Authors: Jie Gu, Gengbei Guo

    Abstract: We study the resurgent structures of Wilson loops in refined topological string theory. We argue that the Borel singularities should be integral periods, and that the associated Stokes constants are refined Donaldson-Thomas invariants, just like the free energies, except that the Borel singularities cannot be local flat coordinates. We also solve the non-perturbative series in closed form from the…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 39 pages, 26 figures

  27. arXiv:2403.04010  [pdf, other]

    cs.LG

    Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks

    Authors: Jing Gu, Dongmian Zou

    Abstract: Graph anomaly detection plays a vital role for identifying abnormal instances in complex networks. Despite advancements of methodology based on deep learning in recent years, existing benchmarking approaches exhibit limitations that hinder a comprehensive comparison. In this paper, we revisit datasets and approaches for unsupervised node-level graph anomaly detection tasks from three aspects. Firs…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Presented at the Second Learning on Graphs Conference (LoG 2023)

  28. arXiv:2403.03631  [pdf, other]

    cs.LG eess.SY

    Tackling Missing Values in Probabilistic Wind Power Forecasting: A Generative Approach

    Authors: Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin

    Abstract: Machine learning techniques have been successfully used in probabilistic wind power forecasting. However, the issue of missing values within datasets due to sensor failure, for instance, has been overlooked for a long time. Although it is natural to consider addressing this issue by imputing missing values before model estimation and forecasting, we suggest treating missing values and forecasting…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 8 pages, to be presented at Power Systems Computation Conference (PSCC) 2024

  29. arXiv:2403.03447  [pdf, other]

    cs.CV

    HDRFlow: Real-Time HDR Video Reconstruction with Large Motions

    Authors: Gangwei Xu, Yujin Wang, Jinwei Gu, Tianfan Xue, Xin Yang

    Abstract: Reconstructing High Dynamic Range (HDR) video from image sequences captured with alternating exposures is challenging, especially in the presence of large camera or object motion. Existing methods typically align low dynamic range sequences using optical flow or attention mechanism for deghosting. However, they often struggle to handle large complex motions and are computationally expensive. To ad…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project website: https://openimaginglab.github.io/HDRFlow/

  30. arXiv:2403.03101  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG cs.MA

    KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents

    Authors: Yuqi Zhu, Shuofei Qiao, Yixin Ou, Shumin Deng, Ningyu Zhang, Shiwei Lyu, Yue Shen, Lei Liang, Jinjie Gu, Huajun Chen

    Abstract: Large Language Models (LLMs) have demonstrated great potential in complex reasoning tasks, yet they fall short when tackling more sophisticated challenges, especially when interacting with environments through generating executable actions. This inadequacy primarily stems from the lack of built-in action knowledge in language agents, which fails to effectively guide the planning trajectories durin…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Work in progress. Project page: https://zjunlp.github.io/project/KnowAgent/ Code: https://github.com/zjunlp/KnowAgent

  31. arXiv:2403.02709  [pdf, other]

    cs.RO

    RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

    Authors: Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

    Abstract: Natural language and images are commonly used as goal representations in goal-conditioned imitation learning (IL). However, natural language can be ambiguous and images can be over-specified. In this work, we propose hand-drawn sketches as a modality for goal specification in visual imitation learning. Sketches are easy for users to provide on the fly like language, but similar to images they can…

    Submitted 5 March, 2024; originally announced March 2024.

  32. arXiv:2403.02688  [pdf, other]

    cs.ET cs.AI cs.LG

    DOCTOR: Dynamic On-Chip Temporal Variation Remediation Toward Self-Corrected Photonic Tensor Accelerators

    Authors: Haotian Lu, Sanmitra Banerjee, Jiaqi Gu

    Abstract: Photonic computing has emerged as a promising solution for accelerating computation-intensive artificial intelligence (AI) workloads, offering unparalleled speed and energy efficiency, especially in resource-limited, latency-sensitive edge computing environments. However, the deployment of analog photonic tensor accelerators encounters reliability challenges due to hardware noise and environmental…

    Submitted 31 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 9 pages. Accepted to IEEE JLT 2024

  33. arXiv:2403.02601  [pdf, other]

    eess.IV cs.CV

    Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

    Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

    Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (L…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  34. arXiv:2402.19150  [pdf, other]

    cs.CV

    Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model

    Authors: Hao Cheng, Erjia Xiao, Jindong Gu, Le Yang, Jinhao Duan, Jize Zhang, Jiahang Cao, Kaidi Xu, Renjing Xu

    Abstract: Large Vision-Language Models (LVLMs) rely on vision encoders and Large Language Models (LLMs) to exhibit remarkable capabilities on various multi-modal tasks in the joint space of vision and language. However, the Typographic Attack, which disrupts vision-language models (VLMs) such as Contrastive Language-Image Pretraining (CLIP), has also been expected to be a security threat to LVLMs. Firstly,…

    Submitted 21 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  35. arXiv:2402.18892  [pdf, other]

    cs.CV cs.RO

    Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

    Authors: Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li

    Abstract: Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph…

    Submitted 25 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA 2024

  36. arXiv:2402.17583  [pdf, other]

    cs.SE cs.CL cs.LG

    FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems

    Authors: Junjie Huang, Jinyang Liu, Zhuangbin Chen, Zhihan Jiang, Yichen LI, Jiazhen Gu, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

    Abstract: Postmortem analysis is essential in the management of incidents within cloud systems, which provides valuable insights to improve system's reliability and robustness. At CloudA, fault pattern profiling is performed during the postmortem phase, which involves the classification of incidents' faults into unique categories, referred to as fault pattern. By aggregating and analyzing these fault patter…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  37. arXiv:2402.16810  [pdf]

    cs.CL

    OncoGPT: A Medical Conversational Model Tailored with Oncology Domain Expertise on a Large Language Model Meta-AI (LLaMA)

    Authors: Fujian Jia, Xin Liu, Lixi Deng, Jiwen Gu, Chunchao Pu, Tunan Bai, Mengjiang Huang, Yuanzhi Lu, Kang Liu

    Abstract: In the past year, there has been a growing trend in applying Large Language Models (LLMs) to the field of medicine, particularly with the advent of advanced language models such as ChatGPT developed by OpenAI. However, there is limited research on LLMs specifically addressing oncology-related queries. The primary aim of this research was to develop a specialized language model that demonstrates im…

    Submitted 26 February, 2024; originally announced February 2024.

  38. arXiv:2402.15000  [pdf, other]

    cs.CL cs.LG

    Divide-or-Conquer? Which Part Should You Distill Your LLM?

    Authors: Zhuofeng Wu, He Bai, Aonan Zhang, Jiatao Gu, VG Vinod Vydiswaran, Navdeep Jaitly, Yizhe Zhang

    Abstract: Recent methods have demonstrated that Large Language Models (LLMs) can solve reasoning tasks better when they are encouraged to solve subtasks of the main task first. In this paper we devise a similar strategy that breaks down reasoning tasks into a problem decomposition phase and a problem solving phase and show that the strategy is able to outperform a single stage solution. Further, we hypothes…

    Submitted 22 February, 2024; originally announced February 2024.

  39. arXiv:2402.14899  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    Stop Reasoning! When Multimodal LLMs with Chain-of-Thought Reasoning Meets Adversarial Images

    Authors: Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu

    Abstract: Recently, Multimodal LLMs (MLLMs) have shown a great ability to understand images. However, like traditional vision models, they are still vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs, which not only improves model's performance, but also enhances model's explainability by giving intermediate reasoning steps. Nevertheless, there is…

    Submitted 18 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  40. arXiv:2402.14840  [pdf, other]

    cs.CL cs.AI stat.AP

    RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

    Authors: Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

    Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduced RJUA-Me…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages, 13 figures

  41. arXiv:2402.12958  [pdf, other]

    cs.SE

    Go Static: Contextualized Logging Statement Generation

    Authors: Yichen Li, Yintong Huo, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Pinjia He, Michael R. Lyu

    Abstract: Logging practices have been extensively investigated to assist developers in writing appropriate logging statements for documenting software behaviors. Although numerous automatic logging approaches have been proposed, their performance remains unsatisfactory due to the constraint of the single-method input, without informative programming context outside the method. Specifically, we identify thre…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: This paper was accepted by The ACM International Conference on the Foundations of Software Engineering (FSE 2024)

  42. arXiv:2402.12724  [pdf, other]

    stat.ME q-bio.GN stat.AP

    Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression

    Authors: Zhaomeng Chen, Zihuai He, Benjamin B. Chu, Jiaqi Gu, Tim Morrison, Chiara Sabatti, Emmanuel Candès

    Abstract: Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to a…

    Submitted 20 February, 2024; originally announced February 2024.

  43. arXiv:2402.12289  [pdf, other]

    cs.CV

    DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

    Authors: Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Zhiyong Zhao, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

    Abstract: A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of reasoning modules for scen…

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Project Page: https://tsinghua-mars-lab.github.io/DriveVLM/

  44. arXiv:2402.11957  [pdf, other]

    cs.CV

    Event-Based Motion Magnification

    Authors: Yutian Chen, Shi Guo, Fangzheng Yu, Feng Zhang, Jinwei Gu, Tianfan Xue

    Abstract: Detecting and magnifying imperceptible high-frequency motions in real-world scenarios has substantial implications for industrial and medical applications. These motions are characterized by small amplitudes and high frequencies. Traditional motion magnification methods rely on costly high-speed cameras or active light sources, which limit the scope of their applications. In this work, we propose…

    Submitted 23 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Accepted to ECCV 2024

  45. arXiv:2402.10760  [pdf, other]

    q-fin.ST cs.LG

    RAGIC: Risk-Aware Generative Adversarial Model for Stock Interval Construction

    Authors: Jingyi Gu, Wenlu Du, Guiling Wang

    Abstract: Efforts to predict stock market outcomes have yielded limited success due to the inherently stochastic nature of the market, influenced by numerous unpredictable factors. Many existing prediction approaches focus on single-point predictions, lacking the depth needed for effective decision-making and often overlooking market risk. To bridge this gap, we propose a novel model, RAGIC, which introduce…

    Submitted 16 February, 2024; originally announced February 2024.

  46. arXiv:2402.10334  [pdf, other]

    cs.CV cs.AI cs.LG

    HI-GAN: Hierarchical Inpainting GAN with Auxiliary Inputs for Combined RGB and Depth Inpainting

    Authors: Ankan Dash, Jingyi Gu, Guiling Wang

    Abstract: Inpainting involves filling in missing pixels or areas in an image, a crucial technique employed in Mixed Reality environments for various applications, particularly in Diminished Reality (DR) where content is removed from a user's visual environment. Existing methods rely on digital replacement techniques which necessitate multiple cameras and incur high costs. AR devices and smartphones use ToF…

    Submitted 15 February, 2024; originally announced February 2024.

  47. arXiv:2402.10110  [pdf, other]

    cs.CL cs.AI cs.LG

    Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

    Authors: Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou

    Abstract: Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a no…

    Submitted 7 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: ACL 2024 (Findings), camera-ready

  48. arXiv:2402.09469  [pdf, other]

    cs.LG stat.ML

    Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

    Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Tianyi Zhou

    Abstract: In the evolving landscape of machine learning, a pivotal challenge lies in deciphering the internal representations harnessed by neural networks and Transformers. Building on recent progress toward comprehending how networks execute distinct target functions, our study embarks on an exploration of the underlying reasons behind networks adopting specific computational strategies. We direct our focu…

    Submitted 24 May, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Update Section 5.3; clean up problem setup

  49. arXiv:2402.07393  [pdf, other]

    cs.ET cs.AI cs.LG

    TeMPO: Efficient Time-Multiplexed Dynamic Photonic Tensor Core for Edge AI with Compact Slow-Light Electro-Optic Modulator

    Authors: Meng Zhang, Dennis Yin, Nicholas Gangi, Amir Begović, Alexander Chen, Zhaoran Rena Huang, Jiaqi Gu

    Abstract: Electronic-photonic computing systems offer immense potential in energy-efficient artificial intelligence (AI) acceleration tasks due to the superior computing speed and efficiency of optics, especially for real-time, low-energy deep neural network (DNN) inference tasks on resource-restricted edge platforms. However, current optical neural accelerators based on foundry-available devices and conven…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 17 pages, 19 figures

  50. arXiv:2402.07295  [pdf, other]

    cs.LG cs.AI cs.DC

    Training Heterogeneous Client Models using Knowledge Distillation in Serverless Federated Learning

    Authors: Mohak Chadha, Pulkit Khera, Jianfeng Gu, Osama Abboud, Michael Gerndt

    Abstract: Federated Learning (FL) is an emerging machine learning paradigm that enables the collaborative training of a shared global model across distributed clients while keeping the data decentralized. Recent works on designing systems for efficient FL have shown that utilizing serverless computing technologies, particularly Function-as-a-Service (FaaS) for FL, can enhance resource efficiency, reduce tra…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: ACM/SIGAPP Symposium on Applied Computing 2024