Showing 1–50 of 508 results for author: Zeng, Y

Searching in archive cs.
  1. arXiv:2408.16231  [pdf]

    physics.optics cs.AI physics.app-ph

    Anchor-Controlled Generative Adversarial Network for High-Fidelity Electromagnetic and Structurally Diverse Metasurface Design

    Authors: Yunhui Zeng, Hongkun Cao, Xin Jin

    Abstract: In optoelectronics, designing free-form metasurfaces presents significant challenges, particularly in achieving high electromagnetic response fidelity due to the complex relationship between physical structures and electromagnetic behaviors. A key difficulty arises from the one-to-many mapping dilemma, where multiple distinct physical structures can yield similar electromagnetic responses, complic…

    Submitted 28 August, 2024; originally announced August 2024.

  2. arXiv:2408.16029  [pdf, other]

    cs.LG cs.AI

    Meta-Learn Unimodal Signals with Weak Supervision for Multimodal Sentiment Analysis

    Authors: Sijie Mai, Yu Zhao, Ying Zeng, Jianhua Yao, Haifeng Hu

    Abstract: Multimodal sentiment analysis aims to effectively integrate information from various sources to infer sentiment, where in many cases there are no annotations for unimodal labels. Therefore, most works rely on multimodal labels for training. However, there exists the noisy label problem for the learning of unimodal signals as multimodal annotations are not always the ideal substitutes for the unimo…

    Submitted 27 August, 2024; originally announced August 2024.

  3. arXiv:2408.15578  [pdf, other]

    cs.AR

    FireFly-S: Exploiting Dual-Side Sparsity for Spiking Neural Networks Acceleration with Reconfigurable Spatial Architecture

    Authors: Tenglong Li, Jindong Li, Guobin Shen, Dongcheng Zhao, Qian Zhang, Yi Zeng

    Abstract: Spiking Neural Networks (SNNs), with their brain-inspired structure using discrete spikes instead of continuous activations, are gaining attention for their potential of efficient processing on neuromorphic chips. While current SNN hardware accelerators often prioritize temporal spike sparsity, exploiting sparse synaptic weights offers significant untapped potential for even greater efficiency. To…

    Submitted 28 August, 2024; originally announced August 2024.

  4. arXiv:2408.11587  [pdf, other]

    cs.CL cs.CR

    Large Language Models are Good Attackers: Efficient and Stealthy Textual Backdoor Attacks

    Authors: Ziqiang Li, Yueqi Zeng, Pengfei Xia, Lei Liu, Zhangjie Fu, Bin Li

    Abstract: With the burgeoning advancements in the field of natural language processing (NLP), the demand for training data has increased significantly. To save costs, it has become common for users and businesses to outsource the labor-intensive task of data collection to third-party entities. Unfortunately, recent research has unveiled the inherent risk associated with this practice, particularly in exposi…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  5. arXiv:2408.10841  [pdf, other]

    cs.AI cs.CL

    DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models

    Authors: Yuanhao Zeng, Fei Ren, Xinpeng Zhou, Yihang Wang, Yingxia Shao

    Abstract: Although instruction tuning is widely used to adjust behavior in Large Language Models (LLMs), extensive empirical evidence and research indicates that it is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities. We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  6. arXiv:2408.08588  [pdf, other]

    cs.IT eess.SP

    Movable Antenna for Wireless Communications: Prototyping and Experimental Results

    Authors: Zhenjun Dong, Zhiwen Zhou, Zhiqiang Xiao, Chaoyue Zhang, Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA), which can flexibly change the position of antenna in three-dimensional (3D) continuous space, is an emerging technology for achieving full spatial performance gains. In this paper, a prototype of MA communication system with ultra-accurate movement control is presented to verify the performance gain of MA in practical environments. The prototype utilizes the feedback control…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.07694  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    End-to-end Semantic-centric Video-based Multimodal Affective Computing

    Authors: Ronghao Lin, Ying Zeng, Sijie Mai, Haifeng Hu

    Abstract: In the pathway toward Artificial General Intelligence (AGI), understanding human's affection is essential to enhance machine's cognition abilities. For achieving more sensual human-AI interaction, Multimodal Affective Computing (MAC) in human-spoken videos has attracted increasing attention. However, previous methods are mainly devoted to designing multimodal fusion algorithms, suffering from two…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Under Review

  8. arXiv:2408.06186  [pdf, other]

    cs.CL cs.LG

    Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting

    Authors: Halley Young, Yimeng Zeng, Jacob Gardner, Osbert Bastani

    Abstract: The capability to generate diverse text is a key challenge facing large language models (LLMs). Thus far, diversity has been studied via metrics such as $n$-gram diversity or diversity of BERT embeddings. However, for these kinds of diversity, the user has little control over the dimensions along which diversity is considered. For example, in the poetry domain, one might desire diversity in terms…

    Submitted 12 August, 2024; originally announced August 2024.

  9. arXiv:2408.02006  [pdf, other]

    cs.CL

    LLaSA: Large Language and E-Commerce Shopping Assistant

    Authors: Shuo Zhang, Boci Peng, Xinping Zhao, Boren Hu, Yun Zhu, Yanjia Zeng, Xuming Hu

    Abstract: The e-commerce platform has evolved rapidly due to its widespread popularity and convenience. Developing an e-commerce shopping assistant for customers is crucial to aiding them in quickly finding desired products and recommending precisely what they need. However, most previous shopping assistants face two main problems: (1) task-specificity, which necessitates the development of different models…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by KDD 2024 Workshop (Oral)

  10. arXiv:2408.01952  [pdf, other]

    cs.CV

    CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization

    Authors: Xiang He, Xiangxi Liu, Yang Li, Dongcheng Zhao, Guobin Shen, Qingqun Kong, Xin Yang, Yi Zeng

    Abstract: The audio-visual event localization task requires identifying concurrent visual and auditory events from unconstrained videos within a network model, locating them, and classifying their category. The efficient extraction and integration of audio and visual modal information have always been challenging in this field. In this paper, we introduce CACE-Net, which differs from most existing methods t…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024. Code is available at https://github.com/Brain-Cog-Lab/CACE-Net

  11. arXiv:2408.00906  [pdf, other]

    cs.LG cs.AI

    Parkinson's Disease Detection from Resting State EEG using Multi-Head Graph Structure Learning with Gradient Weighted Graph Attention Explanations

    Authors: Christopher Neves, Yong Zeng, Yiming Xiao

    Abstract: Parkinson's disease (PD) is a debilitating neurodegenerative disease that has severe impacts on an individual's quality of life. Compared with structural and functional MRI-based biomarkers for the disease, electroencephalography (EEG) can provide more accessible alternatives for clinical insights. While deep learning (DL) techniques have provided excellent outcomes, many techniques fail to model…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted at MLCN 2024

  12. arXiv:2408.00799  [pdf, other]

    cs.IR cs.LG stat.ML

    Deep Uncertainty-Based Explore for Index Construction and Retrieval in Recommendation System

    Authors: Xin Jiang, Kaiqiang Wang, Yinlong Wang, Fengchang Lv, Taiyang Peng, Shuai Yang, Xianteng Wu, Pengye Zhang, Shuo Yuan, Yifan Zeng

    Abstract: In recommendation systems, the relevance and novelty of the final results are selected through a cascade system of Matching -> Ranking -> Strategy. The matching model serves as the starting point of the pipeline and determines the upper bound of the subsequent stages. Balancing the relevance and novelty of matching results is a crucial step in the design and optimization of recommendation systems,…

    Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced August 2024.

    Comments: Accepted by CIKM 2024

  13. arXiv:2407.21413  [pdf, ps, other]

    cs.GT

    Games in Public Announcement: How to Reduce System Losses in Optimistic Blockchain Mechanisms

    Authors: Siyuan Liu, Yulong Zeng

    Abstract: Announcement games, where information is disseminated by announcers and challenged by validators, are prevalent in real-world scenarios. Validators take effort to verify the validity of the announcements, gaining rewards for successfully challenging invalid ones, while receiving nothing for valid ones. Optimistic Rollup, a Layer 2 blockchain scaling solution, exemplifies such games, offering signi…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 31 pages

  14. arXiv:2407.18498  [pdf, other]

    cs.CL cs.AI cs.LO

    A Reliable Common-Sense Reasoning Socialbot Built Using LLMs and Goal-Directed ASP

    Authors: Yankai Zeng, Abhiramon Rajashekharan, Kinjal Basu, Huaduo Wang, Joaquín Arias, Gopal Gupta

    Abstract: The development of large language models (LLMs), such as GPT, has enabled the construction of several socialbots, like ChatGPT, that are receiving a lot of attention for their ability to simulate a human conversation. However, the conversation is not guided by a goal and is hard to control. In addition, because LLMs rely more on pattern recognition than deductive reasoning, they can give confusing…

    Submitted 26 July, 2024; originally announced July 2024.

  15. arXiv:2407.17438  [pdf, other]

    cs.CV cs.AI cs.LG

    HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

    Authors: Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin

    Abstract: Human image animation involves generating videos from a character photo, allowing user control and unlocking potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance o…

    Submitted 28 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: camera controllable human image animation, a dataset and a baseline

  16. arXiv:2407.17436  [pdf, other]

    cs.CY cs.AI

    AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

    Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in…

    Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  17. arXiv:2407.17039  [pdf, other]

    cs.IT eess.SP

    Integrated Sensing and Communication with Nested Array: Beam Pattern and Performance Analysis

    Authors: Hongqi Min, Chao Feng, Ruoguang Li, Yong Zeng

    Abstract: Towards the upcoming 6G wireless networks, integrated sensing and communication (ISAC) has been identified as one of the typical usage scenarios. To further enhance the performance of ISAC, increasing the number of antennas as well as array aperture is one of the effective approaches. However, simply increasing the number of antennas will increase the cost of radio frequency chains and power consu…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures

  18. arXiv:2407.13306  [pdf, ps, other]

    cs.IT eess.SP

    Group Movable Antenna With Flexible Sparsity: Joint Array Position and Sparsity Optimization

    Authors: Haiquan Lu, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA) is a promising technology to exploit the spatial variation of wireless channel for performance enhancement, by dynamically varying the antenna position within a certain region. However, for multi-antenna communication systems, moving each antenna independently not only requires prohibitive complexity to find the optimal antenna positions, but also incurs sophisticated movement…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

  19. arXiv:2407.12291  [pdf, other]

    cs.CV

    JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation

    Authors: Chenhan Jiang, Yihan Zeng, Tianyang Hu, Songcun Xu, Wei Zhang, Hang Xu, Dit-Yan Yeung

    Abstract: Score Distillation Sampling (SDS) by well-trained 2D diffusion models has shown great promise in text-to-3D generation. However, this paradigm distills view-agnostic 2D image distributions into the rendering distribution of 3D representation for each view independently, overlooking the coherence across views and yielding 3D inconsistency in generations. In this work, we propose \textbf{J}oint \tex…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 29 pages, ECCV2024

  20. arXiv:2407.08701  [pdf, other]

    cs.CV

    Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

    Authors: Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen

    Abstract: Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio, thanks to their temporally uni-directional attention mechanism, which models correlations between the current token and previous tokens. However, video streaming remains much less explored, despite a growing need for live video processing. State-of-the-art video diffusion models leverage bi-di…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: https://live2diff.github.io/

  21. arXiv:2407.07933  [pdf, other]

    stat.ME cs.LG stat.ML

    Identification and Estimation of the Bi-Directional MR with Some Invalid Instruments

    Authors: Feng Xie, Zhen Yao, Lin Xie, Yan Zeng, Zhi Geng

    Abstract: We consider the challenging problem of estimating causal effects from purely observational data in the bi-directional Mendelian randomization (MR), where some invalid instruments, as well as unmeasured confounding, usually exist. To address this problem, most existing methods attempt to find proper valid instrumental variables (IVs) for the target causal effect by expert knowledge or by assuming t…

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 27 pages, 6 tables, 7 figures

  22. arXiv:2407.06187  [pdf, other]

    cs.CV cs.GR

    JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

    Authors: Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming.…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR 24

  23. arXiv:2407.05355  [pdf, other]

    cs.CV cs.CL

    VideoCoT: A Video Chain-of-Thought Dataset with Active Annotation Tool

    Authors: Yan Wang, Yawen Zeng, Jingsheng Zheng, Xiaofen Xing, Jin Xu, Xiangmin Xu

    Abstract: Multimodal large language models (MLLMs) are flourishing, but mainly focus on images with less attention than videos, especially in sub-fields such as prompt engineering, video chain-of-thought (CoT), and instruction tuning on videos. Therefore, we try to explore the collection of CoT datasets in videos to lead to video OpenQA and improve the reasoning ability of MLLMs. Unfortunately, making such…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ACL 2024 Workshop

  24. arXiv:2407.05312  [pdf, other]

    cs.CV

    An Improved Method for Personalizing Diffusion Models

    Authors: Yan Zeng, Masanori Suganuma, Takayuki Okatani

    Abstract: Diffusion models have demonstrated impressive image generation capabilities. Personalized approaches, such as textual inversion and Dreambooth, enhance model individualization using specific images. These methods enable generating images of specific objects based on diverse textual contexts. Our proposed approach aims to retain the model's original knowledge during new information integration, res…

    Submitted 7 July, 2024; originally announced July 2024.

  25. arXiv:2407.04404  [pdf]

    cs.AR

    Fixed and Movable Antenna Technology for 6G Integrated Sensing and Communication

    Authors: Yong Zeng, Zhenjun Dong, Huizhi Wang, Lipeng Zhu, Ziyao Hong, Qingji Jiang, Dongming Wang, Shi Jin, Rui Zhang

    Abstract: By deploying antenna arrays at the transmitter/receiver to provide additional spatial-domain degrees of freedom (DoFs), multi-antenna technology greatly improves the reliability and efficiency of wireless communication. Meanwhile, the application of multi-antenna technology in the radar field has achieved spatial angle resolution and improved sensing DoF, thus significantly enhancing wireless sens…

    Submitted 16 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: in Chinese language

  26. arXiv:2407.03245  [pdf, other]

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real learning from visual demonstration system for the robots to learn to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie's meshes from the demonstration video. With these estimated meshes…

    Submitted 3 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: fix few typos

  27. arXiv:2407.02648  [pdf, other]

    cs.RO

    STRIDE: An Open-Source, Low-Cost, and Versatile Bipedal Robot Platform for Research and Education

    Authors: Yuhao Huang, Yicheng Zeng, Xiaobin Xiong

    Abstract: In this paper, we present STRIDE, a Simple, Terrestrial, Reconfigurable, Intelligent, Dynamic, and Educational bipedal platform. STRIDE aims to propel bipedal robotics research and education by providing a cost-effective implementation with step-by-step instructions for building a bipedal robotic platform while providing flexible customizations via a modular and durable design. Moreover, a versati…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 8 pages, 8 figures

  28. arXiv:2407.01577  [pdf, other]

    q-fin.TR cs.AI cs.LG

    MOT: A Mixture of Actors Reinforcement Learning Method by Optimal Transport for Algorithmic Trading

    Authors: Xi Cheng, Jinghao Zhang, Yunan Zeng, Wenfang Xue

    Abstract: Algorithmic trading refers to executing buy and sell orders for specific assets based on automatically identified trading opportunities. Strategies based on reinforcement learning (RL) have demonstrated remarkable capabilities in addressing algorithmic trading problems. However, the trading patterns differ among market conditions due to shifted distribution data. Ignoring multiple patterns in the…

    Submitted 2 June, 2024; originally announced July 2024.

    Comments: 13 pages, 5 figures, PAKDD2024 accepted

  29. arXiv:2407.01494  [pdf, other]

    cs.CV cs.SD eess.AS

    FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

    Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

    Abstract: We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantic relevant and temporal synchronized) sounds. To overcome these limitations…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://foleycrafter.github.io/

  30. arXiv:2407.01414  [pdf, other]

    cs.CV

    StyleShot: A Snapshot on Any Style

    Authors: Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

    Abstract: In this paper, we show that, a good style representation is crucial and sufficient for generalized style transfer without test-time tuning. We achieve this through constructing a style-aware encoder and a well-organized style dataset called StyleGallery. With dedicated design for style learning, this style-aware encoder is trained to extract expressive style representation with decoupling training…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://styleshot.github.io/

  31. arXiv:2406.20085  [pdf, other]

    cs.CV

    Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

    Authors: Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

    Abstract: Diffusion-based models have shown great potential in generating high-quality images with various layouts, which can benefit downstream perception tasks. However, a fully automatic layout generation driven only by language and a suitable metric for measuring multiple generated instances has not been well explored. In this work, we present Auto Cherry-Picker (ACP), a novel framework that generates h…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: 19 pages, 7 figures

  32. arXiv:2406.19645  [pdf, other]

    cs.NE

    Directly Training Temporal Spiking Neural Network with Sparse Surrogate Gradient

    Authors: Yang Li, Feifei Zhao, Dongcheng Zhao, Yi Zeng

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) have attracted much attention due to their event-based computing and energy-efficient features. However, the spiking all-or-none nature has prevented direct training of SNNs for various applications. The surrogate gradient (SG) algorithm has recently enabled spiking neural networks to shine in neuromorphic hardware. However, introducing surrogate gradi…

    Submitted 28 June, 2024; originally announced June 2024.

  33. arXiv:2406.17864  [pdf, other]

    cs.CY cs.AI

    AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies

    Authors: Yi Zeng, Kevin Klyman, Andy Zhou, Yu Yang, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: We present a comprehensive AI risk taxonomy derived from eight government policies from the European Union, United States, and China and 16 company policies worldwide, making a significant step towards establishing a unified language for generative AI safety evaluation. We identify 314 unique risk categories organized into a four-tiered taxonomy. At the highest level, this taxonomy encompasses Sys…

    Submitted 25 June, 2024; originally announced June 2024.

  34. arXiv:2406.17758  [pdf, other]

    cs.CV

    MotionBooth: Motion-Aware Customized Text-to-Video Generation

    Authors: Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

    Abstract: In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements. By leveraging a few images of a specific object, we efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately. Our approach presents subject region loss and video preservation loss to enhance t…

    Submitted 21 August, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: Project page at https://jianzongwu.github.io/projects/motionbooth

  35. arXiv:2406.17092  [pdf, other]

    cs.CR cs.AI

    BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

    Authors: Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia

    Abstract: Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relati…

    Submitted 24 June, 2024; originally announced June 2024.

  36. arXiv:2406.14598  [pdf, other]

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics…

    Submitted 20 June, 2024; originally announced June 2024.

  37. arXiv:2406.12270  [pdf, other]

    cs.IT eess.SP

    Sparse MIMO for ISAC: New Opportunities and Challenges

    Authors: Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Linglong Dai, Yifei Yuan, Rui Zhang

    Abstract: Multiple-input multiple-output (MIMO) has been a key technology of wireless communications for decades. A typical MIMO system employs antenna arrays with the inter-antenna spacing being half of the signal wavelength, which we term as compact MIMO. Looking forward towards the future sixth-generation (6G) mobile communication networks, MIMO system will achieve even finer spatial resolution to not on…

    Submitted 18 June, 2024; originally announced June 2024.

  38. arXiv:2406.10126  [pdf, other]

    cs.CV

    Training-free Camera Control for Video Generation

    Authors: Chen Hou, Guoqiang Wei, Yan Zeng, Zhibo Chen

    Abstract: We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation. Instead, it can be plugged and played with most pretrained video diffusion models and generate camera controllable videos…

    Submitted 14 June, 2024; originally announced June 2024.

  39. arXiv:2406.09389  [pdf, other]

    eess.IV cs.CV

    Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

    Authors: Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

    Abstract: Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://sagiri0208.github.io

  40. arXiv:2406.07029  [pdf, other]

    cs.LG

    Fairness-Aware Meta-Learning via Nash Bargaining

    Authors: Yi Zeng, Xuelin Yang, Li Chen, Cristian Canton Ferrer, Ming Jin, Michael I. Jordan, Ruoxi Jia

    Abstract: To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable conve…

    Submitted 11 June, 2024; originally announced June 2024.

  41. arXiv:2406.06855  [pdf, other]

    math.OC cs.LG

    Design and Scheduling of an AI-based Queueing System

    Authors: Jiung Lee, Hongseok Namkoong, Yibo Zeng

    Abstract: To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising of many single server queues where the class of a jo…

    Submitted 10 June, 2024; originally announced June 2024.

  42. arXiv:2406.06717   

    cs.SI cs.HC

    Analyzing user archetypes in Singapore's Telegram groups on COVID-19 and climate change

    Authors: Val Alvern Cueco Ligo, Lan Tianxiang, Ying Zeng, Lam Yin Cheung, Pi Zonooz, Roy Ka-Wei Lee, Koustuv Saha, Edson C. Tandoc Jr., Navin Kumar

    Abstract: Social media platforms, particularly Telegram, play a pivotal role in shaping public perceptions and opinions on global and national issues. Unlike traditional news media, Telegram allows for the proliferation of user-generated content with minimal oversight, making it a significant venue for the spread of controversial and misinformative content. During the COVID-19 pandemic, Telegram's popularit…

    Submitted 7 August, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Incomplete data and modification in data analysis

  43. arXiv:2406.05371  [pdf, other]

    cs.NE

    Spiking Neural Networks with Consistent Mapping Relations Allow High-Accuracy Inference

    Authors: Yang Li, Xiang He, Qingqun Kong, Yi Zeng

    Abstract: Spike-based neuromorphic hardware has demonstrated substantial potential in low energy consumption and efficient inference. However, the direct training of deep spiking neural networks is challenging, and conversion-based methods still require substantial time delay owing to unresolved conversion errors. We determine that the primary source of the conversion errors stems from the inconsistency bet…

    Submitted 8 June, 2024; originally announced June 2024.

  44. arXiv:2406.03720  [pdf, other]

    cs.CV cs.MM

    JIGMARK: A Black-Box Approach for Enhancing Image Watermarks against Diffusion Model Edits

    Authors: Minzhou Pan, Yi Zeng, Xue Lin, Ning Yu, Cho-Jui Hsieh, Peter Henderson, Ruoxi Jia

    Abstract: In this study, we investigate the vulnerability of image watermarks to diffusion-model-based image editing, a challenge exacerbated by the computational cost of accessing gradient information and the closed-source nature of many diffusion models. To address this issue, we introduce JIGMARK. This first-of-its-kind watermarking technique enhances robustness through contrastive learning with pairs of…

    Submitted 5 June, 2024; originally announced June 2024.

  45. arXiv:2406.03438  [pdf, other]

    cs.IT eess.SP

    CSI-GPT: Integrating Generative Pre-Trained Transformer with Federated-Tuning to Acquire Downlink Massive MIMO Channels

    Authors: Ye Zeng, Li Qiao, Zhen Gao, Tong Qin, Zhonghuai Wu, Sheng Chen, Mohsen Guizani

    Abstract: In massive multiple-input multiple-output (MIMO) systems, how to reliably acquire downlink channel state information (CSI) with low overhead is challenging. In this work, by integrating the generative pre-trained Transformer (GPT) with federated-tuning, we propose a CSI-GPT approach to realize efficient downlink CSI acquisition. Specifically, we first propose a Swin Transformer-based channel acqui…

    Submitted 5 June, 2024; originally announced June 2024.

  46. arXiv:2406.02913  [pdf, other]

    cs.LG cs.AI

    Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

    Authors: Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu

    Abstract: Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO f…

    Submitted 5 June, 2024; originally announced June 2024.

  47. arXiv:2406.01922  [pdf, ps, other]

    eess.SP cs.IT

    Performance Analysis of Hybrid Cellular and Cell-free MIMO Network

    Authors: Zhuoyin Dai, Jingran Xu, Xiaoli Xu, Ruoguang Li, Yong Zeng

    Abstract: Cell-free wireless communication is envisioned as one of the most promising network architectures, which can achieve stable and uniform communication performance while improving the system energy and spectrum efficiency. The deployment of cell-free networks is envisioned to be a longterm evolutionary process, in which cell-free access points (APs) will be gradually introduced into the communicatio…

    Submitted 3 June, 2024; originally announced June 2024.

  48. arXiv:2406.01476  [pdf, other]

    cs.CV

    DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors

    Authors: Tianyu Huang, Yihan Zeng, Hui Li, Wangmeng Zuo, Rynson W. H. Lau

    Abstract: Dynamic 3D interaction has witnessed great interest in recent works, while creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, and the other is to learn the deformation of static 3D objects with the distillation of video generative models. The former one requires assigning precise physical properties to the target object, otherwise the…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Technical report. Codes are released at: https://github.com/tyhuang0428/DreamPhysics

  49. arXiv:2406.00830  [pdf, other]

    cs.CV

    Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection

    Authors: Yang Cao, Yihan Zeng, Hang Xu, Dan Xu

    Abstract: Open-vocabulary 3D Object Detection (OV-3DDet) addresses the detection of objects from an arbitrary list of novel categories in 3D scenes, which remains a very challenging problem. In this work, we propose CoDAv2, a unified framework designed to innovatively tackle both the localization and classification of novel 3D objects, under the condition of limited base categories. For localization, the pr…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: Code Page: https://github.com/yangcaoai/CoDA_NeurIPS2023 This paper has been submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) for possible publication

  50. arXiv:2406.00231  [pdf, other]

    cs.IR cs.AI cs.CL

    LLM-RankFusion: Mitigating Intrinsic Inconsistency in LLM-based Ranking

    Authors: Yifan Zeng, Ojas Tendolkar, Raymond Baartmans, Qingyun Wu, Huazheng Wang, Lizhong Chen

    Abstract: Ranking passages by prompting a large language model (LLM) can achieve promising performance in modern information retrieval (IR) systems. A common approach is to sort the ranking list by prompting LLMs for pairwise comparison. However, sorting-based methods require consistent comparisons to correctly sort the passages, which we show that LLMs often violate. We identify two kinds of intrinsic inco…

    Submitted 31 May, 2024; originally announced June 2024.