Zum Hauptinhalt springen

Showing 1–50 of 623 results for author: Qi, X

.
  1. arXiv:2408.10853  [pdf, other

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10852  [pdf, other

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10849  [pdf, other

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  4. arXiv:2408.10696  [pdf

    cond-mat.mtrl-sci physics.app-ph

    Impact of ALD-Deposited Ultrathin Nitride Layers on Carrier Lifetimes and Photoluminescence Efficiency in CdTe/MgCdTe Double Heterostructures

    Authors: Haris Naeem Abbasi, Xin Qi, Zheng Ju, Zhenqiang Ma, Yong-Hang Zhang

    Abstract: This work evaluates the passivation effectiveness of ultrathin nitride layers (SiNx, AlN, TiN) deposited via atomic layer deposition on CdTe/MgCdTe double heterostructures for solar cell applications. Time-resolved photoluminescence and photoluminescence measurements revealed enhanced carrier lifetimes and reduced surface recombination, indicating improved passivation effectiveness. These results… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages, 4 figures

  5. arXiv:2408.04637  [pdf, other

    cs.CL

    APE: Active Learning-based Tooling for Finding Informative Few-shot Examples for LLM-based Entity Matching

    Authors: Kun Qian, Yisi Sang, Farima Fatahi Bayat, Anton Belyi, Xianqi Chu, Yash Govind, Samira Khorshidi, Rahul Khot, Katherine Luna, Azadeh Nikfarjam, Xiaoguang Qi, Fei Wu, Xianhan Zhang, Yunyao Li

    Abstract: Prompt engineering is an iterative procedure often requiring extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and effective approach to providing LLMs with precise instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstratio… ▽ More

    Submitted 29 July, 2024; originally announced August 2024.

    Comments: 3 pages, Proceedings of the Fifth Workshop on Data Science with Human-in-the-Loop (DaSH 2024)

  6. arXiv:2408.02503  [pdf, other

    cs.CL

    UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model

    Authors: Zhaowei Li, Wei Wang, YiQing Cai, Xu Qi, Pengyu Wang, Dong Zhang, Hang Song, Botian Jiang, Zhida Huang, Tao Wang

    Abstract: Significant advancements has recently been achieved in the field of multi-modal large language models (MLLMs), demonstrating their remarkable capabilities in understanding and reasoning across diverse tasks. However, these models are often trained for specific tasks and rely on task-specific input-output formats, limiting their applicability to a broader range of tasks. This raises a fundamental q… ▽ More

    Submitted 5 August, 2024; originally announced August 2024.

  7. arXiv:2408.01419  [pdf, other

    cs.CL

    DebateQA: Evaluating Question Answering on Debatable Knowledge

    Authors: Rongwu Xu, Xuan Qi, Zehan Qi, Wei Xu, Zhijiang Guo

    Abstract: The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks assume fixed answers are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Dataset and scripts for evaluation are available at https://github.com/pillowsofwind/DebateQA

  8. arXiv:2407.21659  [pdf, other

    cs.CL

    Defending Jailbreak Attack in VLMs via Cross-modality Information Detector

    Authors: Yue Xu, Xiuyuan Qi, Zhan Qin, Wenjie Wang

    Abstract: Vision Language Models (VLMs) extend the capacity of LLMs to comprehensively understand vision information, achieving remarkable performance in many vision-centric tasks. Despite that, recent studies have shown that these models are susceptible to jailbreak attacks, which refer to an exploitative technique where malicious users can break the safety alignment of the target model and generate mislea… ▽ More

    Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures

  9. arXiv:2407.21292  [pdf, other

    hep-ph

    Dark matter, leptogenesis and Z' in the B-L model

    Authors: XinXin Qi, Hao Sun

    Abstract: We discuss the interplay between dark matter, leptogenesis and a new gauge boson $Z^\prime$ in the $B-L$ model. A fermion dark matter $χ$ carrying $U(1)_{B-L}$ charge is introduced to the model but not coupling with other particles. We consider the freeze-out and freeze-in mechanisms, and obtain the correct relic density respectively. We have scanned the feasible parameter space and found that the… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  10. arXiv:2407.20962  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit… ▽ More

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  11. arXiv:2407.18625  [pdf, other

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  12. arXiv:2407.15427  [pdf, other

    cs.CV cs.AI

    YOLO-pdd: A Novel Multi-scale PCB Defect Detection Method Using Deep Representations with Sequential Images

    Authors: Bowen Liu, Dongjie Chen, Xiao Qi

    Abstract: With the rapid growth of the PCB manufacturing industry, there is an increasing demand for computer vision inspection to detect defects during production. Improving the accuracy and generalization of PCB defect detection models remains a significant challenge. This paper proposes a high-precision, robust, and real-time end-to-end method for PCB defect detection based on deep Convolutional Neural N… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  13. SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator

    Authors: Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M. H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang

    Abstract: Biologically plausible Spiking Neural Networks (SNNs), characterized by spike sparsity, are growing tremendous attention over intellectual edge devices and critical bio-medical applications as compared to artificial neural networks (ANNs). However, there is a considerable risk from malicious attempts to extract white-box information (i.e., weights) from SNNs, as attackers could exploit well-traine… ▽ More

    Submitted 26 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: International Conference on Computer-Aided Design 2024

  14. arXiv:2407.15116  [pdf, other

    hep-ph

    Z_5 two-component dark matter in the Type-II seesaw mechanism

    Authors: XinXin Qi, Hao Sun

    Abstract: We consider the Z5 two-component dark matter model within the framework of the Type-II seesaw mechanism. Due to the new annihilation processes related to triplets, the light component cannot necessarily be dominant in the dark matter relic density, which is different from the Z5 two-component dark matter model in the SM. The model is considered to explain the excess of electron-positron flux measu… ▽ More

    Submitted 26 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2006.14922 by other authors

  15. arXiv:2407.12038  [pdf, ps, other

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept… ▽ More

    Submitted 31 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description and results

  16. arXiv:2407.08990  [pdf, other

    cs.AR cs.AI cs.ET cs.NE

    Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

    Authors: Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: In press

  17. arXiv:2407.08167  [pdf, other

    eess.IV cs.CV

    DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification

    Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Yongyue Wei, Guanyu Yang

    Abstract: The precise subtype classification of myeloproliferative neoplasms (MPNs) based on multimodal information, which assists clinicians in diagnosis and long-term treatment plans, is of great clinical significance. However, it remains a great challenging task due to the lack of diagnostic representativeness for local patches and the absence of diagnostic-relevant features from a single modality. In th… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  18. arXiv:2407.07479  [pdf, other

    cs.CV

    How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

    Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

    Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  19. arXiv:2407.07478  [pdf, other

    cs.CV

    EA-VTR: Event-Aware Video-Text Retrieval

    Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

    Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improv… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  20. arXiv:2407.05421  [pdf, other

    eess.AS cs.SD

    ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

    Authors: Ruibo Fu, Xin Qi, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Zhiyong Wang, Yi Lu, Xiaopeng Wang, Shuchen Shi, Yukun Liu, Xuefei Liu, Shuai Zhang

    Abstract: Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in limited reference speeches scenarios. To address these challenges, we… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: The audio demo is available at https://7xin.github.io/ASRRL/

  21. arXiv:2407.03162  [pdf, other

    cs.RO cs.CV cs.LG

    Bunny-VisionPro: Real-Time Bimanual Dexterous Teleoperation for Imitation Learning

    Authors: Runyu Ding, Yuzhe Qin, Jiyue Zhu, Chengzhe Jia, Shiqi Yang, Ruihan Yang, Xiaojuan Qi, Xiaolong Wang

    Abstract: Teleoperation is a crucial tool for collecting human demonstrations, but controlling robots with bimanual dexterous hands remains a challenge. Existing teleoperation systems struggle to handle the complexity of coordinating two hands for intricate manipulations. We introduce Bunny-VisionPro, a real-time bimanual dexterous teleoperation system that leverages a VR headset. Unlike previous vision-bas… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: project page: https://dingry.github.io/projects/bunny_visionpro.html

  22. arXiv:2407.00598  [pdf

    physics.chem-ph

    Direct Observation of Morphological and Chemical Changes During the Oxidation of Model Inorganic Ligand-Capped Particles

    Authors: Maximilian Jaugstetter, Xiao Qi, Emory Chan, Miquel Salmeron, Kevin R. Wilson, Slavomír Nemšák, Hendrik Bluhm

    Abstract: Functionalization and volatilization are competing reactions during the oxidation of carbonaceous materials and are important processes in many different areas of science and technology. Here we present a combined ambient pressure X-ray photoelectron spectroscopy (APXPS) and grazing incidence X-ray scattering (GIXS) investigation of the oxidation of oleic acid ligands surrounding NaYF4 nanoparticl… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures

  23. arXiv:2407.00367  [pdf, other

    cs.CV

    SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix

    Authors: Peng Dai, Feitong Tan, Qiangeng Xu, David Futschik, Ruofei Du, Sean Fanello, Xiaojuan Qi, Yinda Zhang

    Abstract: Video generation models have demonstrated great capabilities of producing impressive monocular videos, however, the generation of 3D stereoscopic video remains under-explored. We propose a pose-free and training-free approach for generating 3D stereoscopic videos using an off-the-shelf monocular video generation model. Our method warps a generated monocular video into camera views on stereoscopic… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 3D stereoscopic video generation, video diffusion, inpainting

  24. arXiv:2406.19568  [pdf, other

    cs.CV cs.AI

    What Matters in Detecting AI-Generated Videos like Sora?

    Authors: Chirui Chang, Zhengzhe Liu, Xiaoyang Lyu, Xiaojuan Qi

    Abstract: Recent advancements in diffusion-based video generation have showcased remarkable results, yet the gap between synthetic and real-world videos remains under-explored. In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion. To achieve this, we tr… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  25. arXiv:2406.18074  [pdf, other

    cs.CV cs.AI

    Few-Shot Medical Image Segmentation with High-Fidelity Prototypes

    Authors: Song Tang, Shaxu Yan, Xiaozhi Qi, Jianxin Gao, Mao Ye, Jianwei Zhang, Xiatian Zhu

    Abstract: Few-shot Semantic Segmentation (FSS) aims to adapt a pretrained model to new classes with as few as a single labelled training sample per class. Despite the prototype based approaches have achieved substantial success, existing models are limited to the imaging scenarios with considerably distinct objects and not highly complex background, e.g., natural images. This makes such models suboptimal fo… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  26. arXiv:2406.17801  [pdf, other

    cs.SD cs.CL eess.AS

    A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

    Authors: Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

    Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  27. arXiv:2406.16797  [pdf, other

    cs.CL cs.AI

    Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

    Authors: Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal

    Abstract: Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic forgetting of earlier tasks, make it challenging to obtain good performance on multiple tasks at the same time. To mitigate this, we propose Lottery Ti… ▽ More

    Submitted 25 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  28. arXiv:2406.14598  [pdf, other

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  29. arXiv:2406.13870  [pdf, other

    cs.CV

    Splatter a Video: Video Gaussian Representation for Versatile Processing

    Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Video representation is a long-standing problem that is crucial for various down-stream tasks, such as tracking,depth prediction,segmentation,view synthesis,and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a no… ▽ More

    Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  30. arXiv:2406.10591  [pdf, other

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on… ▽ More

    Submitted 15 June, 2024; originally announced June 2024.

  31. arXiv:2406.08112  [pdf, other

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  32. arXiv:2406.07648  [pdf, other

    cs.CV

    M-LRM: Multi-view Large Reconstruction Model

    Authors: Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Xiaowei Chi, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Despite recent advancements in the Large Reconstruction Model (LRM) demonstrating impressive results, when extending its input from single image to multiple images, it exhibits inefficiencies, subpar geometric and texture quality, as well as slower convergence speed than expected. It is attributed to that, LRM formulates 3D reconstruction as a naive images-to-3D translation problem, ignoring the… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  33. arXiv:2406.06927  [pdf, ps, other

    math.GR math.GN

    Some generalized metric properties of $n$-semitopological groups

    Authors: Fucai Lin, Xixi Qi

    Abstract: A semitopological group $G$ is called {\it an $n$-semitopological group}, if for any $g\in G$ with $e\not\in\overline{\{g\}}$ there is a neighborhood $W$ of $e$ such that $g\not\in W^{n}$, where $n\in\mathbb{N}$. The class of $n$-semitopological groups ($n\geq 2$) contains the class of paratopological groups and Hausdorff quasi-topological groups. Fix any $n\in\mathbb{N}$. Some properties of $n$-s… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 16 pages

    MSC Class: Primary 54H11; 22A05; secondary 54A25; 54B15; 54E35

  34. arXiv:2406.05946  [pdf, other

    cs.CR cs.AI

    Safety Alignment Should Be Made More Than Just a Few Tokens Deep

    Authors: Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson

    Abstract: The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens.… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  35. arXiv:2406.04683  [pdf, other

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  36. arXiv:2406.03247  [pdf, other

    cs.SD eess.AS

    Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

    Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

    Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework guided, aiming for highly generalized FAD, called GFL-FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation… ▽ More

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  37. arXiv:2406.03237  [pdf, other

    cs.SD eess.AS

    Generalized Fake Audio Detection via Deep Stable Learning

    Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

    Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate t… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  38. arXiv:2406.00589  [pdf, other

    cs.CV cs.AI

    Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection

    Authors: Zhuang Qi, Junlin Zhang, Xin Qi

    Abstract: Visual tracking fundamentally involves regressing the state of the target in each frame of a video. Despite significant progress, existing regression-based trackers still tend to experience failures and inaccuracies. To enhance the precision of target estimation, this paper proposes a tracking technique based on robust regression. Firstly, we introduce a novel robust linear regression estimator, w… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  39. arXiv:2405.21070  [pdf, other

    cs.CV cs.CL cs.LG

    Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

    Authors: Xin Wen, Bingchen Zhao, Yilun Chen, Jiangmiao Pang, Xiaojuan Qi

    Abstract: Severe data imbalance naturally exists among web-scale vision-language datasets. Despite this, we find CLIP pre-trained thereupon exhibits notable robustness to the data imbalance compared to supervised learning, and demonstrates significant effectiveness in learning generalizable representations. With an aim to investigate the reasons behind this finding, we conduct controlled experiments to stud… ▽ More

    Submitted 14 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  40. arXiv:2405.20099  [pdf, other

    cs.CR

    Defensive Prompt Patch: A Robust and Interpretable Defense of LLMs against Jailbreak Attacks

    Authors: Chen Xiong, Xiangyu Qi, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Safety, security, and compliance are essential requirements when aligning large language models (LLMs). However, many seemingly aligned LLMs are soon shown to be susceptible to jailbreak attacks. These attacks aim to circumvent the models' safety guardrails and security mechanisms by introducing jailbreak prompts into malicious queries. In response to these challenges, this paper introduces Defens… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  41. arXiv:2405.20046  [pdf, other

    cs.AI

    Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

    Authors: Zhuang Qi, Lei Meng, Weihao He, Ruohan Zhang, Yu Wang, Xin Qi, Xiangxu Meng

    Abstract: Federated learning benefits from cross-training strategies, which enables models to train on data from distinct sources to improve the generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global know… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  42. arXiv:2405.19606  [pdf, other

    cs.AI

    Relation Modeling and Distillation for Learning with Noisy Labels

    Authors: Xiaming Che, Junlin Zhang, Zhuang Qi, Xin Qi

    Abstract: Learning with noisy labels has become an effective strategy for enhancing the robustness of models, which enables models to better tolerate inaccurate data. Existing methods either focus on optimizing the loss function to mitigate the interference from noise, or design procedures to detect potential noise and correct errors. However, their effectiveness is often compromised in representation learn… ▽ More

    Submitted 1 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  43. arXiv:2405.19524  [pdf, other

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  44. arXiv:2405.19247  [pdf, other

    cs.LG

    Comparative Study of Neighbor-based Methods for Local Outlier Detection

    Authors: Zhuang Qi, Junlin Zhang, Xiaming Chen, Xin Qi

    Abstract: The neighbor-based method has become a powerful tool to handle the outlier detection problem, which aims to infer the abnormal degree of the sample based on the compactness of the sample and its neighbors. However, the existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types neighbors to outlier detection has not… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.18253  [pdf, other

    cs.LG cs.GT

    Truthful Dataset Valuation by Pointwise Mutual Information

    Authors: Shuran Zheng, Yongchan Kwon, Xuan Qi, James Zou

    Abstract: A common way to evaluate a dataset in ML involves training a model on this dataset and assessing the model's performance on a test set. However, this approach has two issues: (1) it may incentivize undesirable data manipulation in data marketplaces, as the self-interested data providers seek to modify the dataset to maximize their evaluation scores; (2) it may select datasets that overfit to poten… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  46. arXiv:2405.17792  [pdf, other

    hep-ex hep-ph

    JUNO Sensitivity to Invisible Decay Modes of Neutrons

    Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

    Abstract: We explore the bound neutrons decay into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation mode… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 28 pages, 7 figures, 4 tables

  47. arXiv:2405.16874  [pdf, other

    cs.CV

    CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

    Authors: Xingqun Qi, Hengyuan Zhang, Yatian Wang, Jiahao Pan, Chen Liu, Peng Li, Xiaowei Chi, Mengfei Li, Qixun Zhang, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: Deriving co-speech 3D gestures has seen tremendous progress in virtual avatar animation. Yet, the existing methods often produce stiff and unreasonable gestures with unseen human speech inputs due to the limited 3D speech-gesture data. In this paper, we propose CoCoGesture, a novel framework enabling vivid and diverse gesture synthesis from unseen human speech prompts. Our key insight is built upo… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: The dataset will be released as soon as possible

  48. arXiv:2405.16810  [pdf

    cs.CL

    Performance evaluation of Reddit Comments using Machine Learning and Natural Language Processing methods in Sentiment Analysis

    Authors: Xiaoxia Zhang, Xiuyuan Qi, Zixin Teng

    Abstract: Sentiment analysis, an increasingly vital field in both academia and industry, plays a pivotal role in machine learning applications, particularly on social media platforms like Reddit. However, the efficacy of sentiment analysis models is hindered by the lack of expansive and fine-grained emotion datasets. To address this gap, our study leverages the GoEmotions dataset, comprising a diverse range… ▽ More

    Submitted 28 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures, to be published in Computational and Experimental Simulations in Engineering - Proceedings of ICCES 2024 - Volume 2

  49. arXiv:2405.14946  [pdf, other

    hep-ph astro-ph.CO

    Diffuse Boosted Cosmic Neutrino Background

    Authors: Gonzalo Herrera, Shunsaku Horiuchi, Xiaolin Qi

    Abstract: Energetic cosmic rays scatter off the cosmic neutrino background throughout the history of the Universe, yielding a diffuse flux of cosmic relic neutrinos boosted to high energies. We calculate this flux under different assumptions of the cosmic-ray flux spectral slope and redshift evolution. The non-observation of the diffuse flux of boosted relic neutrinos with current high-energy neutrino exper… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  50. arXiv:2405.14917  [pdf, other

    cs.LG cs.CL

    SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models

    Authors: Wei Huang, Haotong Qin, Yangdong Liu, Yawei Li, Xianglong Liu, Luca Benini, Michele Magno, Xiaojuan Qi

    Abstract: Large language models (LLMs) achieve remarkable performance in natural language understanding but require substantial computation and memory resources. Post-training quantization (PTQ) is a powerful compression technique extensively investigated in LLMs. However, existing PTQ methods are still not ideal in terms of accuracy and efficiency, especially with below 4 bit-widths. Standard PTQ methods u… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages