Showing 1–50 of 232 results for author: Tao, J

Searching in archive cs.
  1. arXiv:2408.17009  [pdf, other]

    cs.SD eess.AS

    Utilizing Speaker Profiles for Impersonation Audio Detection

    Authors: Hao Gu, JiangYan Yi, Chenglong Wang, Yong Ren, Jianhua Tao, Xinrui Yan, Yujie Chen, Xiaohui Zhang

    Abstract: Fake audio detection is an emerging active topic. A growing number of literatures have aimed to detect fake utterance, which are mostly generated by Text-to-speech (TTS) or voice conversion (VC). However, countermeasures against impersonation remain an underexplored area. Impersonation is a fake type that involves an imitator replicating specific traits and speech style of a target speaker. Unlike…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  2. arXiv:2408.13533  [pdf, other]

    cs.CL

    Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models

    Authors: Jinyang Wu, Feihu Che, Chuyuan Zhang, Jianhua Tao, Shuai Zhang, Pengpeng Shao

    Abstract: Retrieval-Augmented Generation (RAG) has emerged as a crucial method for addressing hallucinations in large language models (LLMs). While recent research has extended RAG models to complex noisy scenarios, these explorations often confine themselves to limited noise types and presuppose that noise is inherently detrimental to LLMs, potentially deviating from real-world retrieval environments and r…

    Submitted 24 August, 2024; originally announced August 2024.

  3. arXiv:2408.12558  [pdf, other]

    cs.MM

    Exploring the Role of Audio in Multimodal Misinformation Detection

    Authors: Moyang Liu, Yukun Liu, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Xuefei Liu, Guanjun Li

    Abstract: With the rapid development of deepfake technology, especially the deep audio fake technology, misinformation detection on the social media scene meets a great challenge. Social media data often contains multimodal information which includes audio, video, text, and images. However, existing multimodal misinformation detection methods tend to focus only on some of these modalities, failing to compre…

    Submitted 22 August, 2024; originally announced August 2024.

  4. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  7. arXiv:2408.09746  [pdf, other]

    cs.CV cs.AI

    Enhanced Cascade Prostate Cancer Classifier in mp-MRI Utilizing Recall Feedback Adaptive Loss and Prior Knowledge-Based Feature Extraction

    Authors: Kun Luo, Bowen Zheng, Shidong Lv, Jie Tao, Qiang Wei

    Abstract: Prostate cancer is the second most common cancer in males worldwide, and mpMRI is commonly used for diagnosis. However, interpreting mpMRI is challenging and requires expertise from radiologists. This highlights the urgent need for automated grading in mpMRI. Existing studies lack integration of clinical prior information and suffer from uneven training sample distribution due to prevalence. There…

    Submitted 19 August, 2024; originally announced August 2024.

  8. arXiv:2408.09422  [pdf, other]

    cs.CL cs.AI

    Distinguish Confusion in Legal Judgment Prediction via Revised Relation Knowledge

    Authors: Nuo Xu, Pinghui Wang, Junzhou Zhao, Feiyang Sun, Lin Lan, Jing Tao, Li Pan, Xiaohong Guan

    Abstract: Legal Judgment Prediction (LJP) aims to automatically predict a law case's judgment results based on the text description of its facts. In practice, the confusing law articles (or charges) problem frequently occurs, reflecting that the law cases applicable to similar articles (or charges) tend to be misjudged. Although some recent works based on prior knowledge solve this issue well, they ignore t…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM TOIS

  9. arXiv:2408.06592  [pdf, other]

    cs.CV

    ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection

    Authors: Jianyu Tao, Changping Hu, Edward Yang, Jing Xu, Rui Chen

    Abstract: NeRFs have achieved incredible success in novel view synthesis. However, the accuracy of the implicit geometry is unsatisfactory because the passive static environmental illumination has low spatial frequency and cannot provide enough information for accurate geometry reconstruction. In this work, we propose ActiveNeRF, a 3D geometry reconstruction framework, which improves the geometry quality of…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 18 pages, 10 figures

  10. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  11. arXiv:2408.05131  [pdf, other]

    cs.LG

    Range Membership Inference Attacks

    Authors: Jiashu Tao, Reza Shokri

    Abstract: Machine learning models can leak private information about their training data, but the standard methods to measure this risk, based on membership inference attacks (MIAs), have a major limitation. They only check if a given data point exactly matches a training point, neglecting the potential of similar or partially overlapping data revealing the same private information. To address this…

    Submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2408.04967  [pdf, other]

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  13. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  14. arXiv:2407.12274  [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  15. arXiv:2407.11494  [pdf, other]

    cs.CV

    Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction

    Authors: Guowei Xu, Jiale Tao, Wen Li, Lixin Duan

    Abstract: In the realm of stochastic human motion prediction (SHMP), researchers have often turned to generative models like GANs, VAEs and diffusion models. However, most previous approaches have struggled to accurately predict motions that are both realistic and coherent with past motion due to a lack of guidance on the latent distribution. In this paper, we introduce Semantic Latent Directions (SLD) as a…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  16. arXiv:2407.08239  [pdf, other]

    cs.SD cs.LG eess.AS

    An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio

    Authors: Siding Zeng, Jiangyan Yi, Jianhua Tao, Yujie Chen, Shan Liang, Yong Ren, Xiaohui Zhang

    Abstract: When the task of locating manipulation regions in partially-fake audio (PFA) involves cross-domain datasets, the performance of deep learning models drops significantly due to the shift between the source and target domains. To address this issue, existing approaches often employ data augmentation before training. However, they overlook the characteristics in target domain that are absent in sourc…

    Submitted 11 July, 2024; originally announced July 2024.

  17. arXiv:2407.07653  [pdf, other]

    cs.HC

    AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Jiangyan Yi, Bin Liu, Jianhua Tao

    Abstract: Explainable Multimodal Emotion Recognition (EMER) is an emerging task that aims to achieve reliable and accurate emotion recognition. However, due to the high annotation cost, the existing dataset (denoted as EMER-Fine) is small, making it difficult to perform supervised training. To reduce the annotation cost and expand the dataset size, this paper reviews the previous dataset construction proces…

    Submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.05421  [pdf, other]

    eess.AS cs.SD

    ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

    Authors: Ruibo Fu, Xin Qi, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Zhiyong Wang, Yi Lu, Xiaopeng Wang, Shuchen Shi, Yukun Liu, Xuefei Liu, Shuai Zhang

    Abstract: Speaker adaptation, which involves cloning voices from unseen speakers in the Text-to-Speech task, has garnered significant interest due to its numerous applications in multi-media fields. Despite recent advancements, existing methods often struggle with inadequate speaker representation accuracy and overfitting, particularly in limited reference speeches scenarios. To address these challenges, we…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: The audio demo is available at https://7xin.github.io/ASRRL/

  19. arXiv:2407.02042  [pdf, other]

    cs.CL cs.AI

    Fake News Detection and Manipulation Reasoning via Large Vision-Language Models

    Authors: Ruihan Jin, Ruibo Fu, Zhengqi Wen, Shuai Zhang, Yukun Liu, Jianhua Tao

    Abstract: Fake news becomes a growing threat to information security and public opinion with the rapid sprawl of media manipulation. Therefore, fake news detection attracts widespread attention from academic community. Traditional fake news detection models demonstrate remarkable performance on authenticity binary classification but their ability to reason detailed faked traces based on the news content rem…

    Submitted 2 July, 2024; originally announced July 2024.

  20. arXiv:2406.10591  [pdf, other]

    eess.AS cs.AI cs.CV cs.MM cs.SD

    MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

    Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

    Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, the foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio technology, which relies on…

    Submitted 15 June, 2024; originally announced June 2024.

  21. arXiv:2406.08112  [pdf, other]

    cs.SD cs.AI eess.AS

    Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

    Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

    Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

  22. arXiv:2406.07381  [pdf, other]

    cs.AI cs.LG

    World Models with Hints of Large Language Models for Goal Achieving

    Authors: Zeyuan Liu, Ziyu Huan, Xiyao Wang, Jiafei Lyu, Jian Tao, Xiu Li, Furong Huang, Huazhe Xu

    Abstract: Reinforcement learning struggles in the face of long-horizon tasks and sparse goals due to the difficulty in manual reward specification. While existing methods address this by adding intrinsic rewards, they may fail to provide meaningful guidance in long-horizon decision-making tasks with large state and action spaces, lacking purposeful exploration. Inspired by human cognition, we propose a new…

    Submitted 11 June, 2024; originally announced June 2024.

  23. arXiv:2406.06086  [pdf, other]

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  24. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 5 August, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  25. arXiv:2406.04683  [pdf, other]

    cs.SD eess.AS

    PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

    Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

    Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge abo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  26. arXiv:2406.04027  [pdf, other]

    cs.CR cs.SE

    PowerPeeler: A Precise and General Dynamic Deobfuscation Method for PowerShell Scripts

    Authors: Ruijie Li, Chenyang Zhang, Huajun Chai, Lingyun Ying, Haixin Duan, Jun Tao

    Abstract: PowerShell is a powerful and versatile task automation tool. Unfortunately, it is also widely abused by cyber attackers. To bypass malware detection and hinder threat analysis, attackers often employ diverse techniques to obfuscate malicious PowerShell scripts. Existing deobfuscation tools suffer from the limitation of static analysis, which fails to simulate the real deobfuscation process accurat…

    Submitted 19 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: To appear in the ACM CCS 2024

  27. arXiv:2406.03247  [pdf, other]

    cs.SD eess.AS

    Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

    Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

    Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework guided, aiming for highly generalized FAD, called GFL-FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation…

    Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  28. arXiv:2406.03240  [pdf, other]

    cs.SD cs.AI eess.AS

    Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy

    Authors: Yuankun Xie, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Xiaopeng Wang, Haonnan Cheng, Long Ye, Jianhua Tao

    Abstract: With the proliferation of deepfake audio, there is an urgent need to investigate their attribution. Current source tracing methods can effectively distinguish in-distribution (ID) categories. However, the rapid evolution of deepfake algorithms poses a critical challenge in the accurate identification of out-of-distribution (OOD) novel deepfake algorithms. In this paper, we propose Real Emphasis an…

    Submitted 8 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  29. arXiv:2406.03237  [pdf, other]

    cs.SD eess.AS

    Generalized Fake Audio Detection via Deep Stable Learning

    Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

    Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate t…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: accepted by INTERSPEECH2024

  30. arXiv:2406.00504  [pdf]

    cs.RO cs.AI

    Research on an Autonomous UAV Search and Rescue System Based on the Improved

    Authors: Haobin Chen, Junyu Tao, Bize Zhou, Xiaoyan Liu

    Abstract: The demand is to solve the issue of UAV (unmanned aerial vehicle) operating autonomously and implementing practical functions such as search and rescue in complex unknown environments. This paper proposes an autonomous search and rescue UAV system based on an EGO-Planner algorithm, which is improved by innovative UAV body application and takes the methods of inverse motor backstepping to enhance t…

    Submitted 7 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: 2024 5th International Conference on Computer Engineering and Application

  31. arXiv:2405.20914  [pdf, other]

    cs.CR

    RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs

    Authors: Zuyan Wang, Jun Tao, Dika Zou

    Abstract: The growing popular awareness of personal privacy raises the following quandary: what is the new paradigm for collecting and protecting the data produced by ever-increasing sensor devices. Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes. Very recent work has taken steps towards relaxing the assumption by…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: 14 pages, 19 figures

  32. arXiv:2405.14913  [pdf, other]

    stat.ME cs.LG math.PR stat.ML

    High Rank Path Development: an approach of learning the filtration of stochastic processes

    Authors: Jiajie Tao, Hao Ni, Chong Liu

    Abstract: Since the weak convergence for stochastic processes does not account for the growth of information over time which is represented by the underlying filtration, a slightly erroneous stochastic model in weak topology may cause huge loss in multi-periods decision making problems. To address such discontinuities Aldous introduced the extended weak convergence, which can fully characterise all essentia…

    Submitted 23 May, 2024; originally announced May 2024.

  33. arXiv:2405.10576  [pdf, other]

    cs.RO

    An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems

    Authors: Jiyue Tao, Yunsong Zhang, Sunil Kumar Rajendran, Feitian Zhang, Dexin Zhao, Tongsheng Shen

    Abstract: Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alte…

    Submitted 7 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  34. arXiv:2405.08596  [pdf, ps, other]

    cs.SD eess.AS

    Towards Robust Audio Deepfake Detection: A Evolving Benchmark for Continual Learning

    Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

    Abstract: The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts…

    Submitted 13 August, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  35. arXiv:2405.05741  [pdf, ps, other]

    cs.CL cs.AI

    Can large language models understand uncommon meanings of common words?

    Authors: Jinyang Wu, Feihu Che, Xinxin Zheng, Shuai Zhang, Ruihan Jin, Shuai Nie, Pengpeng Shao, Jianhua Tao

    Abstract: Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, lacking widely acknowledged testing mechanisms, answering 'whether LLMs are stochastic parrots or genuinely comprehend the world' remains unclear, fostering numerous studies and sparking heated debates. P…

    Submitted 9 May, 2024; originally announced May 2024.

  36. arXiv:2405.04880  [pdf, other]

    cs.SD cs.AI eess.AS

    The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio

    Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun

    Abstract: With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio currently exhibits widespread, high deception, and type versatility, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively detect ALM-based deepfake audio, we focus on…

    Submitted 15 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  37. arXiv:2404.17113  [pdf, other]

    cs.LG cs.HC

    MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing the dataset size and building more effective algorithms. However, due to problems such as complex environments and inaccurate annotations, current systems are hard to meet the demands of practical applications. Therefore, we or…

    Submitted 18 July, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  38. arXiv:2404.15660  [pdf, other]

    cs.CL

    KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

    Authors: Xinxin Zheng, Feihu Che, Jinyang Wu, Shuai Zhang, Shuai Nie, Kang Liu, Jianhua Tao

    Abstract: Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noise i…

    Submitted 24 April, 2024; originally announced April 2024.

  39. arXiv:2404.09606  [pdf, other]

    cs.LG cs.AI q-bio.QM

    A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions

    Authors: Pengfei Liu, Jun Tao, Zhixiang Ren

    Abstract: The task of chemical reaction predictions (CRPs) plays a pivotal role in advancing drug discovery and material science. However, its effectiveness is constrained by the vast and uncertain chemical reaction space and challenges in capturing reaction selectivity, particularly due to existing methods' limitations in exploiting the data's inherent knowledge. To address these challenges, we introduce a…

    Submitted 15 April, 2024; originally announced April 2024.

  40. arXiv:2404.07454  [pdf, other]

    cs.LG cs.NI

    Representation Learning of Tangled Key-Value Sequence Data for Early Classification

    Authors: Tao Duan, Junzhou Zhao, Shuo Zhang, Jing Tao, Pinghui Wang

    Abstract: Key-value sequence data has become ubiquitous and naturally appears in a variety of real-world applications, ranging from the user-product purchasing sequences in e-commerce, to network packet sequences forwarded by routers in networking. Classifying these key-value sequences is important in many scenarios such as user profiling and malicious applications identification. In many time-sensitive sce…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 31 figures, Accepted by ICDE2024

  41. arXiv:2404.01089  [pdf, other]

    cs.CV cs.AI

    Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On

    Authors: Xu Yang, Changxing Ding, Zhibin Hong, Junhao Huang, Jin Tao, Xiangmin Xu

    Abstract: Image-based virtual try-on is an increasingly important task for online shopping. It aims to synthesize images of a specific person wearing a specified garment. Diffusion model-based approaches have recently become popular, as they are excellent at image synthesis tasks. However, these approaches usually employ additional image encoders and rely on the cross-attention mechanism for texture transfe…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  42. arXiv:2403.15044  [pdf, other]

    cs.CV cs.AI

    Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

    Authors: Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Multimodal fusion is a significant method for most multimodal tasks. With the recent surge in the number of large pre-trained models, combining both multimodal fusion methods and pre-trained model features can achieve outstanding performance in many multimodal tasks. In this paper, we present our approach, which leverages both advantages for addressing the task of Expression (Expr) Recognition and…

    Submitted 22 March, 2024; originally announced March 2024.

  43. arXiv:2403.01318  [pdf, other]

    stat.ML cs.LG econ.EM

    High-Dimensional Tail Index Regression: with An Application to Text Analyses of Viral Posts in Social Media

    Authors: Yuya Sasaki, Jing Tao, Yulong Wang

    Abstract: Motivated by the empirical power law of the distributions of credits (e.g., the number of "likes") of viral posts in social media, we introduce the high-dimensional tail index regression and methods of estimation and inference for its parameters. We propose a regularized estimator, establish its consistency, and derive its convergence rate. To conduct inference, we propose to debias the regularize…

    Submitted 2 March, 2024; originally announced March 2024.

  44. arXiv:2402.11432  [pdf, other]

    cs.CL

    Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning

    Authors: Kang Chen, Zheng Lian, Haiyang Sun, Rui Liu, Jiangyan Yi, Bin Liu, Jianhua Tao

    Abstract: Deception detection has attracted increasing attention due to its importance in real-world scenarios. Its main goal is to detect deceptive behaviors from multimodal clues such as gestures, facial expressions, prosody, etc. However, these bases are usually subjective and related to personal habits. Therefore, we extend deception detection to deception reasoning, further providing objective evidence…

    Submitted 13 August, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

  45. arXiv:2402.11082  [pdf, other]

    cs.CR cs.AI

    The AI Security Pyramid of Pain

    Authors: Chris M. Ward, Josh Harguess, Julia Tao, Daniel Christman, Paul Spicer, Mike Tan

    Abstract: We introduce the AI Security Pyramid of Pain, a framework that adapts the cybersecurity Pyramid of Pain to categorize and prioritize AI-specific threats. This framework provides a structured approach to understanding and addressing various levels of AI threats. Starting at the base, the pyramid emphasizes Data Integrity, which is essential for the accuracy and reliability of datasets and AI models…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: SPIE DCS 2024

  46. arXiv:2402.04119  [pdf, other]

    cs.LG cs.CE

    Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science

    Authors: Pengfei Liu, Jun Tao, Zhixiang Ren

    Abstract: Efficient molecular modeling and design are crucial for the discovery and exploration of novel molecules, and the incorporation of deep learning methods has revolutionized this field. In particular, large language models (LLMs) offer a fresh approach to tackle scientific problems from a natural language processing (NLP) perspective, introducing a research paradigm called scientific language modeli…

    Submitted 6 February, 2024; originally announced February 2024.

  47. Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion

    Authors: Cunhang Fan, Yujie Chen, Jun Xue, Yonghui Kong, Jianhua Tao, Zhao Lv

    Abstract: In recent years, knowledge graph completion (KGC) models based on pre-trained language models (PLMs) have shown promising results. However, the large number of parameters and high computational cost of PLMs pose challenges for their application in downstream tasks. This paper proposes a progressive distillation method based on masked generation features for the KGC task, aiming to significantly re…

    Submitted 10 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI2024

    Journal ref: (2024) Vol. 38 No. 8: AAAI-24 Technical Tracks 8, Proceedings of the AAAI Conference on Artificial Intelligence, 38(8), 8380-8388

  48. arXiv:2401.10273  [pdf]

    cs.CY cs.AI

    Revolutionizing Pharma: Unveiling the AI and LLM Trends in the Pharmaceutical Industry

    Authors: Yu Han, Jingwen Tao

    Abstract: This document offers a critical overview of the emerging trends and significant advancements in artificial intelligence (AI) within the pharmaceutical industry. Detailing its application across key operational areas, including research and development, animal testing, clinical trials, hospital clinical stages, production, regulatory affairs, quality control and other supporting areas, the paper ca…

    Submitted 21 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

  49. arXiv:2401.09750  [pdf, other]

    cs.LG

    Exploration and Anti-Exploration with Distributional Random Network Distillation

    Authors: Kai Yang, Jian Tao, Jiafei Lyu, Xiu Li

    Abstract: Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND,…

    Submitted 19 May, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: ICML 2024 accepted

  50. arXiv:2401.05698  [pdf, other]

    cs.CV cs.HC cs.MM cs.SD eess.AS

    HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition

    Authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Audio-Visual Emotion Recognition (AVER) has garnered increasing attention in recent years for its critical role in creating emotion-aware intelligent machines. Previous efforts in this area are dominated by the supervised learning paradigm. Despite significant progress, supervised learning is meeting its bottleneck due to the longstanding data scarcity issue in AVER. Motivated by recent advances in…

    Submitted 1 April, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by Information Fusion. The code is available at https://github.com/sunlicai/HiCMAE

    Journal ref: Information Fusion, 2024