Showing 1–50 of 229 results for author: Zhu, Q

Searching in archive eess.
  1. arXiv:2408.11480  [pdf, other]

    eess.IV cs.CV

    OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal

    Authors: Qiao Mo, Yukang Ding, Jinhua Hao, Qiang Zhu, Ming Sun, Chao Zhou, Feiyu Chen, Shuyuan Zhu

    Abstract: Deep learning-based methods have shown remarkable performance in single JPEG artifacts removal task. However, existing methods tend to degrade on double JPEG images, which are prevalent in real-world scenarios. To address this issue, we propose Offset-Aware Partition Transformer for double JPEG artifacts removal, termed as OAPT. We conduct an analysis of double JPEG compression that results in up…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures. Codes and models are available at https://github.com/QMoQ/OAPT.git

  2. arXiv:2408.08883  [pdf]

    eess.IV

    MR Optimized Reconstruction of Simultaneous Multi-Slice Imaging Using Diffusion Model

    Authors: Ting Zhao, Zhuoxu Cui, Sen Jia, Qingyong Zhu, Congcong Liu, Yihang Zhou, Yanjie Zhu, Dong Liang, Haifeng Wang

    Abstract: Diffusion model has been successfully applied to MRI reconstruction, including single and multi-coil acquisition of MRI data. Simultaneous multi-slice imaging (SMS), as a method for accelerating MR acquisition, can significantly reduce scanning time, but further optimization of reconstruction results is still possible. In order to optimize the reconstruction of SMS, we proposed a method to use dif…

    Submitted 21 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted as ISMRM 2024 Digital Poster 4024

    Journal ref: ISMRM 2024 Digital poster 4024

  3. arXiv:2408.02208  [pdf, other]

    eess.SY cs.LG physics.soc-ph

    Multi-level Traffic-Responsive Tilt Camera Surveillance through Predictive Correlated Online Learning

    Authors: Tao Li, Zilin Bian, Haozhe Lei, Fan Zuo, Ya-Ting Yang, Quanyan Zhu, Zhenning Li, Kaan Ozbay

    Abstract: In urban traffic management, the primary challenge of dynamically and efficiently monitoring traffic conditions is compounded by the insufficient utilization of thousands of surveillance cameras along the intelligent transportation system. This paper introduces the multi-level Traffic-responsive Tilt Camera surveillance system (TTC-X), a novel framework designed for dynamic and efficient monitorin…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted to Transportation Research Part C special issue: Modelling, Learning, and Control of Conventional, Cooperative and Automated Motorway and Urban Traffic Systems

  4. arXiv:2407.19316  [pdf]

    eess.IV cs.AI cs.CV

    AResNet-ViT: A Hybrid CNN-Transformer Network for Benign and Malignant Breast Nodule Classification in Ultrasound Images

    Authors: Xin Zhao, Qianqian Zhu, Jialing Wu

    Abstract: To address the challenges of similarity between lesions and surrounding tissues, overlapping appearances of partially benign and malignant nodules, and difficulty in classification, a deep learning network that integrates CNN and Transformer is proposed for the classification of benign and malignant breast lesions in ultrasound images. This network adopts a dual-branch architecture for local-globa…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures

  5. arXiv:2407.17737  [pdf]

    eess.SY

    Control Informed Design of the IAC Autonomous Racecar for Operation at the Dynamic Envelope

    Authors: Qilun Zhu, Matthias Schmid, Robert Prucka, Ashley Boncimino, Chris Paredis

    Abstract: This article introduces the hardware-software co-design of the control system for an autonomy-enabled formula-style high-speed racecar that will be utilized as the deployment platform for high-level autonomy in the first ever head-to-head driverless race called the Indy Autonomous Challenge. The embedded control system needs to facilitate autonomous functionality, including perception, localizatio…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 6 pages, 6 figures, 2021 IEEE International Conference on Robotics and Automation, Workshop on Opportunities and Challenges with Autonomous Racing

  6. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  7. arXiv:2407.05617  [pdf, other]

    eess.IV

    LINEAR: Learning Implicit Neural Representation With Explicit Physical Priors for Accelerated Quantitative T1rho Mapping

    Authors: Yuanyuan Liu, Jinwen Xie, Zhuo-Xu Cui, Qingyong Zhu, Jing Cheng, Dong Liang, Yanjie Zhu

    Abstract: Quantitative T1rho mapping has shown promise in clinical and research studies. However, it suffers from long scan times. Deep learning-based techniques have been successfully applied in accelerated quantitative MR parameter mapping. However, most methods require fully-sampled training dataset, which is impractical in the clinic. In this study, a novel subject-specific unsupervised method based on…

    Submitted 23 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Yuanyuan Liu and Jinwen Xie contributed equally to this work

  8. arXiv:2407.04936  [pdf, other]

    cs.SD eess.AS

    A Reference-free Metric for Language-Queried Audio Source Separation using Contrastive Language-Audio Pretraining

    Authors: Feiyang Xiao, Jian Guan, Qiaoxi Zhu, Xubo Liu, Wenbo Wang, Shuhan Qi, Kejia Zhang, Jianyuan Sun, Wenwu Wang

    Abstract: Language-queried audio source separation (LASS) aims to separate an audio source guided by a text query, with the signal-to-distortion ratio (SDR)-based metrics being commonly used to objectively measure the quality of the separated audio. However, the SDR-based metrics require a reference signal, which is often difficult to obtain in real-world scenarios. In addition, with the SDR-based metrics,…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE 2024 Workshop

  9. arXiv:2406.15222  [pdf]

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan, et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed…

    Submitted 16 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.09693  [pdf, other]

    cs.CV eess.IV

    Compressed Video Quality Enhancement with Temporal Group Alignment and Fusion

    Authors: Qiang Zhu, Yajun Qiu, Yu Liu, Shuyuan Zhu, Bing Zeng

    Abstract: In this paper, we propose a temporal group alignment and fusion network to enhance the quality of compressed videos by using the long-short term correlations between frames. The proposed model consists of the intra-group feature alignment (IntraGFA) module, the inter-group feature fusion (InterGFF) module, and the feature enhancement (FE) module. We form the group of pictures (GoP) by selecting fr…

    Submitted 13 June, 2024; originally announced June 2024.

  11. arXiv:2406.06964  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    Missingness-resilient Video-enhanced Multimodal Disfluency Detection

    Authors: Payal Mohapatra, Shamika Likhite, Subrata Biswas, Bashima Islam, Qi Zhu

    Abstract: Most existing speech disfluency detection techniques only rely upon acoustic data. In this work, we present a practical multimodal disfluency detection approach that leverages available video data together with audio. We curate an audiovisual dataset and propose a novel fusion technique with unified weight-sharing modality-agnostic encoders to learn the temporal and semantic context. Our resilient…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  12. arXiv:2406.03688  [pdf, other]

    eess.IV cs.CV

    Shadow and Light: Digitally Reconstructed Radiographs for Disease Classification

    Authors: Benjamin Hou, Qingqing Zhu, Tejas Sudarshan Mathai, Qiao Jin, Zhiyong Lu, Ronald M. Summers

    Abstract: In this paper, we introduce DRR-RATE, a large-scale synthetic chest X-ray dataset derived from the recently released CT-RATE dataset. DRR-RATE comprises of 50,188 frontal Digitally Reconstructed Radiographs (DRRs) from 21,304 unique patients. Each image is paired with a corresponding radiology text report and binary labels for 18 pathology classes. Given the controllable nature of DRR generation,…

    Submitted 5 June, 2024; originally announced June 2024.

  13. arXiv:2405.17370  [pdf, other]

    eess.SY cs.LG

    Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators

    Authors: Yunian Pan, Quanyan Zhu

    Abstract: Meta-learning has been proposed as a promising machine learning topic in recent years, with important applications to image classification, robotics, computer games, and control systems. In this paper, we study the problem of using meta-learning to deal with uncertainty and heterogeneity in ergodic linear quadratic regulators. We integrate the zeroth-order optimization technique with a typical met…

    Submitted 27 May, 2024; originally announced May 2024.

  14. arXiv:2405.10025  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models

    Authors: Yuchen Hu, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng, Ruizhe Li

    Abstract: Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which aims to predict the ground-truth transcription from the decoded N-best hypotheses. Thanks to the strong language generation ability of LLMs and rich information in the N-best list, GER shows great effectiveness in enhancing ASR results. However, it still suf…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 14 pages, Accepted by ACL 2024

  15. arXiv:2404.15750  [pdf, other]

    eess.SP

    A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems

    Authors: Xin Jin, Tiejun Lv, Wei Ni, Zhipeng Lin, Qiuming Zhu, Ekram Hossain, H. Vincent Poor

    Abstract: Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-pl…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures, Accepted by IEEE TWC

  16. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  17. arXiv:2404.01205  [pdf, other]

    eess.SY cs.CR cs.GT

    Foundations of Cyber Resilience: The Confluence of Game, Control, and Learning Theories

    Authors: Quanyan Zhu

    Abstract: Cyber resilience is a complementary concept to cybersecurity, focusing on the preparation, response, and recovery from cyber threats that are challenging to prevent. Organizations increasingly face such threats in an evolving cyber threat landscape. Understanding and establishing foundations for cyber resilience provide a quantitative and systematic approach to cyber risk assessment, mitigation po…

    Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  18. arXiv:2403.15636  [pdf, ps, other]

    cs.GT eess.SY

    On the Variational Interpretation of Mirror Play in Monotone Games

    Authors: Yunian Pan, Tao Li, Quanyan Zhu

    Abstract: Mirror play (MP) is a well-accepted primal-dual multi-agent learning algorithm where all agents simultaneously implement mirror descent in a distributed fashion. The advantage of MP over vanilla gradient play lies in its usage of mirror maps that better exploit the geometry of decision domains. Despite extensive literature dedicated to the asymptotic convergence of MP to equilibrium, the understan…

    Submitted 22 March, 2024; originally announced March 2024.

  19. arXiv:2403.10362  [pdf, other]

    eess.IV cs.CV

    CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement

    Authors: Qiang Zhu, Jinhua Hao, Yukang Ding, Yu Liu, Qiao Mo, Ming Sun, Chao Zhou, Shuyuan Zhu

    Abstract: Recently, numerous approaches have achieved notable success in compressed video quality enhancement (VQE). However, these methods usually ignore the utilization of valuable coding priors inherently embedded in compressed videos, such as motion vectors and residual frames, which carry abundant temporal and spatial information. To remedy this problem, we propose the Coding Priors-Guided Aggregation…

    Submitted 15 March, 2024; originally announced March 2024.

  20. arXiv:2403.06299  [pdf, other]

    eess.SY cs.GT math.OC

    Disentangling Resilience from Robustness: Contextual Dualism, Interactionism, and Game-Theoretic Paradigms

    Authors: Quanyan Zhu, Tamer Basar

    Abstract: This article explains the distinctions between robustness and resilience in control systems. Resilience confronts a distinct set of challenges, posing new ones for designing controllers for feedback systems, networks, and machines that prioritize resilience over robustness. The concept of resilience is explored through a three-stage model, emphasizing the need for a proactive preparation and autom…

    Submitted 10 March, 2024; originally announced March 2024.

  21. arXiv:2403.00972  [pdf, other]

    cs.GT eess.SY

    Understanding Police Force Resource Allocation using Adversarial Optimal Transport with Incomplete Information

    Authors: Yinan Hu, Juntao Chen, Quanyan Zhu

    Abstract: Adversarial optimal transport has been proven useful as a mathematical formulation to model resource allocation problems to maximize the efficiency of transportation with an adversary, who modifies the data. It is often the case, however, that only the adversary knows which nodes are malicious and which are not. In this paper we formulate the problem of seeking adversarial optimal transport into B…

    Submitted 1 March, 2024; originally announced March 2024.

  22. arXiv:2402.18781  [pdf, ps, other]

    cs.GT cs.LG eess.SY

    Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games

    Authors: Tao Li, Kim Hammar, Rolf Stadler, Quanyan Zhu

    Abstract: Asymmetric information stochastic games (AISGs) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for AISGs are primarily offline and can not adapt to equilibrium deviations. Further, current methods are limited to particular information structures to avoid belief hierarchies. Considering these limitations, we propo…

    Submitted 19 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to the 63rd IEEE Conference on Decision and Control, Special Session on Networks, Games and Learning

  23. arXiv:2402.17718  [pdf]

    cs.LG eess.SP

    Towards a Digital Twin Framework in Additive Manufacturing: Machine Learning and Bayesian Optimization for Time Series Process Optimization

    Authors: Vispi Karkaria, Anthony Goeckner, Rujing Zha, Jie Chen, Jianjing Zhang, Qi Zhu, Jian Cao, Robert X. Gao, Wei Chen

    Abstract: Laser-directed-energy deposition (DED) offers advantages in additive manufacturing (AM) for creating intricate geometries and material grading. Yet, challenges like material inconsistency and part variability remain, mainly due to its layer-wise fabrication. A key issue is heat accumulation during DED, which affects the material microstructure and properties. While closed-loop control methods for…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 Pages, 10 Figures, 1 Table, NAMRC Conference

  24. arXiv:2402.15985  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Phonetic and Lexical Discovery of a Canine Language using HuBERT

    Authors: Xingyuan Li, Sinong Wang, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

    Abstract: This paper delves into the pioneering exploration of potential communication patterns within dog vocalizations and transcends traditional linguistic analysis barriers, which heavily relies on human priori knowledge on limited datasets to find sound units in dog vocalization. We present a self-supervised approach with HuBERT, enabling the accurate classification of phoneme labels and the identifica…

    Submitted 24 February, 2024; originally announced February 2024.

  25. arXiv:2402.12499  [pdf, other]

    cs.GT cs.AI cs.CR cs.LG eess.SY

    Automated Security Response through Online Learning with Adaptive Conjectures

    Authors: Kim Hammar, Tao Li, Rolf Stadler, Quanyan Zhu

    Abstract: We study automated security response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed, non-stationary game. We relax the standard assumption that the game model is correctly specified and consider that each player has a probabilistic conjecture about the model, which may be misspecified in the sense that the true model has probabilit…

    Submitted 23 July, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  26. arXiv:2402.05960  [pdf, other]

    cs.LG eess.SP

    Phase-driven Domain Generalizable Learning for Nonstationary Time Series

    Authors: Payal Mohapatra, Lixu Wang, Qi Zhu

    Abstract: Monitoring and recognizing patterns in continuous sensing data is crucial for many practical applications. These real-world time-series data are often nonstationary, characterized by varying statistical and spectral properties over time. This poses a significant challenge in developing learning models that can effectively generalize across different distributions. In this work, based on our observ…

    Submitted 4 February, 2024; originally announced February 2024.

  27. arXiv:2402.03141  [pdf, other]

    cs.LG cs.AI eess.SY

    Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

    Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

    Abstract: Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary ta…

    Submitted 5 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  28. arXiv:2401.03468  [pdf, other]

    eess.AS cs.SD

    Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

    Authors: Qiushi Zhu, Jie Zhang, Yu Gu, Yuchen Hu, Lirong Dai

    Abstract: Self-supervised speech pre-training methods have developed rapidly in recent years, which show to be very effective for many near-field single-channel speech tasks. However, far-field multichannel speech processing is suffering from the scarcity of labeled multichannel data and complex ambient noises. The efficacy of self-supervised learning for far-field multichannel and multi-modal speech proces…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  29. arXiv:2401.01963  [pdf, other]

    eess.SY

    Integrated Cyber-Physical Resiliency for Power Grids under IoT-Enabled Dynamic Botnet Attacks

    Authors: Yuhan Zhao, Juntao Chen, Quanyan Zhu

    Abstract: The wide adoption of Internet of Things (IoT)-enabled energy devices improves the quality of life, but simultaneously, it enlarges the attack surface of the power grid system. The adversary can gain illegitimate control of a large number of these devices and use them as a means to compromise the physical grid operation, a mechanism known as the IoT botnet attack. This paper aims to improve the res…

    Submitted 3 January, 2024; originally announced January 2024.

  30. arXiv:2312.00812  [pdf, other]

    cs.AI cs.LG eess.SY

    Empowering Autonomous Driving with Large Language Models: A Safety Perspective

    Authors: Yixuan Wang, Ruochen Jiao, Sinong Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu

    Abstract: Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data. To this end, this paper explores the integration of Large Language Models (LLMs) into AD systems, leveraging their rob…

    Submitted 22 March, 2024; v1 submitted 27 November, 2023; originally announced December 2023.

    Comments: Accepted to LLMAgent workshop @ICLR2024

  31. arXiv:2311.02227  [pdf, other]

    cs.LG cs.AI eess.SY

    State-Wise Safe Reinforcement Learning With Pixel Observations

    Authors: Simon Sinong Zhan, Yixuan Wang, Qingyuan Wu, Ruochen Jiao, Chao Huang, Qi Zhu

    Abstract: In the context of safe exploration, Reinforcement Learning (RL) has long grappled with the challenges of balancing the tradeoff between maximizing rewards and minimizing safety violations, particularly in complex environments with contact-rich or non-smooth dynamics, and when dealing with high-dimensional pixel observations. Furthermore, incorporating state-wise safety constraints in the explorati…

    Submitted 11 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: 10 pages, 5 figures

  32. arXiv:2310.14173  [pdf, other]

    cs.SD eess.AS

    First-Shot Unsupervised Anomalous Sound Detection With Unknown Anomalies Estimated by Metadata-Assisted Audio Generation

    Authors: Hejing Zhang, Qiaoxi Zhu, Jian Guan, Haohe Liu, Feiyang Xiao, Jiantong Tian, Xinhao Mei, Xubo Liu, Wenwu Wang

    Abstract: First-shot (FS) unsupervised anomalous sound detection (ASD) is a brand-new task introduced in DCASE 2023 Challenge Task 2, where the anomalous sounds for the target machine types are unseen in training. Existing methods often rely on the availability of normal and abnormal sound data from the target machines. However, due to the lack of anomalous sound data for the target machine types, it become…

    Submitted 11 March, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted at ICASSP 2024

  33. arXiv:2310.08950  [pdf, ps, other]

    cs.SD eess.AS

    Transformer-based Autoencoder with ID Constraint for Unsupervised Anomalous Sound Detection

    Authors: Jian Guan, Youde Liu, Qiuqiang Kong, Feiyang Xiao, Qiaoxi Zhu, Jiantong Tian, Wenwu Wang

    Abstract: Unsupervised anomalous sound detection (ASD) aims to detect unknown anomalous sounds of devices when only normal sound data is available. The autoencoder (AE) and self-supervised learning based methods are two mainstream methods. However, the AE-based methods could be limited as the feature learned from normal sounds can also fit with anomalous sounds, reducing the ability of the model in detectin…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by EURASIP Journal on Audio, Speech, and Music Processing

  34. arXiv:2310.07323  [pdf]

    cs.LG eess.SY

    Multichannel consecutive data cross-extraction with 1DCNN-attention for diagnosis of power transformer

    Authors: Wei Zheng, Guogang Zhang, Chenchen Zhao, Qianqian Zhu

    Abstract: Power transformer plays a critical role in grid infrastructure, and its diagnosis is paramount for maintaining stable operation. However, the current methods for transformer diagnosis focus on discrete dissolved gas analysis, neglecting deep feature extraction of multichannel consecutive data. The unutilized sequential data contains the significant temporal information reflecting the transformer c…

    Submitted 11 October, 2023; originally announced October 2023.

  35. arXiv:2310.01646  [pdf, other]

    cs.GT eess.SY

    Strategic Information Attacks on Incentive-Compatible Navigational Recommendations in Intelligent Transportation Systems

    Authors: Ya-Ting Yang, Haozhe Lei, Quanyan Zhu

    Abstract: Intelligent transportation systems (ITS) have gained significant attention from various communities, driven by rapid advancements in informational technology. Within the realm of ITS, navigational recommendation systems (RS) play a pivotal role, as users often face diverse path (route) options in such complex urban environments. However, RS is not immune to vulnerabilities, especially when confron…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures

  36. arXiv:2309.09705  [pdf, other]

    cs.SD eess.AS

    Synth-AC: Enhancing Audio Captioning with Synthetic Supervision

    Authors: Feiyang Xiao, Qiaoxi Zhu, Jian Guan, Xubo Liu, Haohe Liu, Kejia Zhang, Wenwu Wang

    Abstract: Data-driven approaches hold promise for audio captioning. However, the development of audio captioning methods can be biased due to the limited availability and quality of text-audio data. This paper proposes a SynthAC framework, which leverages recent advances in audio generative models and commonly available text corpus to create synthetic text-audio pairs, thereby enhancing text-audio represent…

    Submitted 18 September, 2023; originally announced September 2023.

  37. arXiv:2309.09207  [pdf, ps, other]

    eess.SP

    Cramer-Rao Bound Optimization for Active RIS-Empowered ISAC Systems

    Authors: Qi Zhu, Ming Li, Rang Liu, Qian Liu

    Abstract: Integrated sensing and communication (ISAC), which simultaneously performs sensing and communication functions within a shared frequency band and hardware platform, has emerged as a promising technology for future wireless systems. Nevertheless, the weak echo signal received by the low-sensitivity ISAC receiver significantly constrains sensing performance in scenarios involving obstructed targets.…

    Submitted 1 April, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: 13 pages, 10 figures, accepted by IEEE TWC

  38. arXiv:2309.07498  [pdf, other]

    eess.AS cs.SD

    Hierarchical Metadata Information Constrained Self-Supervised Learning for Anomalous Sound Detection Under Domain Shift

    Authors: Haiyan Lan, Qiaoxi Zhu, Jian Guan, Yuming Wei, Wenwu Wang

    Abstract: Self-supervised learning methods have achieved promising performance for anomalous sound detection (ASD) under domain shift, where the type of domain shift is considered in feature learning by incorporating section IDs. However, the attributes accompanying audio files under each section, such as machine operating conditions and noise types, have not been considered, although they are also crucial…

    Submitted 18 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: To appear at ICASSP 2024

  39. arXiv:2309.07474  [pdf, other]

    eess.SY

    A Fuzzy Cascaded Proportional-Derivative Controller for Under-actuated Flexible Joint Manipulators Using Bayesian Optimization

    Authors: Changyi Lei, Quanmin Zhu

    Abstract: This paper proposes a novel fuzzy cascaded Proportional-Derivative (PD) controller for under-actuated single-link flexible joint manipulators. The original flexible joint system is considered as two coupled $2^{nd}$-order sub-systems. The proposed controller is composed of two cascaded PD controllers and two fuzzy logic regulators (FLRs). The first (virtual) PD controller is used to generate desir…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 19 pages, 23 figures, 6 tables

    MSC Class: 93C42; 93D15; 93C42 (Primary); 93C10 (Secondary)

  40. arXiv:2309.02328  [pdf, other]

    cs.RO cs.AI eess.SY stat.ML

    Neurosymbolic Meta-Reinforcement Lookahead Learning Achieves Safe Self-Driving in Non-Stationary Environments

    Authors: Haozhe Lei, Quanyan Zhu

    Abstract: In the area of learning-driven artificial intelligence advancement, the integration of machine learning (ML) into self-driving (SD) technology stands as an impressive engineering feat. Yet, in real-world applications outside the confines of controlled laboratory scenarios, the deployment of self-driving technology assumes a life-critical role, necessitating heightened attention from researchers to…

    Submitted 5 September, 2023; originally announced September 2023.

  41. arXiv:2308.14553  [pdf, other]

    eess.AS cs.SD

    Rep2wav: Noise Robust text-to-speech Using self-supervised representations

    Authors: Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang

    Abstract: Benefiting from the development of deep learning, text-to-speech (TTS) techniques using clean speech have achieved significant performance improvements. The data collected from real scenes often contains noise and generally needs to be denoised by speech enhancement models. Noise-robust TTS models are often trained using the enhanced speech, which thus suffer from speech distortion and background…

    Submitted 3 September, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: 5 pages, 2 figures

  42. arXiv:2308.14063  [pdf, other]

    cs.SD eess.AS

    Anomalous Sound Detection Using Self-Attention-Based Frequency Pattern Analysis of Machine Sounds

    Authors: Hejing Zhang, Jian Guan, Qiaoxi Zhu, Feiyang Xiao, Youde Liu

    Abstract: Different machines can exhibit diverse frequency patterns in their emitted sound. This feature has been recently explored in anomaly sound detection and reached state-of-the-art performance. However, existing methods rely on the manual or empirical determination of the frequency filter by observing the effective frequency range in the training data, which may be impractical for general application…

    Submitted 6 September, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Published in INTERSPEECH 2023

  43. arXiv:2308.05862  [pdf, other]

    eess.IV cs.AI cs.CV

    Unleashing the Strengths of Unlabeled Data in Pan-cancer Abdominal Organ Quantification: the FLARE22 Challenge

    Authors: Jun Ma, Yao Zhang, Song Gu, Cheng Ge, Shihao Ma, Adamo Young, Cheng Zhu, Kangkang Meng, Xin Yang, Ziyan Huang, Fan Zhang, Wentao Liu, YuanKe Pan, Shoujin Huang, Jiacheng Wang, Mingze Sun, Weixin Xu, Dengqiang Jia, Jae Won Choi, Natália Alves, Bram de Wilde, Gregor Koehler, Yajun Wu, Manuel Wiesenfarth, Qiongjie Zhu, et al. (4 additional authors not shown)

    Abstract: Quantitative organ assessment is an essential step in automated abdominal disease diagnosis and treatment planning. Artificial intelligence (AI) has shown great potential to automatize this process. However, most existing AI algorithms rely on many expert annotations and lack a comprehensive evaluation of accuracy and efficiency in real-world multinational settings. To overcome these limitations,…

    Submitted 10 August, 2023; originally announced August 2023.

    Comments: MICCAI FLARE22: https://flare22.grand-challenge.org/

  44. arXiv:2307.08029  [pdf, other]

    eess.AS cs.LG cs.SD

    Noise-aware Speech Enhancement using Diffusion Probabilistic Model

    Authors: Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

    Abstract: With recent advances of diffusion model, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of clean speech, underexploiting the varying noise information in real world. In this paper, we propose a noise-aware speech enhancement (NASE) approach that extract…

    Submitted 4 June, 2024; v1 submitted 16 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, Accepted by InterSpeech 2024

  45. arXiv:2306.10563  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition

    Authors: Yuchen Hu, Ruizhe Li, Chen Chen, Chengwei Qin, Qiushi Zhu, Eng Siong Chng

    Abstract: Audio-visual speech recognition (AVSR) provides a promising solution to ameliorate the noise-robustness of audio-only speech recognition with visual information. However, most existing efforts still focus on audio modality to improve robustness considering its dominance in AVSR task, with noise adaptation techniques such as front-end denoise processing. Though effective, these methods are usually…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: 19 pages, 9 figures, Accepted by ACL 2023

  46. arXiv:2305.16072  [pdf, other]

    eess.IV

    VEDA: Uneven light image enhancement via a vision-based exploratory data analysis model

    Authors: Tian Pu, Shuhang Wang, Zhenming Peng, Qingsong Zhu

    Abstract: Uneven light image enhancement is a highly demanded task in many industrial image processing applications. Many existing enhancement methods using physical lighting models or deep-learning techniques often lead to unnatural results. This is mainly because: 1) the assumptions and priors made by the physical lighting model (PLM) based approaches are often violated in most natural scenes, and 2) the…

    Submitted 25 May, 2023; originally announced May 2023.

  47. arXiv:2305.13957  [pdf, other]

    eess.AS

    Eeg2vec: Self-Supervised Electroencephalographic Representation Learning

    Authors: Qiushi Zhu, Xiaoying Zhao, Jie Zhang, Yu Gu, Chao Weng, Yuchen Hu

    Abstract: Recently, many efforts have been made to explore how the brain processes speech using electroencephalographic (EEG) signals, where deep learning-based approaches were shown to be applicable in this field. In order to decode speech signals from EEG signals, linear networks, convolutional neural networks (CNN) and long short-term memory networks are often used in a supervised manner. Recording EEG-s…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: 5 pages

  48. arXiv:2305.13770  [pdf, other]

    cs.CV eess.IV

    MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Qingpeng Zhu, Qianhui Sun, Wenxiu Sun, Chen Change Loy, Jinwei Gu

    Abstract: Developing and integrating advanced image sensors with novel algorithms in camera systems are prevalent with the increasing demand for computational photography and imaging on mobile platforms. However, the lack of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: CVPR 2023 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2023/

  49. arXiv:2305.09994  [pdf, other]

    eess.AS cs.AI cs.SD

    BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

    Authors: Jie Zhang, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling

    Abstract: Time-domain single-channel speech enhancement (SE) still remains challenging to extract the target speaker without any prior information on multi-talker conditions. It has been shown via auditory attention decoding that the brain activity of the listener contains the auditory information of the attended speaker. In this paper, we thus propose a novel time-domain brain-assisted SE network (BASEN) i…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Submitted to ISCA Interspeech 2023

  50. arXiv:2305.09212  [pdf, other]

    eess.AS cs.CV cs.MM cs.SD

    Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition

    Authors: Yuchen Hu, Ruizhe Li, Chen Chen, Heqing Zou, Qiushi Zhu, Eng Siong Chng

    Abstract: Audio-visual speech recognition (AVSR) research has gained a great success recently by improving the noise-robustness of audio-only automatic speech recognition (ASR) with noise-invariant visual information. However, most existing AVSR approaches simply fuse the audio and visual features by concatenation, without explicit interactions to capture the deep correlations between them, which results in…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 12 pages, 5 figures, Accepted by IJCAI 2023