Zum Hauptinhalt springen

Showing 1–50 of 307 results for author: Wu, X

Searching in archive eess. Search in all archives.
.
  1. arXiv:2408.13893  [pdf, other

    cs.SD cs.CL eess.AS

    SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Rongjie Huang, Yuanyuan Wang, Haohan Guo, Dading Chong, Songxiang Liu, Xixin Wu, Helen Meng

    Abstract: Scaling Text-to-speech (TTS) to large-scale datasets has been demonstrated as an effective method for improving the diversity and naturalness of synthesized speech. At the high level, previous large-scale TTS models can be categorized into either Auto-regressive (AR) based (\textit{e.g.}, VALL-E) or Non-auto-regressive (NAR) based models (\textit{e.g.}, NaturalSpeech 2/3). Although these works dem… ▽ More

    Submitted 28 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

    Comments: Submit to TASLP

  2. arXiv:2408.07770  [pdf, ps, other

    eess.SP

    User-Centric Machine Learning for Resource Allocation in MPTCP-Enabled Hybrid LiFi and WiFi Networks

    Authors: Han Ji, Declan T. Delaney, Xiping Wu

    Abstract: As an emerging paradigm of heterogeneous networks (HetNets) towards 6G, the hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) have potential to explore the complementary advantages of the optical and radio spectra. Like other cooperation-native HetNets, HLWNets face a crucial load balancing (LB) problem due to the heterogeneity of access points (APs). The existing litera… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  3. arXiv:2408.06185  [pdf, other

    eess.SY cs.CY cs.GT cs.NI

    Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game

    Authors: Xuesong Wu, Tianshuai Zheng, Runfang Wu, Jie Ren, Junyan Guo, Ye Du

    Abstract: As more and more Internet of Thing (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system's accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work id… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  4. arXiv:2408.04300  [pdf, other

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images included COVID-19, common pneumonia, and normal to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

  5. arXiv:2408.02966  [pdf, other

    cs.CV eess.IV

    Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement

    Authors: Hao Xu, Xi Zhang, Xiaolin Wu

    Abstract: Compressing a set of unordered points is far more challenging than compressing images/videos of regular sample grids, because of the difficulties in characterizing neighboring relations in an irregular layout of points. Many researchers resort to voxelization to introduce regularity, but this approach suffers from quantization loss. In this research, we use the KNN method to determine the neighbor… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  6. arXiv:2407.20878  [pdf

    eess.IV cs.CV

    S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap

    Authors: Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: To acquire high-quality positron emission tomography (PET) images while reducing the radiation tracer dose, numerous efforts have been devoted to reconstructing standard-dose PET (SPET) images from low-dose PET (LPET). However, the success of current fully-supervised approaches relies on abundant paired LPET and SPET images, which are often unavailable in clinic. Moreover, these methods often mix… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  7. arXiv:2407.13509  [pdf, other

    cs.SD cs.CL cs.LG eess.AS

    Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

    Authors: Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng

    Abstract: Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent language model-based TTS systems can be trained on large, diverse, and low-quality speech datasets, resulting in highly natural synthesized speech. However, they are limited by the difficulty of simulating v… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  8. arXiv:2407.09817  [pdf, other

    cs.SD cs.CL eess.AS

    Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System

    Authors: Lingwei Meng, Jiawen Kang, Yuejiao Wang, Zengrui Jin, Xixin Wu, Xunying Liu, Helen Meng

    Abstract: Multi-talker speech recognition and target-talker speech recognition, both involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we propose a pioneering approach to empower Whisper, which is a speech foundation model, to tackle joint multi-talker and target-talker speech recogniti… ▽ More

    Submitted 24 August, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted to INTERSPEECH 2024

  9. arXiv:2407.08551  [pdf, other

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which are originally designed for audio compression and sacrifice fidelity compared to mel-spectrograms. Specifically, (i) instead of cross… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.04726  [pdf, other

    eess.SP cs.LG

    Data-Driven Prediction and Uncertainty Quantification of PWR Crud-Induced Power Shift Using Convolutional Neural Networks

    Authors: Aidan Furlong, Farah Alsafadi, Scott Palmtag, Andrew Godfrey, Xu Wu

    Abstract: The development of Crud-Induced Power Shift (CIPS) is an operational challenge in Pressurized Water Reactors that is due to the development of crud on the fuel rod cladding. The available predictive tools developed previously, usually based on fundamental physics, are computationally expensive and have shown differing degrees of accuracy. This work proposes a completely top-down approach to predic… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  11. arXiv:2407.02318  [pdf, other

    cs.SD cs.CV cs.LG eess.AS

    The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

    Authors: Yurui Huang, Yang Yang, Shou Chen, Xiangyu Wu, Qingguo Chen, Jianfeng Lu

    Abstract: In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features. High-quality visual features are extracted using a state-of-the-art self-supervised pre-training network, resulting in efficient video feature representations. At the same time, audio features serve as complementary information… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2407.01700  [pdf, other

    eess.SY

    Joint Design of Conventional Public Transport Network and Mobility on Demand

    Authors: Xiaoyi Wu, Nisrine Mouhrim, Andrea Araldo, Yves Molenbruch, Dominique Feillet, Kris Braekers

    Abstract: Conventional Public Transport (PT) is based on fixed lines, running with routes and schedules determined a-priori. In low-demand areas, conventional PT is inefficient. Therein, Mobility on Demand (MoD) could serve users more efficiently and with an improved quality of service (QoS). The idea of integrating MoD into PT is therefore abundantly discussed by researchers and practitioners, mainly in th… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 26th Euro Working Group on Transportation Meeting

  13. arXiv:2406.17338  [pdf, other

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  14. arXiv:2406.16200  [pdf, other

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural network for classification. In particular, our theoretical results show that neural ne… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  15. arXiv:2406.15222  [pdf

    eess.IV cs.AI cs.CV

    Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study

    Authors: Yujian Hu, Yilang Xiang, Yan-Jie Zhou, Yangyan He, Shifeng Yang, Xiaolong Du, Chunlan Den, Youyao Xu, Gaofeng Wang, Zhengyao Ding, Jingyong Huang, Wenjun Zhao, Xuejun Wu, Donglin Li, Qianqian Zhu, Zhenjiang Li, Chenyang Qiu, Ziheng Wu, Yunjun He, Chen Tian, Yihui Qiu, Zuodong Lin, Xiaolong Zhang, Yuan He, Zhenpeng Yuan , et al. (15 additional authors not shown)

    Abstract: Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed… ▽ More

    Submitted 16 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  16. arXiv:2406.14092  [pdf, other

    cs.CL eess.AS

    Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models

    Authors: Jing Xu, Minglin Wu, Xixin Wu, Helen Meng

    Abstract: Self-supervised (SSL) models have shown great performance in various downstream tasks. However, they are typically developed for limited languages, and may encounter new languages in real-world. Developing a SSL model for each new language is costly. Thus, it is vital to figure out how to efficiently adapt existed SSL models to a new language without impairing its original abilities. We propose ad… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  17. arXiv:2406.13150  [pdf

    eess.IV cs.CV

    MCAD: Multi-modal Conditioned Adversarial Diffusion Model for High-Quality PET Image Reconstruction

    Authors: Jiaqi Cui, Xinyi Zeng, Pinxian Zeng, Bo Liu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: Radiation hazards associated with standard-dose positron emission tomography (SPET) images remain a concern, whereas the quality of low-dose PET (LPET) images fails to meet clinical requirements. Therefore, there is great interest in reconstructing SPET images from LPET images. However, prior studies focus solely on image data, neglecting vital complementary information from other modalities, e.g.… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early accepted by MICCAI2024

  18. arXiv:2406.12646  [pdf, other

    eess.IV cs.AI cs.CV

    An Empirical Study on the Fairness of Foundation Models for Multi-Organ Image Segmentation

    Authors: Qin Li, Yizhe Zhang, Yan Li, Jun Lyu, Meng Liu, Longyu Sun, Mengting Sun, Qirong Li, Wenyue Mao, Xinran Wu, Yajing Zhang, Yinghua Chu, Shuo Wang, Chengyan Wang

    Abstract: The segmentation foundation model, e.g., Segment Anything Model (SAM), has attracted increasing interest in the medical image community. Early pioneering studies primarily concentrated on assessing and improving SAM's performance from the perspectives of overall accuracy and efficiency, yet little attention was given to the fairness considerations. This oversight raises questions about the potenti… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to MICCAI-2024

  19. arXiv:2406.10056  [pdf, other

    cs.SD eess.AS

    UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

    Authors: Dongchao Yang, Haohan Guo, Yuanyuan Wang, Rongjie Huang, Xiang Li, Xu Tan, Xixin Wu, Helen Meng

    Abstract: The Large Language models (LLMs) have demonstrated supreme capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach, empowering the frozen LLMs to achieve multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel and LLMs-dr… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  20. arXiv:2406.09356  [pdf, other

    cs.CV eess.IV

    CMC-Bench: Towards a New Paradigm of Visual Signal Compression

    Authors: Chunyi Li, Xiele Wu, Haoning Wu, Donghui Feng, Zicheng Zhang, Guo Lu, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin

    Abstract: Ultra-low bitrate image compression is a challenging and demanding topic. With the development of Large Multimodal Models (LMMs), a Cross Modality Compression (CMC) paradigm of Image-Text-Image has emerged. Compared with traditional codecs, this semantic-level compression can reduce image data size to 0.1\% or even lower, which has strong potential applications. However, CMC has certain defects in… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  21. arXiv:2406.08716  [pdf, other

    cs.SD eess.AS

    TSE-PI: Target Sound Extraction under Reverberant Environments with Pitch Information

    Authors: Yiwen Wang, Xihong Wu

    Abstract: Target sound extraction (TSE) separates the target sound from the mixture signals based on provided clues. However, the performance of existing models significantly degrades under reverberant conditions. Inspired by auditory scene analysis (ASA), this work proposes a TSE model provided with pitch information named TSE-PI. Conditional pitch extraction is achieved through the Feature-wise Linearly M… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech2024

  22. arXiv:2406.08336  [pdf, other

    cs.SD cs.CV eess.AS

    CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction

    Authors: Xueyuan Chen, Dongchao Yang, Dingdong Wang, Xixin Wu, Zhiyong Wu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech. It still suffers from low speaker similarity and poor prosody naturalness. In this paper, we propose a multi-modal DSR model by leveraging neural codec language modeling to improve the reconstruction results, especially for the speaker similarity and prosody naturalness. Our proposed model consists of: (… ▽ More

    Submitted 24 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2406.02940  [pdf, other

    cs.SD eess.AS

    Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

    Authors: Haohan Guo, Fenglong Xie, Dongchao Yang, Hui Lu, Xixin Wu, Helen Meng

    Abstract: VQ-VAE, as a mainstream approach of speech tokenizer, has been troubled by ``index collapse'', where only a small number of codewords are activated in large codebooks. This work proposes product-quantized (PQ) VAE with more codebooks but fewer codewords to address this problem and build large-codebook speech tokenizers. It encodes speech features into multiple VQ subspaces and composes them into c… ▽ More

    Submitted 5 June, 2024; originally announced June 2024.

  24. arXiv:2406.02328  [pdf, other

    cs.SD eess.AS

    SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

    Authors: Dongchao Yang, Dingdong Wang, Haohan Guo, Xueyuan Chen, Xixin Wu, Helen Meng

    Abstract: In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simpleness shows in three aspects: (1) It can be trained on the speech-only dataset, without any alignment information; (2) It directly takes plain text as input and generates speech through an NAR way; (3) It tries to model speech in a finite and compac… ▽ More

    Submitted 14 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  25. arXiv:2406.01313  [pdf, ps, other

    cs.IT eess.SP

    3D Trajectory Design for Energy-constrained Aerial CRNs Under Probabilistic LoS Channel

    Authors: Hongjiang Lei, Xiaqiu Wu, Ki-Hong Park, Gaofeng Pan

    Abstract: Unmanned aerial vehicles (UAVs) have been attracting significant attention because there is a high probability of line-of-sight links being obtained between them and terrestrial nodes in high-rise urban areas. In this work, we investigate cognitive radio networks (CRNs) by jointly designing three-dimensional (3D) trajectory, the transmit power of the UAV, and user scheduling. Considering the UAV's… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures,submitted to the IEEE journal for review

  26. arXiv:2405.19796  [pdf, other

    cs.SD cs.AI eess.AS

    Explainable Attribute-Based Speaker Verification

    Authors: Xiaoliang Wu, Chau Luu, Peter Bell, Ajitha Rajan

    Abstract: This paper proposes a fully explainable approach to speaker verification (SV), a task that fundamentally relies on individual speaker characteristics. The opaque use of speaker attributes in current SV systems raises concerns of trust. Addressing this, we propose an attribute-based explainable SV system that identifies speakers by comparing personal attributes such as gender, nationality, and age… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  27. arXiv:2405.17024  [pdf

    eess.SP

    Beware of Overestimated Decoding Performance Arising from Temporal Autocorrelations in Electroencephalogram Signals

    Authors: Xiran Xu, Bo Wang, Boda Xiao, Yadong Niu, Yiwen Wang, Xihong Wu, Jing Chen

    Abstract: Researchers have reported high decoding accuracy (>95%) using non-invasive Electroencephalogram (EEG) signals for brain-computer interface (BCI) decoding tasks like image decoding, emotion recognition, auditory spatial attention detection, etc. Since these EEG data were usually collected with well-designed paradigms in labs, the reliability and robustness of the corresponding decoding methods were… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  28. arXiv:2405.03126  [pdf

    eess.IV eess.SP

    Infrared Polarization Imaging-based Non-destructive Thermography Inspection

    Authors: Xianyu Wu, Bin Zhou, Peng Lin, Rongjin Cao, Feng Huang

    Abstract: Infrared pulse thermography non-destructive testing (NDT) method is developed based on the difference in the infrared radiation intensity emitted by defective and non-defective areas of an object. However, when the radiation intensity of the defective target is similar to that of the non-defective area of the object, the detection results are poor. To address this issue, this study investigated th… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  29. arXiv:2405.01725  [pdf, other

    eess.IV cs.CV cs.LG

    Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey

    Authors: Guoping Xu, Xiaxia Wang, Xinglong Wu, Xuesong Leng, Yongchao Xu

    Abstract: Deep learning has made significant progress in computer vision, specifically in image classification, object detection, and semantic segmentation. The skip connection has played an essential role in the architecture of deep neural networks,enabling easier optimization through residual learning during the training stage and improving accuracy during testing. Many neural networks have inherited the… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  30. arXiv:2404.17867  [pdf, other

    cs.CV eess.IV

    Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

    Authors: Xiaoshuai Wu, Xin Liao, Bo Ou, Yuling Liu, Zheng Qin

    Abstract: AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  31. Power Failure Cascade Prediction using Graph Neural Networks

    Authors: Sathwik Chadaga, Xinyu Wu, Eytan Modiano

    Abstract: We consider the problem of predicting power failure cascades due to branch failures. We propose a flow-free model based on graph neural networks that predicts grid states at every generation of a cascade process given an initial contingency and power injection values. We train the proposed model using a cascade sequence data pool generated from simulations. We then evaluate our model at various le… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 2023 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm). Oct. 31, 2023. See implementations at https://github.com/sathwikchadaga/failure-cascade

  32. arXiv:2404.11537  [pdf, other

    cs.CV eess.IV

    SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

    Authors: Yu Zhong, Xiao Wu, Liang-Jian Deng, Zihan Cao

    Abstract: Pansharpening is a significant image fusion technique that merges the spatial content and spectral characteristics of remote sensing images to generate high-resolution multispectral images. Recently, denoising diffusion probabilistic models have been gradually applied to visual tasks, enhancing controllable image generation through low-rank adaptation (LoRA). In this paper, we introduce a spatial-… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  33. arXiv:2404.07609  [pdf, other

    math.OC eess.SY

    Achieving violation-free distributed optimization under coupling constraints

    Authors: Changxin Liu, Xiao Tan, Xuyang Wu, Dimos V. Dimarogonas, Karl H. Johansson

    Abstract: Constraint satisfaction is a critical component in a wide range of engineering applications, including but not limited to safe multi-agent control and economic dispatch in power systems. This study explores violation-free distributed optimization techniques for problems characterized by separable objective functions and coupling constraints. First, we incorporate auxiliary decision variables toget… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 13 pages, 6 figures

  34. arXiv:2404.07543  [pdf, other

    cs.CV eess.IV

    Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening

    Authors: Yule Duan, Xiao Wu, Haoyu Deng, Liang-Jian Deng

    Abstract: Currently, machine learning-based methods for remote sensing pansharpening have progressed rapidly. However, existing pansharpening methods often do not fully exploit differentiating regional information in non-local spaces, thereby limiting the effectiveness of the methods and resulting in redundant learning parameters. In this paper, we introduce a so-called content-adaptive non-local convolutio… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  35. Grouping of $N-1$ Contingencies for Controller Synthesis: A Study for Power Line Failures

    Authors: Neelay Junnarkar, Emily Jensen, Xiaofan Wu, Suat Gumussoy, Murat Arcak

    Abstract: The problem of maintaining power system stability and performance after the failure of any single line in a power system (an "N-1 contingency") is investigated. Due to the large number of possible N-1 contingencies for a power network, it is impractical to optimize controller parameters for each possible contingency a priori. A method to partition a set of contingencies into groups of contingencie… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Submitted to the journal IEEE Transactions on Power Systems, 12 pages, 11 figures, 1 table

  36. arXiv:2404.01121  [pdf, other

    cs.CV eess.IV

    CMT: Cross Modulation Transformer with Hybrid Loss for Pansharpening

    Authors: Wen-Jie Shu, Hong-Xia Dou, Rui Wen, Xiao Wu, Liang-Jian Deng

    Abstract: Pansharpening aims to enhance remote sensing image (RSI) quality by merging high-resolution panchromatic (PAN) with multispectral (MS) images. However, prior techniques struggled to optimally fuse PAN and MS images for enhanced spatial and spectral information, due to a lack of a systematic framework capable of effectively coordinating their individual strengths. In response, we present the Cross… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  37. arXiv:2403.17639  [pdf, other

    eess.IV cs.CV

    High-Resolution Image Translation Model Based on Grayscale Redefinition

    Authors: Xixian Wu, Dian Chao, Yang Yang

    Abstract: Image-to-image translation is a technique that focuses on transferring images from one domain to another while maintaining the essential content representations. In recent years, image-to-image translation has gained significant attention and achieved remarkable advancements due to its diverse applications in computer vision and image processing tasks. In this work, we propose an innovative method… ▽ More

    Submitted 1 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  38. arXiv:2403.16823  [pdf, ps, other

    eess.SY cs.LG

    Resource and Mobility Management in Hybrid LiFi and WiFi Networks: A User-Centric Learning Approach

    Authors: Han Ji, Xiping Wu

    Abstract: Hybrid light fidelity (LiFi) and wireless fidelity (WiFi) networks (HLWNets) are an emerging indoor wireless communication paradigm, which combines the advantages of the capacious optical spectra of LiFi and ubiquitous coverage of WiFi. Meanwhile, load balancing (LB) becomes a key challenge in resource management for such hybrid networks. The existing LB methods are mostly network-centric, relying… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 12 pages, 12 figures, 3 tables, submitted to IEEE TWC

  39. arXiv:2403.16078  [pdf, other

    cs.SD eess.AS

    Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

    Authors: Wenxuan Wu, Xueyuan Chen, Xixin Wu, Haizhou Li, Helen Meng

    Abstract: Audio-visual target speech extraction (AV-TSE) is one of the enabling technologies in robotics and many audio-visual applications. One of the challenges of AV-TSE is how to effectively utilize audio-visual synchronization information in the process. AV-HuBERT can be a useful pre-trained model for lip-reading, which has not been adopted by AV-TSE. In this paper, we would like to explore the way to… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by IJCNN 2024

  40. arXiv:2403.10589  [pdf

    eess.IV cs.CV

    A General Method to Incorporate Spatial Information into Loss Functions for GAN-based Super-resolution Models

    Authors: Xijun Wang, Santiago López-Tapia, Alice Lucas, Xinyi Wu, Rafael Molina, Aggelos K. Katsaggelos

    Abstract: Generative Adversarial Networks (GANs) have shown great performance on super-resolution problems since they can generate more visually realistic images and video frames. However, these models often introduce side effects into the outputs, such as unexpected artifacts and noises. To reduce these artifacts and enhance the perceptual quality of the results, in this paper, we propose a general method… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  41. arXiv:2403.01093  [pdf, other

    eess.SP

    Variational Bayesian Learning Based Localization and Channel Reconstruction in RIS-aided Systems

    Authors: Yunfei Li, Yiting Luo, Xianda Wu, Zheng Shi, Shaodan Ma, Guanghua Yang

    Abstract: The emerging immersive and autonomous services have posed stringent requirements on both communications and localization. By considering the great potential of reconfigurable intelligent surface (RIS), this paper focuses on the joint channel estimation and localization for RIS-aided wireless systems. As opposed to existing works that treat channel estimation and localization independently, this pa… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  42. arXiv:2402.17455  [pdf, ps, other

    eess.AS

    CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

    Authors: Hao Ma, Zhiyuan Peng, Xu Li, Mingjie Shao, Xixin Wu, Ju Liu

    Abstract: Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models f… ▽ More

    Submitted 8 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  43. arXiv:2402.02146  [pdf, other

    cs.AI cs.LG cs.NI eess.SP

    Emergency Computing: An Adaptive Collaborative Inference Method Based on Hierarchical Reinforcement Learning

    Authors: Weiqi Fu, Lianming Xu, Xin Wu, Li Wang, Aiguo Fei

    Abstract: In achieving effective emergency response, the timely acquisition of environmental information, seamless command data transmission, and prompt decision-making are crucial. This necessitates the establishment of a resilient emergency communication dedicated network, capable of providing communication and sensing services even in the absence of basic infrastructure. In this paper, we propose an Emer… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  44. Image2Points:A 3D Point-based Context Clusters GAN for High-Quality PET Image Reconstruction

    Authors: Jiaqi Cui, Yan Wang, Lu Wen, Pinxian Zeng, Xi Wu, Jiliu Zhou, Dinggang Shen

    Abstract: To obtain high-quality Positron emission tomography (PET) images while minimizing radiation exposure, numerous methods have been proposed to reconstruct standard-dose PET (SPET) images from the corresponding low-dose PET (LPET) images. However, these methods heavily rely on voxel-based representations, which fall short of adequately accounting for the precise structure and fine-grained context, le… ▽ More

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted by ICASSP 2024

  45. arXiv:2401.17796  [pdf, other

    cs.SD eess.AS

    Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

    Authors: Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech by improving the intelligibility and naturalness. This is a challenging task especially for patients with severe dysarthria and speaking in complex, noisy acoustic environments. To address these challenges, we propose a novel multi-modal framework to utilize visual information, e.g., lip movements, in DSR… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by ICASSP 2024

  46. arXiv:2401.14664  [pdf, other

    cs.SD cs.CL eess.AS

    UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

    Authors: Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

    Abstract: Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech. The technology eases communication with speakers affected by the neuromotor disorder and enhances their social inclusion. NED-based (Neural Encoder-Decoder) systems have significantly improved the intelligibility of the reconstructed speech as compared with GAN-based (Generati… ▽ More

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  47. arXiv:2401.13998  [pdf, other

    eess.IV cs.CV

    WAL-Net: Weakly supervised auxiliary task learning network for carotid plaques classification

    Authors: Haitao Gan, Lingchao Fu, Ran Zhou, Weiyan Gan, Furong Wang, Xiaoyan Wu, Zhi Yang, Zhongwei Huang

    Abstract: The classification of carotid artery ultrasound images is a crucial means for diagnosing carotid plaques, holding significant clinical relevance for predicting the risk of stroke. Recent research suggests that utilizing plaque segmentation as an auxiliary task for classification can enhance performance by leveraging the correlation between segmentation and classification tasks. However, this appro… ▽ More

    Submitted 27 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  48. arXiv:2401.13616  [pdf, other

    eess.IV cs.CV

    FLLIC: Functionally Lossless Image Compression

    Authors: Xi Zhang, Xiaolin Wu

    Abstract: Recently, DNN models for lossless image coding have surpassed their traditional counterparts in compression performance, reducing the bit rate by about ten percent for natural color images. But even with these advances, mathematically lossless image compression (MLLIC) ratios for natural images still fall short of the bandwidth and cost-effectiveness requirements of most practical imaging and visi… ▽ More

    Submitted 26 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  49. arXiv:2401.08913  [pdf, other

    cs.CV eess.IV

    Efficient Image Super-Resolution via Symmetric Visual Attention Network

    Authors: Chengxu Wu, Qinrui Fan, Shu Hu, Xi Wu, Xin Wang, Jing Hu

    Abstract: An important development direction in the Single-Image Super-Resolution (SISR) algorithms is to improve the efficiency of the algorithms. Recently, efficient Super-Resolution (SR) research focuses on reducing model complexity and improving efficiency through improved deep small kernel convolution, leading to a small receptive field. The large receptive field obtained by large kernel convolution ca… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages,4 figures

  50. arXiv:2401.08136  [pdf, other

    eess.SY

    Bias-Compensated State of Charge and State of Health Joint Estimation for Lithium Iron Phosphate Batteries

    Authors: Baozhao Yi, Xinhao Du, Jiawei Zhang, Xiaogang Wu, Qiuhao Hu, Weiran Jiang, Xiaosong Hu, Ziyou Song

    Abstract: Accurate estimation of the state of charge (SOC) and state of health (SOH) is crucial for the safe and reliable operation of batteries. Voltage measurement bias highly affects state estimation accuracy, especially in Lithium Iron Phosphate (LFP) batteries, which are susceptible due to their flat open-circuit voltage (OCV) curves. This work introduces a bias-compensated algorithm to reliably estima… ▽ More

    Submitted 12 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 pages and 8 figures