
Showing 1–50 of 179 results for author: Song, Y

Searching in archive eess.
  1. arXiv:2408.09381  [pdf, other]

    eess.SP

    Channel Estimation, Interpolation and Extrapolation in Doubly-dispersive Channels

    Authors: Zijun Gong, Fan Jiang, Yuhui Song, Cheng Li, Xiaofeng Tao

    Abstract: The OTFS (Orthogonal Time Frequency Space) is widely acknowledged for its ability to combat Doppler spread in time-varying channels. In this paper, another advantage of OTFS over OFDM (Orthogonal Frequency Division Multiplexing) will be demonstrated: much reduced channel training overhead. Specifically, the sparsity of the channel in delay-Doppler (D-D) domain implies strong correlation of channel…

    Submitted 18 August, 2024; originally announced August 2024.

  2. arXiv:2408.08673  [pdf, other]

    cs.SD cs.AI eess.AS

    MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection

    Authors: Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin

    Abstract: Sound event detection (SED) methods that leverage a large pre-trained Transformer encoder network have shown promising performance in recent DCASE challenges. However, they still rely on an RNN-based context network to model temporal dependencies, largely due to the scarcity of labeled data. In this work, we propose a pure Transformer-based SED model with masked-reconstruction based pre-training,…

    Submitted 19 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: Received by interspeech 2024

  3. arXiv:2408.07385  [pdf, other]

    cs.IT eess.SP

    Iterative Equalization of CPM With Unitary Approximate Message Passing

    Authors: Zilong Liu, Yi Song, Qinghua Guo, Peng Sun, Kexian Gong, Zhongyong Wang

    Abstract: Continuous phase modulation (CPM) has extensive applications in wireless communications due to its high spectral and power efficiency. However, its nonlinear characteristics pose significant challenges for detection in frequency selective fading channels. This paper proposes an iterative receiver tailored for the detection of CPM signals over frequency selective fading channels. This design levera…

    Submitted 14 August, 2024; originally announced August 2024.

  4. arXiv:2408.05360  [pdf, other]

    cs.ET cs.NE eess.SY

    On Noise Resiliency of Neuromorphic Inferential Communication in Microgrids

    Authors: Yubo Song, Subham Sahoo, Xiaoguang Diao

    Abstract: Neuromorphic computing leveraging spiking neural network has emerged as a promising solution to tackle the security and reliability challenges with the conventional cyber-physical infrastructure of microgrids. Its event-driven paradigm facilitates promising prospect in resilient and energy-efficient coordination among power electronic converters. However, different from biological neurons that are…

    Submitted 25 July, 2024; originally announced August 2024.

    Comments: This manuscript has been accepted for publication in 2024 IEEE Energy Conversion Congress and Exposition (ECCE)

  5. arXiv:2408.02622  [pdf, other]

    cs.CL cs.AI cs.HC cs.SD eess.AS

    Language Model Can Listen While Speaking

    Authors: Ziyang Ma, Yakun Song, Chenpeng Du, Jian Cong, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen

    Abstract: Dialogue serves as the most natural manner of human-computer interaction (HCI). Recent advancements in speech language models (SLM) have significantly enhanced speech-based conversational AI. However, these models are limited to turn-based conversation, lacking the ability to interact with humans in real-time spoken scenarios, for example, being interrupted when the generated content is not satisf…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Demo can be found at https://ddlbojack.github.io/LSLM

  6. arXiv:2408.00323  [pdf, other]

    eess.SY

    A Novel Edge Laplacian-based Approach for Adaptive Formation Control of Uncertain Multi-agent Systems with Unified Relative Error Performance

    Authors: Kun Li, Kai Zhao, Yongduan Song, Lihua Xie

    Abstract: For most existing prescribed performance formation control methods, performance requirements are not directly imposed on the relative states between agents but on the consensus error, which lacks a clear physical interpretation of their solution. In this paper, we propose a novel adaptive prescribed performance formation control strategy, capable of guaranteeing prescribed performance on the relat…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 9 pages, 3 figures, submitted to IEEE

  7. arXiv:2407.15139  [pdf, ps, other]

    eess.SY

    An Interface Method for Co-simulation of EMT Model and Shifted Frequency EMT Model Based on Rotational Invariance Techniques

    Authors: Shilin Gao, Ying Chen, Zhitong Yu, Wensheng Chen, Yankan Song

    Abstract: The shifted frequency-based electromagnetic transient (SFEMT) simulation has greatly improved the computational efficiency of traditional electromagnetic transient (EMT) simulation for the ac grid. This letter proposes a novel interface for the co-simulation of the SFEMT model and the traditional EMT model. The general form of SFEMT modeling and the principle of analytical signal construction are…

    Submitted 27 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  8. arXiv:2407.14883  [pdf, other]

    eess.SY cs.AI cs.NE

    Inferring Ingrained Remote Information in AC Power Flows Using Neuromorphic Modality Regime

    Authors: Xiaoguang Diao, Yubo Song, Subham Sahoo

    Abstract: In this paper, we infer remote measurements such as remote voltages and currents online with change in AC power flows using spiking neural network (SNN) as grid-edge technology for efficient coordination of power electronic converters. This work unifies power and information as a means of data normalization using a multi-modal regime in the form of spikes using energy-efficient neuromorphic learni…

    Submitted 9 August, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

    Comments: The manuscript has been accepted for publication in the Proceedings of 2024 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm 2024)

  9. arXiv:2407.05558  [pdf]

    math.OC eess.SY

    Hidden Convexity-Based Distributed Operation of Integrated Electricity-Gas Systems

    Authors: Rong-Peng Liu, Yue Song, Junhong Liu, Xiaozhe Wang, Jinpeng Guo, Yunhe Hou

    Abstract: We propose a hidden convexity-based method to address distributed optimal energy flow (OEF) problems for transmission-level integrated electricity-gas systems. First, we develop a node-wise decoupling method to de-compose an OEF problem into multiple OEF subproblems. Then, we propose a hidden convexity-based method to equivalently reformulate nonconvex OEF subproblems as semi-definite programs. Th…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: 7 pages

  10. arXiv:2407.01336  [pdf, other]

    cs.IT eess.SP

    Compressed Sensing Inspired User Acquisition for Downlink Integrated Sensing and Communication Transmissions

    Authors: Yi Song, Fernando Pedraza, Shuangyang Li, Siyao Li, Han Yu, Giuseppe Caire

    Abstract: This paper investigates radar-assisted user acquisition for downlink multi-user multiple-input multiple-output (MIMO) transmission using Orthogonal Frequency Division Multiplexing (OFDM) signals. Specifically, we formulate a concise mathematical model for the user acquisition problem, where each user is characterized by its delay and beamspace response. Therefore, we propose a two-stage method for…

    Submitted 1 July, 2024; originally announced July 2024.

  11. arXiv:2407.00681  [pdf, other]

    eess.SY

    Safe Reinforcement Learning for Power System Control: A Review

    Authors: Peipei Yu, Zhenyi Wang, Hongcai Zhang, Yonghua Song

    Abstract: The large-scale integration of intermittent renewable energy resources introduces increased uncertainty and volatility to the supply side of power systems, thereby complicating system operation and control. Recently, data-driven approaches, particularly reinforcement learning (RL), have shown significant promise in addressing complex control challenges in power systems, because RL can learn from i…

    Submitted 30 June, 2024; originally announced July 2024.

  12. arXiv:2406.19560  [pdf, other]

    cs.CV cs.LG eess.IV

    Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

    Authors: Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal

    Abstract: Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many other. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. In this paper, we introduce a cost-efficient…

    Submitted 27 June, 2024; originally announced June 2024.

  13. arXiv:2406.15752  [pdf, other]

    eess.AS cs.AI cs.CL

    TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

    Authors: Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen

    Abstract: Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, T…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  14. arXiv:2406.15727  [pdf, other]

    eess.IV cs.CV

    Semi-supervised variational autoencoder for cell feature extraction in multiplexed immunofluorescence images

    Authors: Piumi Sandarenu, Julia Chen, Iveta Slapetova, Lois Browne, Peter H. Graham, Alexander Swarbrick, Ewan K. A. Millar, Yang Song, Erik Meijering

    Abstract: Advancements in digital imaging technologies have sparked increased interest in using multiplexed immunofluorescence (mIF) images to visualise and identify the interactions between specific immunophenotypes with the tumour microenvironment at the cellular level. Current state-of-the-art multiplexed immunofluorescence image analysis pipelines depend on cell feature representations characterised by…

    Submitted 27 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  15. arXiv:2406.14875  [pdf, other]

    cs.SD eess.AS

    GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024, 4 pages, 3 figures

  16. arXiv:2406.14372  [pdf, ps, other]

    eess.SY

    Ring-LWE based encrypted controller with unlimited number of recursive multiplications and effect of error growth

    Authors: Yeongjun Jang, Joowon Lee, Seonhong Min, Hyesun Kwak, Junsoo Kim, Yongsoo Song

    Abstract: In this paper, we propose a method to encrypt linear dynamic controllers that enables an unlimited number of recursive homomorphic multiplications on a Ring Learning With Errors (Ring-LWE) based cryptosystem without bootstrapping. Unlike LWE based schemes, where a scalar error is injected during encryption for security, Ring-LWE based schemes are based on polynomial rings and inject error as a pol…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures

  17. arXiv:2406.11248  [pdf]

    eess.AS cs.AI cs.SD

    Performance Improvement of Language-Queried Audio Source Separation Based on Caption Augmentation From Large Language Models for DCASE Challenge 2024 Task 9

    Authors: Do Hyun Lee, Yoonah Song, Hong Kook Kim

    Abstract: We present a prompt-engineering-based text-augmentation approach applied to a language-queried audio source separation (LASS) task. To enhance the performance of LASS, the proposed approach utilizes large language models (LLMs) to generate multiple captions corresponding to each sentence of the training dataset. To this end, we first perform experiments to identify the most effective prompts for c…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: DCASE 2024 Challenge Task 9, 4 pages

  18. arXiv:2406.09695  [pdf, other]

    eess.SP

    Machine learning-based Near-field Emitter Localization via Grouped Hybrid Analog and Digital Massive MIMO Receive Array

    Authors: Yifan Li, Feng Shu, Jiatong Bai, Cunhua Pan, Yongpeng Wu, Yaoliang Song, Jiangzhou Wang

    Abstract: A fully-digital massive MIMO receive array is promising to meet the high-resolution requirement of near-field (NF) emitter localization, but it also results in the significantly increasing of hardware costs and algorithm complexity. In order to meet the future demand for green communication while maintaining high performance, the grouped hybrid analog and digital (HAD) structure is proposed for NF…

    Submitted 13 June, 2024; originally announced June 2024.

  19. arXiv:2406.08337  [pdf, other]

    cs.CV eess.IV

    WMAdapter: Adding WaterMark Control to Latent Diffusion Models

    Authors: Hai Ci, Yiren Song, Pei Yang, Jinheng Xie, Mike Zheng Shou

    Abstract: Watermarking is crucial for protecting the copyright of AI-generated images. We propose WMAdapter, a diffusion model watermark plugin that takes user-specified watermark information and allows for seamless watermark imprinting during the diffusion generation process. WMAdapter is efficient and robust, with a strong emphasis on high generation quality. To achieve this, we make two key designs: (1)…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 20 pages, 13 figures

  20. arXiv:2406.05954  [pdf, other]

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  21. arXiv:2406.01191  [pdf, other]

    eess.IV cs.CV cs.LG

    S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography

    Authors: Yuhan Song, Nak Young Chong

    Abstract: Ultrasound imaging is pivotal in various medical diagnoses due to its non-invasive nature and safety. In clinical practice, the accuracy and precision of ultrasound image analysis are critical. Recent advancements in deep learning are showing great capacity of processing medical images. However, the data hungry nature of deep learning and the shortage of high-quality ultrasound image training data…

    Submitted 22 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper is accepted by 2024 IEEE International Conference on Cyborg and Bionic Systems

  22. arXiv:2405.14300  [pdf, other]

    eess.IV cs.CV

    Automatic diagnosis of cardiac magnetic resonance images based on semi-supervised learning

    Authors: Hejun Huang, Zuguo Chen, Yi Huang, Guangqiang Luo, Chaoyang Chen, Youzhi Song

    Abstract: Cardiac magnetic resonance imaging (MRI) is a pivotal tool for assessing cardiac function. Precise segmentation of cardiac structures is imperative for accurate cardiac functional evaluation. This paper introduces a semi-supervised model for automatic segmentation of cardiac images and auxiliary diagnosis. By harnessing cardiac MRI images and necessitating only a small portion of annotated image d…

    Submitted 23 May, 2024; originally announced May 2024.

  23. arXiv:2404.18094  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: Conventional text-to-speech (TTS) research has predominantly focused on enhancing the quality of synthesized speech for speakers in the training dataset. The challenge of synthesizing lifelike speech for unseen, out-of-dataset speakers, especially those with limited reference data, remains a significant and unresolved problem. While zero-shot or few-shot speaker-adaptive TTS approaches have been e…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 15 pages, 13 figures. Copyright has been transferred to IEEE

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2024

  24. arXiv:2404.16412  [pdf, ps, other]

    eess.SY

    Distributed Matrix Pencil Formulations for Prescribed-Time Leader-Following Consensus of MASs with Unknown Sensor Sensitivity

    Authors: Hefu Ye, Changyun Wen, Yongduan Song

    Abstract: In this paper, we address the problem of prescribed-time leader-following consensus of heterogeneous multi-agent systems (MASs) in the presence of unknown sensor sensitivity. Under a connected undirected topology, we propose a time-varying dual observer/controller design framework that makes use of regular local and inaccurate feedback to achieve consensus tracking within a prescribed time. In par…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure

  25. arXiv:2404.13714  [pdf, other]

    eess.SY

    Self-Adjusting Prescribed Performance Control for Nonlinear Systems with Input Saturation

    Authors: Zhuwu Shao, Yujuan Wang, Huanyu Yang, Yongduan Song

    Abstract: Among the existing works on enhancing system performance via prescribed performance functions (PPFs), the decay rates of PPFs need to be predetermined by the designer, directly affecting the convergence time of the closed-loop system. However, if only considering accelerating the system convergence by selecting a big decay rate of the performance function, it may lead to the severe consequence of…

    Submitted 21 April, 2024; originally announced April 2024.

  26. arXiv:2404.13315  [pdf, other]

    eess.SP

    BERT: Accelerating Vital Signs Measurement for Bioradar with An Efficient Recursive Technique

    Authors: Chengyao Tang, Yongpeng Dai, Zhi Li, Yongping Song, Fulai Liang, Tian Jin

    Abstract: Recent years have witnessed the great advance of bioradar system in smart sensing of vital signs (VS) for human healthcare monitoring. As an important part of VS sensing process, VS measurement aims to capture the chest wall micromotion induced by the human respiratory and cardiac activities. Unfortunately, the existing VS measurement methods using bioradar have encountered bottlenecks in making a…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 4 pages, 8 figures, submitted to the IEEE for possible publication

  27. arXiv:2404.09801  [pdf, other]

    eess.SY eess.SP

    A Gray-Box Stability Analysis Mechanism for Power Electronic Converters

    Authors: Rui Kong, Subham Sahoo, Yubo Song, Frede Blaabjerg

    Abstract: This paper proposes a gray-box stability analysis mechanism based on data-driven dynamic mode decomposition (DMD) for commercial grid-tied power electronics converters with limited information on its control parameters and topology. By fusing the underlying physical constraints of the state equations into data snapshots, the system dynamic state matrix and input matrix are simultaneously approxima…

    Submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.07092  [pdf, other]

    eess.SP physics.optics

    Net 835-Gb/s/λ Carrier- and LO-Free 100-km Transmission Using Channel-Aware Phase Retrieval Reception

    Authors: Hanzi Huang, Haoshuo Chen, Qian Hu, Di Che, Yetian Huang, Brian Stern, Nicolas K. Fontaine, Mikael Mazur, Lauren Dallachiesa, Roland Ryf, Zhengxuan Li, Yingxiong Song

    Abstract: We experimentally demonstrate the first carrier- and LO-free 800G/λ receiver enabling direct compatibility with standard coherent transmitters via phase retrieval, achieving net 835-Gb/s transmission over 100-km SMF and record 8.27-b/s/Hz net optical spectral efficiency.

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 3 pages, 3 figures

  29. IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

    Authors: Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

    Abstract: Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TGRS, we first present an iterative diffusion process for cloud removal, the code is available at: https://github.com/SongYxing/IDF-CR

  30. arXiv:2403.02651  [pdf, other]

    eess.SP cs.AI

    Learning at the Speed of Wireless: Online Real-Time Learning for AI-Enabled MIMO in NextG

    Authors: Jiarui Xu, Shashank Jere, Yifei Song, Yi-Hung Kao, Lizhong Zheng, Lingjia Liu

    Abstract: Integration of artificial intelligence (AI) and machine learning (ML) into the air interface has been envisioned as a key technology for next-generation (NextG) cellular networks. At the air interface, multiple-input multiple-output (MIMO) and its variants such as multi-user MIMO (MU-MIMO) and massive/full-dimension MIMO have been key enablers across successive generations of cellular networks wit…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures, 1 table, magazine paper

  31. arXiv:2403.01013  [pdf]

    eess.SY

    A Holistic Power Optimization Approach for Microgrid Control Based on Deep Reinforcement Learning

    Authors: Fulong Yao, Wanqing Zhao, Matthew Forshaw, Yang Song

    Abstract: The global energy landscape is undergoing a transformation towards decarbonization, sustainability, and cost-efficiency. In this transition, microgrid systems integrated with renewable energy sources (RES) and energy storage systems (ESS) have emerged as a crucial component. However, optimizing the operational control of such an integrated energy system lacks a holistic view of multiple environmen…

    Submitted 1 March, 2024; originally announced March 2024.

  32. arXiv:2402.18390  [pdf, other]

    cs.ET cs.AI cs.NE eess.SY

    Neuromorphic Event-Driven Semantic Communication in Microgrids

    Authors: Xiaoguang Diao, Yubo Song, Subham Sahoo, Yuan Li

    Abstract: Synergies between advanced communications, computing and artificial intelligence are unraveling new directions of coordinated operation and resiliency in microgrids. On one hand, coordination among sources is facilitated by distributed, privacy-minded processing at multiple locations, whereas on the other hand, it also creates exogenous data arrival paths for adversaries that can lead to cyber-phy…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: The manuscript has been accepted for publication in IEEE Transactions on Smart Grid

  33. arXiv:2402.04584  [pdf, other]

    eess.IV cs.CV

    Troublemaker Learning for Low-Light Image Enhancement

    Authors: Yinghao Song, Zhiyuan Cao, Wanhong Xiang, Sifan Long, Bo Yang, Hongwei Ge, Yanchun Liang, Chunguo Wu

    Abstract: Low-light image enhancement (LLIE) restores the color and brightness of underexposed images. Supervised methods suffer from high costs in collecting low/normal-light image pairs. Unsupervised methods invest substantial effort in crafting complex loss functions. We address these two challenges through the proposed TroubleMaker Learning (TML) strategy, which employs normal-light images as inputs for…

    Submitted 2 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  34. arXiv:2402.01933  [pdf, other]

    eess.AS cs.SD

    ToMoBrush: Exploring Dental Health Sensing using a Sonic Toothbrush

    Authors: Kuang Yuan, Mohamed Ibrahim, Yiwen Song, Guoxiang Deng, Suvendra Vijayan, Robert Nerone, Akshay Gadre, Swarun Kumar

    Abstract: Early detection of dental disease is crucial to prevent adverse outcomes. Today, dental X-rays are currently the most accurate gold standard for dental disease detection. Unfortunately, regular X-ray exam is still a privilege for billions of people around the world. In this paper, we ask: "Can we develop a low-cost sensing system that enables dental self-examination in the comfort of one's home?"…

    Submitted 2 February, 2024; originally announced February 2024.

    ACM Class: J.3; C.3; H.5.2

  35. arXiv:2401.15803  [pdf, other]

    cs.RO cs.AI cs.CV eess.SY

    GarchingSim: An Autonomous Driving Simulator with Photorealistic Scenes and Minimalist Workflow

    Authors: Liguo Zhou, Yinglei Song, Yichao Gao, Zhou Yu, Michael Sodamin, Hongshen Liu, Liang Ma, Lian Liu, Hao Liu, Yang Liu, Haichuan Li, Guang Chen, Alois Knoll

    Abstract: Conducting real road testing for autonomous driving algorithms can be expensive and sometimes impractical, particularly for small startups and research institutes. Thus, simulation becomes an important method for evaluating these algorithms. However, the availability of free and open-source simulators is limited, and the installation and configuration process can be daunting for beginners and inte…

    Submitted 30 January, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  36. Hybrid Message Passing-Based Detectors for Uplink Grant-Free NOMA Systems

    Authors: Yi Song, Yiwen Zhu, Kun Chen-Hu, Xinhua Lu, Peng Sun, Zhongyong Wang

    Abstract: This paper studies improving the detector performance which considers the activity state (AS) temporal correlation of the user equipments (UEs) in the time domain under the uplink grant-free non-orthogonal multiple access (GF-NOMA) system. The Bernoulli Gaussian-Markov chain (BG-MC) probability model is used for exploiting both the sparsity and slow change characteristic of the AS of the UE. The G…

    Submitted 25 January, 2024; originally announced January 2024.

    Journal ref: Drones 2024, 8, 325

  37. arXiv:2401.07333  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering

    Authors: Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Xie Chen

    Abstract: The language model (LM) approach based on acoustic and linguistic prompts, such as VALL-E, has achieved remarkable progress in the field of zero-shot audio generation. However, existing methods still have some limitations: 1) repetitions, transpositions, and omissions in the output synthesized speech due to limited alignment constraints between audio and phoneme tokens; 2) challenges of fine-grain…

    Submitted 14 January, 2024; originally announced January 2024.

    Comments: Working in progress

  38. arXiv:2312.09952  [pdf, other]

    eess.AS cs.SD

    Multi-level graph learning for audio event classification and human-perceived annoyance rating prediction

    Authors: Yuanbo Hou, Qiaoqiao Ren, Siyang Song, Yuxin Song, Wenwu Wang, Dick Botteldooren

    Abstract: WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-relat…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP 2024

  39. Implementing Digital Twin in Field-Deployed Optical Networks: Uncertain Factors, Operational Guidance, and Field-Trial Demonstration

    Authors: Yuchen Song, Min Zhang, Yao Zhang, Yan Shi, Shikui Shen, Bingli Guo, Shanguo Huang, Danshi Wang

    Abstract: Digital twin has revolutionized optical communication networks by enabling their full life-cycle management, including design, troubleshooting, optimization, upgrade, and prediction. While extensive literature exists on frameworks, standards, and applications of digital twin, there is a pressing need in implementing digital twin in field-deployed optical networks operating in real-world environmen…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 10 pages, 5 figures. Accepted by IEEE Network Magazine, early access

  40. arXiv:2311.11596  [pdf]

    cs.HC cs.IT eess.SP q-bio.NC

    High-performance cVEP-BCI under minimal calibration

    Authors: Yining Miao, Nanlin Shi, Changxing Huang, Yonghao Song, Xiaogang Chen, Yijun Wang, Xiaorong Gao

    Abstract: The ultimate goal of brain-computer interfaces (BCIs) based on visual modulation paradigms is to achieve high-speed performance without the burden of extensive calibration. Code-modulated visual evoked potential-based BCIs (cVEP-BCIs) modulated by broadband white noise (WN) offer various advantages, including increased communication speed, expanded encoding target capabilities, and enhanced coding…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: 35 pages, 5 figures

  41. arXiv:2311.10224  [pdf, other]

    eess.IV cs.CV cs.LG

    CV-Attention UNet: Attention-based UNet for 3D Cerebrovascular Segmentation of Enhanced TOF-MRA Images

    Authors: Syed Farhan Abbas, Nguyen Thanh Duc, Yoonguu Song, Kyungwon Kim, Ekta Srivastava, Boreom Lee

    Abstract: Due to the lack of automated methods, to diagnose cerebrovascular disease, time-of-flight magnetic resonance angiography (TOF-MRA) is assessed visually, making it time-consuming. The commonly used encoder-decoder architectures for cerebrovascular segmentation utilize redundant features, eventually leading to the extraction of low-level features multiple times. Additionally, convolutional neural ne…

    Submitted 19 June, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  42. arXiv:2311.08758  [pdf, other]

    eess.SP

    A Novel Tree Model-based DNN to Achieve a High-Resolution DOA Estimation via Massive MIMO receive array

    Authors: Yifan Li, Feng Shu, Jun Zou, Wei Gao, Yaoliang Song, Jiangzhou Wang

    Abstract: To satisfy the high-resolution requirements of direction-of-arrival (DOA) estimation, conventional deep neural network (DNN)-based methods using grid idea need to significantly increase the number of output classifications and also produce a huge high model complexity. To address this problem, a multi-level tree-based DNN model (TDNN) is proposed as an alternative, where each level takes small-sca…

    Submitted 12 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

  43. arXiv:2311.07179  [pdf, other]

    cs.SD eess.AS

    SponTTS: modeling and transferring spontaneous style for TTS

    Authors: Hanzhao Li, Xinfa Zhu, Liumeng Xue, Yang Song, Yunlin Chen, Lei Xie

    Abstract: Spontaneous speaking style exhibits notable differences from other speaking styles due to various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody variation (e.g., diverse pitch and duration variation, occasional non-verbal speech like a smile), posing challenges to modeling and prediction of spontaneous style. Moreover, the limitation of high-quality spontaneous d…

    Submitted 8 January, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 5 pages, 3 figures, accepted by ICASSP 2024

  44. arXiv:2310.07974  [pdf, other]

    eess.SY

    Causality-based Cost Allocation for Peer-to-Peer Energy Trading in Distribution System

    Authors: Hyun Joong Kim, Yong Hyun Song, Jip Kim

    Abstract: While peer-to-peer energy trading has the potential to harness the capabilities of small-scale energy resources, a peer-matching process often overlooks power grid conditions, yielding increased losses, line congestion, and voltage problems. This imposes a great challenge on the distribution system operator (DSO), which can eventually limit peer-to-peer energy trading. To align the peer-matching p…

    Submitted 20 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 7 pages, 7 figures

  45. arXiv:2310.05314  [pdf, other]

    eess.SP physics.optics

    Distortion-Aware Phase Retrieval Receiver for High-Order QAM Transmission with Carrierless Intensity-Only Measurements

    Authors: Hanzi Huang, Haoshuo Chen, Qi Gao, Yetian Huang, Nicolas K. Fontaine, Mikael Mazur, Lauren Dallachiesa, Roland Ryf, Zhengxuan Li, Yingxiong Song

    Abstract: We experimentally investigate transmitting high-order quadrature amplitude modulation (QAM) signals with carrierless and intensity-only measurements with phase retrieval (PR) receiving techniques. The intensity errors during measurement, including noise and distortions, are found to be a limiting factor for the precise convergence of the PR algorithm. To improve the PR reconstruction accuracy, we…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: 12 pages, 12 figures

  46. arXiv:2309.16817  [pdf, other]

    eess.SY cs.RO

    Safe Non-Stochastic Control of Control-Affine Systems: An Online Convex Optimization Approach

    Authors: Hongyu Zhou, Yichen Song, Vasileios Tzoumas

    Abstract: We study how to safely control nonlinear control-affine systems that are corrupted with bounded non-stochastic noise, i.e., noise that is unknown a priori and that is not necessarily governed by a stochastic model. We focus on safety constraints that take the form of time-varying convex constraints such as collision-avoidance and control-effort constraints. We provide an algorithm with bounded dyn…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE Robotics and Automation Letters

  47. arXiv:2309.13942  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training

    Authors: Jiangliu Wang, Jianbo Jiao, Yibing Song, Stephen James, Zhan Tong, Chongjian Ge, Pieter Abbeel, Yun-hui Liu

    Abstract: This work aims to improve unsupervised audio-visual pre-training. Inspired by the efficacy of data augmentation in visual contrastive learning, we propose a novel speed co-augmentation method that randomly changes the playback speeds of both audio and video data. Despite its simplicity, the speed co-augmentation method possesses two compelling attributes: (1) it increases the diversity of audio-vi…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: Published at the CVPR 2023 Sight and Sound workshop

  48. arXiv:2309.13860  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning

    Authors: Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen

    Abstract: Recent years have witnessed significant advancements in self-supervised learning (SSL) methods for speech-processing tasks. Various speech-based SSL models have been developed and present promising performance on a range of downstream tasks including speech recognition. However, existing speech-based SSL models face a common dilemma in terms of computational cost, which might hinder their potentia…

    Submitted 29 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  49. arXiv:2309.11715   

    cs.CV eess.IV

    Deshadow-Anything: When Segment Anything Model Meets Zero-shot shadow removal

    Authors: Xiao Feng Zhang, Tian Yi Song, Jia Wei Yao

    Abstract: Segment Anything (SAM), an advanced universal image segmentation model trained on an expansive visual dataset, has set a new benchmark in image segmentation and computer vision. However, it faced challenges when it came to distinguishing between shadows and their backgrounds. To address this, we developed Deshadow-Anything, considering the generalization of large-scale datasets, and we performed F…

    Submitted 2 January, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

    Comments: it needs revised

  50. arXiv:2309.10153  [pdf, other]

    eess.IV cs.CV cs.LG

    Preserving Tumor Volumes for Unsupervised Medical Image Registration

    Authors: Qihua Dong, Hao Du, Ying Song, Yan Xu, Jing Liao

    Abstract: Medical image registration is a critical task that estimates the spatial correspondence between pairs of images. However, current traditional and deep-learning-based methods rely on similarity measures to generate a deforming field, which often results in disproportionate volume changes in dissimilar regions, especially in tumor regions. These changes can significantly alter the tumor size and und…

    Submitted 9 May, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICCV 2023 Poster