Showing 1–50 of 231 results for author: Yu, Y

Searching in archive eess.
  1. arXiv:2408.06911  [pdf, other]

    eess.AS cs.AI

    Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

    Authors: Tao Zheng, Liejun Wang, Yinfeng Yu

    Abstract: Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introd… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  2. arXiv:2408.06906  [pdf, other]

    eess.AS cs.AI

    VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

    Authors: Yubing Cao, Yongming Li, Liejun Wang, Yinfeng Yu

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  3. arXiv:2408.06851  [pdf, other]

    eess.AS cs.AI

    BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network based on Self-Supervised Embedding

    Authors: Alimjan Mattursun, Liejun Wang, Yinfeng Yu

    Abstract: Speech self-supervised learning (SSL) has achieved state-of-the-art (SOTA) performance in multiple downstream tasks. However, its application in speech enhancement (SE) tasks remains immature, offering opportunities for improvement. In this study, we introduce a novel cross-domain feature fusion and multi-attention speech enhancement network, termed BSS-CFFMA, which leverages self-super… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  4. arXiv:2407.16779  [pdf, other]

    eess.SY

    Learning Networked Dynamical System Models with Weak Form and Graph Neural Networks

    Authors: Yin Yu, Daning Huang, Seho Park, Herschel C. Pangborn

    Abstract: This paper presents a sequence of two approaches for the data-driven control-oriented modeling of networked systems, i.e., the systems that involve many interacting dynamical components. First, a novel deep learning approach named the weak Latent Dynamics Model (wLDM) is developed for learning generic nonlinear dynamics with control. Leveraging the weak form, the wLDM enables more numerically stab… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  5. arXiv:2407.12380  [pdf, other]

    eess.AS cs.SD

    PCQ: Emotion Recognition in Speech via Progressive Channel Querying

    Authors: Xincheng Wang, Liejun Wang, Yinfeng Yu, Xinxin Jiao

    Abstract: In human-computer interaction (HCI), Speech Emotion Recognition (SER) is a key technology for understanding human intentions and emotions. Traditional SER methods struggle to effectively capture the long-term temporal correlations and dynamic variations in complex emotional expressions. To overcome these limitations, we introduce the PCQ method, a pioneering approach for SER via Progress… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted for publication by International Conference On Intelligent Computing 2024. For data and code, see https://github.com/ICIG/PCQ-Net

  6. arXiv:2407.06525  [pdf, other]

    eess.IV cs.CV

    UnmixingSR: Material-aware Network with Unsupervised Unmixing as Auxiliary Task for Hyperspectral Image Super-resolution

    Authors: Yang Yu

    Abstract: Deep learning-based (DL-based) hyperspectral image (HSI) super-resolution (SR) methods have achieved remarkable performance and attracted attention in industry and academia. Nonetheless, most current methods explored and learned the mapping relationship between low-resolution (LR) and high-resolution (HR) HSIs, leading to the side effect of increasing unreliability and irrationality in solving the… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  7. arXiv:2407.01956  [pdf, other]

    eess.SY cs.RO

    Cloud-Edge-Terminal Collaborative AIGC for Autonomous Driving

    Authors: Jianan Zhang, Zhiwei Wei, Boxun Liu, Xiayi Wang, Yong Yu, Rongqing Zhang

    Abstract: In dynamic autonomous driving environments, Artificial Intelligence-Generated Content (AIGC) technology can supplement vehicle perception and decision making by leveraging models' generative and predictive capabilities, and has the potential to enhance motion planning, trajectory prediction and traffic simulation. This article proposes a cloud-edge-terminal collaborative architecture to support AIG… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2407.00995  [pdf, other]

    cs.CY eess.SY physics.app-ph

    Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense

    Authors: Yi Yu, Shengyue Yao, Tianchen Zhou, Yexuan Fu, Jingru Yu, Ding Wang, Xuhong Wang, Cen Chen, Yilun Lin

    Abstract: In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges like the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on The Move (DTM), integrating traffic simulation, data trading, an… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  9. arXiv:2406.08761  [pdf, other]

    cs.SD eess.AS

    VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

    Authors: Yifeng Yu, Jiatong Shi, Yuning Wu, Shinji Watanabe

    Abstract: Singing Voice Synthesis (SVS) has witnessed significant advancements with the advent of deep learning techniques. However, a significant challenge in SVS is the scarcity of labeled singing voice data, which limits the effectiveness of supervised learning methods. In response to this challenge, this paper introduces a novel approach to enhance the quality of SVS by leveraging unlabeled data from pr… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures

  10. arXiv:2405.09572  [pdf, other]

    eess.SP cs.AI

    Deep Neural Operator Enabled Digital Twin Modeling for Additive Manufacturing

    Authors: Ning Liu, Xuxiao Li, Manoj R. Rajanna, Edward W. Reutzel, Brady Sawyer, Prahalada Rao, Jim Lua, Nam Phan, Yue Yu

    Abstract: A digital twin (DT), with the components of a physics-based model, a data-driven model, and a machine learning (ML) enabled efficient surrogate, behaves as a virtual twin of the real-world physical process. In terms of Laser Powder Bed Fusion (L-PBF) based additive manufacturing (AM), a DT can predict the current and future states of the melt pool and the resulting defects corresponding to the inp… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  11. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  12. arXiv:2405.02963  [pdf]

    cs.CR eess.SY

    Preventive Audits for Data Applications Before Data Sharing in the Power IoT

    Authors: Bohong Wang, Qinglai Guo, Yanxi Lin, Yang Yu

    Abstract: With the increase in data volume, more types of data are being used and shared, especially in the power Internet of Things (IoT). However, the processes of data sharing may lead to unexpected information leakage because of the ubiquitous relevance among the different data, thus it is necessary for data owners to conduct preventive audits for data applications before data sharing to avoid the risk… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 19 pages, 18 figures

  13. arXiv:2404.18418  [pdf, other]

    cs.NI eess.SY

    Decomposition Model Assisted Energy-Saving Design in Radio Access Network

    Authors: Xiaoxue Zhao, Yijun Yu, Yexing Li, Dong Li, Yao Wang, Chungang Yang

    Abstract: The continuous emergence of novel services and massive connections involve huge energy consumption towards ultra-dense radio access networks. Moreover, there are many more controllable parameters that can be adjusted to reduce the energy consumption from a network-wide perspective. However, a network-level energy-saving intent usually contains multiple network objectives and constraint… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  14. arXiv:2404.18176  [pdf, other]

    eess.SY

    Auto-Optimized Maximum Torque Per Ampere Control of IPMSM Using Dual Control for Exploration and Exploitation

    Authors: Yuefei Zuo, Yalei Yu, Jun Yang, Wen-Hua Chen

    Abstract: In this paper, a maximum torque per ampere (MTPA) control strategy for the interior permanent magnet synchronous motor (IPMSM) using dual control for exploration and exploitation (DCEE) is proposed. In the proposed method, the permanent magnet flux and the difference between the $d$- and $q$-axis inductance are identified by multiple estimators using the recursive least square method. The initial values of th… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  15. arXiv:2404.13789  [pdf, other]

    cs.SD cs.AI cs.IR cs.MM eess.AS

    Anchor-aware Deep Metric Learning for Audio-visual Retrieval

    Authors: Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu

    Abstract: Metric learning minimizes the gap between similar (positive) pairs of data points and increases the separation of dissimilar (negative) pairs, aiming at capturing the underlying data structure and enhancing the performance of tasks like audio-visual cross-modal retrieval (AV-CMR). Recent works employ sampling methods to select impactful data points from the embedding space during training. However… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 9 pages, 5 figures. Accepted by ACM ICMR 2024

  16. arXiv:2404.13509  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

    Authors: Xinxin Jiao, Liejun Wang, Yinfeng Yu

    Abstract: Speech emotion recognition is crucial in human-computer interaction, but extracting and using emotional cues from audio poses challenges. This paper introduces MFHCA, a novel method for Speech Emotion Recognition using Multi-Spatial Fusion and Hierarchical Cooperative Attention on spectrograms and raw audio. We employ the Multi-Spatial Fusion module (MF) to efficiently identify emotion-related spe… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Main paper (5 pages). Accepted for publication by ICME 2024

  17. arXiv:2404.09007  [pdf, other]

    eess.SY

    A Framework for Safe Probabilistic Invariance Verification of Stochastic Dynamical Systems

    Authors: Taoran Wu, Yiqing Yu, Bican Xia, Ji Wang, Bai Xue

    Abstract: Ensuring safety through set invariance has proven to be a valuable method in various robotics and control applications. This paper introduces a comprehensive framework for the safe probabilistic invariance verification of both discrete- and continuous-time stochastic dynamical systems over an infinite time horizon. The objective is to ascertain the lower and upper bounds of liveness probabilities… ▽ More

    Submitted 3 August, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  18. arXiv:2403.17770  [pdf, other]

    eess.IV cs.CV

    CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation

    Authors: Yongrui Yu, Hanyu Chen, Zitian Zhang, Qiong Xiao, Wenhui Lei, Linrui Dai, Yu Fu, Hui Tan, Guan Wang, Peng Gao, Xiaofan Zhang

    Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to the complex abdominal environment, small and indistinguishable lesions, and limited annotated data. To address these problems, we present a pipeline that integrates the conditional diffusion model for lymph node… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  19. arXiv:2403.14250  [pdf, other]

    eess.IV cs.CR cs.CV

    Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations

    Authors: Xun Lin, Yi Yu, Song Xia, Jue Jiang, Haoran Wang, Zitong Yu, Yizhong Liu, Ying Fu, Shuai Wang, Wenzhong Tang, Alex Kot

    Abstract: The widespread availability of publicly accessible medical images has significantly propelled advancements in various research and clinical fields. Nonetheless, concerns regarding unauthorized training of AI systems for commercial purposes and the duties of patient privacy protection have led numerous institutions to hesitate to share their images. This is particularly true for medical image segme… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

  20. arXiv:2403.12487  [pdf]

    eess.SY

    Unveiling Four Key Factors for Tire Force Control Allocation in 4WID-4WIS Electric Vehicles at Handling Limits

    Authors: Ao Lu, Runfeng Li, Yunchang Yu, Ziwang Lu, Guangyu Tian

    Abstract: The four-wheel independent drive and four-wheel independent steering (4WID-4WIS) configurations enhance control flexibility and dynamic performance potential for more integrated electric vehicles. This paper comprehensively analyzes the impacts of four key factors on tire force control allocation: vertical load estimation, actuator dynamic characteristics, tire force constraints, and wheel steerin… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  21. arXiv:2403.10384  [pdf, other]

    cs.GT cs.MA eess.SY

    Coordination in Noncooperative Multiplayer Matrix Games via Reduced Rank Correlated Equilibria

    Authors: Jaehan Im, Yue Yu, David Fridovich-Keil, Ufuk Topcu

    Abstract: Coordination in multiplayer games enables players to avoid the lose-lose outcome that often arises at Nash equilibria. However, designing a coordination mechanism typically requires the consideration of the joint actions of all players, which becomes intractable in large-scale games. We develop a novel coordination mechanism, termed reduced rank correlated equilibria, which reduces the number of j… ▽ More

    Submitted 12 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  22. arXiv:2403.10064  [pdf, other]

    eess.IV cs.CV

    Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

    Authors: Chong Wang, Lanqing Guo, Yufei Wang, Hao Cheng, Yi Yu, Bihan Wen

    Abstract: Deep unfolding networks (DUN) have emerged as a popular iterative framework for accelerated magnetic resonance imaging (MRI) reconstruction. However, conventional DUN aims to reconstruct all the missing information within the entire null space in each iteration. Thus it could be challenging when dealing with highly ill-posed degradation, usually leading to unsatisfactory reconstruction. In this wo… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  23. arXiv:2403.09693  [pdf, other]

    eess.SP

    A Constrained Deep Reinforcement Learning Optimization for Reliable Network Slicing in a Blockchain-Secured Low-Latency Wireless Network

    Authors: Xin Hao, Phee Lep Yeoh, Changyang She, Yao Yu, Branka Vucetic, Yonghui Li

    Abstract: Network slicing (NS) is a promising technology that supports diverse requirements for next-generation low-latency wireless communication networks. However, the tampering attack is a rising issue of jeopardizing NS service-provisioning. To resist tampering attacks in NS networks, we propose a novel optimization framework for reliable NS resource allocation in a blockchain-secured low-latency wirele… ▽ More

    Submitted 16 February, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.08016

  24. arXiv:2403.09407  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    LM2D: Lyrics- and Music-Driven Dance Synthesis

    Authors: Wenjie Yin, Xuejiao Zhao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

    Abstract: Dance typically involves professional choreography with complex movements that follow a musical rhythm and can also be influenced by lyrical content. The integration of lyrics in addition to the auditory dimension, enriches the foundational tone and makes motion generation more amenable to its semantic meanings. However, existing dance synthesis methods tend to model motions only conditioned on au… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  25. arXiv:2403.09157  [pdf, ps, other]

    eess.IV cs.CV

    VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

    Authors: Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao

    Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 4 figures

  26. arXiv:2402.17246  [pdf, other]

    eess.IV cs.CV cs.LG

    SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging

    Authors: Meng Lou, Hanning Ying, Xiaoqing Liu, Hong-Yu Zhou, Yuqing Zhang, Yizhou Yu

    Abstract: Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) t… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 13 pages, 7 figures

  27. arXiv:2402.03302  [pdf, other]

    eess.IV cs.CV cs.LG

    Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining

    Authors: Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng, Shanshan Wang

    Abstract: Accurate medical image segmentation demands the integration of multi-scale information, spanning from local features to global dependencies. However, it is challenging for existing methods to model long-range global information, where convolutional neural networks (CNNs) are constrained by their local receptive fields, and vision transformers (ViTs) suffer from high quadratic complexity of their a… ▽ More

    Submitted 6 March, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Code and models of Swin-UMamba are publicly available at: https://github.com/JiarunLiu/Swin-UMamba

  28. arXiv:2401.17619  [pdf, ps, other]

    cs.SD eess.AS

    Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

    Authors: Jiatong Shi, Yueqian Lin, Xinyi Bai, Keyi Zhang, Yuning Wu, Yuxun Tang, Yifeng Yu, Qin Jin, Shinji Watanabe

    Abstract: In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatu… ▽ More

    Submitted 12 June, 2024; v1 submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech2024

  29. arXiv:2401.10447  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE cs.SD eess.AS

    Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

    Authors: Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastow, Jia Xu, Ivan Bulyko, Andreas Stolcke

    Abstract: The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware. In this study, we first explore how to enhance model performance by introducing various LoRA training strategies, achieving relative word error rate reductions of 3.50% on the public Librispeech dat… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

  30. arXiv:2401.06224  [pdf, other]

    eess.IV cs.CV cs.LG

    Leveraging Frequency Domain Learning in 3D Vessel Segmentation

    Authors: Xinyuan Wang, Chengwei Pan, Hongming Dai, Gangming Zhao, Jinpeng Li, Xiao Zhang, Yizhou Yu

    Abstract: Coronary microvascular disease constitutes a substantial risk to human health. Employing computer-aided analysis and diagnostic systems, medical professionals can intervene early in disease progression, with 3D vessel segmentation serving as a crucial component. Nevertheless, conventional U-Net architectures tend to yield incoherent and imprecise segmentation outcomes, particularly for small vesse… ▽ More

    Submitted 11 January, 2024; originally announced January 2024.

  31. arXiv:2401.02884  [pdf]

    eess.IV cs.AI

    MsDC-DEQ-Net: Deep Equilibrium Model (DEQ) with Multi-scale Dilated Convolution for Image Compressive Sensing (CS)

    Authors: Youhao Yu, Richard M. Dansereau

    Abstract: Compressive sensing (CS) is a technique that enables the recovery of sparse signals using fewer measurements than traditional sampling methods. To address the computational challenges of CS reconstruction, our objective is to develop an interpretable and concise neural network model for reconstructing natural images using CS. We achieve this by mapping one step of the iterative shrinkage threshold… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 15 pages, 8 figures, open access journal paper

  32. arXiv:2401.00806  [pdf, other]

    eess.SY

    Noise-Aware and Equitable Urban Air Traffic Management: An Optimization Approach

    Authors: Zhenyu Gao, Yue Yu, Qinshuang Wei, Ufuk Topcu, John-Paul Clarke

    Abstract: Urban air mobility (UAM), a transformative concept for the transport of passengers and cargo, faces several integration challenges in complex urban environments. Community acceptance of aircraft noise is among the most noticeable of these challenges when launching or scaling up a UAM system. Properly managing community noise is fundamental to establishing a UAM system that is environmentally and s… ▽ More

    Submitted 1 January, 2024; originally announced January 2024.

    Comments: 30 pages, 15 figures

  33. arXiv:2310.09603  [pdf, other]

    eess.IV cs.CV

    B-Spine: Learning B-Spline Curve Representation for Robust and Interpretable Spinal Curvature Estimation

    Authors: Hao Wang, Qiang Song, Ruofeng Yin, Rui Ma, Yizhou Yu, Yi Chang

    Abstract: Spinal curvature estimation is important to the diagnosis and treatment of scoliosis. Existing methods face several issues such as the need for expensive annotations on the vertebral landmarks and being sensitive to the image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images which are blurry and hazy. In this paper, we pr… ▽ More

    Submitted 14 October, 2023; originally announced October 2023.

  34. arXiv:2310.05368  [pdf, other]

    cs.AI cs.MA cs.SD eess.AS

    Measuring Acoustics with Collaborative Multiple Agents

    Authors: Yinfeng Yu, Changan Chen, Lele Cao, Fangkai Yang, Fuchun Sun

    Abstract: As humans, we hear sound every second of our life. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIR) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by set… ▽ More

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: Main paper (9 pages and 5 figures and 2 tables) and appendix (16 pages and 13 figures and 10 tables). Accepted for publication by IJCAI 2023

  35. arXiv:2310.00455  [pdf, other]

    cs.MM cs.GR cs.LG cs.SD eess.AS

    Music- and Lyrics-driven Dance Synthesis

    Authors: Wenjie Yin, Qingyuan Yao, Yi Yu, Hang Yin, Danica Kragic, Mårten Björkman

    Abstract: Lyrics often convey information about the songs that are beyond the auditory dimension, enriching the semantic meaning of movements and musical themes. Such insights are important in the dance choreography domain. However, most existing dance synthesis methods mainly focus on music-to-dance generation, without considering the semantic information. To complement it, we introduce JustLMD, a new mult… ▽ More

    Submitted 30 September, 2023; originally announced October 2023.

  36. arXiv:2309.15223  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE cs.SD eess.AS

    Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

    Authors: Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastow, Ivan Bulyko

    Abstract: We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limit their practical use in rescoring. Here we p… ▽ More

    Submitted 10 October, 2023; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to IEEE ASRU 2023. Internal Review Approved. Revised 2nd version with Andreas and Huck. The first version is in Sep 29th. 8 pages

    Journal ref: Proc. IEEE ASRU Workshop, Dec. 2023

  37. arXiv:2309.09409  [pdf]

    eess.SP

    Improving Axial Resolution of Optical Resolution Photoacoustic Microscopy with Advanced Frequency Domain Eigenspace Based Minimum Variance Beamforming Method

    Authors: Yu-Hsiang Yu, Meng-Lin Li

    Abstract: Optical resolution photoacoustic microscopy (OR-PAM) leverages optical focusing and acoustic detection for microscopic optical absorption imaging. Intrinsically, it has high optical lateral resolution and poor acoustic axial resolution. Such anisometric resolution hinders good 3-D visualization; thus 2-D maximum amplitude projection images are commonly presented in the literature. Since its axial… ▽ More

    Submitted 17 September, 2023; originally announced September 2023.

  38. arXiv:2308.13789  [pdf]

    eess.SP

    Sensiverse: A dataset for ISAC study

    Authors: Jiajin Luo, Baojian Zhou, Yang Yu, Ping Zhang, Xiaohui Peng, Jianglei Ma, Peiying Zhu, Jianmin Lu, Wen Tong

    Abstract: In order to address the lack of applicable channel models for ISAC research and evaluation, we release Sensiverse, a dataset that can be used for ISAC research. In this paper, we present the method of generating Sensiverse, including the acquisition and formatting of the 3D scene models, the generation of the channel data and associations with Tx/Rx deployment. The file structure and usage of the… ▽ More

    Submitted 26 August, 2023; originally announced August 2023.

  39. arXiv:2308.11644  [pdf, other]

    eess.SP cs.LG

    Synergistic Signal Denoising for Multimodal Time Series of Structure Vibration

    Authors: Yang Yu, Han Chen

    Abstract: Structural Health Monitoring (SHM) plays an indispensable role in ensuring the longevity and safety of infrastructure. With the rapid growth of sensor technology, the volume of data generated from various structures has seen an unprecedented surge, bringing forth challenges in efficient analysis and interpretation. This paper introduces a novel deep learning algorithm tailored for the complexities… ▽ More

    Submitted 16 August, 2023; originally announced August 2023.

  40. arXiv:2308.08197  [pdf, other]

    eess.IV cs.CV

    Self-Reference Deep Adaptive Curve Estimation for Low-Light Image Enhancement

    Authors: Jianyu Wen, Chenhao Wu, Tong Zhang, Yixuan Yu, Piotr Swierczynski

    Abstract: In this paper, we propose a 2-stage low-light image enhancement method called Self-Reference Deep Adaptive Curve Estimation (Self-DACE). In the first stage, we present an intuitive, lightweight, fast, and unsupervised luminance enhancement algorithm. The algorithm is based on a novel low-light enhancement curve that can be used to locally boost image brightness. We also propose a new loss function… ▽ More

    Submitted 10 September, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  41. arXiv:2308.08017  [pdf, other]

    cs.GT cs.LG eess.SY

    Active Inverse Learning in Stackelberg Trajectory Games

    Authors: Yue Yu, Jacob Levy, Negar Mehr, David Fridovich-Keil, Ufuk Topcu

    Abstract: Game-theoretic inverse learning is the problem of inferring the players' objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates describes the… ▽ More

    Submitted 15 August, 2023; originally announced August 2023.

  42. arXiv:2308.03769  [pdf, other]

    eess.SY cs.AI math.OC

    Towards Integrated Traffic Control with Operating Decentralized Autonomous Organization

    Authors: Shengyue Yao, Jingru Yu, Yi Yu, Jia Xu, Xingyuan Dai, Honghai Li, Fei-Yue Wang, Yilun Lin

    Abstract: With a growing complexity of the intelligent traffic system (ITS), an integrated control of ITS that is capable of considering plentiful heterogeneous intelligent agents is desired. However, existing control methods based on the centralized or the decentralized scheme have not presented their competencies in considering the optimality and the scalability simultaneously. To address this issue, we p…

    Submitted 25 July, 2023; originally announced August 2023.

    Comments: 6 pages, 6 figures. To be published in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)

  43. arXiv:2308.02867  [pdf, other]

    cs.SD eess.AS

    A Systematic Exploration of Joint-training for Singing Voice Synthesis

    Authors: Yuning Wu, Yifeng Yu, Jiatong Shi, Tao Qian, Qin Jin

    Abstract: There has been a growing interest in using end-to-end acoustic models for singing voice synthesis (SVS). Typically, these models require an additional vocoder to transform the generated acoustic features into the final waveform. However, since the acoustic model and the vocoder are not jointly optimized, a gap can exist between the two models, leading to suboptimal performance. Although a similar…

    Submitted 5 August, 2023; originally announced August 2023.

  44. arXiv:2308.02166  [pdf, other]

    eess.SY eess.SP

    Transformer-Based Denoising of Mechanical Vibration Signals

    Authors: Han Chen, Yang Yu, Pengtao Li

    Abstract: Mechanical vibration signal denoising is a pivotal task in various industrial applications, including system health monitoring and failure prediction. This paper introduces a novel deep learning transformer-based architecture specifically tailored for denoising mechanical vibration signals. The model leverages a Multi-Head Attention layer with 8 heads, processing input sequences of length 128, emb…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 8 pages

  45. arXiv:2307.07710  [pdf, other]

    cs.CV eess.IV

    ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

    Abstract: Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic mappings from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure…

    Submitted 15 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  46. arXiv:2307.05884  [pdf, other]

    eess.SY cs.RO

    Learning Koopman Operators with Control Using Bi-level Optimization

    Authors: Daning Huang, Muhammad Bayu Prasetyo, Yin Yu, Junyi Geng

    Abstract: The accurate modeling and control of nonlinear dynamical effects are crucial for numerous robotic systems. The Koopman formalism emerges as a valuable tool for linear control design in nonlinear systems within unknown environments. However, it still remains a challenging task to learn the Koopman operator with control from data, and in particular, the simultaneous identification of the Koopman lin…

    Submitted 5 November, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Accepted by 2023 IEEE 62nd Conference on Decision and Control (CDC)

  47. arXiv:2306.12058  [pdf, other]

    cs.CV eess.IV

    Beyond Learned Metadata-based Raw Image Reconstruction

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

    Abstract: While raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image. However, these approaches often leave space for pursuing more effective im…

    Submitted 21 June, 2023; originally announced June 2023.

  48. arXiv:2306.02613  [pdf, other]

    cs.SD cs.AI eess.AS

    Controllable Lyrics-to-Melody Generation

    Authors: Zhe Zhang, Yi Yu, Atsuhiro Takasu

    Abstract: Lyrics-to-melody generation is an interesting and challenging topic in the AI music research field. Due to the difficulty of learning the correlations between lyrics and melody, previous methods suffer from low generation quality and lack of controllability. Controllability of generative models enables human interaction with models to generate desired contents, which is especially important in music g…

    Submitted 5 June, 2023; originally announced June 2023.

  49. arXiv:2306.00408  [pdf, other]

    cs.CY cs.DC eess.SY

    Pursuing Equilibrium of Medical Resources via Data Empowerment in Parallel Healthcare System

    Authors: Yi Yu, Shengyue Yao, Kexin Wang, Yan Chen, Fei-Yue Wang, Yilun Lin

    Abstract: The imbalance between the supply and demand of healthcare resources is a global challenge, which is particularly severe in developing countries. Governments and academic communities have made various efforts to increase healthcare supply and improve resource allocation. However, these efforts often remain passive and inflexible. Alongside these issues, the emergence of the parallel healthcare syst…

    Submitted 1 June, 2023; originally announced June 2023.

  50. arXiv:2305.07110  [pdf, other]

    eess.SY

    Dynamic Routing in Stochastic Urban Air Mobility Networks: A Markov Decision Process Approach

    Authors: Qinshuang Wei, Yue Yu, Ufuk Topcu

    Abstract: Urban air mobility (UAM) is an emerging concept in short-range aviation transportation, where the aircraft will take off, land, and charge their batteries at a set of vertistops, and travel only through a set of flight corridors connecting these vertistops. We study the problem of routing an electric aircraft from its origin vertistop to its destination vertistop with the minimal expected total tr…

    Submitted 11 May, 2023; originally announced May 2023.

    Comments: 8 pages, 3 figures