Showing 1–50 of 199 results for author: Lin, J

Searching in archive eess.
  1. arXiv:2408.12080  [pdf, other]

    eess.SP cs.AI cs.NI

    Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning

    Authors: Max J. L. Lee, Ju Lin, Li-Ta Hsu

    Abstract: We propose a feasibility study for real-time automated data standardization leveraging Large Language Models (LLMs) to enhance seamless positioning systems in IoT environments. By integrating and standardizing heterogeneous sensor data from smartphones, IoT devices, and dedicated systems such as Ultra-Wideband (UWB), our study ensures data compatibility and improves positioning accuracy using the…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Accepted at IPIN 2024. To be published in IEEE Xplore

  2. arXiv:2408.05777  [pdf, other]

    cs.CV cs.AI eess.IV

    Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

    Authors: Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

    Abstract: Optical remote sensing and Synthetic Aperture Radar (SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method name…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  3. Precoding Based Downlink OAM-MIMO Communications with Rate Splitting

    Authors: Ruirui Chen, Jinyang Lin, Beibei Zhang, Yu Ding, Keyue Xu

    Abstract: Orbital angular momentum (OAM) and rate splitting (RS) are the potential key techniques for the future wireless communications. As a new orthogonal resource, OAM can achieve the multifold increase of spectrum efficiency to relieve the scarcity of the spectrum resource, but how to enhance the privacy performance imposes crucial challenge for OAM communications. RS technique divides the information…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON BROADCASTING, VOL. 69, NO. 4, DECEMBER 2023

  4. Cooperative Orbital Angular Momentum Wireless Communications

    Authors: Ruirui Chen, Wenchi Cheng, Jinyang Lin, Liping Liang

    Abstract: Orbital angular momentum (OAM) mode multiplexing has the potential to achieve high spectrum-efficiency communications at the same time and frequency by using orthogonal mode resource. However, the vortex wave hollow divergence characteristic results in the requirement of the large-scale receive antenna, which makes users hardly receive the OAM signal by size-limited equipment. To promote the OAM a…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Journal ref: IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 73, NO. 1, JANUARY 2024

  5. arXiv:2407.15358  [pdf, other]

    eess.IV

    PRIME: Blind Multispectral Unmixing Using Virtual Quantum Prism and Convex Geometry

    Authors: Chia-Hsiang Lin, Jhao-Ting Lin

    Abstract: Multispectral unmixing (MU) is critical due to the inevitable mixed pixel phenomenon caused by the limited spatial resolution of typical multispectral images in remote sensing. However, MU mathematically corresponds to the underdetermined blind source separation problem, thus highly challenging, preventing researchers from tackling it. Previous MU works all ignore the underdetermined issue, and me…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 13 pages, 9 figures

  6. arXiv:2407.12963  [pdf, other]

    eess.IV

    Edge Projection-Based Adaptive View Selection for Cone-Beam CT

    Authors: Jingsong Lin, Singanallur Venkatakrishnan, Gregery Buzzard, Amir Koushyar Ziabari, Charles Bouman

    Abstract: Industrial cone-beam X-ray computed tomography (CT) scans of additively manufactured components produce a 3D reconstruction from projection measurements acquired at multiple predetermined rotation angles of the component about a single axis. Typically, a large number of projections are required to achieve a high-quality reconstruction, a process that can span several hours or days depending on the…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Submitted to 2024 Asilomar Conference on Signals, Systems, and Computers

  7. arXiv:2407.11578  [pdf, other]

    cs.CV eess.IV

    UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

    Authors: Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo

    Abstract: This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts by utilizing information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and pla…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  8. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

  9. ecVoice: Audio Text Extraction and Optimization of Video Based on Idioms Similarity Replacement

    Authors: Jinwei Lin

    Abstract: The Text Extraction of the Audio from the Video plays an important role in multimedia editing and processing. As a popular open source toolkit, Whisper performs fast in human voice recognition. However, the recognition performance is dependent on the computing resource, which makes the low computing memory running Whisper become difficult. Our paper presents an available solution to extract the hu…

    Submitted 20 May, 2024; originally announced July 2024.

    Comments: APSIPA ASC 2023

  10. arXiv:2407.07397  [pdf, other]

    cs.SD eess.AS

    SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness

    Authors: Jie Lin, Xiuping Yang, Li Xiao, Xinhong Li, Weiyan Yi, Yuhong Yang, Weiping Tu, Xiong Chen

    Abstract: Obstructive Sleep Apnea-Hypopnea Syndrome (OSAHS) is a prevalent chronic breathing disorder caused by upper airway obstruction. Previous studies advanced OSAHS evaluation through machine learning-based systems trained on sleep snoring or speech signal datasets. However, constructing datasets for training a precise and rapid OSAHS evaluation system poses a challenge, since 1) it is time-consuming t…

    Submitted 10 July, 2024; originally announced July 2024.

  11. arXiv:2407.02826  [pdf, other]

    eess.AS

    SA-WavLM: Speaker-Aware Self-Supervised Pre-training for Mixture Speech

    Authors: Jingru Lin, Meng Ge, Junyi Ao, Liqun Deng, Haizhou Li

    Abstract: It was shown that pre-trained models with self-supervised learning (SSL) techniques are effective in various downstream speech tasks. However, most such models are trained on single-speaker speech data, limiting their effectiveness in mixture speech. This motivates us to explore pre-training on mixture speech. This work presents SA-WavLM, a novel pre-trained model for mixture speech. Specifically,…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: InterSpeech 2024

  12. arXiv:2407.01536  [pdf, other]

    eess.SY

    Beyond Profit: A Multi-Objective Framework for Electric Vehicle Charging Station Operations

    Authors: Shuoyao Wang, Jiawei Lin

    Abstract: This paper explores the pricing and scheduling strategies of the electric vehicle charging stations in response to the rising demand for cleaner transportation. Most of the existing methods focus on maximizing the energy efficiency or the charging station profit, however, the reputation of EVs is also a key factor for the long-term charging station operations. To address these gaps, we propose a n…

    Submitted 12 March, 2024; originally announced July 2024.

    Comments: Accepted By VTC24-Spring

  13. arXiv:2406.14869  [pdf, other]

    eess.SP

    Cost-Effective RF Fingerprinting Based on Hybrid CVNN-RF Classifier with Automated Multi-Dimensional Early-Exit Strategy

    Authors: Jiayan Gan, Zhixing Du, Qiang Li, Huaizong Shao, Jingran Lin, Ye Pan, Zhongyi Wen, Shafei Wang

    Abstract: While the Internet of Things (IoT) technology is booming and offers huge opportunities for information exchange, it also faces unprecedented security challenges. As an important complement to the physical layer security technologies for IoT, radio frequency fingerprinting (RFF) is of great interest due to its difficulty in counterfeiting. Recently, many machine learning (ML)-based RFF algorithms h…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  14. arXiv:2406.09989  [pdf, other]

    q-bio.NC eess.SY

    Suppressing seizure via optimal electrical stimulation to the hub of epileptic brain network

    Authors: Zhichao Liang, Guanyi Zhao, Yinuo Zhang, Weiting Sun, Jingzhe Lin, Jialin Wang, Quanying Liu

    Abstract: The electrical stimulation to the seizure onset zone (SOZ) serves as an efficient approach to seizure suppression. Recently, seizure dynamics have gained widespread attention in its network propagation mechanisms. Compared with the direct stimulation to SOZ, other brain network-level approaches that can effectively suppress epileptic seizures remain under-explored. In this study, we introduce a p…

    Submitted 14 June, 2024; originally announced June 2024.

  15. arXiv:2406.07437  [pdf, other]

    cs.SD eess.AS

    Graph-based multi-Feature fusion method for speech emotion recognition

    Authors: Xueyu Liu, Jie Lin, Chao Wang

    Abstract: Exploring proper way to conduct multi-speech feature fusion for cross-corpus speech emotion recognition is crucial as different speech features could provide complementary cues reflecting human emotion status. While most previous approaches only extract a single speech feature for emotion recognition, existing fusion methods such as concatenation, parallel connection, and splicing ignore heterogen…

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures

  16. arXiv:2406.01245  [pdf, other]

    eess.IV

    Sparse Focus Network for Multi-Source Remote Sensing Data Classification

    Authors: Xuepeng Jin, Junyan Lin, Feng Gao, Lin Qi, Yang Zhou

    Abstract: Multi-source remote sensing data classification has emerged as a prominent research topic with the advancement of various sensors. Existing multi-source data classification methods are susceptible to irrelevant information interference during multi-source feature extraction and fusion. To solve this issue, we propose a sparse focus network for multi-source data classification. Sparse attention is…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE IGARSS 2024

  17. arXiv:2406.01235  [pdf, other]

    eess.IV

    Boosting Spatial-Spectral Masked Auto-Encoder Through Mining Redundant Spectra for HSI-SAR/LiDAR Classification

    Authors: Junyan Lin, Xuepeng Jin, Feng Gao, Junyu Dong, Hui Yu

    Abstract: Although recent masked image modeling (MIM)-based HSI-LiDAR/SAR classification methods have gradually recognized the importance of the spectral information, they have not adequately addressed the redundancy among different spectra, resulting in information leakage during the pretraining stage. This issue directly impairs the representation ability of the model. To tackle the problem, we propose a…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted by IGARSS 2024

  18. arXiv:2406.00421  [pdf]

    eess.SY

    Modal Analysis of Power System with High CIG Penetration Based on Impedance Models

    Authors: Le Zheng, Jiajie Zheng, Jiajian Lin, Chongru Liu

    Abstract: This paper explores the modal analysis of power systems with high Converter-Interfaced Generation (CIG) penetration utilizing an impedance-based modeling approach. Traditional modal analysis based on the state-space model (MASS) requires comprehensive control structures and parameters of each system element, a challenging prerequisite as converters increasingly integrate into power systems and the…

    Submitted 1 June, 2024; originally announced June 2024.

  19. arXiv:2405.20693  [pdf, other]

    eess.IV cs.CV

    R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction

    Authors: Ruyi Zha, Tao Jun Lin, Yuanhao Cai, Jiwen Cao, Yanhao Zhang, Hongdong Li

    Abstract: 3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a p…

    Submitted 31 May, 2024; originally announced May 2024.

  20. arXiv:2405.13661  [pdf, ps, other]

    cs.SD eess.AS

    Timbre Perception, Representation, and its Neuroscientific Exploration: A Comprehensive Review

    Authors: Hong Zhang, Jie Lin, Shengxuan Chen

    Abstract: Timbre, the sound's unique "color", is fundamental to how we perceive and appreciate music. This review explores the multifaceted world of timbre perception and representation. It begins by tracing the word's origin, offering an intuitive grasp of the concept. Building upon this foundation, the article delves into the complexities of defining and measuring timbre. It then explores the concept and…

    Submitted 22 May, 2024; originally announced May 2024.

  21. arXiv:2405.13636  [pdf, other]

    cs.SD cs.AI eess.AS

    Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

    Authors: Jiaju Lin, Haoxuan Hu

    Abstract: Audio tagging is an important task of mapping audio samples to their corresponding categories. Recently endeavours that exploit transformer models in this field have achieved great success. However, the quadratic self-attention cost limits the scaling of audio transformer models and further constrains the development of more universal audio models. In this paper, we attempt to solve this problem b…

    Submitted 22 May, 2024; originally announced May 2024.

  22. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  23. arXiv:2405.09561  [pdf]

    eess.SP cs.AI cs.LG

    GAD: A Real-time Gait Anomaly Detection System with Online Adaptive Learning

    Authors: Ming-Chang Lee, Jia-Chun Lin, Sokratis Katsikas

    Abstract: Gait anomaly detection is a task that involves detecting deviations from a person's normal gait pattern. These deviations can indicate health issues and medical conditions in the healthcare domain, or fraudulent impersonation and unauthorized identity access in the security domain. A number of gait anomaly detection approaches have been introduced, but many of them require offline data preprocessi…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 14 pages, 8 figures, 3 tables, ICT Systems Security and Privacy Protection 39th IFIP TC 11 International Conference, SEC 2024, Edinburgh, UK, June 12-14, 2024, Proceedings (IFIP SEC2024)

  24. arXiv:2405.05518  [pdf, other]

    cs.CV cs.RO eess.IV

    DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction

    Authors: Siyu Li, Jiacheng Lin, Hao Shi, Jiaming Zhang, Song Wang, You Yao, Zhiyong Li, Kailun Yang

    Abstract: Temporal information plays a pivotal role in Bird's-Eye-View (BEV) driving scene understanding, which can alleviate the visual information sparsity. However, the indiscriminate temporal fusion method will cause the barrier of feature redundancy when constructing vectorized High-Definition (HD) maps. In this paper, we revisit the temporal fusion of vectorized HD maps, focusing on temporal instance…

    Submitted 25 August, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code is available at https://github.com/lynn-yu/DTCLMapper

  25. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  26. arXiv:2405.03692  [pdf, other]

    eess.IV cs.NI eess.SY

    Imitation Learning for Adaptive Video Streaming with Future Adversarial Information Bottleneck Principle

    Authors: Shuoyao Wang, Jiawei Lin, Fangwei Ye

    Abstract: Adaptive video streaming plays a crucial role in ensuring high-quality video streaming services. Despite extensive research efforts devoted to Adaptive BitRate (ABR) techniques, the current reinforcement learning (RL)-based ABR algorithms may benefit the average Quality of Experience (QoE) but suffers from fluctuating performance in individual video sessions. In this paper, we present a novel appr…

    Submitted 12 March, 2024; originally announced May 2024.

    Comments: submitted to IEEE Journal

  27. arXiv:2404.19615  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    SemiPL: A Semi-supervised Method for Event Sound Source Localization

    Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

    Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app…

    Submitted 30 April, 2024; originally announced April 2024.

  28. arXiv:2404.17554  [pdf]

    cs.HC eess.SP eess.SY stat.AP

    A Novel Context driven Critical Integrative Levels (CIL) Approach: Advancing Human-Centric and Integrative Lighting Asset Management in Public Libraries with Practical Thresholds

    Authors: Jing Lin, Nina Mylly, Per Olof Hedekvist, Jingchun Shen

    Abstract: This paper proposes the context driven Critical Integrative Levels (CIL), a novel approach to lighting asset management in public libraries that aligns with the transformative vision of human-centric and integrative lighting. This approach encompasses not only the visual aspects of lighting performance but also prioritizes the physiological and psychological well-being of library users. Incorporat…

    Submitted 26 April, 2024; originally announced April 2024.

  29. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines, however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  30. arXiv:2404.15279  [pdf, other]

    eess.SP cs.AI

    Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification

    Authors: Jimmy Lin, Junkai Li, Jiasi Gao, Weizhi Ma, Yang Liu

    Abstract: Tactile signals collected by wearable electronics are essential in modeling and understanding human behavior. One of the main applications of tactile signals is action classification, especially in healthcare and robotics. However, existing tactile classification methods fail to capture the spatial and temporal features of tactile signals simultaneously, which results in sub-optimal performances.…

    Submitted 20 January, 2024; originally announced April 2024.

    Comments: Accepted by AAAI 2024

  31. arXiv:2404.12794  [pdf, other]

    cs.CV cs.MM cs.RO eess.IV

    MambaMOS: LiDAR-based 3D Moving Object Segmentation with Motion-aware State Space Model

    Authors: Kang Zeng, Hao Shi, Jiacheng Lin, Siyu Li, Jintao Cheng, Kaiwei Wang, Zhiyong Li, Kailun Yang

    Abstract: LiDAR-based Moving Object Segmentation (MOS) aims to locate and segment moving objects in point clouds of the current scan using motion information from previous scans. Despite the promising results achieved by previous MOS methods, several key issues, such as the weak coupling of temporal and spatial information, still need further study. In this paper, we propose a novel LiDAR-based 3D Moving Ob…

    Submitted 5 August, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted to ACM MM 2024. The source code is publicly available at https://github.com/Terminal-K/MambaMOS

  32. arXiv:2404.09729  [pdf]

    eess.SP cs.IT cs.LG stat.ME

    Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

    Authors: Shuaicong Hu, Yanan Wang, Jian Liu, Jingyu Lin, Shengmei Qin, Zhenning Nie, Zhifeng Yao, Wenjie Cai, Cuiwei Yang

    Abstract: Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and lack integration. To address this gap, this paper proposes a novel fusion entropy metric, morphological ECG entropy (MEE) for the first time, specifically designed for ECG mor…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 16 pages, 12 figures

    ACM Class: I.5.2

  33. arXiv:2404.01929  [pdf]

    eess.IV cs.CV

    Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method

    Authors: Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin

    Abstract: This study aims to establish a computer-aided diagnostic system for lung lesions using endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, physicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be influenced by…

    Submitted 20 June, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  34. arXiv:2403.05808  [pdf, other]

    cs.CV eess.IV

    Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution

    Authors: Junxiong Lin, Yan Wang, Zeng Tao, Boyang Wang, Qing Zhao, Haorang Wang, Xuan Tong, Xinji Mai, Yuxuan Lin, Wei Song, Jiawen Yu, Shaoqi Yan, Wenqiang Zhang

    Abstract: Pre-trained diffusion models utilized for image generation encapsulate a substantial reservoir of a priori knowledge pertaining to intricate textures. Harnessing the potential of leveraging this a priori knowledge in the context of image super-resolution presents a compelling avenue. Nonetheless, prevailing diffusion-based methodologies presently overlook the constraints imposed by degradation inf…

    Submitted 9 July, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

  35. arXiv:2402.18302  [pdf, other]

    cs.CV cs.RO eess.AS eess.IV

    EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving

    Authors: Jiacheng Lin, Jiajun Chen, Kunyu Peng, Xuan He, Zhiyong Li, Rainer Stiefelhagen, Kailun Yang

    Abstract: This paper introduces the task of Auditory Referring Multi-Object Tracking (AR-MOT), which dynamically tracks specific objects in a video sequence based on audio expressions and appears as a challenging problem in autonomous driving. Due to the lack of semantic modeling capacity in audio and video, existing works have mainly focused on text-based multi-object tracking, which often comes at the cos…

    Submitted 5 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). The source code and datasets are available at https://github.com/lab206/EchoTrack

  36. arXiv:2402.17785  [pdf, other]

    cs.SD cs.AI eess.AS

    ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

    Authors: Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

    Abstract: Large Language Models (LLM) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system is still under-explored. To solve this problem, we propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Eval…

    Submitted 6 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

  37. arXiv:2402.00859  [pdf, other]

    eess.AS

    Deep Room Impulse Response Completion

    Authors: Jackie Lin, Georg Götz, Sebastian J. Schlecht

    Abstract: Rendering immersive spatial audio in virtual reality (VR) and video games demands a fast and accurate generation of room impulse responses (RIRs) to recreate auditory environments plausibly. However, the conventional methods for simulating or measuring long RIRs are either computationally intensive or challenged by low signal-to-noise ratios. This study is propelled by the insight that direct soun…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: The following article has been submitted to the EURASIP Journal on Audio, Speech, and Music Processing

  38. arXiv:2401.10411  [pdf, other]

    eess.AS cs.SD

    AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

    Authors: Ju Lin, Niko Moritz, Yiteng Huang, Ruiming Xie, Ming Sun, Christian Fuegen, Frank Seide

    Abstract: Wearable devices like smart glasses are approaching the compute capability to seamlessly generate real-time closed captions for live conversations. We build on our recently introduced directional Automatic Speech Recognition (ASR) for smart glasses that have microphone arrays, which fuses multi-channel ASR with serialized output training, for wearer/conversation-partner disambiguation as well as s…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024

  39. arXiv:2312.16002  [pdf, other]

    eess.AS cs.AI

    The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

    Authors: Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

    Abstract: This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition. Our submitted systems for ICMC-ASR Challenge include the multi-channel front-end enhancement and diarization, training data augmentation, speech recognition modeling with multi-channel branches. Tested on the official Eval1 and Eval2 set, our best system achieves…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: Technical Report. 2 pages. For ICMC-ASR-2023 Challenge

  40. arXiv:2312.14473  [pdf, other]

    math.OC eess.SY

    Coordinated Active-Reactive Power Management of ReP2H Systems with Multiple Electrolyzers

    Authors: Yangjun Zeng, Buxiang Zhou, Jie Zhu, Jiarong Li, Bosen Yang, Jin Lin, Yiwei Qiu

    Abstract: Utility-scale renewable power-to-hydrogen (ReP2H) production typically uses thyristor rectifiers (TRs) to supply power to multiple electrolyzers (ELZs). They exhibit a nonlinear and non-decouplable relation between active and reactive power. The on-off scheduling and load allocation of multiple ELZs simultaneously impact energy conversion efficiency and AC-side active and reactive power flow. Impr…

    Submitted 22 December, 2023; originally announced December 2023.

  41. arXiv:2312.10687  [pdf, other]

    eess.AS cs.SD

    MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis

    Authors: Wenhao Guan, Yishuang Li, Tao Li, Hukai Huang, Feng Wang, Jiayan Lin, Lingyan Huang, Lin Li, Qingyang Hong

    Abstract: The style transfer task in Text-to-Speech refers to the process of transferring style information into text content to generate corresponding speech with a specific style. However, most existing style transfer approaches are either based on fixed emotional labels or reference speech clips, which cannot achieve flexible style transfer. Recently, some methods have adopted text descriptions to guide…

    Submitted 31 January, 2024; v1 submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI2024

  42. arXiv:2311.15556  [pdf, other]

    cs.CV eess.IV

    PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI Generated Images

    Authors: Jiquan Yuan, Xinyan Cao, Changjin Li, Fanyi Yang, Jinlong Lin, Xixin Cao

    Abstract: As image generation technology advances, AI-based image generation has been applied in various fields and Artificial Intelligence Generated Content (AIGC) has garnered widespread attention. However, the development of AI-based image generative models also brings new problems and challenges. A significant challenge is that AI-generated images (AIGI) may exhibit unique distortions compared to natura…

    Submitted 29 November, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 18 pages

  43. arXiv:2311.05653  [pdf, ps, other]

    eess.SY

    Minimal Input Structural Modifications for Strongly Structural Controllability

    Authors: Geethu Joseph, Shana Moothedath, Jiabin Lin

    Abstract: This paper studies the problem of modifying the input matrix of a structured system to make the system strongly structurally controllable. We focus on the generalized structured systems that rely on zero/nonzero/arbitrary structure, i.e., some entries of system matrices are zeros, some are nonzero, and the remaining entries can be zero or nonzero (arbitrary). We analyze the feasibility of the prob…

    Submitted 17 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

  44. arXiv:2311.04526  [pdf, other]

    eess.AS

    Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

    Authors: Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

    Abstract: Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks. Since they are mainly pre-trained to map input speech to pseudo-labels, the resulting representations are only effective for the type of pre-train data used, either clean or mixture speech. With the idea of selective auditory attention, we propose a novel pre-training solution called Sele…

    Submitted 8 November, 2023; originally announced November 2023.

  45. arXiv:2311.04442  [pdf, other]

    eess.IV cs.CV

    SS-MAE: Spatial-Spectral Masked Auto-Encoder for Multi-Source Remote Sensing Image Classification

    Authors: Junyan Lin, Feng Gao, Xiaocheng Shi, Junyu Dong, Qian Du

    Abstract: Masked image modeling (MIM) is a highly popular and effective self-supervised learning method for image understanding. Existing MIM-based methods mostly focus on spatial feature modeling, neglecting spectral feature modeling. Meanwhile, existing MIM-based methods use Transformers for feature extraction, so some local or high-frequency information may be lost. To this end, we propose a spatial-spectra…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: IEEE TGRS 2023

  46. arXiv:2310.10159  [pdf, other]

    cs.SD cs.CL eess.AS

    Joint Music and Language Attention Models for Zero-shot Music Tagging

    Authors: Xingjian Du, Zhesong Yu, Jiaju Lin, Bilei Zhu, Qiuqiang Kong

    Abstract: Music tagging is the task of predicting the tags of music recordings. However, previous music tagging research primarily focuses on closed-set music tagging tasks, which cannot be generalized to new tags. In this work, we propose a zero-shot music tagging system modeled by a joint music and language attention (JMLA) model to address the open-set music tagging problem. The JMLA model consists of an audio…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Keywords: Music tagging, joint music and language attention models, Music Foundation Model.

  47. arXiv:2310.03750  [pdf]

    eess.SP cond-mat.mtrl-sci cs.LG physics.app-ph

    Health diagnosis and recuperation of aged Li-ion batteries with data analytics and equivalent circuit modeling

    Authors: Riko I Made, Jing Lin, Jintao Zhang, Yu Zhang, Lionel C. H. Moh, Zhaolin Liu, Ning Ding, Sing Yang Chiam, Edwin Khoo, Xuesong Yin, Guangyuan Wesley Zheng

    Abstract: Battery health assessment and recuperation play a crucial role in the utilization of second-life Li-ion batteries. However, due to ambiguous aging mechanisms and lack of correlations between the recovery effects and operational states, it is challenging to accurately estimate battery health and devise a clear strategy for cell rejuvenation. This paper presents aging and reconditioning experiments…

    Submitted 21 September, 2023; originally announced October 2023.

    Comments: 20 pages, 5 figures, 1 table

    Journal ref: iScience (2024)

  48. arXiv:2310.01861  [pdf, other]

    eess.IV cs.CV cs.GR

    Shifting More Attention to Breast Lesion Segmentation in Ultrasound Videos

    Authors: Junhao Lin, Qian Dai, Lei Zhu, Huazhu Fu, Qiong Wang, Weibin Li, Wenhao Rao, Xiaoyang Huang, Liansheng Wang

    Abstract: Breast lesion segmentation in ultrasound (US) videos is essential for diagnosing and treating axillary lymph node metastasis. However, the lack of a well-established and large-scale ultrasound video dataset with high-quality annotations has posed a persistent challenge for the research community. To overcome this issue, we meticulously curated a US video breast lesion segmentation dataset comprisi…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 10 pages

  49. One-Bit Channel Estimation for IRS-aided Millimeter-Wave Massive MU-MISO System

    Authors: Silei Wang, Qiang Li, Jingran Lin

    Abstract: Recently, intelligent reflecting surface (IRS)-assisted communication has gained considerable attention due to its advantage in extending the coverage and compensating the path loss with low-cost passive metasurface. This paper considers the uplink channel estimation for IRS-aided multiuser massive MISO communications with one-bit ADCs at the base station (BS). The use of one-bit ADC is impelled b…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted by IEEE Trans. Signal Process

  50. Driving behavior-guided battery health monitoring for electric vehicles using machine learning

    Authors: Nanhua Jiang, Jiawei Zhang, Weiran Jiang, Yao Ren, Jing Lin, Edwin Khoo, Ziyou Song

    Abstract: An accurate estimation of the state of health (SOH) of batteries is critical to ensuring the safe and reliable operation of electric vehicles (EVs). Feature-based machine learning methods have exhibited enormous potential for rapidly and precisely monitoring battery health status. However, simultaneously using various health indicators (HIs) may weaken estimation performance due to feature redunda…

    Submitted 25 September, 2023; originally announced September 2023.

    Journal ref: Applied Energy (2024)