Showing 1–50 of 130 results for author: Lin, L

Searching in archive eess.
  1. arXiv:2408.15069  [pdf]

    cs.CV eess.IV physics.ins-det

    Geometric Artifact Correction for Symmetric Multi-Linear Trajectory CT: Theory, Method, and Generalization

    Authors: Zhisheng Wang, Yanxu Sun, Shangyu Li, Legeng Lin, Shunli Wang, Junning Cui

    Abstract: For extending CT field-of-view to perform non-destructive testing, the Symmetric Multi-Linear trajectory Computed Tomography (SMLCT) has been developed as a successful example of non-standard CT scanning modes. However, inevitable geometric errors can cause severe artifacts in the reconstructed images. The existing calibration method for SMLCT is both crude and inefficient. It involves reconstruct…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

    MSC Class: 68U10 (Primary); 68V99, 68Q30 (Secondary)

  2. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elio Quinton, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg , et al. (18 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 27 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  3. arXiv:2408.13487  [pdf, ps, other]

    cs.LO eess.SY math.OC

    Towards Automatic Linearization via SMT Solving

    Authors: Jian Cao, Liyong Lin, Lele Li

    Abstract: Mathematical optimization is ubiquitous in modern applications. However, in practice, we often need to use nonlinear optimization models, for which the existing optimization tools such as Cplex or Gurobi may not be directly applicable and an (error-prone) manual transformation often has to be done. Thus, to address this issue, in this paper we investigate the problem of automatically verifying and…

    Submitted 24 August, 2024; originally announced August 2024.

    Comments: 4 pages, conference

  4. arXiv:2408.13435  [pdf, ps, other]

    eess.SP

    Prototype of Secure Wire-Line Telephone

    Authors: Lifeng Lin, Zijian Zhou, Peihe Jiang, Sanjun Liu, Lai Wei, Bingli Jiao

    Abstract: This paper proposes a secure wire-line telephone prototype that leverages physical layer security (PLS) techniques to protect communications from wiretapping. The system generates artificial noise (AN) in both directions over a telephone line and utilizes a telephone hybrid circuit to achieve effective AN cancellation. We conduct a thorough analysis of the secrecy capacity and evaluate the system'…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 5 pages, 8 figures

  5. arXiv:2408.12615  [pdf, other]

    eess.IV cs.CV cs.LG

    Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

    Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

    Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-lay…

    Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

  6. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2407.14823  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM eess.IV

    CrossDehaze: Scaling Up Image Dehazing with Cross-Data Vision Alignment and Augmentation

    Authors: Yukai Shi, Zhipeng Weng, Yupei Lin, Cidan Shi, Xiaojun Yang, Liang Lin

    Abstract: In recent years, as computer vision tasks have increasingly relied on high-quality image inputs, the task of image dehazing has received significant attention. Previously, many methods based on priors and deep learning have been proposed to address the task of image dehazing. Ignoring the domain gap between different data, former de-hazing methods usually adopt multiple datasets for explicit train…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: A cross-dataset vision alignment and augmentation technology is proposed to boost generalizable feature learning in the de-hazing task

  8. arXiv:2406.18327  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

    Authors: Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

    Abstract: Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by network…

    Submitted 26 June, 2024; originally announced June 2024.

  9. arXiv:2406.15931  [pdf, other]

    eess.SY cs.CE cs.LG stat.AP

    Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

    Authors: Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

    Abstract: Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid i…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 15 pages, 3 figures, and 2 tables

  10. arXiv:2405.18386  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

    Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

    Abstract: Recent advances in text-to-music editing, which employ text queries to modify music (e.g., by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; o…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Code and demo are available at: https://github.com/ldzhangyx/instruct-musicgen

  11. arXiv:2405.17496  [pdf, other]

    eess.IV

    UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

    Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

    Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, w…

    Submitted 27 August, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2404.12908  [pdf, other]

    cs.CV cs.LG eess.IV

    Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images

    Authors: Santosh, Li Lin, Irene Amerini, Xin Wang, Shu Hu

    Abstract: Diffusion models (DMs) have revolutionized image generation, producing high-quality images with applications spanning various fields. However, their ability to create hyper-realistic images poses significant challenges in distinguishing between real and synthetic content, raising concerns about digital authenticity and potential misuse in creating deepfakes. This work introduces a robust detection…

    Submitted 19 April, 2024; originally announced April 2024.

  13. arXiv:2404.03885  [pdf, ps, other]

    cs.IT cs.DS eess.SP math.ST

    The ESPRIT algorithm under high noise: Optimal error scaling and noisy super-resolution

    Authors: Zhiyan Ding, Ethan N. Epperly, Lin Lin, Ruizhe Zhang

    Abstract: Subspace-based signal processing techniques, such as the Estimation of Signal Parameters via Rotational Invariant Techniques (ESPRIT) algorithm, are popular methods for spectral estimation. These algorithms can achieve the so-called super-resolution scaling under low noise conditions, surpassing the well-known Nyquist limit. However, the performance of these algorithms under high-noise conditions…

    Submitted 22 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  14. arXiv:2404.02744  [pdf]

    eess.IV

    Terraced Compression Method with Automated Threshold Selection for Multidimensional Image Clustering of Heterogeneous Bodies

    Authors: Jiatong Li, Gang Li, Nan Su Su Win, Ling Lin

    Abstract: Multispectral transmission imaging provides strong benefits for early breast cancer screening. The frame accumulation method addresses the challenge of low grayscale and signal-to-noise ratio resulting from the strong absorption and scattering of light by breast tissue. This method introduces redundancy in data while improving the grayscale and signal-to-noise ratio of the image. Existing terraced…

    Submitted 3 April, 2024; originally announced April 2024.

  15. arXiv:2404.02286  [pdf, other]

    cs.IT eess.SP

    Energy Allocation for Multi-User Cooperative Molecular Communication Systems in the Internet of Bio-Nano Things

    Authors: Dongliang Jing, Lin Lin, Andrew W. Eckford

    Abstract: Cooperative molecular communication (MC) is a promising technology for facilitating communication between nanomachines in the Internet of Bio-Nano Things (IoBNT) field. However, the performance of IoBNT is limited by the availability of energy for cooperative MC. This paper presents a novel transmitter design scheme that utilizes molecule movement between reservoirs, creating concentration differe…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: To appear in IEEE Internet of Things Journal

  16. arXiv:2403.20025  [pdf, ps, other]

    cs.IT eess.SP

    Secure Full-Duplex Communication via Movable Antennas

    Authors: Jingze Ding, Zijian Zhou, Chenbo Wang, Wenyao Li, Lifeng Lin, Bingli Jiao

    Abstract: This paper investigates physical layer security (PLS) for a movable antenna (MA)-assisted full-duplex (FD) system. In this system, an FD base station (BS) with multiple MAs for transmission and reception provides services for an uplink (UL) user and a downlink (DL) user. Each user operates in half-duplex (HD) mode and is equipped with a single fixed-position antenna (FPA), in the presence of a sin…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: This paper has been submitted for possible publication

  17. IDF-CR: Iterative Diffusion Process for Divide-and-Conquer Cloud Removal in Remote-sensing Images

    Authors: Meilin Wang, Yexing Song, Pengxu Wei, Xiaoyu Xian, Yukai Shi, Liang Lin

    Abstract: Deep learning technologies have demonstrated their effectiveness in removing cloud cover from optical remote-sensing images. Convolutional Neural Networks (CNNs) exert dominance in the cloud removal tasks. However, constrained by the inherent limitations of convolutional operations, CNNs can address only a modest fraction of cloud occlusion. In recent years, diffusion models have achieved state-of…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE TGRS. We first present an iterative diffusion process for cloud removal; the code is available at: https://github.com/SongYxing/IDF-CR

  18. arXiv:2403.08947  [pdf, other]

    eess.IV cs.CV

    Robust COVID-19 Detection in CT Images with CLIP

    Authors: Li Lin, Yamini Sri Krubha, Zhenhuan Yang, Cheng Ren, Thuc Duy Le, Irene Amerini, Xin Wang, Shu Hu

    Abstract: In the realm of medical imaging, particularly for COVID-19 detection, deep learning models face substantial challenges such as the necessity for extensive computational resources, the paucity of well-annotated datasets, and a significant amount of unlabeled data. In this work, we introduce the first lightweight detector designed to overcome these obstacles, leveraging a frozen CLIP image encoder a…

    Submitted 14 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  19. arXiv:2402.17502  [pdf, other]

    cs.CV eess.IV

    FedLPPA: Learning Personalized Prompt and Aggregation for Federated Weakly-supervised Medical Image Segmentation

    Authors: Li Lin, Yixiang Liu, Jiewei Wu, Pujin Cheng, Zhiyuan Cai, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) effectively mitigates the data silo challenge brought about by policies and privacy concerns, implicitly harnessing more data for deep model training. However, traditional centralized FL models grapple with diverse multi-center data, especially in the face of significant data heterogeneity, notably in medical contexts. In the realm of medical image segmentation, the growing…

    Submitted 31 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: 12 pages, 10 figures

  20. arXiv:2402.09508  [pdf, other]

    cs.SD cs.AI eess.AS

    Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

    Authors: Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang

    Abstract: Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. T…

    Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  21. arXiv:2401.17049  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna-Enabled Co-Frequency Co-Time Full-Duplex Wireless Communication

    Authors: Jingze Ding, Zijian Zhou, Wenyao Li, Chenbo Wang, Lifeng Lin, Bingli Jiao

    Abstract: Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for co-frequency co-time full-duplex (CCFD) wireless communication, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/…

    Submitted 7 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: This paper has been submitted to IEEE Wireless Communications Letters

  22. arXiv:2312.16607  [pdf, other]

    eess.IV cs.CV stat.ML

    A Polarization and Radiomics Feature Fusion Network for the Classification of Hepatocellular Carcinoma and Intrahepatic Cholangiocarcinoma

    Authors: Jia Dong, Yao Yao, Liyan Lin, Yang Dong, Jiachen Wan, Ran Peng, Chao Li, Hui Ma

    Abstract: Classifying hepatocellular carcinoma (HCC) and intrahepatic cholangiocarcinoma (ICC) is a critical step in treatment selection and prognosis evaluation for patients with liver diseases. Traditional histopathological diagnosis poses challenges in this context. In this study, we introduce a novel polarization and radiomics feature fusion network, which combines polarization features obtained from Mu…

    Submitted 27 December, 2023; originally announced December 2023.

  23. arXiv:2312.07226  [pdf, other]

    eess.IV cs.CV

    Super-Resolution on Rotationally Scanned Photoacoustic Microscopy Images Incorporating Scanning Prior

    Authors: Kai Pan, Linyang Li, Li Lin, Pujin Cheng, Junyan Lyu, Lei Xi, Xiaoyin Tang

    Abstract: Photoacoustic Microscopy (PAM) images integrating the advantages of optical contrast and acoustic resolution have been widely used in brain studies. However, there exists a trade-off between scanning speed and image resolution. Compared with traditional raster scanning, rotational scanning provides good opportunities for fast PAM imaging by optimizing the scanning mechanism. Recently, there is a t…

    Submitted 12 December, 2023; originally announced December 2023.

  24. arXiv:2312.03299  [pdf, other]

    cs.IT eess.SP

    Channel-Transferable Semantic Communications for Multi-User OFDM-NOMA Systems

    Authors: Lan Lin, Wenjun Xu, Fengyu Wang, Yimeng Zhang, Wei Zhang, Ping Zhang

    Abstract: Semantic communications are expected to become the core new paradigms of the sixth generation (6G) wireless networks. Most existing works implicitly utilize channel information for codecs training, which leads to poor communications when channel type or statistical characteristics change. To tackle this issue posed by various channels, a novel channel-transferable semantic communications (CT-SemCo…

    Submitted 6 December, 2023; originally announced December 2023.

  25. arXiv:2312.03231  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC eess.AS

    Deep Multimodal Fusion for Surgical Feedback Classification

    Authors: Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

    Abstract: Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In th…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: Published in Proceedings of Machine Learning for Health 2024

  26. Exploding AI Power Use: an Opportunity to Rethink Grid Planning and Management

    Authors: Liuzixuan Lin, Rajini Wijayawardana, Varsha Rao, Hai Nguyen, Wedan Emmanuel Gnibga, Andrew A. Chien

    Abstract: The unprecedented rapid growth of computing demand for AI is projected to increase global annual datacenter (DC) growth from 7.2% to 11.3%. We project the 5-year AI DC demand for several power grids and assess whether they will allow desired AI growth (resource adequacy). If not, several "desperate measures" -- grid policies that enable more load growth and maintain grid reliability by sacrificing…

    Submitted 30 April, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted by ACM e-Energy '24: the 15th ACM International Conference on Future and Sustainable Energy Systems

  27. arXiv:2310.17162  [pdf, other]

    cs.AI cs.SD eess.AS

    Content-based Controls For Music Large Language Modeling

    Authors: Liwei Lin, Gus Xia, Junyan Jiang, Yixiao Zhang

    Abstract: Recent years have witnessed a rapid growth of large-scale language models in the domain of music audio. Such models enable end-to-end generation of higher-quality music, and some allow conditioned generation using text descriptions. However, the control power of text controls on music is intrinsically limited, as they can only describe music indirectly through meta-data (such as singers and instru…

    Submitted 13 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  28. arXiv:2310.11230  [pdf, other]

    eess.AS cs.LG cs.SD

    Zipformer: A faster and better encoder for automatic speech recognition

    Authors: Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey

    Abstract: The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame…

    Submitted 9 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Published as a conference paper at ICLR 2024

  29. arXiv:2310.07255  [pdf, other]

    cs.CV eess.IV

    ADASR: An Adversarial Auto-Augmentation Framework for Hyperspectral and Multispectral Data Fusion

    Authors: Jinghui Qin, Lihuang Fang, Ruitao Lu, Liang Lin, Yukai Shi

    Abstract: Deep learning-based hyperspectral image (HSI) super-resolution, which aims to generate high spatial resolution HSI (HR-HSI) by fusing hyperspectral image (HSI) and multispectral image (MSI) with deep neural networks (DNNs), has attracted lots of attention. However, neural networks require large amounts of training data, hindering their application in real-world scenarios. In this letter, we propos…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Geoscience and Remote Sensing Letters. Code is released at https://github.com/fangfang11-plog/ADASR

  30. arXiv:2310.04886  [pdf, other]

    eess.SY

    A Closed-form Solution for the Strapdown Inertial Navigation Initial Value Problem

    Authors: James Goppert, Li-Yu Lin, Kartik Pant, Benjamin Perseghetti

    Abstract: Strapdown inertial navigation systems (SINS) are ubiquitous in robotics and engineering since they can estimate a rigid body pose using onboard kinematic measurements without knowledge of the dynamics of the vehicle to which they are attached. While recent work has focused on the closed-form evolution of the estimation error for SINS, which is critical for Kalman filtering, the propagation of the…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: 4 pages, 3 figures

  31. arXiv:2309.08879  [pdf, other]

    cs.CL eess.SP

    Semantic Information Extraction for Text Data with Probability Graph

    Authors: Zhouxiang Zhao, Zhaohui Yang, Ye Hu, Licheng Lin, Zhaoyang Zhang

    Abstract: In this paper, the problem of semantic information extraction for resource-constrained text data transmission is studied. In the considered model, a sequence of text data needs to be transmitted within a communication resource-constrained network, which only allows limited data transmission. Thus, at the transmitter, the original text data is extracted with natural language processing techniques. T…

    Submitted 16 September, 2023; originally announced September 2023.

  32. arXiv:2309.08105  [pdf, other]

    eess.AS cs.SD

    Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context

    Authors: Wei Kang, Xiaoyu Yang, Zengwei Yao, Fangjun Kuang, Yifan Yang, Liyong Guo, Long Lin, Daniel Povey

    Abstract: In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions. Different from other open-sourced datasets that only provide normalized transcriptions, Libriheavy contains richer information such as punctuation, casin…

    Submitted 14 January, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Submitted to ICASSP 2024

  33. arXiv:2309.07414  [pdf, other]

    eess.AS cs.CL cs.SD

    PromptASR for contextualized ASR with controllable style

    Authors: Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey

    Abstract: Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with controllable style of transcriptions. Specifically, a dedicated text encoder encodes the text prompts and t…

    Submitted 24 January, 2024; v1 submitted 13 September, 2023; originally announced September 2023.

    Comments: Proc. ICASSP 2024

  34. arXiv:2309.05042  [pdf, ps, other]

    eess.SP

    High-Precision Channel Estimation for Sub-Noise Self-Interference Cancellation

    Authors: Dongsheng Zheng, Lifeng Lin, Wenyao Li, Bingli Jiao

    Abstract: Self-interference cancellation plays a crucial role in achieving reliable full-duplex communications. In general, it is essential to cancel the self-interference signal below the thermal noise level, which necessitates accurate reconstruction of the self-interference signal. In this paper, we propose a high-precision channel estimation method specifically designed for sub-noise self-interference c…

    Submitted 10 September, 2023; originally announced September 2023.

  35. arXiv:2307.15972  [pdf, other]

    eess.SY

    On Decidability of Existence of Fortified Supervisors Against Covert Actuator Attackers

    Authors: Ruochen Tai, Liyong Lin, Rong Su

    Abstract: This work investigates the problem of synthesizing fortified supervisors against covert actuator attackers. For a non-resilient supervisor S, i.e., there exists at least one covert actuator attacker that is capable of inflicting damage w.r.t. S, a fortified supervisor S' satisfies two requirements: 1) S' is resilient against any covert actuator attacker, and 2) the original closed-behavior of the clo…

    Submitted 29 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.02383

  36. arXiv:2306.14471  [pdf]

    physics.med-ph eess.IV physics.ins-det physics.optics

    Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

    Authors: Rui Cao, Yilin Luo, Jinhua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong, Cindy Liu, Karteek Sastry, Konstantin Maslov, Peng Hu, Yide Zhang, Li Lin, Yang Zhang, Lihong V. Wang

    Abstract: Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we hav…

    Submitted 26 June, 2023; originally announced June 2023.

  37. arXiv:2305.11558  [pdf, other]

    eess.AS cs.CL

    Blank-regularized CTC for Frame Skipping in Neural Transducer

    Authors: Yifan Yang, Xiaoyu Yang, Liyong Guo, Zengwei Yao, Wei Kang, Fangjun Kuang, Long Lin, Xie Chen, Daniel Povey

    Abstract: Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems. Due to their frame-synchronous design, blank symbols are introduced to address the length mismatch between acoustic frames and output tokens, which might bring redundant computation. Previous studies managed to accelerate the training and inference of neural Transducers by…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  38. arXiv:2305.11539  [pdf, other]

    eess.AS

    Delay-penalized CTC implemented based on Finite State Transducer

    Authors: Zengwei Yao, Wei Kang, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Yifan Yang, Long Lin, Daniel Povey

    Abstract: Connectionist Temporal Classification (CTC) suffers from the latency problem when applied to streaming models. We argue that in CTC lattice, the alignments that can access more future context are preferred during training, thereby leading to higher symbol delay. In this work we propose the delay-penalized CTC which is augmented with latency penalty regularization. We devise a flexible and efficien…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted in INTERSPEECH 2023

  39. arXiv:2305.11504  [pdf, other]

    eess.IV cs.CV cs.LG

    JOINEDTrans: Prior Guided Multi-task Transformer for Joint Optic Disc/Cup Segmentation and Fovea Detection

    Authors: Huaqing He, Li Lin, Zhiyuan Cai, Pujin Cheng, Xiaoying Tang

    Abstract: Deep learning-based image segmentation and detection models have largely improved the efficiency of analyzing retinal landmarks such as optic disc (OD), optic cup (OC), and fovea. However, factors including ophthalmic disease-related lesions and low image quality issues may severely complicate automatic OD/OC segmentation and fovea detection. Most existing works treat the identification of each la…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: 11 pages, 6 figures

  40. arXiv:2304.07495  [pdf]

    physics.optics eess.IV

    Anti-scattering medium computational ghost imaging with modified Hadamard patterns

    Authors: Li-Xing Lin, Jie Cao, Qun Hao

    Abstract: Illumination patterns of computational ghost imaging (CGI) systems suffer from reduced contrast when passing through a scattering medium, which causes the effective information in the reconstruction result to be drowned out by noise. A two-dimensional (2D) Gaussian filter performs linear smoothing operation on the whole image for image denoising. It can be combined with linear reconstruction algor…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: 14 pages, 7 figures

  41. arXiv:2304.05635  [pdf, other]

    eess.IV cs.CV

    Unifying and Personalizing Weakly-supervised Federated Medical Image Segmentation via Adaptive Representation and Aggregation

    Authors: Li Lin, Jiewei Wu, Yixiang Liu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Federated learning (FL) enables multiple sites to collaboratively train powerful deep models without compromising data privacy and security. The statistical heterogeneity (e.g., non-IID data and domain shifts) is a primary obstacle in FL, impairing the generalization performance of the global model. Weakly supervised segmentation, which uses sparsely-grained (i.e., point-, bounding box-, scribble-…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures

  42. arXiv:2303.04603  [pdf, other]

    eess.IV cs.CV

    Learning Enhancement From Degradation: A Diffusion Model For Fundus Image Enhancement

    Authors: Pujin Cheng, Li Lin, Yijin Huang, Huaqing He, Wenhan Luo, Xiaoying Tang

    Abstract: The quality of a fundus image can be compromised by numerous factors, many of which are challenging to model appropriately and mathematically. In this paper, we introduce a novel diffusion-model-based framework, named Learning Enhancement from Degradation (LED), for enhancing fundus images. Specifically, we first adopt a data-driven degradation framework to learn degradation mappings from unp…

    Submitted 8 March, 2023; originally announced March 2023.

  43. arXiv:2303.03703  [pdf, other]

    eess.IV

    Geometry-based spherical JND modeling for 360$^\circ$ display

    Authors: Hongan Wei, Jiaqi Liu, Bo Chen, Liqun Lin, Weiling Chen, Tiesong Zhao

    Abstract: 360$^\circ$ videos have received widespread attention due to their realistic and immersive experiences for users. To date, accurately modeling user perception on 360$^\circ$ displays remains a challenging issue. In this paper, we exploit the visual characteristics of 360$^\circ$ projection and display and extend the popular just noticeable difference (JND) model to spherical JND (SJND). Fir…

    Submitted 4 June, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

  44. Adapting Datacenter Capacity for Greener Datacenters and Grid

    Authors: Liuzixuan Lin, Andrew A. Chien

    Abstract: Cloud providers are adapting datacenter (DC) capacity to reduce carbon emissions. With hyperscale datacenters exceeding 100 MW individually, and in some grids exceeding 15% of power load, DC adaptation is large enough to harm power grid dynamics, increasing carbon emissions and power prices or reducing grid reliability. To avoid harm, we explore coordination of DC capacity change varying scope in sp…

    Submitted 23 June, 2023; v1 submitted 8 January, 2023; originally announced January 2023.

    Comments: Published at e-Energy '23: Proceedings of the 14th ACM International Conference on Future Energy Systems

  45. arXiv:2301.01069  [pdf, other]

    eess.IV cs.CV cs.IR

    Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment

    Authors: Liqun Lin, Yang Zheng, Weiling Chen, Chengdong Lan, Tiesong Zhao

    Abstract: Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e., blurring, blocking, bleeding, and…

    Submitted 3 January, 2023; originally announced January 2023.

  46. arXiv:2212.10541  [pdf, other]

    cs.CV eess.IV

    UNO-QA: An Unsupervised Anomaly-Aware Framework with Test-Time Clustering for OCTA Image Quality Assessment

    Authors: Juntao Chen, Li Lin, Pujin Cheng, Yijin Huang, Xiaoying Tang

    Abstract: Medical image quality assessment (MIQA) is a vital prerequisite in various medical image analysis applications. Most existing MIQA algorithms are fully supervised and require a large amount of annotated data. However, annotating medical images is time-consuming and labor-intensive. In this paper, we propose an unsupervised anomaly-aware framework with test-time clustering for optical coherence to…

    Submitted 21 February, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: submitted to ISBI2023

  47. arXiv:2212.05566  [pdf, other]

    cs.CV eess.IV

    YoloCurvSeg: You Only Label One Noisy Skeleton for Vessel-style Curvilinear Structure Segmentation

    Authors: Li Lin, Linkai Peng, Huaqing He, Pujin Cheng, Jiewei Wu, Kenneth K. Y. Wong, Xiaoying Tang

    Abstract: Weakly-supervised learning (WSL) has been proposed to alleviate the conflict between data annotation cost and model performance by employing sparsely-grained (i.e., point-, box-, scribble-wise) supervision and has shown promising performance, particularly in the image segmentation field. However, it is still a very challenging task due to the limited supervision, especially when only a small…

    Submitted 18 August, 2023; v1 submitted 11 December, 2022; originally announced December 2022.

    Comments: 20 pages, 15 figures, MEDIA accepted

  48. arXiv:2211.03310  [pdf, other]

    eess.SY

    Log-linear Dynamic Inversion Control with Provable Safety Guarantees in Lie Groups

    Authors: Li-Yu Lin, James Goppert, Inseok Hwang

    Abstract: In this paper, we use the derivative of the exponential map to derive the exact evolution of the logarithm of the tracking error for mixed-invariant systems, a class of systems capable of describing rigid body tracking problems in Lie groups. Additionally, we design a log-linear dynamic inversion-based control law to remove the nonlinearities due to spatial curvature and enhance the robustness of…

    Submitted 13 August, 2023; v1 submitted 7 November, 2022; originally announced November 2022.

    Comments: 7 pages, 5 figures. Revision is submitted to IEEE TAC

  49. arXiv:2211.00508  [pdf, other]

    eess.AS cs.CL cs.SD

    Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation

    Authors: Liyong Guo, Xiaoyu Yang, Quandong Wang, Yuxiang Kong, Zengwei Yao, Fan Cui, Fangjun Kuang, Wei Kang, Long Lin, Mingshuang Luo, Piotr Zelasko, Daniel Povey

    Abstract: Knowledge distillation (KD) is a common approach to improve model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from a teacher label storage issue, especially when the training corpora are large. Although on-the-fly teacher label generation tackles this issue, the training…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2022

  50. arXiv:2211.00490  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Delay-penalized transducer for low-latency streaming ASR

    Authors: Wei Kang, Zengwei Yao, Fangjun Kuang, Liyong Guo, Xiaoyu Yang, Long Lin, Piotr Żelasko, Daniel Povey

    Abstract: In streaming automatic speech recognition (ASR), it is desirable to reduce latency as much as possible while having minimal impact on recognition accuracy. Although a few existing methods are able to achieve this goal, they are difficult to implement due to their dependency on external alignments. In this paper, we propose a simple way to penalize symbol delay in the transducer model, so that we can b…

    Submitted 31 October, 2022; originally announced November 2022.

    Comments: Submitted to 2023 IEEE International Conference on Acoustics, Speech and Signal Processing