Showing 1–50 of 109 results for author: Guo, Z

Searching in archive eess.
  1. arXiv:2408.08653  [pdf, other]

    cs.SD eess.AS

    GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model

    Authors: Xavier Riley, Zixun Guo, Drew Edwards, Simon Dixon

    Abstract: We introduce GAPS (Guitar-Aligned Performance Scores), a new dataset of classical guitar performances, and a benchmark guitar transcription model that achieves state-of-the-art performance on GuitarSet in both supervised and zero-shot settings. GAPS is the largest dataset of real guitar audio, containing 14 hours of freely available audio-score aligned pairs, recorded in diverse conditions by over…

    Submitted 30 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: ISMIR 2024

  2. Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

    Authors: Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu

    Abstract: Recent advances have been witnessed in audio-language joint learning, such as CLAP, that shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: ACM MM 2024 (Oral)

  3. arXiv:2407.20262  [pdf]

    eess.SP

    A Neural-Network-Embedded Equivalent Circuit Model for Lithium-ion Battery State Estimation

    Authors: Zelin Guo, Yiyan Li, Zheng Yan, Mo-Yuen Chow

    Abstract: Equivalent Circuit Model (ECM) has been widely used in battery modeling and state estimation because of its simplicity, stability and interpretability. However, ECM may generate large estimation errors in extreme working conditions such as freezing environment temperature and complex charging/discharging behaviors, in which scenarios the electrochemical characteristics of the battery become extremely complex and…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 8 pages

  4. arXiv:2407.18449  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

    Authors: Jiabo Ma, Zhengrui Guo, Fengtao Zhou, Yihui Wang, Yingxue Xu, Yu Cai, Zhengjie Zhu, Cheng Jin, Yi Lin, Xinrui Jiang, Anjia Han, Li Liang, Ronald Cheong Kin Chan, Jiguang Wang, Kwang-Ting Cheng, Hao Chen

    Abstract: Foundation models pretrained on large-scale datasets are revolutionizing the field of computational pathology (CPath). The generalization ability of foundation models is crucial for the success in various downstream clinical tasks. However, current foundation models have only been evaluated on a limited type and number of tasks, leaving their generalization ability and overall performance unclear.…

    Submitted 3 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Report number: I.2.10

  5. arXiv:2407.10759  [pdf, other]

    eess.AS cs.CL cs.LG

    Qwen2-Audio Technical Report

    Authors: Yunfei Chu, Jin Xu, Qian Yang, Haojie Wei, Xipin Wei, Zhifang Guo, Yichong Leng, Yuanjun Lv, Jinzheng He, Junyang Lin, Chang Zhou, Jingren Zhou

    Abstract: We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions. In contrast to complex hierarchical tags, we have simplified the pre-training process by utilizing natural language prompts for different data an…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: https://github.com/QwenLM/Qwen2-Audio. Checkpoints, code, and scripts will be open-sourced soon

  6. arXiv:2406.00279  [pdf]

    eess.IV cs.CV

    Hybrid attention structure preserving network for reconstruction of under-sampled OCT images

    Authors: Zezhao Guo, Zhanfang Zhao

    Abstract: Optical coherence tomography (OCT) is a non-invasive, high-resolution imaging technology that provides cross-sectional images of tissues. Dense acquisition of A-scans along the fast axis is required to obtain high digital resolution images. However, the dense acquisition will increase the acquisition time, causing the discomfort of patients. In addition, the longer acquisition time may lead to mot…

    Submitted 31 May, 2024; originally announced June 2024.

  7. arXiv:2405.16980  [pdf, other]

    cs.CV eess.IV

    DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking

    Authors: Hongtao Wang, Rongyu Feng, Liangyi Wu, Mutian Liu, Yinuo Cui, Chunxia Zhang, Zhenbo Guo

    Abstract: In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class is using semantic segmentation networks to pick on a shot gather called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based pi…

    Submitted 27 May, 2024; originally announced May 2024.

  8. arXiv:2405.16952  [pdf, other]

    eess.AS

    A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

    Authors: Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui Lee

    Abstract: In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation…

    Submitted 27 May, 2024; originally announced May 2024.

  9. arXiv:2405.15863  [pdf, other]

    cs.SD cs.AI eess.AS

    QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation

    Authors: Chang Li, Ruoyu Wang, Lijuan Liu, Jun Du, Yixuan Sun, Zilu Guo, Zhenrong Zhang, Yuan Jiang

    Abstract: In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering an innovative approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, including both high-fidelity audio waveforms and detailed text descriptions, which often constitute only a small porti…

    Submitted 20 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  10. arXiv:2405.13710  [pdf, other]

    eess.IV cs.CV cs.LG

    Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies

    Authors: Amine Marzouki, Zhuxian Guo, Qinghe Zeng, Camille Kurtz, Nicolas Loménie

    Abstract: Efficient and precise quantification of lymphocytes in histopathology slides is imperative for the characterization of the tumor microenvironment and immunotherapy response insights. We developed a data-centric optimization pipeline that attains great lymphocyte detection performance using an off-the-shelf YOLOv5 model, without any architectural modifications. Our contribution that rely on strategi…

    Submitted 22 May, 2024; originally announced May 2024.

  11. arXiv:2405.11532  [pdf, other]

    eess.SP

    Non-Invasive Monitoring of Vital Signs in Calves Using Thermal Imaging Technology

    Authors: Ehsan Sadeghi, Zinan Guo, Alessandro Chiumento, Paul Havinga

    Abstract: This study presents a non-invasive method using thermal imaging to estimate heart and respiration rates in calves, avoiding the stress from wearables. Using Kernelised Correlation Filters (KCF) for movement tracking and advanced signal processing, we targeted one ROI for respiration and four for heart rate based on their thermal correlation. Achieving Mean Absolute Percentage Errors (MAPE) of 3.08…

    Submitted 19 May, 2024; originally announced May 2024.

  12. arXiv:2404.17667  [pdf, other]

    eess.SP cs.LG

    SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals

    Authors: Cheng Ding, Zhicheng Guo, Zhaoliang Chen, Randall J Lee, Cynthia Rudin, Xiao Hu

    Abstract: Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for developing foundation models for phys…

    Submitted 26 April, 2024; originally announced April 2024.

  13. arXiv:2404.09131  [pdf, other]

    eess.SP

    Design of Artificial Interference Signals for Covert Communication Aided by Multiple Friendly Nodes

    Authors: Xuyang Zhao, Wei Guo, Yongchao Wang

    Abstract: In this paper, we consider a scenario of covert communication aided by multiple friendly interference nodes. The objective is to conceal the legitimate communication link under the surveillance of a warden. The main content is as follows: first, we propose a novel strategy for generating artificial noise signals in the considered covert scenario. Then, we leverage the statistical information of ch…

    Submitted 9 May, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  14. arXiv:2404.08408  [pdf, other]

    cs.LG cs.AI eess.SP physics.geo-ph

    Seismic First Break Picking in a Higher Dimension Using Deep Graph Learning

    Authors: Hongtao Wang, Li Long, Jiangshe Zhang, Xiaoli Wei, Chunxia Zhang, Zhenbo Guo

    Abstract: Contemporary automatic first break (FB) picking methods typically analyze 1D signals, 2D source gathers, or 3D source-receiver gathers. Utilizing higher-dimensional data, such as 2D or 3D, incorporates global features, improving the stability of local picking. Despite the benefits, high-dimensional data requires structured input and increases computational demands. Addressing this, we propose a no…

    Submitted 12 April, 2024; originally announced April 2024.

  15. arXiv:2404.07989  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.SD eess.AS

    Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

    Authors: Yiwen Tang, Ray Zhang, Jiaming Liu, Zoey Guo, Dong Wang, Zhigang Wang, Bin Zhao, Shanghang Zhang, Peng Gao, Hongsheng Li, Xuelong Li

    Abstract: Large foundation models have recently emerged as a prominent focus of interest, attaining superior performance in widespread scenarios. Due to the scarcity of 3D data, many efforts have been made to adapt pre-trained transformers from vision to 3D domains. However, such 2D-to-3D approaches are still limited, due to the potential loss of spatial geometries and high computation cost. More importantl…

    Submitted 30 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Code and models are released at https://github.com/Ivan-Tang-3D/Any2Point

  16. arXiv:2404.07620  [pdf, other]

    eess.IV cs.CV

    Diffusion Probabilistic Multi-cue Level Set for Reducing Edge Uncertainty in Pancreas Segmentation

    Authors: Yue Gou, Yuming Xing, Shengzhu Shi, Zhichang Guo

    Abstract: Accurately segmenting the pancreas remains a huge challenge. Traditional methods encounter difficulties in semantic localization due to the small volume and distorted structure of the pancreas, while deep learning methods encounter challenges in obtaining accurate edges because of low contrast and organ overlapping. To overcome these issues, we propose a multi-cue level set method based on the dif…

    Submitted 11 April, 2024; originally announced April 2024.

  17. arXiv:2404.00837  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    Automated HER2 Scoring in Breast Cancer Images Using Deep Learning and Pyramid Sampling

    Authors: Sahan Yoruc Selcuk, Xilin Yang, Bijie Bai, Yijie Zhang, Yuzhu Li, Musa Aydin, Aras Firat Unal, Aditya Gomatam, Zhen Guo, Darrow Morgan Angus, Goren Kolodney, Karine Atlan, Tal Keidar Haran, Nir Pillar, Aydogan Ozcan

    Abstract: Human epidermal growth factor receptor 2 (HER2) is a critical protein in cancer cell growth that signifies the aggressiveness of breast cancer (BC) and helps predict its prognosis. Accurate assessment of immunohistochemically (IHC) stained tissue slides for HER2 expression levels is essential for both treatment guidance and understanding of cancer mechanisms. Nevertheless, the traditional workflow…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 21 Pages, 7 Figures

    Journal ref: BME Frontiers (2024)

  18. arXiv:2403.09100  [pdf]

    physics.med-ph cs.CV cs.LG eess.IV physics.optics

    Virtual birefringence imaging and histological staining of amyloid deposits in label-free tissue using autofluorescence microscopy and deep learning

    Authors: Xilin Yang, Bijie Bai, Yijie Zhang, Musa Aydin, Sahan Yoruc Selcuk, Zhen Guo, Gregory A. Fishbein, Karine Atlan, William Dean Wallace, Nir Pillar, Aydogan Ozcan

    Abstract: Systemic amyloidosis is a group of diseases characterized by the deposition of misfolded proteins in various organs and tissues, leading to progressive organ dysfunction and failure. Congo red stain is the gold standard chemical stain for the visualization of amyloid deposits in tissue sections, as it forms complexes with the misfolded proteins and shows a birefringence pattern under polarized lig…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 20 Pages, 5 Figures

  19. arXiv:2402.17187  [pdf, other]

    eess.IV cs.CV

    PE-MVCNet: Multi-view and Cross-modal Fusion Network for Pulmonary Embolism Prediction

    Authors: Zhaoxin Guo, Zhipeng Wang, Ruiquan Ge, Jianxun Yu, Feiwei Qin, Yuan Tian, Yuqing Peng, Yonghong Li, Changmiao Wang

    Abstract: The early detection of a pulmonary embolism (PE) is critical for enhancing patient survival rates. Both image-based and non-image-based features are of utmost importance in medical classification tasks. In a clinical setting, physicians tend to rely on the contextual information provided by Electronic Medical Records (EMR) to interpret medical imaging. However, very few models effectively integrat…

    Submitted 17 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  20. arXiv:2401.13959  [pdf, other]

    eess.IV cs.CV

    Conditional Neural Video Coding with Spatial-Temporal Super-Resolution

    Authors: Henan Wang, Xiaohan Pan, Runsen Feng, Zongyu Guo, Zhibo Chen

    Abstract: This document is an expanded version of a one-page abstract originally presented at the 2024 Data Compression Conference. It describes our proposed method for the video track of the Challenge on Learned Image Compression (CLIC) 2024. Our scheme follows the typical hybrid coding framework with some novel techniques. Firstly, we adopt Spynet network to produce accurate motion vectors for motion esti…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster

  21. arXiv:2401.09673  [pdf, other]

    cs.CV cs.CR cs.LG eess.IV

    Artwork Protection Against Neural Style Transfer Using Locally Adaptive Adversarial Color Attack

    Authors: Zhongliang Guo, Junhao Dong, Yifei Qian, Kaixuan Wang, Weiye Li, Ziheng Guo, Yuheng Wang, Yanli Li, Ognjen Arandjelović, Lei Fang

    Abstract: Neural style transfer (NST) generates new images by combining the style of one image with the content of another. However, unauthorized NST can exploit artwork, raising concerns about artists' rights and motivating the development of proactive protection methods. We propose Locally Adaptive Adversarial Color Attack (LAACA), empowering artists to protect their artwork from unauthorized style transf…

    Submitted 5 July, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures, 4 tables

  22. arXiv:2401.08926  [pdf, other]

    cs.CV eess.IV

    Uncertainty-aware No-Reference Point Cloud Quality Assessment

    Authors: Songlin Fan, Zixuan Guo, Wei Gao, Ge Li

    Abstract: The evolution of compression and enhancement algorithms necessitates an accurate quality assessment for point clouds. Previous works consistently regard point cloud quality assessment (PCQA) as a MOS regression problem and devise a deterministic mapping, ignoring the stochasticity in generating MOS from subjective tests. Besides, the viewpoint switching of 3D point clouds in subjective tests reinf…

    Submitted 16 January, 2024; originally announced January 2024.

  23. arXiv:2312.14705  [pdf, other]

    eess.IV cs.CV cs.LG

    SCUNet++: Swin-UNet and CNN Bottleneck Hybrid Architecture with Multi-Fusion Dense Skip Connection for Pulmonary Embolism CT Image Segmentation

    Authors: Yifei Chen, Binfeng Zou, Zhaoxin Guo, Yiyu Huang, Yifan Huang, Feiwei Qin, Qinhai Li, Changmiao Wang

    Abstract: Pulmonary embolism (PE) is a prevalent lung disease that can lead to right ventricular hypertrophy and failure in severe cases, ranking second in severity only to myocardial infarction and sudden death. Pulmonary artery CT angiography (CTPA) is a widely used diagnostic method for PE. However, PE detection presents challenges in clinical practice due to limitations in imaging technology. CTPA can p…

    Submitted 2 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 10 pages, 7 figures, accepted at WACV 2024

    Journal ref: WACV 2024

  24. arXiv:2312.11896  [pdf, other]

    eess.SY

    Stable Relay Learning Optimization Approach for Fast Power System Production Cost Minimization Simulation

    Authors: Zishan Guo, Qinran Hu, Tao Qian, Xin Fang, Renjie Hu, Zaijun Wu

    Abstract: Production cost minimization (PCM) simulation is commonly employed for assessing the operational efficiency, economic viability, and reliability, providing valuable insights for power system planning and operations. However, solving a PCM problem is time-consuming, consisting of numerous binary variables for simulation horizon extending over months and years. This hinders rapid assessment of moder…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Submitted to IEEE Transactions on Power Systems on December 15, 2023

  25. arXiv:2312.02300  [pdf]

    cs.LG eess.SP

    Reconsideration on evaluation of machine learning models in continuous monitoring using wearables

    Authors: Cheng Ding, Zhicheng Guo, Cynthia Rudin, Ran Xiao, Fadi B Nahab, Xiao Hu

    Abstract: This paper explores the challenges in evaluating machine learning (ML) models for continuous health monitoring using wearable devices beyond conventional metrics. We state the complexities posed by real-world variability, disease dynamics, user-specific characteristics, and the prevalence of false notifications, necessitating novel evaluation strategies. Drawing insights from large-scale heart stu…

    Submitted 4 December, 2023; originally announced December 2023.

  26. arXiv:2311.16419  [pdf, other]

    eess.SY

    A review on the charging station planning and fleet operation for electric freight vehicles

    Authors: Md Rakibul Alam, Zhaomiao Guo

    Abstract: Freight electrification introduces new opportunities and challenges for planning and operation. Although research on charging infrastructure planning and operation is widely available for general electric vehicles, unique physical and operational characteristics of EFVs coupled with specific patterns of logistics require dedicated research. This paper presents a comprehensive literature review to…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 43 pages, 4 figures, 2 tables

  27. arXiv:2311.08355  [pdf, other]

    eess.AS

    Mustango: Toward Controllable Text-to-Music Generation

    Authors: Jan Melechovsky, Zixun Guo, Deepanway Ghosal, Navonil Majumder, Dorien Herremans, Soujanya Poria

    Abstract: The quality of the text-to-music models has reached new heights due to recent advancements in diffusion models. The controllability of various musical aspects, however, has barely been explored. In this paper, we propose Mustango: a music-domain-knowledge-inspired text-to-music system based on diffusion. Mustango aims to control the generated music, not only with general text captions, but with mo…

    Submitted 3 June, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  28. Sum Rate Maximization under AoI Constraints for RIS-Assisted mmWave Communications

    Authors: Ziqi Guo, Yong Niu, Shiwen Mao, Changming Zhang, Ning Wang, Zhangdui Zhong, Bo Ai

    Abstract: The concept of age of information (AoI) has been proposed to quantify information freshness, which is crucial for time-sensitive applications. However, in millimeter wave (mmWave) communication systems, the link blockage caused by obstacles and the severe path loss greatly impair the freshness of information received by the user equipments (UEs). In this paper, we focus on reconfigurable intellige…

    Submitted 13 November, 2023; originally announced November 2023.

  29. arXiv:2310.17471  [pdf, other]

    cs.IT cs.DC cs.LG cs.NI eess.SP

    Foundation Model Based Native AI Framework in 6G with Cloud-Edge-End Collaboration

    Authors: Xiang Chen, Zhiheng Guo, Xijun Wang, Howard H. Yang, Chenyuan Feng, Junshen Su, Sihui Zheng, Tony Q. S. Quek

    Abstract: Future wireless communication networks are in a position to move beyond data-centric, device-oriented connectivity and offer intelligent, immersive experiences based on task-oriented connections, especially in the context of the thriving development of pre-trained foundation models (PFM) and the evolving vision of 6G native artificial intelligence (AI). Therefore, redefining modes of collaboration…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures, 1 table

  30. arXiv:2310.03814  [pdf, other]

    eess.SY

    Optimal Control of District Cooling Energy Plant with Reinforcement Learning and MPC

    Authors: Zhong Guo, Aditya Chaudhari, Austin R. Coffman, Prabir Barooah

    Abstract: We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity price. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off of chillers and the c…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: 18 pages, 12 figures. arXiv admin note: text overlap with arXiv:2203.07500

  31. arXiv:2309.09270   

    eess.AS cs.AI cs.SD

    Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee

    Abstract: In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to indicate the denoising process. The starting state is noisy speech and the ending state is clean speech. The noise component in the state variable decreases with the change of the state index until the noise component is 0. During traini…

    Submitted 7 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: We found that the results were obtained under incorrect experimental settings. New experiments are needed.

  32. arXiv:2309.03905  [pdf, other]

    cs.MM cs.CL cs.CV cs.LG cs.SD eess.AS

    ImageBind-LLM: Multi-modality Instruction Tuning

    Authors: Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

    Abstract: We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training…

    Submitted 11 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Code is available at https://github.com/OpenGVLab/LLaMA-Adapter

  33. arXiv:2309.02835  [pdf]

    physics.optics eess.IV

    A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

    Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

    Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hun…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 25 pages, 5 figures and 1 table

  34. arXiv:2309.02285  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    PromptTTS 2: Describing and Generating Voices with Text Prompt

    Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

    Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text…

    Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: Demo page: https://speechresearch.github.io/prompttts2

  35. arXiv:2308.11940  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Audio Generation with Multiple Conditional Diffusion Model

    Authors: Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang

    Abstract: Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and e…

    Submitted 28 December, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted by AAAI 2024

  36. arXiv:2308.11530  [pdf, other]

    cs.SD cs.AI eess.AS

    Leveraging Language Model Capabilities for Sound Event Detection

    Authors: Hualei Wang, Jianguo Mao, Zhifang Guo, Jiarui Wan, Hong Liu, Xiangdong Wang

    Abstract: Large language models reveal deep comprehension and fluent generation in the field of multi-modality. Although significant advancements have been achieved in audio multi-modality, existing methods rarely leverage language models for sound event detection (SED). In this work, we propose an end-to-end framework for understanding audio features while simultaneously generating sound event and their…

    Submitted 5 August, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, accepted by Interspeech 2024

  37. arXiv:2307.09729  [pdf, other]

    cs.CV cs.MM eess.IV

    NTIRE 2023 Quality Assessment of Video Enhancement Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, et al. (47 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual…

    Submitted 18 July, 2023; originally announced July 2023.

  38. arXiv:2307.05800  [pdf, other]

    eess.IV cs.CV

    A Hierarchical Transformer Encoder to Improve Entire Neoplasm Segmentation on Whole Slide Image of Hepatocellular Carcinoma

    Authors: Zhuxian Guo, Qitong Wang, Henning Müller, Themis Palpanas, Nicolas Loménie, Camille Kurtz

    Abstract: In digital histopathology, entire neoplasm segmentation on Whole Slide Image (WSI) of Hepatocellular Carcinoma (HCC) plays an important role, especially as a preprocessing filter to automatically exclude healthy tissue, in histological molecular correlations mining and other downstream histopathological tasks. The segmentation task remains challenging due to HCC's inherent high-heterogeneity and t…

    Submitted 11 July, 2023; originally announced July 2023.

  39. arXiv:2307.05385  [pdf, other]

    eess.SP cs.AI cs.LG

    Learned Kernels for Sparse, Interpretable, and Efficient Medical Time Series Processing

    Authors: Sully F. Chen, Zhicheng Guo, Cheng Ding, Xiao Hu, Cynthia Rudin

    Abstract: Background: Rapid, reliable, and accurate interpretation of medical signals is crucial for high-stakes clinical decision-making. The advent of deep learning allowed for an explosion of new models that offered unprecedented performance in medical time series processing but at a cost: deep learning models are often compute-intensive and lack interpretability. Methods: We propose Sparse Mixture of…

    Submitted 2 April, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

    Comments: 26 pages, 9 figures

    Journal ref: Nature Machine Intelligence, 2024

  40. arXiv:2307.04101  [pdf, other]

    cs.CV eess.IV

    Enhancing Building Semantic Segmentation Accuracy with Super Resolution and Deep Learning: Investigating the Impact of Spatial Resolution on Various Datasets

    Authors: Zhiling Guo, Xiaodan Shi, Haoran Zhang, Dou Huang, Xiaoya Song, Jinyue Yan, Ryosuke Shibasaki

    Abstract: The development of remote sensing and deep learning techniques has enabled building semantic segmentation with high accuracy and efficiency. Despite their success in different tasks, the discussions on the impact of spatial resolution on deep learning based building semantic segmentation are quite inadequate, which makes choosing a higher cost-effective data source a big challenge. To address the…

    Submitted 9 July, 2023; originally announced July 2023.

  41. arXiv:2306.17441  [pdf, other]

    cs.CV eess.IV

    Efficient Backdoor Removal Through Natural Gradient Fine-tuning

    Authors: Nazmul Karim, Abdullah Al Arafat, Umar Khalid, Zhishan Guo, Naznin Rahnavard

    Abstract: The success of a deep neural network (DNN) heavily relies on the details of the training scheme; e.g., training data, architectures, hyper-parameters, etc. Recent backdoor attacks suggest that an adversary can take advantage of such training details and compromise the integrity of a DNN. Our studies show that a backdoor model is usually optimized to a bad local minima, i.e. sharper minima as compa…

    Submitted 30 June, 2023; originally announced June 2023.

  42. arXiv:2306.16050  [pdf, other]

    cs.CV cs.LG eess.IV

    Evaluating Similitude and Robustness of Deep Image Denoising Models via Adversarial Attack

    Authors: Jie Ning, Jiebao Sun, Yao Li, Zhichang Guo, Wangmeng Zuo

    Abstract: Deep neural networks (DNNs) have shown superior performance comparing to traditional image denoising algorithms. However, DNNs are inevitably vulnerable while facing adversarial attacks. In this paper, we propose an adversarial attack method named denoising-PGD which can successfully attack all the current deep denoising models while keep the noise distribution almost unchanged. We surprisingly fi…

    Submitted 6 July, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

  43. arXiv:2306.13875  [pdf, other]

    cs.CV eess.IV

    Real-World Video for Zoom Enhancement based on Spatio-Temporal Coupling

    Authors: Zhiling Guo, Yinqiang Zheng, Haoran Zhang, Xiaodan Shi, Zekun Cai, Ryosuke Shibasaki, Jinyue Yan

    Abstract: In recent years, single-frame image super-resolution (SR) has become more realistic by considering the zooming effect and using real-world short- and long-focus image pairs. In this paper, we further investigate the feasibility of applying realistic multi-frame clips to enhance zoom quality via spatio-temporal information coupling. Specifically, we first built a real-world video benchmark, VideoRA…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: 11 pages

  44. arXiv:2306.08532  [pdf, ps, other]

    eess.SP

    On the Generalization and Advancement of Half-Sine-Based Pulse Shaping Filters for Constant Envelope OQPSK Modulation

    Authors: Pengcheng Mu, Yan Liu, Zihao Guo, Xiaoyan Hu, Kai-Kit Wong

    Abstract: The offset quadrature phase-shift keying (OQPSK) modulation is a key factor for the technique of ZigBee, which has been adopted in IEEE 802.15.4 for wireless communications of Internet of Things (IoT) and Internet of Vehicles (IoV), etc. In this paper, we propose the general conditions of pulse shaping filters (PSFs) with constant envelope (CE) property for OQPSK modulation, which can be easily le…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: 5 pages, 5 figures, journal paper

  45. arXiv:2306.08527  [pdf, other]

    eess.AS cs.AI cs.SD

    Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

    Authors: Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

    Abstract: The goal of this study is to implement diffusion models for speech enhancement (SE). The first step is to emphasize the theoretical foundation of variance-preserving (VP)-based interpolation diffusion under continuous conditions. Subsequently, we present a more concise framework that encapsulates both the VP- and variance-exploding (VE)-based interpolation diffusion methods. We demonstrate that th…

    Submitted 17 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  46. arXiv:2305.18744  [pdf, other]

    eess.SP

    Unsupervised Massive MIMO Channel Estimation with Dual-Path Knowledge-Aware Auto-Encoders

    Authors: Zhiheng Guo, Yuanzhang Xiao, Xiang Chen

    Abstract: In this paper, an unsupervised deep learning framework based on dual-path model-driven variational auto-encoders (VAE) is proposed for angle-of-arrivals (AoAs) and channel estimation in massive MIMO systems. Specifically designed for channel estimation, the proposed VAE differs from the original VAE in two aspects. First, the encoder is a dual-path neural network, where one path uses the received…

    Submitted 30 May, 2023; originally announced May 2023.

  47. arXiv:2305.16025  [pdf, other]

    cs.CV eess.IV

    NVTC: Nonlinear Vector Transform Coding

    Authors: Runsen Feng, Zongyu Guo, Weiping Li, Zhibo Chen

    Abstract: In theory, vector quantization (VQ) is always better than scalar quantization (SQ) in terms of rate-distortion (R-D) performance. Recent state-of-the-art methods for neural image compression are mainly based on nonlinear transform coding (NTC) with uniform scalar quantization, overlooking the benefits of VQ due to its exponentially increased complexity. In this paper, we first investigate on some…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: Accepted by CVPR 2023

  48. arXiv:2305.12460  [pdf, other]

    cs.SD eess.AS

    Study of GANs for Noisy Speech Simulation from Clean Speech

    Authors: Leander Melroy Maben, Zixun Guo, Chen Chen, Utkarsh Chudiwal, Chng Eng Siong

    Abstract: The performance of speech processing models trained on clean speech drops significantly in noisy conditions. Training with noisy datasets alleviates the problem, but procuring such datasets is not always feasible. Noisy speech simulation models that generate noisy speech from clean speech help remedy this issue. In our work, we study the ability of Generative Adversarial Networks (GANs) to simulat…

    Submitted 21 May, 2023; originally announced May 2023.

  49. arXiv:2305.07678  [pdf, other]

    eess.IV cs.IT cs.LG

    Exploring the Rate-Distortion-Complexity Optimization in Neural Image Compression

    Authors: Yixin Gao, Runsen Feng, Zongyu Guo, Zhibo Chen

    Abstract: Despite a short history, neural image codecs have been shown to surpass classical image codecs in terms of rate-distortion performance. However, most of them suffer from significantly longer decoding times, which hinders the practical applications of neural image codecs. This issue is especially pronounced when employing an effective yet time-consuming autoregressive context model since it would i…

    Submitted 11 May, 2023; originally announced May 2023.

  50. arXiv:2303.06811  [pdf, other]

    eess.AS

    The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge

    Authors: Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

    Abstract: This paper describes our NPU-Elevoc personalized speech enhancement system (NAPSE) for the 5th Deep Noise Suppression Challenge at ICASSP 2023. Based on the superior two-stage model TEA-PSE 2.0, our system particularly explores better strategy for speaker embedding fusion, optimizes the model training pipeline, and leverages adversarial training and multi-scale loss. According to the results, our…

    Submitted 15 March, 2023; v1 submitted 12 March, 2023; originally announced March 2023.