Search | arXiv e-print repository

AudioBench: A Universal Benchmark for Audio Large Language Models

Authors: Bin Wang, Xunlong Zou, Geyu Lin, Shuo Sun, Zhuohan Liu, Wenyu Zhang, Zhengyuan Liu, AiTi Aw, Nancy F. Chen

Abstract: We introduce AudioBench, a new benchmark designed to evaluate audio large language models (AudioLLMs). AudioBench encompasses 8 distinct tasks and 26 carefully selected or newly curated datasets, focusing on speech understanding, voice interpretation, and audio scene understanding. Despite the rapid advancement of large language models, including multimodal versions, a significant gap exists in comprehensive benchmarks for thoroughly evaluating their capabilities. AudioBench addresses this gap by providing relevant datasets and evaluation metrics. In our study, we evaluated the capabilities of four models across various aspects and found that no single model excels consistently across all tasks. We outline the research outlook for AudioLLMs and anticipate that our open-source code, data, and leaderboard will offer a robust testbed for future model developments.

Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: 20 pages; v2 - typo update; Code: https://github.com/AudioLLMs/AudioBench

arXiv:2406.14069 [pdf, other]

Towards Multi-modality Fusion and Prototype-based Feature Refinement for Clinically Significant Prostate Cancer Classification in Transrectal Ultrasound

Authors: Hong Wu, Juan Fu, Hongsheng Ye, Yuming Zhong, Xuebin Zou, Jianhua Zhou, Yi Wang

Abstract: Prostate cancer is a highly prevalent cancer and ranks as the second leading cause of cancer-related deaths in men globally. Recently, the utilization of multi-modality transrectal ultrasound (TRUS) has gained significant traction as a valuable technique for guiding prostate biopsies. In this study, we propose a novel learning framework for clinically significant prostate cancer (csPCa) classification using multi-modality TRUS. The proposed framework employs two separate 3D ResNet-50 to extract distinctive features from B-mode and shear wave elastography (SWE). Additionally, an attention module is incorporated to effectively refine B-mode features and aggregate the extracted features from both modalities. Furthermore, we utilize few shot segmentation task to enhance the capacity of classification encoder. Due to the limited availability of csPCa masks, a prototype correction module is employed to extract representative prototypes of csPCa. The performance of the framework is assessed on a large-scale dataset consisting of 512 TRUS videos with biopsy-proved prostate cancer. The results demonstrate the strong capability in accurately identifying csPCa, achieving an area under the curve (AUC) of 0.86. Moreover, the framework generates visual class activation mapping (CAM), which can serve as valuable assistance for localizing csPCa. These CAM images may offer valuable guidance during TRUS-guided targeted biopsies, enhancing the efficacy of the biopsy procedure. The code is available at https://github.com/2313595986/SmileCode.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12943 [pdf]

A square cross-section FOV rotational CL (SC-CL) and its analytical reconstruction method

Authors: Xiang Zou, Wuliang Shi, Muge Du, Yuxiang Xing

Abstract: Rotational computed laminography (CL) has broad application potential in three-dimensional imaging of plate-like objects, as it only needs x-ray to pass through the tested object in the thickness direction during the imaging process. In this study, a square cross-section FOV rotational CL (SC-CL) was proposed. Then, the FDK-type analytical reconstruction algorithm applicable to the SC-CL was derived. On this basis, the proposed method was validated through numerical experiments.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2404.09433 [pdf, other]

MarsQE: Semantic-Informed Quality Enhancement for Compressed Martian Image

Authors: Chengfeng Liu, Mai Xu, Qunliang Xing, Xin Zou

Abstract: Lossy image compression is essential for Mars exploration missions, due to the limited bandwidth between Earth and Mars. However, the compression may introduce visual artifacts that complicate the geological analysis of the Martian surface. Existing quality enhancement approaches, primarily designed for Earth images, fall short for Martian images due to a lack of consideration for the unique Martian semantics. In response to this challenge, we conduct an in-depth analysis of Martian images, yielding two key insights based on semantics: the presence of texture similarities and the compact nature of texture representations in Martian images. Inspired by these findings, we introduce MarsQE, an innovative, semantic-informed, two-phase quality enhancement approach specifically designed for Martian images. The first phase involves the semantic-based matching of texture-similar reference images, and the second phase enhances image quality by transferring texture patterns from these reference images to the compressed image. We also develop a post-enhancement network to further reduce compression artifacts and achieve superior compression quality. Our extensive experiments demonstrate that MarsQE significantly outperforms existing approaches for Earth images, establishing a new benchmark for the quality enhancement on Martian images.

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2403.14135 [pdf, other]

Powerful Lossy Compression for Noisy Images

Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recently, the academic community began to make some attempts to tackle this problem through end-to-end joint methods. Most of them ignore that different regions of noisy images have different characteristics. To solve these problems, in this paper, our proposed signal-to-noise ratio (SNR) aware joint solution exploits local and non-local features for image compression and denoising simultaneously. We design an end-to-end trainable network, which includes the main encoder branch, the guidance branch, and the signal-to-noise ratio (SNR) aware branch. We conducted extensive experiments on both synthetic and real-world datasets, demonstrating that our joint solution outperforms existing state-of-the-art methods.

Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by ICME 2024

arXiv:2403.02601 [pdf, other]

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Haoze Sun, Xueyi Zou, Zhensong Zhang, Youliang Yan, Lei Zhu

Abstract: For image super-resolution (SR), bridging the gap between the performance on synthetic datasets and real-world degradation scenarios remains a challenge. This work introduces a novel "Low-Res Leads the Way" (LWay) training framework, merging Supervised Pre-training with Self-supervised Learning to enhance the adaptability of SR models to real-world images. Our approach utilizes a low-resolution (LR) reconstruction network to extract degradation embeddings from LR images, merging them with super-resolved outputs for LR reconstruction. Leveraging unseen LR images for self-supervised learning guides the model to adapt its modeling space to the target domain, facilitating fine-tuning of SR models without requiring paired high-resolution (HR) images. The integration of Discrete Wavelet Transform (DWT) further refines the focus on high-frequency details. Extensive evaluations show that our method significantly improves the generalization and detail restoration capabilities of SR models on unseen real-world datasets, outperforming existing methods. Our training regime is universally compatible, requiring no network architecture modifications, making it a practical solution for real-world SR applications.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2311.18327 [pdf]

Deep Reinforcement Learning Based Optimal Energy Management of Multi-energy Microgrids with Uncertainties

Authors: Yang Cui, Yang Xu, Yang Li, Yijian Wang, Xinpeng Zou

Abstract: Multi-energy microgrid (MEMG) offers an effective approach to deal with energy demand diversification and new energy consumption on the consumer side. In MEMG, it is critical to deploy an energy management system (EMS) for efficient utilization of energy and reliable operation of the system. To help EMS formulate optimal dispatching schemes, a deep reinforcement learning (DRL)-based MEMG energy management scheme with renewable energy source (RES) uncertainty is proposed in this paper. To accurately describe the operating state of the MEMG, the off-design performance model of energy conversion devices is considered in scheduling. The nonlinear optimal dispatching model is expressed as a Markov decision process (MDP) and is then addressed by the twin delayed deep deterministic policy gradient (TD3) algorithm. In addition, to accurately describe the uncertainty of RES, the conditional-least squares generative adversarial networks (C-LSGANs) method based on RES forecast power is proposed to construct the scenarios set of RES power generation. The generated data of RES is used for scheduling to obtain caps and floors for the purchase of electricity and natural gas. Based on this, the superior energy supply sector can formulate solutions in advance to tackle the uncertainty of RES. Finally, the simulation analysis demonstrates the validity and superiority of the method.

Submitted 30 November, 2023; originally announced November 2023.

Comments: Accepted by CSEE Journal of Power and Energy Systems

arXiv:2311.12083 [pdf, other]

PanBench: Towards High-Resolution and High-Performance Pansharpening

Authors: Shiying Wang, Xuechao Zou, Kai Li, Junliang Xing, Pin Tao

Abstract: Pansharpening, a pivotal task in remote sensing, involves integrating low-resolution multispectral images with high-resolution panchromatic images to synthesize an image that is both high-resolution and retains multispectral information. These pansharpened images enhance precision in land cover classification, change detection, and environmental monitoring within remote sensing data analysis. While deep learning techniques have shown significant success in pansharpening, existing methods often face limitations in their evaluation, focusing on restricted satellite data sources, single scene types, and low-resolution images. This paper addresses this gap by introducing PanBench, a high-resolution multi-scene dataset containing all mainstream satellites and comprising 5,898 pairs of samples. Each pair includes a four-channel (RGB + near-infrared) multispectral image of 256x256 pixels and a mono-channel panchromatic image of 1,024x1,024 pixels. To achieve high-fidelity synthesis, we propose a Cascaded Multiscale Fusion Network (CMFNet) for Pansharpening. Extensive experiments validate the effectiveness of CMFNet. We have released the dataset, source code, and pre-trained models in the supplementary, fostering further research in remote sensing.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 10 pages, 5 figures

arXiv:2309.15367 [pdf]

Analysis on Multi-robot Relative 6-DOF Pose Estimation Error Based on UWB Range

Authors: Xinran Li, Shuaikang Zheng, Pengcheng Zheng, Haifeng Zhang, Zhitian Li, Xudong Zou

Abstract: Relative pose estimation is the foundational requirement for multi-robot system, while it is a challenging research topic in infrastructure-free scenes. In this study, we analyze the relative 6-DOF pose estimation error of multi-robot system in GNSS-denied and anchor-free environment. An analytical lower bound of position and orientation estimation error is given under the assumption that distance between the nodes are far more than the size of robotic platform. Through simulation, impact of distance between nodes, altitudes and circumradius of tag simplex on pose estimation accuracy is discussed, which verifies the analysis results. Our analysis is expected to determine parameters (e.g. deployment of tags) of UWB based multi-robot systems.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 7 pages, 9 figures

arXiv:2308.04417 [pdf, other]

DiffCR: A Fast Conditional Diffusion Framework for Cloud Removal from Optical Satellite Images

Authors: Xuechao Zou, Kai Li, Junliang Xing, Yu Zhang, Shiying Wang, Lei Jin, Pin Tao

Abstract: Optical satellite images are a critical data source; however, cloud cover often compromises their quality, hindering image applications and analysis. Consequently, effectively removing clouds from optical satellite images has emerged as a prominent research direction. While recent advancements in cloud removal primarily rely on generative adversarial networks, which may yield suboptimal image quality, diffusion models have demonstrated remarkable success in diverse image-generation tasks, showcasing their potential in addressing this challenge. This paper presents a novel framework called DiffCR, which leverages conditional guided diffusion with deep convolutional networks for high-performance cloud removal for optical satellite imagery. Specifically, we introduce a decoupled encoder for conditional image feature extraction, providing a robust color representation to ensure the close similarity of appearance information between the conditional input and the synthesized output. Moreover, we propose a novel and efficient time and condition fusion block within the cloud removal model to accurately simulate the correspondence between the appearance in the conditional image and the target image at a low computational cost. Extensive experimental evaluations on two commonly used benchmark datasets demonstrate that DiffCR consistently achieves state-of-the-art performance on all metrics, with parameter and computational complexities amounting to only 5.1% and 5.4%, respectively, of those previous best methods.
The source code, pre-trained models, and all the experimental results will be publicly available at https://github.com/XavierJiezou/DiffCR upon the paper's acceptance of this work.

Submitted 8 August, 2023; originally announced August 2023.

Comments: 13 pages, 7 figures

arXiv:2307.13953 [pdf, other]

The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiological perspective, each segment of speech -- phoneme -- corresponds to different types of airflow and movements in the face. Therefore, it is advantageous to discover the hidden link between phonemes and face attributes. In this paper, we propose an analysis pipeline to help us explore the voice-face relationship in a fine-grained manner, i.e., phonemes vs. facial anthropometric measurements (AM). We build an estimator for each phoneme-AM pair and evaluate the correlation through hypothesis testing. Our results indicate that AMs are more predictable from vowels compared to consonants, particularly with plosives. Additionally, we observe that if a specific AM exhibits more movement during phoneme pronunciation, it is more predictable. Our findings support those in physiology regarding correlation and lay the groundwork for future research on speech-face multimodal learning.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Interspeech 2023

arXiv:2307.09740 [pdf]

A Physics-Informed Data-Driven Fault Location Method for Transmission Lines Using Single-Ended Measurements with Field Data Validation

Authors: Yiqi Xing, Yu Liu, Dayou Lu, Xinchen Zou, Xuming He

Abstract: Data driven transmission line fault location methods have the potential to more accurately locate faults by extracting fault information from available data. However, most of the data driven fault location methods in the literature are not validated by field data for the following reasons. On one hand, the available field data during faults are very limited for one specific transmission line, and using field data for training is close to impossible. On the other hand, if simulation data are utilized for training, the mismatch between the simulation system and the practical system will cause fault location errors. To this end, this paper proposes a physics-informed data-driven fault location method. The data from a practical fault event are first analyzed to extract the ranges of system and fault parameters such as equivalent source impedances, loading conditions, fault inception angles (FIA) and fault resistances. Afterwards, the simulation system is constructed with the ranges of parameters, to generate data for training. This procedure merges the gap between simulation and practical power systems, and at the same time considers the uncertainty of system and fault parameters in practice. The proposed data-driven method does not require system parameters, only requires instantaneous voltage and current measurements at the local terminal, with a low sampling rate of several kHz and a short fault time window of half a cycle before and after the fault occurs.
Numerical experiments and field data experiments clearly validate the advantages of the proposed method over existing data driven methods.

Submitted 18 July, 2023; originally announced July 2023.

Comments: 10 pages, 27 figures

arXiv:2305.15030 [pdf, other]

Make Lossy Compression Meaningful for Low-Light Images

Authors: Shilv Cai, Liqun Chen, Sheng Zhong, Luxin Yan, Jiahuan Zhou, Xu Zou

Abstract: Low-light images frequently occur due to unavoidable environmental influences or technical limitations, such as insufficient lighting or limited exposure time. To achieve better visibility for visual perception, low-light image enhancement is usually adopted. Besides, lossy image compression is vital for meeting the requirements of storage and transmission in computer vision applications. To touch the above two practical demands, current solutions can be categorized into two sequential manners: "Compress before Enhance (CbE)" or "Enhance before Compress (EbC)". However, both of them are not suitable since: (1) Error accumulation in the individual models plagues sequential solutions. Especially, once low-light images are compressed by existing general lossy image compression approaches, useful information (e.g., texture details) would be lost resulting in a dramatic performance decrease in low-light image enhancement. (2) Due to the intermediate process, the sequential solution introduces an additional burden resulting in low efficiency. We propose a novel joint solution to simultaneously achieve a high compression rate and good enhancement performance for low-light images with much lower computational cost and fewer model parameters. We design an end-to-end trainable architecture, which includes the main enhancement branch and the signal-to-noise ratio (SNR) aware branch.
Experimental results show that our proposed joint solution achieves a significant improvement over different combinations of existing state-of-the-art sequential "Compress before Enhance" or "Enhance before Compress" solutions for low-light images, which would make lossy low-light image compression more meaningful. The project is publicly available at: https://github.com/CaiShilv/Joint-IC-LL.

Submitted 24 February, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by AAAI 2024

ACM Class: I.4.2; I.4.3

arXiv:2305.03387 [pdf, other]

AsConvSR: Fast and Lightweight Super-Resolution Network with Assembled Convolutions

Authors: Jiaming Guo, Xueyi Zou, Yuyi Chen, Yi Liu, Jia Hao, Jianzhuang Liu, Youliang Yan

Abstract: In recent years, videos and images in 720p (HD), 1080p (FHD) and 4K (UHD) resolution have become more popular for display devices such as TVs, mobile phones and VR. However, these high resolution images cannot achieve the expected visual effect due to the limitation of the internet bandwidth, and bring a great challenge for super-resolution networks to achieve real-time performance. Following this challenge, we explore multiple efficient network designs, such as pixel-unshuffle, repeat upscaling, and local skip connection removal, and propose a fast and lightweight super-resolution network. Furthermore, by analyzing the applications of the idea of divide-and-conquer in super-resolution, we propose assembled convolutions which can adapt convolution kernels according to the input features. Experiments suggest that our method outperforms all the state-of-the-art efficient super-resolution models, and achieves optimal results in terms of runtime and quality. In addition, our method also wins the first place in NTIRE 2023 Real-Time Super-Resolution - Track 1 (×2). The code will be available at https://gitee.com/mindspore/models/tree/master/research/cv/AsConvSR

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2303.16565 [pdf, other]

PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-performance Cloud Removal from Multi-temporal Satellite Imagery

Authors: Xuechao Zou, Kai Li, Junliang Xing, Pin Tao, Yachao Cui

Abstract: Satellite imagery analysis plays a pivotal role in remote sensing; however, information loss due to cloud cover significantly impedes its application. Although existing deep cloud removal models have achieved notable outcomes, they scarcely consider contextual information. This study introduces a high-performance cloud removal architecture, termed Progressive Multi-scale Attention Autoencoder (PMAA), which concurrently harnesses global and local information to construct robust contextual dependencies using a novel Multi-scale Attention Module (MAM) and a novel Local Interaction Module (LIM). PMAA establishes long-range dependencies of multi-scale features using MAM and modulates the reconstruction of fine-grained details utilizing LIM, enabling simultaneous representation of fine- and coarse-grained features at the same level. With the help of diverse and multi-scale features, PMAA consistently outperforms the previous state-of-the-art model CTGAN on two benchmark datasets. Moreover, PMAA boasts considerable efficiency advantages, with only 0.5% and 14.6% of the parameters and computational complexity of CTGAN, respectively. These comprehensive results underscore PMAA's potential as a lightweight cloud removal network suitable for deployment on edge devices to accomplish large-scale cloud removal tasks. Our source code and pre-trained models are available at https://github.com/XavierJiezou/PMAA.

Submitted 8 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted by ECAI 2023

arXiv:2302.06294 [pdf, other]

doi 10.1016/j.media.2023.102888

CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection

Authors: Chinedu Innocent Nwoye, Tong Yu, Saurav Sharma, Aditya Murali, Deepak Alapatt, Armine Vardazaryan, Kun Yuan, Jonas Hajek, Wolfgang Reiter, Amine Yamlahi, Finn-Henri Smidt, Xiaoyang Zou, Guoyan Zheng, Bruno Oliveira, Helena R. Torres, Satoshi Kondo, Satoshi Kasai, Felix Holm, Ege Özsoy, Shuangchun Gui, Han Li, Sista Raviteja, Rachana Sathish, Pranav Poudel, Binod Bhattarai , et al. (24 additional authors not shown)

Abstract: Formalizing surgical activities as triplets of the used instruments, actions performed, and target anatomies is becoming a gold standard approach for surgical activity modeling. The benefit is that this formalization helps to obtain a more detailed understanding of tool-tissue interaction which can be used to develop better Artificial Intelligence assistance for image-guided surgery. Earlier efforts and the CholecTriplet challenge introduced in 2021 have put together techniques aimed at recognizing these triplets from surgical footage. Estimating also the spatial locations of the triplets would offer a more precise intraoperative context-aware decision support for computer-assisted intervention. This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool), as the key actors, and the modeling of each tool-activity in the form of <instrument, verb, target> triplet. The paper describes a baseline method and 10 new deep learning algorithms presented at the challenge to solve the task. It also provides thorough methodological comparisons of the methods, an in-depth analysis of the obtained results across multiple metrics, visual and procedural challenges; their significance, and useful insights for future research directions and applications in surgery.

Submitted 14 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: MICCAI EndoVis CholecTriplet2022 challenge report. Published at Elsevier journal of Medical Image Analysis. 25 pages, 15 figures, 8 tables

Journal ref: Medical Image Analysis, Volume 89, 2023, 102888, ISSN 1361-8415

arXiv:2212.01560 [pdf]

High-resolution and reliable automatic target recognition based on photonic ISAR imaging system with explainable deep learning

Authors: Xiuting Zou, Anyi Deng, Yiheng Hu, Shiyu Hua, Linbo Zhang, Shaofu Xu, Weiwen Zou

Abstract: Automatic target recognition (ATR) based on inverse synthetic aperture radar (ISAR) images, which is extensively utilized to surveil environment in military and civil fields, must be high-precision and reliable. Photonic technologies' advantage of broad bandwidth enables ISAR systems to realize high-resolution imaging, which is in favor of achieving high-performance ATR. Deep learning (DL) algorithms have achieved excellent recognition accuracies. However, the lack of interpretability of DL algorithms causes the head-scratching problem of credibility. In this paper, we exploit the inner relationship between a photonic ISAR imaging system and behaviors of a convolutional neural network (CNN) to deeply comprehend the intelligent recognition. Specifically, we manipulate imaging physical process and analyze network outputs, the relevance between the ISAR image and network output, and the visualization of features in the network output layer. Consequently, the broader imaging bandwidths and appropriate imaging angles lead to more detailed structural and contour features and the bigger discrepancy among ISAR images of different targets, which contributes to the CNN recognizing and distinguishing objects according to physical laws. Then, based on the photonic ISAR imaging system and the explainable CNN, we accomplish a high-accuracy and reliable ATR. To the best of our knowledge, there is no precedent of explaining the DL algorithms by exploring the influence of the physical process of data generation on network behaviors. It is anticipated that this work can not only inspire the accomplishment of a high-performance ATR but also bring new insights to explore network behaviors and thus achieve better intelligent abilities.

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2210.17530 [pdf, other]

Joint Localization and Beamforming for Reconfigurable Intelligent Surface Aided 5G mmWave Communication Systems

Authors: Yunis Xanthos, Wanting Lyu, Songjie Yang, Chadi Assi, Xianbing Zou, Ning Wei

Abstract: Reconfigurable intelligent surface (RIS) is an attractive technology to improve the transmission rate of millimetre-wave (mmWave) communication systems. Previous research on RIS technology mainly focused on improving the transmission rate and security rate of the mmWave communication systems. Since the emergence of RIS technology creates the conditions for generating an intelligent radio environment, it also has potential advantages in improving the localization accuracy of the mmWave communication systems. Deployed on walls and objects, RISs are capable of significantly improving communications and positioning coverage by controlling the multi-path reflection. This paper considers the RIS-aided mmWave localization system and proposes a joint beamforming and localization problem. However, since the objective function depends on the unknown UE's position and instantaneous channel state information (CSI), this beamforming and localization technology based on RIS assistance is challenging. To solve this problem, we propose a new joint localization and beamforming optimization (JLBO) algorithm, and give the proof of its convergence. The simulation results show that the RIS can improve the user localization accuracy of the system and the proposed scheme has a significant performance improvement compared with the traditional schemes.

Submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.11410 [pdf]

Millimeter-level Resolution Photonic Multiband Radar Using a Single MZM and Sub-GHz-Bandwidth Electronics

Authors: Peixuan Li, Wenlin Bai, Xihua Zou, Ningyuan Zhong, Wei Pan, Lianshan Yan

Abstract: We here propose a novel cost-effective millimeter-level resolution photonic multiband radar system using a single MZM driven by a 1-GHz-bandwidth LFM signal. It experimentally shows an ~8.5-mm range resolution through coherence-processing-free multiband data fusion.

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.04280 [pdf]

doi 10.1364/OL.472155

Cost-effective photonic super-resolution millimeter-wave joint radar-communication system using self-coherent detection

Authors: Wenlin Bai, Peixuan Li, Xihua Zou, Ningyuan Zhong, Wei Pan, Lianshan Yan, Bin Luo

Abstract: A cost-effective millimeter-wave (MMW) joint radar-communication (JRC) system with super resolution is proposed and experimentally demonstrated, using optical heterodyne up-conversion and self-coherent detection down-conversion techniques. The point lies in the designed coherent dual-band constant envelope linear frequency modulation-orthogonal frequency division multiplexing (LFM-OFDM) signal with opposite phase modulation indexes for the JRC system. Then the self-coherent detection, as a simple and low-cost means, is accordingly facilitated for both de-chirping of MMW radar and frequency down-conversion reception of MMW communication, which circumvents the costly high-speed mixers along with MMW local oscillators and more significantly achieves the real-time decomposition of radar and communication information. Furthermore, a super resolution radar range profile is realized through the coherent fusion processing of dual-band JRC signal. In experiments, a dual-band LFM-OFDM JRC signal centered at 54-GHz and 61-GHz is generated. The dual bands are featured with an identical instantaneous bandwidth of 2 GHz and carry an OFDM signal of 1 GBaud, which help to achieve a 6-Gbit/s data rate for communication and a 1.76-cm range resolution for radar.

Submitted 9 October, 2022; originally announced October 2022.
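
Note: the 1.76-cm figure in the abstract above can be put in context with the textbook range-resolution relation ΔR = c / (2B). The sketch below is only a back-of-envelope check, not the authors' processing chain. It assumes each band extends 1 GHz either side of its stated center (so the two bands together span 53 GHz to 62 GHz); that span, and the function names, are illustrative assumptions rather than details taken from the paper.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution(bandwidth_hz):
    """Textbook range resolution c / (2B) in metres for bandwidth B in Hz."""
    return C / (2.0 * bandwidth_hz)


def implied_bandwidth(resolution_m):
    """Bandwidth in Hz that the same relation would need for a given resolution."""
    return C / (2.0 * resolution_m)


# One 2-GHz band on its own (bandwidth stated in the abstract).
single_band = range_resolution(2e9)

# Assumed overall span of the two bands: 54 - 1 GHz up to 61 + 1 GHz.
full_span = range_resolution(62e9 - 53e9)

# Effective bandwidth implied by the reported 1.76-cm resolution.
implied = implied_bandwidth(0.0176)

print(f"single 2-GHz band: {single_band * 100:.2f} cm")
print(f"assumed 9-GHz span: {full_span * 100:.2f} cm")
print(f"bandwidth implied by 1.76 cm: {implied / 1e9:.2f} GHz")
```

Under these assumptions a single 2-GHz band gives roughly 7.5 cm, while the assumed 9-GHz span gives roughly 1.7 cm, which is close to the reported value and consistent with the abstract's statement that the finer resolution comes from coherent fusion of the two bands.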

arXiv:2209.05054 [pdf, other]

doi 10.1145/3503161.3547880

High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation

Authors: Shilv Cai, Zhijun Zhang, Liqun Chen, Luxin Yan, Sheng Zhong, Xu Zou

Abstract: Learning-based methods have effectively promoted the community of image compression. Meanwhile, variational autoencoder (VAE) based variable-rate approaches have recently gained much attention to avoid the usage of a set of different networks for various compression rates. Despite the remarkable performance that has been achieved, these approaches would be readily corrupted once multiple compression/decompression operations are executed, resulting in the fact that image quality would be tremendously dropped and strong artifacts would appear. Thus, we try to tackle the issue of high-fidelity fine variable-rate image compression and propose the Invertible Activation Transformation (IAT) module. We implement the IAT in a mathematically invertible manner on a single rate Invertible Neural Network (INN) based model and the quality level (QLevel) would be fed into the IAT to generate scaling and bias tensors. IAT and QLevel together give the image compression model the ability of fine variable-rate control while better maintaining the image fidelity. Extensive experiments demonstrate that the single rate image compression model equipped with our IAT module has the ability to achieve variable-rate control without any compromise. And our IAT-embedded model obtains comparable rate-distortion performance with recent learning-based image compression methods. Furthermore, our method outperforms the state-of-the-art variable-rate image compression method by a large margin, especially after multiple re-encodings.

Submitted 12 September, 2022; originally announced September 2022.

Comments: Accepted to ACMMM2022

MSC Class: 68P30; ACM Class: I.4.2

arXiv:2208.11184 [pdf, other]

AIM 2022 Challenge on Super-Resolution of Compressed Image and Video: Dataset, Methods and Results

Authors: Ren Yang, Radu Timofte, Xin Li, Qi Zhang, Lin Zhang, Fanglong Liu, Dongliang He, Fu Li, He Zheng, Weihang Yuan, Pavel Ostyakov, Dmitry Vyal, Magauiya Zhussip, Xueyi Zou, Youliang Yan, Lei Li, Jingzhu Tang, Ming Chen, Shijie Zhao, Yu Zhu, Xiaoran Qin, Chenghua Li, Cong Leng, Jian Cheng, Claudio Rota, et al. (28 additional authors not shown)

Abstract: This paper reviews the Challenge on Super-Resolution of Compressed Image and Video at AIM 2022. This challenge includes two tracks. Track 1 aims at the super-resolution of compressed image, and Track 2 targets the super-resolution of compressed video. In Track 1, we use the popular dataset DIV2K as the training, validation and test sets. In Track 2, we propose the LDV 3.0 dataset, which contains 365 videos, including the LDV 2.0 dataset (335 videos) and 30 additional videos. In this challenge, there are 12 teams and 2 teams that submitted the final results to Track 1 and Track 2, respectively. The proposed methods and solutions gauge the state-of-the-art of super-resolution on compressed image and video. The proposed LDV 3.0 dataset is available at https://github.com/RenYang-home/LDV_dataset. The homepage of this challenge is at https://github.com/RenYang-home/AIM22_CompressSR.

Submitted 25 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: Camera-ready version

arXiv:2208.05772 [pdf, other]

KiPA22 Report: U-Net with Contour Regularization for Renal Structures Segmentation

Authors: Kangqing Ye, Peng Liu, Xiaoyang Zou, Qin Zhou, Guoyan Zheng

Abstract: Three-dimensional (3D) integrated renal structures (IRS) segmentation is important in clinical practice. With the advancement of deep learning techniques, many powerful frameworks focusing on medical image segmentation have been proposed. In this challenge, we utilized the nnU-Net framework, which is the state-of-the-art method for medical image segmentation. To reduce outlier predictions for the tumor label, we combine a contour regularization (CR) loss on the tumor label with Dice loss and cross-entropy loss.

Submitted 6 September, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

arXiv:2208.04318 [pdf, other]

Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution

Authors: Hongwei Li, Tao Dai, Yiming Li, Xueyi Zou, Shu-Tao Xia

Abstract: Image representation is critical for many visual tasks. Instead of representing images discretely with 2D arrays of pixels, a recent study, namely local implicit image function (LIIF), denotes images as a continuous function where pixel values are expansion by using the corresponding coordinates as inputs. Due to its continuous nature, LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors. However, LIIF often suffers from structural distortions and ringing artifacts around edges, mostly because all pixels share the same model, thus ignoring the local properties of the image. In this paper, we propose a novel adaptive local image function (A-LIIF) to alleviate this problem. Specifically, our A-LIIF consists of two main components: an encoder and an expansion network. The former captures cross-scale image features, while the latter models the continuous up-scaling function by a weighted combination of multiple local implicit image functions. Accordingly, our A-LIIF can reconstruct the high-frequency textures and structures more accurately. Experiments on multiple benchmark datasets verify the effectiveness of our method. Our codes are available at https://github.com/LeeHW-THU/A-LIIF.

Submitted 7 August, 2022; originally announced August 2022.

Comments: This paper is accepted by ICIP 2022. 5 pages

arXiv:2203.14823 [pdf]

Reciprocal phase transition-enabled electro-optic modulation

Authors: Fang Zou, Lei Zou, Ye Tian, Yiming Zhang, Erwin Bente, Weigang Hou, Yu Liu, Siming Chen, Victoria Cao, Lei Guo, Songsui Li, Lianshan Yan, Wei Pan, Dusan Milosevic, Zizheng Cao, A. M. J. Koonen, Huiyun Liu, Xihua Zou

Abstract: Electro-optic (EO) modulation is a well-known and essential topic in the field of communications and sensing. Its ultrahigh efficiency is unprecedentedly desired in the current green and data era. However, dramatically increasing the modulation efficiency is difficult due to the monotonic mapping relationship between the electrical signal and modulated optical signal. Here, a new mechanism termed phase-transition EO modulation is revealed from the reciprocal transition between two distinct phase planes arising from the bifurcation. Remarkably, a monolithically integrated mode-locked laser (MLL) is implemented as a prototype. A 24.8-GHz radio-frequency signal is generated and modulated, achieving a modulation energy efficiency of 3.06 fJ/bit improved by about four orders of magnitude and a contrast ratio exceeding 50 dB. Thus, MLL-based phase-transition EO modulation is characterised by ultrahigh modulation efficiency and ultrahigh contrast ratio, as experimentally proved in radio-over-fibre and underwater acoustic-sensing systems. This phase-transition EO modulation opens a new avenue for green communication and ubiquitous connections.

Submitted 22 November, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

Comments: 27 pages, 14 figures

arXiv:2201.01893 [pdf, other]

Flow-Guided Sparse Transformer for Video Deblurring

Authors: Jing Lin, Yuanhao Cai, Xiaowan Hu, Haoqian Wang, Youliang Yan, Xueyi Zou, Henghui Ding, Yulun Zhang, Radu Timofte, Luc Van Gool

Abstract: Exploiting similar and sharper scene patches in spatio-temporal neighborhoods is critical for video deblurring. However, CNN-based methods show limitations in capturing long-range dependencies and modeling non-local self-similarity. In this paper, we propose a novel framework, Flow-Guided Sparse Transformer (FGST), for video deblurring. In FGST, we customize a self-attention module, Flow-Guided Sparse Window-based Multi-head Self-Attention (FGSW-MSA). For each query element on the blurry reference frame, FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse yet highly related key elements corresponding to the same scene patch in neighboring frames. Besides, we present a Recurrent Embedding (RE) mechanism to transfer information from past frames and strengthen long-range temporal dependencies. Comprehensive experiments demonstrate that our proposed FGST outperforms state-of-the-art (SOTA) methods on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring. Code and pre-trained models are publicly available at https://github.com/linjing7/VR-Baseline

Submitted 29 May, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: ICML 2022; The First Transformer-based method for Video Deblurring

arXiv:2110.08521 [pdf, other]

doi 10.1145/3474085.3475419

Locally Adaptive Structure and Texture Similarity for Image Quality Assessment

Authors: Keyan Ding, Yi Liu, Xueyi Zou, Shiqi Wang, Kede Ma

Abstract: The latest advances in full-reference image quality assessment (IQA) involve unifying structure and texture similarity based on deep representations. The resulting Deep Image Structure and Texture Similarity (DISTS) metric, however, makes rather global quality measurements, ignoring the fact that natural photographic images are locally structured and textured across space and scale. In this paper, we describe a locally adaptive structure and texture similarity index for full-reference IQA, which we term A-DISTS. Specifically, we rely on a single statistical feature, namely the dispersion index, to localize texture regions at different scales. The estimated probability (of one patch being texture) is in turn used to adaptively pool local structure and texture measurements. The resulting A-DISTS is adapted to local image content, and is free of expensive human perceptual scores for supervised training. We demonstrate the advantages of A-DISTS in terms of correlation with human data on ten IQA databases and optimization of single image super-resolution methods.

Submitted 16 October, 2021; originally announced October 2021.

Journal ref: Proceedings of the 29th ACM International Conference on Multimedia, 2021
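
Note: the dispersion index named in the abstract above is, in its standard statistical form, the variance-to-mean ratio. The sketch below illustrates that statistic on non-overlapping patches of a small toy image. It is not the A-DISTS implementation: the abstract does not say which features the index is computed on or how it is mapped to a texture probability, so the raw-pixel input, the patch tiling, the `eps` guard, and all function names here are assumptions made for illustration.

```python
def dispersion_index(values, eps=1e-12):
    """Variance-to-mean ratio of a list of non-negative numbers."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # eps only guards against division by zero for an all-zero patch.
    return variance / (mean + eps)


def patch_dispersion(image, size):
    """Dispersion index of each non-overlapping size x size patch of a 2D list."""
    rows, cols = len(image), len(image[0])
    scores = []
    for r in range(0, rows - size + 1, size):
        row_scores = []
        for c in range(0, cols - size + 1, size):
            patch = [image[i][j]
                     for i in range(r, r + size)
                     for j in range(c, c + size)]
            row_scores.append(dispersion_index(patch))
        scores.append(row_scores)
    return scores


# Toy 4 x 8 image: the left half is flat, the right half is a checkerboard.
image = [
    [5, 5, 5, 5, 1, 9, 1, 9],
    [5, 5, 5, 5, 9, 1, 9, 1],
    [5, 5, 5, 5, 1, 9, 1, 9],
    [5, 5, 5, 5, 9, 1, 9, 1],
]

scores = patch_dispersion(image, 4)
print(scores)
```

Both patches have the same mean, but the flat patch scores zero while the checkerboard patch scores well above it, which is the kind of separation a texture localizer would need. Repeating the computation with several patch sizes would be one simple way to mimic the "different scales" mentioned in the abstract.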

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay, et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2102.02640 [pdf, ps, other]

Low Bit-Rate Wideband Speech Coding: A Deep Generative Model based Approach

Authors: Gang Min, Xiongwei Zhang, Xia Zou, Xiangyang Liu

Abstract: Traditional low bit-rate speech coding approaches only handle narrowband speech at 8kHz, which limits further improvements in speech quality. Motivated by recent successful exploration of deep learning methods for image and speech compression, this paper presents a new approach through vector quantization (VQ) of mel-frequency cepstral coefficients (MFCCs) and using a deep generative model called WaveGlow to provide efficient and high-quality speech coding. The coding feature is solely an 80-dimension MFCCs vector for 16kHz wideband speech, then speech coding at the bit-rate throughout 1000-2000 bit/s could be scalably implemented by applying different VQ schemes for MFCCs vector. This new deep generative network based codec works fast as the WaveGlow model abandons the sample-by-sample autoregressive mechanism. We evaluated this new approach over the multi-speaker TIMIT corpus, and experimental results demonstrate that it provides better speech quality compared with the state-of-the-art classic MELPe codec at lower bit-rate.

Submitted 4 February, 2021; originally announced February 2021.

Comments: 6 pages

arXiv:2009.09910 [pdf]

doi 10.1016/j.optcom.2020.126611

Detail reconstruction in binary ghost imaging by using point-by-point method

Authors: Ning Zhang, Yanfeng Bai, Xuanpengfan Zou, Xiquan Fu

Abstract: We propose a new local-binary ghost imaging by using point-by-point method. This method can compensate the degradation of imaging quality due to the loss of information during binarization process. The numerical and experimental results show that the target details can be reconstructed well by this method when compared with traditional ghost imaging. By comparing the differences of the speckle patterns from different binarization methods, we also give the corresponding explanation. Our results may have the potential applications in areas with high requirements for imaging details, such as target recognition.

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2006.13443 [pdf]

Hardware-irrelevant parallel processing system

Authors: Xiuting Zou, Shaofu Xu, Anyi Deng, Rui Wang, Weiwen Zou

Abstract: Parallel processing technology has been a primary tool for achieving high-speed, high-accuracy, and broadband processing for many years across modern information systems and data processing such as optical and radar, synthetic aperture radar imaging, digital beam forming, and digital filtering systems. However, hardware deviations in a parallel processing system (PPS) severely degrade system performance and pose an urgent challenge. We propose a hardware-irrelevant PPS of which the performance is unaffected by hardware deviations. In this system, an embedded convolutional recurrent autoencoder (CRAE), which learns inherent system patterns as well as acquires and removes adverse effects brought by hardware deviations, is adopted. We implement a hardware-irrelevant PPS into a parallel photonic sampling system to accomplish a high-performance analog-to-digital conversion for microwave signals with high frequency and broad bandwidth. Under one system state, a category of signals with two different mismatch degrees is utilized to train the CRAE, which can then compensate for mismatches in various categories of signals with multiple mismatch degrees under random system states. Our approach is extensively applicable to achieving hardware-irrelevant PPSs which are either discrete or integrated in photonic, electric, and other fields.

Submitted 23 June, 2020; originally announced June 2020.

arXiv:2005.01996 [pdf, other]

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

Authors: Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng, et al. (21 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Processing artifacts, the aim is to super-resolve images with synthetically generated image processing artifacts. This allows for quantitative benchmarking of the approaches with respect to a ground-truth image. In Track 2: Smartphone Images, real low-quality smart phone images have to be super-resolved. In both tracks, the ultimate goal is to achieve the best perceptual quality, evaluated using a human study. This is the second challenge on the subject, following AIM 2019, targeting to advance the state-of-the-art in super-resolution. To measure the performance we use the benchmark protocol from AIM 2019. In total 22 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem.

Submitted 5 May, 2020; originally announced May 2020.

arXiv:2003.03460 [pdf]

doi 10.1109/TQE.2020.2965810

Enhancing a Near-Term Quantum Accelerator's Instruction Set Architecture for Materials Science Applications

Authors: Xiang Zou, Shavindra P. Premaratne, M. Adriaan Rol, Sonika Johri, Viacheslav Ostroukh, David J. Michalak, Roman Caudillo, James S. Clarke, Leonardo Dicarlo, A. Y. Matsuura

Abstract: Quantum computers with tens to hundreds of noisy qubits are being developed today. To be useful for real-world applications, we believe that these near-term systems cannot simply be scaled-down non-error-corrected versions of future fault-tolerant large-scale quantum computers. These near-term systems require specific architecture and design attributes to realize their full potential. To efficiently execute an algorithm, the quantum coprocessor must be designed to scale with respect to qubit number and to maximize useful computation within the qubits' decoherence bounds. In this work, we employ an application-system-qubit co-design methodology to architect a near-term quantum coprocessor. To support algorithms from the real-world application area of simulating the quantum dynamics of a material system, we design a (parameterized) arbitrary single-qubit rotation instruction and a two-qubit entangling controlled-Z instruction. We introduce dynamic gate set and paging mechanisms to implement the instructions. To evaluate the functionality and performance of these two instructions, we implement a two-qubit version of an algorithm to study a disorder-induced metal-insulator transition and run 60 random instances of it, each of which realizes one disorder configuration and contains 40 two-qubit instructions (or gates) and 104 single-qubit instructions. We observe the expected quantum dynamics of the time-evolution of this system.

Submitted 6 March, 2020; originally announced March 2020.

Comments: Received August 15, 2019; revised December 9, 2019; accepted December 13, 2019; date of publication January 28, 2020; date of current version February 14, 2020

Journal ref: in IEEE Transactions on Quantum Engineering, vol. 1, pp. 1-7, 2020, Art no. 4500307

arXiv:1912.10074 [pdf, ps, other]

Trellis-Coded Non-Orthogonal Multiple Access

Authors: Xun Zou, Mehdi Ganji, Hamid Jafarkhani

Abstract: In this letter, we propose a trellis-coded nonorthogonal multiple access (NOMA) scheme. The signals for different users are produced by trellis coded modulation (TCM) and then superimposed on different power levels. By interpreting the encoding process via the tensor product of trellises, we introduce a joint detection method based on the Viterbi algorithm. Then, we determine the optimal power allocation between the two users by maximizing the free distance of the tensor product trellis. Finally, we manifest that the trellis-coded NOMA outperforms the uncoded NOMA at high signal-to-noise ratio (SNR).

Submitted 20 December, 2019; originally announced December 2019.

arXiv:1911.01249 [pdf, other]

AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results

Authors: Kai Zhang, Shuhang Gu, Radu Timofte, Zheng Hui, Xiumei Wang, Xinbo Gao, Dongliang Xiong, Shuai Liu, Ruipeng Gang, Nan Nan, Chenghua Li, Xueyi Zou, Ning Kang, Zhan Wang, Hang Xu, Chaofeng Wang, Zheng Li, Linlin Wang, Jun Shi, Wenyu Sun, Zhiqiang Lang, Jiangtao Nie, Wei Wei, Lei Zhang, Yazhe Niu, et al. (4 additional authors not shown)

Abstract: This paper reviews the AIM 2019 challenge on constrained example-based single image super-resolution with focus on proposed solutions and results. The challenge had 3 tracks. Taking the three main aspects (i.e., number of parameters, inference/running time, fidelity (PSNR)) of MSRResNet as the baseline, Track 1 aims to reduce the amount of parameters while being constrained to maintain or improve the running time and the PSNR result, Tracks 2 and 3 aim to optimize running time and PSNR result with constrain of the other two aspects, respectively. Each track had an average of 64 registered participants, and 12 teams submitted the final results. They gauge the state-of-the-art in single image super-resolution.

Submitted 4 November, 2019; originally announced November 2019.

arXiv:1810.08906 [pdf]

Analog-to-digital conversion revolutionized by deep learning

Authors: Shaofu Xu, Xiuting Zou, Bowen Ma, Jianping Chen, Lei Yu, Weiwen Zou

Abstract: As the bridge between the analog world and digital computers, analog-to-digital converters are generally used in modern information systems such as radar, surveillance, and communications. For the configuration of analog-to-digital converters in future high-frequency broadband systems, we introduce a revolutionary architecture that adopts deep learning technology to overcome tradeoffs between bandwidth, sampling rate, and accuracy. A photonic front-end provides broadband capability for direct sampling and speed multiplication. Trained deep neural networks learn the patterns of system defects, maintaining high accuracy of quantized data in a succinct and adaptive manner. Based on numerical and experimental demonstrations, we show that the proposed architecture outperforms state-of-the-art analog-to-digital converters, confirming the potential of our approach in future analog-to-digital converter design and performance enhancement of future information systems.

Submitted 21 October, 2018; originally announced October 2018.

arXiv:1808.05410 [pdf, ps, other]

Interleaving Channel Estimation and Limited Feedback for Point-to-Point Systems with a Large Number of Transmit Antennas

Authors: Erdem Koyuncu, Xun Zou, Hamid Jafarkhani

Abstract: We introduce and investigate the opportunities of multi-antenna communication schemes whose training and feedback stages are interleaved and mutually interacting. Specifically, unlike the traditional schemes where the transmitter first trains all of its antennas at once and then receives a single feedback message, we consider a scenario where the transmitter instead trains its antennas one by one and receives feedback information immediately after training each one of its antennas. The feedback message may ask the transmitter to train another antenna; or, it may terminate the feedback/training phase and provide the quantized codeword (e.g., a beamforming vector) to be utilized for data transmission. As a specific application, we consider a multiple-input single-output system with $t$ transmit antennas, a short-term power constraint $P$, and target data rate $ρ$. We show that for any $t$, the same outage probability as a system with perfect transmitter and receiver channel state information can be achieved with a feedback rate of $R_1$ bits per channel state and via training $R_2$ transmit antennas on average, where $R_1$ and $R_2$ are independent of $t$, and depend only on $ρ$ and $P$. In addition, we design variable-rate quantizers for channel coefficients to further minimize the feedback rate of our scheme.

Submitted 16 August, 2018; originally announced August 2018.

Comments: To appear in IEEE Transactions on Wireless Communications

arXiv:1711.03197 [pdf, other]

Asynchronous Channel Training in Multi-Cell Massive MIMO

Authors: Xun Zou, Hamid Jafarkhani

Abstract: Pilot contamination has been regarded as the main bottleneck in time division duplexing (TDD) multi-cell massive multiple-input multiple-output (MIMO) systems. The pilot contamination problem cannot be addressed with large-scale antenna arrays. We provide a novel asynchronous channel training scheme to obtain precise channel matrices without the cooperation of base stations. The scheme takes advantage of sampling diversity by inducing intentional timing mismatch. Then, the linear minimum mean square error (LMMSE) estimator and the zero-forcing (ZF) estimator are designed. Moreover, we derive the minimum square error (MSE) upper bound of the ZF estimator. In addition, we propose the equally-divided delay scheme which under certain conditions is the optimal solution to minimize the MSE of the ZF estimator employing the identity matrix as pilot matrix. We calculate the uplink achievable rate using maximum ratio combining (MRC) to compare asynchronous and synchronous channel training schemes. Finally, simulation results demonstrate that the asynchronous channel estimation scheme can greatly reduce the harmful effect of pilot contamination.

Submitted 8 November, 2017; originally announced November 2017.

arXiv:1506.06419 [pdf, other]

Verification and Control of Partially Observable Probabilistic Real-Time Systems

Authors: Gethin Norman, David Parker, Xueyi Zou

Abstract: We propose automated techniques for the verification and control of probabilistic real-time systems that are only partially observable. To formally model such systems, we define an extension of probabilistic timed automata in which local states are partially visible to an observer or controller. We give a probabilistic temporal logic that can express a range of quantitative properties of these models, relating to the probability of an event's occurrence or the expected value of a reward measure. We then propose techniques to either verify that such a property holds or to synthesise a controller for the model which makes it true. Our approach is based on an integer discretisation of the model's dense-time behaviour and a grid-based abstraction of the uncountable belief space induced by partial observability. The latter is necessarily approximate since the underlying problem is undecidable, however we show how both lower and upper bounds on numerical results can be generated. We illustrate the effectiveness of the approach by implementing it in the PRISM model checker and applying it to several case studies, from the domains of computer security and task scheduling.

Submitted 22 June, 2015; v1 submitted 21 June, 2015; originally announced June 2015.

Showing 1–39 of 39 results for author: Zou, X