Search | arXiv e-print repository

Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network

Authors: Ling Lin, Yihang Zhou, Zhanqi Hu, Dian Jiang, Congcong Liu, Shuo Zhou, Yanjie Zhu, Jianxiang Liao, Dong Liang, Hairong Zheng, Haifeng Wang

Abstract: Tuberous sclerosis complex (TSC) manifests as a multisystem disorder with significant neurological implications. This study addresses the critical need for robust classification models tailored to TSC in pediatric patients, introducing QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks. The model incorporates a two-layer quantum layer (QL), comprising ZZFeatureMap and Ansatz layers, strategically designed for processing classical data within a quantum framework. A comprehensive evaluation demonstrates the superior performance of QResNet in TSC MRI image classification compared to conventional 3D-ResNet models. These compelling findings underscore the potential of quantum computing to revolutionize medical imaging and diagnostics. Remarkably, this method surpasses conventional CNNs in accuracy and Area Under the Curve (AUC) metrics with the current dataset. Future research endeavors may focus on exploring the scalability and practical implementation of quantum algorithms in real-world medical imaging scenarios.

Submitted 26 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

Comments: 5 pages, 4 figures, 2 tables, presented at ISBI 2024

arXiv:2407.00280 [pdf, other]

IVCA: Inter-Relation-Aware Video Complexity Analyzer

Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA. Next, drawing inspiration from the hierarchical reference structure in codecs, we design layer-aware weights to adjust the majorities of frame complexity in different layers. Additionally, we expand the scope of temporal features by considering frames that are referred to, rather than relying solely on the previous frame. Experimental results show a significant improvement in complexity estimation accuracy achieved by IVCA, with minimal time complexity increase.

Submitted 28 June, 2024; originally announced July 2024.

Comments: The report for the solution of second prize winner in ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)

arXiv:2406.06626 [pdf, other]

Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

Authors: Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

Abstract: Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday usage. Implementing these decoding processes on edge devices, such as wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks to identify an optimal neural decoding backbone that boasts robust performance and swift inference capabilities suitable for edge deployment. We executed a series of neural decoding experiments involving nonhuman primates engaged in random reaching tasks, evaluating four prospective models, Gated Recurrent Unit (GRU), Transformer, Receptance Weighted Key Value (RWKV), and Selective State Space model (Mamba), across several metrics: single-session decoding, multi-session decoding, new session fine-tuning, inference speed, calibration speed, and scalability. The findings indicate that although the GRU model delivers sufficient accuracy, the RWKV and Mamba models are preferable due to their superior inference and calibration speeds. Additionally, RWKV and Mamba comply with the scaling law, demonstrating improved performance with larger data sets and increased model sizes, whereas GRU shows less pronounced scalability, and the Transformer model requires computational resources that scale prohibitively. This paper presents a thorough comparative analysis of the four models in various scenarios. The results are pivotal in pinpointing an optimal backbone that can handle increasing data volumes and is viable for edge implementation. This analysis provides essential insights for ongoing research and practical applications in the field.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.02146 [pdf, other]

A Spiking Neural Network Decoder for Implantable Brain Machine Interfaces and its Sparsity-aware Deployment on RISC-V Microcontrollers

Authors: Jiawei Liao, Oscar Toomey, Xiaying Wang, Lars Widmer, Cynthia A. Chestek, Luca Benini, Taekwang Jang

Abstract: Implantable Brain-machine interfaces (BMIs) are promising for motor rehabilitation and mobility augmentation, and they demand accurate and energy-efficient algorithms. In this paper, we propose a novel spiking neural network (SNN) decoder for regression tasks for implantable BMIs. The SNN is trained with enhanced spatio-temporal backpropagation to fully leverage its capability to handle temporal problems. The proposed SNN decoder outperforms the state-of-the-art Kalman filter and artificial neural network (ANN) decoders in offline finger velocity decoding tasks. The decoder is deployed on a RISC-V-based hardware platform and optimized to exploit sparsity. The proposed implementation has an average power consumption of 0.50 mW in a duty-cycled mode. When conducting continuous inference without duty-cycling, it achieves an energy efficiency of 1.88 uJ per inference, which is 5.5X less than the baseline ANN. Additionally, the average decoding latency is 0.12 ms for each inference, which is 5.7X faster than the ANN implementation.
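
The ratios quoted in this abstract imply baseline figures for the ANN that the abstract does not state directly. The short sketch below derives them; the variable names are illustrative, and because the published ratios are rounded, the derived values are approximate.

```python
# Figures quoted in the abstract.
snn_energy_uj = 1.88    # energy per inference, microjoules
snn_latency_ms = 0.12   # average decoding latency per inference, milliseconds
energy_ratio = 5.5      # "5.5X less than the baseline ANN"
latency_ratio = 5.7     # "5.7X faster than the ANN implementation"

# Implied ANN baseline (derived here, not reported in the abstract).
ann_energy_uj = snn_energy_uj * energy_ratio
ann_latency_ms = snn_latency_ms * latency_ratio

print(f"implied ANN energy per inference: ~{ann_energy_uj:.2f} uJ")
print(f"implied ANN latency per inference: ~{ann_latency_ms:.3f} ms")
```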

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.15341 [pdf, other]

Classifier-guided neural blind deconvolution: a physics-informed denoising module for bearing fault diagnosis under heavy noise

Authors: Jing-Xiao Liao, Chao He, Jipu Li, Jinwei Sun, Shiping Zhang, Xiaoge Zhang

Abstract: Blind deconvolution (BD) has been demonstrated as an efficacious approach for extracting bearing fault-specific features from vibration signals under strong background noise. Despite BD's desirable feature in adaptability and mathematical interpretability, a significant challenge persists: How to effectively integrate BD with fault-diagnosing classifiers? This issue arises because the traditional BD method is solely designed for feature extraction with its own optimizer and objective function. When BD is combined with downstream deep learning classifiers, the different learning objectives will be in conflict. To address this problem, this paper introduces classifier-guided BD (ClassBD) for joint learning of BD-based feature extraction and deep learning-based fault classification. Firstly, we present a time and frequency neural BD that employs neural networks to implement conventional BD, thereby facilitating the seamless integration of BD and the deep learning classifier for co-optimization of model parameters. Subsequently, we develop a unified framework to use a deep learning classifier to guide the learning of BD filters. In addition, we devise a physics-informed loss function composed of kurtosis, $l_2/l_4$ norm, and a cross-entropy loss to jointly optimize the BD filters and deep learning classifier. Consequently, the fault labels provide useful information to direct BD to extract features that distinguish classes amidst strong noise. To the best of our knowledge, this is the first work of its kind in which BD is successfully applied to bearing fault diagnosis. Experimental results from three datasets demonstrate that ClassBD outperforms other state-of-the-art methods under noisy conditions.
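
The two signal statistics named in the loss above, kurtosis and the $l_2/l_4$ norm ratio, can be illustrated in a few lines. This is a minimal sketch using textbook definitions; the function names are hypothetical, and the abstract does not say how ClassBD signs, weights, or combines these terms with the cross-entropy loss.

```python
import math


def kurtosis(signal):
    """Fourth central moment divided by the squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 ** 2)


def l2_l4_ratio(signal):
    """Ratio of the l2 norm to the l4 norm of the signal."""
    l2 = math.sqrt(sum(v * v for v in signal))
    l4 = sum(v ** 4 for v in signal) ** 0.25
    return l2 / l4


# Toy signals: a flat alternating waveform and a sparse, impulsive one.
flat = [1.0, -1.0] * 4
impulsive = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, -3.0]

print(kurtosis(flat), kurtosis(impulsive))
print(l2_l4_ratio(flat), l2_l4_ratio(impulsive))
```

Impulsive signals, such as the periodic impacts from a bearing defect, have high kurtosis and a low $l_2/l_4$ ratio, which is why both quantities are commonly used as sparsity criteria in blind deconvolution.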

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
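
The PSNR thresholds quoted above follow the standard definition, 10 log10(peak^2 / MSE). A minimal sketch on flat lists of pixel values is given here for orientation; the function name and the `peak` parameter are illustrative, and the challenge's exact evaluation protocol (colour space, border handling) is not described in the abstract.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# A uniform error of 10% of the peak value gives 20 dB.
print(psnr([0.0, 0.0], [0.1, 0.1], peak=1.0))
```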

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

arXiv:2402.12657 [pdf, ps, other]

Coded Backscattering Communication with LTE Pilots as Ambient Signal

Authors: Jingyi Liao, Kalle Ruttik, Riku Jantti, Phan-Huy Dinh-Thuy

Abstract: The 3GPP has recently conducted a study on the Ambient Internet of Things (AIoT), with a particular emphasis on examining backscatter communications as one of the primary techniques under consideration. Previous investigations into Ambient Backscatter Communications (AmBC) within the long term evolution (LTE) downlink have shown that it is feasible to utilize the user equipment channel estimator as a receiver for demodulating frequency shift keyed (FSK) messages transmitted by the backscatter devices. In practical deployment scenarios, the backscattered link often experiences a low signal-to-noise ratio, leading to subpar bit error rate (BER) performance in the case of uncoded transmissions. In this paper, we propose the adoption of the same convolutional coding methodology for backscatter links that is already employed for LTE downlink control signals. This approach facilitates the reuse of identical demodulation functions at the modem for both control signals and backscattered AIoT messages. To assess the performance of the proposed scheme, we conducted experiments utilizing real LTE downlink signals generated by a mobile operator within an office environment. When compared to uncoded FSK, convolutional channel coding delivers a notable gain of approximately 6 dB at a BER of $10^{-3}$. Consequently, the AmBC system demonstrates a high level of reliability, achieving a BER of $10^{-3}$ at a Signal-to-Noise Ratio (SNR) of 5 dB.
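
For readers unfamiliar with the coding scheme referred to above, here is a minimal sketch of a rate-1/3, constraint-length-7 tail-biting convolutional encoder. The generator polynomials (133, 171, 165 octal) are those commonly cited for the LTE tail-biting code in 3GPP TS 36.212; treating them as the paper's exact configuration is an assumption, as is the interleaved output order. The function names are illustrative.

```python
# Assumed generators (octal) for a rate-1/3, constraint-length-7 code.
GENERATORS = (0o133, 0o171, 0o165)
CONSTRAINT_LENGTH = 7


def _parity(value):
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


def encode_tail_biting(bits):
    """Encode a list of 0/1 bits; output is three coded bits per input bit."""
    memory = CONSTRAINT_LENGTH - 1
    if len(bits) < memory:
        raise ValueError("need at least 6 input bits for tail biting")
    # Tail biting: preload the register with the last 6 input bits so the
    # encoder starts and ends in the same state and no tail bits are sent.
    state = 0
    for bit in bits[-memory:]:
        state = (state >> 1) | (bit << (memory - 1))
    coded = []
    for bit in bits:
        register = (bit << memory) | state  # newest bit in the top position
        coded.extend(_parity(register & g) for g in GENERATORS)
        state = register >> 1               # drop the oldest bit
    return coded


message = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
print(encode_tail_biting(message))
```

Because the register is preloaded from the end of the message, the code is cyclic: rotating the input by one bit rotates the output by three bits. Decoding (for example with a Viterbi decoder) is outside the scope of this sketch.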

Submitted 20 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2311.05415 [pdf, other]

EEG-DG: A Multi-Source Domain Generalization Framework for Motor Imagery EEG Classification

Authors: Xiao-Cong Zhong, Qisong Wang, Dan Liu, Zhihuang Chen, Jing-Xiao Liao, Jinwei Sun, Yudong Zhang, Feng-Lei Fan

Abstract: Motor imagery EEG classification plays a crucial role in non-invasive Brain-Computer Interface (BCI) research. However, the classification is affected by the non-stationarity and individual variations of EEG signals. Simply pooling EEG data with different statistical distributions to train a classification model can severely degrade the generalization performance. To address this issue, the existing methods primarily focus on domain adaptation, which requires access to the target data during training. This is unrealistic in many EEG application scenarios. In this paper, we propose a novel multi-source domain generalization framework called EEG-DG, which leverages multiple source domains with different statistical distributions to build generalizable models on unseen target EEG data. We optimize both the marginal and conditional distributions to ensure the stability of the joint distribution across source domains and extend it to a multi-source domain generalization framework to achieve domain-invariant feature representation, thereby alleviating calibration efforts. Systematic experiments on a simulative dataset and BCI competition datasets IV-2a and IV-2b demonstrate the superiority of our proposed EEG-DG over state-of-the-art methods. Specifically, EEG-DG achieves an average classification accuracy/kappa value of 81.79%/0.7572 and 87.12%/0.7424 on datasets IV-2a and IV-2b, respectively, which even outperforms some domain adaptation methods. Our code is available at https://github.com/XC-ZhongHIT/EEG-DG for free download and evaluation.
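
The accuracy and kappa pairs reported above are consistent with the usual chance-corrected formula kappa = (p_o - p_e) / (1 - p_e), where p_e = 1/n for n balanced classes. The sketch below checks this, taking as background (not stated in the abstract) that dataset IV-2a is a four-class task and IV-2b a two-class task; the function name is illustrative.

```python
def kappa_from_accuracy(accuracy, n_classes):
    """Cohen's kappa when chance agreement is 1 / n_classes (balanced classes)."""
    chance = 1.0 / n_classes
    return (accuracy - chance) / (1.0 - chance)


# Accuracies from the abstract; class counts are background assumptions.
print(f"IV-2a: {kappa_from_accuracy(0.8179, 4):.4f}")
print(f"IV-2b: {kappa_from_accuracy(0.8712, 2):.4f}")
```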

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.11717 [pdf, other]

A class-weighted supervised contrastive learning long-tailed bearing fault diagnosis approach using quadratic neural network

Authors: Wei-En Yu, Jinwei Sun, Shiping Zhang, Xiaoge Zhang, Jing-Xiao Liao

Abstract: Deep learning has achieved remarkable success in bearing fault diagnosis. However, its performance oftentimes deteriorates when dealing with highly imbalanced or long-tailed data, while such cases are prevalent in industrial settings because fault is a rare event that occurs with an extremely low probability. Conventional data augmentation methods face fundamental limitations due to the scarcity of samples pertaining to the minority class. In this paper, we propose a supervised contrastive learning approach with a class-aware loss function to enhance the feature extraction capability of neural networks for fault diagnosis. The developed class-weighted contrastive learning quadratic network (CCQNet) consists of a quadratic convolutional residual network backbone, a contrastive learning branch utilizing a class-weighted contrastive loss, and a classifier branch employing logit-adjusted cross-entropy loss. By utilizing class-weighted contrastive loss and logit-adjusted cross-entropy loss, our approach encourages equidistant representation of class features, thereby inducing equal attention on all the classes. We further analyze the superior feature extraction ability of quadratic network by establishing the connection between quadratic neurons and autocorrelation in signal processing. Experimental results on public and proprietary datasets are used to validate the effectiveness of CCQNet, and computational results reveal that CCQNet substantially outperforms SOTA methods in handling extremely imbalanced data.
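
To make the logit-adjusted cross-entropy loss mentioned above concrete, here is a minimal single-sample sketch of the formulation common in the long-tailed learning literature, where tau * log(prior) is added to each logit before the softmax. Whether CCQNet uses exactly this form is an assumption; the function name and the `tau` parameter are illustrative.

```python
import math


def logit_adjusted_cross_entropy(logits, label, class_priors, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(class prior)."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    # Numerically stable log-sum-exp.
    top = max(adjusted)
    log_sum = top + math.log(sum(math.exp(a - top) for a in adjusted))
    return log_sum - adjusted[label]


# Two classes with a 9:1 imbalance and an undecided network (equal logits).
priors = [0.9, 0.1]
print(logit_adjusted_cross_entropy([0.0, 0.0], 0, priors))  # majority sample
print(logit_adjusted_cross_entropy([0.0, 0.0], 1, priors))  # minority sample
```

With equal raw logits, the minority-class sample incurs the larger loss, so the network must produce a larger margin for rare classes to reach the same loss as for frequent ones.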

Submitted 20 September, 2023; originally announced September 2023.

arXiv:2309.10153 [pdf, other]

Preserving Tumor Volumes for Unsupervised Medical Image Registration

Authors: Qihua Dong, Hao Du, Ying Song, Yan Xu, Jing Liao

Abstract: Medical image registration is a critical task that estimates the spatial correspondence between pairs of images. However, current traditional and deep-learning-based methods rely on similarity measures to generate a deforming field, which often results in disproportionate volume changes in dissimilar regions, especially in tumor regions. These changes can significantly alter the tumor size and underlying anatomy, which limits the practical use of image registration in clinical diagnosis. To address this issue, we have formulated image registration with tumors as a constraint problem that preserves tumor volumes while maximizing image similarity in other normal regions. Our proposed strategy involves a two-stage process. In the first stage, we use similarity-based registration to identify potential tumor regions by their volume change, generating a soft tumor mask accordingly. In the second stage, we propose a volume-preserving registration with a novel adaptive volume-preserving loss that penalizes the change in size adaptively based on the masks calculated from the previous stage. Our approach balances image similarity and volume preservation in different regions, i.e., normal and tumor regions, by using soft tumor masks to adjust the imposition of volume-preserving loss on each one. This ensures that the tumor volume is preserved during the registration process. We have evaluated our strategy on various datasets and network architectures, demonstrating that our method successfully preserves the tumor volume while achieving comparable registration results with state-of-the-art methods. Our code is available at: https://dddraxxx.github.io/Volume-Preserving-Registration/.

Submitted 9 May, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: ICCV 2023 Poster

arXiv:2307.02036 [pdf]

Convex Optimal Power Flow Based on Power Injection-based Equations and Its Application in Bipolar DC Distribution Network

Authors: Yiyao Zhou, Qianggang Wang, Yuan Chi, Jianquan Liao, Tao Huang, Niancheng Zhou, Xiaolong Xu, Xuefei Zhang

Abstract: Optimal power flow (OPF) is a fundamental tool for analyzing the characteristics of bipolar DC distribution network (DCDN). However, existing OPF models face challenges in reflecting the power distribution and exchange of bipolar DCDN directly since its decision variables are voltage and current. This paper addresses this issue by establishing a convex OPF model that can be used for the planning and operation of bipolar DCDN. First, the power flow characteristics of bipolar DCDN are revealed through power injection-based equations, upon which the original OPF model is established. Next, the original OPF model undergoes a transformation into a convex OPF model based on second-order cone programming (SOCP) through variable substitution, second-order cone relaxation, McCormick relaxation, and first-order Taylor expansion, respectively. Finally, the sequence bound tightening algorithm (STBA) is employed to tighten the boundaries of McCormick envelopes in each iteration to ensure the exactness of the convex OPF model. The effectiveness of this novel OPF model for bipolar DCDN is verified through two case studies, i.e., capacity configuration of distributed generation (DG) and operation optimization of bipolar DCDN.
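
The McCormick relaxation and the bound tightening step mentioned above can be illustrated on a single bilinear term w = x * y. The sketch below evaluates the four standard McCormick inequalities at a point; the function name and the numeric boxes are illustrative and are not taken from the paper's OPF model.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Lower and upper McCormick bounds on w = x * y at the point (x, y)."""
    lower = max(
        x_lo * y + x * y_lo - x_lo * y_lo,
        x_hi * y + x * y_hi - x_hi * y_hi,
    )
    upper = min(
        x_hi * y + x * y_lo - x_hi * y_lo,
        x_lo * y + x * y_hi - x_lo * y_hi,
    )
    return lower, upper


x, y = 0.5, 0.5
wide = mccormick_bounds(x, y, 0.0, 1.0, 0.0, 1.0)   # loose variable bounds
tight = mccormick_bounds(x, y, 0.4, 0.6, 0.4, 0.6)  # tightened variable bounds
print("true product:", x * y)
print("wide envelope:", wide)
print("tight envelope:", tight)
```

The envelope always contains the true product, and shrinking the variable bounds shrinks the gap between the lower and upper envelopes. This is the motivation for tightening the bounds iteratively so that the relaxed model approaches exactness.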

Submitted 5 July, 2023; originally announced July 2023.

Comments: 10 pages, 13 figures, under review in IEEE transactions on power systems

arXiv:2304.08282 [pdf]

Deep-Learning-based Vasculature Extraction for Single-Scan Optical Coherence Tomography Angiography

Authors: Jinpeng Liao, Tianyu Zhang, Yilong Zhang, Chunhui Li, Zhihong Huang

Abstract: Optical coherence tomography angiography (OCTA) is a non-invasive imaging modality that extends the functionality of OCT by extracting moving red blood cell signals from surrounding static biological tissues. OCTA has emerged as a valuable tool for analyzing skin microvasculature, enabling more accurate diagnosis and treatment monitoring. Most existing OCTA extraction algorithms, such as speckle variance (SV)- and eigen-decomposition (ED)-OCTA, implement a larger number of repeated (NR) OCT scans at the same position to produce high-quality angiography images. However, a higher NR requires a longer data acquisition time, leading to more unpredictable motion artifacts. In this study, we propose a vasculature extraction pipeline that uses only one-repeated OCT scan to generate OCTA images. The pipeline is based on the proposed Vasculature Extraction Transformer (VET), which leverages convolutional projection to better learn the spatial relationships between image patches. In comparison to OCTA images obtained via the SV-OCTA (PSNR: 17.809) and ED-OCTA (PSNR: 18.049) using four-repeated OCT scans, OCTA images extracted by VET exhibit moderate quality (PSNR: 17.515) and higher image contrast while reducing the required data acquisition time from ~8 s to ~2 s. Based on visual observations, the proposed VET outperforms SV and ED algorithms when using neck and face OCTA data in areas that are challenging to scan. This study demonstrates that the VET has the capacity to extract vasculature images from a fast one-repeated OCT scan, facilitating accurate diagnosis for patients.
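
The dependence of speckle variance OCTA on repeated scans, which motivates the single-scan approach above, can be seen from the usual per-pixel definition: the variance of intensity across N repeated frames. The sketch below uses that textbook definition on nested lists; the function name is illustrative and normalization conventions vary between implementations.

```python
def speckle_variance(frames):
    """Per-pixel intensity variance across repeated frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [frame[r][c] for frame in frames]
            mean = sum(values) / n
            out_row.append(sum((v - mean) ** 2 for v in values) / n)
        result.append(out_row)
    return result


# Four repeats of a 1 x 2 image: a static pixel and a fluctuating (flow) pixel.
repeats = [[[5.0, 1.0]], [[5.0, 3.0]], [[5.0, 1.0]], [[5.0, 3.0]]]
print(speckle_variance(repeats))      # static tissue -> 0, flow -> positive
print(speckle_variance(repeats[:1]))  # one repeat carries no variance signal
```

With a single repeat the variance is zero everywhere, so a variance-based method cannot separate flow from static tissue without multiple scans.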

Submitted 3 May, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

arXiv:2303.13264 [pdf, other]

Modular CSI Quantization for FDD Massive MIMO Communication

Authors: Jialing Liao, Roope Vehkalahti, Tefjol Pllaha, Wei Han, Olav Tirkkonen

Abstract: We consider high-dimensional MIMO transmissions in frequency division duplexing (FDD) systems. For precoding, the frequency selective channel has to be measured, quantized and fed back to the base station by the users. When the number of antennas is very high this typically leads to prohibitively high quantization complexity and large feedback. In 5G New Radio (NR), a modular quantization approach has been applied for this, where first a low-dimensional subspace is identified for the whole frequency selective channel, and then subband channels are linearly mapped to this subspace and quantized. We analyze how the components in such a modular scheme contribute to the overall quantization distortion. Based on this analysis we improve the technology components in the modular approach and propose an orthonormalized wideband precoding scheme and a sequential wideband precoding approach which provide considerable gains over the conventional method. We compare the performance of the developed quantization schemes to prior art by simulations in terms of the projection distortion, overall distortion and spectral efficiency, in a scenario with a realistic spatial channel model.

Submitted 23 March, 2023; originally announced March 2023.

Comments: 15 pages, 9 figures, to appear in TWC

arXiv:2303.04439 [pdf, other]

A Light Weight Model for Active Speaker Detection

Authors: Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen

Abstract: Active speaker detection is a challenging task in audio-visual scenario understanding, which aims to detect who is speaking in scenarios with one or more speakers. This task has received extensive attention as it is crucial in applications such as speaker diarization, speaker tracking, and automatic video editing. The existing studies try to improve performance by inputting information from multiple candidates and designing complex models. Although these methods achieved outstanding performance, their high consumption of memory and computational power makes them difficult to apply in resource-limited scenarios. Therefore, we construct a lightweight active speaker detection architecture by reducing input candidates, splitting 2D and 3D convolutions for audio-visual feature extraction, and applying gated recurrent unit (GRU) with low computational complexity for cross-modal modeling. Experimental results on the AVA-ActiveSpeaker dataset show that our framework achieves competitive mAP performance (94.1% vs. 94.2%), while the resource costs are significantly lower than the state-of-the-art method, especially in model parameters (1.0M vs. 22.5M, about 23x) and FLOPs (0.6G vs. 2.6G, about 4x). In addition, our framework also performs well on the Columbia dataset, showing good robustness. The code and model weights are available at https://github.com/Junhua-Liao/Light-ASD.

Submitted 8 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023

arXiv:2302.02125 [pdf, other]

doi 10.1109/TMI.2023.3269523

Weakly-Supervised 3D Medical Image Segmentation using Geometric Prior and Contrastive Similarity

Authors: Hao Du, Qihua Dong, Yan Xu, Jing Liao

Abstract: Medical image segmentation is one of the most important pre-processing procedures in computer-aided diagnosis but is also a very challenging task due to the complex shapes of segments and various artifacts caused by medical imaging (i.e., low-contrast tissues and non-homogeneous textures). In this paper, we propose a simple yet effective segmentation framework that incorporates the geometric prior and contrastive similarity into the weakly-supervised segmentation framework in a loss-based fashion. The proposed geometric prior built on point cloud provides meticulous geometry to the weakly-supervised segmentation proposal, which serves as better supervision than the inherent property of the bounding-box annotation (i.e., height and width). Furthermore, we propose contrastive similarity to encourage organ pixels to gather around in the contrastive embedding space, which helps better distinguish low-contrast tissues. The proposed contrastive embedding space can make up for the poor representation of the conventionally-used gray space. Extensive experiments are conducted to verify the effectiveness and the robustness of the proposed weakly-supervised segmentation framework. The proposed framework is superior to state-of-the-art weakly-supervised methods on the following publicly accessible datasets: LiTS 2017 Challenge, KiTS 2021 Challenge, and LPBA40. We also dissect our method and evaluate the performance of each component.

Submitted 4 February, 2023; originally announced February 2023.

Comments: Weakly-supervised Segmentation, Medical Image Segmentation, Contrastive Similarity, Geometric Prior, Point Cloud

Journal ref: IEEE Trans. Med. Imaging, Early Access, pp. 1-1, April 24, 2023

arXiv:2301.13664 [pdf, other]

Ambient FSK Backscatter Communications using LTE Cell Specific Reference Signals

Authors: Jingyi Liao, Xiyu Wang, Kalle Ruttik, Riku Jantti, Phan-Huy Dinh-Thuy

Abstract: The Long Term Evolution (LTE) signal is ubiquitously present in the electromagnetic (EM) background environment, which makes it an attractive signal source for ambient backscatter communications (AmBC). In this paper, we propose a system in which a backscatter device (BD) introduces an artificial Doppler shift to the channel that is larger than the natural Doppler but still small enough to be tracked by the channel estimator at the User Equipment (UE). Channel estimation is done using the downlink cell specific reference signals (CRS), which are present regardless of whether the UE is attached to the network. FSK was selected due to its robust operation in a fading channel. We describe the whole AmBC system and use two receivers. Finally, numerical simulations and measurements are provided to validate the proposed FSK AmBC performance.

Submitted 31 January, 2023; originally announced January 2023.

arXiv:2210.06287 [pdf, other]

doi 10.1109/AICAS54282.2022.9869846

An Energy-Efficient Spiking Neural Network for Finger Velocity Decoding for Implantable Brain-Machine Interface

Authors: Jiawei Liao, Lars Widmer, Xiaying Wang, Alfio Di Mauro, Samuel R. Nason-Tomaszewski, Cynthia A. Chestek, Luca Benini, Taekwang Jang

Abstract: Brain-machine interfaces (BMIs) are promising for motor rehabilitation and mobility augmentation. High-accuracy and low-power algorithms are required to achieve implantable BMI systems. In this paper, we propose a novel spiking neural network (SNN) decoder for implantable BMI regression tasks. The SNN is trained with enhanced spatio-temporal backpropagation to fully leverage its ability in handling temporal problems. The proposed SNN decoder achieves the same level of correlation coefficient as the state-of-the-art ANN decoder in offline finger velocity decoding tasks, while it requires only 6.8% of the computation operations and 9.4% of the memory access.

Submitted 7 October, 2022; originally announced October 2022.

Journal ref: 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS), 2022, pp. 134-137

arXiv:2209.01108 [pdf, other]

Ambient backscatter communications using LTE cell specific reference signals

Authors: Kalle Ruttik, Xiyu Wang, Jingyi Liao, Riku Jantti, Phan-Huy Dinh-Thuy

Abstract: Long Term Evolution (LTE) systems provide ubiquitous coverage for mobile communications, which makes them a promising candidate to be used as a signal source in ambient backscatter communications. In this paper, we propose a system in which a backscatter device modulates the ambient LTE signal by changing its reflection coefficient, and the receiver uses the LTE Cell Specific Reference Signals (CRS) to estimate the channel and demodulates the backscattered signal from the obtained channel impulse response estimates. We first outline the overall system, discuss the receiver operation, and then provide experimental evidence on the practicality of the proposed system.

Submitted 2 September, 2022; originally announced September 2022.

Comments: 4 pages, 5 figures, IEEE RFID-TA 2022

arXiv:2208.04718 [pdf, other]

doi 10.1016/j.compbiomed.2022.106417

Improving COVID-19 CT Classification of CNNs by Learning Parameter-Efficient Representation

Authors: Yujia Xu, Hak-Keung Lam, Guangyu Jia, Jian Jiang, Junkai Liao, Xinqi Bao

Abstract: The COVID-19 pandemic continues to spread rapidly over the world and causes a tremendous crisis in global human health and the economy. Its early detection and diagnosis are crucial for controlling the further spread. Many deep learning-based methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging. However, challenges still remain, including low data diversity in existing datasets and unsatisfactory detection resulting from insufficient accuracy and sensitivity of deep learning models. To enhance the data diversity, we design augmentation techniques of incremental levels and apply them to the largest open-access benchmark dataset, COVIDx CT-2A. Meanwhile, similarity regularization (SR) derived from contrastive learning is proposed in this study to enable CNNs to learn more parameter-efficient representations, thus improving the accuracy and sensitivity of CNNs. The results on seven commonly used CNNs demonstrate that CNN performance can be improved stably through applying the designed augmentation and SR techniques. In particular, DenseNet121 with SR achieves an average test accuracy of 99.44% in three trials for three-category classification, including normal, non-COVID-19 pneumonia, and COVID-19 pneumonia. The achieved precision, sensitivity, and specificity for the COVID-19 pneumonia category are 98.40%, 99.59%, and 99.50%, respectively. These statistics suggest that our method has surpassed the existing state-of-the-art methods on the COVIDx CT-2A dataset.

Submitted 9 August, 2022; originally announced August 2022.

arXiv:2206.00390 [pdf, other]

doi 10.1109/TIM.2023.3259031

Attention-embedded Quadratic Network (Qttention) for Effective and Interpretable Bearing Fault Diagnosis

Authors: Jing-Xiao Liao, Hang-Cheng Dong, Zhi-Qi Sun, Jinwei Sun, Shiping Zhang, Feng-Lei Fan

Abstract: Bearing fault diagnosis is of great importance to decrease the damage risk of rotating machines and further improve economic profits. Recently, machine learning, represented by deep learning, has made great progress in bearing fault diagnosis. However, applying deep learning to such a task still faces a major problem. A deep network is notoriously a black box. It is difficult to know how a model classifies faulty signals from the normal and the physics principle behind the classification. To solve the interpretability issue, first, we prototype a convolutional network with recently-invented quadratic neurons. This quadratic-neuron-empowered network can qualify the noisy bearing data due to the strong feature representation ability of quadratic neurons. Moreover, we independently derive the attention mechanism from a quadratic neuron, referred to as qttention, by factorizing the learned quadratic function in analogy to attention, making the model with quadratic neurons inherently interpretable. Experiments on the public and our datasets demonstrate that the proposed network can facilitate effective and interpretable bearing fault diagnosis.

Submitted 7 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: update abstract; add experiments in classification results; delete small data experiment; add comparison experiments of qttention and convolution

Report number: Art no. 3511113

Journal ref: IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1-13, 2023

arXiv:2102.11114 [pdf, other]

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Authors: Junwei Liao, Yu Shi, Ming Gong, Linjun Shou, Sefik Eskimez, Liyang Lu, Hong Qu, Michael Zeng

Abstract: Modern Automatic Speech Recognition (ASR) systems can achieve high performance in terms of recognition accuracy. However, a perfectly accurate transcript still can be challenging to read due to disfluency, filler words, and other errata common in spoken communication. Many downstream tasks and human readers rely on the output of the ASR system; therefore, errors introduced by the speaker and ASR system alike will be propagated to the next task in the pipeline. In this work, we propose an ASR post-processing model that aims to transform the incorrect and noisy ASR output into a readable text for humans and downstream tasks. We leverage the Metadata Extraction (MDE) corpus to construct a task-specific dataset for our study. Since the dataset is small, we propose a novel data augmentation method and use a two-stage training strategy to fine-tune the RoBERTa pre-trained model. On the constructed test set, our model outperforms a production two-step pipeline-based post-processing method by a large margin of 13.26 on readability-aware WER (RA-WER) and 17.53 on BLEU metrics. Human evaluation also demonstrates that our method can generate more human-readable transcripts than the baseline method.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted in 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2021)

arXiv:2011.11879 [pdf]

Blind deblurring for microscopic pathology images using deep learning networks

Authors: Cheng Jiang, Jun Liao, Pei Dong, Zhaoxuan Ma, De Cai, Guoan Zheng, Yueping Liu, Hong Bu, Jianhua Yao

Abstract: Artificial Intelligence (AI)-powered pathology is a revolutionary step in the world of digital pathology and shows great promise to increase both diagnosis accuracy and efficiency. However, defocus and motion blur can obscure tissue or cell characteristics, hence compromising AI algorithms' accuracy and robustness in analyzing the images. In this paper, we demonstrate a deep-learning-based approach that can alleviate the defocus and motion blur of a microscopic image and output a sharper and cleaner image with retrieved fine details without prior knowledge of the blur type, blur extent and pathological stain. In this approach, a deep learning classifier is first trained to identify the image blur type. Then, two encoder-decoder networks are trained and used alone or in combination to deblur the input image. It is an end-to-end approach and introduces no corrugated artifacts as traditional blind deconvolution methods do. We test our approach on different types of pathology specimens and demonstrate great performance on image blur correction and the subsequent improvement on the diagnosis outcome of AI algorithms.

Submitted 23 November, 2020; originally announced November 2020.

arXiv:2011.07511 [pdf]

Wide-field Decodable Orthogonal Fingerprints of Single Nanoparticles Unlock Multiplexed Digital Assays

Authors: Jiayan Liao, Jiajia Zhou, Yiliao Song, Baolei Liu, Yinghui Chen, Fan Wang, Chaohao Chen, Jun Lin, Xueyuan Chen, Jie Lu, Dayong Jin

Abstract: The control of optical uniformity of single nanoparticles and tuning their diversity in orthogonal dimensions, dot to dot, hold the key to unlocking nanoscience and applications. Here we report that the time-domain emissive profile from a single upconversion nanoparticle, including the rising, decay and peak moment of the excited state population (T2 profile), can be arbitrarily tuned by upconversion schemes, including interfacial energy migration, concentration dependency, energy transfer, and isolation of surface quenchers. This allows us to significantly increase the coding capacity at the nanoscale. We further implement both time-resolved wide-field imaging and deep-learning techniques to decode these fingerprints, showing high accuracies at high throughput. These high-dimensional optical fingerprints provide a new horizon for applications spanning from sub-diffraction-limit data storage and security inks to high-throughput single-molecule digital assays and super-resolution imaging.

Submitted 15 November, 2020; originally announced November 2020.

arXiv:2007.08691 [pdf]

Decision-making Strategy on Highway for Autonomous Vehicles using Deep Reinforcement Learning

Authors: Jiangdong Liao, Teng Liu, Xiaolin Tang, Xingyu Mu, Bing Huang, Dongpu Cao

Abstract: Autonomous driving is a promising technology to reduce traffic accidents and improve driving efficiency. In this work, a deep reinforcement learning (DRL)-enabled decision-making policy is constructed for autonomous vehicles to address the overtaking behaviors on the highway. First, a highway driving environment is established, wherein the ego vehicle aims to pass through the surrounding vehicles with an efficient and safe maneuver. A hierarchical control framework is presented to control these vehicles, in which the upper level manages the driving decisions and the lower level supervises vehicle speed and acceleration. Then, the particular DRL method named dueling deep Q-network (DDQN) algorithm is applied to derive the highway decision-making strategy. The exhaustive calculative procedures of deep Q-network and DDQN algorithms are discussed and compared. Finally, a series of estimation simulation experiments are conducted to evaluate the effectiveness of the proposed highway decision-making policy. The advantages of the proposed framework in convergence rate and control performance are illuminated. Simulation results reveal that the DDQN-based overtaking policy could accomplish highway driving tasks efficiently and safely.

Submitted 16 July, 2020; originally announced July 2020.

Comments: 11 pages, 13 figures

arXiv:2004.09484 [pdf, other]

Bringing Old Photos Back to Life

Authors: Ziyu Wan, Bo Zhang, Dongdong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen

Abstract: We propose to restore old photos that suffer from severe degradation through a deep learning approach. Unlike conventional restoration tasks that can be solved through supervised learning, the degradation in real photos is complex and the domain gap between synthetic images and real old photos makes the network fail to generalize. Therefore, we propose a novel triplet domain translation network by leveraging real photos along with massive synthetic image pairs. Specifically, we train two variational autoencoders (VAEs) to respectively transform old photos and clean photos into two latent spaces. The translation between these two latent spaces is learned with synthetic paired data. This translation generalizes well to real photos because the domain gap is closed in the compact latent space. Besides, to address multiple degradations mixed in one old photo, we design a global branch with a partial nonlocal block targeting structured defects, such as scratches and dust spots, and a local branch targeting unstructured defects, such as noise and blurriness. The two branches are fused in the latent space, leading to improved capability to restore old photos from multiple defects. The proposed method outperforms state-of-the-art methods in terms of visual quality for old photo restoration.

Submitted 20 April, 2020; originally announced April 2020.

Comments: CVPR 2020 Oral, project website: http://raywzy.com/Old_Photo/

arXiv:2004.06949 [pdf, other]

A Model-Driven Deep Learning Method for Massive MIMO Detection

Authors: Jieyu Liao, Junhui Zhao, Feifei Gao, Geoffrey Ye Li

Abstract: In this paper, an efficient massive multiple-input multiple-output (MIMO) detector is proposed by employing a deep neural network (DNN). Specifically, we first unfold an existing iterative detection algorithm into the DNN structure, such that the detection task can be implemented by a deep learning (DL) approach. We then introduce two auxiliary parameters at each layer to better cancel multiuser interference (MUI). The first parameter is to generate the residual error vector while the second one is to adjust the relationship among previous layers. We further design the training procedure to optimize the auxiliary parameters with pre-processed inputs. The so-derived MIMO detector falls into the category of model-driven DL. The simulation results show that the proposed MIMO detector can achieve preferable detection performance compared to the existing detectors for massive MIMO systems.

Submitted 15 April, 2020; originally announced April 2020.

arXiv:2004.02805 [pdf]

Application of Structural Similarity Analysis of Visually Salient Areas and Hierarchical Clustering in the Screening of Similar Wireless Capsule Endoscopic Images

Authors: Rui Nie, Huan Yang, Hejuan Peng, Wenbin Luo, Weiya Fan, Jie Zhang, Jing Liao, Fang Huang, Yufeng Xiao

Abstract: Small intestinal capsule endoscopy is the mainstream method for inspecting small intestinal lesions, but a single small intestinal capsule endoscopy will produce 60,000 - 120,000 images, the majority of which are similar and have no diagnostic value. It takes 2 - 3 hours for doctors to identify lesions from these images. This is time-consuming and increases the probability of misdiagnosis and missed diagnosis, since doctors are likely to experience visual fatigue while focusing on a large number of similar images for an extended period of time. In order to solve these problems, we proposed a similar wireless capsule endoscope (WCE) image screening method based on structural similarity analysis and the hierarchical clustering of visually salient sub-image blocks. The similarity clustering of images was automatically identified by hierarchical clustering based on the hue, saturation, value (HSV) spatial color characteristics of the images, and the keyframe images were extracted based on the structural similarity of the visually salient sub-image blocks, in order to accurately identify and screen out similar small intestinal capsule endoscopic images. Subsequently, the proposed method was applied to the capsule endoscope imaging workstation. After screening out similar images in the complete data gathered by the Type I OMOM Small Intestinal Capsule Endoscope from 52 cases covering 17 common types of small intestinal lesions, we obtained a lesion recall of 100% and an average similar image reduction ratio of 76%. With similar images screened out, the average play time of the OMOM image workstation was 18 minutes, which greatly reduced the time spent by doctors viewing the images.

Submitted 1 April, 2020; originally announced April 2020.

arXiv:2002.11088 [pdf, other]

Model Watermarking for Image Processing Networks

Authors: Jie Zhang, Dongdong Chen, Jing Liao, Han Fang, Weiming Zhang, Wenbo Zhou, Hao Cui, Nenghai Yu

Abstract: Deep learning has achieved tremendous success in numerous industrial applications. As training a good model often needs massive high-quality data and computation resources, the learned models often have significant business values. However, these valuable deep models are exposed to a huge risk of infringements. For example, if the attacker has the full information of one target model including the network structure and weights, the model can be easily finetuned on new datasets. Even if the attacker can only access the output of the target model, he/she can still train another similar surrogate model by generating a large scale of input-output training pairs. How to protect the intellectual property of deep models is a very important but seriously under-researched problem. There are a few recent attempts at classification network protection only. In this paper, we propose the first model watermarking framework for protecting image processing models. To achieve this goal, we leverage the spatial invisible watermarking mechanism. Specifically, given a black-box target model, a unified and invisible watermark is hidden into its outputs, which can be regarded as a special task-agnostic barrier. In this way, when the attacker trains one surrogate model by using the input-output pairs of the target model, the hidden watermark will be learned and extracted afterward. To enable watermarks from binary bits to high-resolution images, both traditional and deep spatial invisible watermarking mechanisms are considered. Experiments demonstrate the robustness of the proposed watermarking mechanism, which can resist surrogate models learned with different network structures and objective functions. Besides deep models, the proposed method can also be easily extended to protect data and traditional image processing algorithms.

Submitted 25 February, 2020; originally announced February 2020.

Comments: AAAI 2020

arXiv:1901.03057 [pdf]

doi 10.1364/OE.27.007498

Near-field Fourier ptychography: super-resolution phase retrieval via speckle illumination

Authors: He Zhang, Shaowei Jiang, Jun Liao, Junjing Deng, Jian Liu, Yongbing Zhang, Guoan Zheng

Abstract: Achieving high spatial resolution is the goal of many imaging systems. Designing a high-resolution lens with diffraction-limited performance over a large field of view remains a difficult task in imaging system design. On the other hand, creating a complex speckle pattern with wavelength-limited spatial features is effortless and can be implemented via a simple random diffuser. With this observation and inspired by the concept of near-field ptychography, we report a new imaging modality, termed near-field Fourier ptychography, for tackling high-resolution imaging challenges in both microscopic and macroscopic imaging settings. Here, 'near-field' refers to placing the object at a short defocus distance with a large Fresnel number. In our implementations, we project a speckle pattern with fine spatial features on the object instead of directly resolving the spatial features via a high-resolution lens. We then translate the object (or speckle) to different positions and acquire the corresponding images using a low-resolution lens. A ptychographic phase retrieval process is used to recover the complex object, the unknown speckle pattern, and the coherent transfer function at the same time. In a microscopic imaging setup, we use a 0.12 numerical aperture (NA) lens to achieve an NA of 0.85 in the reconstruction process. In a macroscale photographic imaging setup, we achieve ~7-fold resolution gain using a photographic lens. The final achievable resolution is not determined by the collection optics. Instead, it is determined by the feature size of the speckle pattern. The reported imaging modality can be employed in light, coherent X-ray, and transmission electron imaging systems to increase resolution and provide quantitative absorption and phase contrast of the object.

Submitted 9 February, 2019; v1 submitted 10 January, 2019; originally announced January 2019.

Comments: 15 pages, 14 figures
