Search | arXiv e-print repository

Deep Learning based Performance Testing for Analog Integrated Circuits

Authors: Jiawei Cao, Chongtao Guo, Hao Li, Zhigang Wang, Houjun Wang, Geoffrey Ye Li

Abstract: In this paper, we propose a deep learning based performance testing framework to minimize the number of required test modules while guaranteeing the accuracy requirement, where a test module corresponds to a combination of one circuit and one stimulus. First, we apply a deep neural network (DNN) to establish the mapping from the response of the circuit under test (CUT) in each module to all specifications to be tested. Then, the required test modules are selected by solving a 0-1 integer programming problem. Finally, the predictions from the selected test modules are combined by a DNN to form the specification estimations. The simulation results validate the proposed approach in terms of testing accuracy and cost.

Submitted 1 June, 2024; originally announced June 2024.
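
The module-selection step in the abstract above is described only as "a 0-1 integer programming problem"; the exact objective and constraints are not given in this listing. The following is a minimal sketch under an assumed covering formulation: choose the fewest modules such that every specification is predicted within a tolerance by at least one selected module. The function name `select_modules`, the `errors` matrix, and `tol` are hypothetical, and the exhaustive search stands in for a real integer-programming solver.

```python
from itertools import combinations


def select_modules(errors, tol):
    """Return the smallest set of module indices that covers every spec.

    errors[j][i] is an assumed (hypothetical) prediction error of module j
    on specification i. A spec counts as covered when at least one selected
    module predicts it with error <= tol. This is a toy covering model,
    not the formulation used in the paper.
    """
    n_modules = len(errors)
    n_specs = len(errors[0])
    # Try subsets in order of increasing size, so the first feasible
    # subset found minimizes the number of selected modules.
    for k in range(1, n_modules + 1):
        for subset in combinations(range(n_modules), k):
            covered = all(
                any(errors[j][i] <= tol for j in subset)
                for i in range(n_specs)
            )
            if covered:
                return list(subset)
    return None  # no subset meets the tolerance for every spec


if __name__ == "__main__":
    toy_errors = [
        [0.01, 0.20, 0.30],  # module 0
        [0.30, 0.02, 0.01],  # module 1
        [0.02, 0.03, 0.50],  # module 2
    ]
    print(select_modules(toy_errors, tol=0.05))
```

Exhaustive enumeration is exponential in the number of modules, so it is only suitable for small illustrations.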

arXiv:2405.00739 [pdf, other]

Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

Authors: Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

Abstract: Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, posing questions on its possible causes. Confronted with this gap, we hypothesize that diverse attentions in teachers contribute to better student generalization at the expense of reduced fidelity in ensemble KD setups. By increasing data augmentation strengths, our key findings reveal a decrease in the Intersection over Union (IoU) of attentions between teacher models, leading to reduced student overfitting and decreased fidelity. We propose this low-fidelity phenomenon as an underlying characteristic rather than a pathology when training KD. This suggests that stronger data augmentation fosters a broader perspective provided by the divergent teacher ensemble and lower student-teacher mutual information, benefiting generalization performance. These insights clarify the mechanism on low-fidelity phenomenon in KD. Thus, we offer new perspectives on optimizing student model performance, by emphasizing increased diversity in teacher attentions and reduced mimicry behavior between teachers and student.

Submitted 29 April, 2024; originally announced May 2024.
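
The abstract above measures teacher diversity with the Intersection over Union (IoU) of attention maps. The listing does not say how attention maps are extracted or binarized, so the sketch below assumes two same-sized 2D maps with values in [0, 1] and a fixed threshold; `attention_iou` and `threshold` are illustrative names only.

```python
def attention_iou(att_a, att_b, threshold=0.5):
    """IoU between two attention maps after thresholding.

    att_a and att_b are nested lists of equal shape. The thresholding rule
    is an assumption made for illustration; the paper may binarize its
    attention maps differently.
    """
    mask_a = [value >= threshold for row in att_a for value in row]
    mask_b = [value >= threshold for row in att_b for value in row]
    intersection = sum(a and b for a, b in zip(mask_a, mask_b))
    union = sum(a or b for a, b in zip(mask_a, mask_b))
    # Two empty masks are treated as identical.
    return intersection / union if union else 1.0


if __name__ == "__main__":
    teacher_1 = [[0.9, 0.1], [0.8, 0.2]]
    teacher_2 = [[0.7, 0.6], [0.1, 0.2]]
    print(attention_iou(teacher_1, teacher_2))
```

A lower value means the two teachers attend to less overlapping regions, which is the quantity the abstract reports as decreasing under stronger augmentation.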

arXiv:2404.10343 [pdf, other]

The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum-up of the scores of all other sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.

Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024
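
The PSNR thresholds quoted in the abstract above follow the standard definition PSNR = 10 * log10(MAX^2 / MSE). The sketch below computes it for two small grayscale images stored as nested lists; the function name `psnr` and the 8-bit peak value are assumptions, and the challenge's own evaluation script may crop borders or convert color spaces in ways not described in this listing.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    ref = [pixel for row in reference for pixel in row]
    out = [pixel for row in restored for pixel in row]
    # Mean squared error over all pixels.
    mse = sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    high_res = [[0, 255], [128, 64]]
    super_resolved = [[0, 250], [130, 60]]
    print(round(psnr(high_res, super_resolved), 2))
```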

arXiv:2402.10686 [pdf, other]

On the Impact of Uncertainty and Calibration on Likelihood-Ratio Membership Inference Attacks

Authors: Meiyi Zhu, Caili Guo, Chunyan Feng, Osvaldo Simeone

Abstract: In a membership inference attack (MIA), an attacker exploits the overconfidence exhibited by typical machine learning models to determine whether a specific data point was used to train a target model. In this paper, we analyze the performance of the state-of-the-art likelihood ratio attack (LiRA) within an information-theoretical framework that allows the investigation of the impact of the aleatoric uncertainty in the true data generation process, of the epistemic uncertainty caused by a limited training data set, and of the calibration level of the target model. We compare three different settings, in which the attacker receives decreasingly informative feedback from the target model: confidence vector (CV) disclosure, in which the output probability vector is released; true label confidence (TLC) disclosure, in which only the probability assigned to the true label is made available by the model; and decision set (DS) disclosure, in which an adaptive prediction set is produced as in conformal prediction. We derive bounds on the advantage of an MIA adversary with the aim of offering insights into the impact of uncertainty and calibration on the effectiveness of MIAs. Simulation results demonstrate that the derived analytical bounds predict well the effectiveness of MIAs.

Submitted 15 August, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: 13 pages, 20 figures

arXiv:2401.13893 [pdf, ps, other]

A Survey on Indoor Visible Light Positioning Systems: Fundamentals, Applications, and Challenges

Authors: Zhiyu Zhu, Yang Yang, Mingzhe Chen, Caili Guo, Julian Cheng, Shuguang Cui

Abstract: The growing demand for location-based services in areas like virtual reality, robot control, and navigation has intensified the focus on indoor localization. Visible light positioning (VLP), leveraging visible light communications (VLC), becomes a promising indoor positioning technology due to its high accuracy and low cost. This paper provides a comprehensive survey of VLP systems. In particular, since VLC lays the foundation for VLP, we first present a detailed overview of the principles of VLC. The performance of each positioning algorithm is also compared in terms of various metrics such as accuracy, coverage, and orientation limitation. Beyond the physical layer studies, the network design for a VLP system is also investigated, including multi-access technologies, resource allocation, and light-emitting diode (LED) placements. Next, the applications of the VLP systems are overviewed. Finally, this paper outlines open issues, challenges, and future research directions for the research field. In a nutshell, this paper constitutes the first holistic survey on VLP from state-of-the-art studies to practical uses.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.05365 [pdf, other]

Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables

Authors: Cheng Guo, Lorenzo Rapetti, Kourosh Darvish, Riccardo Grieco, Francesco Draicchio, Daniele Pucci

Abstract: This paper proposes a framework that combines online human state estimation, action recognition and motion prediction to enable early assessment and prevention of worker biomechanical risk during lifting tasks. The framework leverages the NIOSH index to perform online risk assessment, thus fitting real-time applications. In particular, the human state is retrieved via inverse kinematics/dynamics algorithms from wearable sensor data. Human action recognition and motion prediction are achieved by implementing an LSTM-based Guided Mixture of Experts architecture, which is trained offline and inferred online. With the recognized actions, a single lifting activity is divided into a series of continuous movements and the Revised NIOSH Lifting Equation can be applied for risk assessment. Moreover, the predicted motions enable anticipation of future risks. A haptic actuator, embedded in the wearable system, can alert the subject of potential risk, acting as an active prevention device. The performance of the proposed framework is validated by executing real lifting tasks, while the subject is equipped with the iFeel wearable system.

Submitted 14 December, 2023; originally announced January 2024.

Comments: 8 pages, 7 figures, accepted at 2023 IEEE-RAS International Conference on Humanoid Robots (Humanoids)
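
The abstract above applies the Revised NIOSH Lifting Equation for risk assessment. As a rough illustration, the metric form of that equation multiplies a 23 kg load constant by horizontal, vertical, distance, asymmetry, frequency, and coupling multipliers, and the lifting index is the actual load divided by the resulting limit. In the sketch below the frequency and coupling multipliers are passed in directly because they come from lookup tables, the validity ranges of the inputs are not enforced, and all function and parameter names are hypothetical. How the paper evaluates the equation per recognized movement is not stated in this listing.

```python
def recommended_weight_limit(h_cm, v_cm, d_cm, a_deg, fm, cm):
    """Recommended weight limit (kg), metric Revised NIOSH Lifting Equation.

    h_cm: horizontal hand distance, v_cm: vertical hand height,
    d_cm: vertical travel distance, a_deg: asymmetry angle in degrees,
    fm and cm: frequency and coupling multipliers taken from tables.
    Input range checks are omitted in this sketch.
    """
    load_constant = 23.0
    hm = 25.0 / h_cm
    vm = 1.0 - 0.003 * abs(v_cm - 75.0)
    dm = 0.82 + 4.5 / d_cm
    am = 1.0 - 0.0032 * a_deg
    return load_constant * hm * vm * dm * am * fm * cm


def lifting_index(load_kg, rwl_kg):
    """Ratio of the lifted load to the recommended weight limit."""
    return load_kg / rwl_kg


if __name__ == "__main__":
    rwl = recommended_weight_limit(50.0, 75.0, 25.0, 0.0, fm=1.0, cm=1.0)
    print(rwl, lifting_index(10.0, rwl))
```

A lifting index above 1 indicates a load heavier than the recommended limit for that lift geometry.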

arXiv:2401.02178 [pdf, other]

OFDM-Based Digital Semantic Communication with Importance Awareness

Authors: Chuanhong Liu, Caili Guo, Yang Yang, Wanli Ni, Tony Q. S. Quek

Abstract: Semantic communication (SemCom) has received considerable attention for its ability to reduce data transmission size while maintaining task performance. However, existing works mainly focus on analog SemCom with simple channel models, which may limit its practical application. To reduce this gap, we propose an orthogonal frequency division multiplexing (OFDM)-based SemCom system that is compatible with existing digital communication infrastructures. In the considered system, the extracted semantics is quantized by scalar quantizers, transformed into OFDM signal, and then transmitted over the frequency-selective channel. Moreover, we propose a semantic importance measurement method to build the relationship between target task and semantic features. Based on semantic importance, we formulate a sub-carrier and bit allocation problem to maximize communication performance. However, the optimization objective function cannot be accurately characterized using a mathematical expression due to the neural network-based semantic codec. Given the complex nature of the problem, we first propose a low-complexity sub-carrier allocation method that assigns sub-carriers with better channel conditions to more critical semantics. Then, we propose a deep reinforcement learning-based bit allocation algorithm with dynamic action space. Simulation results demonstrate that the proposed system achieves 9.7% and 28.7% performance gains compared to analog SemCom and conventional bit-based communication systems, respectively.

Submitted 4 January, 2024; originally announced January 2024.

arXiv:2312.12789 [pdf, other]

SLP-Net:An efficient lightweight network for segmentation of skin lesions

Authors: Bo Yang, Hong Peng, Chenggang Guo, Xiaohui Luo, Jun Wang, Xianzhong Long

Abstract: Prompt treatment for melanoma is crucial. To assist physicians in identifying lesion areas precisely in a quick manner, we propose a novel skin lesion segmentation technique namely SLP-Net, an ultra-lightweight segmentation network based on the spiking neural P (SNP) systems type mechanism. Most existing convolutional neural networks achieve high segmentation accuracy while neglecting the high hardware cost. SLP-Net, on the contrary, has a very small number of parameters and a high computation speed. We design a lightweight multi-scale feature extractor without the usual encoder-decoder structure. Rather than a decoder, a feature adaptation module is designed to replace it and implement multi-scale information decoding. Experiments at the ISIC2018 challenge demonstrate that the proposed model has the highest Acc and DSC among the state-of-the-art methods, while experiments on the PH2 dataset also demonstrate a favorable generalization ability. Finally, we compare the computational complexity as well as the computational speed of the models in experiments, where SLP-Net has the highest overall superiority.

Submitted 4 January, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

arXiv:2310.07130 [pdf, other]

Edge Cloud Collaborative Stream Computing for Real-Time Structural Health Monitoring

Authors: Wenzhao Zhang, Cheng Guo, Yi Gao, Wei Dong

Abstract: Structural Health Monitoring (SHM) is crucial for the safety and maintenance of various infrastructures. Due to the large amount of data generated by numerous sensors and the high real-time requirements of many applications, SHM poses significant challenges. Although the cloud-centric stream computing paradigm opens new opportunities for real-time data processing, it consumes too much network bandwidth. In this paper, we propose ECStream, an Edge Cloud collaborative fine-grained stream operator scheduling framework for SHM. We collectively consider atomic and composite operators together with their iterative computability to model and formalize the problem of minimizing bandwidth usage and end-to-end operator processing latency. Preliminary evaluation results show that ECStream can effectively balance bandwidth usage and end-to-end operator computation latency, reducing bandwidth usage by 73.01% and latency by 34.08% on average compared to the cloud-centric approach.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2310.04644 [pdf, other]

Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction

Authors: Jiawei Li, Chunxu Guo, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Abstract: Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, which significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2309.10263 [pdf, other]

Disentangled Information Bottleneck guided Privacy-Protective JSCC for Image Transmission

Authors: Lunan Sun, Yang Yang, Mingzhe Chen, Caili Guo

Abstract: Joint source and channel coding (JSCC) has attracted increasing attention due to its robustness and high efficiency. However, JSCC is vulnerable to privacy leakage due to the high relevance between the source image and channel input. In this paper, we propose a disentangled information bottleneck guided privacy-protective JSCC (DIB-PPJSCC) for image transmission, which aims at protecting private information as well as achieving superior communication performance at the legitimate receiver. In particular, we propose a DIB objective to disentangle private and public information. The goal is to compress the private information in the public subcodewords, preserve the private information in the private subcodewords and improve the reconstruction quality simultaneously. In order to optimize JSCC neural networks using the DIB objective, we derive a differentiable estimation of the DIB objective based on the variational approximation and the density-ratio trick. Additionally, we design a password-based privacy-protective (PP) algorithm which can be jointly optimized with JSCC neural networks to encrypt the private subcodewords. Specifically, we employ a private information encryptor to encrypt the private subcodewords before transmission, and a corresponding decryptor to recover the private information at the legitimate receiver. A loss function for jointly training the encryptor, decryptor and JSCC decoder is derived based on the maximum entropy principle, which aims at maximizing the eavesdropping uncertainty as well as improving the reconstruction quality. Experimental results show that DIB-PPJSCC can reduce the eavesdropping accuracy on private information by up to 15% and reduce inference time by 10% compared to existing privacy-protective JSCC and traditional separate methods.

Submitted 18 September, 2023; originally announced September 2023.

arXiv:2309.08402 [pdf, other]

3D SA-UNet: 3D Spatial Attention UNet with 3D ASPP for White Matter Hyperintensities Segmentation

Authors: Changlu Guo

Abstract: White Matter Hyperintensity (WMH) is an imaging feature related to various diseases such as dementia and stroke. Accurately segmenting WMH using computer technology is crucial for early disease diagnosis. However, this task remains challenging due to the small lesions with low contrast and high discontinuity in the images, which contain limited contextual and spatial information. To address this challenge, we propose a deep learning model called 3D Spatial Attention U-Net (3D SA-UNet) for automatic WMH segmentation using only Fluid Attenuation Inversion Recovery (FLAIR) scans. The 3D SA-UNet introduces a 3D Spatial Attention Module that highlights important lesion features, such as WMH, while suppressing unimportant regions. Additionally, to capture features at different scales, we extend the Atrous Spatial Pyramid Pooling (ASPP) module to a 3D version, enhancing the segmentation performance of the network. We evaluate our method on a publicly available dataset and demonstrate the effectiveness of the 3D spatial attention module and 3D ASPP in WMH segmentation. Through experimental results, it has been demonstrated that our proposed 3D SA-UNet model achieves higher accuracy compared to other state-of-the-art 3D convolutional neural networks.

Submitted 20 November, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.08188 [pdf, other]

Privacy-Aware Joint Source-Channel Coding for image transmission based on Disentangled Information Bottleneck

Authors: Lunan Sun, Caili Guo, Mingzhe Chen, Yang Yang

Abstract: Current privacy-aware joint source-channel coding (JSCC) works aim at avoiding private information transmission by adversarially training the JSCC encoder and decoder under specific signal-to-noise ratios (SNRs) of eavesdroppers. However, these approaches incur additional computational and storage requirements as multiple neural networks must be trained for various eavesdroppers' SNRs to determine the transmitted information. To overcome this challenge, we propose a novel privacy-aware JSCC for image transmission based on disentangled information bottleneck (DIB-PAJSCC). In particular, we derive a novel disentangled information bottleneck objective to disentangle private and public information. Given the separate information, the transmitter can transmit only public information to the receiver while minimizing reconstruction distortion. Since DIB-PAJSCC transmits only public information regardless of the eavesdroppers' SNRs, it can eliminate additional training adapted to eavesdroppers' SNRs. Experimental results show that DIB-PAJSCC can reduce the eavesdropping accuracy on private information by up to 20% compared to existing methods.

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.01072 [pdf, other]

Channel Attention Separable Convolution Network for Skin Lesion Segmentation

Authors: Changlu Guo, Jiangyan Dai, Marton Szemenyei, Yugen Yi

Abstract: Skin cancer is a frequently occurring cancer in the human population, and it is very important to be able to diagnose malignant tumors in the body early. Lesion segmentation is crucial for monitoring the morphological changes of skin lesions, extracting features to localize and identify diseases to assist doctors in early diagnosis. Manual de-segmentation of dermoscopic images is error-prone and time-consuming, thus there is a pressing demand for precise and automated segmentation algorithms. Inspired by advanced mechanisms such as U-Net, DenseNet, Separable Convolution, Channel Attention, and Atrous Spatial Pyramid Pooling (ASPP), we propose a novel network called Channel Attention Separable Convolution Network (CASCN) for skin lesions segmentation. The proposed CASCN is evaluated on the PH2 dataset with limited images. Without excessive pre-/post-processing of images, CASCN achieves state-of-the-art performance on the PH2 dataset with Dice similarity coefficient of 0.9461 and accuracy of 0.9645.

Submitted 3 September, 2023; originally announced September 2023.

Comments: Accepted by ICONIP 2023
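
The two figures reported in the abstract above, Dice similarity coefficient and accuracy, have standard definitions for binary segmentation masks: Dice = 2|P ∩ T| / (|P| + |T|), and accuracy is the fraction of pixels labeled correctly. The sketch below computes both on flattened 0/1 masks; the function names are illustrative, and the paper's own evaluation protocol (for example, averaging per image) is not described in this listing.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat binary masks."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match.
    return 2.0 * intersection / total if total else 1.0


def pixel_accuracy(pred, truth):
    """Fraction of pixels where the prediction equals the ground truth."""
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return correct / len(truth)


if __name__ == "__main__":
    predicted_mask = [1, 1, 0, 0]
    ground_truth = [1, 0, 0, 0]
    print(dice_coefficient(predicted_mask, ground_truth))
    print(pixel_accuracy(predicted_mask, ground_truth))
```

Dice ignores true negatives, so on images where the lesion is small it is usually the stricter of the two numbers.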

arXiv:2308.03448 [pdf, other]

Make Explicit Calibration Implicit: Calibrate Denoiser Instead of the Noise Model

Authors: Xin Jin, Jia-Wen Xiao, Ling-Hao Han, Chunle Guo, Xialei Liu, Chongyi Li, Ming-Ming Cheng

Abstract: Explicit calibration-based methods have dominated RAW image denoising under extremely low-light environments. However, these methods are impeded by several critical limitations: a) the explicit calibration process is both labor- and time-intensive, b) challenge exists in transferring denoisers across different camera models, and c) the disparity between synthetic and real noise is exacerbated by digital gain. To address these issues, we introduce a groundbreaking pipeline named Lighting Every Darkness (LED), which is effective regardless of the digital gain or the camera sensor. LED eliminates the need for explicit noise model calibration, instead utilizing an implicit fine-tuning process that allows quick deployment and requires minimal data. Structural modifications are also included to reduce the discrepancy between synthetic and real noise without extra computational demands. Our method surpasses existing methods in various camera models, including new ones not in public datasets, with just a few pairs per digital gain and only 0.5% of the typical iterations. Furthermore, LED also allows researchers to focus more on deep learning advancements while still utilizing sensor engineering benefits. Code and related materials can be found in https://srameo.github.io/projects/led-iccv23/ .

Submitted 25 December, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2306.08918 [pdf, other]

PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators

Authors: Runmin Cong, Wenyu Yang, Wei Zhang, Chongyi Li, Chun-Le Guo, Qingming Huang, Sam Kwong

Abstract: Due to the light absorption and scattering induced by the water medium, underwater images usually suffer from some degradation problems, such as low contrast, color distortion, and blurring details, which aggravate the difficulty of downstream underwater understanding tasks. Therefore, how to obtain clear and visually pleasant images has become a common concern of people, and the task of underwater image enhancement (UIE) has also emerged as the times require. Among existing UIE methods, Generative Adversarial Networks (GANs) based methods perform well in visual aesthetics, while the physical model-based methods have better scene adaptability. Inheriting the advantages of the above two types of models, we propose a physical model-guided GAN model for UIE in this paper, referred to as PUGAN. The entire network is under the GAN architecture. On the one hand, we design a Parameters Estimation subnetwork (Par-subnet) to learn the parameters for physical model inversion, and use the generated color enhancement image as auxiliary information for the Two-Stream Interaction Enhancement sub-network (TSIE-subnet). Meanwhile, we design a Degradation Quantization (DQ) module in TSIE-subnet to quantize scene degradation, thereby achieving reinforcing enhancement of key regions. On the other hand, we design the Dual-Discriminators for the style-content adversarial constraint, promoting the authenticity and visual aesthetics of the results. Extensive experiments on three benchmark datasets demonstrate that our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics.

Submitted 15 June, 2023; originally announced June 2023.

Comments: 8 pages, 4 figures, Accepted by IEEE Transactions on Image Processing 2023

arXiv:2305.16055 [pdf, ps, other]

Machine Learning-Based Automatic Cardiovascular Disease Diagnosis Using Two ECG Leads

Authors: Cheng Guo, Sajid Ahmed, Mohamed-Slim Alouini

Abstract: The state-of-the-art cardiovascular disease diagnosis techniques use machine-learning algorithms based on feature extraction and classification. In this work, in contrast to a conventional single Electrocardiogram (ECG) lead, two leads are used, and autoregressive (AR) coefficients and statistical parameters are extracted to be used as features. Four machine-learning classifiers, support-vector-machine (SVM), K-nearest neighbors (KNN), multi-layer perceptron (MLP), and Naive Bayes, are applied on these features to test the accuracy of each classifier. For simulation, data is collected from the MIT-BIH and Shaoxing Peoples Hospital China (SPHC) database. To test the generalization ability of our proposed methodology, a machine-learning model is built on the SPHC database and tested on the MIT-BIH database and self-collected datasets. In the single-database simulation, the MLP performs better than the other three classifiers, while in the cross-database simulation, the SVM-based model trained by the SPHC database shows superiority. For normal and LBBB heartbeats, the predicted recall reaches 100% and 98.4%, respectively. Simulation results show that the performance of our proposed methodology is better than the state-of-the-art techniques for the same database. For cross-database simulation, the results are promising too. Finally, in the demonstration of our realized system, all heartbeats collected from healthy people are classified as normal beats.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 15 pages, 11 figures

MSC Class: 53A45
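
Illustrative sketch (not taken from the paper): the abstract above uses autoregressive (AR) coefficients as ECG features but does not say how they are estimated. One common choice is the Yule-Walker method, shown below with NumPy. The function name `ar_coefficients` and the model order are assumptions made for illustration only.

```python
import numpy as np

def ar_coefficients(x, order):
    """Estimate AR(order) coefficients of a 1-D signal via Yule-Walker.

    Model assumed: x[t] = a[0]*x[t-1] + ... + a[order-1]*x[t-order] + noise.
    This is a generic estimator, not necessarily the one used in the paper.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the DC offset
    n = len(x)
    # Biased sample autocovariance for lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz autocovariance matrix R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Solve the Yule-Walker equations R a = r[1:]
    return np.linalg.solve(R, r[1:])
```

For a two-lead recording, one possible feature vector would concatenate the coefficients from each lead, for example `np.concatenate([ar_coefficients(lead1, 4), ar_coefficients(lead2, 4)])`, where the order 4 is a hypothetical choice.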

arXiv:2305.00505 [pdf, ps, other]

Fixed-time safe tracking control of uncertain high-order nonlinear pure-feedback systems via unified transformation functions

Authors: Chaoqun Guo, Jiangping Hu, Jiasheng Hao, Sergej Celikovsky, Xiaoming Hu

Abstract: In this paper, a fixed-time safe control problem is investigated for an uncertain high-order nonlinear pure-feedback system with state constraints. A new nonlinear transformation function is firstly proposed to handle both the constrained and unconstrained cases in a unified way. Further, a radial basis function neural network is constructed to approximate the unknown dynamics in the system and a fixed-time dynamic surface control (FDSC) technique is developed to facilitate the fixed-time control design for the uncertain high-order pure-feedback system. Combined with the proposed unified transformation function and the FDSC technique, an adaptive fixed-time control strategy is proposed to guarantee the fixed-time tracking. The proposed fixed-time control strategy can guarantee uniform control structure when addressing both constrained and unconstrained situations. Numerical examples are presented to demonstrate the proposed fixed-time tracking control strategy.

Submitted 30 April, 2023; originally announced May 2023.

arXiv:2303.12286 [pdf, other]

Explainable Semantic Communication for Text Tasks

Authors: Chuanhong Liu, Caili Guo, Yang Yang, Wanli Ni, Yanquan Zhou, Lei Li, Tony Q. S. Quek

Abstract: Task-oriented semantic communication has gained increasing attention due to its ability to reduce the amount of transmitted data without sacrificing task performance. Although some prior efforts have been dedicated to developing semantic communications, the semantics in these works remain unexplainable. Challenges related to explainable semantic representation and knowledge-based semantic compression have yet to be explored. In this paper, we propose a triplet-based explainable semantic communication (TESC) scheme for representing text semantics efficiently. Specifically, we develop a semantic extraction method to convert text into triplets while using syntactic dependency analysis to enhance semantic completeness. Then, we design a semantic filtering method to further compress the duplicate and task-irrelevant triplets based on prior knowledge. The filtered triplets are encoded and transmitted to the receiver for completing intelligent tasks. Furthermore, we apply the proposed TESC scheme to two emblematic text tasks: sentiment analysis and question answering, in which the semantic codec is meticulously customized for each task. Experimental results demonstrate that 1) TESC scheme outperforms benchmarks in terms of Top-1 accuracy and transmission efficiency, and 2) TESC scheme enjoys about 150% performance gain compared to the traditional communication method.

Submitted 17 May, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2303.04854 [pdf, other]

Structural Similarity: When to Use Deep Generative Models on Imbalanced Image Dataset Augmentation

Authors: Chenqi Guo, Fabian Benitez-Quiroz, Qianli Feng, Aleix Martinez

Abstract: Improving the performance on an imbalanced training set is one of the main challenges in Machine Learning today. One way to augment and thus re-balance the image dataset is through existing deep generative models, like class-conditional Generative Adversarial Networks (cGAN) or Diffusion Models, by synthesizing images for each of the tail classes. Our experiments on imbalanced image dataset classification show that the validation accuracy improvement with such a re-balancing method is related to the image similarity between different classes. Thus, to quantify this image dataset class similarity, we propose a measurement called Super-Sub Class Structural Similarity (SSIM-supSubCls) based on Structural Similarity (SSIM). A deep generative model data augmentation classification (GM-augCls) pipeline is also provided to verify that this metric correlates with the accuracy enhancement. We further quantify the relationship between them, discovering that the accuracy improvement decays exponentially with respect to SSIM-supSubCls values.

Submitted 8 March, 2023; originally announced March 2023.
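
Illustrative sketch (not taken from the paper): the abstract above builds its class-similarity measure on Structural Similarity (SSIM). The block below computes only the standard single-window (global) SSIM between two grayscale images, using the usual constants K1 = 0.01 and K2 = 0.03. It does not reproduce the paper's SSIM-supSubCls measure, whose definition is not given in the abstract; the function name `global_ssim` is an assumption.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over whole images (no sliding Gaussian window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (0.01 * data_range) ** 2         # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2         # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical images score 1, and the score falls as structure diverges. Practical SSIM implementations usually average this quantity over local windows rather than computing it once per image.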

arXiv:2302.02287 [pdf, ps, other]

Deep Joint Source-Channel Coding for Wireless Image Transmission with Semantic Importance

Authors: Qizheng Sun, Caili Guo, Yang Yang, Jiujiu Chen, Rui Tang, Chuanhong Liu

Abstract: The sixth-generation mobile communication system proposes the vision of smart interconnection of everything, which requires accomplishing communication tasks while ensuring the performance of intelligent tasks. A joint source-channel coding method based on semantic importance is proposed, which aims at preserving semantic information during wireless image transmission and thereby boosting the performance of intelligent tasks for images at the receiver. Specifically, we first propose a semantic importance weight calculation method, which is based on the gradient of the intelligent task's perception results with respect to the features. Then, we design the semantic loss function by using semantic weights to weight the features. Finally, we train the deep joint source-channel coding network using the semantic loss function. Experiment results demonstrate that the proposed method achieves up to 57.7% and 9.1% improvement in terms of the intelligent task's performance compared with the source-channel separation coding method and the deep source-channel joint coding method without considering semantics at the same compression rate and signal-to-noise ratio, respectively.

Submitted 4 February, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2208.11375
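
Illustrative sketch (not taken from the paper): the abstract above derives semantic importance weights from the gradient of a task output with respect to features, then uses them to weight a loss. The block below shows that idea in its simplest form, with a central-difference gradient and a weighted squared error. The names `semantic_weights`, `weighted_distortion`, and `task_score`, and the normalization to unit sum, are assumptions; the paper's actual network and loss are not specified in the abstract.

```python
import numpy as np

def semantic_weights(task_score, features, eps=1e-5):
    """Normalized |d task_score / d feature| via central differences.

    task_score is any callable mapping a feature array to a scalar;
    it stands in for the task network (hypothetical here).
    """
    features = np.asarray(features, dtype=float)
    grad = np.zeros_like(features)
    for i in range(features.size):
        step = np.zeros_like(features)
        step.flat[i] = eps
        grad.flat[i] = (task_score(features + step)
                        - task_score(features - step)) / (2 * eps)
    magnitude = np.abs(grad)
    total = magnitude.sum()
    if total == 0:
        # No feature influences the score: fall back to uniform weights
        return np.full_like(magnitude, 1.0 / magnitude.size)
    return magnitude / total

def weighted_distortion(original, reconstructed, weights):
    """Squared error where each feature is scaled by its semantic weight."""
    diff = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return float(np.sum(weights * diff ** 2))
```

Features that barely change the task output receive small weights, so errors on them contribute little to the distortion. In a real training loop the gradient would come from automatic differentiation rather than finite differences.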

arXiv:2212.12097 [pdf, other]

Tightening Quadratic Convex Relaxations for the AC Optimal Transmission Switching Problem

Authors: Cheng Guo, Harsha Nagarajan, Merve Bodur

Abstract: The Alternating Current Optimal Transmission Switching (ACOTS) problem incorporates line switching decisions into the fundamental AC optimal power flow (ACOPF) problem. The advantages of the ACOTS problem are well-known in terms of reducing the operational cost and improving system reliability. ACOTS optimization models contain discrete variables and nonlinear, non-convex structures, which make it difficult to solve. We derive strengthened quadratic convex (QC) relaxations for ACOTS by combining several methodologies recently developed in the ACOPF literature. First, we relax the ACOTS model with the on/off QC relaxation, which has been empirically observed to be both tight and computationally efficient in approximating the ACOPF problem. Further, we tighten this relaxation by using strong linearization with extreme-point representation, and by adding several types of new valid inequalities. In particular, we derive a novel kind of "on/off cycle-based polynomial constraints", by taking advantage of the network structure. Those constraints are linearized using convex-hull representations and implemented in an efficient "branch-and-cut" framework. We also tighten the relaxation using the optimization-based bound tightening algorithm. Our extensive numerical experiments on medium-scale PGLib instances show that, compared with the state-of-the-art formulations, our strengthening techniques are able to improve the quality of ACOTS relaxations on many of the PGLib instances, with some being substantial improvements.

Submitted 22 December, 2022; originally announced December 2022.

Report number: LA-UR-22-33111

arXiv:2212.09337 [pdf, other]

doi 10.1109/LSP.2023.3266115

Information Bottleneck-Inspired Type Based Multiple Access for Remote Estimation in IoT Systems

Authors: Meiyi Zhu, Chunyan Feng, Caili Guo, Nan Jiang, Osvaldo Simeone

Abstract: Type-based multiple access (TBMA) is a semantics-aware multiple access protocol for remote inference. In TBMA, codewords are reused across transmitting sensors, with each codeword being assigned to a different observation value. Existing TBMA protocols are based on fixed shared codebooks and on conventional maximum-likelihood or Bayesian decoders, which require knowledge of the distributions of observations and channels. In this letter, we propose a novel design principle for TBMA based on the information bottleneck (IB). In the proposed IB-TBMA protocol, the shared codebook is jointly optimized with a decoder based on artificial neural networks (ANNs), so as to adapt to source, observations, and channel statistics based on data only. We also introduce the Compressed IB-TBMA (CIB-TBMA) protocol, which improves IB-TBMA by enabling a reduction in the number of codewords via an IB-inspired clustering phase. Numerical results demonstrate the importance of a joint design of codebook and neural decoder, and validate the benefits of codebook compression.

Submitted 5 April, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

Comments: 5 pages, 3 figures, accepted by IEEE Signal Processing Letters (SPL)

arXiv:2212.06466 [pdf, other]

doi 10.1145/3581783.3612084

U2Net: A General Framework with Spatial-Spectral-Integrated Double U-Net for Image Fusion

Authors: Siran Peng, Chenhao Guo, Xiao Wu, Liang-Jian Deng

Abstract: In image fusion tasks, images obtained from different sources exhibit distinct properties. Consequently, treating them uniformly with a single-branch network can lead to inadequate feature extraction. Additionally, numerous works have demonstrated that multi-scaled networks capture information more sufficiently than single-scaled models in pixel-level computer vision problems. Considering these factors, we propose U2Net, a spatial-spectral-integrated double U-shape network for image fusion. The U2Net utilizes a spatial U-Net and a spectral U-Net to extract spatial details and spectral characteristics, which allows for the discriminative and hierarchical learning of features from diverse images. In contrast to most previous works that merely employ concatenation to merge spatial and spectral information, this paper introduces a novel spatial-spectral integration structure called S2Block, which combines feature maps from different sources in a logical and effective way. We conduct a series of experiments on two image fusion tasks, including remote sensing pansharpening and hyperspectral image super-resolution (HISR). The U2Net outperforms representative state-of-the-art (SOTA) approaches in both quantitative and qualitative evaluations, demonstrating the superiority of our method. The code is available at https://github.com/PSRben/U2Net.

Submitted 2 October, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

Comments: Accepted by the 31st ACM International Conference on Multimedia (ACM MM '23)

arXiv:2210.13004 [pdf, other]

Efficient Representation of Natural Image Patches

Authors: Cheng Guo

Abstract: Utilizing an abstract information processing model based on minimal yet realistic assumptions inspired by biological systems, we study how to achieve the early visual system's two ultimate objectives: efficient information transmission and accurate sensor probability distribution modeling. We prove that optimizing for information transmission does not guarantee optimal probability distribution modeling in general. We illustrate, using a two-pixel (2D) system and image patches, that an efficient representation can be realized through a nonlinear population code driven by two types of biologically plausible loss functions that depend solely on output. After unsupervised learning, our abstract information processing model bears remarkable resemblances to biological systems, despite not mimicking many features of real neurons, such as spiking activity. A preliminary comparison with a contemporary deep learning model suggests that our model offers a significant efficiency advantage. Our model provides novel insights into the computational theory of early visual systems as well as a potential new approach to enhance the efficiency of deep learning models.

Submitted 11 April, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.03918 [pdf, other]

A multi view multi stage and multi window framework for pulmonary artery segmentation from CT scans

Authors: ZeYu Liu, Yi Wang, Jing Wen, Yong Zhang, Hao Yin, Chao Guo, ZhongYu Wang

Abstract: This is the technical report of the 9th place in the final result of the PARSE2022 Challenge. We solve the segmentation problem of the pulmonary artery by using a two-stage method based on a 3D CNN network. The coarse model is used to locate the ROI, and the fine model is used to refine the segmentation result. In addition, to improve the segmentation performance, we adopt a multi-view and multi-window level method, and we employ a fine-tune strategy to mitigate the impact of inconsistent labeling.

Submitted 14 September, 2022; v1 submitted 8 September, 2022; originally announced September 2022.
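
Illustrative sketch (not taken from the report): the abstract above mentions a "multi-window level method" without detail. Assuming this refers to standard CT intensity windowing, the block below maps Hounsfield-unit values into [0, 1] for a given window level and width, and stacks several windows as input channels. The function names and any specific level/width pairs are hypothetical.

```python
import numpy as np

def apply_window(hu, level, width):
    """Clip HU values to [level - width/2, level + width/2] and rescale to [0, 1]."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    clipped = np.clip(np.asarray(hu, dtype=float), lo, hi)
    return (clipped - lo) / (hi - lo)

def multi_window_stack(hu, windows):
    """Stack one windowed copy of the volume per (level, width) pair on a new axis 0."""
    return np.stack([apply_window(hu, level, width) for level, width in windows], axis=0)
```

Each window emphasizes a different intensity range, so a network fed the stacked channels sees the same anatomy at several contrast settings.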

arXiv:2208.11375 [pdf, other]

Deep Joint Source-Channel Coding Based on Semantics of Pixels

Authors: Qizheng Sun, Caili Guo, Yang Yang, Jiujiu Chen, Rui Tang, Chuanhong Liu

Abstract: The semantic information of the image for intelligent tasks is hidden behind the pixels, and slight changes in the pixels will affect the performance of intelligent tasks. In order to preserve semantic information behind pixels for intelligent tasks during wireless image transmission, we propose a joint source-channel coding method based on semantics of pixels, which can improve the performance of intelligent tasks for images at the receiver by retaining semantic information. Specifically, we first utilize gradients of the intelligent task's perception results with respect to pixels to represent the semantic importance of pixels. Then, we extract the semantic distortion, and train the deep joint source-channel coding network with the goal of minimizing semantic distortion rather than pixel distortion. Experiment results demonstrate that the proposed method improves the performance of the intelligent classification task by 1.38% and 66% compared with the SOTA deep joint source-channel coding method and the traditional separate source-channel coding method at the same transmission rate and signal-to-noise ratio.

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2204.08910 [pdf, other]

Adaptable Semantic Compression and Resource Allocation for Task-Oriented Communications

Authors: Chuanhong Liu, Caili Guo, Yang Yang, Nan Jiang

Abstract: Task-oriented communication is a new paradigm that aims at providing efficient connectivity for accomplishing intelligent tasks rather than the reception of every transmitted bit. In this paper, a deep learning-based task-oriented communication architecture is proposed where the user extracts, compresses and transmits semantics in an end-to-end (E2E) manner. Furthermore, an approach is proposed to compress the semantics according to their importance relevant to the task, namely, adaptable semantic compression (ASC). Assuming a delay-intolerant system, supporting multiple users indicates a problem that executing with the higher compression ratio requires fewer channel resources but leads to the distortion of semantics, while executing with the lower compression ratio requires more channel resources and thus may lead to a transmission failure due to delay constraint. To solve the problem, both compression ratio and resource allocation are optimized for the task-oriented communication system to maximize the success probability of tasks. Specifically, due to the nonconvexity of the problem, we propose a compression ratio and resource allocation (CRRA) algorithm by separating the problem into two subproblems and solving iteratively to obtain the convergent solution. Furthermore, considering the scenarios where users have various service levels, a compression ratio, resource allocation, and user selection (CRRAUS) algorithm is proposed to deal with the problem. In CRRAUS, users are adaptively selected to complete the corresponding intelligent tasks based on branch and bound method at the expense of higher algorithm complexity compared with CRRA. Simulation results show that the proposed CRRA and CRRAUS algorithms can obtain at least 15% and 10% success gains over baseline algorithms, respectively.

Submitted 19 April, 2022; originally announced April 2022.

arXiv:2204.08131 [pdf, ps, other]

Positioning Using Visible Light Communications: A Perspective Arcs Approach

Authors: Zhiyu Zhu, Caili Guo, Rongzhen Bao, Mingzhe Chen, Walid Saad, Yang Yang

Abstract: Visible light positioning (VLP) is an accurate indoor positioning technology that uses luminaires as transmitters. In particular, circular luminaires are a common source type for VLP, that are typically treated only as point sources for positioning, while ignoring their geometry characteristics. In this paper, the arc feature of the circular luminaire and the coordinate information obtained via visible light communication (VLC) are jointly used for VLC-enabled indoor positioning, and a novel perspective arcs approach is proposed. The proposed approach does not rely on any inertial measurement unit, and has no tilted angle limitations at the user. First, a VLC assisted perspective circle and arc algorithm (V-PCA) is proposed for a scenario in which a complete luminaire and an incomplete one can be captured by the user. Considering the cases in which parts of VLC links are blocked, an anti-occlusion VLC assisted perspective arcs algorithm (OA-V-PA) is proposed. Simulation results show that the proposed indoor positioning algorithm can achieve a 95th percentile positioning accuracy of around 10 cm. Moreover, an experimental prototype based on mobile phone is implemented, in which, a fused image processing method is proposed. Experimental results show that the average positioning accuracy is less than 5 cm.

Submitted 17 April, 2022; originally announced April 2022.

arXiv:2204.02663 [pdf, other]

Towards An End-to-End Framework for Flow-Guided Video Inpainting

Authors: Zhen Li, Cheng-Ze Lu, Jianhua Qin, Chun-Le Guo, Ming-Ming Cheng

Abstract: Optical flow, which captures motion information across frames, is exploited in recent video inpainting methods through propagating pixels along its trajectories. However, the hand-crafted flow-based processes in these methods are applied separately to form the whole inpainting pipeline. Thus, these methods are less efficient and rely heavily on the intermediate results from earlier stages. In this paper, we propose an End-to-End framework for Flow-Guided Video Inpainting (E$^2$FGVI) through elaborately designed three trainable modules, namely, flow completion, feature propagation, and content hallucination modules. The three modules correspond with the three stages of previous flow-based methods but can be jointly optimized, leading to a more efficient and effective inpainting process. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods both qualitatively and quantitatively and shows promising efficiency. The code is available at https://github.com/MCG-NKU/E2FGVI.

Submitted 7 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted to CVPR 2022

arXiv:2202.06369 [pdf, ps, other]

Incremental user embedding modeling for personalized text classification

Authors: Ruixue Lian, Che-Wei Huang, Yuqing Tang, Qilong Gu, Chengyuan Ma, Chenlei Guo

Abstract: Individual user profiles and interaction histories play a significant role in providing customized experiences in real-world applications such as chatbots, social media, retail, and education. Adaptive user representation learning by utilizing user personalized information has become increasingly challenging due to ever-growing history data. In this work, we propose an incremental user embedding modeling approach, in which embeddings of user's recent interaction histories are dynamically integrated into the accumulated history vectors via a transformer encoder. This modeling paradigm allows us to create generalized user representations in a consecutive manner and also alleviate the challenges of data management. We demonstrate the effectiveness of this approach by applying it to a personalized multi-class classification task based on the Reddit dataset, and achieve 9% and 30% relative improvement on prediction accuracy over a baseline system for two experiment settings through appropriate comment history encoding and task modeling.

Submitted 13 February, 2022; originally announced February 2022.

Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022

arXiv:2201.12599 [pdf, other]

Semantic-assisted image compression

Authors: Qizheng Sun, Caili Guo, Yang Yang, Jiujiu Chen, Xijun Xue

Abstract: Conventional image compression methods typically aim at pixel-level consistency while ignoring the performance of downstream AI tasks. To solve this problem, this paper proposes a Semantic-Assisted Image Compression method (SAIC), which can maintain semantic-level consistency to enable high performance of downstream AI tasks. To this end, we train the compression network using a semantic-level loss function. In particular, semantic-level loss is measured using a gradient-based semantic weights mechanism (GSW). GSW directly considers downstream AI tasks' perceptual results. Then, this paper proposes a semantic-level distortion evaluation metric to quantify the amount of semantic information retained during the compression process. Experimental results show that the proposed SAIC method can retain more semantic-level information and achieve better performance of downstream AI tasks compared to the traditional deep learning-based method and the advanced perceptual method at the same compression ratio.

Submitted 29 January, 2022; originally announced January 2022.

arXiv:2201.10929 [pdf, other]

Task-Oriented Image Semantic Communication Based on Rate-Distortion Theory

Authors: Fangfang Liu, Wanjie Tong, Yang Yang, Zhengfen Sun, Caili Guo

Abstract: Task-oriented image semantic communication is a new communication paradigm, which aims to transmit semantics for artificial intelligent (AI) tasks while ignoring the reconstruction quality of the images. However, in some applications, such as autonomous driving, both image reconstruction quality and the performance of the followed AI tasks must be simultaneously considered. To tackle this challenge, this paper proposes a task-oriented semantic communication scheme with semantic reconstruction (TOSC-SR). Its main goal is to simultaneously minimize pixel-level and task-relevant semantic-level distortion during communications under a certain rate, which formulates a new rate-distortion optimization problem. To successfully measure the loss at the semantic level, a new form of semantic distortion measured by the mutual information between the semantic-reconstructed images and the task labels is proposed. Then, we derive an analytical solution for the formulated problem, where the self-consistent equations of the problem are obtained to determine the optimal mapping of the source and the semantic-reconstructed images. To implement TOSC-SR, we further obtain an extended form of rate-distortion form based on the variational approximation of mutual information, which is applicable to multiple AI tasks. Experimental results show that the proposed approach outperforms the traditional JPEG, JPEG2000, BPG, VVC-based image communication systems and deep learning based benchmarks in terms of image reconstruction quality, AI task performance, and multi-task generalization ability.

Submitted 1 December, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 17 pages, 8 figures

arXiv:2201.10795 [pdf, other]

Bandwidth and Power Allocation for Task-Oriented Semantic Communication

Authors: Chuanhong Liu, Caili Guo, Yang Yang, Jiujiu Chen

Abstract: Deep learning enabled semantic communication has been studied to improve communication efficiency while guaranteeing intelligent task performance. Different from conventional communication systems, resource allocation in semantic communications no longer pursues only the bit transmission rate, but focuses on how to better compress and transmit semantics to complete subsequent intelligent tasks. This paper aims to appropriately allocate the bandwidth and power for artificial intelligence (AI) task-oriented semantic communication and proposes a joint compression ratio and resource allocation (CRRA) algorithm. We first analyze the relationship between the AI task's performance and the semantic information. Then, to optimize the AI task's performance under resource constraints, a bandwidth and power allocation problem is formulated. The problem is first separated into two subproblems due to the non-convexity. The first subproblem is a compression ratio optimization problem with a given resource allocation scheme, which is solved by an enumeration algorithm. The second subproblem is to find the optimal resource allocation scheme, which is transformed into a convex problem by the successive convex approximation method and solved by a convex optimization method. The optimal semantic compression ratio and resource allocation scheme are obtained by iteratively solving these two subproblems. Simulation results show that the proposed algorithm can efficiently improve the AI task's performance by up to 30% compared with baselines.

Submitted 26 January, 2022; originally announced January 2022.

arXiv:2112.08133 [pdf]

doi 10.1016/j.bios.2021.113699

Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution

Authors: Shaowei Jiang, Chengfei Guo, Zichao Bian, Ruihai Wang, Jiakai Zhu, Pengming Song, Patrick Hu, Derek Hu, Zibang Zhang, Kazunori Hoshino, Bin Feng, Guoan Zheng

Abstract: Traditional microbial detection methods often rely on the overall property of microbial cultures and cannot resolve individual growth events at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence and then interpret the results. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a large scale and with high spatiotemporal resolution. The reported device can be placed within a regular incubator or used as a standalone incubating unit for long-term microbial monitoring. For longitudinal study where massive data are acquired at sequential time points, we report a new temporal-similarity constraint to increase the temporal resolution of ptychographic reconstruction by 7-fold. With this strategy, the reported device achieves a centimeter-scale field of view, a half-pitch spatial resolution of 488 nm, and a temporal resolution of 15-second intervals. For the first time, we report the direct observation of bacterial growth in a 15-second interval by tracking the phase wraps of the recovered images, with high phase sensitivity like that in interferometric measurements. We also characterize cell growth via longitudinal dry mass measurement and perform rapid bacterial detection at low concentrations. For drug-screening application, we demonstrate proof-of-concept antibiotic susceptibility testing and perform single-cell analysis of antibiotic-induced filamentation. The combination of high phase sensitivity, high spatiotemporal resolution, and large field of view is unique among existing microscopy techniques. As a quantitative and miniaturized platform, it can improve studies with microorganisms and other biospecimens in resource-limited settings.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 18 pages, 6 figures

arXiv:2110.01989 [pdf]

doi 10.1364/OL.437832

High-throughput lensless whole slide imaging via continuous height-varying modulation of tilted sensor

Authors: Shaowei Jiang, Chengfei Guo, Patrick Hu, Derek Hu, Pengming Song, Tianbo Wang, Zichao Bian, Zibang Zhang, Guoan Zheng

Abstract: We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffraction images for reconstruction. Since the axial distance between the specimen and the sensor varies at different lateral positions, laterally translating the specimen effectively introduces defocus multi-height measurements while eliminating axial scanning. Lateral translation further introduces sub-pixel shift for pixel super-resolution imaging and naturally expands the field of view for rapid whole slide imaging. We show that the equivalent height variation can be precisely estimated from the lateral shift of the specimen, thereby addressing the challenge of precise axial positioning in conventional multi-height phase retrieval. Using a sensor with a 1.67-micron pixel size, our low-cost and field-portable prototype can resolve 690-nm linewidth on the resolution target. We show that a whole slide image of a blood smear with a 120-mm^2 field of view can be acquired in 18 seconds. We also demonstrate accurate automatic white blood cell counting from the recovered image. The reported approach may provide a turnkey solution for addressing point-of-care- and telemedicine-related challenges.

Submitted 28 September, 2021; originally announced October 2021.

arXiv:2108.13249 [pdf, ps, other]

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification

Authors: Yanfeng Wu, Chenkai Guo, Junan Zhao, Xiao Jin, Jing Xu

Abstract: Convolutional neural network (CNN) based approaches have shown great success for speaker verification (SV) tasks, where modeling long temporal context and reducing information loss of speaker characteristics are two important challenges significantly affecting the verification performance. Previous works have introduced dilated convolution and multi-scale aggregation methods to address the above challenges. However, such methods also struggle to make full use of some valuable information, which makes it difficult to substantially improve the verification performance. To address the above issues, we construct a novel CNN-based architecture for SV, called RSKNet-MTSP, where a residual selective kernel block (RSKBlock) and a multiple time-scale statistics pooling (MTSP) module are first proposed. The RSKNet-MTSP can capture both long temporal context and neighbouring information, and gather more speaker-discriminative information from multi-scale features. In order to design a portable model for real applications with limited resources, we then present a lightweight version of RSKNet-MTSP, namely RSKNet-MTSP-L, which employs a combination technique associating the depthwise separable convolutions with low-rank factorization of weight matrices. Extensive experiments are conducted on two public SV datasets, VoxCeleb and Speaker in the Wild (SITW). The results demonstrate that 1) RSKNet-MTSP outperforms the state-of-the-art deep embedding architectures by at least 9%-26% in all test sets, and 2) RSKNet-MTSP-L achieves competitive performance compared with baseline models with 17%-39% fewer network parameters. The ablation experiments further illustrate that our proposed approaches can achieve substantial improvement over prior methods.

Submitted 30 August, 2021; originally announced August 2021.

Comments: submitted to Neurocomputing

arXiv:2106.00610 [pdf, other]

Deep Learning for Depression Recognition with Audiovisual Cues: A Review

Authors: Lang He, Mingyue Niu, Prayag Tiwari, Pekka Marttinen, Rui Su, Jiewei Jiang, Chenguang Guo, Hongyu Wang, Songtao Ding, Zhongmin Wang, Wei Dang, Xiaoying Pan

Abstract: With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial expression between patients with depression and healthy individuals. Consequently, to improve current medical care, many scholars have used deep learning to extract a representation of depression cues in audio and video for automatic depression detection. To sort out and summarize these works, this review introduces the databases and describes objective markers for automatic depression estimation (ADE). Furthermore, we review the deep learning methods for automatic depression detection to extract the representation of depression from audio and video. Finally, this paper discusses challenges and promising directions related to the automatic diagnosis of depression using deep learning technologies.

Submitted 27 May, 2021; originally announced June 2021.

arXiv:2105.09865 [pdf, ps, other]

Power-Efficient Wireless Streaming of Multi-Quality Tiled 360 VR Video in MIMO-OFDMA Systems

Authors: Chengjun Guo, Lingzhi Zhao, Ying Cui, Zhi Liu, Derrick Wing Kwan Ng

Abstract: In this paper, we study the optimal wireless streaming of a multi-quality tiled 360 virtual reality (VR) video from a multi-antenna server to multiple single-antenna users in a multiple-input multiple-output (MIMO)-orthogonal frequency division multiple access (OFDMA) system. In the scenario without user transcoding, we jointly optimize beamforming and subcarrier, transmission power, and rate allocation to minimize the total transmission power. This problem is a challenging mixed discrete-continuous optimization problem. We obtain a globally optimal solution for small multicast groups, an asymptotically optimal solution for a large antenna array, and a suboptimal solution for the general case. In the scenario with user transcoding, we jointly optimize the quality level selection, beamforming, and subcarrier, transmission power, and rate allocation to minimize the weighted sum of the average total transmission power and the transcoding power. This problem is a two-timescale mixed discrete-continuous optimization problem, which is even more challenging than the problem for the scenario without user transcoding. We obtain a globally optimal solution for small multicast groups, an asymptotically optimal solution for a large antenna array, and a low-complexity suboptimal solution for the general case. Finally, numerical results demonstrate the significant gains of the proposed solutions over the existing solutions.

Submitted 13 April, 2021; originally announced May 2021.

Comments: 15 pages, 4 figures, to appear in IEEE Trans. Wireless Commun. arXiv admin note: text overlap with arXiv:2104.06183

arXiv:2103.03444 [pdf, ps, other]

Optimization of User Selection and Bandwidth Allocation for Federated Learning in VLC/RF Systems

Authors: Chuanhong Liu, Caili Guo, Yang Yang, Mingzhe Chen, H. Vincent Poor, Shuguang Cui

Abstract: Limited radio frequency (RF) resources restrict the number of users that can participate in federated learning (FL), thus affecting FL convergence speed and performance. In this paper, we first introduce visible light communication (VLC) as a supplement to RF in FL and build a hybrid VLC/RF communication system, in which each indoor user can use both VLC and RF to transmit its FL model parameters. Then, the problem of user selection and bandwidth allocation is studied for FL implemented over a hybrid VLC/RF system, aiming to optimize the FL performance. The problem is first separated into two subproblems. The first subproblem is a user selection problem with a given bandwidth allocation, which is solved by a traversal algorithm. The second subproblem is a bandwidth allocation problem with a given user selection, which is solved by a numerical method. The final user selection and bandwidth allocation are obtained by iteratively solving these two subproblems. Simulation results show that the proposed FL algorithm that efficiently uses VLC and RF for FL model transmission can improve the prediction accuracy by up to 10% compared with a conventional FL system using only RF.

Submitted 4 March, 2021; originally announced March 2021.

Comments: WCNC2021

arXiv:2102.09199 [pdf, other]

Minimizing false negative rate in melanoma detection and providing insight into the causes of classification

Authors: Ellák Somfai, Benjámin Baffy, Kristian Fenech, Changlu Guo, Rita Hosszú, Dorina Korózs, Fabrizio Nunnari, Marcell Pólik, Daniel Sonntag, Attila Ulbert, András Lőrincz

Abstract: Our goal is to bridge human and machine intelligence in melanoma detection. We develop a classification system exploiting a combination of visual pre-processing, deep learning, and ensembling for providing explanations to experts and to minimize false negative rate while maintaining high accuracy in melanoma detection. Source images are first automatically segmented using a U-net CNN. The result of the segmentation is then used to extract image sub-areas and specific parameters relevant in human evaluation, namely center, border, and asymmetry measures. These data are then processed by tailored neural networks which include structure searching algorithms. Partial results are then ensembled by a committee machine. Our evaluation on the largest skin lesion dataset which is publicly available today, ISIC-2019, shows improvement in all evaluated metrics over a baseline using the original images only. We also showed that indicative scores computed by the feature classifiers can provide useful insight into the various features on which the decision can be based.

Submitted 9 March, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: supplementary materials included

ACM Class: I.4.9; J.3

arXiv:2102.03853 [pdf]

doi 10.1016/j.optcom.2021.127031

Bypassing the resolution limit of diffractive zone plate optics via rotational Fourier ptychography

Authors: Chengfei Guo, Shaowei Jiang, Pengming Song, Zichao Bian, Tianbo Wang, Pouria Hoveida, Xiaopeng Shao

Abstract: Diffractive zone plate optics uses a thin micro-structure pattern to alter the propagation direction of the incoming light wave. It has found important applications in extreme-wavelength imaging where conventional refractive lenses do not exist. The resolution limit of zone plate optics is determined by the smallest width of the outermost zone. In order to improve the achievable resolution, significant efforts have been devoted to the fabrication of very small zone width with ultrahigh placement accuracy. Here, we report the use of a diffractometer setup for bypassing the resolution limit of zone plate optics. In our prototype, we mounted the sample on two rotation stages and used a low-resolution binary zone plate to relay the sample plane to the detector. We then performed both in-plane and out-of-plane sample rotations and captured the corresponding raw images. The captured images were processed using a Fourier ptychographic procedure for resolution improvement. The final achievable resolution of the reported setup is not determined by the smallest width structures of the employed binary zone plate; instead, it is determined by the maximum angle of the out-of-plane rotation. In our experiment, we demonstrated 8-fold resolution improvement using both a resolution target and a titanium dioxide sample. The reported approach may be able to bypass the fabrication challenge of diffractive elements and open up new avenues for microscopy with extreme wavelengths.

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2009.13379 [pdf, other]

A Content Driven Resource Allocation Scheme for Video Transmission in Vehicular Networks

Authors: Jiujiu Chen, Chunyan Feng, Caili Guo, Xu Zhu

Abstract: With the growth of computer vision applications, many videos are transmitted for content analysis, and the way resources are allocated can affect the performance of video content analysis. For this purpose, the traditional resource allocation schemes for video transmission in vehicular networks, such as quality-of-service (QoS) based or quality-of-experience (QoE) based schemes, are no longer optimal. In this paper, we propose an efficient content driven resource allocation scheme for vehicles equipped with cameras under bandwidth constraints in order to improve the video content analysis performance. The proposed resource allocation scheme is based on maximizing the quality-of-content (QoC), which is related to the content analysis performance. A QoC based assessment model is first proposed. Then, the resource allocation problem is converted to a solvable convex optimization problem. Finally, simulation results show that our proposed scheme performs better than existing schemes such as QoE based schemes.

Submitted 28 September, 2020; originally announced September 2020.

arXiv:2009.08829 [pdf, other]

Residual Spatial Attention Network for Retinal Vessel Segmentation

Authors: Changlu Guo, Márton Szemenyei, Yugen Yi, Wei Zhou, Haodong Bian

Abstract: Reliable segmentation of retinal vessels can be employed as a way of monitoring and diagnosing certain diseases, such as diabetes and hypertension, as they affect the retinal vascular structure. In this work, we propose the Residual Spatial Attention Network (RSAN) for retinal vessel segmentation. RSAN employs a modified residual block structure that integrates DropBlock, which can not only be utilized to construct deep networks to extract more complex vascular features, but can also effectively alleviate overfitting. Moreover, in order to further improve the representation capability of the network, based on this modified residual block, we introduce spatial attention (SA) and propose the Residual Spatial Attention Block (RSAB) to build RSAN. We adopt the public DRIVE and CHASE DB1 color fundus image datasets to evaluate the proposed RSAN. Experiments show that the modified residual structure and the spatial attention are effective in this work, and our proposed RSAN achieves state-of-the-art performance.

Submitted 18 September, 2020; originally announced September 2020.

Comments: ICONIP 2020

arXiv:2008.06916 [pdf]

doi 10.1364/OL.400244

Virtual brightfield and fluorescence staining for Fourier ptychography via unsupervised deep learning

Authors: Ruihai Wang, Pengming Song, Shaowei Jiang, Chenggang Yan, Jiakai Zhu, Chengfei Guo, Zichao Bian, Tianbo Wang, Guoan Zheng

Abstract: Fourier ptychographic microscopy (FPM) is a computational approach geared towards creating high-resolution and large field-of-view images without mechanical scanning. To acquire color images of histology slides, it often requires sequential acquisitions with red, green, and blue illuminations. The color reconstructions often suffer from coherent artifacts that are not present in regular incoherent microscopy images. As a result, it remains a challenge to employ FPM for digital pathology applications, where resolution and color accuracy are of critical importance. Here we report a deep learning approach for performing unsupervised image-to-image translation of FPM reconstructions. A cycle-consistent adversarial network with multiscale structure similarity loss is trained to perform virtual brightfield and fluorescence staining of the recovered FPM images. In the training stage, we feed the network with two sets of unpaired images: 1) monochromatic FPM recovery, and 2) color or fluorescence images captured using a regular microscope. In the inference stage, the network takes the FPM input and outputs a virtually stained image with reduced coherent artifacts and improved image quality. We test the approach on various samples with different staining protocols. High-quality color and fluorescence reconstructions validate its effectiveness.

Submitted 16 August, 2020; originally announced August 2020.

arXiv:2006.09894 [pdf, ps, other]

doi 10.1364/OE.410502

Power Efficient LED Placement Algorithm for Indoor Visible Light Communication

Authors: Yang Yang, Zhiyu Zhu, Caili Guo, Chunyan Feng

Abstract: This paper proposes a novel power-efficient light-emitting diode (LED) placement algorithm for indoor visible light communication (VLC). In the considered model, the LEDs can be deliberately placed for high power efficiency while satisfying the indoor communication and illumination requirements. This design problem is formulated as a power minimization problem under both communication and illumination level constraints. Due to the interactions among LEDs and the illumination uniformity constraint, the formulated problem is complex and non-convex. To solve the problem, we first transform the complex uniformity constraint into a series of linear constraints. Then, an iterative algorithm is proposed to decouple the interactions among LEDs and transform the original problem into a series of convex sub-problems. Finally, we use the Lagrange dual method to solve the sub-problems and obtain a convergent solution of the original problem. Simulation results show that the proposed LED placement algorithm can harvest 22.86% power consumption gain when compared with the baseline scheme with centrally placed LEDs.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2006.08610 [pdf]

Autofocusing technologies for whole slide imaging and automated microscopy

Authors: Zichao Bian, Chengfei Guo, Shaowei Jiang, Jiakai Zhu, Ruihai Wang, Pengming Song, Zibang Zhang, Kazunori Hoshino, Guoan Zheng

Abstract: Whole slide imaging (WSI) has moved digital pathology closer to diagnostic practice in recent years. Due to the inherent tissue topography variability, accurate autofocusing remains a critical challenge for WSI and automated microscopy systems. The traditional focus map surveying method is limited in its ability to acquire a high degree of focus points while still maintaining high throughput. Real-time approaches decouple image acquisition from focusing, thus allowing for rapid scanning while maintaining continuous accurate focus. This work reviews the traditional focus map approach and discusses the choice of focus measure for focal plane determination. It also discusses various real-time autofocusing approaches including reflective-based triangulation, confocal pinhole detection, low-coherence interferometry, tilted sensor approach, independent dual sensor scanning, beam splitter array, phase detection, dual-LED illumination, and deep-learning approaches. The technical concepts, merits, and limitations of these methods are explained and compared to those of a traditional WSI system. This review may provide new insights for the development of high-throughput automated microscopy imaging systems that can be made broadly available and utilizable without loss of capacity.

Submitted 15 August, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2006.08114 [pdf]

doi 10.1364/OL.394923

Super-resolved multispectral lensless microscopy via angle-tilted, wavelength-multiplexed ptychographic modulation

Authors: Pengming Song, Ruihai Wang, Jiakai Zhu, Tianbo Wang, Zichao Bian, Zibang Zhang, Kazunori Hoshino, Michael Murphy, Shaowei Jiang, Chengfei Guo, Guoan Zheng

Abstract: We report an angle-tilted, wavelength-multiplexed ptychographic modulation approach for multispectral lensless on-chip microscopy. In this approach, we illuminate the specimen with lights at 5 wavelengths simultaneously. A prism is added at the illumination path for spectral dispersion. Lightwaves at different wavelengths, thus, hit the specimen at slightly different incident angles, breaking the ambiguities in mixed state ptychographic reconstruction. At the detection path, we place a thin diffuser in-between the specimen and the monochromatic image sensor for encoding the spectral information into 2D intensity measurements. By scanning the sample to different x-y positions, we acquire a sequence of monochromatic images for reconstructing the 5 complex object profiles at the 5 wavelengths. An up-sampling procedure is integrated into the recovery process to bypass the resolution limit imposed by the imager pixel size. We demonstrate a half-pitch resolution of 0.55 microns using an image sensor with 1.85-micron pixel size. We also demonstrate quantitative and high-quality multispectral reconstructions of stained tissue sections for digital pathology applications.

Submitted 14 June, 2020; originally announced June 2020.

arXiv:2005.11162 [pdf, ps, other]

doi 10.1364/OE.400992

A Novel Received Signal Strength Assisted Perspective-three-Point Algorithm for Indoor Visible Light Positioning

Authors: Lin Bai, Yang Yang, Chunyan Feng, Caili Guo

Abstract: In this paper, a received signal strength assisted Perspective-three-Point positioning algorithm (R-P3P) is proposed for visible light positioning (VLP) systems. The basic idea of R-P3P is to jointly use visual and strength information to estimate the receiver position using 3 LEDs regardless of the LEDs' orientations. R-P3P first utilizes visual information captured by the camera to estimate the incidence angles of visible lights. Then, R-P3P calculates the candidate distances between the LEDs and the receiver based on the law of cosines and the Wu-Ritt zero decomposition method. Based on the incidence angles, the candidate distances, and the physical characteristics of the LEDs, R-P3P can select the exact distances from all the candidate distances. Finally, the linear least square (LLS) method is employed to estimate the position of the receiver. Due to the combination of visual and strength information of visible light signals, R-P3P can achieve high accuracy using 3 LEDs regardless of the LEDs' orientations. Simulation results show that R-P3P can achieve positioning accuracy within 10 cm over 70% of the indoor area with low complexity regardless of the LEDs' orientations.

Submitted 21 May, 2020; originally announced May 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2004.06294

arXiv:2004.12568 [pdf, other]

doi 10.1109/MAP.2020.3043445

Attenuation of Several Common Building Materials in Millimeter-Wave Frequency Bands: 28, 73 and 91 GHz

Authors: Nozhan Hosseini, Mahfuza Khatun, Changyu Guo, Kairui Du, Ozgur Ozdemir, David W. Matolak, Ismail Guvenc, Hani Mehrpouyan

Abstract: Future cellular systems will make use of millimeter wave (mmWave) frequency bands. Many users in these bands are located indoors, i.e., inside buildings, homes, and offices. Typical building material attenuations in these high frequency ranges are of interest for link budget calculations. In this paper, we report on a collaborative measurement campaign to find the attenuation of several typical building materials in three potential mmWave bands (28, 73, 91 GHz). Using directional antennas, we took multiple measurements at multiple locations using narrow-band and wide-band signals, and averaged out residual small-scale fading effects. Materials include clear glass, drywall (plasterboard), plywood, acoustic ceiling tile, and cinder blocks. Specific attenuations range from approximately 0.5 dB/cm for ceiling tile at 28 GHz to approximately 19 dB/cm for clear glass at 91 GHz.

Submitted 26 April, 2020; originally announced April 2020.

Comments: keywords: mm-wave; attenuation

Showing 1–50 of 75 results for author: Guo, C