Search | arXiv e-print repository

More-than-Moore Microacoustics: A Scalable Fabrication Process for Suspended Lamb Wave Resonators

Authors: Marco Liffredo, Federico Peretti, Nan Xu, Silvan Stettler, Luis Guillermo Villanueva

Abstract: Deep Ultraviolet (DUV) Photolithography is currently used to fabricate mass-scale integrated circuits (ICs). Its high throughput and resolution could benefit large-scale RF MEMS production for the telecommunication market. We present a process flow to fabricate suspended acoustic resonators using DUV Photolithography. This method allows for scalable production of resonators with critical dimensions of 250 nm and alignment accuracy of less than 100 nm. We show how photoresists and anti-reflective coatings integrate with the process, help with deposition quality and resolution, and how Ion Beam Etching allows for vertical sidewalls of the resonators. We measure resonance frequencies (fr) up to 7.5 GHz and electromechanical couplings up to 8%, and we investigate the uniformity of this process by analyzing the deviation of fs over the wafer surface for four main resonance modes. We show that the deviation of the S0 mode can be kept below 1%. These results indicate the suitability of this process for quick scale-up of Lamb wave resonator technology, bridging the gap from research to industry.

Submitted 21 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE-TUFFC on 21.07.2024

arXiv:2407.07372 [pdf, other]

Trustworthy Contrast-enhanced Brain MRI Synthesis

Authors: Jiyao Liu, Yuxin Li, Shangqi Gao, Yuncheng Zhou, Xin Gao, Ningsheng Xu, Xiao-Yong Zhang, Xiahai Zhuang

Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking interpretability on predictions. To address the above challenges, this paper introduces TrustI2I, a novel trustworthy method that reformulates multi-to-one medical image translation problem as a multimodal regression problem, aiming to build an uncertainty-aware and reliable system. Specifically, our method leverages deep evidential regression to estimate prediction uncertainties and employs an explicit intermediate and late fusion strategy based on the Mixture of Normal Inverse Gamma (MoNIG) distribution, enhancing both synthesis quality and interpretability. Additionally, we incorporate uncertainty calibration to improve the reliability of uncertainty. Validation on the BraTS2018 dataset demonstrates that our approach surpasses current methods, producing higher-quality images with rational uncertainty estimation.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 11 pages, 3 figures

arXiv:2403.03736 [pdf, other]

Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

Authors: Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma

Abstract: Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivated by the capabilities of predictive language models for lossless compression, this paper introduces a novel Unified Image Generation-Compression (UIGC) paradigm, merging the processes of generation and compression. A key feature of the UIGC framework is the adoption of vector-quantized (VQ) image models for tokenization, alongside a multi-stage transformer designed to exploit spatial contextual information for modeling the prior distribution. As such, the dual-purpose framework effectively utilizes the learned prior for entropy estimation and assists in the regeneration of lost tokens. Extensive experiments demonstrate the superiority of the proposed UIGC framework over existing codecs in perceptual quality and human perception, particularly in ultra-low bitrate scenarios (<=0.03 bpp), pioneering a new direction in generative compression.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2401.17575 [pdf, other]

Can We Improve Channel Reciprocity via Loop-back Compensation for RIS-assisted Physical Layer Key Generation

Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao, Na Li, Pengxuan Mao, Tianyuan Yang

Abstract: Reconfigurable intelligent surface (RIS) facilitates the extraction of unpredictable channel features for physical layer key generation (PKG), securing communications among legitimate users with symmetric keys. Previous works have demonstrated that channel reciprocity plays a crucial role in generating symmetric keys in PKG systems, whereas, in reality, reciprocity is greatly affected by hardware interference and RIS-based jamming attacks. This motivates us to propose LoCKey, a novel approach that aims to improve channel reciprocity by mitigating interferences and attacks with a loop-back compensation scheme, thus maximizing the secrecy performance of the PKG system. Specifically, our proposed LoCKey is capable of effectively compensating for the CSI non-reciprocity by the combination of transmit-back signal value and error minimization module. Firstly, we introduce the entire flowchart of our method and provide an in-depth discussion of each step. Following that, we delve into a theoretical analysis of the performance optimizations when our LoCKey is applied for CSI reciprocity enhancement. Finally, we conduct experiments to verify the effectiveness of the proposed LoCKey in improving channel reciprocity under various interferences for RIS-assisted wireless communications. The results demonstrate a significant improvement in both the rate of key generation assisted by the RIS and the consistency of the generated keys, showing great potential for the practical deployment of our LoCKey in future wireless systems.

Submitted 13 August, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Accepted by ICC 2024

arXiv:2309.03815 [pdf, other]

T2IW: Joint Text to Image & Watermark Generation

Authors: An-An Liu, Guokai Zhang, Yuting Su, Ning Xu, Yongdong Zhang, Lanjun Wang

Abstract: Recent developments in text-conditioned image generative models have revolutionized the production of realistic results. Unfortunately, this has also led to an increase in privacy violations and the spread of false information, which requires the need for traceability, privacy protection, and other security measures. However, existing text-to-image paradigms lack the technical capabilities to link traceable messages with image generation. In this study, we introduce a novel task for the joint generation of text to image and watermark (T2IW). This T2IW scheme ensures minimal damage to image quality when generating a compound image by forcing the semantic feature and the watermark signal to be compatible in pixels. Additionally, by utilizing principles from Shannon information theory and non-cooperative game theory, we are able to separate the revealed image and the revealed watermark from the compound image. Furthermore, we strengthen the watermark robustness of our approach by subjecting the compound image to various post-processing attacks, with minimal pixel distortion observed in the revealed watermark. Extensive experiments have demonstrated remarkable achievements in image quality, watermark invisibility, and watermark robustness, supported by our proposed set of evaluation metrics.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2309.02835 [pdf]

A flexible and accurate total variation and cascaded denoisers-based image reconstruction algorithm for hyperspectrally compressed ultrafast photography

Authors: Zihan Guo, Jiali Yao, Dalong Qi, Pengpeng Ding, Chengzhi Jin, Ning Xu, Zhiling Zhang, Yunhua Yao, Lianzhong Deng, Zhiyong Wang, Zhenrong Sun, Shian Zhang

Abstract: Hyperspectrally compressed ultrafast photography (HCUP) based on compressed sensing and the time- and spectrum-to-space mappings can simultaneously realize the temporal and spectral imaging of non-repeatable or difficult-to-repeat transient events passively in a single exposure. It possesses an incredibly high frame rate of tens of trillions of frames per second and a sequence depth of several hundred, and plays a revolutionary role in single-shot ultrafast optical imaging. However, due to the ultra-high data compression ratio induced by the extremely large sequence depth as well as the limited fidelities of traditional reconstruction algorithms over the reconstruction process, HCUP suffers from a poor image reconstruction quality and fails to capture fine structures in complex transient scenes. To overcome these restrictions, we propose a flexible image reconstruction algorithm based on the total variation (TV) and cascaded denoisers (CD) for HCUP, named the TV-CD algorithm. It applies the TV denoising model cascaded with several advanced deep learning-based denoising models in the iterative plug-and-play alternating direction method of multipliers framework, which can preserve the image smoothness while utilizing the deep denoising networks to obtain more priori, and thus solving the common sparsity representation problem in local similarity and motion compensation. Both simulation and experimental results show that the proposed TV-CD algorithm can effectively improve the image reconstruction accuracy and quality of HCUP, and further promote the practical applications of HCUP in capturing high-dimensional complex physical, chemical and biological ultrafast optical scenes.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 25 pages, 5 figures and 1 table

arXiv:2309.02653 [pdf, other]

Passive Eavesdropping Can Significantly Slow Down RIS-Assisted Secret Key Generation

Authors: Ningya Xu, Guoshun Nan, Xiaofeng Tao

Abstract: Reconfigurable Intelligent Surface (RIS) assisted physical layer key generation has shown great potential to secure wireless communications by smartly controlling signals such as phase and amplitude. However, previous studies mainly focus on RIS adjustment under ideal conditions, while the correlation between the eavesdropping channel and the legitimate channel, a more practical setting in the real world, is still largely under-explored for the key generation. To fill this gap, this paper aims to maximize the RIS-assisted physical-layer secret key generation by optimizing the RIS units switching under the eavesdropping channel. Firstly, we theoretically show that passive eavesdropping significantly reduces RIS-assisted secret key generation. Keeping this in mind, we then introduce a mathematical formulation to maximize the key generation rate and provide a step-by-step analysis. Extensive experiments show the effectiveness of our method in benefiting the secret key capacity under the eavesdropping channel. We also observe that the key randomness, and unmatched key rate, two metrics that measure the secret key quality, are also significantly improved, potentially paving the way to RIS-assisted key generation in real-world scenarios.

Submitted 14 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by Globecom 2023

arXiv:2203.16850 [pdf, other]

Revisiting Document Image Dewarping by Grid Regularization

Authors: Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

Abstract: This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization. Instead of designing a better neural network to approximate the optical flow fields between the inputs and outputs, we pursue the best readability by taking the text lines and the document boundaries into account from a constrained optimization perspective. Specifically, our proposed method first learns the boundary points and the pixels in the text lines and then follows the most simple observation that the boundaries and text lines in both horizontal and vertical directions should be kept after dewarping to introduce a novel grid regularization scheme. To obtain the final forward mapping for dewarping, we solve an optimization problem with our proposed grid regularization. The experiments comprehensively demonstrate that our proposed approach outperforms the prior arts by large margins in terms of readability (with the metrics of Character Errors Rate and the Edit Distance) while maintaining the best image quality on the publicly-available DocUNet benchmark.

Submitted 31 March, 2022; originally announced March 2022.

arXiv:2203.03619 [pdf, other]

Adaptive Cross-Layer Attention for Image Restoration

Authors: Yancheng Wang, Ning Xu, Yingzhen Yang

Abstract: Non-local attention module has been proven to be crucial for image restoration. Conventional non-local attention processes features of each layer separately, so it risks missing correlation between features among different layers. To address this problem, we aim to design attention modules that aggregate information from different layers. Instead of finding correlated key pixels within the same layer, each query pixel is encouraged to attend to key pixels at multiple previous layers of the network. In order to efficiently embed such attention design into neural network backbones, we propose a novel Adaptive Cross-Layer Attention (ACLA) module. Two adaptive designs are proposed for ACLA: (1) adaptively selecting the keys for non-local attention at each layer; (2) automatically searching for the insertion locations for ACLA modules. By these two adaptive designs, ACLA dynamically selects a flexible number of keys to be aggregated for non-local attention at previous layer while maintaining a compact neural network with compelling performance. Extensive experiments on image restoration tasks, including single image super-resolution, image denoising, image demosaicing, and image compression artifacts reduction, validate the effectiveness and efficiency of ACLA. The code of ACLA is available at https://github.com/SDL-ASU/ACLA.

Submitted 18 April, 2023; v1 submitted 4 March, 2022; originally announced March 2022.

arXiv:2203.00918 [pdf]

Smart Tracking Tray System for A Smart and Sustainable Wet Lab Community

Authors: Nan Xu, Jingchen Li, Yue Yu, Yang Li, Jinglei Yang

Abstract: The laboratories and research institutes are the major places for cutting-edge scientific exploration. Hundreds of millions of research papers were formed from front-line labs. Behind this glorious achievement were unsustainable facts. More and more human investment is required in innovative experimental design and analysis of results. However, the laboratory operating environment has not been subversively transformed for centuries. This abstract proposed a smart tracking system, consisting of IoT and Data Visualization technologies, to track the chemicals in an automatic and timely approach. Positive feedback has been collected from pilot tests in several labs. The system benefits various lab users in their daily work and improves their working efficiency. In the long run, it will play an essential role in promoting the efficient use of lab resources and achieving the goal of sustainable labs.

Submitted 2 March, 2022; originally announced March 2022.

Comments: This abstract contains 2 pages and 5 figures. It has been submitted and accepted by IEEE SUSTECH 2022 Student Poster Competition

arXiv:2202.08552 [pdf, other]

EBHI: A New Enteroscope Biopsy Histopathological H&E Image Dataset for Image Classification Evaluation

Authors: Weiming Hu, Chen Li, Xiaoyan Li, Md Mamunur Rahaman, Yong Zhang, Haoyuan Chen, Wanli Liu, Yudong Yao, Hongzan Sun, Ning Xu, Xinyu Huang, Marcin Grzegorze

Abstract: Background and purpose: Colorectal cancer has become the third most common cancer worldwide, accounting for approximately 10% of cancer patients. Early detection of the disease is important for the treatment of colorectal cancer patients. Histopathological examination is the gold standard for screening colorectal cancer. However, the current lack of histopathological image datasets of colorectal cancer, especially enteroscope biopsies, hinders the accurate evaluation of computer-aided diagnosis techniques. Methods: A new publicly available Enteroscope Biopsy Histopathological H&E Image Dataset (EBHI) is published in this paper. To demonstrate the effectiveness of the EBHI dataset, we have utilized several machine learning, convolutional neural networks and novel transformer-based classifiers for experimentation and evaluation, using an image with a magnification of 200x. Results: Experimental results show that the deep learning method performs well on the EBHI dataset. Traditional machine learning methods achieve maximum accuracy of 76.02% and deep learning method achieves a maximum accuracy of 95.37%. Conclusion: To the best of our knowledge, EBHI is the first publicly available colorectal histopathology enteroscope biopsy dataset with four magnifications and five types of images of tumor differentiation stages, totaling 5532 images. We believe that EBHI could attract researchers to explore new classification algorithms for the automated diagnosis of colorectal cancer, which could help physicians and patients in clinical settings.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2110.09223 [pdf, other]

Learning Models for Query by Vocal Percussion: A Comparative Study

Authors: Alejandro Delgado, SkoT McDonald, Ning Xu, Charalampos Saitis, Mark Sandler

Abstract: The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Thus, the automatic retrieval of drum sounds using vocal percussion can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. Here we explore different strategies to perform this type of query, making use of both traditional machine learning algorithms and recent deep learning techniques. The main hyperparameters from the models involved are carefully selected by feeding performance metrics to a grid search algorithm. We also look into several audio data augmentation techniques, which can potentially regularise deep learning models and improve generalisation. We compare the final performances in terms of effectiveness (classification accuracy), efficiency (computational speed), stability (performance consistency), and interpretability (decision patterns), and discuss the relevance of these results when it comes to the design of successful query-by-vocal-percussion systems.

Submitted 18 October, 2021; originally announced October 2021.

Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021

arXiv:2107.03642 [pdf]

Image restoration quality assessment based on regional differential information entropy

Authors: Zhiyu Wang, Jiayan Zhuang, Ningyuan Xu, Sichao Ye, Jiangjian Xiao, Chengbin Peng

Abstract: With the development of image recovery models, especially those based on adversarial and perceptual losses, the detailed texture portions of images are being recovered more naturally. However, these restored images are similar but not identical in detail texture to their reference images. With traditional image quality assessment methods, results with better subjective perceived quality often score lower in objective scoring. Assessment methods suffer from subjective and objective inconsistencies. This paper proposes a regional differential information entropy (RDIE) method for image quality assessment to address this problem. This approach allows better assessment of similar but not identical textural details and achieves good agreement with perceived quality. Neural networks are used to reshape the process of calculating information entropy, improving the speed and efficiency of the operation. Experiments conducted with this study's image quality assessment dataset and the PIPAL dataset show that the proposed RDIE method yields a high degree of agreement with people's average opinion scores compared to other image quality assessment metrics, proving that RDIE can better quantify the perceived quality of images.

Submitted 26 November, 2022; v1 submitted 8 July, 2021; originally announced July 2021.

Comments: 14 pages, 8 figures, 5 tables

arXiv:2106.08961 [pdf, other]

A Direct Slip Ratio Estimation Method based on an Intelligent Tire and Machine Learning

Authors: Nan Xu, Zepeng Tang, Hassan Askari, Jianfeng Zhou, Amir Khajepour

Abstract: Accurate estimation of the tire slip ratio is critical for vehicle safety, as it is necessary for vehicle control purposes. In this paper, an intelligent tire system is presented to develop a novel slip ratio estimation model using machine learning algorithms. The accelerations, generated by a triaxial accelerometer installed onto the inner liner of the tire, are varied when the tire rotates to update the contact patch. Meanwhile, the slip ratio reference value can be measured by the MTS Flat-Trac tire test platform. Then, by analyzing the variation between the accelerations and slip ratio, highly useful features are discovered, which are especially promising for assessing vertical acceleration. For these features, machine learning (ML) algorithms are trained to build the slip ratio estimation model, in which the ML algorithms include artificial neural networks (ANNs), gradient boosting machines (GBMs), random forests (RFs), and support vector machines (SVMs). Finally, the estimated NRMS errors are evaluated using 10-fold cross-validation (CV). The proposed estimation model is able to estimate the slip ratio continuously and stably using only the acceleration from the intelligent tire system, and the estimated slip ratio range can reach 30%. The estimation results have high robustness to vehicle velocity and load, where the best NRMS errors can reach 4.88%. In summary, the present study with the fusion of an intelligent tire system and machine learning paves the way for the accurate estimation of the tire slip ratio under different driving conditions, which create new opportunities for autonomous vehicles, intelligent tires, and tire slip ratio estimation.

Submitted 22 January, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: 12 pages, 25 figures, 2 tables

arXiv:2106.02934 [pdf, other]

Lightweight Dual-channel Target Speaker Separation for Mobile Voice Communication

Authors: Yuanyuan Bao, Yanze Xu, Na Xu, Wenjing Yang, Hongfeng Li, Shicong Li, Yongtao Jia, Fei Xiang, Jincheng He, Ming Li

Abstract: Nowadays, there is a strong need to deploy the target speaker separation (TSS) model on mobile devices with a limitation of the model size and computational complexity. To better perform TSS for mobile voice communication, we first make a dual-channel dataset based on a specific scenario, LibriPhone. Specifically, to better mimic the real-case scenario, instead of simulating from the single-channel dataset, LibriPhone is made by simultaneously replaying pairs of utterances from LibriSpeech by two professional artificial heads and recording by two built-in microphones of the mobile. Then, we propose a lightweight time-frequency domain separation model, LSTM-Former, which is based on the LSTM framework with scale-invariant source-to-noise ratio (SI-SNR) loss. For the experiments on LibriPhone, we explore the dual-channel LSTM-Former model and a single-channel version by a random single channel of LibriPhone. Experimental results show that the dual-channel LSTM-Former outperforms the single-channel LSTM-Former with a relative 25% improvement. This work provides a feasible solution for the TSS task on mobile devices: playing back and recording multiple data sources in real application scenarios to obtain dual-channel real data can help the lightweight model achieve higher performance.

Submitted 5 June, 2021; originally announced June 2021.

arXiv:2101.09401 [pdf]

Adaptively Sparse Regularization for Blind Image Restoration

Authors: Ningshan Xu

Abstract: Image quality is the basis of image communication and understanding tasks. Due to the blur and noise effects caused by imaging, transmission and other processes, the image quality is degraded. Blind image restoration is widely used to improve image quality, where the main goal is to faithfully estimate the blur kernel and the latent sharp image. In this study, based on experimental observation and research, an adaptively sparse regularized minimization method is originally proposed. The high-order gradients combine with low-order ones to form a hybrid regularization term, and an adaptive operator derived from the image entropy is introduced to maintain a good convergence. Extensive experiments were conducted on different blur kernels and images. Compared with existing state-of-the-art blind deblurring methods, our method demonstrates superiority on the recovery accuracy.

Submitted 22 January, 2021; originally announced January 2021.

Comments: 10 pages, 5 figures, 3 tables

arXiv:2010.06803 [pdf, other]

Tire Slip Angle Estimation based on the Intelligent Tire Technology

Authors: Nan Xu, Yanjun Huang, Hassan Askari, Zepeng Tang

Abstract: Tire slip angle is a vital parameter in tire/vehicle dynamics and control. This paper proposes an accurate estimation method that fuses intelligent tire technology with machine-learning techniques. The intelligent tire is equipped with MEMS accelerometers attached to its inner liner. First, we describe the intelligent tire system along with the implemented testing apparatus. Second, experimental results under different loading and velocity conditions are provided. Then, we show the data processing procedure, whose output is used to train three different machine learning techniques to estimate tire slip angles. The results show that the machine learning techniques, especially in the frequency domain, can accurately estimate tire slip angles up to 10 degrees. More importantly, with accurate tire slip angle estimation, all other states and parameters can be easily and precisely obtained, which is significant for advanced vehicle control. This study therefore has a high potential to improve vehicle safety, especially in extreme maneuvers.

Submitted 15 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

arXiv:2010.06299 [pdf, other]

doi 10.1109/TITS.2020.3038155

Tire Force Estimation in Intelligent Tires Using Machine Learning

Authors: Nan Xu, Hassan Askari, Yanjun Huang, Jianfeng Zhou, Amir Khajepour

Abstract: The concept of intelligent tires has drawn the attention of researchers in the areas of autonomous driving, advanced vehicle control, and artificial intelligence. The focus of this paper is on intelligent tires and the application of machine learning techniques to tire force estimation. We present an intelligent tire system with a tri-axial acceleration sensor, which is installed on the inner liner of the tire, and Neural Network techniques for real-time processing of the sensor data. The accelerometer is capable of measuring the acceleration in the x, y, and z directions. When the accelerometer enters the tire contact patch, it starts generating signals until it fully leaves it. Simultaneously, the actual tire forces are measured using an MTS Flat-Trac test platform. Signals generated by the accelerometer and the MTS Flat-Trac testing system are used for training three different machine learning techniques with the purpose of online prediction of tire forces. It is shown that the developed intelligent tire in conjunction with machine learning is effective in accurately predicting tire forces under different driving conditions. The results presented in this work will open a new avenue of research in the area of intelligent tires, vehicle systems, and tire force estimation.

Submitted 11 December, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

Comments: 10 pages, 20 figures. Accepted for publication in IEEE Transactions on Intelligent Transportation Systems; available at https://ieeexplore.ieee.org/document/9284471

arXiv:2009.12463 [pdf, other]

Lateral Force Prediction using Gaussian Process Regression for Intelligent Tire Systems

Authors: Bruno Henrique Groenner Barbosa, Nan Xu, Hassan Askari, Amir Khajepour

Abstract: Understanding the dynamic behavior of tires and their interactions with the road plays an important role in designing integrated vehicle control strategies. Accordingly, access to reliable information about the tire-road interactions through tire-embedded sensors is in high demand for developing enhanced vehicle control systems. Thus, the main objectives of the present research work are i. to analyze data from an experimental accelerometer-based intelligent tire acquired over a wide range of maneuvers, with different vertical loads, velocities, and high slip angles; and ii. to develop a lateral force predictor based on a machine learning tool, more specifically the Gaussian Process Regression (GPR) technique. It is shown that the proposed intelligent tire system can provide reliable information about the tire-road interactions even in the case of high slip angles. In addition, the lateral force model based on GPR can predict forces with acceptable accuracy and provide uncertainty levels that can be very useful for designing vehicle control strategies.

Submitted 25 September, 2020; originally announced September 2020.

arXiv:2009.11737 [pdf, other]

doi 10.1145/3356590.3356844

A New Dataset for Amateur Vocal Percussion Analysis

Authors: Alejandro Delgado, SKoT McDonald, Ning Xu, Mark Sandler

Abstract: The imitation of percussive instruments via the human voice is a natural way for us to communicate rhythmic ideas and, for this reason, it attracts the interest of music makers. Specifically, the automatic mapping of these vocal imitations to their emulated instruments would allow creators to realistically prototype rhythms in a faster way. The contribution of this study is two-fold. Firstly, a new Amateur Vocal Percussion (AVP) dataset is introduced to investigate how people with little or no experience in beatboxing approach the task of vocal percussion. The end goal of this analysis is to help mapping algorithms generalise better between subjects and achieve higher performance. The dataset comprises a total of 9780 utterances recorded by 28 participants with fully annotated onsets and labels (kick drum, snare drum, closed hi-hat and opened hi-hat). Secondly, we conducted baseline experiments on audio onset detection with the recorded dataset, comparing the performance of four state-of-the-art algorithms in a vocal percussion context.

Submitted 24 September, 2020; originally announced September 2020.

arXiv:2007.09923 [pdf, other]

Incorporating Reinforced Adversarial Learning in Autoregressive Image Generation

Authors: Kenan E. Ak, Ning Xu, Zhe Lin, Yilin Wang

Abstract: Autoregressive models recently achieved results comparable to state-of-the-art Generative Adversarial Networks (GANs) with the help of Vector Quantized Variational AutoEncoders (VQ-VAE). However, autoregressive models have several limitations, such as exposure bias and a training objective that does not guarantee visual fidelity. To address these limitations, we propose to use Reinforced Adversarial Learning (RAL) based on policy gradient optimization for autoregressive models. By applying RAL, we enable a similar process for training and testing to address the exposure bias issue. In addition, visual fidelity is further optimized with an adversarial loss inspired by their strong counterparts, GANs. Due to the slow sampling speed of autoregressive models, we propose to use partial generation for faster training. RAL also empowers the collaboration between different modules of the VQ-VAE framework. To the best of our knowledge, the proposed method is the first to enable adversarial learning in autoregressive models for image generation. Experiments on synthetic and real-world datasets show improvements over the MLE-trained models. The proposed method improves both negative log-likelihood (NLL) and Fréchet Inception Distance (FID), which indicates improvements in terms of visual quality and diversity. The proposed method achieves state-of-the-art results on CelebA for 64 $\times$ 64 image resolution, showing promise for large-scale image generation.

Submitted 20 July, 2020; originally announced July 2020.

Comments: Accepted to ECCV 2020

arXiv:1911.02399 [pdf]

Dynamic Energy Beacon: An Adaptive and Cost-effective Energy Harvesting and Power Management System for A Better Life

Authors: Nan Xu, Xiao Qiu, Bo Xu, Junyuan Shu, Ka Ho Wan

Abstract: In this proposal, a cost-effective energy harvesting and management system is proposed. The regular power stays around 200 W, while the peak power can reach 300 W. The cost of this system satisfies the requirements and budget of residents who live off-grid in rural areas. It could be a potential solution to the global energy crisis, particularly for the billions of people living in severe energy poverty. It is also an important renewable alternative to conventional fossil fuel electricity generation: not only is the manufacturing cost low and the efficiency high, but it is also safe and eco-friendly.

Submitted 5 November, 2019; originally announced November 2019.

Comments: Entered the Pacific-Asia regional session of IEEE "Empower A Billion Lives" Contest in 2018 PEAC

arXiv:1909.13622 [pdf]

Investigation of On-Chip Inductors for Fully Integrated DC-DC Converters

Authors: Nan Xu, Wing-Hung Ki

Abstract: On-silicon inductors using a bulk 0.18 μm CMOS process have been designed. By shunting different metal layers in parallel, inductor values and quality factors were simulated. Selected inductors were then employed in two open-loop buck converters for comparison: the first used an off-chip discrete diode, and the second used an on-chip active diode. All inductors and converters were sent for fabrication. The fabricated inductors were then measured to have values more than 250 nH over a wide range of frequency, validating the simulation results. The buck converters were switched at 30 MHz with a fixed duty ratio of 0.5, to generate an output voltage of 1.2 V from an input voltage of 2.4 V. The peak efficiency was measured to be 69.1% for a light load current of 12.9 mA.

Submitted 20 August, 2019; originally announced September 2019.

Comments: Submitted to IEEE APCCAS 2019

arXiv:1907.02477 [pdf, other]

Adversarial Attacks in Sound Event Classification

Authors: Vinod Subramanian, Emmanouil Benetos, Ning Xu, SKoT McDonald, Mark Sandler

Abstract: Adversarial attacks refer to a set of methods that perturb the input to a classification model in order to fool the classifier. In this paper we apply different gradient based adversarial attack algorithms on five deep learning models trained for sound event classification. Four of the models use mel-spectrogram input and one model uses raw audio input. The models represent standard architectures such as convolutional, recurrent and dense networks. The dataset used for training is the Freesound dataset released for task 2 of the DCASE 2018 challenge and the models used are from participants of the challenge who open sourced their code. Our experiments show that adversarial attacks can be generated with high confidence and low perturbation. In addition, we show that the adversarial attacks are very effective across the different models.

Submitted 15 August, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

Comments: Fixed Freesound data reference to FSDKaggle2018

arXiv:1903.04124 [pdf, other]

Singing voice conversion with non-parallel data

Authors: Xin Chen, Wei Chu, Jinxi Guo, Ning Xu

Abstract: Singing voice conversion is the task of converting a song sung by a source singer to the voice of a target singer. In this paper, we propose using a parallel-data-free, many-to-one voice conversion technique on singing voices. A phonetic posterior feature is first generated by decoding singing voices through a robust Automatic Speech Recognition (ASR) engine. Then, a trained Recurrent Neural Network (RNN) with a Deep Bidirectional Long Short Term Memory (DBLSTM) structure is used to model the mapping from person-independent content to the acoustic features of the target person. F0 and aperiodicity are obtained from the original singing voice and used with the acoustic features to reconstruct the target singing voice through a vocoder. In the resulting singing voice, the target and source singers sound similar. To our knowledge, this is the first study that uses non-parallel data to train a singing voice conversion system. Subjective evaluations demonstrate that the proposed method effectively converts singing voices.

Submitted 11 March, 2019; originally announced March 2019.

Comments: Accepted to MIPR 2019

arXiv:1810.07309 [pdf, other]

Deep neural network based i-vector mapping for speaker verification using short utterances

Authors: Jinxi Guo, Ning Xu, Kailun Qian, Yang Shi, Kaiyuan Xu, Yingnian Wu, Abeer Alwan

Abstract: Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector based systems have become the standard in speaker verification applications, but they are less effective with short utterances. In this paper, we first compare two state-of-the-art universal background model training methods for i-vector modeling using full-length and short utterance evaluation tasks. The two methods are Gaussian mixture model (GMM) based and deep neural network (DNN) based methods. The results indicate that the I-vector_DNN system outperforms the I-vector_GMM system under various durations. However, the performance of both systems degrades significantly as the duration of the utterances decreases. To address this issue, we propose two novel nonlinear mapping methods which train DNN models to map the i-vectors extracted from short utterances to their corresponding long-utterance i-vectors. The mapped i-vector can restore missing information and reduce the variance of the original short-utterance i-vectors. Both proposed methods model the joint representation of short and long utterance i-vectors by using an autoencoder. Experimental results using the NIST SRE 2010 dataset show that both methods provide significant improvement and result in a maximum of 28.43% relative improvement in Equal Error Rates over a baseline system, when using a deep encoder with residual blocks and adding an additional phoneme vector. When further testing the best-validated models of SRE10 on the Speaker In The Wild dataset, the methods result in a 23.12% improvement on arbitrary-duration (1-5 s) short-utterance conditions.

Submitted 16 October, 2018; originally announced October 2018.

Comments: Submitted to Speech Communication; under final review

arXiv:1803.05947 [pdf, ps, other]

Control Inversion: A Clustering-Based Method for Distributed Wide-Area Control of Power Systems

Authors: Nan Xue, Aranya Chakrabortty

Abstract: Wide-area control (WAC) has been shown to be an effective tool for damping low-frequency oscillations in power systems. In the current state of the art, WAC is challenged by two main factors - namely, scalability of design and complexity of implementation. In this paper we present a control design called control inversion that bypasses both of these challenges using the idea of clustering. The basic philosophy behind this method is to project the original power system model into a lower-dimensional state-space through clustering and aggregation of generator states, and then design an LQR controller for the lower-dimensional model. This controller is finally projected back to the original coordinates for wide-area implementation. The main problem is, therefore, posed as finding the projection which best matches the closed-loop performance of the WAC controller with that of a reference LQR controller for damping low-frequency oscillations. We verify the effectiveness of the proposed design using the NPCC 48-machine power system model.

Submitted 15 March, 2018; originally announced March 2018.

Comments: Submitted to IEEE Transactions on Control of Network Systems

arXiv:1711.10067 [pdf, other]

WSNet: Compact and Efficient Networks Through Weight Sampling

Authors: Xiaojie Jin, Yingzhen Yang, Ning Xu, Jianchao Yang, Nebojsa Jojic, Jiashi Feng, Shuicheng Yan

Abstract: We present a new approach and a novel architecture, termed WSNet, for learning compact and efficient deep neural networks. Existing approaches conventionally learn full model parameters independently and then compress them via ad hoc processing such as model pruning or filter factorization. Alternatively, WSNet proposes learning model parameters by sampling from a compact set of learnable parameters, which naturally enforces parameter sharing throughout the learning process. We demonstrate that such a novel weight sampling approach (and induced WSNet) promotes both weights and computation sharing favorably. By employing this method, we can more efficiently learn much smaller networks with competitive performance compared to baseline networks with equal numbers of convolution filters. Specifically, we consider learning compact and efficient 1D convolutional neural networks for audio classification. Extensive experiments on multiple audio classification datasets verify the effectiveness of WSNet. Combined with weight quantization, the resulting models are up to 180 times smaller and theoretically up to 16 times faster than the well-established baselines, without noticeable performance drop.

Submitted 22 May, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

Comments: To appear at ICML 2018

arXiv:1703.07393 [pdf, ps, other]

Hierarchical H2 Control of Large-Scale Network Dynamic Systems

Authors: Nan Xue, Aranya Chakrabortty

Abstract: Standard H2 optimal control of networked dynamic systems tends to become unscalable with network size. Structural constraints can be imposed on the design to counteract this problem, albeit at the risk of making the solution non-convex. In this paper, we present a special class of structural constraints such that the H2 design satisfies a quadratic invariance condition, and therefore can be reformulated as a convex problem. This special class consists of structured and weighted projections of the input and output spaces. The choice of these projections can be optimized to match the closed-loop performance of the reformulated controller with that of the standard H2 controller. The advantage is that, unlike the latter, the reformulated controller results in a hierarchical implementation which requires significantly fewer communication links, while also admitting model and controller reduction that helps the design to scale computationally. We illustrate our design with simulations of a 500-node network.

Submitted 26 September, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

Comments: Submitted to 2018 American Control Conference

arXiv:1609.05265 [pdf, ps, other]

Optimal Control of Large-Scale Networks using Clustering Based Projections

Authors: Nan Xue, Aranya Chakrabortty

Abstract: In this paper we present a set of projection-based designs for constructing simplified linear quadratic regulator (LQR) controllers for large-scale network systems. When such systems have tens of thousands of states, the design of conventional LQR controllers becomes numerically challenging, and their implementation requires a large number of communication links. Our proposed algorithms bypass these difficulties by clustering the system states using structural properties of its closed-loop transfer matrix. The assignment of clusters is defined through a structured projection matrix P, which leads to a significantly lower-dimensional LQR design. The reduced-order controller is finally projected back to the original coordinates via an inverse projection. The problem is, therefore, posed as a model matching problem of finding the optimal set of clusters or P that minimizes the H2-norm of the error between the transfer matrix of the full-order network with the full-order LQR and that with the projected LQR. We derive a tractable relaxation for this model matching problem, and design a P that solves the relaxation. The design is shown to be implementable by a convenient, hierarchical two-layer control architecture, requiring far fewer communication links than full-order LQR.

Submitted 4 October, 2017; v1 submitted 16 September, 2016; originally announced September 2016.

Comments: Submitted to Transactions on Automatic Control, Oct 4, 2016. (under review)

Showing 1–30 of 30 results for author: Xu, N