Search | arXiv e-print repository

doi 10.13140/RG.2.2.24505.17769

SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning

Authors: Jianye Xu, Pan Hu, Bassam Alrifaee

Abstract: This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both sample efficiency and generalization of multi-agent Reinforcement Learning (RL) for motion planning of connected and automated vehicles. Most RL agents exhibit a limited capacity to generalize, often focusing narrowly on specific scenarios, and are usually evaluated in similar or even the same scenarios seen during training. Various methods have been proposed to address these challenges, including experience replay and regularization. However, how observation design in RL affects sample efficiency and generalization remains an under-explored area. We address this gap by proposing five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios. We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout. Incorporating these information-dense observations reduces training times to under one hour on a single CPU, and the evaluation results reveal that our RL agents can effectively zero-shot generalize. Code: github.com/cas-lab-munich/SigmaRL

Submitted 14 August, 2024; originally announced August 2024.

Comments: 8 pages, 5 figures, accepted for presentation at the IEEE International Conference on Intelligent Transportation Systems (ITSC) 2024

arXiv:2408.04737 [pdf, other]

Quantifying the Corpus Bias Problem in Automatic Music Transcription Systems

Authors: Lukáš Samuel Marták, Patricia Hu, Gerhard Widmer

Abstract: Automatic Music Transcription (AMT) is the task of recognizing notes in audio recordings of music. The State-of-the-Art (SotA) benchmarks have been dominated by deep learning systems. Due to the scarcity of high quality data, they are usually trained and evaluated exclusively or predominantly on classical piano music. Unfortunately, that hinders our ability to understand how they generalize to other music. Previous works have revealed several aspects of memorization and overfitting in these systems. We identify two primary sources of distribution shift: the music, and the sound. Complementing recent results on the sound axis (i.e. acoustics, timbre), we investigate the musical one (i.e. note combinations, dynamics, genre). We evaluate the performance of several SotA AMT systems on two new experimental test sets which we carefully construct to emulate different levels of musical distribution shift. Our results reveal a stark performance gap, shedding further light on the Corpus Bias problem, and the extent to which it continues to trouble these systems.

Submitted 8 August, 2024; originally announced August 2024.

Comments: 2 pages, 1 figure, presented in the 1st International Workshop on Sound Signal Processing Applications (IWSSPA) 2024

arXiv:2408.03124 [pdf, other]

Closed-loop Diffusion Control of Complex Physical Systems

Authors: Long Wei, Haodong Feng, Peiyan Hu, Tao Zhang, Yuchen Yang, Xiang Zheng, Ruiqi Feng, Dixia Fan, Tailin Wu

Abstract: The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods face challenges in handling closed-loop control, which is an inherent constraint for effective control of complex physical systems. In this paper, we propose a Closed-Loop Diffusion method for Physical systems Control (CL-DiffPhyCon). By adopting an asynchronous denoising schedule for different time steps, CL-DiffPhyCon generates control signals conditioned on real-time feedback from the environment. Thus, CL-DiffPhyCon is able to speed up diffusion control methods in a closed-loop framework. We evaluate CL-DiffPhyCon on the 1D Burgers' equation control and 2D incompressible fluid control tasks. The results demonstrate that CL-DiffPhyCon achieves notable control performance with significant sampling acceleration.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2407.11620 [pdf]

A Deep Learning-Based Target Radial Length Estimation Method through HRRP Sequence

Authors: Lingfeng Chen, Panhe Hu, Zhiliang Pan, Xiao Sun, Zehao Wang

Abstract: This paper introduces an innovative deep learning-based method for end-to-end target radial length estimation from HRRP (High Resolution Range Profile) sequences. Firstly, the HRRP sequences are normalized and transformed into GAF (Gram Angular Field) images to effectively capture and utilize the temporal information. Subsequently, these GAF images serve as the input for a pretrained ResNet-101 model, which is then fine-tuned for target radial length estimation. The simulation results show that compared to traditional threshold method and simple networks e.g. one-dimensional CNN (Convolutional Neural Network), the proposed method demonstrates superior noise resistance and higher accuracy under low SNR (Signal-to-Noise Ratio) conditions.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 2 pages, 2 figures. Accepted by APCAP 2024
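
Note: The abstract above mentions normalizing a sequence and transforming it into a GAF image. As a rough illustration only, the sketch below computes the summation variant of a Gramian Angular Field for a single 1D series in plain Python. The abstract does not say which GAF variant the authors use or how an HRRP sequence is mapped to the input series, so the function name, the min-max rescaling to [-1, 1], and the sample values are all assumptions made for this example.

```python
import math


def gramian_angular_summation_field(series):
    """Return the summation-variant GAF of a 1D series as a list of rows.

    Hypothetical helper for illustration; not taken from the paper.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Min-max rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in series]
    # Clamp against floating-point overshoot, then map each value to an angle.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    # Entry (i, j) is cos(phi_i + phi_j), which encodes pairwise temporal relations.
    return [[math.cos(a + b) for b in phi] for a in phi]


# Made-up amplitudes standing in for one normalized range profile.
profile = [0.1, 0.4, 0.9, 0.3]
gaf = gramian_angular_summation_field(profile)
```

The resulting square matrix can be treated as a single-channel image; feeding it to an image model such as a ResNet would additionally require resizing and channel replication, which are not shown here.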

arXiv:2407.08236 [pdf, other]

HRRPGraphNet: A Graph Neural Network Based Approach for HRRP Radar Target Recognition

Authors: Lingfeng Chen, Panhe Hu, Zhiliang Pan, Xiao Sun, Zehao Wang

Abstract: High Resolution Range Profiles (HRRP) have become a key area of focus in the domain of Radar Automatic Target Recognition (RATR). Despite the success of data-driven neural network-based HRRP recognition, challenges such as insufficient training samples persist in its real-world application. This letter introduces HRRPGraphNet, a novel Graph Neural Network (GNN) model designed specifically for HRRP target recognition that leverages new insights to address these challenges. A pivotal innovation is the transformation of HRRP data into a graph structure, utilizing a range cell amplitude-based node vector and a range-relative adjacency matrix. This graph-based approach facilitates both local feature extraction via one-dimensional convolution layers and global feature extraction through a graph convolution layer, capitalizing on the intrinsic relationships between range cells which is a distinct advantage over existing sequence-based methods. Experiments on the aircraft electromagnetic simulation dataset and the measured dataset have confirmed HRRPGraphNet's superior accuracy and robustness, particularly in fewer training sample environments, underscoring the potential of graph-driven innovations in HRRP-based RATR.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 5 pages, 4 figures

arXiv:2406.15160 [pdf, other]

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Authors: Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee

Abstract: This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swapping (VPS) technique to extend an audio channel swapping (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted by ICME 2024

arXiv:2406.08454 [pdf, other]

Towards Musically Informed Evaluation of Piano Transcription Models

Authors: Patricia Hu, Lukáš Samuel Marták, Carlos Cancino-Chacón, Gerhard Widmer

Abstract: Automatic piano transcription models are typically evaluated using simple frame- or note-wise information retrieval (IR) metrics. Such benchmark metrics do not provide insights into the transcription quality of specific musical aspects such as articulation, dynamics, or rhythmic precision of the output, which are essential in the context of expressive performance analysis. Furthermore, in recent years, MAESTRO has become the de-facto training and evaluation dataset for such models. However, inference performance has been observed to deteriorate substantially when applied on out-of-distribution data, thereby questioning the suitability and reliability of transcribed outputs from such models for specific MIR tasks. In this work, we investigate the performance of three state-of-the-art piano transcription models in two experiments. In the first one, we propose a variety of musically informed evaluation metrics which, in contrast to the IR metrics, offer more detailed insight into the musical quality of the transcriptions. In the second experiment, we compare inference performance on real-world and perturbed audio recordings, and highlight musical dimensions which our metrics can help explain. Our experimental results highlight the weaknesses of existing piano transcription metrics and contribute to a more musically sound error analysis of transcription outputs.

Submitted 29 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

arXiv:2405.15438 [pdf, other]

Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbiased estimation of forest AGB at high resolution, particularly in dense and tall forests, where Synthetic Aperture Radar (SAR) and passive optical data exhibit saturation. However, GEDI is a sampling instrument, collecting dispersed footprints, and its data must be combined with that from other continuous cover satellites to create high-resolution maps, using local machine learning methods. In this study, we developed local models to estimate forest AGB from GEDI L2A data, as the models used to create GEDI L4 AGB data incorporated minimal field data from China. We then applied LightGBM and random forest regression to generate wall-to-wall AGB maps at 25 m resolution, using extensive GEDI footprints as well as Sentinel-1 data, ALOS-2 PALSAR-2 and Sentinel-2 optical data. Through a 5-fold cross-validation, LightGBM demonstrated a slightly better performance than Random Forest across two contrasting regions. However, in both regions, the computation speed of LightGBM is substantially faster than that of the random forest model, requiring roughly one-third of the time to compute on the same hardware. Through the validation against field data, the 25 m resolution AGB maps generated using the local models developed in this study exhibited higher accuracy compared to the GEDI L4B AGB data. We found in both regions an increase in error as slope increased. The trained models were tested on nearby but different regions and exhibited good performance.

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2312.04795 [pdf, other]

Latency versus Transmission Power Trade-off in Free-Space Optical (FSO) Satellite Networks with Multiple Inter-Continental Connections

Authors: Jintao Liang, Aizaz Chaudhry, John Chinneck, Halim Yanikomeroglu, Gunes Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

Abstract: In free-space optical satellite networks (FSOSNs), satellites connected via laser inter-satellite links (LISLs), latency is a critical factor, especially for long-distance inter-continental connections. Since satellites depend on solar panels for power supply, power consumption is also a vital factor. We investigate the minimization of total network latency (i.e., the sum of the network latencies of all inter-continental connections in a time slot) in a realistic model of a FSOSN, the latest version of the Starlink Phase 1 Version 3 constellation. We develop mathematical formulations of the total network latency over different LISL ranges and different satellite transmission power constraints for multiple simultaneous inter-continental connections. We use practical system models for calculating network latency and satellite optical link transmission power, and we formulate the problem as a binary integer linear program. The results reveal that, for satellite transmission power limits set at 0.5 W, 0.3 W, and 0.1 W, the average total network latency for all five inter-continental connections studied in this work levels off at 339 ms, 361 ms, and 542 ms, respectively. Furthermore, the corresponding LISL ranges required to achieve these average total network latency values are 4500 km, 3000 km, and 1731 km, respectively. Different limitations on satellite transmission power exhibit varying effects on average total network latency (over 100 time slots), and they also induce differing changes in the corresponding LISL ranges. In the absence of satellite transmission power constraints, as the LISL range extends from the minimum feasible range of 1575 km to the maximum feasible range of 5016 km, the average total network latency decreases from 589 ms to 311 ms.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Open Journal of the Communications Society

arXiv:2312.04788 [pdf, other]

Free-Space Optical (FSO) Satellite Networks Performance Analysis: Transmission Power, Latency, and Outage Probability

Authors: Jintao Liang, Aizaz U. Chaudhry, Eylem Erdogan, Halim Yanikomeroglu, Gunes Karabulut Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

Abstract: In free-space optical satellite networks (FSOSNs), satellites can have different laser inter-satellite link (LISL) ranges for connectivity. Greater LISL ranges can reduce network latency of the path but can also result in an increase in transmission power for satellites on the path. Consequently, this tradeoff between satellite transmission power and network latency should be investigated, and in this work we examine it in FSOSNs drawing on the Starlink Phase 1 Version 3 and Kuiper Shell 2 constellations for different LISL ranges and different inter-continental connections. We use appropriate system models for calculating the average satellite transmission power and network latency. The results show that the mean network latency decreases and mean average satellite transmission power increases with an increase in LISL range. For the Toronto--Sydney inter-continental connection in an FSOSN with Starlink's Phase 1 Version 3 constellation, when the LISL range is approximately 2,900 km, the mean network latency and mean average satellite transmission power intersect are approximately 135 ms and 380 mW, respectively. For an FSOSN with the Kuiper Shell 2 constellation in this inter-continental connection, this LISL range is around 3,800 km, and the two parameters are approximately 120 ms and 700 mW, respectively. For the Toronto--Istanbul and Toronto--London inter-continental connections, the LISL ranges at the intersection are different and vary from 2,600 km to 3,400 km. Furthermore, we analyze outage probability performance of optical uplink/downlink due to atmosphere attenuation and turbulence.

Submitted 7 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Open Journal of Vehicular Technology

arXiv:2311.07062 [pdf, other]

doi 10.1109/TASLP.2023.3332542

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Authors: Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie

Abstract: Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling multi-accent scenarios, current multi-task ASR-AR approaches overlook the granularity differences between tasks. Fine-grained units capture pronunciation-related accent characteristics, while coarse-grained units are better for learning linguistic information. Moreover, an explicit interaction of two tasks can also provide complementary information and improve the performance of each other, but it is rarely used by existing approaches. In this paper, we propose a novel Decoupling and Interacting Multi-task Network (DIMNet) for joint speech and accent recognition, which is comprised of a connectionist temporal classification (CTC) branch, an AR branch, an ASR branch, and a bottom feature encoder. Specifically, AR and ASR are first decoupled by separated branches and two-granular modeling units to learn task-specific representations. The AR branch is from our previously proposed linguistic-acoustic bimodal AR model and the ASR branch is an encoder-decoder based Conformer model. Then, for the task interaction, the CTC branch provides aligned text for the AR task, while accent embeddings extracted from our AR model are incorporated into the ASR branch's encoder and decoder. Finally, during ASR inference, a cross-granular rescoring method is introduced to fuse the complementary information from the CTC and attention decoder after the decoupling. Our experiments on English and Chinese datasets demonstrate the effectiveness of the proposed model, which achieves 21.45%/28.53% AR accuracy relative improvement and 32.33%/14.55% ASR error rate relative reduction over a published standard baseline, respectively.

Submitted 17 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP)

arXiv:2309.07925 [pdf, other]

doi 10.1145/3581783.3612859

Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for emotion classification and valence regression in the decoding stage. A multi-task loss based on uncertainty is also designed to optimize the whole process. Finally, by combining three different structures on the posterior probability level, we obtain the final predictions of discrete and dimensional emotions. When tested on the dataset of multimodal emotion recognition challenge (MER 2023), the proposed framework yields consistent improvements in both emotion classification and valence regression. Our final system achieves state-of-the-art performance and ranks third on the leaderboard on MER-MULTI sub-challenge.

Submitted 10 September, 2023; originally announced September 2023.

Comments: 5 pages, 4 figures

Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

arXiv:2309.02399 [pdf, other]

The Batik-plays-Mozart Corpus: Linking Performance to Score to Musicological Annotations

Authors: Patricia Hu, Gerhard Widmer

Abstract: We present the Batik-plays-Mozart Corpus, a piano performance dataset combining professional Mozart piano sonata performances with expert-labelled scores at a note-precise level. The performances originate from a recording by Viennese pianist Roland Batik on a computer-monitored Bösendorfer grand piano, and are available both as MIDI files and audio recordings. They have been precisely aligned, note by note, with a current standard edition of the corresponding scores (the New Mozart Edition) in such a way that they can further be connected to the musicological annotations (harmony, cadences, phrases) on these scores that were recently published by Hentschel et al. (2021). The result is a high-quality, high-precision corpus mapping scores and musical structure annotations to precise note-level professional performance information. As the first of its kind, it can serve as a valuable resource for studying various facets of expressive performance and their relationship with structural aspects. In the paper, we outline the curation process of the alignment and conduct two exploratory experiments to demonstrate its usefulness in analyzing expressive performance.

Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: To be published in the Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

arXiv:2306.14471 [pdf]

Single-shot 3D photoacoustic computed tomography with a densely packed array for transcranial functional imaging

Authors: Rui Cao, Yilin Luo, Jinhua Xu, Xiaofei Luo, Ku Geng, Yousuf Aborahama, Manxiu Cui, Samuel Davis, Shuai Na, Xin Tong, Cindy Liu, Karteek Sastry, Konstantin Maslov, Peng Hu, Yide Zhang, Li Lin, Yang Zhang, Lihong V. Wang

Abstract: Photoacoustic computed tomography (PACT) is emerging as a new technique for functional brain imaging, primarily due to its capabilities in label-free hemodynamic imaging. Despite its potential, the transcranial application of PACT has encountered hurdles, such as acoustic attenuations and distortions by the skull and limited light penetration through the skull. To overcome these challenges, we have engineered a PACT system that features a densely packed hemispherical ultrasonic transducer array with 3072 channels, operating at a central frequency of 1 MHz. This system allows for single-shot 3D imaging at a rate equal to the laser repetition rate, such as 20 Hz. We have achieved a single-shot light penetration depth of approximately 9 cm in chicken breast tissue utilizing a 750 nm laser (withstanding 3295-fold light attenuation and still retaining an SNR of 74) and successfully performed transcranial imaging through an ex vivo human skull using a 1064 nm laser. Moreover, we have proven the capacity of our system to perform single-shot 3D PACT imaging in both tissue phantoms and human subjects. These results suggest that our PACT system is poised to unlock potential for real-time, in vivo transcranial functional imaging in humans.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.09397 [pdf, other]

Non-Asymptotic Performance of Social Machine Learning Under Limited Data

Authors: Ping Hu, Virginia Bordignon, Mert Kayaalp, Ali H. Sayed

Abstract: This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. In this work, we examine the classification task with limited observations during the decision-making phase, which requires a non-asymptotic performance analysis. We establish a condition for consistent training and derive an upper bound on the probability of error for classification. The results clarify the dependence on the statistical properties of the data and the combination policy used over the graph. They also establish the exponential decay of the probability of error with respect to the number of unlabeled samples.

Submitted 9 July, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

arXiv:2304.12939 [pdf, other]

The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist

Authors: Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Nimrod Varga, Gerhard Widmer

Abstract: This paper introduces the ACCompanion, an expressive accompaniment system. Similarly to a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise, while still a reliable musical partner, robust to possible performance errors and responsive to expressive variations.

Submitted 30 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

Comments: In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23), Macao, China. The differences/extensions with the previous version include a technical appendix, added missing links, and minor text updates. 10 pages, 4 figures

arXiv:2303.12883 [pdf, other]

HAPS-UAV-Enabled Heterogeneous Networks: A Deep Reinforcement Learning Approach

Authors: Atefeh H. Arani, Peng Hu, Yeying Zhu

Abstract: The integrated use of non-terrestrial network (NTN) entities such as the high-altitude platform station (HAPS) and low-altitude platform station (LAPS) has become an essential element of space-air-ground integrated networks (SAGINs). However, the complexity, mobility, and heterogeneity of NTN entities and resources present various challenges from system design to deployment. This paper proposes a novel approach to designing a heterogeneous network consisting of HAPSs and unmanned aerial vehicles (UAVs) acting as LAPS entities. Our approach involves jointly optimizing the three-dimensional trajectory and channel allocation for aerial base stations, with a focus on ensuring fairness and the provision of quality of service (QoS) to ground users. Furthermore, we consider the load on base stations and incorporate this information into the optimization problem. The proposed approach utilizes a combination of deep reinforcement learning and fixed-point iteration techniques to determine the UAV locations and channel allocation strategies. Simulation results reveal that our proposed deep learning-based approach significantly outperforms learning-based and conventional benchmark models.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.05697 [pdf]

Quantification of cervical elasticity during pregnancy based on transvaginal ultrasound imaging and stress measurement

Authors: Peng Hu, Peinan Zhao, Yuan Qu, Konstantin Maslov, Jessica Chubiz, Methodius G. Tuuli, Molly J. Stout, Lihong V. Wang

Abstract: Objective: Strain elastography and shear wave elastography are two commonly used methods to quantify cervical elasticity; however, they have limitations. Strain elastography is effective in showing tissue elasticity distribution in a single image, but the absence of stress information causes difficulty in comparing the results acquired from different imaging sessions. Shear wave elastography is effective in measuring shear wave speed (an intrinsic tissue property correlated with elasticity) in relatively homogeneous tissue, such as in the liver. However, for inhomogeneous tissue in the cervix, the shear wave speed measurement is less robust. To overcome these limitations, we develop a quantitative cervical elastography system by adding a stress sensor to an ultrasound imaging system. Methods: In an imaging session for quantitative cervical elastography, we use the transvaginal ultrasound imaging system to record B-mode images of the cervix showing its deformation and use the stress sensor to record the probe-surface stress simultaneously. We develop a correlation-based automatic feature tracking algorithm to quantify the deformation, from which the strain is quantified. After each imaging session, we calibrate the stress sensor and transform its measurement to true stress. Applying a linear regression to the stress and strain, we obtain an approximation of the cervical Young's modulus. Results: We validate the accuracy and robustness of this elastography system using phantom experiments. Applying this system to pregnant participants, we observe significant softening of the cervix during pregnancy (p-value < 0.001) with the cervical Young's modulus decreasing 3.95% per week. We estimate that geometric mean values of cervical Young's moduli during the first (11 to 13 weeks), second, and third trimesters are 13.07 kPa, 7.59 kPa, and 4.40 kPa, respectively.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 26 pages, 8 figures, 1 table

arXiv:2302.13130 [pdf, other]

Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

Authors: Tarasha Khurana, Peiyun Hu, David Held, Deva Ramanan

Abstract: Predicting how the world can evolve in the future is crucial for motion planning in autonomous systems. Classical methods are limited because they rely on costly human annotations in the form of semantic class labels, bounding boxes, and tracks or HD maps of cities to plan their motion and thus are difficult to scale to large unlabeled datasets. One promising self-supervised task is 3D point cloud forecasting from unannotated LiDAR sequences. We show that this task requires algorithms to implicitly capture (1) sensor extrinsics (i.e., the egomotion of the autonomous vehicle), (2) sensor intrinsics (i.e., the sampling pattern specific to the particular LiDAR sensor), and (3) the shape and motion of other objects in the scene. But autonomous systems should make predictions about the world and not their sensors. To this end, we factor out (1) and (2) by recasting the task as one of spacetime (4D) occupancy forecasting. But because it is expensive to obtain ground-truth 4D occupancy, we render point cloud data from 4D occupancy predictions given sensor extrinsics and intrinsics, allowing one to train and test occupancy algorithms with unannotated LiDAR sequences. This also allows one to evaluate and compare point cloud forecasting algorithms across diverse datasets, sensors, and vehicles.

Submitted 30 April, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

Comments: CVPR 2023. Project page: https://www.cs.cmu.edu/~tkhurana/ff4d/index.html Code: https://github.com/tarashakhurana/4d-occ-forecasting

arXiv:2302.05525 [pdf, other]

Satellite Anomaly Detection Using Variance Based Genetic Ensemble of Neural Networks

Authors: Mohammad Amin Maleki Sadr, Yeying Zhu, Peng Hu

Abstract: In this paper, we use a variance-based genetic ensemble (VGE) of Neural Networks (NNs) to detect anomalies in the satellite's historical data. We use an efficient ensemble of the predictions from multiple Recurrent Neural Networks (RNNs) by leveraging each model's uncertainty level (variance). For prediction, each RNN is guided by a Genetic Algorithm (GA) which constructs the optimal structure for each RNN model. However, finding the model uncertainty level is challenging in many cases. Although the Bayesian NNs (BNNs)-based methods are popular for providing the confidence bound of the models, they cannot be employed in complex NN structures as they are computationally intractable. This paper uses the Monte Carlo (MC) dropout as an approximation of BNNs. Then these uncertainty levels and each predictive model suggested by GA are used to generate a new model, which is then used for time series (TS) forecasting and anomaly detection (AD). Simulation results show that the forecasting and AD capability of the ensemble model outperforms existing approaches.

Submitted 10 February, 2023; originally announced February 2023.

arXiv:2301.03641 [pdf, other]

SatNetOps: Toward Multi-Layer Networking for Satellite Network Operations

Authors: Peng Hu

Abstract: Recent advancements in low-Earth-orbit (LEO) satellites aim to bring resilient, ubiquitous, and high-quality service to future Internet infrastructure. However, the soaring number of space assets, increasing dynamics of LEO satellites and expanding dimensions of network threats call for an enhanced approach to efficient satellite operations. To address these pressing challenges, we propose an approach for satellite network operations based on multi-layer satellite networking (MLSN), called "SatNetOps". Two SatNetOps schemes are proposed, referred to as LEO-LEO MLSN (LLM) and GEO-LEO MLSN (GLM). The performance of the proposed schemes is evaluated in 24-hr satellite scenarios with typical payload setups in simulations, where the key metrics such as latency and reliability are discussed with the consideration of the Consultative Committee for Space Data Systems (CCSDS) standard-compliant telemetry and telecommand missions. Although the SatNetOps approach is promising, we analyze the factors affecting the performance of the LLM and GLM schemes. The discussions on the results and conclusive remarks are made in the end.

Submitted 9 January, 2023; originally announced January 2023.

arXiv:2212.05986 [pdf, other]

A Cross-Layer Descent Approach for Resilient Network Operations of Proliferated LEO Satellites

Authors: Peng Hu

Abstract: With the proliferated low-Earth-orbit (LEO) satellites in mega-constellations, the future Internet will be able to reach any place on Earth, providing high-quality services to everyone. However, high-quality operations in terms of timeliness and resilience are lacking in the current solutions. This paper proposes a multi-layer networking approach called "Cross-Layer Descent (CLD)". Based on the proposed system model, principles, and measures, CLD can support foundational services such as telecommand (TC) transmissions for various network operation missions for LEO satellites compliant with the Consultative Committee for Space Data Systems (CCSDS) standards. The CLD approach enhances timing and resilience requirements using advanced communication payloads. From the simulation-based analysis, the proposed scheme outperforms other classical ones in resilience and latency for typical TC missions. The future work and conclusive remarks are discussed at the end.

Submitted 12 December, 2022; originally announced December 2022.

Comments: 2023 IEEE Wireless Communications and Networking Conference (WCNC), 26-29 March 2023, Glasgow, UK

arXiv:2212.04148 [pdf, other]

Relationship Quantification of Image Degradations

Authors: Wenxin Wang, Boyun Li, Yuanbiao Gou, Peng Hu, Wangmeng Zuo, Xi Peng

Abstract: In this paper, we study two challenging but less-touched problems in image restoration, namely, i) how to quantify the relationship between image degradations and ii) how to improve the performance of a specific restoration task using the quantified relationship. To tackle the first challenge, we proposed a Degradation Relationship Index (DRI) which is defined as the mean drop rate difference in the validation loss between two models which are respectively trained using the anchor degradation and the mixture of the anchor and the auxiliary degradations. Through quantifying the degradation relationship using DRI, we reveal that i) a positive DRI always predicts performance improvement by using the specific degradation as an auxiliary to train models; ii) the degradation proportion is crucial to the image restoration performance. In other words, the restoration performance is improved only if the anchor and the auxiliary degradations are mixed with an appropriate proportion. Based on the observations, we further propose a simple but effective method (dubbed DPD) to estimate whether the given degradation combinations could improve the performance on the anchor degradation with the assistance of the auxiliary degradation. Extensive experimental results verify the effectiveness of our method in dehazing, denoising, deraining, and desnowing. The code will be released after acceptance.

Submitted 5 August, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

arXiv:2212.03729 [pdf, other]

Enabling Resilient and Real-Time Network Operations in Space: A Novel Multi-Layer Satellite Networking Scheme

Authors: Peng Hu

Abstract: Recently advanced low-Earth-orbit (LEO) satellite networks represented by large constellations and advanced payloads provide great promises for enabling high-quality Internet connectivity to any place on Earth. However, the traditional access-based approach to satellite operations cannot meet the pressing requirements of real-time, reliable, and resilient operations for LEO satellites. A new scheme is proposed based on multi-layer satellite networking considering the advanced Ka-band and optical communications payloads on a satellite platform. The proposed scheme can enable efficient and resilient message transmissions for critical telecommand and telemetry missions through different layers of satellite networks, which consist of LEO, medium-Earth-orbit (MEO), and geostationary (GEO) satellites. The proposed scheme is evaluated in a 24-hr satellite mission and shows superior performance improvements compared to the traditional operations approach.

Submitted 7 December, 2022; originally announced December 2022.

Comments: Published in the Proceedings of the 2022 IEEE Latin-American Conference on Communications (LATINCOM), 30 November - 2 December 2022, Rio de Janeiro, Brazil

arXiv:2212.01042 [pdf, other]

doi 10.1109/SP46214.2022.9833716

AccEar: Accelerometer Acoustic Eavesdropping with Unconstrained Vocabulary

Authors: Pengfei Hu, Hui Zhuang, Panneer Selvam Santhalingam, Riccardo Spolaor, Parth Pathak, Guoming Zhang, Xiuzhen Cheng

Abstract: With the increasing popularity of voice-based applications, acoustic eavesdropping has become a serious threat to users' privacy. While on smartphones the access to microphones needs an explicit user permission, acoustic eavesdropping attacks can rely on motion sensors (such as accelerometer and gyroscope), to which access is unrestricted. However, previous instances of such attacks can only recognize a limited set of pre-trained words or phrases. In this paper, we present AccEar, an accelerometer-based acoustic eavesdropping attack that can reconstruct any audio played on the smartphone's loudspeaker with unconstrained vocabulary. We show that an attacker can employ a conditional Generative Adversarial Network (cGAN) to reconstruct high-fidelity audio from low-frequency accelerometer signals. The presented cGAN model learns to recreate high-frequency components of the user's voice from low-frequency accelerometer signals through spectrogram enhancement. We assess the feasibility and effectiveness of AccEar attack in a thorough set of experiments using audio from 16 public personalities. As shown by the results in both objective and subjective evaluations, AccEar successfully reconstructs user speeches from accelerometer signals in different scenarios including varying sampling rate, audio volume, device model, etc.

Submitted 2 December, 2022; originally announced December 2022.

Comments: 2022 IEEE Symposium on Security and Privacy (SP)

Journal ref: 2022 IEEE Symposium on Security and Privacy (SP)

arXiv:2211.14938 [pdf, other]

doi 10.1109/TAES.2022.3206257

An Anomaly Detection Method for Satellites Using Monte Carlo Dropout

Authors: Mohammad Amin Maleki Sadr, Yeying Zhu, Peng Hu

Abstract: Recently, there has been a significant amount of interest in satellite telemetry anomaly detection (AD) using neural networks (NN). For AD purposes, the current approaches focus on either forecasting or reconstruction of the time series, and they cannot measure the level of reliability or the probability of correct detection. Although the Bayesian neural network (BNN)-based approaches are well known for time series uncertainty estimation, they are computationally intractable. In this paper, we present a tractable approximation for BNN based on the Monte Carlo (MC) dropout method for capturing the uncertainty in the satellite telemetry time series, without sacrificing accuracy. For time series forecasting, we employ an NN, which consists of several Long Short-Term Memory (LSTM) layers followed by various dense layers. We employ the MC dropout inside each LSTM layer and before the dense layers for uncertainty estimation. With the proposed uncertainty region and by utilizing a post-processing filter, we can effectively capture the anomaly points. Numerical results show that our proposed time series AD approach outperforms the existing methods from both prediction accuracy and AD perspectives.

Submitted 27 November, 2022; originally announced November 2022.

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, 2022

arXiv:2211.14931 [pdf, other]

UAV-Assisted Space-Air-Ground Integrated Networks: A Technical Review of Recent Learning Algorithms

Authors: Atefeh H. Arani, Peng Hu, Yeying Zhu

Abstract: Recent technological advancements in space, air, and ground components have made possible a new network paradigm called space-air-ground integrated network (SAGIN). Unmanned aerial vehicles (UAVs) play a key role in SAGINs. However, due to UAVs' high dynamics and complexity, real-world deployment of a SAGIN becomes a significant barrier to realizing such SAGINs. UAVs are expected to meet key performance requirements with limited maneuverability and resources with space and terrestrial components. Therefore, employing UAVs in various usage scenarios requires well-designed planning in algorithmic approaches. This paper provides an essential review and analysis of recent learning algorithms in a UAV-assisted SAGIN. We consider possible reward functions and discuss the state-of-the-art algorithms for optimizing the reward functions, including Q-learning, deep Q-learning, multi-armed bandit, particle swarm optimization, and satisfaction-based learning algorithms. Unlike other survey papers, we focus on the methodological perspective of the optimization problem, applicable to various missions on a SAGIN. We consider real-world configurations and the 2-dimensional (2D) and 3-dimensional (3D) UAV trajectories to reflect deployment cases. Our simulations suggest the 3D satisfaction-based learning algorithm outperforms other approaches in most cases. With open challenges discussed at the end, we aim to provide design and deployment guidelines for UAV-assisted SAGINs.

Submitted 16 July, 2024; v1 submitted 27 November, 2022; originally announced November 2022.

Comments: Accepted by the IEEE Open Journal of Vehicular Technology in July 2024

arXiv:2209.04093 [pdf, other]

Learning Audio-Visual embedding for Person Verification in the Wild

Authors: Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu

Abstract: It has already been observed that audio-visual embedding is more robust than uni-modality embedding for person verification. Here, we proposed a novel audio-visual strategy that considers aggregators from a fusion perspective. First, we introduced weight-enhanced attentive statistics pooling for the first time in face verification. We find that a strong correlation exists between modalities during pooling, so joint attentive pooling is proposed which contains cycle consistency to learn the implicit inter-frame weight. Finally, each modality is fused with a gated attention mechanism to gain robust audio-visual embedding. All the proposed models are trained on the VoxCeleb2 dev dataset and the best system obtains 0.18%, 0.27%, and 0.49% EER on three official trial lists of VoxCeleb1 respectively, which is to our knowledge the best-published results for person verification.

Submitted 26 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

arXiv:2209.02205 [pdf, other]

High Speed Rotation Estimation with Dynamic Vision Sensors

Authors: Guangrong Zhao, Yiran Shen, Ning Chen, Pengfei Hu, Lei Liu, Hongkai Wen

Abstract: Rotational speed is one of the important metrics to be measured for calibrating electric motors in manufacturing, monitoring engines during car repair, detecting faults in electrical appliances, etc. However, existing measurement techniques either require prohibitive hardware (e.g., high-speed camera) or are inconvenient to use in real-world application scenarios. In this paper, we propose EV-Tach, an event-based tachometer via efficient dynamic vision sensing on mobile devices. EV-Tach is designed as a high-fidelity and convenient tachometer by introducing the dynamic vision sensor as a new sensing modality to capture high-speed rotation precisely under various real-world scenarios. By designing a series of signal processing algorithms bespoke for dynamic vision sensing on mobile devices, EV-Tach is able to extract the rotational speed accurately from the event stream produced by dynamic vision sensing on rotary targets. According to our extensive evaluations, the Relative Mean Absolute Error (RMAE) of EV-Tach is as low as 0.03%, which is comparable to the state-of-the-art laser tachometer under fixed measurement mode. Moreover, EV-Tach is robust to subtle movement of the user's hand and can therefore be used as a handheld device, where the laser tachometer fails to produce reasonable results.

Submitted 6 September, 2022; originally announced September 2022.

Comments: 10 pages, 13 figures

arXiv:2206.00248 [pdf]

Transcranial photoacoustic computed tomography of human brain function

Authors: Yang Zhang, Shuai Na, Karteekeya Sastry, Jonathan J. Russin, Peng Hu, Li Lin, Xin Tong, Kay B. Jann, Danny J. Wang, Charles Y. Liu, Lihong V. Wang

Abstract: Herein we report the first in-human transcranial imaging of brain function using photoacoustic computed tomography. Functional responses to benchmark motor tasks were imaged on both the skull-less and the skull-intact hemispheres of a hemicraniectomy patient. The observed brain responses in these preliminary results demonstrate the potential of photoacoustic computed tomography for achieving transcranial functional imaging.

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2205.12459 [pdf, other]

doi 10.1049/ipr2.12733

A CNN with Noise Inclined Module and Denoise Framework for Hyperspectral Image Classification

Authors: Zhiqiang Gong, Ping Zhong, Jiahao Qi, Panhe Hu

Abstract: Deep Neural Networks have been successfully applied in hyperspectral image classification. However, most prior works adopt general deep architectures while ignoring the intrinsic structure of the hyperspectral image, such as the physical noise generation. This would make these deep models unable to generate discriminative features and provide impressive classification performance. To leverage such intrinsic information, this work develops a novel deep learning framework with the noise inclined module and denoise framework for hyperspectral image classification. First, we model the spectral signature of the hyperspectral image with the physical noise model to describe the high intraclass variance of each class and great overlapping between different classes in the image. Then, a noise inclined module is developed to capture the physical noise within each object, followed by a denoise framework to remove such noise from the object. Finally, the CNN with noise inclined module and the denoise framework is developed to obtain discriminative features and provide good classification performance on hyperspectral images. Experiments are conducted over two commonly used real-world datasets and the experimental results show the effectiveness of the proposed method. The implementation of the proposed method and other compared methods could be accessed at https://github.com/shendu-sw/noise-physical-framework.

Submitted 24 May, 2022; originally announced May 2022.

Journal ref: IET Image Processing, 2022

arXiv:2204.04956 [pdf, other]

Segmentation Network with Compound Loss Function for Hydatidiform Mole Hydrops Lesion Recognition

Authors: Chengze Zhu, Pingge Hu, Xianxu Zeng, Xingtong Wang, Zehua Ji, Li Shi

Abstract: Pathological morphology diagnosis is the standard diagnosis method of hydatidiform mole. As a disease with malignant potential, the hydatidiform mole section of hydrops lesions is an important basis for diagnosis. Due to incomplete lesion development, early hydatidiform mole is difficult to distinguish, resulting in a low accuracy of clinical diagnosis. As a remarkable machine learning technology, image semantic segmentation networks have been used in many medical image recognition tasks. We developed a hydatidiform mole hydrops lesion segmentation model based on a novel loss function and training method. The model consists of different networks that segment the section image at the pixel and lesion levels. Our compound loss function assigns weights to the segmentation results of the two levels to calculate the loss. We then propose a stagewise training method to combine the advantages of various loss functions at different levels. We evaluate our method on a hydatidiform mole hydrops dataset. Experiments show that the proposed model with our loss function and training method has good recognition performance under different segmentation metrics.

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2204.04949 [pdf]

A Semantic Segmentation Network Based Real-Time Computer-Aided Diagnosis System for Hydatidiform Mole Hydrops Lesion Recognition in Microscopic View

Authors: Chengze Zhu, Pingge Hu, Xianxu Zeng, Xingtong Wang, Zehua Ji, Li Shi

Abstract: As a disease with malignant potential, hydatidiform mole (HM) is one of the most common gestational trophoblastic diseases. For pathologists, the HM section of hydrops lesions is an important basis for diagnosis. In pathology departments, the diverse microscopic manifestations of HM lesions and the limited view under the microscope mean that physicians with extensive diagnostic experience are required to prevent missed diagnosis and misdiagnosis. Feature extraction can significantly improve the accuracy and speed of the diagnostic process. As a remarkable diagnosis assisting technology, computer-aided diagnosis (CAD) has been widely used in clinical practice. We constructed a deep-learning-based CAD system to identify HM hydrops lesions in the microscopic view in real-time. The system consists of three modules; the image mosaic module and edge extension module process the image to improve the outcome of the hydrops lesion recognition module, which adopts a semantic segmentation network, our novel compound loss function, and a stepwise training function in order to achieve the best performance in identifying hydrops lesions. We evaluated our system using an HM hydrops dataset. Experiments show that our system is able to respond in real-time and correctly display the entire microscopic view with accurately labeled HM hydrops lesions.

Submitted 11 April, 2022; originally announced April 2022.

arXiv:2204.03398 [pdf, other]

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition

Authors: Qijie Shao, Jinghao Yan, Jian Kang, Pengcheng Guo, Xian Shi, Pengfei Hu, Lei Xie

Abstract: General accent recognition (AR) models tend to directly extract low-level information from spectrums, which always significantly overfit on speakers or channels. Considering accent can be regarded as a series of shifts relative to native pronunciation, distinguishing accents will be an easier task with accent shift as input. But due to the lack of native utterance as an anchor, estimating the accent shift is difficult. In this paper, we propose linguistic-acoustic similarity based accent shift (LASAS) for AR tasks. For an accent speech utterance, after mapping the corresponding text vector to multiple accent-associated spaces as anchors, its accent shift could be estimated by the similarities between the acoustic embedding and those anchors. Then, we concatenate the accent shift with a dimension-reduced text vector to obtain a linguistic-acoustic bimodal representation. Compared with pure acoustic embedding, the bimodal representation is richer and clearer by taking full advantage of both linguistic and acoustic information, which can effectively improve AR performance. Experiments on the Accented English Speech Recognition Challenge (AESRC) dataset show that our method achieves 77.42% accuracy on the test set, obtaining a 6.94% relative improvement over a competitive system in the challenge.

Submitted 1 July, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted by Interspeech 2022

arXiv:2204.00819 [pdf, other]

doi 10.21437/Interspeech.2021-964

Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition

Authors: Guodong Ma, Pengfei Hu, Jian Kang, Shen Huang, Hao Huang

Abstract: In Uyghur speech, consonant and vowel reduction are often encountered, especially in spontaneous speech with high speech rate, which will cause a degradation of speech recognition performance. To solve this problem, we propose an effective phone mask training method for Conformer-based Uyghur end-to-end (E2E) speech recognition. The idea is to randomly mask off a certain percentage of phone features during model training, which simulates the above verbal phenomena and helps the E2E model learn more contextual information. According to experiments, the above issues can be greatly alleviated. In addition, deep investigations are carried out into different units in masking, which shows the effectiveness of our proposed masking unit. We also further study the masking method and optimize the filling strategy of the phone mask. Finally, compared with the Conformer-based E2E baseline without mask training, our model demonstrates about 5.51% relative Word Error Rate (WER) reduction on reading speech and 12.92% on spontaneous speech, respectively. The above approach has also been verified on the test set of the open-source data THUYG-20, which shows 20% relative improvements.

Submitted 2 April, 2022; originally announced April 2022.

Comments: Accepted by INTERSPEECH 2021

Journal ref: INTERSPEECH 2021

arXiv:2203.15249 [pdf, other]

MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification

Authors: Yang Zhang, Zhiqiang Lv, Haibin Wu, Shanshan Zhang, Pengfei Hu, Zhiyong Wu, Hung-yi Lee, Helen Meng

Abstract: In this paper, we present Multi-scale Feature Aggregation Conformer (MFA-Conformer), an easy-to-implement, simple but effective backbone for automatic speaker verification based on the Convolution-augmented Transformer (Conformer). The architecture of the MFA-Conformer is inspired by recent state-of-the-art models in speech recognition and speaker verification. Firstly, we introduce a convolution subsampling layer to decrease the computational cost of the model. Secondly, we adopt Conformer blocks which combine Transformers and convolution neural networks (CNNs) to capture global and local features effectively. Finally, the output feature maps from all Conformer blocks are concatenated to aggregate multi-scale representations before final pooling. We evaluate the MFA-Conformer on the widely used benchmarks. The best system obtains 0.64%, 1.29% and 1.63% EER on VoxCeleb1-O, SITW.Dev, and SITW.Eval set, respectively. MFA-Conformer significantly outperforms the popular ECAPA-TDNN systems in both recognition performance and inference speed. Last but not least, the ablation studies clearly demonstrate that the combination of global and local feature learning can lead to robust and accurate speaker embedding extraction. We have also released the code for future comparison.

Submitted 10 November, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: accepted by INTERSPEECH 2022

arXiv:2203.07065 [pdf, other]

Optimal Aggregation Strategies for Social Learning over Graphs

Authors: Ping Hu, Virginia Bordignon, Stefan Vlaski, Ali H. Sayed

Abstract: Adaptive social learning is a useful tool for studying distributed decision-making problems over graphs. This paper investigates the effect of combination policies on the performance of adaptive social learning strategies. Using large-deviation analysis, it first derives a bound on the steady-state error probability and characterizes the optimal selection for the Perron eigenvectors of the combination policies. It subsequently studies the effect of the combination policy on the transient behavior of the learning strategy by estimating the adaptation time in the low signal-to-noise ratio regime. In the process, it is discovered that, interestingly, the influence of the combination policy on the transient behavior is insignificant, and thus it is more critical to employ policies that enhance the steady-state performance. The theoretical conclusions are illustrated by means of computer simulations.

Submitted 31 May, 2023; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2203.04313 [pdf, other]

Multi-Scale Adaptive Network for Single Image Denoising

Authors: Yuanbiao Gou, Peng Hu, Jiancheng Lv, Joey Tianyi Zhou, Xi Peng

Abstract: Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing cross-scale complementarity. However, existing architectures treat different scale features equally without considering the scale-specific characteristics, i.e., the within-scale characteristics are ignored in the architecture design. In this paper, we reveal this missing piece for multi-scale architecture design and accordingly propose a novel Multi-Scale Adaptive Network (MSANet) for single image denoising. Specifically, MSANet simultaneously embraces the within-scale characteristics and the cross-scale complementarity thanks to three novel neural blocks, i.e., adaptive feature block (AFeB), adaptive multi-scale block (AMB), and adaptive fusion block (AFuB). In brief, AFeB is designed to adaptively preserve image details and filter noises, which is highly expected for the features with mixed details and noises. AMB could enlarge the receptive field and aggregate the multi-scale information, which meets the need of contextually informative features. AFuB is devoted to adaptively sampling and transferring the features from one scale to another scale, which fuses the multi-scale features with varying characteristics from coarse to fine. Extensive experiments on three real and six synthetic noisy image datasets show the superiority of MSANet compared with 12 methods. The code could be accessed from https://github.com/XLearning-SCU/2022-NeurIPS-MSANet.

Submitted 29 October, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Journal ref: the Thirty-Sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2202.07521 [pdf, other]

doi 10.1109/JIOT.2020.2965034

5G Enabled Fault Detection and Diagnostics: How Do We Achieve Efficiency?

Authors: Peng Hu, Jinhuan Zhang

Abstract: The 5th-generation wireless networks (5G) technologies and mobile edge computing (MEC) provide great promises of enabling new capabilities for the industrial Internet of Things. However, the solutions enabled by the 5G ultra-reliable low-latency communication (URLLC) paradigm come with challenges, where URLLC alone does not necessarily guarantee the efficient execution of time-critical fault detection and diagnostics (FDD) applications. Based on the Tennessee Eastman Process model, we propose the concept of the communication-edge-computing (CEC) loop and a system model for evaluating the efficiency of FDD applications. We then formulate an optimization problem for achieving the defined CEC efficiency, discuss some typical solutions to the generic CEC-based FDD services, and propose a new uplink-based communication protocol called "ReFlexUp". From the performance analysis and numerical results, the proposed ReFlexUp protocol shows its effectiveness compared to the typical protocols such as Selective Repeat ARQ, HARQ, and "Occupy CoW" in terms of the key metrics such as latency, reliability, and efficiency. These results are further confirmed by the mmWave-based simulations in a typical 5G MEC-based implementation.

Submitted 15 February, 2022; originally announced February 2022.

arXiv:2112.08133 [pdf]

doi 10.1016/j.bios.2021.113699

Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution

Authors: Shaowei Jiang, Chengfei Guo, Zichao Bian, Ruihai Wang, Jiakai Zhu, Pengming Song, Patrick Hu, Derek Hu, Zibang Zhang, Kazunori Hoshino, Bin Feng, Guoan Zheng

Abstract: Traditional microbial detection methods often rely on the overall property of microbial cultures and cannot resolve individual growth events at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence and then interpret the results. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a large scale and with high spatiotemporal resolution. The reported device can be placed within a regular incubator or used as a standalone incubating unit for long-term microbial monitoring. For longitudinal study where massive data are acquired at sequential time points, we report a new temporal-similarity constraint to increase the temporal resolution of ptychographic reconstruction by 7-fold. With this strategy, the reported device achieves a centimeter-scale field of view, a half-pitch spatial resolution of 488 nm, and a temporal resolution of 15-second intervals. For the first time, we report the direct observation of bacterial growth in a 15-second interval by tracking the phase wraps of the recovered images, with high phase sensitivity like that in interferometric measurements. We also characterize cell growth via longitudinal dry mass measurement and perform rapid bacterial detection at low concentrations. For drug-screening application, we demonstrate proof-of-concept antibiotic susceptibility testing and perform single-cell analysis of antibiotic-induced filamentation.
The combination of high phase sensitivity, high spatiotemporal resolution, and large field of view is unique among existing microscopy techniques. As a quantitative and miniaturized platform, it can improve studies with microorganisms and other biospecimens in resource-limited settings.

Submitted 15 December, 2021; originally announced December 2021.

Comments: 18 pages, 6 figures

arXiv:2112.06721 [pdf, other]

PM-MMUT: Boosted Phone-Mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition

Authors: Guodong Ma, Pengfei Hu, Nurmemet Yolwas, Shen Huang, Hao Huang

Abstract: Consonant and vowel reduction are often encountered in speech, which might cause performance degradation in automatic speech recognition (ASR). Our recently proposed learning strategy based on masking, Phone Masking Training (PMT), alleviates the impact of such phenomena in Uyghur ASR. Although PMT achieves remarkable improvements, there still exists room for further gains due to the granularity mismatch between the masking unit of PMT (phoneme) and the modeling unit (word-piece). To boost the performance of PMT, we propose a multi-modeling unit training (MMUT) architecture fused with PMT (PM-MMUT). The idea of the MMUT framework is to split the Encoder into two parts including acoustic feature sequences to phoneme-level representation (AF-to-PLR) and phoneme-level representation to word-piece-level representation (PLR-to-WPLR). It allows AF-to-PLR to be optimized by an intermediate phoneme-based CTC loss to learn the rich phoneme-level context information brought by PMT. Experimental results on Uyghur ASR show that the proposed approaches clearly outperform pure PMT. We also conduct experiments on the 960-hour Librispeech benchmark using ESPnet1, which achieves about 10% relative WER reduction on all the test sets without LM fusion compared with the latest official ESPnet1 pre-trained model.

Submitted 2 July, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted to INTERSPEECH 2022

arXiv:2110.15316 [pdf]

VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge

Authors: Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu

Abstract: Keyword wakeup technology has always been a research hotspot in speech processing, but many related works were done on different datasets. We organized a Chinese long-short video keyword wakeup challenge (Video Keyword Wakeup Challenge, VKW) for testing the ability of each participating team to build a keyword wakeup system under the public dataset. All submitted systems not only need to support the setting of multiple different keywords, but also need to support the wakeup of any customized keyword. This paper mainly describes the basic situation of the VKW challenge and the experimental results of some participating teams.

Submitted 18 October, 2021; originally announced October 2021.

Comments: 6 pages, in Chinese language, 3 tables, NCMMC 2021 conference paper

arXiv:2110.09121 [pdf, ps, other]

KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

Authors: Xiaobin Zhuang, Huiran Yu, Weifeng Zhao, Tao Jiang, Peng Hu

Abstract: An automatic pitch correction system typically includes several stages, such as pitch extraction, deviation estimation, pitch shift processing, and cross-fade smoothing. However, designing these components with strategies often requires domain expertise and they are likely to fail on corner cases. In this paper, we present KaraTuner, an end-to-end neural architecture that predicts the pitch curve and resynthesizes the singing voice directly from the tuned pitch and vocal spectrum extracted from the original recordings. Several vital technical points have been introduced in KaraTuner to ensure pitch accuracy, pitch naturalness, timbre consistency, and sound quality. A feed-forward Transformer is employed in the pitch predictor to capture long-term dependencies in the vocal spectrum and musical note. We also develop a pitch-controllable vocoder based on a novel source-filter block and the Fre-GAN architecture. KaraTuner obtains a higher preference than the rule-based pitch correction approach through A/B tests, and perceptual experiments show that the proposed vocoder achieves significant advantages in timbre consistency and sound quality compared with the parametric WORLD vocoder, phase vocoder and CLPC vocoder.

Submitted 26 June, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

arXiv:2110.01989 [pdf]

doi 10.1364/OL.437832

High-throughput lensless whole slide imaging via continuous height-varying modulation of tilted sensor

Authors: Shaowei Jiang, Chengfei Guo, Patrick Hu, Derek Hu, Pengming Song, Tianbo Wang, Zichao Bian, Zibang Zhang, Guoan Zheng

Abstract: We report a new lensless microscopy configuration by integrating the concepts of transverse translational ptychography and defocus multi-height phase retrieval. In this approach, we place a tilted image sensor under the specimen for linearly-increasing phase modulation along one lateral direction. Similar to the operation of ptychography, we laterally translate the specimen and acquire the diffraction images for reconstruction. Since the axial distance between the specimen and the sensor varies at different lateral positions, laterally translating the specimen effectively introduces defocus multi-height measurements while eliminating axial scanning. Lateral translation further introduces sub-pixel shift for pixel super-resolution imaging and naturally expands the field of view for rapid whole slide imaging. We show that the equivalent height variation can be precisely estimated from the lateral shift of the specimen, thereby addressing the challenge of precise axial positioning in conventional multi-height phase retrieval. Using a sensor with a 1.67-micron pixel size, our low-cost and field-portable prototype can resolve 690-nm linewidth on the resolution target. We show that a whole slide image of a blood smear with a 120-mm^2 field of view can be acquired in 18 seconds. We also demonstrate accurate automatic white blood cell counting from the recovered image. The reported approach may provide a turnkey solution for addressing point-of-care- and telemedicine-related challenges.

Submitted 28 September, 2021; originally announced October 2021.

arXiv:2107.01805 [pdf]

Dual Synchronous Generator: Inertial Current Source based Grid-Forming Solution for VSC

Authors: Huanhai Xin, Kehao Zhuang, Pengfei Hu, Yunjie Gu, Ping Ju

Abstract: In order to improve the dynamic characteristics of power systems with a high proportion of renewable energy sources (RESs), it is necessary for the voltage source converter (VSC), the interface of RESs, to provide inertia and frequency regulation. In practical applications, VSCs are better controlled as current sources due to their weak overcurrent capacity. According to this characteristic, a dual synchronous theory is proposed in this paper to analyze the synchronization between current sources. Based on the dual synchronous idea, a dual synchronous generator (DSG) control is applied in the VSC to form an inertial current source. In addition, a braking control is embedded in the DSG control to improve the transient stability of the VSC. Finally, experimental results verify the effectiveness of the theory and the control method.

Submitted 17 July, 2021; v1 submitted 5 July, 2021; originally announced July 2021.

arXiv:2004.07135 [pdf, other]

A 5G NR based System Architecture for Real-Time Control with Batteryless RFID Sensors

Authors: Peng Hu

Abstract: The fifth-generation wireless networking (5G) technologies have been developed to meet various time-critical use cases with ultra-reliable, low-latency and massive machine-type communications which are indispensable for tactile Internet applications. Recent advancements in very low-cost and batteryless radio-frequency identification (RFID) sensors have given promises of deploying a massive number of such sensors for real-time sensing and control applications on a 5G New Radio (NR) network. However, the system design and performance of such applications have not been well studied. This paper proposes a novel system architecture for the representative batteryless RFID touch sensors in generic real-time control applications in a 5G NR mmWave environment. We discuss a solution that uses edge computing nodes on the 5G NR base station to implement the proposed system architecture. The real-time performance evaluation, in comparison with Long-Term Evolution (LTE) networks, has shown the effectiveness of the proposed system architecture.

Submitted 12 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2004.01800 [pdf, other]

Temporally Distributed Networks for Fast Video Semantic Segmentation

Authors: Ping Hu, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Stan Sclaroff, Federico Perazzi

Abstract: We present TDNet, a temporally distributed network designed for fast and accurate video semantic segmentation. We observe that features extracted from a certain high-level layer of a deep CNN can be approximated by composing features extracted from several shallower sub-networks. Leveraging the inherent temporal continuity in videos, we distribute these sub-networks over sequential frames. Therefore, at each time step, we only need to perform a lightweight computation to extract a sub-features group from a single sub-network. The full features used for segmentation are then recomposed by application of a novel attention propagation module that compensates for geometry deformation between frames. A grouped knowledge distillation loss is also introduced to further improve the representation power at both full and sub-feature levels. Experiments on Cityscapes, CamVid, and NYUD-v2 demonstrate that our method achieves state-of-the-art accuracy with significantly faster speed and lower latency.

Submitted 6 April, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

Comments: [CVPR2020] Project: https://github.com/feinanshan/TDNet

arXiv:1912.11264 [pdf, other]

Deep Manifold Embedding for Hyperspectral Image Classification

Authors: Zhiqiang Gong, Weidong Hu, Xiaoyong Du, Ping Zhong, Panhe Hu

Abstract: Deep learning methods have played a more and more important role in hyperspectral image classification. However, the general deep learning methods mainly take advantage of the information of the sample itself or the pairwise information between samples while ignoring the intrinsic data structure within the whole data. To tackle this problem, this work develops a novel deep manifold embedding method (DMEM) for hyperspectral image classification. First, each class in the image is modelled as a specific nonlinear manifold and the geodesic distance is used to measure the correlation between the samples. Then, based on the hierarchical clustering, the manifold structure of the data can be captured and each nonlinear data manifold can be divided into several sub-classes. Finally, considering the distribution of each sub-class and the correlation between different sub-classes, the DMEM is constructed to preserve the estimated geodesic distances on the data manifold between the learned low dimensional features of different samples. Experiments over three real-world hyperspectral image datasets have demonstrated the effectiveness of the proposed method.

Submitted 27 March, 2021; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: Accepted by IEEE TCYB

arXiv:1908.05033 [pdf, other]

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Authors: Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, Junjie Yan

Abstract: Hardware-friendly network quantization (e.g., binary/uniform quantization) can efficiently accelerate the inference and meanwhile reduce memory consumption of the deep neural networks, which is crucial for model deployment on resource-limited devices like mobile phones. However, due to the discreteness of low-bit quantization, existing quantization methods often face an unstable training process and severe performance degradation. To address this problem, in this paper we propose Differentiable Soft Quantization (DSQ) to bridge the gap between the full-precision and low-bit networks. DSQ can automatically evolve during training to gradually approximate the standard quantization. Owing to its differentiable property, DSQ can help pursue the accurate gradients in backward propagation, and reduce the quantization loss in the forward process with an appropriate clipping range. Extensive experiments over several popular network structures show that training low-bit neural networks with DSQ can consistently outperform state-of-the-art quantization methods. Besides, our first efficient implementation for deploying 2 to 4-bit DSQ on devices with ARM architecture achieves up to 1.7× speed up, compared with the open-source 8-bit high-performance inference framework NCNN [31].

Submitted 14 August, 2019; originally announced August 2019.

Comments: IEEE ICCV 2019

arXiv:1907.11138 [pdf]

Effect of Surrounding Conductive Object on Four-Plate Capacitive Power Transfer System

Authors: Qi Zhu, Lixiang Jackie Zou, Shaoge Zang, Mei Su, Aiguo Patrick Hu

Abstract: In this paper, the effect of a surrounding conductive object on a typical capacitive power transfer (CPT) system with two pairs of parallel plates is studied by considering the mutual coupling between the conductive object and the plates. A mathematical model is established based on a 5×5 mutual capacitance matrix by using a larger additional conductive plate to represent the surrounding conductive object. Based on the proposed model, the effect of the additional conductive plate on the CPT system is analyzed in detail. The electric field distribution of the CPT system including the additional plate is simulated by ANSYS Maxwell. A practical CPT system consisting of four 100 mm × 100 mm square aluminum plates and one 300 mm × 300 mm square aluminum plate is built to verify the modeling and analysis. Both theoretical and experimental results show that the output voltage of the CPT system decreases when the additional conductive plate is placed closer to the CPT system. It was found that the additional plate can effectively shield the electric field outside the plate, and it attracts the electric field in-between the four plates of the CPT system and the additional plate. It was also found that the voltage potential difference between the additional plate and the reference plate of the CPT system remains almost constant even when the distance between them changes. The findings are useful for guiding the design of CPT systems, particularly the electric field shielding.

Submitted 7 June, 2019; originally announced July 2019.

Comments: 9 pages, 15 figures, 4 tables

Showing 1–50 of 51 results for author: Hu, P