
Integrated Sensing, Communication, and Powering over Multi-antenna OFDM Systems

Authors: Yilong Chen, Chao Hu, Zixiang Ren, Han Hu, Jie Xu, Lexi Xu, Lei Liu, Shuguang Cui

Abstract: This paper considers a multi-functional orthogonal frequency division multiplexing (OFDM) system with integrated sensing, communication, and powering (ISCAP), in which a multi-antenna base station (BS) transmits OFDM signals to simultaneously deliver information to multiple information receivers (IRs), provide energy supply to multiple energy receivers (ERs), and sense potential targets based on the echo signals. To facilitate ISCAP, the BS employs the joint transmit beamforming design by sending dedicated sensing/energy beams jointly with information beams. Furthermore, we consider the beam scanning for sensing, in which the joint beams scan in different directions over time to sense potential targets. In order to ensure the sensing beam scanning performance and meet the communication and powering requirements, it is essential to properly schedule IRs and ERs and design the resource allocation over time, frequency, and space. More specifically, we optimize the joint transmit beamforming over multiple OFDM symbols and subcarriers, with the objective of minimizing the average beampattern matching error of beam scanning for sensing, subject to the constraints on the average communication rates at IRs and the average harvested power at ERs. We find converged high-quality solutions to the formulated problem by proposing efficient iterative algorithms based on advanced optimization techniques. We also develop various heuristic designs based on the principles of zero-forcing (ZF) beamforming, round-robin user scheduling, and time switching, respectively. Numerical results show that our proposed algorithms adaptively generate information and sensing/energy beams at each time-frequency slot to match the scheduled IRs/ERs with the desired scanning beam, significantly outperforming the heuristic designs.

Submitted 26 August, 2024; originally announced August 2024.

Comments: 13 pages, 12 figures

arXiv:2408.02095 [pdf, other]

Secure Semantic Communications: From Perspective of Physical Layer Security

Authors: Yongkang Li, Zheng Shi, Han Hu, Yaru Fu, Hong Wang, Hongjiang Lei

Abstract: Semantic communications have been envisioned as a potential technique that goes beyond the Shannon paradigm. Unlike modern communications that provide bit-level security, the eavesdropping of semantic communications poses a significant risk of potentially exposing the intention of the legitimate user. To address this challenge, a novel deep neural network (DNN) enabled secure semantic communication (DeepSSC) system is developed by capitalizing on physical layer security. To balance the tradeoff between security and reliability, a two-phase training method for DNNs is devised. Particularly, Phase I aims at semantic recovery of the legitimate user, while Phase II attempts to minimize the leakage of semantic information to eavesdroppers. The loss functions of DeepSSC in Phases I and II are respectively designed according to Shannon capacity and secure channel capacity, which are approximated with variational inference. Moreover, we define the metric of secure bilingual evaluation understudy (S-BLEU) to assess the security of semantic communications. Finally, simulation results demonstrate that DeepSSC achieves a significant boost to semantic security, particularly in the high signal-to-noise ratio regime, despite a minor degradation of reliability.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2407.20532 [pdf, other]

Scalable Synthesis of Formally Verified Neural Value Function for Hamilton-Jacobi Reachability Analysis

Authors: Yujie Yang, Hanjiang Hu, Tianhao Wei, Shengbo Eben Li, Changliu Liu

Abstract: Hamilton-Jacobi (HJ) reachability analysis provides a formal method for guaranteeing safety in constrained control problems. It synthesizes a value function to represent a long-term safe set called the feasible region. Early synthesis methods based on state space discretization cannot scale to high-dimensional problems, while recent methods that use neural networks to approximate value functions result in unverifiable feasible regions. To achieve both scalability and verifiability, we propose a framework for synthesizing verified neural value functions for HJ reachability analysis. Our framework consists of three stages: pre-training, adversarial training, and verification-guided training. We design three techniques to address three challenges to improve scalability respectively: boundary-guided backtracking (BGB) to improve counterexample search efficiency, entering state regularization (ESR) to enlarge the feasible region, and activation pattern alignment (APA) to accelerate neural network verification. We also provide a neural safety certificate synthesis and verification benchmark called Cersyve-9, which includes nine commonly used safe control tasks and supplements existing neural network verification benchmarks. Our framework successfully synthesizes verified neural value functions on all tasks, and our proposed three techniques exhibit superior scalability and efficiency compared with existing methods.

Submitted 31 July, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.08961 [pdf]

Tissue-Contrastive Semi-Masked Autoencoders for Segmentation Pretraining on Chest CT

Authors: Jie Zheng, Ru Wen, Haiqin Hu, Lina Wei, Kui Su, Wei Chen, Chen Liu, Jun Wang

Abstract: Existing Masked Image Modeling (MIM) depends on a spatial patch-based masking-reconstruction strategy to perceive objects' features from unlabeled images, which may face two limitations when applied to chest CT: 1) inefficient feature learning due to complex anatomical details presented in CT images, and 2) suboptimal knowledge transfer owing to input disparity between upstream and downstream models. To address these issues, we propose a new MIM method named Tissue-Contrastive Semi-Masked Autoencoder (TCS-MAE) for modeling chest CT images. Our method has two novel designs: 1) a tissue-based masking-reconstruction strategy to capture more fine-grained anatomical features, and 2) a dual-AE architecture with contrastive learning between the masked and original image views to bridge the gap of the upstream and downstream models. To validate our method, we systematically investigate representative contrastive, generative, and hybrid self-supervised learning methods on top of tasks involving segmenting pneumonia, mediastinal tumors, and various organs. The results demonstrate that, compared to existing methods, our TCS-MAE more effectively learns tissue-aware representations, thereby significantly enhancing segmentation performance across all tasks.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2407.05407 [pdf, other]

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

Abstract: Recent years have witnessed large language model (LLM) based text-to-speech (TTS) emerging into the mainstream due to its high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role in LLM-based TTS models. Current speech tokens are learned in an unsupervised manner, which lacks explicit semantic information and alignment to the text. In this paper, we propose to represent speech with supervised semantic tokens, which are derived from a multilingual speech recognition model by inserting vector quantization into the encoder. Based on the tokens, we further propose a scalable zero-shot TTS synthesizer, CosyVoice, which consists of an LLM for text-to-token generation and a conditional flow matching model for token-to-speech synthesis. Experimental results show that supervised semantic tokens significantly outperform existing unsupervised tokens in terms of content consistency and speaker similarity for zero-shot voice cloning. Moreover, we find that utilizing large-scale data further improves the synthesis performance, indicating the scalable capacity of CosyVoice. To the best of our knowledge, this is the first attempt to involve supervised speech tokens into TTS models.

Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

arXiv:2407.04051 [pdf, other]

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, speaking style, and speaker identity. SenseVoice-Small delivers exceptionally low-latency ASR for 5 languages, and SenseVoice-Large supports high-precision ASR for over 50 languages, while CosyVoice excels in multi-lingual voice generation, zero-shot in-context learning, cross-lingual voice cloning, and instruction-following capabilities. The models related to SenseVoice and CosyVoice have been open-sourced on Modelscope and Huggingface, along with the corresponding training, inference, and fine-tuning codes released on GitHub. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration, thereby pushing the boundaries of voice interaction technology. Demos are available at https://fun-audio-llm.github.io, and the code can be accessed at https://github.com/FunAudioLLM.

Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Work in progress. Authors are listed in alphabetical order by family name

arXiv:2406.19608 [pdf, other]

Multi-service collaboration and composition of cloud manufacturing customized production based on problem decomposition

Authors: Hao Yue, Yingtao Wu, Min Wang, Hesuan Hu, Weimin Wu, Jihui Zhang

Abstract: A cloud manufacturing system is service-oriented and knowledge-based, and can provide solutions for large-scale customized production. Service resource allocation is the primary factor that restricts the production time and cost in cloud manufacturing customized production (CMCP). In order to improve the efficiency and reduce the cost in CMCP, we propose a new framework which considers the collaboration among services with the same functionality. A mathematical evaluation formulation for the service composition and service usage scheme is constructed with the following critical indexes: completion time, cost, and number of selected services. Subsequently, a problem decomposition based genetic algorithm is designed to obtain the optimal service compositions with service usage schemes. A smart clothing customization case is presented to show the effectiveness and efficiency of the proposed method. Finally, the results of simulation experiments and comparisons show that the solutions obtained by our method achieve the minimum time, a lower cost, and fewer selected services.

Submitted 27 June, 2024; originally announced June 2024.

Comments: 12 pages, 8 figures

ACM Class: J.0

arXiv:2406.09810 [pdf, other]

Think Deep and Fast: Learning Neural Nonlinear Opinion Dynamics from Inverse Dynamic Games for Split-Second Interactions

Authors: Haimin Hu, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: Non-cooperative interactions commonly occur in multi-agent scenarios such as car racing, where an ego vehicle can choose to overtake the rival, or stay behind it until a safe overtaking "corridor" opens. While an expert human can do well at making such time-sensitive decisions, the development of safe and efficient game-theoretic trajectory planners capable of rapidly reasoning discrete options is yet to be fully addressed. The recently developed nonlinear opinion dynamics (NOD) show promise in enabling fast opinion formation and avoiding safety-critical deadlocks. However, it remains an open challenge to determine the model parameters of NOD automatically and adaptively, accounting for the ever-changing environment of interaction. In this work, we propose for the first time a learning-based, game-theoretic approach to synthesize a Neural NOD model from expert demonstrations, given as a dataset containing (possibly incomplete) state and action trajectories of interacting agents. The learned NOD can be used by existing dynamic game solvers to plan decisively while accounting for the predicted change of other agents' intents, thus enabling situational awareness in planning. We demonstrate Neural NOD's ability to make fast and robust decisions in a simulated autonomous racing example, leading to tangible improvements in safety and overtaking performance over state-of-the-art data-driven game-theoretic planning methods.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.08038 [pdf, other]

Interference Analysis for Coexistence of UAVs and Civil Aircrafts Based on Automatic Dependent Surveillance-Broadcast

Authors: Yiyang Liao, Ziye Jia, Chao Dong, Lei Zhang, Qihui Wu, Huiling Hu, Zhu Han

Abstract: Due to the advantages of high mobility and easy deployment, unmanned aerial vehicles (UAVs) are widely applied in both military and civilian fields. In order to strengthen the flight surveillance of UAVs and guarantee the airspace safety, UAVs can be equipped with the automatic dependent surveillance-broadcast (ADS-B) system, which periodically sends flight information to other aircrafts and ground stations (GSs). However, due to the limited resource of channel capacity, equipping UAVs with ADS-B results in interference between UAVs and civil aircrafts (CAs), which further impacts the accuracy of received information at GSs. In detail, the channel capacity is mainly affected by the density of aircrafts and the transmitting power of ADS-B. Hence, based on the three-dimensional Poisson point process, this work leverages the stochastic geometry theory to build a model of the coexistence of UAVs and CAs and analyze the interference performance of the ADS-B monitoring system. From simulation results, we reveal the effects of transmitting power, density, threshold, and pathloss on the performance of the ADS-B monitoring system. Besides, we provide the suggested transmitting power and density for the safe coexistence of UAVs and CAs.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.13636 [pdf, other]

Audio Mamba: Pretrained Audio State Space Model For Audio Tagging

Authors: Jiaju Lin, Haoxuan Hu

Abstract: Audio tagging is an important task of mapping audio samples to their corresponding categories. Recent endeavours that exploit transformer models in this field have achieved great success. However, the quadratic self-attention cost limits the scaling of audio transformer models and further constrains the development of more universal audio models. In this paper, we attempt to solve this problem by proposing Audio Mamba, a self-attention-free approach that captures long audio spectrogram dependency with state space models. Our experimental results on two audio-tagging datasets demonstrate the parameter efficiency of Audio Mamba: it achieves comparable results to SOTA audio spectrogram transformers with one third of the parameters.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.07994 [pdf]

BubbleID: A Deep Learning Framework for Bubble Interface Dynamics Analysis

Authors: Christy Dunlap, Changgen Li, Hari Pandey, Ngan Le, Han Hu

Abstract: This paper presents BubbleID, a sophisticated deep learning architecture designed to comprehensively identify both static and dynamic attributes of bubbles within sequences of boiling images. By amalgamating segmentation powered by Mask R-CNN with SORT-based tracking techniques, the framework is capable of analyzing each bubble's location, dimensions, interface shape, and velocity over its lifetime, and capturing dynamic events such as bubble departure. BubbleID is trained and tested on boiling images across diverse heater surfaces and operational settings. This paper also offers a comparative analysis of bubble interface dynamics prior to and post-critical heat flux (CHF) conditions.

Submitted 20 March, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2404.13456 [pdf, other]

Real-Time Safe Control of Neural Network Dynamic Models with Sound Approximation

Authors: Hanjiang Hu, Jianglin Lan, Changliu Liu

Abstract: Safe control of neural network dynamic models (NNDMs) is important to robotics and many applications. However, it remains challenging to compute an optimal safe control in real time for NNDM. To enable real-time computation, we propose to use a sound approximation of the NNDM in the control synthesis. In particular, we propose Bernstein over-approximated neural dynamics (BOND) based on the Bernstein polynomial over-approximation (BPO) of ReLU activation functions in NNDM. To mitigate the errors introduced by the approximation and to ensure persistent feasibility of the safe control problems, we synthesize a worst-case safety index using the most unsafe approximated state within the BPO relaxation of NNDM offline. For the online real-time optimization, we formulate the first-order Taylor approximation of the nonlinear worst-case safety constraint as an additional linear layer of NNDM with the l2 bounded bias term for the higher-order remainder. Comprehensive experiments with different neural dynamics and safety constraints show that with safety guaranteed, our NNDMs with sound approximation are 10-100 times faster than the safe control baseline that uses mixed integer programming (MIP), validating the effectiveness of the worst-case safety index and scalability of the proposed BOND in real-time large-scale settings. The code is available at https://github.com/intelligent-control-lab/BOND.

Submitted 20 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

Comments: Camera-ready version of L4DC 2024, 12 pages, 3 figures, 4 tables

arXiv:2403.16361 [pdf, other]

RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Authors: Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Jiayuan Sun, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao

Abstract: Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, the cone-beam projections become much sparser and the reconstructed 4D CBCT images will be covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ 2D network models as backbones, neglecting the intrinsic structural priors within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images. We find that streak artifacts exhibit a unique rotational motion along with the patient's respiration, distinguishable from diaphragm-driven respiratory motion in the spatiotemporal domain. Therefore, we propose a novel 4D neural network model, RSTAR4D-Net, designed to address Rotational STreak Artifact Reduction by integrating the spatial and temporal information within 4D CBCT images. Specifically, we overcome the computational and training difficulties of a 4D neural network. The specially designed model adopts an efficient implementation of 4D convolutions to reduce computational costs and thus can process the whole 4D image in one pass. Additionally, a Tetris training strategy pertinent to the separable 4D convolutions is proposed to effectively train the model using limited 4D training samples. Extensive experiments substantiate the effectiveness of our proposed method, and the RSTAR4D-Net shows superior performance compared to other methods. The source code and dynamic demos are available at https://github.com/ivy9092111111/RSTAR.

Submitted 22 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

arXiv:2402.18070 [pdf, other]

A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and, more recently, graphics processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and consecutive character of WBP. Furthermore, the large amount of data in WBPs cannot be processed quickly in symmetric multiprocessors (SMPs) due to the unpredictability of memory latency. To address this issue, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access and execute manner. We also propose a multi-level dataflow model and the related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experimental results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedup in terms of normalized throughput and single-tile clock cycles compared with GPU and DSP counterparts in several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures, conference

arXiv:2402.14174 [pdf, other]

Blending Data-Driven Priors in Dynamic Games

Authors: Justin Lidard, Haimin Hu, Asher Hancock, Zixu Zhang, Albert Gimó Contreras, Vikash Modi, Jonathan DeCastro, Deepak Gopinath, Guy Rosman, Naomi Ehrich Leonard, María Santos, Jaime Fernández Fisac

Abstract: As intelligent robots like autonomous vehicles become increasingly deployed in the presence of people, the extent to which these systems should leverage model-based game-theoretic planners versus data-driven policies for safe, interaction-aware motion planning remains an open question. Existing dynamic game formulations assume all agents are task-driven and behave optimally. However, in reality, humans tend to deviate from the decisions prescribed by these models, and their behavior is better approximated under a noisy-rational paradigm. In this work, we investigate a principled methodology to blend a data-driven reference policy with an optimization-based game-theoretic policy. We formulate KLGame, an algorithm for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization with respect to a general, stochastic, and possibly multi-modal reference policy. Our method incorporates, for each decision maker, a tunable parameter that permits modulation between task-driven and data-driven behaviors. We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time. Through a series of simulated and real-world autonomous driving scenarios, we demonstrate that KLGame policies can more effectively incorporate guidance from the reference policy and account for noisily-rational human behaviors versus non-regularized baselines. Website with additional information, videos, and code: https://kl-games.github.io/.

Submitted 6 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, 12 figures

arXiv:2402.09246 [pdf, other]

Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots

Authors: Haimin Hu, Gabriele Dragotto, Zixu Zhang, Kaiqu Liang, Bartolomeo Stellato, Jaime F. Fisac

Abstract: We consider the multi-agent spatial navigation problem of computing the socially optimal order of play, i.e., the sequence in which the agents commit to their decisions, and its associated equilibrium in an N-player Stackelberg trajectory game. We model this problem as a mixed-integer optimization problem over the space of all possible Stackelberg games associated with the order of play's permutations. To solve the problem, we introduce Branch and Play (B&P), an efficient and exact algorithm that provably converges to a socially optimal order of play and its Stackelberg equilibrium. As a subroutine for B&P, we employ and extend sequential trajectory planning, i.e., a popular multi-agent control approach, to scalably compute valid local Stackelberg equilibria for any given order of play. We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets. We find that B&P consistently outperforms various baselines, and computes the socially optimal equilibrium.

Submitted 24 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: Robotics: Science and Systems (RSS) 2024

arXiv:2401.13766 [pdf, ps, other]

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori

Authors: Hu Hu, Sabato Marco Siniscalchi, Chin-Hui Lee

Abstract: In this work, we aim to establish a Bayesian adaptive learning framework by focusing on estimating latent variables in deep neural network (DNN) models. Latent variables indeed encode both transferable distributional information and structural relationships. Thus the distributions of the source latent variables (prior) can be combined with the knowledge learned from the target data (likelihood) to yield the distributions of the target latent variables (posterior) with the goal of addressing acoustic mismatches between training and testing conditions. The prior knowledge transfer is accomplished through Variational Bayes (VB). In addition, we also investigate Maximum a Posteriori (MAP) based Bayesian adaptation. Experimental results on device adaptation in acoustic scene classification show that our proposed approaches can obtain good improvements on target devices, and consistently outperform other cutting-edge algorithms.

Submitted 24 January, 2024; originally announced January 2024.

Comments: ASRU2023 Bayesian Symposium. arXiv admin note: text overlap with arXiv:2110.08598

arXiv:2401.09455 [pdf, other]

Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach

Authors: Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jianping An, Shiwen Mao

Abstract: The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to satisfy various constraints related to satellite service quality. To address these challenges, we study packet routing with ground stations and satellites working jointly to transmit packets, while prioritizing fast communication and meeting energy efficiency and packet loss requirements. Specifically, we formulate the problem of packet routing with constraints as a max-min problem using the Lagrange method. Then we propose a novel constrained Multi-Agent reinforcement learning (MARL) dynamic routing algorithm named CMADR, which efficiently balances objective improvement and constraint satisfaction during the updating of policy and Lagrange multipliers. Finally, we conduct extensive experiments and an ablation study using the OneWeb and Telesat mega-constellations. Results demonstrate that CMADR reduces the packet delay by a minimum of 21% and 15%, while meeting stringent energy consumption and packet loss rate constraints, outperforming several baseline algorithms.

Submitted 22 December, 2023; originally announced January 2024.

arXiv:2401.03664 [pdf]

Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

Abstract: This paper focuses on the classification task of breast ultrasound images and investigates the reliability measurement of classification results. We propose a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature attribution algorithm SP-RISA are gracefully applied. Uncertainty quantification is used to evaluate the predictive reliability via the Test Time Enhancement. The effectiveness of this reliability evaluation framework has been verified on our breast ultrasound clinical dataset YBUS, and its robustness is verified on the public dataset BUSI. The expected calibration errors on both datasets are significantly lower than those of traditional evaluation methods, which proves the effectiveness of our proposed reliability measurement.

Submitted 7 January, 2024; originally announced January 2024.

arXiv:2312.15721 [pdf, ps, other]

UAV Trajectory Tracking via RNN-enhanced IMM-KF with ADS-B Data

Authors: Yian Zhu, Ziye Jia, Qihui Wu, Chao Dong, Zirui Zhuang, Huiling Hu, Qi Cai

Abstract: With the increasing use of autonomous unmanned aerial vehicles (UAVs), it is critical to ensure that they are continuously tracked and controlled, especially when UAVs operate beyond the communication range of ground stations (GSs). Conventional surveillance methods for UAVs, such as satellite communications, ground mobile networks and radars, are subject to high costs and latency. The automatic dependent surveillance-broadcast (ADS-B) emerges as a promising method to monitor UAVs, due to the advantages of real-time capabilities, easy deployment and affordable cost. Therefore, we employ the ADS-B for UAV trajectory tracking in this work. However, the inherent noise in the transmitted data poses an obstacle for precisely tracking UAVs. Hence, we propose the algorithm of recurrent neural network-enhanced interacting multiple model-Kalman filter (RNN-enhanced IMM-KF) for UAV trajectory filtering. Specifically, the algorithm utilizes the RNN to capture the maneuvering behavior of UAVs and the noise level in the ADS-B data. Moreover, accurate UAV tracking is achieved by adaptively adjusting the process noise matrix and observation noise matrix of IMM-KF with the assistance of the RNN. The proposed algorithm can help GSs make timely decisions during trajectory deviations of UAVs and improve airspace safety. Finally, comprehensive simulations show that the total root mean square error of the proposed algorithm decreases by 28.56% compared to the traditional IMM-KF.

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.14563 [pdf, other]

AI Generated Signal for Wireless Sensing

Authors: Hanxiang He, Han Hu, Xintao Huan, Heng Liu, Jianping An, Shiwen Mao

Abstract: Deep learning has significantly advanced wireless sensing technology by leveraging substantial amounts of high-quality training data. However, collecting wireless sensing data encounters diverse challenges, including unavoidable data noise, limited data scale due to significant collection overhead, and the necessity to reacquire data in new environments. Taking inspiration from the achievements of AI-generated content, this paper introduces a signal generation method that achieves data denoising, augmentation, and synthesis by disentangling distinct attributes within the signal, such as individual and environment. The approach encompasses two pivotal modules: structured signal selection and signal disentanglement generation. Structured signal selection establishes a minimal signal set with the target attributes for subsequent attribute disentanglement. Signal disentanglement generation disentangles the target attributes and reassembles them to generate novel signals. Extensive experimental results demonstrate that the proposed method can generate data that closely resembles real-world data on two wireless sensing datasets, exhibiting state-of-the-art performance. Our approach presents a robust framework for comprehending and manipulating attribute-specific information in wireless sensing.

Submitted 22 December, 2023; originally announced December 2023.

Comments: 6 pages, 6 figures, published to Globecom2023

arXiv:2312.04786 [pdf, other]

Joint User Association, Interference Cancellation and Power Control for Multi-IRS Assisted UAV Communications

Authors: Zhaolong Ning, Hao Hu, Xiaojie Wang, Qingqing Wu, Chau Yuen, F. Richard Yu, Yan Zhang

Abstract: Intelligent reflecting surface (IRS)-assisted unmanned aerial vehicle (UAV) communications are expected to alleviate the load of ground base stations in a cost-effective way. Existing studies mainly focus on the deployment and resource allocation of a single IRS instead of multiple IRSs, whereas it is extremely challenging for joint multi-IRS multi-user association in UAV communications with constrained reflecting resources and dynamic scenarios. To address the aforementioned challenges, we propose a new optimization algorithm for joint IRS-user association, trajectory optimization of UAVs, successive interference cancellation (SIC) decoding order scheduling and power allocation to maximize system energy efficiency. We first propose an inverse soft-Q learning-based algorithm to optimize multi-IRS multi-user association. Then, SCA and Dinkelbach-based algorithms are leveraged to optimize the UAV trajectory, followed by the optimization of SIC decoding order scheduling and power allocation. Finally, theoretical analysis and performance results show significant advantages of the designed algorithm in convergence rate and energy efficiency.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.00567 [pdf]

A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

Authors: Ni Yao, Hang Hu, Kaicong Chen, Chen Zhao, Yuan Guo, Boya Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Weihua Zhou, Li Tian

Abstract: Objectives: To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to facilitate radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods: Data from 668 consecutive patients with pathologically proven RCC were retrospectively collected from Center 1. By using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance. Results: In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. Conclusions: The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors.
Clinical relevance statement: Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.

Submitted 12 November, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: 16 pages, 6 figures

arXiv:2310.20289 [pdf]

C-Silicon-based metasurfaces for aperture-robust spectrometer/imaging with angle integration

Authors: Weizhu Xu, Qingbin Fan, Peicheng Lin, Jiarong Wang, Hao Hu, Tao Yue, Xuemei Hu, Ting Xu

Abstract: Compared with conventional grating-based spectrometers, reconstructive spectrometers based on spectrally engineered filtering have the advantage of miniaturization because of their lower demand for dispersive optics and free propagation space. However, available reconstructive spectrometers fail to balance the performance on operational bandwidth, spectral diversity and angular stability. In this work, we propose a compact silicon-metasurface-based spectrometer/camera. After angle integration, the spectral response of the system is robust to angle/aperture within a wide working bandwidth from 400 nm to 800 nm. It is experimentally demonstrated that the proposed method could maintain the spectral consistency from F/1.8 to F/4 (the corresponding angle of incident light ranges from 7° to 16°) and the incident hyperspectral signal could be accurately reconstructed with a fidelity exceeding 99%. Additionally, a spectral imaging system with 400x400 pixels is also established in this work. The accurately reconstructed hyperspectral image indicates that the proposed aperture-robust spectrometer has the potential to be extended as a high-resolution broadband hyperspectral camera.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.06678 [pdf, other]

Modelling and Performance Analysis of the Over-the-Air Computing in Cellular IoT Networks

Authors: Ying Dong, Haonan Hu, Qiaoshou Liu, Tingwei Lv, Qianbin Chen, Jie Zhang

Abstract: Ultra-fast wireless data aggregation (WDA) of distributed data has emerged as a critical design challenge in the ultra-densely deployed cellular internet of things network (CITN) due to limited spectral resources. Over-the-air computing (AirComp) has been proposed as an effective solution for ultra-fast WDA by exploiting the superposition property of wireless channels. However, the effect of access radius of access point (AP) on the AirComp performance has not been investigated yet. Therefore, in this work, the mean square error (MSE) performance of AirComp in the ultra-densely deployed CITN is analyzed with the AP access radius. By modelling the spatial locations of internet of things devices as a Poisson point process, the expression of MSE is derived in an analytical form, which is validated by Monte Carlo simulations. Based on the analytical MSE, we investigate the effect of AP access radius on the MSE of AirComp numerically. The results show that there exists an optimal AP access radius for AirComp, which can decrease the MSE by up to 12.7%. It indicates that the AP access radius should be carefully chosen to improve the AirComp performance in the ultra-densely deployed CITN.

Submitted 11 August, 2023; originally announced October 2023.

arXiv:2309.16077 [pdf, other]

Task-Oriented Koopman-Based Control with Contrastive Encoder

Authors: Xubo Lyu, Hanyang Hu, Seth Siriya, Ye Pu, Mo Chen

Abstract: We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and a contrastive encoder to simultaneously learn the Koopman latent embedding, operator, and associated linear controller within an iterative loop. By prioritizing the task cost as the main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which, for the first time to the best of our knowledge, extends Koopman control from low to high-dimensional, complex nonlinear systems, including pixel-based tasks and a real robot with lidar observations. Code and videos are available at https://sites.google.com/view/kpmlilatsupp/.

Submitted 1 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted by the 7th Annual Conference on Robot Learning (CoRL), 2023 (oral spotlight)

arXiv:2309.13155 [pdf, other]

Multi-Agent Reach-Avoid Games: Two Attackers Versus One Defender and Mixed Integer Programming

Authors: Hanyang Hu, Minh Bui, Mo Chen

Abstract: We propose a hybrid approach that combines Hamilton-Jacobi (HJ) reachability and mixed-integer optimization for solving a reach-avoid game with multiple attackers and defenders. The reach-avoid game is an important problem with potential applications in air traffic control and multi-agent motion planning; however, solving this game for many attackers and defenders is intractable due to the adversarial nature of the agents and the high problem dimensionality. In this paper, we first propose an HJ reachability-based method for solving the reach-avoid game in which 2 attackers are playing against 1 defender; we derive the numerically convergent optimal winning sets for the two sides in environments with obstacles. Utilizing this result and previous results for the 1 vs. 1 game, we further propose solving the general multi-agent reach-avoid game by determining the defender assignments that can maximize the number of attackers captured via a Mixed Integer Program (MIP). Our method generalizes previous state-of-the-art results and is especially useful when there are fewer defenders than attackers. We validate our theoretical results in numerical simulations.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.05837 [pdf, other]

The Safety Filter: A Unified View of Safety-Critical Control in Autonomous Systems

Authors: Kai-Chieh Hsu, Haimin Hu, Jaime Fernández Fisac

Abstract: Recent years have seen significant progress in the realm of robot autonomy, accompanied by the expanding reach of robotic technologies. However, the emergence of new deployment domains brings unprecedented challenges in ensuring safe operation of these systems, which remains as crucial as ever. While traditional model-based safe control methods struggle with generalizability and scalability, emerging data-driven approaches tend to lack well-understood guarantees, which can result in unpredictable catastrophic failures. Successful deployment of the next generation of autonomous robots will require integrating the strengths of both paradigms. This article provides a review of safety filter approaches, highlighting important connections between existing techniques and proposing a unified technical framework to understand, compare, and combine them. The new unified view exposes a shared modular structure across a range of seemingly disparate safety filter classes and naturally suggests directions for future progress towards more scalable synthesis, robust monitoring, and efficient intervention.

Submitted 11 September, 2023; originally announced September 2023.

Comments: Accepted for publication in Annual Review of Control, Robotics, and Autonomous Systems

arXiv:2309.04335 [pdf, ps, other]

On the performance of an integrated communication and localization system: an analytical framework

Authors: Yuan Gao, Haonan Hu, Jiliang Zhang, Yanliang Jin, Shugong Xu, Xiaoli Chu

Abstract: Quantifying the performance bound of an integrated localization and communication (ILAC) system and the trade-off between communication and localization performance is critical. In this letter, we consider an ILAC system that can perform communication and localization via time-domain or frequency-domain resource allocation. We develop an analytical framework to derive the closed-form expression of the capacity loss versus localization Cramér-Rao lower bound (CRB) loss via time-domain and frequency-domain resource allocation. Simulation results validate the analytical model and demonstrate that frequency-domain resource allocation is preferable in scenarios with a smaller number of antennas at the next generation nodeB (gNB) and a larger distance between user equipment (UE) and gNB, while time-domain resource allocation is preferable in scenarios with a larger number of antennas and a smaller distance between UE and the gNB.

Submitted 8 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures

arXiv:2309.03900 [pdf, other]

Learning Continuous Exposure Value Representations for Single-Image HDR Reconstruction

Authors: Su-Kai Chen, Hung-Lin Yen, Yu-Lun Liu, Min-Hung Chen, Hou-Ning Hu, Wen-Hsiao Peng, Yen-Yu Lin

Abstract: Deep learning is commonly used to reconstruct HDR images from LDR images. LDR stack-based methods are used for single-image HDR reconstruction, generating an HDR image from a deep learning-generated LDR stack. However, current methods generate the stack with predetermined exposure values (EVs), which may limit the quality of HDR reconstruction. To address this, we propose the continuous exposure value representation (CEVR), which uses an implicit function to generate LDR images with arbitrary EVs, including those unseen during training. Our approach generates a continuous stack with more images containing diverse EVs, significantly improving HDR reconstruction. We use a cycle training strategy to supervise the model in generating continuous EV LDR images without corresponding ground truths. Our CEVR model outperforms existing methods, as demonstrated by experimental results.

Submitted 7 September, 2023; originally announced September 2023.

Comments: ICCV 2023. Project page: https://skchen1993.github.io/CEVR_web/

arXiv:2309.01267 [pdf, other]

Deception Game: Closing the Safety-Learning Loop in Interactive Robot Autonomy

Authors: Haimin Hu, Zixu Zhang, Kensuke Nakamura, Andrea Bajcsy, Jaime F. Fisac

Abstract: An outstanding challenge for the widespread deployment of robotic systems like autonomous vehicles is ensuring safe interaction with humans without sacrificing performance. Existing safety methods often neglect the robot's ability to learn and adapt at runtime, leading to overly conservative behavior. This paper proposes a new closed-loop paradigm for synthesizing safe control policies that explicitly account for the robot's evolving uncertainty and its ability to quickly respond to future scenarios as they arise, by jointly considering the physical dynamics and the robot's learning algorithm. We leverage adversarial reinforcement learning for tractable safety analysis under high-dimensional learning dynamics and demonstrate our framework's ability to work with both Bayesian belief propagation and implicit learning through large pre-trained neural trajectory predictors.

Submitted 1 November, 2023; v1 submitted 3 September, 2023; originally announced September 2023.

Comments: Conference on Robot Learning 2023

arXiv:2309.00514 [pdf]

A Machine Vision Method for Correction of Eccentric Error: Based on Adaptive Enhancement Algorithm

Authors: Fanyi Wang, Pin Cao, Yihui Zhang, Haotian Hu, Yongying Yang

Abstract: In surface defect detection for large-aperture aspherical optical elements, it is vital to accurately adjust the optical axis of the element to be coaxial with the mechanical spin axis. Therefore, a machine vision method for eccentric error correction is proposed in this paper. The imaging characteristic of the aspherical optical element causes severe defocus blur in the reference crosshair image, which may cause the correction to fail, so an Adaptive Enhancement Algorithm (AEA) is proposed to strengthen the crosshair image. AEA consists of the existing Guided Filter Dark Channel Dehazing Algorithm (GFA) and the proposed lightweight Multi-scale Densely Connected Network (MDC-Net). GFA gives an excellent enhancement effect but is time-consuming, whereas MDC-Net gives a slightly inferior enhancement effect but runs much faster. As AEA will be executed dozens of times during each correction procedure, its real-time performance is very important. Therefore, by setting an empirical threshold on the definition evaluation function SMD2, GFA and MDC-Net are applied to highly and slightly blurred crosshair images, respectively, so as to ensure the enhancement effect while saving as much time as possible. AEA is reasonably robust in running time: on ten 200 × 200 pixel Region of Interest (ROI) images with different degrees of blur, GFA and MDC-Net take an average of 0.2721 s and 0.0963 s, respectively. The eccentricity error can be reduced to within 10 μm by our method.

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2307.01534 [pdf, other]

Impact of UAVs Equipped with ADS-B on the Civil Aviation Monitoring System

Authors: Yiyang Liao, Lei Zhang, Ziye Jia, Chao Dong, Yifan Zhang, Qihui Wu, Huiling Hu, Bin Wang

Abstract: In recent years, there has been an increasing demand for unmanned aerial vehicles (UAVs) to complete multiple applications. However, as unmanned equipment, UAVs pose some security risks to general civil aviation. In order to strengthen the flight management of UAVs and guarantee safety, UAVs can be equipped with automatic dependent surveillance-broadcast (ADS-B) devices. In addition, as an automatic system, ADS-B can periodically broadcast flight information to nearby aircraft or ground stations, and the technology is already used in civil aviation systems. However, due to the limited frequency of the ADS-B technique, UAVs equipped with ADS-B devices result in the loss of packets for both UAVs and civil aviation. Further, the operation of civil aviation is seriously interfered with. Hence, this paper first examines the packet loss of civil planes at different distances, then analyzes the impact of UAVs equipped with ADS-B on the packet updating of civil planes. The result indicates that the 1090 MHz band blocking is affected by the density of UAVs. Besides, the frequency capacity is affected by the required updating interval of civil planes. The position updating probability within 3 s is 92.3% if there are 200 planes within 50 km and 20 UAVs within 5 km. The position updating probability within 3 s is 86.9% if there are 200 planes within 50 km and 40 UAVs within 5 km.

Submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.16696 [pdf]

Computationally-efficient and perceptually-motivated rendering of diffuse reflections in room acoustics simulation

Authors: Stephan D. Ewert, Nico Gößling, Oliver Buttler, Steven van de Par, Hongmei Hu

Abstract: Geometrical acoustics is well suited for simulating room reverberation in interactive real-time applications. While the image source model (ISM) is exceptionally fast, the restriction to specular reflections impacts its perceptual plausibility. To account for diffuse late reverberation, hybrid approaches have been proposed, e.g., using a feedback delay network (FDN) in combination with the ISM. Here, a computationally-efficient, digital-filter approach is suggested to account for effects of non-specular reflections in the ISM and to couple scattered sound into a diffuse reverberation model using a spatially rendered FDN. Depending on the scattering coefficient of a room boundary, energy of each image source is split into a specular and a scattered part which is added to the diffuse sound field. Temporal effects as observed for an infinite ideal diffuse (Lambertian) reflector are simulated using cascaded all-pass filters. Effects of scattering and multiple (inter-) reflections caused by larger geometric disturbances at walls and by objects in the room are accounted for in a highly simplified manner. Using a single parameter to quantify deviations from an empty shoebox room, each reflection is temporally smeared using cascaded all-pass filters. The proposed method was perceptually evaluated against dummy head recordings of real rooms.

Submitted 29 June, 2023; originally announced June 2023.

Comments: This work has been submitted to Forum Acusticum 2023 for publication

arXiv:2305.12107 [pdf, other]

EE-TTS: Emphatic Expressive TTS with Linguistic Information

Authors: Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun

Abstract: While current TTS systems perform well in synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, as a critical factor in determining the expressiveness of speech, has attracted more attention nowadays. Previous works usually enhance the emphasis by adding intermediate features, but they cannot guarantee the overall expressiveness of the speech. To resolve this matter, we propose Emphatic Expressive TTS (EE-TTS), which leverages multi-level linguistic information from syntax and semantics. EE-TTS contains an emphasis predictor that can identify appropriate emphasis positions from text and a conditioned acoustic model to synthesize expressive speech with emphasis and linguistic information. Experimental results indicate that EE-TTS outperforms the baseline with MOS improvements of 0.49 and 0.67 in expressiveness and naturalness. EE-TTS also shows strong generalization across different datasets according to AB test results.

Submitted 14 April, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: Accepted by Interspeech 2023, fix some typos

arXiv:2305.09236 [pdf, other]

One-shot neural band selection for spectral recovery

Authors: Hai-Miao Hu, Zhenbo Xu, Wenshuai Xu, You Song, YiTao Zhang, Liu Liu, Zhilin Han, Ajin Meng

Abstract: Band selection has a great impact on the spectral recovery quality. To solve this ill-posed inverse problem, most band selection methods adopt hand-crafted priors or exploit clustering or sparse regularization constraints to find the most prominent bands. These methods are very slow due to the computational cost of repeatedly training with respect to different selection frequencies or different band combinations. Many traditional methods rely on the scene prior and thus are not applicable to other scenarios. In this paper, we present a novel one-shot Neural Band Selection (NBS) framework for spectral recovery. Unlike conventional searching approaches with a discrete search space and a non-differentiable search strategy, our NBS is based on the continuous relaxation of the band selection process, thus allowing efficient band search using gradient descent. To enable the compatibility for selecting any number of bands in one-shot, we further exploit the band-wise correlation matrices to progressively suppress similar adjacent bands. Extensive evaluations on the NTIRE 2022 Spectral Reconstruction Challenge demonstrate that our NBS achieves consistent performance gains over competitive baselines when examined with four different spectral recovery methods. Our code will be publicly available.

Submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted by ICASSP 2023

arXiv:2305.04294

PELE scores: Pelvic X-ray Landmark Detection by Pelvis Extraction and Enhancement

Authors: Zhen Huang, Han Li, Shitong Shao, Heqin Zhu, Huijie Hu, Zhiwei Cheng, Jianji Wang, S. Kevin Zhou

Abstract: The pelvis, the lower part of the trunk, supports and balances the trunk. Landmark detection from a pelvic X-ray (PXR) facilitates downstream analysis and computer-assisted diagnosis and treatment of pelvic diseases. Although PXRs have the advantages of low radiation and reduced cost compared to computed tomography (CT) images, their 2D pelvis-tissue superposition of 3D structures confuses clinical decision-making. In this paper, we propose a PELvis Extraction (PELE) module that utilizes 3D prior anatomical knowledge in CT to guide and well isolate the pelvis from PXRs, thereby eliminating the influence of soft tissue. We conduct an extensive evaluation based on two public datasets and one private dataset, totaling 850 PXRs. The experimental results show that the proposed PELE module significantly improves the accuracy of PXRs landmark detection and achieves state-of-the-art performances in several benchmark metrics, thus better serving downstream tasks.

Submitted 7 June, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: will revise it and resubmit it again later

arXiv:2304.02687 [pdf, other]

Emergent Coordination through Game-Induced Nonlinear Opinion Dynamics

Authors: Haimin Hu, Kensuke Nakamura, Kai-Chieh Hsu, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: We present a multi-agent decision-making framework for the emergent coordination of autonomous agents whose intents are initially undecided. Dynamic non-cooperative games have been used to encode multi-agent interaction, but ambiguity arising from factors such as goal preference or the presence of multiple equilibria may lead to coordination issues, ranging from the "freezing robot" problem to unsafe behavior in safety-critical events. The recently developed nonlinear opinion dynamics (NOD) provide guarantees for breaking deadlocks. However, choosing the appropriate model parameters automatically in general multi-agent settings remains a challenge. In this paper, we first propose a novel and principled procedure for synthesizing NOD based on the value functions of dynamic games conditioned on agents' intents. In particular, we provide for the two-player two-option case precise stability conditions for equilibria of the game-induced NOD based on the mismatch between agents' opinions and their game values. We then propose an optimization-based trajectory optimization algorithm that computes agents' policies guided by the evolution of opinions. The efficacy of our method is illustrated with a simulated toll station coordination example.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2303.10310 [pdf, other]

doi 10.1016/j.engappai.2023.107255

Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification

Authors: Firas Al-Hindawi, Md Mahfuzur Rahman Siddiquee, Teresa Wu, Han Hu, Ying Sun

Abstract: The ability to classify images is dependent on having access to large labeled datasets and testing on data from the same domain that the model can train on. Classification becomes more challenging when dealing with new data from a different domain, where gathering and especially labeling a larger image dataset for retraining a classification model requires a labor-intensive human effort. Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain. The problem with these unsupervised models lies in their unsupervised nature. For lack of annotations, it is not possible to use the traditional supervised metrics to evaluate these translation models to pick the best-saved checkpoint model. This paper introduces a new method called Domain-knowledge Inspired Pseudo Supervision (DIPS) which utilizes domain-informed Gaussian Mixture Models to generate pseudo annotations to enable the use of traditional supervised metrics. This method was designed specifically to support cross-domain classification applications contrary to other typically used metrics such as the FID which were designed to evaluate the model in terms of the quality of the generated image from a human-eye perspective. DIPS proves its effectiveness by outperforming various GAN evaluation metrics, including FID, when selecting the optimal saved checkpoint model. It is also evaluated against truly supervised metrics.
Furthermore, DIPS showcases its robustness and interpretability by demonstrating a strong correlation with truly supervised metrics, highlighting its superiority over existing state-of-the-art alternatives. The code and data to replicate the results can be found on the official Github repository: https://github.com/Hindawi91/DIPS

Submitted 30 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

Comments: arXiv admin note: text overlap with arXiv:2212.09107

arXiv:2303.03625 [pdf, other]

SGDA: Towards 3D Universal Pulmonary Nodule Detection via Slice Grouped Domain Attention

Authors: Rui Xu, Zhi Liu, Yong Luo, Han Hu, Li Shen, Bo Du, Kaiming Kuang, Jiancheng Yang

Abstract: Lung cancer is the leading cause of cancer death worldwide. The best solution for lung cancer is to diagnose the pulmonary nodules in the early stage, which is usually accomplished with the aid of thoracic computed tomography (CT). As deep learning thrives, convolutional neural networks (CNNs) have been introduced into pulmonary nodule detection to help doctors in this labor-intensive task and demonstrated to be very effective. However, the current pulmonary nodule detection methods are usually domain-specific, and cannot satisfy the requirement of working in diverse real-world scenarios. To address this issue, we propose a slice grouped domain attention (SGDA) module to enhance the generalization capability of the pulmonary nodule detection networks. This attention module works in the axial, coronal, and sagittal directions. In each direction, we divide the input feature into groups, and for each group, we utilize a universal adapter bank to capture the feature subspaces of the domains spanned by all pulmonary nodule datasets. Then the bank outputs are combined from the perspective of domain to modulate the input group. Extensive experiments demonstrate that SGDA enables substantially better multi-domain pulmonary nodule detection performance compared with the state-of-the-art multi-domain learning methods.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE/ACM Transactions on Computational Biology and Bioinformatics

arXiv:2302.00171 [pdf, other]

Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach

Authors: Haimin Hu, David Isele, Sangjae Bae, Jaime F. Fisac

Abstract: The ability to accurately predict others' behavior is central to the safety and efficiency of interactive robotics. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents' goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a "shielding" scheme), which overrides the robot's dual control policy with a safety fallback strategy when a safety-critical event is imminent.
We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors. We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.

Submitted 1 November, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: The International Journal of Robotics Research. arXiv admin note: text overlap with arXiv:2202.07720

arXiv:2301.01446 [pdf, other]

Radio Frequency Fingerprints Extraction for LTE-V2X: A Channel Estimation Based Methodology

Authors: Tianshu Chen, Hong Shen, Aiqun Hu, Weihang He, Jie Xu, Hongxing Hu

Abstract: The vehicular-to-everything (V2X) technology has recently drawn considerable attention from both academic and industrial areas. However, the openness of the wireless communication system makes it more vulnerable to identity impersonation and information tampering. How to employ the powerful radio frequency fingerprint (RFF) identification technology in V2X systems turns out to be a vital and also challenging task. In this paper, we propose a novel RFF extraction method for Long Term Evolution-V2X (LTE-V2X) systems. In order to overcome the difficulty of extracting transmitter RFF in the presence of wireless channel and receiver noise, we first estimate the wireless channel which excludes the RFF. Then, we remove the impact of the wireless channel based on the channel estimate and obtain initial RFF features. Finally, we conduct RFF denoising to enhance the quality of the initial RFF. Simulation and experiment results both demonstrate that our proposed RFF extraction scheme achieves a high identification accuracy. Furthermore, the performance is also robust to the vehicle speed.

Submitted 3 January, 2023; originally announced January 2023.

Comments: To be published in 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall)

arXiv:2212.09107 [pdf, other]

doi 10.1016/j.eswa.2023.120265

A Framework for Generalizing Critical Heat Flux Detection Models Using Unsupervised Image-to-Image Translation

Authors: Firas Al-Hindawi, Tejaswi Soori, Han Hu, Md Mahfuzur Rahman Siddiquee, Hyunsoo Yoon, Teresa Wu, Ying Sun

Abstract: The detection of critical heat flux (CHF) is crucial in heat boiling applications as failure to do so can cause rapid temperature ramp leading to device failures. Many machine learning models exist to detect CHF, but their performance reduces significantly when tested on data from different domains. To deal with datasets from new domains a model needs to be trained from scratch. Moreover, the dataset needs to be annotated by a domain expert. To address this issue, we propose a new framework to support the generalizability and adaptability of trained CHF detection models in an unsupervised manner. This approach uses an unsupervised Image-to-Image (UI2I) translation model to transform images in the target dataset to look like they were obtained from the same domain the model previously trained on. Unlike other frameworks dealing with domain shift, our framework does not require retraining or fine-tuning of the trained classification model nor does it require synthesized datasets in the training process of either the classification model or the UI2I model. The framework was tested on three boiling datasets from different domains, and we show that the CHF detection model trained on one dataset was able to generalize to the other two previously unseen datasets with high accuracy. Overall, the framework enables CHF detection models to adapt to data generated from different domains without requiring additional annotation effort or retraining of the model.

Submitted 17 March, 2023; v1 submitted 18 December, 2022; originally announced December 2022.

Comments: This work has been submitted to the Expert Systems With Applications Journal on Sep 25, 2022

arXiv:2212.08653 [pdf, other]

Attentive Mask CLIP

Authors: Yifan Yang, Weiquan Huang, Yixuan Wei, Houwen Peng, Xinyang Jiang, Huiqiang Jiang, Fangyun Wei, Yin Wang, Han Hu, Lili Qiu, Yuqing Yang

Abstract: Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incorrect pairing target in CLIP training. To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description. The correlation scores are computed in an online fashion using the EMA version of the visual encoder. Our experiments show that the proposed attentive masking approach performs better than the previous method of random token removal for CLIP training. The approach also makes it efficient to apply multiple augmentation views to the image, as well as introducing instance contrastive learning tasks between these views into the CLIP framework. Compared to other CLIP improvements that combine different pre-training targets such as SLIP and MaskCLIP, our method is not only more effective, but also much more efficient. Specifically, using ViT-B and YFCC-15M dataset, our approach achieves $43.9\%$ top-1 accuracy on ImageNet-1K zero-shot classification, as well as $62.7/42.1$ and $38.0/23.2$ I2T/T2I retrieval accuracy on Flickr30K and MS COCO, which are $+1.1\%$, $+5.5/+0.9$, and $+4.4/+1.3$ higher than the SLIP method, while being $2.30\times$ faster.
An efficient version of our approach running $1.16\times$ faster than the plain CLIP model achieves significant gains of $+5.3\%$, $+11.3/+8.0$, and $+9.5/+4.9$ on these benchmarks.

Submitted 9 October, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

Journal ref: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2771-2781

arXiv:2211.14314 [pdf, other]

The applicability of transperceptual and deep learning approaches to the study and mimicry of complex cartilaginous tissues

Authors: J. Waghorne, C. Howard, H. Hu, J. Pang, W. J. Peveler, L. Harris, O. Barrera

Abstract: Complex soft tissues, for example the knee meniscus, play a crucial role in mobility and joint health, but when damaged are incredibly difficult to repair and replace. This is due to their highly hierarchical and porous nature which in turn leads to their unique mechanical properties. In order to design tissue substitutes, the internal architecture of the native tissue needs to be understood and replicated. Here we explore a combined audio-visual approach - so called transperceptual - to generate artificial architectures mimicking the native ones. The proposed method uses both traditional imagery, and sound generated from each image as a method of rapidly comparing and contrasting the porosity and pore size within the samples. We have trained and tested a generative adversarial network (GAN) on the 2D image stacks. The impact of the training set of images on the similarity of the artificial to the original dataset was assessed by analyzing two samples. The first consisting of n=478 pairs of audio and image files for which the images were downsampled to 64 $\times$ 64 pixels, the second one consisting of n=7640 pairs of audio and image files for which the full resolution 256 $\times$ 256 pixels is retained but each image is divided into 16 squares to maintain the limit of 64 $\times$ 64 pixels required by the GAN.
We reconstruct the 2D stacks of artificially generated datasets into 3D objects and run image analysis algorithms to characterize statistically the architectural parameters - pore size, tortuosity and pore connectivity - and compare them with the original dataset. Results show that the artificially generated dataset that undergoes downsampling performs better in terms of parameter matching. Our audiovisual approach has the potential to be extended to larger data sets to explore both how similarities and differences can be audibly recognized across multiple samples.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2210.14645 [pdf, other]

Super-Resolution Based Patch-Free 3D Image Segmentation with High-Frequency Guidance

Authors: Hongyi Wang, Lanfen Lin, Hongjie Hu, Qingqing Chen, Yinhao Li, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong

Abstract: High resolution (HR) 3D images are widely used nowadays, such as medical images like Magnetic Resonance Imaging (MRI) and Computed Tomography (CT). However, segmentation of these 3D images remains a challenge due to their high spatial resolution and dimensionality in contrast to currently limited GPU memory. Therefore, most existing 3D image segmentation methods use patch-based models, which have low inference efficiency and ignore global contextual information. To address these problems, we propose a super-resolution (SR) based patch-free 3D image segmentation framework that can realize HR segmentation from a global-wise low-resolution (LR) input. The framework contains two sub-tasks, of which semantic segmentation is the main task and super resolution is an auxiliary task aiding in rebuilding the high frequency information from the LR input. To furthermore balance the information loss with the LR input, we propose a High-Frequency Guidance Module (HGM), and design an efficient selective cropping algorithm to crop an HR patch from the original image as restoration guidance for it. In addition, we also propose a Task-Fusion Module (TFM) to exploit the inter connections between segmentation and SR task, realizing joint optimization of the two tasks. When predicting, only the main segmentation task is needed, while other modules can be removed for acceleration.
The experimental results on two different datasets show that our framework has a four times higher inference speed compared to traditional patch-based methods, while its performance also surpasses other patch-based and patch-free models.

Submitted 10 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

Comments: Version #2 uploaded in Jul 10, 2023

arXiv:2210.13415 [pdf]

Deep Learning Approach for Dynamic Sampling for Multichannel Mass Spectrometry Imaging

Authors: David Helminiak, Hang Hu, Julia Laskin, Dong Hye Ye

Abstract: Mass Spectrometry Imaging (MSI), using traditional rectilinear scanning, takes hours to days for high spatial resolution acquisitions. Given that most pixels within a sample's field of view are often neither relevant to underlying biological structures nor chemically informative, MSI presents as a prime candidate for integration with sparse and dynamic sampling algorithms. During a scan, stochastic models determine which locations probabilistically contain information critical to the generation of low-error reconstructions. Decreasing the number of required physical measurements thereby minimizes overall acquisition times. A Deep Learning Approach for Dynamic Sampling (DLADS), utilizing a Convolutional Neural Network (CNN) and encapsulating molecular mass intensity distributions within a third dimension, demonstrates a simulated 70% throughput improvement for Nanospray Desorption Electrospray Ionization (nano-DESI) MSI tissues. Evaluations are conducted between DLADS and a Supervised Learning Approach for Dynamic Sampling, with Least-Squares regression (SLADS-LS) and a Multi-Layer Perceptron (MLP) network (SLADS-Net). When compared with SLADS-LS, limited to a single m/z channel, as well as multichannel SLADS-LS and SLADS-Net, DLADS respectively improves regression performance by 36.7%, 7.0%, and 6.2%, resulting in gains to reconstruction quality of 6.0%, 2.1%, and 3.4% for acquisition of targeted m/z.

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2209.11171 [pdf, other]

A Cooperative Deception Strategy for Covert Communication in Presence of a Multi-antenna Adversary

Authors: Jiangbo Si, Zizhen Liu, Zan Li, Hang Hu, Lei Guan, Chao Wang, Naofal Al-Dhahir

Abstract: Covert transmission is investigated for a cooperative deception strategy, where a cooperative jammer (Jammer) tries to attract a multi-antenna adversary (Willie) and degrade the adversary's reception ability for the signal from a transmitter (Alice). For this strategy, we formulate an optimization problem to maximize the covert rate when three different types of channel state information (CSI) are available. The total power is optimally allocated between Alice and Jammer subject to Kullback-Leibler (KL) divergence constraint. Different from the existing literature, in our proposed strategy, we also determine the optimal transmission power at the jammer when Alice is silent, while existing works always assume that the jammer's power is fixed. Specifically, we apply the S-procedure to convert infinite constraints into linear-matrix-inequalities (LMI) constraints. When statistical CSI at Willie is available, we convert double integration to single integration using asymptotic approximation and substitution method. In addition, the transmission strategy without jammer deception is studied as a benchmark. Finally, our simulation results show that for the proposed strategy, the covert rate is increased with the number of antennas at Willie. Moreover, compared to the benchmark, our proposed strategy is more robust in face of imperfect CSI.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 33 pages, 8 Figures

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary) ACM Class: F.2.2; I.2.7

arXiv:2209.05234 [pdf, other]

Low rank prior and l0 norm to remove impulse noise in images

Authors: Haijuan Hu

Abstract: Patch-based low rank is an important prior assumption for image processing. Moreover, according to our calculation, the optimization of l0 norm corresponds to the maximum likelihood estimation under random-valued impulse noise. In this article, we thus combine exact rank and l0 norm for removing the noise. It is solved formally using the alternating direction method of multipliers (ADMM), with our previous patch-based weighted filter (PWMF) producing initial images. Since this model is not convex, we consider it as a Plug-and-Play ADMM, and do not discuss theoretical convergence properties. Experiments show that this method has very good performance, especially for weak or medium contrast images.

Submitted 12 September, 2022; originally announced September 2022.

arXiv:2209.00353 [pdf, other]

AccoMontage2: A Complete Harmonization and Accompaniment Arrangement System

Authors: Li Yi, Haochen Hu, Jingwei Zhao, Gus Xia

Abstract: We propose AccoMontage2, a system capable of doing full-length song harmonization and accompaniment arrangement based on a lead melody. Following AccoMontage, this study focuses on generating piano arrangements for popular/folk songs and it carries on the generalized template-based retrieval method. The novelties of this study are twofold. First, we invent a harmonization module (which AccoMontage does not have). This module generates structured and coherent full-length chord progression by optimizing and balancing three loss terms: a micro-level loss for note-wise dissonance, a meso-level loss for phrase-template matching, and a macro-level loss for full piece coherency. Second, we develop a graphical user interface which allows users to select different styles of chord progression and piano texture. Currently, chord progression styles include Pop, R&B, and Dark, while piano texture styles include several levels of voicing density and rhythmic complexity. Experimental results show that both our harmonization and arrangement results significantly outperform the baselines. Lastly, we release AccoMontage2 as an online application as well as the organized chord progression templates as a public dataset.

Submitted 1 September, 2022; originally announced September 2022.

Comments: Accepted by ISMIR 2022

Showing 1–50 of 107 results for author: Hu, H