
Mid-Band Extra Large-Scale MIMO System: Channel Modeling and Performance Analysis

Authors: Jiachen Tian, Yu Han, Xiao Li, Shi Jin, Chao-Kai Wen

Abstract: In pursuit of enhanced quality of service and higher transmission rates, communication within the mid-band spectrum, such as bands in the 6-15 GHz range, combined with extra large-scale multiple-input multiple-output (XL-MIMO), is considered a potential enabler for future communication systems. However, the characteristics introduced by mid-band XL-MIMO systems pose challenges for channel modeling and performance analysis. In this paper, we first analyze the potential characteristics of mid-band MIMO channels. Then, an analytical channel model incorporating novel channel characteristics is proposed, based on a review of classical analytical channel models. This model is convenient for theoretical analysis and compatible with other analytical channel models. Subsequently, based on the proposed channel model, we analyze key metrics of wireless communication, including the ergodic spectral efficiency (SE) and outage probability (OP) of MIMO maximal-ratio combining systems. Specifically, we derive closed-form approximations and performance bounds for two typical scenarios, aiming to illustrate the influence of mid-band XL-MIMO systems. Finally, comparisons between systems under different practical configurations are carried out through simulations. The theoretical analysis and simulations demonstrate that mid-band XL-MIMO systems excel in SE and OP due to the increased array elements, moderate large-scale fading, and enlarged transmission bandwidth.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 16 pages, 10 figures

arXiv:2407.11705 [pdf, other]

Snail-Radar: A large-scale diverse dataset for the evaluation of 4D-radar-based SLAM systems

Authors: Jianzhu Huai, Binliang Wang, Yuan Zhuang, Yiwen Chen, Qipeng Li, Yulong Han, Charles Toth

Abstract: 4D radars are increasingly favored for odometry and mapping of autonomous systems due to their robustness in harsh weather and dynamic environments. Existing datasets, however, often cover limited areas and are typically captured using a single platform. To address this gap, we present a diverse large-scale dataset specifically designed for 4D radar-based localization and mapping. This dataset was gathered using three different platforms: a handheld device, an e-bike, and an SUV, under a variety of environmental conditions, including clear days, nighttime, and heavy rain. The data collection occurred from September 2023 to February 2024, encompassing diverse settings such as roads in a vegetated campus and tunnels on highways. Each route was traversed multiple times to facilitate place recognition evaluations. The sensor suite included a 3D lidar, 4D radars, stereo cameras, consumer-grade IMUs, and a GNSS/INS system. Sensor data packets were synchronized to GNSS time using a two-step process: a convex hull algorithm was applied to smooth host time jitter, and then odometry and correlation algorithms were used to correct constant time offsets. Extrinsic calibration between sensors was achieved through manual measurements and subsequent nonlinear optimization. The reference motion for the platforms was generated by registering lidar scans to a terrestrial laser scanner (TLS) point cloud map using a lidar inertial odometry (LIO) method in localization mode. Additionally, a data reversion technique was introduced to enable backward LIO processing. We believe this dataset will boost research in radar-based point cloud registration, odometry, mapping, and place recognition.

Submitted 22 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: 11 pages, 4 figures, 5 tables

arXiv:2407.10377 [pdf]

Enhanced Self-supervised Learning for Multi-modality MRI Segmentation and Classification: A Novel Approach Avoiding Model Collapse

Authors: Linxuan Han, Sa Xiao, Zimeng Li, Haidong Li, Xiuchao Zhao, Fumin Guo, Yeqing Han, Xin Zhou

Abstract: Multi-modality magnetic resonance imaging (MRI) can provide complementary information for computer-aided diagnosis. Traditional deep learning algorithms are suitable for identifying specific anatomical structures, segmenting lesions, and classifying diseases with magnetic resonance images. However, manual labels are limited due to high expense, which hinders further improvement of model accuracy. Self-supervised learning (SSL) can effectively learn feature representations from unlabeled data by pre-training and is demonstrated to be effective in natural image analysis. Most SSL methods ignore the similarity of multi-modality MRI, leading to model collapse. This limits the efficiency of pre-training, causing low accuracy in downstream segmentation and classification tasks. To solve this challenge, we establish and validate a multi-modality MRI masked autoencoder consisting of a hybrid mask pattern (HMP) and a pyramid Barlow twin (PBT) module for SSL on multi-modality MRI analysis. The HMP concatenates three masking steps, forcing the SSL to learn the semantic connections of multi-modality images by reconstructing the masked patches. We have proved that the proposed HMP can avoid model collapse. The PBT module exploits the pyramidal hierarchy of the network to construct a Barlow twin loss between masked and original views, aligning the semantic representations of image patches at different vision scales in latent space. Experiments on BraTS2023, PI-CAI, and lung gas MRI datasets further demonstrate the superiority of our framework over the state-of-the-art. The performance of the segmentation and classification is substantially enhanced, supporting the accurate detection of small lesion areas. The code is available at https://github.com/LinxuanHan/M2-MAE.

Submitted 17 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

arXiv:2406.19769 [pdf, other]

Decision Transformer for IRS-Assisted Systems with Diffusion-Driven Generative Channels

Authors: Jie Zhang, Jun Li, Zhe Wang, Yu Han, Long Shi, Bin Cao

Abstract: In this paper, we propose a novel diffusion-decision transformer (D2T) architecture to optimize the beamforming strategies for intelligent reflecting surface (IRS)-assisted multiple-input single-output (MISO) communication systems. The first challenge lies in the expensive computation cost to recover the real-time channel state information (CSI) from the received pilot signals, which usually requires prior knowledge of the channel distributions. To reduce the channel estimation complexity, we adopt a diffusion model to automatically learn the mapping between the received pilot signals and channel matrices in a model-free manner. The second challenge is that traditional optimization or reinforcement learning (RL) algorithms cannot guarantee the optimality of the beamforming policies once the channel distribution changes, and it is costly to re-solve for the optimized strategies. To enhance the generality of the decision models over varying channel distributions, we propose an offline pre-training and online fine-tuning decision transformer (DT) framework, wherein we first pre-train the DT offline with the data samples collected by the RL algorithms under diverse channel distributions, and then fine-tune the DT online with few-shot samples under a new channel distribution for generalization. Simulation results demonstrate that, compared with retraining RL algorithms, the proposed D2T algorithm boosts the convergence speed by 3 times with only a few samples from the new channel distribution while enhancing the average user data rate by 6%.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.18425 [pdf, other]

L-Sort: An Efficient Hardware for Real-time Multi-channel Spike Sorting with Localization

Authors: Yuntao Han, Shiwei Wang, Alister Hamilton

Abstract: Spike sorting is essential for extracting neuronal information from neural signals and understanding brain function. With the advent of high-density microelectrode arrays (HDMEAs), the challenges and opportunities in multi-channel spike sorting have intensified. Real-time spike sorting is particularly crucial for closed-loop brain-computer interface (BCI) applications, demanding efficient hardware implementations. This paper introduces L-Sort, a hardware design for real-time multi-channel spike sorting. Leveraging spike localization techniques, L-Sort achieves efficient spike detection and clustering without the need to store raw signals during detection. By incorporating median thresholding and geometric features, L-Sort demonstrates promising results in terms of accuracy and hardware efficiency. We assessed the detection and clustering accuracy of our design with publicly available datasets recorded using high-density neural probes (Neuropixel). We implemented our design on an FPGA and compared the results with the state of the art. Results show that our design consumes fewer hardware resources than other FPGA-based spike sorting hardware.

Submitted 26 June, 2024; originally announced June 2024.

ACM Class: B.7.1

arXiv:2406.03706 [pdf, other]

Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

Abstract: Recent advances in large language models (LLMs) and the development of audio codecs have greatly propelled zero-shot TTS. These systems can synthesize personalized speech with only a 3-second speech sample from an unseen speaker as an acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we introduce a novel audio codec-based TTS model to adapt context features with multiple enhancements. Inspired by the success of Qformer, we propose a multi-modal context-enhanced Qformer (MMCE-Qformer) to utilize additional multi-modal context information. In addition, we adapt a pretrained LLM to leverage its understanding ability to predict semantic tokens, and use SoundStorm to generate acoustic tokens, thereby enhancing audio quality and speaker similarity. Extensive objective and subjective evaluations show that our proposed method outperforms baselines across various context TTS scenarios.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by Interspeech 2024

arXiv:2405.20969 [pdf, other]

Design, Calibration, and Control of Compliant Force-sensing Gripping Pads for Humanoid Robots

Authors: Yuanfeng Han, Boren Jiang, Gregory S. Chirikjian

Abstract: This paper introduces a pair of low-cost, light-weight and compliant force-sensing gripping pads used for manipulating box-like objects with smaller-sized humanoid robots. These pads measure normal gripping forces and center of pressure (CoP). A calibration method is developed to improve the CoP measurement accuracy. A hybrid force-alignment-position control framework is proposed to regulate the gripping forces and to ensure the surface alignment between the grippers and the object. Limit surface theory is incorporated as a contact friction modeling approach to determine the magnitude of gripping forces for slippage avoidance. The integrated hardware and software system is demonstrated with a NAO humanoid robot. Experiments show the effectiveness of the overall approach.

Submitted 31 May, 2024; originally announced May 2024.

Comments: 21 pages, 16 figures, Published in ASME Journal of Mechanisms and Robotics

Journal ref: Journal of Mechanisms and Robotics, 15, 031010, 2023

arXiv:2405.16715 [pdf]

Coil Reweighting to Suppress Motion Artifacts in Real-Time Exercise Cine Imaging

Authors: Chong Chen, Yingmin Liu, Yu Ding, Matthew Tong, Preethi Chandrasekaran, Christopher Crabtree, Syed M. Arshad, Yuchi Han, Rizwan Ahmad

Abstract: Background: Accelerated real-time cine (RT-Cine) imaging enables cardiac function assessment without the need for breath-holding. However, when performed during in-magnet exercise, RT-Cine images may exhibit significant motion artifacts. Methods: By projecting the time-averaged images to the subspace spanned by the coil sensitivity maps, we propose a coil reweighting (CR) method to automatically suppress a subset of receive coils that introduces a high level of artifacts in the reconstructed image. RT-Cine data collected at rest and during exercise from ten healthy volunteers and six patients were utilized to assess the performance of the proposed method. One short-axis and one two-chamber RT-Cine series reconstructed with and without CR from each subject were visually scored by two cardiologists in terms of the level of artifacts on a scale of 1 (worst) to 5 (best). Results: For healthy volunteers, applying CR to RT-Cine images collected at rest did not significantly change the image quality score (p=1). In contrast, for RT-Cine images collected during exercise, CR significantly improved the score from 3.9 to 4.68 (p<0.001). Similarly, in patients, CR did not significantly change the score for images collected at rest (p=0.031) but markedly improved the score from 3.15 to 4.42 (p<0.001) for images taken during exercise. Despite lower image quality scores in the patient cohort compared to healthy subjects, likely due to larger body habitus and the difficulty of limiting body motion during exercise, CR effectively suppressed motion artifacts, with all image series from the patient cohort receiving a score of four or higher. Conclusion: Using data from healthy subjects and patients, we demonstrate that the motion artifacts in the reconstructed RT-Cine images can be effectively suppressed with the proposed CR method.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.00367 [pdf, other]

doi 10.1145/3626772.3657976

Distance Sampling-based Paraphraser Leveraging ChatGPT for Text Data Manipulation

Authors: Yoori Oh, Yoseob Han, Kyogu Lee

Abstract: There has been growing interest in audio-language retrieval research, where the objective is to establish the correlation between audio and text modalities. However, most audio-text paired datasets often lack rich expression of the text data compared to the audio samples. One of the significant challenges facing audio-text datasets is the presence of similar or identical captions despite different audio samples. Therefore, under many-to-one mapping conditions, audio-text datasets lead to poor performance of retrieval tasks. In this paper, we propose a novel approach to tackle the data imbalance problem in the audio-language retrieval task. To overcome the limitation, we introduce a method that employs a distance sampling-based paraphraser leveraging ChatGPT, utilizing a distance function to generate a controllable distribution of manipulated text data. For a set of sentences with the same context, the distance is used to calculate a degree of manipulation for any two sentences, and ChatGPT's few-shot prompting is performed using a text cluster with a similar distance defined by the Jaccard similarity. Therefore, ChatGPT, when applied to few-shot prompting with text clusters, can adjust the diversity of the manipulated text based on the distance. The proposed approach is shown to significantly enhance performance in audio-text retrieval, outperforming conventional text augmentation techniques.

Submitted 1 May, 2024; originally announced May 2024.

Comments: Accepted at SIGIR 2024 short paper track
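
The abstract of the entry above mentions a distance defined by the Jaccard similarity between sentences. As a rough illustration only, the following Python sketch computes a word-level Jaccard distance and buckets captions by their distance to a reference caption. The whitespace tokenization, the function names, and the bucket edges are assumptions made for this example; the listing does not specify the paper's actual distance function, clustering, or prompting setup.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty sentences are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def jaccard_distance(a: str, b: str) -> float:
    """Distance in [0, 1]; 0 means identical word sets."""
    return 1.0 - jaccard_similarity(a, b)


def group_by_distance(reference, candidates, edges=(0.25, 0.5, 0.75)):
    """Bucket candidates by Jaccard distance to the reference.

    The bucket edges are hypothetical; bucket k holds candidates whose
    distance is at least edges[k-1] and below edges[k].
    """
    buckets = [[] for _ in range(len(edges) + 1)]
    for sentence in candidates:
        d = jaccard_distance(reference, sentence)
        index = sum(d >= edge for edge in edges)
        buckets[index].append(sentence)
    return buckets
```

Captions that land in the same bucket differ from the reference by a similar amount, which is one plausible way to pick few-shot examples with a target degree of manipulation.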

arXiv:2404.16318 [pdf, other]

The Continuous-Time Weighted-Median Opinion Dynamics

Authors: Yi Han, Ge Chen, Florian Dörfler, Wenjun Mei

Abstract: Opinion dynamics models are important in understanding and predicting opinion formation processes within social groups. Although the weighted-averaging opinion-update mechanism is widely adopted as the micro-foundation of opinion dynamics, it bears a non-negligibly unrealistic implication: opinion attractiveness increases with opinion distance. Recently, the weighted-median mechanism has been proposed as a new microscopic mechanism of opinion exchange. Numerous advancements have been achieved regarding this new micro-foundation, from theoretical analysis to empirical validation, in a discrete-time asynchronous setup. However, the original discrete-time weighted-median model does not allow for "compromise behavior" in opinion exchanges, i.e., no intermediate opinions are created between disagreeing agents. To resolve this problem, this paper proposes a novel continuous-time weighted-median opinion dynamics model, in which agents' opinions move towards the weighted-medians of their out-neighbors' opinions. It turns out that the proof methods for the original discrete-time asynchronous model are no longer applicable to the analysis of the continuous-time model. In this paper, we first establish the existence and uniqueness of the solution to the continuous-time weighted-median opinion dynamics by showing that the weighted-median mapping is contractive on any graph. We also characterize the set of all the equilibria. Then, by leveraging a new LaSalle invariance principle argument, we prove the convergence of the continuous-time weighted-median model for any initial condition and derive a necessary and sufficient condition for the convergence to consensus.

Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: 13 pages, 1 figure

MSC Class: 91D30(Primary) 93A16(Secondary)
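
The entry above describes agents whose opinions move towards the weighted medians of their out-neighbors' opinions. The sketch below is a hedged illustration of that idea, not the paper's model: it assumes the dynamics take the form dx_i/dt = Med_i(x) - x_i, discretizes them with a forward-Euler step, and breaks ties in the weighted median by taking the lower value. The function names and the row-weight matrix `W` are hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total weight.

    Tie-breaking towards the lower value is an assumption of this sketch.
    """
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    raise ValueError("need at least one value with positive weight")


def euler_step(opinions, W, dt):
    """One forward-Euler step of the assumed dynamics dx_i/dt = Med_i(x) - x_i.

    W[i][j] > 0 means agent j is an out-neighbor of agent i with that weight;
    every row is assumed to contain at least one positive entry.
    """
    n = len(opinions)
    updated = []
    for i in range(n):
        neighbors = [j for j in range(n) if W[i][j] > 0]
        target = weighted_median(
            [opinions[j] for j in neighbors],
            [W[i][j] for j in neighbors],
        )
        updated.append(opinions[i] + dt * (target - opinions[i]))
    return updated
```

With a step size below 1, each agent moves only part of the way to its weighted median, so intermediate opinions appear between disagreeing agents, which is the "compromise behavior" the abstract refers to.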

arXiv:2403.08580 [pdf, other]

Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

Authors: Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen

Abstract: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these methods often suffer from performance degradation in low-quality videos. We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for bitstream decoding. To validate our approach, we built a comprehensive data set comprising over 29,000 YouTube video clips, totaling 6,000 hours and spanning 11 distinct categories. Our evaluations indicate precision, accuracy, and recall rates consistently above 80%, many exceeding 90%, and some reaching 99%. The algorithm operates approximately 15,000 times faster than real-time for 30fps videos, outperforming the traditional Dynamic Time Warping (DTW) algorithm by seven orders of magnitude.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 5 pages, 5 figures, 1 table. arXiv admin note: substantial text overlap with arXiv:2309.07361

arXiv:2403.06998 [pdf]

High-speed Low-consumption sEMG-based Transient-state micro-Gesture Recognition

Authors: Youfang Han, Wei Zhao, Xiangjin Chen, Xin Meng

Abstract: Gesture recognition on wearable devices is extensively applied in human-computer interaction. Electromyography (EMG) has been used in many gesture recognition systems for its rapid perception of muscle signals. However, analyzing EMG signals on devices such as smart wristbands usually requires inference models with low inference latency, low power consumption, and low memory occupation. Therefore, this paper proposes an improved spiking neural network (SNN) to achieve these goals. We propose an adaptive multi-delta coding as a spiking coding method to improve recognition accuracy. We propose two additive solvers for SNN, which can significantly reduce inference energy consumption and the number of parameters, and improve the robustness of temporal differences. In addition, we propose a linear action detection method, TAD-LIF, which is suitable for SNNs. TAD-LIF is an improved LIF neuron that can detect transient-state gestures quickly and accurately. We collected two datasets from 20 subjects including 6 micro gestures. The collection devices are two custom-designed lightweight consumer-level sEMG wristbands (3 and 8 electrode channels respectively). Compared to CNN, FCN, and normal SNN-based methods, the proposed SNN has higher recognition accuracy. The accuracy of the proposed SNN is 83.85% and 93.52% on the two datasets respectively. In addition, the inference latency of the proposed SNN is about 1% of CNN, the power consumption is about 0.1% of CNN, and the memory occupation is about 20% of CNN. The proposed methods can be used for precise, high-speed, and low-power micro-gesture recognition tasks, and are suitable for consumer-level intelligent wearable devices, which is a general way to achieve ubiquitous computing.

Submitted 12 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.17877 [pdf, other]

Accelerated Real-time Cine and Flow under In-magnet Staged Exercise

Authors: Preethi Chandrasekaran, Chong Chen, Yingmin Liu, Syed Murtaza Arshad, Christopher Crabtree, Matthew Tong, Yuchi Han, Rizwan Ahmad

Abstract: Background: Cardiovascular magnetic resonance imaging (CMR) is a well-established imaging tool for diagnosing and managing cardiac conditions. The integration of exercise stress with CMR (ExCMR) can enhance its diagnostic capacity. Despite recent advances in CMR technology, quantitative ExCMR during exercise remains technically challenging due to motion artifacts and limited spatial and temporal resolution. Methods: This study investigated the feasibility of biventricular functional and hemodynamic assessment using real-time (RT) ExCMR during a staged exercise protocol in 24 healthy volunteers. We applied a coil reweighting technique and employed high acceleration rates to minimize motion blurring and artifacts. We further applied a beat-selection technique that identified beats from the end-expiratory phase to minimize the impact of respiration-induced through-plane motion. Additionally, results from six patients were presented to demonstrate clinical feasibility. Results: Our findings indicated a consistent decrease in end-systolic volume and stable end-diastolic volume across exercise intensities, leading to increased stroke volume and ejection fraction. The selection of end-expiratory beats enhanced the repeatability of cardiac function parameters, as shown by scan-rescan tests in nine volunteers. High scores from a blinded image quality assessment indicated that coil reweighting effectively minimized motion artifacts. Conclusions: This study demonstrated the feasibility of RT ExCMR with in-magnet exercise in healthy subjects and patients. Our results indicate that high acceleration rates, coil reweighting, and selection of respiratory phase-specific heartbeats enhance image quality and repeatability of quantitative RT ExCMR.

Submitted 21 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2401.08121 [pdf, other]

CycLight: learning traffic signal cooperation with a cycle-level strategy

Authors: Gengyue Han, Xiaohan Liu, Xianyue Peng, Hao Wang, Yu Han

Abstract: This study introduces CycLight, a novel cycle-level deep reinforcement learning (RL) approach for network-level adaptive traffic signal control (NATSC) systems. Unlike most traditional RL-based traffic controllers that focus on step-by-step decision making, CycLight adopts a cycle-level strategy, optimizing cycle length and splits simultaneously using the Parameterized Deep Q-Networks (PDQN) algorithm. This cycle-level approach effectively reduces the computational burden associated with frequent data communication, while enhancing the practicality and safety of real-world applications. A decentralized framework is formulated for multi-agent cooperation, and an attention mechanism is integrated to accurately assess the impact of the surroundings on the current intersection. CycLight is tested in a large synthetic traffic grid using the microscopic traffic simulation tool, SUMO. Experimental results not only demonstrate the superiority of CycLight over other state-of-the-art approaches but also showcase its robustness against information transmission delays.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.17282 [pdf]

Nonlinear energy harvesting system with multiple stability

Authors: Yanwei Han, Zijian Zhang

Abstract: Nonlinear energy harvesting systems based on forced vibration with electromechanical coupling are widely used to capture ambient vibration energy and convert mechanical energy into electrical energy. However, the nonlinear response mechanism of the friction induced vibration (FIV) energy harvesting system with multiple stability and stick-slip motion is still unclear. In the current paper, a novel nonlinear energy harvesting model with multiple stability of single-, double- and triple-well potential is proposed based on a V-shaped spring structure and a belt conveying system. The dynamic equations for the energy harvesting system with multiple stability and self-excited friction are established by using the Euler-Lagrange equations. Next, the nonlinear restoring force, friction force, and potential energy surfaces for static characteristics of the energy harvesting system are obtained to show the nonlinear varying stiffness, multiple equilibrium points, discontinuous behaviors and multiple well response. Then, the equilibrium surface of bifurcation sets of the autonomous system is given to show the third-order quasi zero stiffness (QZS3), fifth-order quasi zero stiffness (QZS5), double well (DW) and triple well (TW). Furthermore, the response amplitudes of charge, current, voltage and power of the forced electromechanical coupled vibration system for QZS3, QZS5, DW and TW are analyzed by using numerical solutions. Finally, a prototype of the FIV energy harvesting system is manufactured and the experimental system is set up. The experimental results for static restoring force, damping force and electrical output agree well with the numerical results, which validates the proposed FIV energy harvesting model.

Submitted 27 December, 2023; originally announced December 2023.

Comments: 29 Pages, 29 figures

MSC Class: 34-xx ACM Class: J.2

arXiv:2312.16383 [pdf, ps, other]

Frame-level emotional state alignment method for speech emotion recognition

Authors: Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

Abstract: Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective states consistent with the utterance-level label, which makes it difficult for the model to distinguish the true emotion of the audio and causes it to perform poorly. To address this problem, we propose a frame-level emotional state alignment method for SER. First, we fine-tune the HuBERT model to obtain a SER system with the task-adaptive pretraining (TAPT) method, and extract embeddings from its transformer layers to form frame-level pseudo-emotion labels with clustering. Then, the pseudo labels are used to pretrain HuBERT. Hence, each frame output of HuBERT has corresponding emotional information. Finally, we fine-tune the above pretrained HuBERT for SER by adding an attention layer on top of it, which can focus only on those frames that are emotionally more consistent with the utterance-level label. The experimental results performed on IEMOCAP indicate that our proposed method performs better than state-of-the-art (SOTA) methods.

Submitted 26 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2312.10112 [pdf, other]

NM-FlowGAN: Modeling sRGB Noise with a Hybrid Approach based on Normalizing Flows and Generative Adversarial Networks

Authors: Young Joo Han, Ha-Jin Yu

Abstract: Modeling and synthesizing real sRGB noise is crucial for various low-level vision tasks, such as building datasets for training image denoising systems. The distribution of real sRGB noise is highly complex and affected by a multitude of factors, making its accurate modeling extremely challenging. Therefore, recent studies have proposed methods that employ data-driven generative models, such as generative adversarial networks (GAN) and Normalizing Flows. These studies achieve more accurate modeling of sRGB noise compared to traditional noise modeling methods. However, there are performance limitations due to the inherent characteristics of each generative model. To address this issue, we propose NM-FlowGAN, a hybrid approach that exploits the strengths of both GAN and Normalizing Flows. We simultaneously employ a pixel-wise noise modeling network based on Normalizing Flows, and spatial correlation modeling networks based on GAN. In our experiments, our NM-FlowGAN outperforms other baselines on the sRGB noise synthesis task. Moreover, the denoising neural network, trained with synthesized image pairs from our model, also shows superior performance compared to other baselines. Our code is available at: https://github.com/YoungJooHan/NM-FlowGAN.

Submitted 14 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: 25 pages, 11 figures, 7 tables

MSC Class: 68T45 ACM Class: I.4.4

arXiv:2310.11044 [pdf, ps, other]

A Tutorial on Near-Field XL-MIMO Communications Towards 6G

Authors: Haiquan Lu, Yong Zeng, Changsheng You, Yu Han, Jiayi Zhang, Zhe Wang, Zhenjun Dong, Shi Jin, Cheng-Xiang Wang, Tao Jiang, Xiaohu You, Rui Zhang

Abstract: Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for the sixth-generation (6G) mobile communication networks. By significantly boosting the antenna number or size to at least an order of magnitude beyond current massive MIMO systems, XL-MIMO is expected to unprecedentedly enhance the spectral efficiency and spatial resolution for wireless communication. The evolution from massive MIMO to XL-MIMO is not simply an increase in the array size, but faces new design challenges, in terms of near-field channel modelling, performance analysis, channel estimation, and practical implementation. In this article, we give a comprehensive tutorial overview on near-field XL-MIMO communications, aiming to provide useful guidance for tackling the above challenges. First, the basic near-field modelling for XL-MIMO is established, by considering the new characteristics of non-uniform spherical wave (NUSW) and spatial non-stationarity. Next, based on the near-field modelling, the performance analysis of XL-MIMO is presented, including the near-field signal-to-noise ratio (SNR) scaling laws, beam focusing pattern, achievable rate, and degrees-of-freedom (DoF). Furthermore, various XL-MIMO design issues such as near-field beam codebook, beam training, channel estimation, and delay alignment modulation (DAM) transmission are elaborated. Finally, we point out promising directions to inspire future research on near-field XL-MIMO communications.

Submitted 3 April, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 42 pages

arXiv:2310.07464 [pdf]

Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

Abstract: Biomarker detection is an indispensable part in the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating the one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves the biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the excellent interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 47 pages, 6 figures

arXiv:2309.16128 [pdf, other]

Joint Correcting and Refinement for Balanced Low-Light Image Enhancement

Authors: Nana Yu, Hong Shi, Yahong Han

Abstract: Low-light image enhancement tasks demand an appropriate balance among brightness, color, and illumination. However, existing methods often focus on one aspect of the image without considering this balance, which causes problems such as color distortion and overexposure. This seriously affects both human visual perception and the performance of high-level visual models. In this work, a novel synergistic structure is proposed which can balance brightness, color, and illumination more effectively. Specifically, the proposed method, called the Joint Correcting and Refinement Network (JCRNet), mainly consists of three stages to balance the brightness, color, and illumination of the enhancement. Stage 1: we utilize a basic encoder-decoder and local supervision mechanism to extract local information and more comprehensive details for enhancement. Stage 2: cross-stage feature transmission and spatial feature transformation further facilitate color correction and feature refinement. Stage 3: we employ a dynamic illumination adjustment approach to embed residuals between predicted and ground truth images into the model, adaptively adjusting illumination balance. Extensive experiments demonstrate that the proposed method exhibits comprehensive performance advantages over 21 state-of-the-art methods on 9 benchmark datasets. Furthermore, a more persuasive experiment has been conducted to validate the effectiveness of our approach in downstream visual tasks (e.g., saliency detection). Compared to several enhancement models, the proposed method effectively improves the segmentation results and quantitative metrics of saliency detection. The source code will be available at https://github.com/woshiyll/JCRNet.

Submitted 19 October, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.11977 [pdf, other]

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, Xixin Wu, Shiyin Kang, Tao Jiang, Yahui Zhou, Yuxing Han, Helen Meng

Abstract: Zero-shot text-to-speech (TTS) synthesis aims to clone any unseen speaker's voice without adaptation parameters. By quantizing speech waveform into discrete acoustic tokens and modeling these tokens with the language model, recent language model-based TTS models show zero-shot speaker adaptation capabilities with only a 3-second acoustic prompt of an unseen speaker. However, they are limited by the length of the acoustic prompt, which makes it difficult to clone personal speaking style. In this paper, we propose a novel zero-shot TTS model with the multi-scale acoustic prompts based on a neural codec language model VALL-E. A speaker-aware text encoder is proposed to learn the personal speaking style at the phoneme-level from the style prompt consisting of multiple sentences. Following that, a VALL-E based acoustic decoder is utilized to model the timbre from the timbre prompt at the frame-level and generate speech. The experimental results show that our proposed method outperforms baselines in terms of naturalness and speaker similarity, and can achieve better performance by scaling out to a longer style prompt.

Submitted 9 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP 2024

arXiv:2309.03686 [pdf, other]

MS-UNet-v2: Adaptive Denoising Method and Training Strategy for Medical Image Segmentation with Small Training Data

Authors: Haoyuan Chen, Yufei Han, Pin Xu, Yanyi Li, Kuan Li, Jianping Yin

Abstract: Models based on U-like structures have improved the performance of medical image segmentation. However, the single-layer decoder structure of U-Net is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse if the number of training sets of data is not sufficiently large, which is common in medical image processing tasks where annotated data are more difficult to obtain than other tasks. Based on this observation, we propose a novel U-Net model named MS-UNet for the medical image segmentation task in this study. Instead of the single-layer U-Net decoder structure used in Swin-UNet and TransUnet, we specifically design a multi-scale nested decoder based on the Swin Transformer for U-Net. The proposed multi-scale nested decoder structure allows the feature mapping between the decoder and encoder to be semantically closer, thus enabling the network to learn more detailed features. In addition, we propose a novel edge loss and a plug-and-play fine-tuning Denoising module, which not only effectively improves the segmentation performance of MS-UNet, but could also be applied to other models individually. Experimental results show that MS-UNet could effectively improve the network performance with more efficient feature learning capability and exhibit more advanced performance, especially in the extreme case with a small amount of training data, and the proposed Edge loss and Denoising module could significantly enhance the segmentation performance of MS-UNet.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2309.03451 [pdf, other]

Cross-domain Sound Recognition for Efficient Underwater Data Analysis

Authors: Jeongsoo Park, Dong-Gyun Han, Hyoung Sul La, Sangmin Lee, Yoonchang Han, Eun-Jin Yang

Abstract: This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.

Submitted 21 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

Comments: Accepted to APSIPA 2023

arXiv:2308.15752 [pdf, other]

Large-scale data extraction from the UNOS organ donor documents

Authors: Marek Rychlik, Bekir Tanriover, Yan Han

Abstract: In this paper we focus on three major tasks: 1) discussing our methods: Our method captures a portion of the data in DCD flowsheets, kidney perfusion data, and Flowsheet data captured peri-organ recovery surgery. 2) demonstrating the result: We built a comprehensive, analyzable database from 2022 OPTN data. This dataset is by far larger than any previously available even in this preliminary phase; and 3) proving that our methods can be extended to all the past OPTN data and future data. The scope of our study is all Organ Procurement and Transplantation Network (OPTN) data of the USA organ donors since 2008. The data was not analyzable on a large scale in the past because it was captured in PDF documents known as "Attachments", whereby every donor's information was recorded into dozens of PDF documents in heterogeneous formats. To make the data analyzable, one needs to convert the content inside these PDFs to an analyzable data format, such as a standard SQL database. In this paper we will focus on 2022 OPTN data, which consists of $\approx 400,000$ PDF documents spanning millions of pages. The entire OPTN data covers 15 years (2008-2022). This paper assumes that readers are familiar with the content of the OPTN data.

Submitted 4 January, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

MSC Class: 62; 68 ACM Class: I.5.4

arXiv:2308.12985 [pdf]

Perimeter Control with Heterogeneous Metering Rates for Cordon Signals: A Physics-Regularized Multi-Agent Reinforcement Learning Approach

Authors: Jiajie Yu, Pierre-Antoine Laharotte, Yu Han, Wei Ma, Ludovic Leclercq

Abstract: Perimeter Control (PC) strategies have been proposed to address urban road network control in oversaturated situations by regulating the transfer flow of the Protected Network (PN) based on the Macroscopic Fundamental Diagram (MFD). The uniform metering rate for cordon signals in most existing studies overlooks the variance of local traffic states at the intersection level, which may cause severe local traffic congestion and degradation of the network stability. PC strategies with heterogeneous metering rates for cordon signals allow precise control for the perimeter but the complexity of the problem increases exponentially with the scale of the PN. This paper leverages a Multi-Agent Reinforcement Learning (MARL)-based traffic signal control framework to decompose this PC problem, which considers heterogeneous metering rates for cordon signals, into multi-agent cooperation tasks. Each agent controls an individual signal located in the cordon, decreasing the dimension of action space for the controller compared to centralized methods. A physics regularization approach for the MARL framework is proposed to ensure the distributed cordon signal controllers are aware of the global network state by encoding MFD-based knowledge into the action-value functions of the local agents. The proposed PC strategy is operated as a two-stage system, with a feedback PC strategy detecting the overall traffic state within the PN and then distributing local instructions to cordon signals controllers in the MARL framework via the physics regularization. Through numerical tests with different demand patterns in a microscopic traffic environment, the proposed PC strategy shows promising robustness and transferability. It outperforms state-of-the-art feedback PC strategies in increasing network throughput, decreasing distributed delay for gate links, and reducing carbon emissions.

Submitted 31 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 21 pages, 24 figures

arXiv:2308.02088 [pdf, other]

doi 10.1002/mrm.30123

Motion-robust free-running volumetric cardiovascular MRI

Authors: Syed M. Arshad, Lee C. Potter, Chong Chen, Yingmin Liu, Preethi Chandrasekaran, Christopher Crabtree, Matthew S. Tong, Orlando P. Simonetti, Yuchi Han, Rizwan Ahmad

Abstract: PURPOSE: To present and assess an outlier mitigation method that makes free-running volumetric cardiovascular MRI (CMR) more robust to motion. METHODS: The proposed method, called compressive recovery with outlier rejection (CORe), models outliers in the measured data as an additive auxiliary variable. We enforce MR physics-guided group sparsity on the auxiliary variable, and jointly estimate it along with the image using an iterative algorithm. For evaluation, CORe is first compared to traditional compressed sensing (CS), robust regression (RR), and an existing outlier rejection method using two simulation studies. Then, CORe is compared to CS using seven three-dimensional (3D) cine, 12 rest four-dimensional (4D) flow, and eight stress 4D flow imaging datasets. RESULTS: Our simulation studies show that CORe outperforms CS, RR, and the existing outlier rejection method in terms of normalized mean square error and structural similarity index across 55 different realizations. The expert reader evaluation of 3D cine images demonstrates that CORe is more effective in suppressing artifacts while maintaining or improving image sharpness. Finally, 4D flow images show that CORe yields more reliable and consistent flow measurements, especially in the presence of involuntary subject motion or exercise stress. CONCLUSION: An outlier rejection method is presented and tested using simulated and measured data. This method can help suppress motion artifacts in a wide range of free-running CMR applications. CODE & DATA: Implementation code and datasets are available on GitHub at http://github.com/OSU-MR/motion-robust-CMR

Submitted 24 June, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Journal ref: Magnetic Resonance in Medicine 92(3) (2024) 1248-1262

arXiv:2307.13237 [pdf, ps, other]

doi 10.1109/LWC.2023.3331489

Rank Optimization for MIMO Channel with RIS: Simulation and Measurement

Authors: Shengguo Meng, Wankai Tang, Weicong Chen, Jifeng Lan, Qun Yan Zhou, Yu Han, Xiao Li, Shi Jin

Abstract: Reconfigurable intelligent surface (RIS) is a promising technology that can reshape the electromagnetic environment in wireless networks, offering various possibilities for enhancing wireless channels. Motivated by this, we investigate the channel optimization for multiple-input multiple-output (MIMO) systems assisted by RIS. In this paper, an efficient RIS optimization method is proposed to enhance the effective rank of the MIMO channel for achievable rate improvement. Numerical results are presented to verify the effectiveness of RIS in improving MIMO channels. Additionally, we construct a 2$\times$2 RIS-assisted MIMO prototype to perform experimental measurements and validate the performance of our proposed algorithm. The results reveal a significant increase in effective rank and achievable rate for the RIS-assisted MIMO channel compared to the MIMO channel without RIS.

Submitted 8 December, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: This work has been accepted by IEEE WCL

arXiv:2307.09823 [pdf, other]

Multi-modal Learning based Prediction for Disease

Authors: Yaran Chen, Xueyu Chen, Yu Han, Haoran Li, Dongbin Zhao, Jingzhong Li, Xu Wang

Abstract: Non-alcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease, and predicting it accurately can help prevent advanced fibrosis and cirrhosis. However, liver biopsy, the gold standard for NAFLD diagnosis, is invasive, expensive, and prone to sampling errors. Therefore, non-invasive studies are extremely promising, yet they are still in their infancy due to the lack of comprehensive research data and intelligent methods for multi-modal data. This paper proposes a NAFLD diagnosis system (DeepFLDDiag) combining a comprehensive clinical dataset (FLDData) and a multi-modal learning based NAFLD prediction method (DeepFLD). The dataset includes over 6000 participants' physical examinations, laboratory and imaging studies, extensive questionnaires, and facial images of some participants, which is comprehensive and valuable for clinical studies. From the dataset, we quantitatively analyze and select the clinical metadata that contribute most to NAFLD prediction. Furthermore, the proposed DeepFLD, a deep neural network model designed to predict NAFLD using multi-modal input, including metadata and facial images, outperforms the approach that only uses metadata. Satisfactory performance is also verified on other unseen datasets. Encouragingly, DeepFLD can achieve competitive results using only facial images as input rather than metadata, paving the way for a more robust and simpler non-invasive NAFLD diagnosis.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.07650 [pdf, other]

Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation

Authors: Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Abstract: Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning, but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when the model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to over-fitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaption method, which achieves 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.

Submitted 13 June, 2023; originally announced June 2023.

Comments: ACL 2023 Main Conference

arXiv:2304.14467 [pdf, other]

Distributed Quantized Detection of Sparse Signals Under Byzantine Attacks

Authors: Chen Quan, Yunghsiang S. Han, Baocheng Geng, Pramod K. Varshney

Abstract: This paper investigates distributed detection of sparse stochastic signals with quantized measurements under Byzantine attacks. Under this type of attack, sensors in the networks might send falsified data to degrade system performance. The Bernoulli-Gaussian (BG) distribution in terms of the sparsity degree of the stochastic signal is utilized for modeling the sparsity of signals. Several detectors with improved detection performance are proposed by incorporating the estimated attack parameters into the detection process. First, we propose the generalized likelihood ratio test with reference sensors (GLRTRS) and the locally most powerful test with reference sensors (LMPTRS) detectors with adaptive thresholds, given that the sparsity degree and the attack parameters are unknown. Our simulation results show that the LMPTRS and GLRTRS detectors outperform the LMPT and GLRT detectors proposed for an attack-free environment and are more robust against attacks. The proposed detectors can achieve the detection performance close to the benchmark likelihood ratio test (LRT) detector, which has perfect knowledge of the attack parameters and sparsity degree. When the fraction of Byzantine nodes is assumed to be known, we can further improve the system's detection performance. We propose the enhanced LMPTRS (E-LMPTRS) and enhanced GLRTRS (E-GLRTRS) detectors by filtering out potential malicious sensors with the knowledge of the fraction of Byzantine nodes in the network. Simulation results show the superiority of the proposed enhanced detectors over the LMPTRS and GLRTRS detectors.

Submitted 27 April, 2023; originally announced April 2023.

arXiv:2301.10815 [pdf, other]

Human-machine Hierarchical Networks for Decision Making under Byzantine Attacks

Authors: Chen Quan, Baocheng Geng, Yunghsiang S. Han, Pramod K. Varshney

Abstract: This paper proposes a belief-updating scheme in a human-machine collaborative decision-making network to combat Byzantine attacks. A hierarchical framework is used to realize the network where local decisions from physical sensors act as reference decisions to improve the quality of human sensor decisions. During the decision-making process, the belief that each physical sensor is malicious is updated. The case when humans have side information available is investigated, and its impact is analyzed. Simulation results substantiate that the proposed scheme can significantly improve the quality of human sensor decisions, even when most physical sensors are malicious. Moreover, the performance of the proposed method does not necessarily depend on the knowledge of the actual fraction of malicious physical sensors. Consequently, the proposed scheme can effectively defend against Byzantine attacks and improve the quality of human sensors' decisions so that the performance of the human-machine collaborative system is enhanced.

Submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.09058 [pdf, other]

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Authors: Kwangje Baeg, Yeong-Gwan Kim, Young-Sub Han, Byoung-Ki Jeon

Abstract: Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech information leakage. Hence, to improve age group classification performance, we consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. In addition, we investigated different types of speaker embeddings to learn and generalize the domain-invariant representations for age groups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive adversarial network in multi-objective scenarios and leveraging speaker embeddings for the domain adaptation task.

Submitted 22 January, 2023; originally announced January 2023.

arXiv:2211.06160 [pdf, other]

Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations

Authors: Yoori Oh, Juheon Lee, Yoseob Han, Kyogu Lee

Abstract: Recent text-to-speech models have reached the level of generating natural speech similar to what humans say, but they still have limitations in terms of expressiveness. The existing emotional speech synthesis models have shown controllability using interpolated features with scaling parameters in emotional latent space. However, with the emotional latent space generated by the existing models, it is difficult to control the continuous emotional intensity because of the entanglement of features such as emotions and speakers. In this paper, we propose a novel method to control the continuous intensity of emotions using semi-supervised learning. The model learns emotions of intermediate intensity using pseudo-labels generated from phoneme-level sequences of speech information. An embedding space built from the proposed model satisfies the uniform grid geometry with an emotional basis. The experimental results showed that the proposed method was superior in controllability and naturalness.

Submitted 29 May, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted by Interspeech 2023

arXiv:2210.14627 [pdf, other]

Channel-Aware Ordered Successive Relaying with Finite-Blocklength Coding

Authors: Lingrui Zhang, Yuxing Han, Qiong Wang, Wei Chen

Abstract: Successive relaying can improve the transmission rate by allowing the source and relays to transmit messages simultaneously, but it may cause severe inter-relay interference (IRI). IRI cancellation schemes have been proposed to mitigate IRI. However, interference cancellation methods have a high risk of error propagation, resulting in a severe transmission rate loss in finite blocklength regimes. Thus, joint decoding for successive relaying with finite-blocklength coding (FBC) remains a challenge. In this paper, we present an optimized channel-aware ordered successive relaying protocol with finite-blocklength coding (CAO-SIR-FBC), which can recover the rate loss by carefully adapting the relay transmission order and rate. We analyze the average throughput of the CAO-SIR-FBC method, based on which a closed-form expression in the high signal-to-noise ratio (SNR) regime is presented. Average throughput analysis and simulations show that CAO-SIR-FBC outperforms conventional two-timeslot half-duplex relaying in terms of spectral efficiency.

Submitted 26 October, 2022; originally announced October 2022.

Comments: 11 pages, 5 figures

arXiv:2207.08870 [pdf, other]

doi 10.1109/LSP.2023.3244748

Efficient Ordered-Transmission Based Distributed Detection under Data Falsification Attacks

Authors: Chen Quan, Nandan Sriranga, Haodong Yang, Yunghsiang S. Han, Baocheng Geng, Pramod K. Varshney

Abstract: In distributed detection systems, energy-efficient ordered transmission (EEOT) schemes are able to reduce the number of transmissions required to make a final decision. In this work, we investigate the effect of data falsification attacks on the performance of EEOT-based systems. We derive the probability of error for an EEOT-based system under attack and find an upper bound (UB) on the expected number of transmissions required to make the final decision. Moreover, we tighten this UB by solving an optimization problem via integer programming (IP). We also obtain the FC's optimal threshold which guarantees the optimal detection performance of the EEOT-based system. Numerical and simulation results indicate that it is possible to reduce transmissions while still ensuring the quality of the decision with an appropriately designed threshold.

Submitted 18 July, 2022; originally announced July 2022.

arXiv:2206.02507 [pdf, other]

Learning to Control under Time-Varying Environment

Authors: Yuzhen Han, Ruben Solozabal, Jing Dong, Xingyu Zhou, Martin Takac, Bin Gu

Abstract: This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. At a cost of NP-hard offline planning, prior works have introduced online convex optimization algorithms, although they suffer from a nonparametric rate of regret. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over the state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle in which we optimistically select the best model in a high confidence region. Our algorithm is then more explorative when compared to previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding window (SW-OFU) strategy. With proper configuration, our algorithm attains sublinear regret $O(T^{2/3})$. These algorithms utilize data from the current phase for tracking variations on the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees under LTV dynamical systems.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2206.01833 [pdf, other]

Leveraging Heterogeneous Capabilities in Multi-Agent Systems for Environmental Conflict Resolution

Authors: Michael Enqi Cao, Jonas Warnke, Yunhai Han, Xinpei Ni, Ye Zhao, Samuel Coogan

Abstract: In this paper, we introduce a high-level controller synthesis framework that enables teams of heterogeneous agents to assist each other in resolving environmental conflicts that appear at runtime. This conflict resolution method is built upon temporal-logic-based reactive synthesis to guarantee safety and task completion under specific environment assumptions. In heterogeneous multi-agent systems, every agent is expected to complete its own tasks in service of a global team objective. However, at runtime, an agent may encounter un-modeled obstacles (e.g., doors or walls) that prevent it from achieving its own task. To address this problem, we employ the capabilities of other heterogeneous agents to resolve the obstacle. A controller framework is proposed to redirect agents with the capability of resolving the appropriate obstacles to the required target when such a situation is detected. Three case studies involving a bipedal robot Digit and a quadcopter are used to evaluate the controller performance in action. Additionally, we implement the proposed framework on a physical multi-agent robotic system to demonstrate its viability for real world applications.

Submitted 1 September, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: Submitted to The International Symposium on Safety, Security, and Rescue Robotics (SSRR) 2022

arXiv:2205.15170 [pdf, other]

GAN-based Medical Image Small Region Forgery Detection via a Two-Stage Cascade Framework

Authors: Jianyi Zhang, Xuanxi Huang, Yaqi Liu, Yuyang Han, Zixiao Xiang

Abstract: Using generative adversarial networks (GANs) for data enhancement of medical images is significantly helpful for many computer-aided diagnosis (CAD) tasks. A new attack called CT-GAN has emerged. It can inject lung cancer lesions into CT scans or remove them. Because the tampered region may account for less than 1% of the original image, even state-of-the-art methods struggle to detect the traces of such tampering. This paper proposes a cascade framework to detect GAN-based medical image small region forgery like CT-GAN. In the local detection stage, we train the detector network with small sub-images so that interference information in authentic regions will not affect the detector. We use depthwise separable convolution and residual connections to prevent the detector from over-fitting and enhance the ability to find forged regions through the attention mechanism. The detection results of all sub-images in the same image will be combined into a heatmap. In the global classification stage, using the gray level co-occurrence matrix (GLCM) can better extract features of the heatmap. Because the shape and size of the tampered area are uncertain, we train PCA and SVM methods for classification. Our method can classify whether a CT image has been tampered with and locate the tampered position. Sufficient experiments show that our method can achieve excellent performance.

Submitted 30 May, 2022; originally announced May 2022.

arXiv:2204.08686 [pdf, ps, other]

Audio-Visual Wake Word Spotting System For MISP Challenge 2021

Authors: Yanguang Xu, Jianwei Sun, Yang Han, Shuaijiang Zhao, Chaoyang Mei, Tingwei Guo, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

Abstract: This paper presents the details of our system designed for Task 1 of the Multimodal Information Based Speech Processing (MISP) Challenge 2021. The purpose of Task 1 is to leverage both audio and video information to improve the environmental robustness of far-field wake word spotting. In the proposed system, firstly, we take advantage of speech enhancement algorithms such as beamforming and weighted prediction error (WPE) to address the multi-microphone conversational audio. Secondly, several data augmentation techniques are applied to simulate a more realistic far-field scenario. For the video information, the provided region of interest (ROI) is used to obtain visual representation. Then a multi-layer CNN is proposed to learn audio and visual representations, and these representations are fed into our two-branch attention-based network, such as a transformer or conformer, which can be employed for fusion. The focal loss is used to fine-tune the model and improve the performance significantly. Finally, multiple trained models are integrated by voting to achieve our final score of 0.091.

Submitted 19 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: Accepted to ICASSP 2022

arXiv:2204.07212 [pdf, other]

Reputation and Audit Bit Based Distributed Detection in the Presence of Byzantine

Authors: Chen Quan, Yunghsiang S. Han, Baocheng Geng, Pramod K. Varshney

Abstract: In this paper, two reputation based algorithms called Reputation and audit based clustering (RAC) algorithm and Reputation and audit based clustering with auxiliary anchor node (RACA) algorithm are proposed to defend against Byzantine attacks in distributed detection networks when the fusion center (FC) has no prior knowledge of the attacking strategy of Byzantine nodes. By updating the reputation index of the sensors in cluster-based networks, the system can accurately identify Byzantine nodes. The simulation results show that both proposed algorithms have superior detection performance compared with other algorithms. The proposed RACA algorithm works well even when the number of Byzantine nodes exceeds half of the total number of sensors in the network. Furthermore, the robustness of our proposed algorithms is evaluated in a dynamically changing scenario, where the attacking parameters change over time. We show that our algorithms can still achieve superior detection performance.

Submitted 14 April, 2022; originally announced April 2022.

arXiv:2203.10473 [pdf, other]

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

Authors: Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang

Abstract: In recent years, neural network based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress. However, the current speaker encoder models used in these methods still cannot capture enough speaker information. In this paper, we focus on accurate speaker encoder modeling and propose an end-to-end method that can generate high-quality speech and better similarity for both seen and unseen speakers. The proposed architecture consists of three separately trained components: a speaker encoder based on the state-of-the-art ECAPA-TDNN model which is derived from speaker verification task, a FastSpeech2 based synthesizer, and a HiFi-GAN vocoder. The comparison among different speaker encoder models shows our proposed method can achieve better naturalness and similarity. To efficiently evaluate our synthesized speech, we are the first to adopt deep learning based automatic MOS evaluation methods to assess our results, and these methods show great potential in automatic speech quality assessment.

Submitted 26 March, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 5 pages, 2 figures, submitted to interspeech2022

arXiv:2203.03969 [pdf, other]

A Dynamic Hierarchical Framework for IoT-assisted Metaverse Synchronization

Authors: Yue Han, Dusit Niyato, Cyril Leung, Dong In Kim, Kun Zhu, Shaohan Feng, Sherman Xuemin Shen, Chunyan Miao

Abstract: The Metaverse has recently attracted much attention from both academia and industry. Virtual services, ranging from virtual driver training to online route optimization for smart goods delivery, are emerging in the Metaverse. To make the human experience of virtual life more real, digital twins (DTs), namely digital replicas of physical objects, are key enablers. However, DT status may not always accurately reflect that of its real-world twin because the latter may be subject to changes with time. As such, it is necessary to synchronize a DT with its physical counterpart to ensure that its status is accurate for virtual businesses in the Metaverse. In this paper, we propose a dynamic hierarchical framework in which a group of IoT devices is incentivized to sense and collect physical objects' status information collectively so as to assist virtual service providers (VSPs) in synchronizing DTs. Based on the collected sensing data and the value decay rate of the DTs, the VSPs can determine synchronization intensities to maximize their payoffs. In our proposed dynamic hierarchical framework, the lower-level evolutionary game captures the VSP selection by the IoT device population, and the upper-level differential game captures the VSPs' payoffs, which are affected by the synchronization strategy, the IoT devices' selections, and the DTs' value status, given that VSPs are simultaneous decision makers. We further consider the case in which some VSPs are first movers and extend it as a Stackelberg differential game. We theoretically and experimentally show that the equilibrium to the lower-level game exists and is evolutionarily robust, and provide a sensitivity analysis with respect to various system parameters. Experiments show that the proposed dynamic hierarchical game outperforms the baseline.

Submitted 14 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

arXiv:2202.07108 [pdf]

Dynamic optical contrast imaging for real-time delineation of tumor resection margins using head and neck cancer as a model

Authors: Yong Hu, Shan Huang, Albert Y. Han, Seong Moon, Jeffrey F. Krane, Oscar Stafsudd, Warren Grundfest, Maie A. St. John

Abstract: Complete surgical resection of the tumor for head and neck squamous cell carcinoma (HNSCC) remains challenging, given the devastating side effects of aggressive surgery and the anatomic proximity to vital structures. To address the clinical challenges, we introduce a wide-field, label-free imaging tool that can assist surgeons in delineating tumor margins in real time. We assume that autofluorescence lifetime is a natural indicator of the health level of tissues, and ratio-metric measurement of the emission-decay state to the emission-peak state of excited fluorophores will enable rapid lifetime mapping of tissues. Here, we describe the principle, instrumentation, and characterization of the imager and the intraoperative imaging of resected tissues from 13 patients undergoing head and neck cancer resection. Imaging a 20 x 20 mm² area takes 2 seconds per frame at a working distance of 50 mm, and characterization shows that the spatial resolution reaches 70 μm and the least distinguishable fluorescence lifetime difference is 0.14 ns. Comparison of tissue imaging with Hematoxylin-Eosin stained slides reveals its capability of delineating cancerous boundaries with submillimeter accuracy, a sensitivity of 91.86%, and a specificity of 84.38%.

Submitted 18 February, 2024; v1 submitted 14 February, 2022; originally announced February 2022.

Comments: 21 pages, 7 figures and 1 table

arXiv:2112.07854 [pdf]

A terrain treadmill to study animal locomotion through large obstacles

Authors: Ratan Othayoth, Blake Strebel, Yuanfeng Han, Evains Francois, Chen Li

Abstract: A major challenge to understanding locomotion in complex 3-D terrain with large obstacles is to create tools for controlled, systematic lab experiments. Existing terrain arenas only allow observations at small spatiotemporal scales (~10 body lengths, ~10 stride cycles). Here, we create a terrain treadmill to enable high-resolution observations of animal locomotion through large obstacles over large spatiotemporal scales. An animal moves through modular obstacles on an inner sphere, while a rigidly-attached, concentric, transparent outer sphere rotates with the opposite velocity via closed-loop feedback to keep the animal on top. During sustained locomotion, a discoid cockroach moved through pillar obstacles for 25 minutes ($\approx$2500 strides) over 67 m ($\approx$1500 body lengths), and was contained within a radius of 4 cm (0.9 body length) for 83% of the duration, even at speeds of up to 10 body lengths/s. The treadmill enabled observation of diverse locomotor behaviors and quantification of animal-obstacle interaction.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.02743 [pdf, other]

Separated Contrastive Learning for Organ-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

Authors: Jiacheng Wang, Xiaomeng Li, Yiming Han, Jing Qin, Liansheng Wang, Zhou Qichao

Abstract: Automatic delineation of organ-at-risk (OAR) and gross-tumor-volume (GTV) is of great significance for radiotherapy planning. However, it is a challenging task to learn powerful representations for accurate delineation under limited pixel (voxel)-wise annotations. Contrastive learning at pixel-level can alleviate the dependency on annotations by learning dense representations from unlabeled data. Recent studies in this direction design various contrastive losses on the feature maps, to yield discriminative features for each pixel in the map. However, pixels in the same map inevitably share semantics and therefore appear closer than they actually are, which may affect the discrimination of pixels in the same map and lead to unfair comparison with pixels in other maps. To address these issues, we propose a separated region-level contrastive learning scheme, namely SepaReg, the core of which is to separate each image into regions and encode each region separately. Specifically, SepaReg comprises two components: a structure-aware image separation (SIS) module and an intra- and inter-organ distillation (IID) module. The SIS is proposed to operate on the image set to rebuild a region set under the guidance of structural information. The inter-organ representation will be learned from this set via typical contrastive losses across regions. On the other hand, the IID is proposed to tackle the quantity imbalance in the region set, as tiny organs may produce fewer regions, by exploiting intra-organ representations. We conducted extensive experiments to evaluate the proposed model on a public dataset and two private datasets. The experimental results demonstrate the effectiveness of the proposed model, consistently achieving better performance than state-of-the-art approaches. Code is available at https://github.com/jcwang123/Separate_CL.

Submitted 20 April, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: Accepted in AAAI-22 (Oral)

arXiv:2111.03729 [pdf, other]

Explaining neural network predictions of material strength

Authors: Ian A. Palmer, T. Nathan Mundhenk, Brian Gallagher, Yong Han

Abstract: We recently developed a deep learning method that can determine the critical peak stress of a material by looking at scanning electron microscope (SEM) images of the material's crystals. However, it has been somewhat unclear what kind of image features the network is keying off of when it makes its prediction. It is common in computer vision to employ an explainable AI saliency map to tell one what parts of an image are important to the network's decision. One can usually deduce the important features by looking at these salient locations. However, SEM images of crystals are more abstract to the human observer than natural image photographs. As a result, it is not easy to tell what features are important at the locations which are most salient. To solve this, we developed a method that helps us map features from important locations in SEM images to non-abstract textures that are easier to interpret.

Submitted 5 November, 2021; originally announced November 2021.

arXiv:2110.14787 [pdf, other]

SCALP -- Supervised Contrastive Learning for Cardiopulmonary Disease Classification and Localization in Chest X-rays using Patient Metadata

Authors: Ajay Jaiswal, Tianhao Li, Cyprian Zander, Yan Han, Justin F. Rousseau, Yifan Peng, Ying Ding

Abstract: Computer-aided diagnosis plays a salient role in more accessible and accurate cardiopulmonary disease classification and localization on chest radiography. Millions of people are affected by and die from these diseases without an accurate and timely diagnosis. Recently proposed contrastive learning heavily relies on data augmentation, especially positive data augmentation. However, generating clinically-accurate data augmentations for medical images is extremely difficult because the common data augmentation methods in computer vision, such as sharp, blur, and crop operations, can severely alter the clinical settings of medical images. In this paper, we propose a novel and simple data augmentation method based on patient metadata and supervised knowledge to create clinically accurate positive and negative augmentations for chest X-rays. We introduce an end-to-end framework, SCALP, which extends the self-supervised contrastive approach to a supervised setting. Specifically, SCALP pulls together chest X-rays from the same patient (positive keys) and pushes apart chest X-rays from different patients (negative keys). In addition, it uses ResNet-50 along with the triplet-attention mechanism to identify cardiopulmonary diseases, and Grad-CAM++ to highlight the abnormal regions. Our extensive experiments demonstrate that SCALP outperforms existing baselines with significant margins in both classification and localization tasks. Specifically, the average classification AUCs improve from 82.8% (SOTA using DenseNet-121) to 83.9% (SCALP using ResNet-50), while the localization results improve on average by 3.7% over different IoU thresholds.

Submitted 27 October, 2021; originally announced October 2021.

arXiv:2109.13325 [pdf, other]

Enhanced Audit Bit Based Distributed Bayesian Detection in the Presence of Strategic Attacks

Authors: Chen Quan, Baocheng Geng, Yunghsiang S. Han, Pramod K. Varshney

Abstract: This paper employs an audit bit based mechanism to mitigate the effect of Byzantine attacks. In this framework, the optimal attacking strategy for intelligent attackers is investigated for the traditional audit bit based scheme (TAS) to evaluate the robustness of the system. We show that it is possible for an intelligent attacker to degrade the performance of TAS to that of the system without audit bits. To enhance the robustness of the system in the presence of intelligent attackers, we propose an enhanced audit bit based scheme (EAS). The optimal fusion rule for the proposed scheme is derived, and the detection performance of the system is evaluated via the probability of error. Simulation results show that the proposed EAS improves the robustness and the detection performance of the system. Moreover, based on EAS, another new scheme called the reduced audit bit based scheme (RAS) is proposed, which further improves system performance. We derive the new optimal fusion rule, and the simulation results show that RAS outperforms EAS and TAS in terms of both robustness and detection performance of the system. Then, we extend the proposed RAS to wide-area cluster-based distributed wireless sensor networks (CWSNs). Simulation results show that the proposed RAS significantly reduces the communication overhead between the sensors and the fusion center (FC), which prolongs the lifetime of the network.

Submitted 27 September, 2021; originally announced September 2021.

arXiv:2108.12094 [pdf, other]

A Numerical Verification Framework for Differential Privacy in Estimation

Authors: Yunhai Han, Sonia Martínez

Abstract: This work proposes an algorithmic method to verify differential privacy for estimation mechanisms with performance guarantees. Differential privacy makes it hard to distinguish outputs of a mechanism produced by adjacent inputs. While obtaining theoretical conditions that guarantee differential privacy may be possible, evaluating these conditions in practice can be hard. This is especially true for estimation mechanisms that take values in continuous spaces, as this requires checking an infinite set of inequalities. Instead, our verification approach consists of testing the differential privacy condition for a suitably chosen finite collection of events at the expense of some information loss. More precisely, our data-driven test framework for continuous-range mechanisms first finds a highly likely, compact event set, as well as a partition of this event, and then evaluates differential privacy with respect to this partition. This results in a type of differential privacy with high confidence, which we are able to quantify precisely. This approach is then used to evaluate the differential-privacy properties of the recently proposed $W_2$ Moving Horizon Estimator. We confirm its properties, while comparing its performance with alternative approaches in simulation.

Submitted 2 December, 2021; v1 submitted 26 August, 2021; originally announced August 2021.

Comments: Accepted by IEEE Control Systems Letters (L-CSS)

arXiv:2108.00952 [pdf, other]

An Applied Deep Learning Approach for Estimating Soybean Relative Maturity from UAV Imagery to Aid Plant Breeding Decisions

Authors: Saba Moeinizade, Hieu Pham, Ye Han, Austin Dobbels, Guiping Hu

Abstract: For a global breeding organization, identifying the next generation of superior crops is vital for its success. Recognizing new genetic varieties requires years of in-field testing to gather data about the crop's yield, pest resistance, heat resistance, etc. At the conclusion of the growing season, organizations need to determine which varieties will be advanced to the next growing season (or sold to farmers) and which ones will be discarded from the candidate pool. Specifically for soybeans, identifying their relative maturity is a vital piece of information used for advancement decisions. However, this trait needs to be physically observed, and there are resource limitations (time, money, etc.) that bottleneck the data collection process. To combat this, breeding organizations are moving toward advanced image capturing devices. In this paper, we develop a robust and automatic approach for estimating the relative maturity of soybeans using a time series of UAV images. An end-to-end hybrid model combining Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) is proposed to extract features and capture the sequential behavior of time series data. The proposed deep learning model was tested on six different environments across the United States. Results suggest the effectiveness of our proposed CNN-LSTM model compared to the local regression method. Furthermore, we demonstrate how this newfound information can be used to aid in plant breeding advancement decisions.

Submitted 2 August, 2021; originally announced August 2021.

Comments: 22 pages, 7 figures
