-
Enhanced Visual SLAM for Collision-free Driving with Lightweight Autonomous Cars
Authors:
Zhihao Lin,
Zhen Tian,
Qi Zhang,
Hanyang Zhuang,
Jianglin Lan
Abstract:
The paper presents a vision-based obstacle avoidance strategy for lightweight self-driving cars that can be run on a CPU-only device using a single RGB-D camera. The method consists of two steps: visual perception and path planning. The visual perception part uses ORBSLAM3 enhanced with optical flow to estimate the car's poses and extract rich texture information from the scene. In the path planning phase, we employ a method combining a control Lyapunov function and control barrier function in the form of quadratic program (CLF-CBF-QP) together with an obstacle shape reconstruction process (SRP) to plan safe and stable trajectories. To validate the performance and robustness of the proposed method, simulation experiments were conducted with a car in various complex indoor environments using the Gazebo simulation environment. Our method can effectively avoid obstacles in the scenes. The proposed algorithm outperforms benchmark algorithms in achieving more stable and shorter trajectories across multiple simulated scenes.
Submitted 21 August, 2024;
originally announced August 2024.
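A minimal sketch of the safety half of a CLF-CBF-QP planner, to make the idea in the abstract above concrete. It is not the authors' implementation: it assumes a 2D single-integrator robot, one circular obstacle, and a plain proportional go-to-goal controller standing in for the CLF term. With a single barrier constraint the quadratic program has a closed-form solution, so no QP solver is needed. All names (`cbf_filter`, `simulate`) and parameter values are hypothetical.

```python
def cbf_filter(x, u_nom, center, radius, alpha=2.0):
    """Closed-form solution of: min ||u - u_nom||^2  s.t.  dh/dt >= -alpha * h.

    h(x) = ||x - center||^2 - radius^2 is the barrier function (h >= 0 is safe).
    Assumes single-integrator dynamics x_dot = u and x != center.
    """
    dx, dy = x[0] - center[0], x[1] - center[1]
    h = dx * dx + dy * dy - radius * radius
    ax, ay = 2.0 * dx, 2.0 * dy  # gradient of h, so dh/dt = ax*ux + ay*uy
    slack = ax * u_nom[0] + ay * u_nom[1] + alpha * h
    if slack >= 0.0:
        return u_nom  # nominal control already satisfies the barrier constraint
    # Otherwise project u_nom onto the constraint boundary.
    scale = -slack / (ax * ax + ay * ay)
    return (u_nom[0] + scale * ax, u_nom[1] + scale * ay)


def simulate(start=(-2.0, 0.1), goal=(2.0, 0.0), center=(0.0, 0.0),
             radius=0.5, gain=1.0, dt=0.02, steps=500):
    """Euler-integrate the filtered controller; return final state and h history."""
    x = start
    barrier_values = []
    for _ in range(steps):
        dx, dy = x[0] - center[0], x[1] - center[1]
        barrier_values.append(dx * dx + dy * dy - radius * radius)
        # Hypothetical nominal controller: proportional pull toward the goal.
        u_nom = (gain * (goal[0] - x[0]), gain * (goal[1] - x[1]))
        u = cbf_filter(x, u_nom, center, radius)
        x = (x[0] + dt * u[0], x[1] + dt * u[1])
    return x, barrier_values


final_state, barrier_values = simulate()
print("minimum barrier value along the trajectory:", min(barrier_values))
```

The filter only changes the nominal control when the barrier constraint would be violated, which is the usual motivation for this formulation. A full CLF-CBF-QP would add a Lyapunov decrease constraint with a slack variable and hand both constraints to a QP solver.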
-
Principle Driven Parameterized Fiber Model based on GPT-PINN Neural Network
Authors:
Yubin Zang,
Boyu Hua,
Zhenzhou Tang,
Zhipeng Lin,
Fangzheng Zhang,
Simin Li,
Zuxing Zhang,
Hongwei Chen
Abstract:
To meet the needs of Beyond 5G communications, a large number of data-driven, artificial-intelligence-based fiber models have been put forward, which use artificial intelligence's regression ability to predict pulse evolution in fiber transmission much faster than the traditional split-step Fourier method. To increase physical interpretability, principle-driven fiber models have been proposed, which insert the Nonlinear Schrödinger Equation (NLSE) into their loss functions. However, both principle-driven and data-driven models must be retrained in full under different transmission conditions. Unfortunately, this situation can be unavoidable when optimizing fiber communication systems. If the number of transmission conditions is large, the whole model, with its relatively large number of parameters, needs to be retrained many times, which incurs high time costs and lowers computing efficiency. To address this problem, we propose the principle-driven parameterized fiber model in this manuscript. Using the reduced basis method, this model decomposes the predicted NLSE solution for a given set of transmission conditions into a linear combination of several eigen-solutions, each output by a pre-trained principle-driven fiber model. The model therefore greatly alleviates the burden of retraining, since only the linear combination coefficients need to be found when the transmission condition changes. The model offers both strong physical interpretability and higher computing efficiency. In the demonstration, the model's computational complexity is 0.0113% of that of the split-step Fourier method and 1% of that of the previously proposed principle-driven fiber model.
Submitted 19 August, 2024;
originally announced August 2024.
-
Fiber Transmission Model with Parameterized Inputs based on GPT-PINN Neural Network
Authors:
Yubin Zang,
Boyu Hua,
Zhipeng Lin,
Fangzheng Zhang,
Simin Li,
Zuxing Zhang,
Hongwei Chen
Abstract:
In this manuscript, a novel principle-driven fiber transmission model for short-distance transmission with parameterized inputs is put forward. By building on the previously proposed principle-driven fiber model and the reduced basis expansion method, and by transforming the parameterized inputs into parameterized coefficients of the Nonlinear Schrödinger Equations, universal solutions for inputs corresponding to different bit rates can all be obtained without retraining the whole model. This model, once adopted, can have prominent advantages in both computational efficiency and physical background. Moreover, the model can still be trained effectively without transmitted signals collected in advance. On-off keying signals with bit rates ranging from 2 Gbps to 50 Gbps are used to demonstrate the fidelity of the model.
Submitted 19 August, 2024;
originally announced August 2024.
-
Efficient Autoregressive Audio Modeling via Next-Scale Prediction
Authors:
Kai Qiu,
Xiang Li,
Hao Chen,
Jie Sun,
Jinglu Wang,
Zhe Lin,
Marios Savvides,
Bhiksha Raj
Abstract:
Audio generation has achieved remarkable progress with the advance of sophisticated generative models, such as diffusion models (DMs) and autoregressive (AR) models. However, due to the naturally significant sequence length of audio, the efficiency of audio generation remains an essential issue to be addressed, especially for AR models that are incorporated in large language models (LLMs). In this paper, we analyze the token length of audio tokenization and propose a novel Scale-level Audio Tokenizer (SAT), with improved residual quantization. Based on SAT, a scale-level Acoustic AutoRegressive (AAR) modeling framework is further proposed, which shifts the next-token AR prediction to next-scale AR prediction, significantly reducing the training cost and inference time. To validate the effectiveness of the proposed approach, we comprehensively analyze design choices and demonstrate the proposed AAR framework achieves a remarkable 35× faster inference speed and +1.33 Fréchet Audio Distance (FAD) against baselines on the AudioSet benchmark. Code: https://github.com/qiuk2/AAR.
Submitted 16 August, 2024;
originally announced August 2024.
-
A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts
Authors:
Zhihao Lin,
Zhen Tian,
Qi Zhang,
Ziyang Ye,
Hanyang Zhuang,
Jianglin Lan
Abstract:
Safety and efficiency are crucial for autonomous driving in roundabouts, especially in the context of mixed traffic where autonomous vehicles (AVs) and human-driven vehicles coexist. This paper introduces a learning-based algorithm tailored to foster safe and efficient driving behaviors across varying levels of traffic flows in roundabouts. The proposed algorithm employs a deep Q-learning network to effectively learn safe and efficient driving strategies in complex multi-vehicle roundabouts. Additionally, a KAN (Kolmogorov-Arnold network) enhances the AVs' ability to learn their surroundings robustly and precisely. An action inspector is integrated to replace dangerous actions to avoid collisions when the AV interacts with the environment, and a route planner is proposed to enhance the driving efficiency and safety of the AVs. Moreover, a model predictive control is adopted to ensure stability and precision of the driving actions. The results show that our proposed system consistently achieves safe and efficient driving whilst maintaining a stable training process, as evidenced by the smooth convergence of the reward function and the low variance in the training curves across various traffic flows. Compared to state-of-the-art benchmarks, the proposed algorithm achieves a lower number of collisions and reduced travel time to destination.
Submitted 15 August, 2024;
originally announced August 2024.
-
Matching-Driven Deep Reinforcement Learning for Energy-Efficient Transmission Parameter Allocation in Multi-Gateway LoRa Networks
Authors:
Ziqi Lin,
Xu Zhang,
Shimin Gong,
Lanhua Li,
Zhou Su,
Bo Gu
Abstract:
Long-range (LoRa) communication technology, distinguished by its low power consumption and long communication range, is widely used in the Internet of Things. Nevertheless, the LoRa MAC layer adopts pure ALOHA for medium access control, which may suffer from severe packet collisions as the network scale expands, consequently reducing the system energy efficiency (EE). To address this issue, it is critical to carefully allocate transmission parameters such as the channel (CH), transmission power (TP) and spreading factor (SF) to each end device (ED). Owing to the low duty cycle and sporadic traffic of LoRa networks, evaluating the system EE under various parameter settings proves to be time-consuming. Consequently, we propose an analytical model aimed at calculating the system EE while fully considering the impact of multiple gateways, duty cycling, quasi-orthogonal SFs and capture effects. On this basis, we investigate a joint CH, SF and TP allocation problem, with the objective of optimizing the system EE for uplink transmissions. Due to the NP-hard complexity of the problem, the optimization problem is decomposed into two subproblems: CH assignment and SF/TP assignment. First, a matching-based algorithm is introduced to address the CH assignment subproblem. Then, an attention-based multiagent reinforcement learning technique is employed to address the SF/TP assignment subproblem for EDs allocated to the same CH, which reduces the number of learning agents to achieve fast convergence. The simulation outcomes indicate that the proposed approach converges quickly under various parameter settings and obtains significantly better system EE than baseline algorithms.
Submitted 17 July, 2024;
originally announced July 2024.
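To illustrate why pure ALOHA suffers as a network grows, the sketch below evaluates the textbook pure ALOHA throughput formula S = G·exp(-2G), where G is the offered load in frames per frame time. This is the classical single-channel result under assumed Poisson arrivals and equal-length frames with no capture effect; it is not the analytical energy-efficiency model proposed in the paper above, which additionally accounts for multiple gateways, duty cycling, quasi-orthogonal SFs and capture. The function names are hypothetical.

```python
import math


def pure_aloha_success_probability(offered_load):
    """Probability that a frame sees no overlapping transmission.

    A frame is vulnerable for two frame times, so with Poisson arrivals
    at rate G (frames per frame time) the probability is exp(-2G).
    """
    return math.exp(-2.0 * offered_load)


def pure_aloha_throughput(offered_load):
    """Expected successful frames per frame time: S = G * exp(-2G)."""
    return offered_load * pure_aloha_success_probability(offered_load)


for load in (0.1, 0.5, 1.0, 2.0):
    print(f"G = {load:.1f}  S = {pure_aloha_throughput(load):.4f}")
```

Throughput peaks at G = 0.5 with S = 1/(2e), roughly 18% channel utilization, and falls as the load grows further, which is the collision behavior that motivates careful channel, spreading factor and power allocation.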
-
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
Authors:
Zhening Liu,
Xinjie Zhang,
Jiawei Shao,
Zehong Lin,
Jun Zhang
Abstract:
With the rapid advancement of stereo vision technologies, stereo image compression has emerged as a crucial field that continues to draw significant attention. Previous approaches have primarily employed a unidirectional paradigm, where the compression of one view is dependent on the other, resulting in imbalanced compression. To address this issue, we introduce a symmetric bidirectional stereo image compression architecture, named BiSIC. Specifically, we propose a 3D convolution based codec backbone to capture local features and incorporate bidirectional attention blocks to exploit global features. Moreover, we design a novel cross-dimensional entropy model that integrates various conditioning factors, including the spatial context, channel context, and stereo dependency, to effectively estimate the distribution of latent representations for entropy coding. Extensive experiments demonstrate that our proposed BiSIC outperforms conventional image/video compression standards, as well as state-of-the-art learning-based methods, in terms of both PSNR and MS-SSIM.
Submitted 15 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matching scenarios, and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in the LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Rapid and Accurate Diagnosis of Acute Aortic Syndrome using Non-contrast CT: A Large-scale, Retrospective, Multi-center and AI-based Study
Authors:
Yujian Hu,
Yilang Xiang,
Yan-Jie Zhou,
Yangyan He,
Shifeng Yang,
Xiaolong Du,
Chunlan Den,
Youyao Xu,
Gaofeng Wang,
Zhengyao Ding,
Jingyong Huang,
Wenjun Zhao,
Xuejun Wu,
Donglin Li,
Qianqian Zhu,
Zhenjiang Li,
Chenyang Qiu,
Ziheng Wu,
Yunjun He,
Chen Tian,
Yihui Qiu,
Zuodong Lin,
Xiaolong Zhang,
Yuan He,
Zhenpeng Yuan
, et al. (15 additional authors not shown)
Abstract:
Chest pain symptoms are highly prevalent in emergency departments (EDs), where acute aortic syndrome (AAS) is a catastrophic cardiovascular emergency with a high fatality rate, especially when timely and accurate treatment is not administered. However, current triage practices in the ED can cause up to approximately half of patients with AAS to have an initially missed diagnosis or be misdiagnosed as having other acute chest pain conditions. Subsequently, these AAS patients will undergo clinically inaccurate or suboptimal differential diagnosis. Fortunately, even under these suboptimal protocols, nearly all these patients underwent non-contrast CT covering the aorta anatomy at the early stage of differential diagnosis. In this study, we developed an artificial intelligence model (DeepAAS) using non-contrast CT, which is highly accurate for identifying AAS and provides interpretable results to assist in clinical decision-making. Performance was assessed in two major phases: a multi-center retrospective study (n = 20,750) and an exploration in real-world emergency scenarios (n = 137,525). In the multi-center cohort, DeepAAS achieved a mean area under the receiver operating characteristic curve of 0.958 (95% CI 0.950-0.967). In the real-world cohort, DeepAAS detected 109 AAS patients with misguided initial suspicion, achieving 92.6% (95% CI 76.2%-97.5%) in mean sensitivity and 99.2% (95% CI 99.1%-99.3%) in mean specificity. Our AI model performed well on non-contrast CT at all applicable early stages of differential diagnosis workflows, effectively reduced the overall missed diagnosis and misdiagnosis rate from 48.8% to 4.8% and shortened the diagnosis time for patients with misguided initial suspicion from an average of 681.8 (74-11,820) mins to 68.5 (23-195) mins. DeepAAS could effectively fill the gap in the current clinical workflow without requiring additional tests.
Submitted 16 July, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
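For readers less familiar with the metrics reported in the abstract above, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts in the example are hypothetical and chosen only for easy arithmetic; they are not taken from the study.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positive cases that the model flags (recall)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of actual negative cases that the model correctly clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for illustration only.
print("sensitivity:", sensitivity(true_positives=90, false_negatives=10))
print("specificity:", specificity(true_negatives=990, false_positives=10))
```

In a screening setting where the condition is rare, specificity largely determines how many false alarms clinicians have to review, while sensitivity determines how many true cases are missed.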
-
LU2Net: A Lightweight Network for Real-time Underwater Image Enhancement
Authors:
Haodong Yang,
Jisheng Xu,
Zhiliang Lin,
Jianping He
Abstract:
Computer vision techniques have empowered underwater robots to effectively undertake a multitude of tasks, including object tracking and path planning. However, underwater optical factors such as light refraction and absorption degrade underwater images and present challenges to underwater vision. A variety of underwater image enhancement methods have been proposed to improve the effectiveness of underwater vision perception. Nevertheless, for real-time vision tasks on underwater robots, it is necessary to overcome the challenges associated with algorithmic efficiency and real-time capabilities. In this paper, we introduce Lightweight Underwater Unet (LU2Net), a novel U-shape network designed specifically for real-time enhancement of underwater images. The proposed model incorporates axial depthwise convolution and the channel attention module, enabling it to significantly reduce computational demands and model parameters, thereby improving processing speed. Extensive experiments conducted on the dataset and on real-world underwater robots demonstrate the exceptional performance and speed of the proposed model. It is capable of providing well-enhanced underwater images at a speed 8 times faster than the current state-of-the-art underwater image enhancement method. Moreover, LU2Net is able to handle real-time underwater video enhancement.
Submitted 21 June, 2024;
originally announced June 2024.
-
LEO Satellite Networks Assisted Geo-distributed Data Processing
Authors:
Zhiyuan Zhao,
Zhe Chen,
Zheng Lin,
Wenjun Zhu,
Kun Qiu,
Chaoqun You,
Yue Gao
Abstract:
Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective solution for the growing number of edge clouds, and hence large volumes of data in edge clouds can be transferred to a core cloud via those networks for time-sensitive big data processing tasks, such as attack detection. However, according to our measurements, state-of-the-art satellite selection algorithms perform poorly for such processing. Therefore, we propose a novel data volume aware satellite selection algorithm, named DVA, to support such big data processing tasks. DVA takes into account both the data size in edge clouds and satellite capacity to finalize the selection, thereby preventing congestion in the access network and reducing transmission duration. Extensive simulations on a LEO satellite emulation platform validate that DVA has a significantly lower average access network duration than state-of-the-art satellite selection algorithms.
Submitted 16 June, 2024;
originally announced June 2024.
-
MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement
Authors:
Zizhen Lin,
Xiaoting Chen,
Junyu Wang
Abstract:
Achieving a balance between lightweight design and high performance remains a challenging task for speech enhancement. In this paper, we introduce Multi-path Enhanced Taylor (MET) Transformer based U-net for Speech Enhancement (MUSE), a lightweight speech enhancement network built upon the Unet architecture. Our approach incorporates a novel Multi-path Enhanced Taylor (MET) Transformer block, which integrates Deformable Embedding (DE) to enable flexible receptive fields for voiceprints. The MET Transformer is uniquely designed to fuse Channel and Spatial Attention (CSA) branches, facilitating channel information exchange and addressing spatial attention deficits within the Taylor-Transformer framework. Through extensive experiments conducted on the VoiceBank+DEMAND dataset, we demonstrate that MUSE achieves competitive performance while significantly reducing both training and deployment costs, boasting a mere 0.51M parameters.
Submitted 19 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
Sums: Sniffing Unknown Multiband Signals under Low Sampling Rates
Authors:
Jinbo Peng,
Zhe Chen,
Zheng Lin,
Haoxuan Yuan,
Zihan Fang,
Lingzhong Bao,
Zihang Song,
Ying Li,
Jing Ren,
Yue Gao
Abstract:
Due to sophisticated deployments of all kinds of wireless networks (e.g., 5G, Wi-Fi, Bluetooth, LEO satellite, etc.), multiband signals are distributed over a large bandwidth (e.g., from 70 MHz to 8 GHz). Consequently, for network monitoring and spectrum sharing applications, a sniffer that extracts physical layer information, such as packet structure, at a low sampling rate (especially with sub-Nyquist sampling) can significantly improve their cost- and energy-efficiency. However, building a multiband signal sniffer is challenging. To this end, we propose Sums, a system that can sniff and analyze multiband signals in a blind manner. Sums takes advantage of hardware and algorithm co-design, multi-coset sub-Nyquist sampling hardware, and a multi-task deep learning framework. The hardware component breaks the Nyquist rule to sample GHz bandwidth, but only pays for a 50 MSPS sampling rate. Our multi-task learning framework operates directly on the sampled data to perform spectrum sensing, physical layer protocol recognition, and demodulation for deep inspection of multiband signals. Extensive experiments demonstrate that Sums achieves higher accuracy than the state-of-the-art baselines in spectrum sensing, modulation classification, and demodulation. As a result, Sums can help researchers and end-users diagnose or troubleshoot problems with wireless infrastructure deployments in practice.
Submitted 24 May, 2024;
originally announced May 2024.
-
SATSense: Multi-Satellite Collaborative Framework for Spectrum Sensing
Authors:
Haoxuan Yuan,
Zhe Chen,
Zheng Lin,
Jinbo Peng,
Zihan Fang,
Yuhang Zhong,
Zihang Song,
Yue Gao
Abstract:
Low Earth Orbit satellite Internet has recently been deployed, providing worldwide service with non-terrestrial networks. With the large-scale deployment of both non-terrestrial and terrestrial networks, the limited spectrum resources will not be sufficient for allocation. Consequently, dynamic spectrum sharing is crucial for their coexistence in the same spectrum, where accurate spectrum sensing is essential. However, spectrum sensing in space is more challenging than in terrestrial networks due to variable channel conditions, making single-satellite sensing unstable. Therefore, we make a first attempt to design a collaborative sensing scheme utilizing diverse data from multiple satellites. However, it is non-trivial to achieve this collaboration due to heterogeneous channel quality, considerable raw sampling data, and packet loss. To address the above challenges, we first establish connections between the satellites by modeling their sensing data as a graph and devising a graph neural network-based algorithm to achieve effective spectrum sensing. Meanwhile, we establish a joint sub-Nyquist sampling and autoencoder data compression framework to reduce the amount of transmitted sensing data. Finally, we propose a contrastive learning-based mechanism that compensates for missing packets. Extensive experiments demonstrate that our proposed strategy can achieve efficient spectrum sensing performance and outperform the conventional deep learning algorithm in spectrum sensing accuracy.
Submitted 24 May, 2024;
originally announced May 2024.
-
Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model?
Authors:
Ziqin Lin,
Heng Li,
Zinan Li,
Huazhu Fu,
Jiang Liu
Abstract:
Recent advancements in pre-trained large foundation models (LFM) have yielded significant breakthroughs across various domains, including natural language processing and computer vision. These models have been particularly impactful in the domain of medical diagnostic tasks. With abundant unlabeled data, an LFM has been developed for fundus images using the Vision Transformer (VIT) and a self-supervised learning framework. This LFM has shown promising performance in fundus disease diagnosis across multiple datasets. On the other hand, deep learning models have long been challenged by dataset quality issues, such as image quality and dataset bias. To investigate the influence of data quality on LFM, we conducted explorations in two fundus diagnosis tasks using datasets of varying quality. Specifically, we explored the following questions: Is LFM more robust to image quality? Is LFM affected by dataset bias? Can fine-tuning techniques alleviate these effects? Our investigation found that LFM exhibits greater resilience to dataset quality issues, including image quality and dataset bias, compared to typical convolutional networks. Furthermore, we discovered that overall fine-tuning is an effective adapter for LFM to mitigate the impact of dataset quality issues.
Submitted 21 May, 2024;
originally announced May 2024.
-
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Authors:
Hongjie Wang,
Difan Liu,
Yan Kang,
Yijun Li,
Zhe Lin,
Niraj K. Jha,
Yuchen Liu
Abstract:
Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io.
Submitted 8 May, 2024;
originally announced May 2024.
-
A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems
Authors:
Xin Jin,
Tiejun Lv,
Wei Ni,
Zhipeng Lin,
Qiuming Zhu,
Ekram Hossain,
H. Vincent Poor
Abstract:
Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-plus-noise ratio for radar sensing. Considering the non-convexity of this problem arising from multiplicative coupling of the analog and digital beamforming, we convert the sum-rate maximization into an equivalent weighted mean-square error minimization and apply penalty dual decomposition to decouple the analog and digital beamforming. Specifically, a second-order cone program is first constructed to optimize the fully digital counterpart of the HAD beamforming. Then, the sparsity of the RS architecture is exploited to obtain a low-complexity solution for the HAD beamforming. The convergence and complexity analyses of our algorithm are carried out under the RS architecture. Simulations corroborate that, with the RS architecture, DFRC offers effective communication and sensing and improves energy efficiency by 83.4% and 114.2% with a moderate number of radio frequency chains and phase shifters, compared to the persistently- and fully-connected architectures, respectively.
Submitted 24 April, 2024;
originally announced April 2024.
-
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Authors:
Li Siyao,
Tianpei Gu,
Zhitao Yang,
Zhengyu Lin,
Ziwei Liu,
Henghui Ding,
Lei Yang,
Chen Change Loy
Abstract:
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
Submitted 27 March, 2024;
originally announced March 2024.
-
Dual-modal Tactile E-skin: Enabling Bidirectional Human-Robot Interaction via Integrated Tactile Perception and Feedback
Authors:
Shilong Mu,
Runze Zhao,
Zenan Lin,
Yan Huang,
Shoujie Li,
Chenchang Li,
Xiao-Ping Zhang,
Wenbo Ding
Abstract:
To foster an immersive and natural human-robot interaction, the implementation of tactile perception and feedback becomes imperative, effectively bridging the conventional sensory gap. In this paper, we propose a dual-modal electronic skin (e-skin) that integrates magnetic tactile sensing and vibration feedback for enhanced human-robot interaction. The dual-modal tactile e-skin offers multi-functional tactile sensing and programmable haptic feedback, underpinned by a layered structure comprised of flexible magnetic films, soft silicone, a Hall sensor and actuator array, and a microcontroller unit. The e-skin captures the magnetic field changes caused by subtle deformations through Hall sensors, employing deep learning for accurate tactile perception. Simultaneously, the actuator array generates mechanical vibrations to facilitate haptic feedback, delivering diverse mechanical stimuli. Notably, the dual-modal e-skin is capable of transmitting tactile information bidirectionally, enabling object recognition and fine-weighing operations. This bidirectional tactile interaction framework will enhance the immersion and efficiency of interactions between humans and robots.
Submitted 8 February, 2024;
originally announced February 2024.
-
A Novel Geometric Solution for Moving Target Localization through Multistatic Sensing in the ISAC System
Authors:
S. Zhuge,
Y. Ma,
Z. Lin,
Y. Zeng
Abstract:
This paper proposes a novel geometric solution for tracking a moving target through multistatic sensing. In contrast to existing two-step weighted least square (2SWLS) methods which use the bistatic range (BR) and bistatic range rate (BRR) measurements, the proposed method incorporates an additional direction of arrival (DOA) measurement of the target obtained from a communication receiver in an integrated sensing and communication (ISAC) system. Unlike the existing 2SWLS methods that require at least three transmitter-receiver (TX-RX) pairs to operate, the proposed algorithm can conduct location estimation with a single TX-RX pair and velocity estimation with two TX-RX pairs. Simulations reveal that the proposed method exhibits superior performance compared to existing 2SWLS methods, particularly when dealing with moderate levels of noise in DOA measurements.
Submitted 29 January, 2024;
originally announced January 2024.
-
Active headrest combined with a depth camera-based ear-positioning system
Authors:
Yuteng Liu,
Haowen Li,
Haishan Zou,
Jing Lu,
Zhibin Lin
Abstract:
Active headrests can reduce low-frequency noise around the ears using an active noise control (ANC) system. Both the control system using fixed control filters and the remote microphone-based adaptive control system provide good noise reduction performance when the head is in the original position. However, their performance degrades significantly when the head is in motion. In this paper, a human ear-positioning system based on a depth camera is introduced to address this problem. The system uses the RTMPose model to estimate the two-dimensional (2D) positions of the ears in the color frame, and then derives the corresponding three-dimensional (3D) coordinates in the depth frame with a depth camera. Experimental results show that the ear-positioning system can effectively track the movement of the ears, and the broadband noise reduction performance of the active headrest combined with the system is significantly improved when the human head is translating or rotating.
Submitted 25 December, 2023;
originally announced January 2024.
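As a rough illustration of the 2D-to-3D step mentioned in the abstract above, the sketch below back-projects a pixel with a known depth value into camera coordinates using the standard pinhole camera model. The function name `deproject_pixel` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are assumptions made for illustration; the abstract does not state how the authors perform this conversion.

```python
def deproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth (metres) into 3D camera coordinates.

    Assumes an ideal pinhole camera without lens distortion:
    fx, fy are focal lengths in pixels, (cx, cy) is the principal point.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical intrinsics for a 640x480 sensor; the ear keypoint would come
# from a 2D pose estimator and the depth from the aligned depth frame.
ear_3d = deproject_pixel(u=920, v=240, depth=2.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
print(ear_3d)
```

A pixel at the principal point maps onto the optical axis, so only its depth is non-zero.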
-
Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation
Authors:
Zhiwei Lin,
Jun Chen,
Boshi Tang,
Binzhu Sha,
Jing Yang,
Yaolong Ju,
Fan Fan,
Shiyin Kang,
Zhiyong Wu,
Helen Meng
Abstract:
Variational Autoencoders (VAEs) constitute a crucial component of neural symbolic music generation, among which some works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still suffer from overly long feature sequences, and their generated results lack contextual coherence; thus, the challenge of modeling long multi-track symbolic music remains unaddressed. To this end, we propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music. The Multi-view MidiVAE utilizes the two-dimensional (2-D) representation, OctupleMIDI, to capture relationships among notes while reducing the feature sequence length. Moreover, we focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy to integrate both Track- and Bar-view MidiVAE features. Objective and subjective experimental results on the CocoChorales dataset demonstrate that, compared to the baseline, Multi-view MidiVAE exhibits significant improvements in terms of modeling long multi-track symbolic music.
Submitted 15 January, 2024;
originally announced January 2024.
-
Index Modulation for Fluid Antenna-Assisted MIMO Communications: System Design and Performance Analysis
Authors:
Jing Zhu,
Gaojie Chen,
Pengyu Gao,
Pei Xiao,
Zihuai Lin,
Atta Quddus
Abstract:
In this paper, we propose a transmission mechanism for fluid antenna (FA)-enabled multiple-input multiple-output (MIMO) communication systems based on index modulation (IM), named FA-IM, which incorporates the principle of IM into FA-assisted MIMO systems to improve the spectral efficiency (SE) without increasing the hardware complexity. In FA-IM, the information bits are mapped not only to the modulation symbols, but also to the index of FA position patterns. Additionally, the FA position pattern codebook is carefully designed to further enhance the system performance by maximizing the effective channel gains. Then, a low-complexity detector, referred to as the efficient sparse Bayesian detector, is proposed by exploiting the inherent sparsity of the transmitted FA-IM signal vectors. Finally, a closed-form expression for the upper bound on the average bit error probability (ABEP) is derived under the finite-path and infinite-path channel conditions. Simulation results show that the proposed scheme is capable of improving the SE performance compared to the existing FA-assisted MIMO and fixed position antenna (FPA)-assisted MIMO systems while obviating any additional hardware costs. It has also been shown that the proposed scheme outperforms the conventional FA-assisted MIMO scheme in terms of error performance under the same transmission rate.
Submitted 25 December, 2023;
originally announced December 2023.
-
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference
Authors:
Zhidi Lin,
Yiyong Sun,
Feng Yin,
Alexandre Hoang Thiéry
Abstract:
The Gaussian process state-space models (GPSSMs) represent a versatile class of data-driven nonlinear dynamical system models. However, the presence of numerous latent variables in GPSSM incurs unresolved issues for existing variational inference approaches, particularly under the more realistic non-mean-field (NMF) assumption, including extensive training effort, compromised inference accuracy, and infeasibility for online applications, among others. In this paper, we tackle these challenges by incorporating the ensemble Kalman filter (EnKF), a well-established model-based filtering technique, into the NMF variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). Moreover, owing to the streamlined parameterization via the EnKF, the new GPSSM model can be easily accommodated in online learning applications. We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting. We also provide detailed analysis and fresh insights for the proposed algorithms. Comprehensive evaluation across diverse real and synthetic datasets corroborates the superior learning and inference performance of our EnKF-aided variational inference algorithms compared to existing methods.
Submitted 22 July, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
An Unsupervised Machine Learning Scheme for Index-Based CSI Feedback in Wi-Fi
Authors:
Mrugen Deshmukh,
Zinan Lin,
Hanqing Lou,
Mahmoud Kamel,
Rui Yang,
Ismail Guvenc
Abstract:
With the ever-increasing demand for high-speed wireless data transmission, beamforming techniques have been proven to be crucial in improving the data rate and the signal-to-noise ratio (SNR) at the receiver. However, they require feedback mechanisms that need an overhead of information and increase the system complexity, potentially challenging the efficiency and capacity of modern wireless networks. This paper investigates novel index-based feedback mechanisms that aim at reducing the beamforming feedback overhead in Wi-Fi links. The proposed methods mitigate the overhead by generating a set of candidate beamforming vectors using an unsupervised learning-based framework. The amount of feedback information required is thus reduced by using the index of the candidate as feedback instead of transmitting the entire beamforming matrix. We explore several methods that consider different representations of the data in the candidate set. In particular, we propose five different ways to generate and represent the candidate sets that consider the covariance matrices of the channel, serialize the feedback matrix, and account for the effective distance, among others. Additionally, we also discuss the implications of using partial information in the compressed beamforming feedback on the link performance and compare it with the newly proposed index-based methods. Extensive IEEE 802.11 standard-compliant simulation results show that the proposed methods effectively minimize the feedback overhead, enhancing the throughput while maintaining an adequate link performance.
Submitted 6 December, 2023;
originally announced December 2023.
-
Enhanced Index-Based Feedback Overhead Reduction for WLANs
Authors:
Mrugen Deshmukh,
Zinan Lin,
Hanqing Lou,
Mahmoud Kamel,
Rui Yang,
Ismail Guvenc
Abstract:
A compressed beamforming algorithm is used in the current Wi-Fi standard to reduce the beamforming feedback overhead (BFO). However, with each new amendment of the standard, the number of supported antennas in Wi-Fi devices increases, leading to increased BFO and hampering the throughput despite using compressed beamforming. In this paper, a novel index-based method is presented to reduce the BFO in Wi-Fi links. In particular, a k-means clustering-based approach is presented to generate candidate beamforming feedback matrices, thereby reducing the BFO to only the index of the said candidate matrices. With extensive simulation results, we compare the newly proposed method with the IEEE 802.11be baseline and our previously published index-based method. We show approximately 54% gain in throughput at high signal-to-noise ratio (SNR) against the IEEE 802.11be baseline. Our comparison also shows approximately 4 dB gain compared to our previously published method at a packet error rate (PER) of 0.01 using MCS index 11. Additionally, we discuss the impact of the distance metric chosen for clustering as well as candidate selection on the link performance.
Submitted 6 December, 2023;
originally announced December 2023.
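The index-based idea summarized in the abstract above can be sketched in a few lines: cluster example feedback vectors into a small candidate set, then send only the index of the nearest candidate. This is a toy illustration under stated assumptions, not the authors' implementation: it uses real-valued 2-D points and Euclidean distance in place of beamforming matrices and whatever distance metric the paper adopts, and the names `kmeans`, `nearest`, and `feedback_bits` are hypothetical.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a list of k centroids (the candidate set)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def feedback_bits(k):
    """Bits needed to signal one index out of k candidates."""
    return max(1, math.ceil(math.log2(k)))


# Toy "training" vectors forming two well-separated groups.
samples = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
codebook = kmeans(samples, k=2)

# The receiver feeds back only the index of the best-matching candidate.
index = nearest((4.9, 5.2), codebook)
print(index, feedback_bits(len(codebook)))
```

With a codebook of 16 candidates, for example, the feedback would be a 4-bit index regardless of how large each candidate matrix is.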
-
Enhancing and Adapting in the Clinic: Source-free Unsupervised Domain Adaptation for Medical Image Enhancement
Authors:
Heng Li,
Ziqin Lin,
Zhongxi Qiu,
Zinan Li,
Huazhu Fu,
Yan Hu,
Jiang Liu
Abstract:
Medical imaging provides many valuable clues involving anatomical structure and pathological characteristics. However, image degradation is a common issue in clinical practice, which can adversely impact the observation and diagnosis by physicians and algorithms. Although extensive enhancement models have been developed, these models require thorough pre-training before deployment, while failing to take advantage of the potential value of inference data after deployment. In this paper, we propose an algorithm for source-free unsupervised domain adaptive medical image enhancement (SAME), which adapts and optimizes enhancement models using test data in the inference phase. A structure-preserving enhancement network is first constructed to learn a robust source model from synthesized training data. Then a teacher-student model is initialized with the source model and conducts source-free unsupervised domain adaptation (SFUDA) by knowledge distillation with the test data. Additionally, a pseudo-label picker is developed to boost the knowledge distillation of enhancement tasks. Experiments were implemented on ten datasets from three medical image modalities to validate the advantage of the proposed algorithm, and setting analysis and ablation studies were also carried out to interpret the effectiveness of SAME. The remarkable enhancement performance and benefits for downstream tasks demonstrate the potential and generalizability of SAME. The code is available at https://github.com/liamheng/Annotation-free-Medical-Image-Enhancement.
Submitted 3 December, 2023;
originally announced December 2023.
-
Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation
Authors:
Zhaofeng Lin,
Tanvina Patel,
Odette Scharenborg
Abstract:
Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To address the data scarcity issue, we use a signal processing-based technique that transforms the spectral characteristics of normal speech to those of pseudo-whispered speech. We augment an End-to-End ASR with pseudo-whispered speech and achieve an 18.2% relative reduction in word error rate for whispered speech compared to the baseline. Results for the individual speaker groups in the wTIMIT database show the best results for US English. Further investigation showed that the lack of glottal information in whispered speech has the largest impact on whispered speech ASR performance.
Submitted 9 November, 2023;
originally announced November 2023.
-
INeAT: Iterative Neural Adaptive Tomography
Authors:
Bo Xiong,
Changqing Su,
Zihan Lin,
You Zhou,
Zhaofei Yu
Abstract:
Computed Tomography (CT), with its remarkable capability for three-dimensional imaging from multiple projections, enjoys a broad range of applications in clinical diagnosis, scientific observation, and industrial detection. Neural Adaptive Tomography (NeAT) is a recently proposed 3D rendering method based on neural radiance fields for CT, and it demonstrates superior performance compared to traditional methods. However, it still faces challenges when dealing with the substantial perturbations and pose shifts encountered in CT scanning processes. Here, we propose a neural rendering method for CT reconstruction, named Iterative Neural Adaptive Tomography (INeAT), which incorporates iterative posture optimization to effectively counteract the influence of posture perturbations in data, particularly in cases involving significant posture variations. Through the implementation of a posture feedback optimization strategy, INeAT iteratively refines the posture corresponding to the input images based on the reconstructed 3D volume. We demonstrate that INeAT achieves artifact-suppressed and resolution-enhanced reconstruction in scenarios with significant pose disturbances. Furthermore, we show that INeAT maintains reconstruction performance comparable to stable-state acquisitions even when using data from unstable-state acquisitions, which significantly reduces the time required for CT scanning and relaxes the stringent requirements on imaging hardware systems, underscoring its immense potential for applications in short-time and low-cost CT technology.
Submitted 2 November, 2023;
originally announced November 2023.
-
PC-bzip2: a phase-space continuity enhanced lossless compression algorithm for light field microscopy data
Authors:
Changqing Su,
Zihan Lin,
You Zhou,
Shuai Wang,
Yuhan Gao,
Chenggang Yan,
Bo Xiong
Abstract:
Light-field fluorescence microscopy (LFM) is a powerful, elegant, and compact method for long-term high-speed imaging of complex biological systems, such as neuron activities and rapid movements of organelles. LFM experiments typically generate terabytes of image data and require a huge amount of storage space. Some lossy compression algorithms have been proposed recently with good compression performance. However, since the specimen usually only tolerates low-power-density illumination for long-term imaging with low phototoxicity, the image signal-to-noise ratio (SNR) is relatively low, which causes the loss of some effective position or intensity information when such lossy compression algorithms are used. Here, we propose a phase-space continuity enhanced bzip2 (PC-bzip2) lossless compression method for LFM data as a high-efficiency, open-source tool, which combines GPU-based fast entropy judgement and multi-core-CPU-based high-speed lossless compression. Our proposed method achieves an almost 10% improvement in compression ratio over the original bzip2 while retaining the capability of high-speed compression. We evaluated our method on fluorescence bead data and fluorescence-stained cell data with different SNRs. Moreover, by introducing temporal continuity, our method shows a superior compression ratio on time series data of zebrafish blood vessels.
Submitted 13 October, 2023;
originally announced October 2023.
-
Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder
Authors:
Zih-Jyun Lin,
Yi-Ju Chen,
Po-Chih Kuo,
Likai Huang,
Chaur-Jong Hu,
Cheng-Yu Chen
Abstract:
Dementia diagnosis requires a series of different testing methods, which is complex and time-consuming. Early detection of dementia is crucial as it can prevent further deterioration of the condition. This paper utilizes a speech recognition model to construct a dementia assessment system tailored for Mandarin speakers during the picture description task. By training an attention-based speech recognition model on voice data closely resembling real-world scenarios, we have significantly enhanced the model's recognition capabilities. Subsequently, we extracted the encoder from the speech recognition model and added a linear layer for dementia assessment. We collected Mandarin speech data from 99 subjects and acquired their clinical assessments from a local hospital. We achieved an accuracy of 92.04% in Alzheimer's disease detection and a mean absolute error of 9% in clinical dementia rating score prediction.
Submitted 15 December, 2023; v1 submitted 5 October, 2023;
originally announced October 2023.
-
Convolution and Attention Mixer for Synthetic Aperture Radar Image Change Detection
Authors:
Haopeng Zhang,
Zijing Lin,
Feng Gao,
Junyu Dong,
Qian Du,
Heng-Chao Li
Abstract:
Synthetic aperture radar (SAR) image change detection is a critical task and has received increasing attention in the remote sensing community. However, existing SAR change detection methods are mainly based on convolutional neural networks (CNNs), with limited consideration of global attention mechanisms. In this letter, we explore a Transformer-like architecture for SAR change detection to incorporate global attention. To this end, we propose a convolution and attention mixer (CAMixer). First, to compensate for the inductive bias of the Transformer, we combine self-attention with shift convolution in a parallel way. The parallel design effectively captures the global semantic information via the self-attention and performs local feature extraction through shift convolution simultaneously. Second, we adopt a gating mechanism in the feed-forward network to enhance the non-linear feature transformation. The gating mechanism is formulated as the element-wise multiplication of two parallel linear layers. Important features can be highlighted, leading to high-quality representations against speckle noise. Extensive experiments conducted on three SAR datasets verify the superior performance of the proposed CAMixer. The source code will be publicly available at https://github.com/summitgao/CAMixer .
Submitted 21 September, 2023;
originally announced September 2023.
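The gating mechanism described in the abstract above, an element-wise product of two parallel linear layers, can be written out for a single feature vector as follows. This is a minimal sketch in plain Python rather than the authors' code: the function names `linear` and `gated_ffn` are hypothetical, the weights are made-up numbers, and any activation function or output projection the actual network may use is omitted because the abstract does not specify it.

```python
def linear(x, weight, bias):
    """Affine map y = W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gated_ffn(x, w_value, b_value, w_gate, b_gate):
    """Element-wise product of two parallel linear branches applied to x."""
    value = linear(x, w_value, b_value)  # content branch
    gate = linear(x, w_gate, b_gate)     # gating branch
    return [v * g for v, g in zip(value, gate)]


# Illustrative weights only.
x = [1.0, 2.0]
out = gated_ffn(
    x,
    w_value=[[1.0, 0.0], [0.0, 1.0]], b_value=[0.0, 0.0],
    w_gate=[[0.0, 1.0], [1.0, 1.0]], b_gate=[1.0, 0.0],
)
print(out)
```

Because the two branches are multiplied, a gate value near zero suppresses the corresponding feature while a large gate value amplifies it.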
-
Distributed Finite-Time Cooperative Localization for Three-Dimensional Sensor Networks
Authors:
Jinze Wu,
Lorenzo Zino,
Zhiyun Lin,
Alessandro Rizzo
Abstract:
This paper addresses the distributed localization problem for a network of sensors placed in a three-dimensional space, in which sensors are able to perform range measurements, i.e., measure the relative distance between them, and exchange information on a network structure. First, we derive a necessary and sufficient condition for node localizability using barycentric coordinates. Then, building on this theoretical result, we design a distributed localizability verification algorithm, in which we propose and employ a novel distributed finite-time algorithm for sum consensus. Finally, we develop a distributed localization algorithm based on conjugate gradient method, and we derive a theoretical guarantee on its performance, which ensures finite-time convergence to the exact position for all localizable nodes. The efficiency of our algorithm compared to the existing ones from the state-of-the-art literature is further demonstrated through numerical simulations.
Submitted 20 September, 2023;
originally announced September 2023.
-
Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel
Authors:
Richard Cornelius Suwandi,
Zhidi Lin,
Feng Yin,
Zhiguo Wang,
Sergios Theodoridis
Abstract:
Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture (GSM) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capabilities. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity property of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.
Submitted 26 December, 2023; v1 submitted 15 September, 2023;
originally announced September 2023.
-
Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models
Authors:
Zhidi Lin,
Juan Maroñas,
Ying Li,
Feng Yin,
Sergios Theodoridis
Abstract:
The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating its ability to achieve similar inference performance compared to existing methods. Code is available at https://github.com/zhidilin/gpssmProj.
Submitted 3 September, 2023;
originally announced September 2023.
-
A Mobile Data-Driven Hierarchical Deep Reinforcement Learning Approach for Real-time Demand-Responsive Railway Rescheduling and Station Overcrowding Mitigation
Authors:
Enze Liu,
Zhiyuan Lin,
Judith Y. T. Wang,
Hong Chen
Abstract:
Real-time railway rescheduling is an important technique to enable operational recovery in response to unexpected and dynamic conditions in a timely and flexible manner. Current research relies mostly on OD-based data and model-based methods for estimating train passenger demand. These approaches primarily focus on averaged disruption patterns, often overlooking the immediate uneven distribution of demand over time. In reality, passenger demand deviates significantly from predictions, especially during a disaster. Disastrous situations such as the flood in Zhengzhou, China in 2022 have had an unprecedented effect not only on Zhengzhou railway station itself, which is a major railway hub in China, but also on other major hubs connected to Zhengzhou, e.g., Xi'an, the closest hub west of Zhengzhou. In this study, we define a real-time demand-responsive (RTDR) railway rescheduling problem focusing on two specific aspects, namely, volatility of the demand and management of station crowdedness. For the first time, we propose a data-driven approach using real-time mobile data (MD) to deal with this RTDR problem. A hierarchical deep reinforcement learning (HDRL) framework is designed to perform real-time rescheduling in a demand-responsive manner. The use of MD enables the modelling of passenger dynamics in response to train delays and station crowdedness, and a real-time optimisation for rescheduling of train services in view of the change in demand resulting from passengers' behavioural response to disruption. Results show that the agent can steadily satisfy over 62% of the demand with only 61% of the original rolling stock, ensuring continuous operations without overcrowding. Moreover, the agent exhibits adaptability when transferred to a new environment with increased demand, highlighting its effectiveness in addressing unforeseen disruptions in real-time settings.
Submitted 6 November, 2023; v1 submitted 22 August, 2023;
originally announced August 2023.
-
Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management
Authors:
Yiwen Xu,
Dengfeng Liu,
Liangtao Huang,
Zhiquan Lin,
Tiesong Zhao,
Sam Kwong
Abstract:
The state-of-the-art smart city calls for economical yet efficient energy management over large-scale networks, especially for the electric power system. It is a critical issue to monitor, analyze and control the electric loads of all users in the system. In this paper, we employ popular AI-based computer vision techniques to design a non-invasive load monitoring method for smart electric energy management. First, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and an attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets demonstrate that our method outperforms its peers, and thus supports efficient energy management over the large-scale Internet of Things (IoT).
Submitted 8 August, 2023;
originally announced August 2023.
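The Gramian Angular Field step mentioned in the abstract above, which maps a one-dimensional signal onto a two-dimensional image, can be sketched as follows. This is a minimal illustration of the summation variant of GAF; the abstract does not say which variant the authors use, and the function name, the min-max rescaling, and the sample values are assumptions. Turning the resulting matrix into a color image would need a colormap, which is left out.

```python
import math


def gramian_angular_field(signal):
    """Return the Gramian Angular Summation Field of a 1-D signal."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2.0 * (s - lo) / (hi - lo) - 1.0 for s in signal]
    # Clamp to guard against floating-point overshoot.
    angles = [math.acos(max(-1.0, min(1.0, s))) for s in scaled]
    # Entry (i, j) encodes the pair of samples i and j as cos(phi_i + phi_j).
    return [[math.cos(a + b) for b in angles] for a in angles]


# Three made-up current samples, for illustration only.
gaf = gramian_angular_field([0.0, 1.0, 2.0])
```

A signal of length N yields an N x N symmetric matrix, so every pair of time steps gets its own pixel.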
-
Exploiting Structured Sparsity with Low Complexity Sparse Bayesian Learning for RIS-assisted MIMO Channel Estimation
Authors:
W. Li,
Z. Lin,
Q. Guo,
B. Vucetic
Abstract:
As an emerging communication auxiliary technology, reconfigurable intelligent surface (RIS) is expected to play a significant role in the upcoming 6G networks. Due to its total reflection characteristics, it is challenging to implement conventional channel estimation algorithms. This work focuses on RIS-assisted MIMO communications. Although many algorithms have been proposed to address this issue, there are still ample opportunities for improvement in terms of estimation accuracy, complexity, and applicability. To fully exploit the structured sparsity of the multiple-input-multiple-output (MIMO) channels, we propose a new channel estimation algorithm called unitary approximate message passing sparse Bayesian learning with partial common support identification (UAMPSBL-PCI). Thanks to the mechanism of PCI and the use of UAMP, the proposed algorithm has a lower complexity while delivering enhanced performance relative to existing channel estimation algorithms. Extensive simulations demonstrate its excellent performance in various environments.
Submitted 2 August, 2023;
originally announced August 2023.
-
Large Language Models Empowered Autonomous Edge AI for Connected Intelligence
Authors:
Yifei Shen,
Jiawei Shao,
Xinjie Zhang,
Zehong Lin,
Hao Pan,
Dongsheng Li,
Jun Zhang,
Khaled B. Letaief
Abstract:
The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge artificial intelligence (Edge AI) is a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. This article presents a vision of autonomous edge AI systems that automatically organize, adapt, and optimize themselves to meet users' diverse requirements, leveraging the power of large language models (LLMs), i.e., Generative Pretrained Transformer (GPT). By exploiting the powerful abilities of GPT in language understanding, planning, and code generation, as well as incorporating classic wisdom such as task-oriented communication and edge federated learning, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models in a privacy-preserving manner. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models at edge servers.
Submitted 25 December, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Precise WiFi Indoor Positioning using Deep Learning Algorithms
Authors:
Minxue Cai,
Zihuai Lin
Abstract:
This study demonstrates a WiFi indoor positioning system using Deep Learning algorithms. A new method using a fitting function in MATLAB is used to compute the path loss coefficient and log-normal fading variance. To reduce the error, a new hybrid localization approach utilizing the Received Signal Strength Indicator (RSSI) and Angle of Arrival (AoA) has been created. Three Deep Learning algorithms are used to decrease the adverse influence of noise and interference. This paper compares the performance of two models in three different indoor environments. The average error of our hybrid positioning model trained by CNN in the big classroom is less than 250 mm.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
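The path loss coefficient and fading variance mentioned in the abstract above are typically obtained by fitting a propagation model to RSSI measurements. The sketch below assumes the standard log-distance model, rssi(d) = rssi(1 m) - 10 n log10(d), and an ordinary least-squares fit. The abstract does not state the model or the MATLAB routine used, so the function name, the 1 m reference distance, and the sample data are all hypothetical.

```python
import math


def fit_path_loss(distances_m, rssi_dbm):
    """Least-squares fit of rssi = rssi_at_1m - 10 * n * log10(d)."""
    xs = [-10.0 * math.log10(d) for d in distances_m]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(rssi_dbm) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, rssi_dbm))
    exponent = sxy / sxx                      # path loss exponent n
    rssi_at_1m = mean_y - exponent * mean_x   # intercept at d = 1 m
    residuals = [y - (rssi_at_1m + exponent * x)
                 for x, y in zip(xs, rssi_dbm)]
    variance = sum(r * r for r in residuals) / count  # fading variance, dB^2
    return exponent, rssi_at_1m, variance


# Noise-free synthetic measurements generated with n = 2 and -40 dBm at 1 m.
n, rssi0, var = fit_path_loss([1.0, 10.0, 100.0], [-40.0, -60.0, -80.0])
```

On real measurements the residual variance is nonzero and serves as an estimate of the log-normal shadowing term.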
-
Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation
Authors:
Shuzhe Chen,
Li Li,
Zhichao Lin,
Ke Zhang,
Ying Gong,
Lu Wang,
Xu Wu,
Maokun Li,
Yuanlin Song,
Fan Yang,
Shenheng Xu
Abstract:
The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder network with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then concatenated to create a feature map that exhibits temporal features. A simple convolutional neural network is used for classification. Data collected from 137 subjects were used for training. The model is first validated by ten-fold and leave-one-out cross-validation. The accuracy and sensitivity for the normal ventilation mode are 0.95 and 1.00, and the F1-score is 0.94. Furthermore, we check the reliability and feasibility of the proposed pipeline by testing it on nine newly recruited subjects. Our results show that the pipeline correctly predicts the ventilation mode of 8 out of 9 subjects. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.
△ Less
Submitted 1 July, 2023;
originally announced July 2023.
-
Probability of Error for Optimal Codes in a Reconfigurable Intelligent Surface Aided URLLC System
Authors:
Likun Sui,
Zihuai Lin
Abstract:
The lower bound on the decoding error probability for the optimal code, given a signal-to-noise ratio and a code rate, is investigated in this letter for the reconfigurable intelligent surface (RIS) communication system over a Rician fading channel in the short blocklength regime, which is the key characteristic of ultra-reliable low-latency communications (URLLC) to meet the need for strict adherence to quality of service (QoS) requirements. The sphere packing technique is used to derive our main results. The Wald sequential t-test lemma and the Gaussian-Chebyshev quadrature are the main tools used to obtain the closed-form expression for the lower bound. Numerical results are provided to validate our results and demonstrate their tightness compared to the Polyanskiy-Poor-Verdu (PPV) bound.
△ Less
Submitted 27 June, 2023;
originally announced June 2023.
-
You Only Train Once: Learning a General Anomaly Enhancement Network with Random Masks for Hyperspectral Anomaly Detection
Authors:
Zhaoxu Li,
Yingqian Wang,
Chao Xiao,
Qiang Ling,
Zaiping Lin,
Wei An
Abstract:
In this paper, we introduce a new approach to address the challenge of generalization in hyperspectral anomaly detection (AD). Our method eliminates the need for adjusting parameters or retraining on new test scenes as required by most existing methods. Employing an image-level training paradigm, we achieve a general anomaly enhancement network for hyperspectral AD that only needs to be trained once. Trained on a set of anomaly-free hyperspectral images with random masks, our network can learn the spatial context characteristics between anomalies and background in an unsupervised way. Additionally, a plug-and-play model selection module is proposed to search for a spatial-spectral transform domain that is more suitable for the AD task than the original data. To establish a unified benchmark to comprehensively evaluate our method and existing methods, we develop a large-scale hyperspectral AD dataset (HAD100) that includes 100 real test scenes with diverse anomaly targets. In comparison experiments, we combine our network with a parameter-free detector and achieve the optimal balance between detection accuracy and inference speed among state-of-the-art AD methods. Experimental results also show that our method still achieves competitive performance when the training and test sets are captured by different sensor devices. Our code is available at https://github.com/ZhaoxuLi123/AETNet.
△ Less
Submitted 31 March, 2023;
originally announced March 2023.
-
Modeling and Analysis on Efficiency Degradation of Lithium-ion Batteries
Authors:
Zihui Lin,
Dagang Li
Abstract:
Efficiency of Battery Energy Storage Systems (BESSs) is increasingly critical as renewable energy generation becomes more prevalent on the grid. Therefore, it is necessary to study the energy efficiency of lithium-ion batteries, which are typically used in BESSs. The purpose of this study is to propose the State of Efficiency (SOE) as a measure of how efficiently batteries transfer energy, and to analyze what factors affect the SOE of a battery throughout its lifetime. Using NASA's data set, we measure the SOE of Nickel-Cobalt-Aluminum (NCA) lithium-ion batteries by calculating the ratio of energy generated and consumed during discharge and charge phases. A linear trend was observed in the SOE trajectories, which is confirmed by the Mann-Kendall (MK) trend test. Following that, a linear SOE degradation model was presented. Further analysis shows that ambient temperature, discharge current, and cutoff voltage all affect SOE in different ways. Using the SOE and its behavior observed in this study, Battery Management Systems (BMS) can improve the energy efficiency of batteries by adjusting operating conditions or developing better management strategies.
△ Less
Submitted 19 March, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
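The State of Efficiency described in the abstract above is a ratio of discharged to charged energy, and its trend is checked with the Mann-Kendall test. A minimal sketch of both ideas follows. It computes only the Mann-Kendall S statistic, not the full significance test, and the function names and per-cycle energies are made up for illustration; they are not taken from the NASA data set.

```python
def state_of_efficiency(energy_discharged_wh, energy_charged_wh):
    """Ratio of energy delivered on discharge to energy absorbed on charge."""
    if energy_charged_wh <= 0:
        raise ValueError("charged energy must be positive")
    return energy_discharged_wh / energy_charged_wh


def mann_kendall_s(series):
    """Mann-Kendall S statistic: sum of signs over all ordered pairs."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s


# Made-up cycle energies in Wh (discharged, charged), for illustration only.
cycles = [(9.0, 10.0), (8.8, 10.0), (8.5, 10.0), (8.4, 10.0)]
soe = [state_of_efficiency(d, c) for d, c in cycles]
trend = mann_kendall_s(soe)
```

A strongly negative S, as in this toy series, indicates a monotonically decreasing SOE; a full test would also compare S against its variance to obtain a significance level.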
-
Sparse Bayesian Learning-Based 3D Spectrum Environment Map Construction-Sampling Optimization, Scenario-Dependent Dictionary Construction and Sparse Recovery
Authors:
Jie Wang,
Qiuming Zhu,
Zhipeng Lin,
Qihui Wu,
Yang Huang,
Xuezhao Cai,
Weizhi Zhong,
Yi Zhao
Abstract:
The spectrum environment map (SEM), which can visualize the information of the invisible electromagnetic spectrum, is vital for monitoring, management, and security of spectrum resources in cognitive radio (CR) networks. In view of the limited number of spectrum sensors and constrained sampling time, this paper presents a new three-dimensional (3D) SEM construction scheme based on sparse Bayesian learning (SBL). Firstly, we construct a scenario-dependent channel dictionary matrix by considering the propagation characteristics of the scenario of interest. To improve sampling efficiency, a maximum mutual information (MMI)-based optimization algorithm is developed for the layout of sampling sensors. Then, a maximum and minimum distance (MMD) clustering-based SBL algorithm is proposed to recover the spectrum data at the unsampled positions and construct the whole 3D SEM. We finally use the simulation data of the campus scenario to construct the 3D SEMs and compare the proposed method with the state of the art. The recovery performance and the impact of different sparsity levels on the constructed SEMs are also analyzed. Numerical results show that the proposed scheme can reduce the required number of spectrum sensors and has higher accuracy at low sampling rates.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
Transcending shift-invariance in the paraxial regime via end-to-end inverse design of freeform nanophotonics
Authors:
William F. Li,
Gaurav Arya,
Charles Roques-Carmes,
Zin Lin,
Steven G. Johnson,
Marin Soljačić
Abstract:
Traditional optical elements and conventional metasurfaces obey shift-invariance in the paraxial regime. For imaging systems obeying paraxial shift-invariance, a small shift in input angle causes a corresponding shift in the sensor image. Shift-invariance has deep implications for the design and functionality of optical devices, such as the necessity of free space between components (as in compound objectives made of several curved surfaces). We present a method for nanophotonic inverse design of compact imaging systems whose resolution is not constrained by paraxial shift-invariance. Our method is end-to-end, in that it integrates density-based full-Maxwell topology optimization with a fully iterative elastic-net reconstruction algorithm. By the design of nanophotonic structures that scatter light in a non-shift-invariant manner, our optimized nanophotonic imaging system overcomes the limitations of paraxial shift-invariance, achieving accurate, noise-robust image reconstruction beyond shift-invariant resolution.
△ Less
Submitted 3 February, 2023;
originally announced February 2023.
-
Towards Flexibility and Interpretability of Gaussian Process State-Space Model
Authors:
Zhidi Lin,
Feng Yin,
Juan Maroñas
Abstract:
The Gaussian process state-space model (GPSSM) has garnered considerable attention over the past decade. However, the standard GP with a preliminary kernel, such as the squared exponential kernel or Matérn kernel, that is commonly used in GPSSM studies, limits the model's representation power and substantially restricts its applicability to complex scenarios. To address this issue, we propose a new class of probabilistic state-space models called TGPSSMs, which leverage a parametric normalizing flow to enrich the GP priors in the standard GPSSM, enabling greater flexibility and expressivity. Additionally, we present a scalable variational inference algorithm that offers a flexible and optimal structure for the variational distribution of latent states. The proposed algorithm is interpretable and computationally efficient due to the sparse GP representation and the bijective nature of normalizing flow. Moreover, we incorporate a constrained optimization framework into the algorithm to enhance the state-space representation capabilities and optimize the hyperparameters, leading to superior learning and inference performance. Experimental results on synthetic and real datasets corroborate that the proposed TGPSSM outperforms several state-of-the-art methods. The accompanying source code is available at https://github.com/zhidilin/TGPSSM.
△ Less
Submitted 6 April, 2023; v1 submitted 20 January, 2023;
originally announced January 2023.
-
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Authors:
Zhiqiu Lin,
Samuel Yu,
Zhiyi Kuang,
Deepak Pathak,
Deva Ramanan
Abstract:
The ability to quickly learn a new task with minimal instruction - known as few-shot learning - is a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot samples from a single modality, but such samples may not be sufficient to characterize an entire concept class. In contrast, humans use cross-modal information to learn new concepts efficiently. In this work, we demonstrate that one can indeed build a better visual dog classifier by reading about dogs and listening to them bark. To do so, we exploit the fact that recent multimodal foundation models such as CLIP learn cross-modal encoders that map different modalities to the same representation space. Specifically, we propose a simple strategy for cross-modal adaptation: we treat examples from different modalities as additional few-shot examples. For example, by simply repurposing class names as an additional training sample, we trivially turn any n-shot learning problem into an (n+1)-shot problem. This allows us to produce SOTA results with embarrassingly simple linear classifiers. We show that our approach can be combined with existing methods such as prefix tuning, adapters, and classifier ensembling. Finally, to explore other modalities beyond vision and language, we construct the first (to our knowledge) audiovisual few-shot benchmark and use cross-modal training to improve the performance of both image and audio classification.
△ Less
Submitted 27 August, 2024; v1 submitted 16 January, 2023;
originally announced January 2023.
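The cross-modal adaptation idea in the abstract above, treating a class name's text feature as one more training sample, can be sketched in a few lines. This is an illustrative sketch only: the function name `add_text_shot` and the toy two-dimensional features are hypothetical, and a real pipeline would obtain the features from a multimodal encoder such as CLIP before training a linear classifier on the augmented set.

```python
def add_text_shot(image_shots, text_features):
    """Append each class's text feature to its image shots.

    image_shots: dict mapping class name -> list of feature vectors.
    text_features: dict mapping class name -> one feature vector.
    Returns a new dict; the inputs are left unmodified.
    """
    return {label: shots + [text_features[label]]
            for label, shots in image_shots.items()}


# Toy 2-D features (made up); both modalities are assumed to live in
# one shared representation space.
image_shots = {"dog": [[0.9, 0.1]], "cat": [[0.1, 0.9]]}
text_features = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
training_set = add_text_shot(image_shots, text_features)
```

Here a 1-shot problem becomes a 2-shot problem per class without collecting any extra images.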
-
A Novel Exploitative and Explorative GWO-SVM Algorithm for Smart Emotion Recognition
Authors:
Xucun Yan,
Zihuai Lin,
Zhiyun Lin,
Branka Vucetic
Abstract:
Emotion recognition or detection is broadly utilized in patient-doctor interactions for diseases such as schizophrenia and autism, and the most typical techniques are speech detection and facial recognition. However, features extracted from these behavior-based emotion recognition techniques are not reliable, since humans can disguise their emotions. Recording voices or tracking facial expressions over a long period is also not efficient. Therefore, our aim is to find a reliable and efficient emotion recognition scheme that can be used for non-behavior-based emotion recognition in real time. This can be achieved by implementing a single-channel electrocardiogram (ECG) based emotion recognition scheme in a lightweight embedded system. However, existing schemes have relatively low accuracy. Therefore, we propose a reliable and efficient emotion recognition scheme, the exploitative and explorative grey wolf optimizer based SVM (X-GWO-SVM), for ECG-based emotion recognition. Two datasets, a raw self-collected iRealcare dataset and the widely used benchmark WESAD dataset, are used with the X-GWO-SVM algorithm for emotion recognition. This work demonstrates that the X-GWO-SVM algorithm can be used for emotion recognition and that it exhibits superior reliability compared to other supervised machine learning methods used in earlier works. It can be implemented in a lightweight embedded system, which is much more efficient than existing solutions based on deep neural networks.
△ Less
Submitted 4 January, 2023;
originally announced January 2023.
-
Quantum Sensing Based Joint 3D Beam Training for UAV-mounted STAR-RIS Aided TeraHertz Multi-user Massive MIMO Systems
Authors:
Xufang Wang,
Zihuai Lin,
Feng Lin,
Pei Xiao
Abstract:
Terahertz (THz) systems are capable of supporting ultra-high data rates thanks to their large bandwidth and the potential to harness high-gain beamforming to combat high path loss. In this paper, a novel quantum sensing (Ghost Imaging (GI)) based beam training scheme is proposed for Simultaneously Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) aided THz multi-user massive MIMO systems. We first conduct GI using surrounding 5G downlink signals to obtain 3D images of the environment, including users and obstacles. Based on this information, we calculate the optimal position of the UAV-mounted STAR with the proposed algorithm, so that position-based beam training can be performed. To enhance the beamforming gain, we further incorporate channel estimation and propose a semi-passive structure of the STAR and an ambiguity elimination scheme for separated channel estimation. The ambiguity in cascaded channel estimation, which may affect optimal passive beamforming, is thus avoided. The optimal active and passive beamforming are then carried out and data transmission is initiated. The proposed BS sub-array and sub-STAR spatial multiplexing architecture, optimal active and passive beamforming, digital precoding, and optimal position of the UAV-mounted STAR are investigated jointly to maximize the average achievable sum rate of the users. Moreover, a cloud radio access network (CRAN) structured 5G downlink signal is proposed for GI with enhanced resolution. The simulation results show that the proposed scheme achieves beam training and separated channel estimation efficiently, and increases the spectral efficiency dramatically compared to the case when the STAR operates with random phase.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.