-
A Homogeneous Graph Neural Network for Precoding and Power Allocation in Scalable Wireless Networks
Authors:
Mingjun Sun,
Zeng Li,
Shaochuan Wu,
Yuanwei Liu,
Guoyu Li,
Tong Zhang
Abstract:
Deep learning is widely used in wireless communications but struggles with fixed neural network sizes, which limit its adaptability in environments where the number of users and antennas varies. To overcome this, this paper introduces a generalization strategy for precoding and power allocation in scalable wireless networks. First, we employ an innovative approach to abstract the wireless network into a homogeneous graph. This primarily focuses on bypassing the heterogeneous features between transmitter (TX) and user entities to construct a virtual homogeneous graph serving optimization objectives, thereby enabling all nodes in the virtual graph to share the same neural network. This "TX entity" is known as a base station (BS) in cellular networks and an access point (AP) in cell-free networks. Subsequently, we design a universal graph neural network, termed the information carrying graph neural network (ICGNN), to capture and integrate information from this graph while maintaining permutation invariance. Lastly, using ICGNN as the core algorithm, we tailor the neural network's input and output to specific problem requirements and validate its performance in two scenarios: 1) in cellular networks, we develop a matrix-inverse-free multi-user multi-input multi-output (MU-MIMO) precoding scheme using the conjugate gradient (CG) method, adaptable to varying user and antenna numbers; 2) in a cell-free network, facing dynamic variations in the number of users served by APs, the number of APs serving each user, and the number of antennas per AP, we propose a universal power allocation scheme. Simulations demonstrate that the proposed approach not only significantly reduces computational complexity but also achieves, and potentially exceeds, the spectral efficiency (SE) of conventional algorithms.
Submitted 30 August, 2024;
originally announced August 2024.
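As a rough illustration of what "matrix-inverse-free" can mean in the entry above, the sketch below solves a linear system A x = b with the textbook conjugate gradient (CG) iteration instead of forming the inverse of A. It is not the paper's ICGNN-based scheme: the function name `conjugate_gradient`, the small real-valued symmetric positive-definite matrix, and the stopping rule are all assumptions chosen for the example (a precoding problem would use a complex Hermitian matrix built from the channel).

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A (list of lists).

    Hypothetical helper for illustration; no matrix inverse is formed.
    """
    n = len(b)
    if max_iter is None:
        max_iter = n  # in exact arithmetic CG needs at most n steps
    x = [0.0] * n
    r = list(b)              # residual b - A x for the start point x = 0
    p = list(r)              # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old < tol:     # squared residual norm is small enough
            break
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        beta = rs_new / rs_old
        p = [r[i] + beta * p[i] for i in range(n)]
        rs_old = rs_new
    return x


# Assumed toy system; its exact solution is x = [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

Each iteration costs one matrix-vector product, which is why CG-style solvers are attractive when the system size changes with the number of users and antennas.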
-
LV-UNet: A Lightweight and Vanilla Model for Medical Image Segmentation
Authors:
Juntao Jiang,
Mengmeng Wang,
Huizhong Tian,
Lingbo Cheng,
Yong Liu
Abstract:
Despite the progress made by large models in computer vision, optimization challenges, the complexity of transformer models, computational limitations, and the requirements of practical applications call for simpler designs in model architecture for medical image segmentation, especially in mobile medical devices that require lightweight and deployable models with real-time performance. However, some of the current lightweight models exhibit poor robustness across different datasets, which hinders their broader adoption. This paper proposes a lightweight and vanilla model called LV-UNet, which effectively utilizes pre-trained MobileNetv3-Large models and introduces fusible modules. It can be trained using an improved deep training strategy and switched to deployment mode during inference, reducing both parameter count and computational load. Experiments are conducted on the ISIC 2016, BUSI, CVC-ClinicDB, CVC-ColonDB, and Kvasir-SEG datasets, achieving better performance compared to state-of-the-art and classic models.
Submitted 29 August, 2024;
originally announced August 2024.
-
A Deep Learning Approach to Localizing Multi-level Airway Collapse Based on Snoring Sounds
Authors:
Ying-Chieh Hsu,
Stanley Yung-Chuan Liu,
Chao-Jung Huang,
Chi-Wei Wu,
Ren-Kai Cheng,
Jane Yung-Jen Hsu,
Shang-Ran Huang,
Yuan-Ren Cheng,
Fu-Shun Hsu
Abstract:
This study investigates the application of machine/deep learning to classify snoring sounds excited at different levels of the upper airway in patients with obstructive sleep apnea (OSA) using data from drug-induced sleep endoscopy (DISE). The snoring sounds of 39 subjects were analyzed and labeled according to the Velum, Oropharynx, Tongue Base, and Epiglottis (VOTE) classification system. The dataset, comprising 5,173 one-second segments, was used to train and test models, including Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), and ResNet-50. The ResNet-50, a convolutional neural network (CNN), showed the best overall performance in classifying snoring acoustics, particularly in identifying multi-level obstructions. The study emphasizes the potential of integrating snoring acoustics with deep learning to improve the diagnosis and treatment of OSA. However, challenges such as limited sample size, data imbalance, and differences between pharmacologically induced and natural snoring sounds were noted, suggesting further research to enhance model accuracy and generalizability.
Submitted 28 August, 2024;
originally announced August 2024.
-
Auxiliary Input in Training: Incorporating Catheter Features into Deep Learning Models for ECG-Free Dynamic Coronary Roadmapping
Authors:
Yikang Liu,
Lin Zhao,
Eric Z. Chen,
Xiao Chen,
Terrence Chen,
Shanhui Sun
Abstract:
Dynamic coronary roadmapping is a technology that overlays the vessel maps (the "roadmap") extracted from an offline image sequence of X-ray angiography onto a live stream of X-ray fluoroscopy in real time. It aims to offer navigational guidance for interventional surgeries without the need for repeated contrast agent injections, thereby reducing the risks associated with radiation exposure and kidney failure. The precision of the roadmaps is contingent upon the accurate alignment of angiographic and fluoroscopic images based on their cardiac phases, as well as precise catheter tip tracking. The former ensures the selection of a roadmap that closely matches the vessel shape in the current frame, while the latter uses catheter tips as reference points to adjust for translational motion between the roadmap and the present vessel tree. Training deep learning models for both tasks is challenging and underexplored. However, incorporating catheter features into the models could offer substantial benefits, given that humans rely heavily on catheters to complete the tasks. To this end, we introduce a simple but effective method, auxiliary input in training (AIT), and demonstrate that it enhances model performance across both tasks, outperforming baseline methods in knowledge incorporation and transfer learning.
Submitted 28 August, 2024;
originally announced August 2024.
-
Multi-modal Adversarial Training for Zero-Shot Voice Cloning
Authors:
John Janiczek,
Dading Chong,
Dongyang Dai,
Arlo Faria,
Chao Wang,
Tao Wang,
Yuzong Liu
Abstract:
A text-to-speech (TTS) model trained to reconstruct speech given text tends towards predictions that are close to the average characteristics of a dataset, failing to model the variations that make human speech sound natural. This problem is magnified for zero-shot voice cloning, a task that requires training data with high variance in speaking styles. We build on recent works that have used Generative Adversarial Networks (GANs) by proposing a Transformer encoder-decoder architecture that conditionally discriminates between real and generated speech features. The discriminator is used in a training pipeline that improves both the acoustic and prosodic features of a TTS model. We introduce our novel adversarial training technique by applying it to a FastSpeech2 acoustic model and training on Libriheavy, a large multi-speaker dataset, for the task of zero-shot voice cloning. Our model achieves improvements over the baseline in terms of speech quality and speaker similarity. Audio examples from our system are available online.
Submitted 28 August, 2024;
originally announced August 2024.
-
Latent Relationship Mining of Glaucoma Biomarkers: a TRI-LSTM based Deep Learning
Authors:
Cheng Huang,
Junhao Shen,
Qiuyu Luo,
Karanjit Kooner,
Tsengdar Lee,
Yishen Liu,
Jia Zhang
Abstract:
In recent years, a significant amount of research has been conducted on applying deep learning methods for glaucoma classification and detection. However, the explainability of those established machine learning models remains a big concern. In this research, in contrast, we draw on concepts from cognitive science and study how ophthalmologists judge glaucoma detection. Simulating experts' efforts, we propose a hierarchical decision making system, centered around a holistic set of carefully designed biomarker-oriented machine learning models. While biomarkers represent the key indicators of how ophthalmologists identify glaucoma, they usually exhibit latent inter-relations. We thus construct a time series model, named TRI-LSTM, capable of calculating and uncovering potential and latent relationships among various biomarkers of glaucoma. Our model is among the first efforts to explore the intrinsic connections among glaucoma biomarkers. We monitor temporal relationships in patients' disease states over time to capture and retain the progression of disease-relevant clinical information from prior visits, thereby enriching the potential relationships among biomarkers. Extensive experiments on a real-world dataset have demonstrated the effectiveness of the proposed model.
Submitted 28 August, 2024;
originally announced August 2024.
-
Symbiotic Sensing and Communication: Framework and Beamforming Design
Authors:
Fanghao Xia,
Zesong Fei,
Xinyi Wang,
Weijie Yuan,
Qingqing Wu,
Yuanwei Liu,
Tony Q. S. Quek
Abstract:
In this paper, we propose a novel symbiotic sensing and communication (SSAC) framework, comprising a base station (BS) and a passive sensing node. In particular, the BS transmits a communication waveform to serve vehicle users (VUEs), while the sensing node is employed to execute sensing tasks based on the echoes in a bistatic manner, thereby avoiding the issue of self-interference. Besides the weak target of interest, the sensing node tracks VUEs and shares sensing results with the BS to facilitate sensing-assisted beamforming. By considering both fully digital arrays and hybrid analog-digital (HAD) arrays, we investigate the beamforming design in the SSAC system. We first derive the Cramer-Rao lower bound (CRLB) of the two-dimensional angles of arrival estimation as the sensing metric. Next, we formulate an achievable sum rate maximization problem under the CRLB constraint, where the channel state information is reconstructed based on the sensing results. Then, we propose two penalty dual decomposition (PDD)-based alternating algorithms for fully digital and HAD arrays, respectively. Simulation results demonstrate that the proposed algorithms can achieve an outstanding data rate with effective localization capability for both VUEs and the weak target. In particular, the HAD beamforming design exhibits remarkable performance gain compared to conventional schemes, especially with fewer radio frequency chains.
Submitted 27 August, 2024;
originally announced August 2024.
-
Toward Mixed Analog-Digital Quantum Signal Processing: Quantum AD/DA Conversion and the Fourier Transform
Authors:
Yuan Liu,
John M. Martyn,
Jasmine Sinanan-Singh,
Kevin C. Smith,
Steven M. Girvin,
Isaac L. Chuang
Abstract:
Signal processing stands as a pillar of classical computation and modern information technology, applicable to both analog and digital signals. Recently, advancements in quantum information science have suggested that quantum signal processing (QSP) can enable more powerful signal processing capabilities. However, the developments in QSP have primarily leveraged digital quantum resources, such as discrete-variable (DV) systems like qubits, rather than analog quantum resources, such as continuous-variable (CV) systems like quantum oscillators. Consequently, there remains a gap in understanding how signal processing can be performed on hybrid CV-DV quantum computers. Here we address this gap by developing a new paradigm of mixed analog-digital QSP. We demonstrate the utility of this paradigm by showcasing how it naturally enables analog-digital conversion of quantum signals -- specifically, the transfer of states between DV and CV quantum systems. We then show that such quantum analog-digital conversion enables new implementations of quantum algorithms on CV-DV hardware. This is exemplified by realizing the quantum Fourier transform of a state encoded on qubits via the free evolution of a quantum oscillator, albeit with a runtime exponential in the number of qubits due to information theoretic arguments. Collectively, this work marks a significant step forward in hybrid CV-DV quantum computation, providing a foundation for scalable analog-digital signal processing on quantum processors.
Submitted 26 August, 2024;
originally announced August 2024.
-
Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning
Authors:
Xinyang Gu,
Yen-Jen Wang,
Xiang Zhu,
Chengming Shi,
Yanjiang Guo,
Yichen Liu,
Jianyu Chen
Abstract:
Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinforcement learning. In this work, we introduce Denoising World Model Learning (DWL), an end-to-end reinforcement learning framework for humanoid locomotion control, which demonstrates the world's first humanoid robot to master real-world challenging terrains such as snowy and inclined land in the wild, up and down stairs, and extremely uneven terrains. All scenarios run the same learned neural network with zero-shot sim-to-real transfer, indicating the superior robustness and generalization capability of the proposed method.
Submitted 26 August, 2024;
originally announced August 2024.
-
Histology Virtual Staining with Mask-Guided Adversarial Transfer Learning for Tertiary Lymphoid Structure Detection
Authors:
Qiuli Wang,
Yongxu Liu,
Li Ma,
Xianqi Wang,
Wei Chen,
Xiaohong Yao
Abstract:
Histological Tertiary Lymphoid Structures (TLSs) are increasingly recognized for their correlation with the efficacy of immunotherapy in various solid tumors. Traditionally, the identification and characterization of TLSs rely on immunohistochemistry (IHC) staining techniques, utilizing markers such as CD20 for B cells. Despite the specificity of IHC, Hematoxylin-Eosin (H&E) staining offers a more accessible and cost-effective choice. Capitalizing on the prevalence of H&E staining slides, we introduce a novel Mask-Guided Adversarial Transfer Learning method designed for virtual pathological staining. This method captures the nuanced color variations across diverse tissue types under various staining conditions, such as nuclei, red blood cells, and positive reaction regions, without explicit label information, and synthesizes realistic IHC-like virtual staining patches, even replicating the positive reaction. Further, we propose the Virtual IHC Pathology Analysis Network (VIPA-Net), an integrated framework encompassing a Mask-Guided Transfer Module and an H&E-Based Virtual Staining TLS Detection Module. VIPA-Net synergistically harnesses both H&E staining slides and the synthesized virtual IHC patches to enhance the detection of TLSs within H&E Whole Slide Images (WSIs). We evaluate the network with a comprehensive dataset comprising 1019 annotated slides from The Cancer Genome Atlas (TCGA). Experimental results illustrate that VIPA-Net substantially elevates TLS detection accuracy, effectively circumventing the need for actual CD20 staining across the public dataset.
Submitted 25 August, 2024;
originally announced August 2024.
-
Diversity and Multiplexing for Continuous Aperture Array (CAPA)-Based Communications
Authors:
Chongjun Ouyang,
Zhaolin Wang,
Xingqi Zhang,
Yuanwei Liu
Abstract:
The performance of multiplexing and diversity achieved by continuous aperture arrays (CAPAs) over fading channels is analyzed. Angular-domain fading models are derived for CAPA-based multiple-input single-output (MISO), single-input multiple-output (SIMO), and multiple-input multiple-output (MIMO) channels using the Fourier relationship between the spatial response and its angular-domain counterpart. Building on these models, angular-domain transmission frameworks are proposed to facilitate CAPA-based communications, under which the performance of multiplexing and diversity is analyzed. 1) For SIMO and MISO channels, closed-form expressions are derived for the average data rate (ADR) and outage probability (OP). Additionally, asymptotic analyses are performed in the high signal-to-noise ratio (SNR) regime to unveil the maximal multiplexing gain and maximal diversity gain. The diversity-multiplexing trade-off (DMT) is also characterized, along with the array gain within the DMT framework. 2) For MIMO channels, high-SNR approximations are derived for the ADR and OP, based on which the DMT and associated array gain are revealed. The performance of CAPAs is further compared with that of conventional spatially discrete arrays (SPDAs) to highlight the superiority of CAPAs. The analytical and numerical results demonstrate that: i) compared to SPDAs, CAPAs achieve a lower OP and higher ADR, resulting in better spectral efficiency; ii) CAPAs achieve the same DMT as SPDAs with half-wavelength antenna spacing while attaining a larger array gain; and iii) CAPAs achieve a better DMT than SPDAs with antenna spacing greater than half a wavelength.
Submitted 25 August, 2024;
originally announced August 2024.
-
BCDNet: A Convolutional Neural Network For Breast Cancer Detection
Authors:
Yujia Lin,
Aiwei Lian,
Mingyu Liao,
Yipeng Liu
Abstract:
Previous research has established that breast cancer is a prevalent cancer type, with Invasive Ductal Carcinoma (IDC) being the most common subtype. The incidence of this dangerous cancer continues to rise, making accurate and rapid diagnosis, particularly in the early stages, critically important. While modern Computer-Aided Diagnosis (CAD) systems can address most cases, medical professionals still face challenges in using them in the field without powerful computing resources. In this paper, we propose a novel CNN model called BCDNet, which effectively detects IDC in histopathological images with an accuracy of up to 89.5% and reduces training time effectively.
Submitted 26 August, 2024; v1 submitted 25 August, 2024;
originally announced August 2024.
-
Generative AI based Secure Wireless Sensing for ISAC Networks
Authors:
Jiacheng Wang,
Hongyang Du,
Yinqiu Liu,
Geng Sun,
Dusit Niyato,
Shiwen Mao,
Dong In Kim,
Xuemin Shen
Abstract:
Integrated sensing and communications (ISAC) is expected to be a key technology for 6G, and channel state information (CSI) based sensing is a key component of ISAC. However, current research on ISAC focuses mainly on improving sensing performance, overlooking security issues, particularly the unauthorized sensing of users. In this paper, we propose a secure sensing system (DFSS) based on two distinct diffusion models. Specifically, we first propose a discrete conditional diffusion model to generate graphs with nodes and edges, guiding the ISAC system to appropriately activate wireless links and nodes, which ensures the sensing performance while minimizing the operation cost. Using the activated links and nodes, DFSS then employs the continuous conditional diffusion model to generate safeguarding signals, which are then modulated onto the pilot at the transmitter to mask fluctuations caused by user activities. As such, only ISAC devices authorized with the safeguarding signals can extract the true CSI for sensing, while unauthorized devices are unable to achieve the same sensing. Experimental results demonstrate that DFSS can reduce the activity recognition accuracy of unauthorized devices by approximately 70%, effectively shielding the user from unauthorized surveillance.
Submitted 21 August, 2024;
originally announced August 2024.
-
Full-Duplex ISAC-Enabled D2D Underlaid Cellular Networks: Joint Transceiver Beamforming and Power Allocation
Authors:
Tao Jiang,
Ming Jin,
Qinghua Guo,
Yinhong Liu,
Yaming Li
Abstract:
Integrating device-to-device (D2D) communication into cellular networks can significantly reduce the transmission burden on base stations (BSs). Besides, integrated sensing and communication (ISAC) is envisioned as a key feature in future wireless networks. In this work, we consider a full-duplex ISAC-based D2D underlaid system, and propose a joint beamforming and power allocation scheme to improve the performance of the coexisting ISAC and D2D networks. To enhance spectral efficiency, a sum rate maximization problem is formulated for the full-duplex ISAC-based D2D underlaid system, which is non-convex. To solve the non-convex optimization problem, we propose a successive convex approximation (SCA)-based iterative algorithm and prove its convergence. Numerical results are provided to validate the effectiveness of the proposed scheme with the iterative algorithm, demonstrating that the proposed scheme outperforms state-of-the-art ones in both communication and sensing performance.
Submitted 21 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Measurement-based Fast Quantum State Stabilization with Deep Reinforcement Learning
Authors:
Chunxiang Song,
Yanan Liu,
Daoyi Dong,
Hidehiro Yonezawa
Abstract:
The stabilization of quantum states is a fundamental problem for realizing various quantum technologies. Measurement-based feedback strategies have demonstrated powerful performance, and the construction of quantum control signals using measurement information has attracted great interest. However, the interaction between quantum systems and the environment is inevitable, especially when measurements are introduced, which leads to decoherence. To mitigate decoherence, it is desirable to stabilize quantum systems faster, thereby reducing the time of interaction with the environment. In this paper, we utilize information obtained from measurement and apply deep reinforcement learning (DRL) algorithms, without explicitly constructing specific complex measurement-control mappings, to rapidly drive a random initial quantum state to the target state. The proposed DRL algorithm has the ability to speed up the convergence to a target state, which shortens the interaction between quantum systems and their environments to protect coherence. Simulations are performed on two-qubit and three-qubit systems, and the results show that our algorithm can successfully stabilize a random initial quantum system to the target entangled state, with a convergence time shorter than that of traditional methods such as Lyapunov feedback control. Moreover, it exhibits robustness against imperfect measurements and delays in system evolution.
Submitted 21 August, 2024;
originally announced August 2024.
-
Multi-User Continuous-Aperture Array Communications: How to Learn Current Distribution?
Authors:
Jia Guo,
Yuanwei Liu,
Arumugam Nallanathan
Abstract:
The continuous aperture array (CAPA) can provide more degrees of freedom and higher spatial resolution than the spatially discrete array (SDPA), where optimizing multi-user current distributions in CAPA systems is crucial but challenging. The challenge arises from solving non-convex functional optimization problems without closed-form objective functions and constraints. In this paper, we propose a deep learning framework called L-CAPA to learn current distribution policies. In the framework, we find finite-dimensional representations of channel functions and current distributions, allowing them to be inputted into and outputted from a deep neural network (DNN) for learning the policy. To address the issue that the integrals in the loss function without closed-form expressions hinder training the DNN in an unsupervised manner, we propose to design another two DNNs for learning the integrals. The DNNs are designed as graph neural networks to incorporate the permutation properties of the mappings to be learned, thereby improving learning performance. Simulation results show that L-CAPA can achieve the performance upper bound of optimizing precoding in the SDPA system as the number of antennas approaches infinity, with low inference complexity.
Submitted 20 August, 2024;
originally announced August 2024.
-
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Authors:
Yuankun Xie,
Chenxu Xiong,
Xiaopeng Wang,
Zhiyong Wang,
Yi Lu,
Xin Qi,
Ruibo Fu,
Yukun Liu,
Zhengqi Wen,
Jianhua Tao,
Guanjun Li,
Long Ye
Abstract:
Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based audio have become increasingly critical. This paper investigates the effectiveness of current countermeasures (CMs) against ALM-based audio. Specifically, we collect 12 types of the latest ALM-based deepfake audio and evaluate them with the latest CMs. Our findings reveal that the latest codec-trained CM can effectively detect ALM-based audio, achieving 0% equal error rate under most ALM test conditions, which exceeded our expectations. This indicates promising directions for future research in ALM-based deepfake audio detection.
Submitted 20 August, 2024;
originally announced August 2024.
-
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Authors:
Xin Qi,
Ruibo Fu,
Zhengqi Wen,
Jianhua Tao,
Shuchen Shi,
Yi Lu,
Zhiyong Wang,
Xiaopeng Wang,
Yuankun Xie,
Yukun Liu,
Guanjun Li,
Xuefei Liu,
Yongwei Li
Abstract:
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into the pre-introduced conditional parts of the speech models. This fixes the position of LoRA, limiting the flexibility and scalability of its application. Therefore, we propose the Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech (EELE) method. Starting from a general neutral speech model, we do not pre-introduce emotional information but instead use the LoRA plugin to design a flexible adaptive scheme that endows the model with emotional generation capabilities. Specifically, we initially train the model using only neutral speech data. After training is complete, we insert LoRA into different modules and fine-tune the model with emotional speech data to find the optimal insertion scheme. Through experiments, we compare and test the effects of inserting LoRA at different positions within the model and assess LoRA's ability to learn various emotions, effectively proving the validity of our method. Additionally, we explore the impact of the rank size of LoRA and the difference compared to directly fine-tuning the entire model.
Submitted 20 August, 2024;
originally announced August 2024.
-
A Noval Feature via Color Quantisation for Fake Audio Detection
Authors:
Zhiyong Wang,
Xiaopeng Wang,
Yuankun Xie,
Ruibo Fu,
Zhengqi Wen,
Jianhua Tao,
Yukun Liu,
Guanjun Li,
Xin Qi,
Yi Lu,
Xuefei Liu,
Yongwei Li
Abstract:
In the field of deepfake detection, previous studies focus on using reconstruction or mask-and-prediction methods to train pre-trained models, such as wav2vec2.0 and Masked Auto Encoder, which are then transferred to fake audio detection training, where the encoder is used to extract features. These methods have proven that using real audio for reconstruction pre-training can better help the model distinguish fake audio. However, the disadvantage lies in poor interpretability, meaning it is hard to intuitively present the differences between deepfake and real audio. This paper proposes a novel feature extraction method via color quantisation, which constrains the reconstruction to use a limited number of colors for the spectral image-like input. The proposed method ensures that the reconstructed input differs from the original, which allows for intuitive observation of the focus areas in the spectral reconstruction. Experiments conducted on the ASVspoof2019 dataset demonstrate that the proposed method achieves better classification performance than using the original spectral input, and that pretraining the recolor network can also benefit fake audio detection.
Submitted 20 August, 2024;
originally announced August 2024.
-
Performance Analysis of Physical Layer Security: From Far-Field to Near-Field
Authors:
Boqun Zhao,
Chongjun Ouyang,
Xingqi Zhang,
Yuanwei Liu
Abstract:
The secrecy performance in both near-field and far-field communications is analyzed using two fundamental metrics: the secrecy capacity under a power constraint and the minimum power requirement to achieve a specified secrecy rate target. 1) For the secrecy capacity, a closed-form expression is derived under a discrete-time memoryless setup. This expression is further analyzed under several far-field and near-field channel models, and the capacity scaling law is revealed by assuming an infinitely large transmit array and an infinitely high power. A novel concept of "depth of insecurity" is proposed to evaluate the secrecy performance achieved by near-field beamfocusing. It is demonstrated that increasing the number of transmit antennas reduces this depth and thus improves the secrecy performance. 2) Regarding the minimum required power, a closed-form expression is derived and analyzed within far-field and near-field scenarios. Asymptotic analyses are performed by setting the number of transmit antennas to infinity to unveil the power scaling law. Numerical results are provided to demonstrate that: i) compared to far-field communications, near-field communications expand the areas where secure transmission is feasible, specifically when the eavesdropper is located in the same direction as the intended receiver; ii) as the number of transmit antennas increases, neither the secrecy capacity nor the minimum required power scales or vanishes unboundedly, adhering to the principle of energy conservation.
Submitted 20 August, 2024;
originally announced August 2024.
-
Near-Orthogonal Overlay Communications in LoS Channel Enabled by Novel OAM Beams without Central Energy Voids: An Experimental Study
Authors:
Yufei Zhao,
Xiaoyan Ma,
Yong Liang Guan,
Yile Liu,
Afkar Mohamed Ismail,
Xiaobei Liu,
Siew Yam Yeo,
Chau Yuen
Abstract:
This paper introduces a groundbreaking Line-of-Sight (LoS) Multiple-Input Multiple-Output (MIMO) communication architecture leveraging non-traditional Orbital Angular Momentum (OAM) beams. Challenging the conventional paradigm of hollow-emitting OAM beams, this study presents an innovative OAM transmitter design that produces directional OAM beams without central energy voids, aligning their radiation patterns with those of conventional planar wave horn antennas. Within the main lobe of these antennas, the phase variation characteristics inherent to OAM beams are ingeniously maintained, linking different OAM modes to the linear wavefront variation gradients, thereby reducing channel correlation in LoS scenarios and significantly augmenting the channel capacity of LoS-MIMO frameworks. Empirical validations conducted through a meticulously designed LoS-MIMO experimental platform reveal significant improvements in channel correlation coefficients, communication stability, and Bit Error Rate (BER) compared to systems utilizing traditional planar wave antennas. The experiment results underscore the potential of the novel OAM-based system to improve current LoS-MIMO communication protocols, and offer both academic and engineering guidance for the construction of practical communication infrastructures. Beyond its immediate contributions, this paper underscores a pivotal shift in the field of communications, pointing out that traditional communication algorithms have primarily focused on baseband signal processing while often overlooking the electromagnetic characteristics of the physical world.
Submitted 27 July, 2024;
originally announced August 2024.
-
Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development
Authors:
Yuncheng Jiang,
Yiwen Hu,
Zixun Zhang,
Jun Wei,
Chun-Mei Feng,
Xuemei Tang,
Xiang Wan,
Yong Liu,
Shuguang Cui,
Zhen Li
Abstract:
Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios, i.e. colorectal cancer segmentation, detection, and infiltration depth staging. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR). ASTR is designed based on three considerations: scanning mode discrepancy, temporal information, and low computational complexity. For generalizing to different scanning modes, the adaptive scanning-mode augmentation is proposed to convert between raw sector images and linear scan ones. For mining temporal information, the sparse-context transformer is incorporated to integrate inter-frame local and global features. For reducing computational complexity, the sparse-context block is introduced to extract contextual features from auxiliary frames. Finally, on the benchmark dataset, the proposed ASTR model achieves a 77.6% Dice score in rectal cancer segmentation, largely outperforming previous state-of-the-art methods.
Submitted 19 August, 2024;
originally announced August 2024.
-
Near-Field Sensing: A Low-Complexity Wavenumber-Domain Method
Authors:
Hao Jiang,
Zhaolin Wang,
Yuanwei Liu
Abstract:
A novel low-complexity wavenumber-domain method is proposed for near-field sensing (NISE). Specifically, the power-concentrated region of the wavenumber-domain channels is related to the target position in a non-linear manner. Based on this observation, a bi-directional convolutional neural network (BiCNN)-based approach is proposed to capture such a relationship, thereby facilitating low-complexity target localization. This method enables direct and gridless target localization using only a limited bandwidth and a single antenna array. Simulation results demonstrate that: 1) during the offline training phase, the proposed BiCNN method can learn to localize the target with fewer trainable parameters than naive neural network architectures; and 2) during the online implementation phase, the BiCNN method takes 100x less time while maintaining performance comparable to that of the conventional two-dimensional multiple signal classification (MUSIC) algorithms.
Submitted 18 August, 2024;
originally announced August 2024.
-
Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling
Authors:
Weidong Mei,
Xin Wei,
Yijie Liu,
Boyu Ning,
Zhi Chen
Abstract:
Fluid antennas (FAs) and movable antennas (MAs) are innovative technologies in wireless communications that are able to proactively improve channel conditions by dynamically adjusting the transmit/receive antenna positions within a given spatial region. In this paper, we investigate an MA-enhanced multiple-input single-output (MISO) secure communication system, aiming to maximize the secrecy rate by jointly optimizing the positions of multiple MAs. Instead of continuously searching for the optimal MA positions as in prior works, we propose to discretize the transmit region into multiple sampling points, thereby converting the continuous antenna position optimization into a discrete sampling point selection problem. However, this point selection problem is combinatorial and thus difficult to solve optimally. To tackle this challenge, we transform this combinatorial problem into a recursive path selection problem in graph theory and propose a partial enumeration algorithm to obtain its optimal solution without the need for high-complexity exhaustive search. To further reduce the complexity, a linear-time sequential update algorithm is also proposed to obtain a high-quality suboptimal solution. Numerical results show that our proposed algorithms yield much higher secrecy rates than the conventional fixed-position antenna (FPA) and other baseline schemes.
Submitted 1 August, 2024;
originally announced August 2024.
-
Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark
Authors:
Senmao Wang,
Haifan Gong,
Runmeng Cui,
Boyao Wan,
Yicheng Liu,
Zhonglin Hu,
Haiqing Yang,
Jingyang Zhou,
Bo Pan,
Lin Lin,
Haiyue Jiang
Abstract:
Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range costal cartilage relationships. Our method leverages a deformable model that integrates topological priors to enhance the adaptability and accuracy of the segmentation process. Furthermore, we developed a comprehensive benchmark that contains 165 cases for costal cartilage segmentation. This benchmark sets a new standard for evaluating costal cartilage segmentation techniques and provides a valuable resource for future research. Extensive experiments conducted on both in-domain benchmarks and out-of-domain test sets demonstrate the superiority of our approach over existing methods, showing significant improvements in segmentation precision and robustness.
Submitted 14 August, 2024;
originally announced August 2024.
-
Chirped DFT-s-OFDM: A new single-carrier waveform with enhanced LMMSE noise suppression
Authors:
Yujie Liu,
Yong Liang Guan,
David González G.,
Halim Yanikomeroglu
Abstract:
In this correspondence, a new single-carrier waveform, called chirped discrete Fourier transform spread orthogonal frequency division multiplexing (DFT-s-OFDM), is proposed for the sixth generation of communications. By chirping DFT-s-OFDM in the time domain, the proposed waveform maintains the low peak-to-average-power ratio (PAPR) of DFT-s-OFDM. Thanks to the full-band transmission and symbol retransmission enabled by chirping and discrete Fourier transform (DFT) precoding, the proposed waveform can enhance the noise suppression of linear minimum mean square error (LMMSE) equalization. Its bit error rate (BER) upper bound and diversity order are derived using pairwise error probability. Simulation results confirm that the proposed waveform outperforms state-of-the-art waveforms in terms of BER, output signal-to-noise ratio, and PAPR.
Submitted 13 August, 2024;
originally announced August 2024.
-
A Comprehensive Survey on EEG-Based Emotion Recognition: A Graph-Based Perspective
Authors:
Chenyu Liu,
Xinliang Zhou,
Yihao Wu,
Yi Ding,
Liming Zhai,
Kun Wang,
Ziyu Jia,
Yang Liu
Abstract:
Compared to other modalities, electroencephalogram (EEG) based emotion recognition can intuitively respond to emotional patterns in the human brain and, therefore, has become one of the most focused tasks in affective computing. The nature of emotions is a physiological and psychological state change in response to brain region connectivity, making emotion recognition focus more on the dependency between brain regions instead of specific brain regions. A significant trend is the application of graphs to encapsulate such dependency as dynamic functional connections between nodes across temporal and spatial dimensions. Concurrently, the neuroscientific underpinnings behind this dependency endow the application of graphs in this field with a distinctive significance. However, there is neither a comprehensive review nor a tutorial for constructing emotion-relevant graphs in EEG-based emotion recognition. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of graph-related methods in this field from a methodological perspective. We propose a unified framework for graph applications in this field and categorize these methods on this basis. Finally, based on previous studies, we also present several open challenges and future directions in this field.
Submitted 13 August, 2024; v1 submitted 12 August, 2024;
originally announced August 2024.
-
Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images
Authors:
Shouyue Liu,
Jinkui Hao,
Yonghuai Liu,
Huazhu Fu,
Xinyu Guo,
Shuting Zhang,
Yitian Zhao
Abstract:
Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryological origins and physiological characteristics of the retina and brain, retinal imaging is emerging as a potentially rapid and cost-effective alternative for the identification of individuals with or at high risk of AD. In this paper, we present a novel PolarNet+ that uses retinal optical coherence tomography angiography (OCTA) to discriminate early-onset AD (EOAD) and MCI subjects from controls. Our method first maps OCTA images from Cartesian coordinates to polar coordinates, allowing approximate sub-region calculation to implement the clinician-friendly early treatment of diabetic retinopathy study (ETDRS) grid analysis. We then introduce a multi-view module to serialize and analyze the images along three dimensions for comprehensive, clinically useful information extraction. Finally, we abstract the sequence embedding into a graph, transforming the detection task into a general graph classification problem. A regional relationship module is applied after the multi-view module to excavate the relationship between the sub-regions. Such regional relationship analyses validate known eye-brain links and reveal new discriminative patterns.
Submitted 9 August, 2024;
originally announced August 2024.
-
Secure Transmission for Movable Antennas Empowered Cell-Free Symbiotic Radio Communications
Authors:
Jiayu Guan,
Bin Lyu,
Yan Liu,
Feng Tian
Abstract:
In this paper, a novel movable antenna (MA) empowered secure transmission scheme is designed for cell-free symbiotic radio (SR) systems in the presence of an eavesdropper (Eve). Specifically, multiple distributed access points (APs) equipped with MAs collaboratively transmit confidential information to the primary user (PU), while the backscatter device (BD) transmits its own information to the secondary user (SU) by reflecting incident signals from the APs. The MAs deployed at the APs can adjust their positions flexibly to improve channel conditions between the APs and the PU/SU/BD and to suppress Eve's eavesdropping on the confidential information intended for the PU. Under this setup, we maximize the secrecy rate of the primary transmission by jointly optimizing the APs' transmit beamforming vectors and the positions of the MAs, subject to the quality-of-service constraints at the SU. To address the challenges caused by the non-convexity and find a near-optimal solution, an alternating optimization (AO) framework is proposed, utilizing the successive convex approximation method, the semi-definite relaxation technique, and a genetic algorithm modified particle swarm optimization (GA-PSO) algorithm. Numerical results demonstrate the secrecy rate enhancement provided by the MAs and show the impact of the GA-PSO algorithm on improving the solution accuracy.
Submitted 8 August, 2024;
originally announced August 2024.
-
Near-Field Sensing Enabled Predictive Beamforming: From Estimation to Tracking
Authors:
Hao Jiang,
Zhaolin Wang,
Yuanwei Liu
Abstract:
A near-field sensing (NISE) enabled predictive beamforming framework is proposed to facilitate wireless communications with high-mobility channels. Unlike conventional far-field sensing, which only captures the angle and the radial velocity of the user, NISE enables the estimation of the full motion state, including additional distance and transverse velocity information. Two full-motion state sensing approaches are proposed based on the concepts of estimation and tracking, respectively. 1) AGD-AO approach: The full motion state of the user is estimated within a single coherent processing interval (CPI). In particular, gradient descent is adopted to estimate the transverse and radial velocities of the user based on the maximum likelihood criterion, while the distance and the angle are calculated by the kinematic model. In this process, moment estimations are leveraged to adaptively tune the step size, thereby leading to a smoother and faster gradient descent. 2) EKF approach: The full motion state of the user is tracked across multiple CPIs. Based on the noisy measurements in multiple CPIs, the extended Kalman filter (EKF) iteratively predicts and updates the current motion state to achieve a low tracking error. Based on the obtained full motion state, beam prediction and Doppler frequency compensation can be carried out with minimum pilot overhead. Numerical results are provided to validate the effectiveness and efficiency of the proposed approaches compared to the conventional far-field predictive beamforming and feedback-based approaches. It is also revealed that: 1) the proposed AGD-AO can achieve stable descent with small gradients, thereby accelerating convergence; 2) compared to far-field predictive beamforming and feedback-based schemes, both of the proposed methods exhibit superior performance; and 3) by incorporating multiple CPIs, the EKF method exhibits greater robustness in low-SNR regions.
Submitted 4 August, 2024;
originally announced August 2024.
-
Multibeam Hybrid Transmitarray Based on Polarization Rotating Metasurface With Reconfigurable Bidirectional Radiation
Authors:
Fan Qin,
Yifei Liu,
Chao Gu,
Linfeng Zeng,
Wenchi Cheng,
Hailin Zhang,
Steven Gao
Abstract:
This paper proposes a bidirectional multibeam hybrid transmitarray (HTA) employing a transmission polarization-rotating metasurface (TPRM). A novel configuration is introduced to facilitate bidirectional beam scanning by combining the transmitarray (TA) and folded-transmitarray (FTA). To accomplish the reconfiguration of both unidirectional and bidirectional radiation states in the +z, -z, and +/-z directions, a polarization switchable multi-feed array (MFA) is placed at the focal plane between the TA and FTA, radiating x-polarization, y-polarization, and 45-degree oblique polarization waves, respectively. Meanwhile, the proposed antenna can achieve multibeam radiation in the three aforementioned states by switching the polarization of the MFA. To demonstrate the operating principle, a prototype has been designed, simulated, and fabricated. The measured results agree well with the simulated results. The simulated and measured results indicate that the proposed design can generate reconfigurable multibeam in both forward and backward directions, either separately or simultaneously. In the unidirectional states, forward and backward beam scanning is achieved within an angular range of +/-30° and +/-22°, respectively, with peak gains of 23.6 dBi and 23.1 dBi. A simultaneous forward and backward beam scanning of +/-40° and +/-22° is achieved in the hybrid radiation state, with peak gains of 19.4 dBi and 19.3 dBi, respectively. The proposed antenna array design offers several advantages, including bidirectional low-loss beam scanning, a simple structure, low power consumption, and a low profile.
Submitted 2 August, 2024;
originally announced August 2024.
-
Augmenting Channel Simulator and Semi-Supervised Learning for Efficient Indoor Positioning
Authors:
Yupeng Li,
Xinyu Ning,
Shijian Gao,
Yitong Liu,
Zhi Sun,
Qixing Wang,
Jiangzhou Wang
Abstract:
This work aims to tackle the labor-intensive and resource-consuming task of indoor positioning by proposing an efficient approach. The proposed approach involves the introduction of a semi-supervised learning (SSL) with a biased teacher (SSLB) algorithm, which effectively utilizes both labeled and unlabeled channel data. To reduce measurement expenses, unlabeled data is generated using an updated channel simulator (UCHS), and then weighted by adaptive confidence values to simplify the tuning of hyperparameters. Simulation results demonstrate that the proposed strategy achieves superior performance while minimizing measurement overhead and training expense compared to existing benchmarks, offering a valuable and practical solution for indoor positioning.
Submitted 1 August, 2024;
originally announced August 2024.
-
Joint Vehicle Connection and Beamforming Optimization in Digital Twin Assisted Integrated Sensing and Communication Vehicular Networks
Authors:
Weihang Ding,
Zhaohui Yang,
Mingzhe Chen,
Yuchen Liu,
Mohammad Shikh-Bahaei
Abstract:
This paper introduces an approach to harness digital twin (DT) technology in the realm of integrated sensing and communications (ISAC) in the sixth-generation (6G) Internet-of-everything (IoE) applications. We consider moving targets in a vehicular network and use DT to track and predict the motion of the vehicles. After predicting the location of the vehicle at the next time slot, the DT designs the assignment and beamforming for each vehicle. The real time sensing information is then utilized to update and refine the DT, enabling further processing and decision-making. This model incorporates a dynamic Kalman gain, which is updated at each time slot based on the received echo signals. The state representation encompasses both vehicle motion information and the error matrix, with the posterior Cramér-Rao bound (PCRB) employed to assess sensing accuracy. We consider a network with two roadside units (RSUs), and the vehicles need to be allocated to one of them. To optimize the overall transmission rate while maintaining an acceptable sensing accuracy, an optimization problem is formulated. Since it is generally hard to solve the original problem, Lagrange multipliers and fractional programming are employed to simplify this optimization problem. To solve the simplified problem, this paper introduces both greedy and heuristic algorithms through optimizing both vehicle assignments and predictive beamforming. The optimized results are then transferred back to the real space for ISAC applications. Recognizing the computational complexity of the greedy and heuristic algorithms, a bidirectional long short-term memory (LSTM)-based recurrent neural network (RNN) is proposed for efficient beamforming design within the DT. Simulation results demonstrate the effectiveness of the DT-based ISAC network.
Submitted 31 July, 2024;
originally announced August 2024.
-
An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming
Authors:
Wenhai Lai,
Zheyu Wu,
Yi Feng,
Kaiming Shen,
Ya-Feng Liu
Abstract:
Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Differing from most existing works, this letter advocates a convex-hull relaxation of the discrete constraints which leads to a continuous reformulated problem equivalent to the original discrete problem. This letter further proposes an efficient alternating projection/proximal gradient descent and ascent algorithm for solving the reformulated problem. Simulation results show that the proposed algorithm outperforms the state-of-the-art methods significantly.
Submitted 28 August, 2024; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Suppressing Beam Squint Effect For Near-Field Wideband Communication Through Movable Antennas
Authors:
Yanze Zhu,
Qingqing Wu,
Yang Liu,
Qingjiang Shi,
Wen Chen
Abstract:
In this correspondence, we study deploying a movable antenna (MA) array in a wideband multiple-input-single-output (MISO) communication system, where a near-field (NF) channel model is considered. To alleviate the beam squint effect, we propose to maximize the minimum analog beamforming gain across the entire wideband spectrum by appropriately adjusting the MAs' positions, which is a highly challenging task. By introducing a slack variable and adopting the cutting-edge smoothed-gradient-descent-ascent (SGDA) method, we develop algorithms to resolve this challenge. Numerical results verify the effectiveness of our proposed algorithms and demonstrate the benefit of utilizing an MA array to mitigate the beam squint effect in NF wideband systems.
Submitted 28 July, 2024;
originally announced July 2024.
-
Integrating Posture Control in Speech Motor Models: A Parallel-Structured Simulation Approach
Authors:
Yadong Liu,
Sidney Fels,
Arian Shamei,
Najeeb Khan,
Bryan Gick
Abstract:
Posture is an essential aspect of motor behavior, necessitating continuous muscle activation to counteract gravity. It remains stable under perturbation, aiding in maintaining bodily balance and enabling movement execution. Similarities have been observed between gross body postures and speech postures, such as those involving the jaw, tongue, and lips, which also exhibit resilience to perturbations and assist in equilibrium and movement. Although postural control is a recognized element of human movement and balance, particularly in broader motor skills, it has not been adequately incorporated into existing speech motor control models, which typically concentrate on the gestures or motor commands associated with specific speech movements, overlooking the influence of postural control and gravity. Here we introduce a model that aligns speech posture and movement, using simulations to explore whether speech posture within this framework mirrors the principles of bodily postural control. Our findings indicate that, akin to body posture, speech posture is also robust to perturbation and plays a significant role in maintaining local segment balance and enhancing speech production.
Submitted 26 July, 2024;
originally announced July 2024.
-
Speech Editing -- a Summary
Authors:
Tobias Kässmann,
Yining Liu,
Danni Liu
Abstract:
With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by altering the mel-spectrogram. Recent advancements, such as context-aware prosody correction and advanced attention mechanisms, have improved speech editing quality. This paper reviews state-of-the-art methods, compares key metrics, and examines widely used datasets. The aim is to highlight ongoing issues and inspire further research and innovation in speech editing.
Submitted 24 July, 2024;
originally announced July 2024.
-
APS-USCT: Ultrasound Computed Tomography on Sparse Data via AI-Physic Synergy
Authors:
Yi Sheng,
Hanchen Wang,
Yipei Liu,
Junhuan Yang,
Weiwen Jiang,
Youzuo Lin,
Lei Yang
Abstract:
Ultrasound computed tomography (USCT) is a promising technique that achieves superior medical imaging reconstruction resolution by fully leveraging waveform information, outperforming conventional ultrasound methods. Despite its advantages, high-quality USCT reconstruction relies on extensive data acquisition by a large number of transducers, leading to increased costs, computational demands, extended patient scanning times, and manufacturing complexities. To mitigate these issues, we propose a new USCT method called APS-USCT, which facilitates imaging with sparse data, substantially reducing dependence on high-cost dense data acquisition. Our APS-USCT method consists of two primary components: APS-wave and APS-FWI. The APS-wave component, an encoder-decoder system, preprocesses the waveform data, converting sparse data into dense waveforms to augment sample density prior to reconstruction. The APS-FWI component, utilizing the InversionNet, directly reconstructs the speed of sound (SOS) from the ultrasound waveform data. We further improve the model's performance by incorporating Squeeze-and-Excitation (SE) Blocks and source encoding techniques. Testing our method on a breast cancer dataset yielded promising results. It demonstrated outstanding performance with an average Structural Similarity Index (SSIM) of 0.8431. Notably, over 82% of samples achieved an SSIM above 0.8, with nearly 61% exceeding 0.85, highlighting the significant potential of our approach in improving USCT image reconstruction by efficiently utilizing sparse data.
Submitted 18 July, 2024;
originally announced July 2024.
-
Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training
Authors:
Lukuan Dong,
Donghong Qin,
Fengbo Bai,
Fanhua Song,
Yan Liu,
Chen Xu,
Zhijian Ou
Abstract:
The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme-based supervised pre-training, subword-based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense that the annotated speech is very limited. With less than 10 hours of transcribed Iu Mien speech, this paper investigates and compares the three approaches for Iu Mien speech recognition. Our experiments are based on the three recently released backbone models pretrained over the 10 languages from the CommonVoice dataset (CV-Lang10), which correspond to the three approaches for low-resourced ASR. It is found that phoneme supervision can achieve better results compared to subword supervision and self-supervision, thereby providing higher data-efficiency. In particular, the Whistle models, i.e., those obtained by weakly-supervised phoneme-based multilingual pre-training, obtain the most competitive results.
Submitted 18 July, 2024;
originally announced July 2024.
-
Disturbance Observer for Estimating Coupled Disturbances
Authors:
Jindou Jia,
Yuhang Liu,
Kexin Guo,
Xiang Yu,
Lihua Xie,
Lei Guo
Abstract:
High-precision control for nonlinear systems is impeded by low-fidelity dynamical models and external disturbances. In particular, the intricate coupling between internal uncertainty and external disturbance is usually difficult to model explicitly. Here we show an effective and convergent algorithm enabling accurate estimation of the coupled disturbance by combining control and learning philosophies. Specifically, by resorting to Chebyshev series expansion, the coupled disturbance is first decomposed into an unknown parameter matrix and two known structures depending on the system state and the external disturbance, respectively. A Regularized Least Squares (RLS) algorithm is subsequently formalized to learn the parameter matrix by using historical time-series data. Finally, a higher-order disturbance observer (HODO) is developed to achieve a high-precision estimation of the coupled disturbance by utilizing the learned portion. The efficiency of the proposed algorithm is evaluated through extensive simulations. We believe this work can offer a new option to merge learning schemes into the control framework for addressing existing intractable control problems.
Submitted 18 July, 2024;
originally announced July 2024.
-
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Authors:
Huadai Liu,
Jialei Wang,
Rongjie Huang,
Yang Liu,
Jiayang Xu,
Zhou Zhao
Abstract:
Text-guided diffusion models catalyze a paradigm shift in audio generation, facilitating the adaptability of source audio to conform to specific textual prompts. Recent advancements introduce inversion techniques, like DDIM inversion, to zero-shot editing, exploiting pre-trained diffusion models for audio modification. Nonetheless, our investigation exposes that DDIM inversion suffers from an accumulation of errors across each diffusion step, undermining its efficacy. In addition, the lack of attention control hinders fine-grained manipulation of music. To counteract these limitations, we introduce the Disentangled Inversion technique, which is designed to disentangle the diffusion process into triple branches, thereby magnifying their individual capabilities for both precise editing and preservation. Furthermore, we propose the Harmonized Attention Control framework, which unifies the mutual self-attention and cross-attention with an additional Harmonic Branch to achieve the desired composition and structural information in the target music. Collectively, these innovations comprise the Disentangled Inversion Control (DIC) framework, enabling accurate music editing whilst safeguarding structural integrity. To benchmark audio editing efficacy, we introduce ZoME-Bench, a comprehensive music editing benchmark hosting 1,100 samples spread across 10 distinct editing categories, which facilitates both zero-shot and instruction-based music editing tasks. Our method demonstrates unparalleled performance in edit fidelity and essential content preservation, outperforming contemporary state-of-the-art inversion techniques.
Submitted 20 August, 2024; v1 submitted 18 July, 2024;
originally announced July 2024.
-
ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024
Authors:
Ruibo Fu,
Rui Liu,
Chunyu Qiang,
Yingming Gao,
Yi Lu,
Shuchen Shi,
Tao Wang,
Ya Li,
Zhengqi Wen,
Chen Zhang,
Hui Bu,
Yukun Liu,
Xin Qi,
Guanjun Li
Abstract:
The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective perception in practical applications like companion robots for children and marketing bots. The core issue lies in the inconsistency between high-quality audio generation and the ultimate human subjective experience. Therefore, this challenge aims to enhance the persuasiveness and acceptability of synthesized audio, focusing on human-aligned, convincing, and inspirational audio generation. A total of 19 teams have registered for the challenge, and the competition and its results are described in this paper.
Submitted 31 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach
Authors:
Mingjie Shao,
Wei-Kun Chen,
Cheng-Yang Yu,
Ya-Feng Liu,
Wing-Kin Ma
Abstract:
As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attention. This paper focuses on one-bit multiple-input multiple-output (MIMO) detection in an uplink multiuser transmission scenario where the BS employs one-bit ADCs. One-bit quantization retains only the sign information and loses the amplitude information, which poses a unique challenge in the corresponding detection problem. The maximum-likelihood (ML) formulation of one-bit MIMO detection has a challenging likelihood function that hinders the application of many high-performance detectors developed for classic MIMO detection (under high-resolution ADCs). While many approximate methods for the ML detection problem have been studied, an efficient global algorithm is still lacking. This paper fills this gap by proposing an efficient branch-and-bound algorithm, which is guaranteed to find the global solution of the one-bit ML MIMO detection problem. Additionally, a new amplitude retrieval (AR) detection approach is developed, incorporating explicit amplitude variables into the problem formulation. The AR approach yields simpler objective functions that enable the development of efficient algorithms offering both global and approximate solutions. The paper also contributes to the computational complexity analysis of both ML and AR detection problems. Extensive simulations are conducted to demonstrate the effectiveness and efficiency of the proposed formulations and algorithms.
Submitted 16 July, 2024; v1 submitted 13 July, 2024;
originally announced July 2024.
-
Automated high-resolution backscattered-electron imaging at macroscopic scale
Authors:
Zhiyuan Lang,
Zunshuai Zhang,
Lei Wang,
Yuhan Liu,
Weixiong Qian,
Shenghua Zhou,
Ying Jiang,
Tongyi Zhang,
Jiong Yang
Abstract:
Scanning electron microscopy (SEM) has been widely utilized in the field of materials science due to its significant advantages, such as large depth of field, wide field of view, and excellent stereoscopic imaging. However, at high magnification, the limited imaging range in SEM cannot cover all the possible inhomogeneous microstructures. In this research, we propose a novel approach for generating high-resolution SEM images across multiple scales, enabling a single image to capture physical dimensions at the centimeter level while preserving submicron-level details. We use SEM imaging of the AlCoCrFeNi2.1 eutectic high entropy alloy (EHEA) as an example. SEM videos and image stitching are combined to fulfill this goal, and the video-extracted low-definition (LD) images are clarified by a well-trained denoising model. Furthermore, we segment the macroscopic image of the EHEA, and the areas of the various microstructures are distinguished. Combining the segmentation results and hardness experiments, we found that the hardness is positively correlated with the content of the body-centered cubic (BCC) phase, negatively correlated with the lamella width, and not significantly related to the proportion of lamellar structures. Our work provides a feasible solution to generate macroscopic images based on SEMs for further analysis of the correlations between the microstructures and spatial distribution, and can be widely applied to other types of microscopes.
Submitted 15 July, 2024;
originally announced July 2024.
-
Gravity Balanced Arm Exoskeleton for Basketball Shooting Training
Authors:
Yunfei Liu,
Zhanghao Yang
Abstract:
This paper proposes a gravity balanced arm exoskeleton design for basketball shooting training. The potential energy equation of the mechanism is derived. A simulation of the arm going through the basketball shooting motion is done on the mechanism. Throughout the motion the total potential energy is constant. Thus, the proposed arm exoskeleton is indeed gravity balanced with the use of two springs.
Submitted 13 July, 2024;
originally announced July 2024.
-
Human Leg Training Machine Based on The Multi-linkage System
Authors:
Yunfei Liu,
Zhanghao Yang
Abstract:
In real life, many people have leg defects. The goal of our work is to design a mechanism that could help them walk along a specific trajectory and eventually achieve flexible walking. In this paper, we use a motor to drive a multi-link leg mechanism. The major issues addressed in this paper are as follows: (i) design a human leg training mechanism based on the multi-link mechanism; (ii) simulate the leg movement trajectory of the multi-link mechanism based on the walking process; (iii) use torque control of a single motor to control the trajectory and velocity of this mechanism.
Submitted 13 July, 2024;
originally announced July 2024.
-
Temperature Secret in Bathtub: A Model of Temperature Distribution of Bathtub Based on Heat Conduction Equation
Authors:
Yunfei Liu
Abstract:
We use the multidimensional heat conduction and heat transfer equations to model the temperature distribution of water in a bathtub by solving partial differential equations. We address optimal water addition and bathtub design. First, we establish a water surface cooling model using Newton's law of cooling to simulate heat exchange between air and water. Without new heat sources, the water temperature reaches a minimum in 40 minutes. We then simulate adding hot water with a one-dimensional heat conduction model, including air cooling effects. We determine that the optimal heat input is 80 Joules and the optimal water velocity is 0.042 m/s to maintain temperature and save water. The ideal bathtub dimensions are 1.5m length, 0.6m width, 0.42m depth, with rounded corners. Using finite difference methods and MATLAB's Pdetool, we solve the heat conduction equation and verify numerical stability, discussing the model's pros and cons and suggesting improvements.
Submitted 12 July, 2024;
originally announced July 2024.
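The bathtub entry above mentions solving the heat conduction equation with finite differences and verifying numerical stability. As a rough illustration only, the following sketch applies an explicit forward-time, centered-space (FTCS) step to the 1D heat equation with fixed end temperatures. The function name `ftcs_heat_1d`, the grid, and all parameter values are assumptions made for this example; they are not taken from the author's MATLAB model.

```python
def ftcs_heat_1d(u0, alpha, dx, dt, steps):
    """Advance u_t = alpha * u_xx with an explicit FTCS scheme.

    u0 is the initial temperature profile; the two end values are held
    fixed (Dirichlet boundaries). All names here are illustrative.
    """
    r = alpha * dt / dx ** 2
    # The explicit FTCS scheme for the 1D heat equation is stable only for r <= 1/2.
    if r > 0.5:
        raise ValueError(f"unstable step: alpha*dt/dx^2 = {r:.3f} > 0.5")
    u = list(u0)
    for _ in range(steps):
        nxt = u[:]  # boundary values are copied unchanged
        for i in range(1, len(u) - 1):
            nxt[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
        u = nxt
    return u


if __name__ == "__main__":
    # A single warm cell diffusing into cooler neighbours (hypothetical values).
    print(ftcs_heat_1d([0.0, 0.0, 1.0, 0.0, 0.0], alpha=1.0, dx=1.0, dt=0.25, steps=1))
```

The stability check is placed before the time loop so that an unsuitable time step is rejected up front instead of producing a silently diverging profile.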
-
Perceived Time To Collision as Public Space Users' Discomfort Metric
Authors:
Alireza Jafari,
Yen-Chen Liu
Abstract:
Micro-mobility transport vehicles such as e-scooters are joining current sidewalk users and affect the safety and comfort of pedestrians as primary sidewalk users. The lack of agreed-upon metrics to quantify people's discomfort hinders shared public space safety research. We introduce perceived Time To Collision (TTC) as a potential metric of user discomfort, performing controlled experiments using an e-scooter and a pedestrian moving in a hallway. The results show a strong correlation between the participants' reported discomfort and the perceived TTC. Therefore, TTC is a potential metric for public space users' discomfort. Since the metric only uses relative velocity and position information, it is a viable candidate for neighboring people's discomfort estimation in advanced driver assistance systems for e-scooters and PMVs. Our ongoing research extends the results to mobile robots.
Submitted 12 July, 2024;
originally announced July 2024.
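The entry above notes that the TTC metric needs only relative position and velocity. The abstract does not give the authors' exact formula for perceived TTC, so the sketch below uses one common definition as an assumption: distance divided by closing speed along the line joining the two road users. The function name `time_to_collision` is hypothetical.

```python
import math


def time_to_collision(rel_pos, rel_vel):
    """Return distance / closing speed, or infinity if the gap is not closing.

    rel_pos and rel_vel are the other user's position and velocity relative
    to the observer, as equal-length tuples. This definition is an assumption
    for illustration, not the paper's stated formula.
    """
    dist_sq = sum(p * p for p in rel_pos)
    if dist_sq == 0.0:
        return 0.0  # already at the same point
    dot = sum(p * v for p, v in zip(rel_pos, rel_vel))
    if dot >= 0.0:
        return math.inf  # moving apart or sideways: no closing speed
    # closing speed = -dot / |p|, so TTC = |p| / closing speed = -|p|^2 / dot
    return -dist_sq / dot


if __name__ == "__main__":
    # An e-scooter 4 m ahead approaching at 2 m/s (hypothetical numbers).
    print(time_to_collision((4.0, 0.0), (-2.0, 0.0)))
```

Returning infinity for a non-closing pair keeps the function total, so a downstream discomfort estimate can treat "no approach" as the least urgent case.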
-
Dynamic Modeling and Stability Analysis of Balancing in Riderless Electric Scooters
Authors:
Yun-Hao Lin,
Alireza Jafari,
Yen-Chen Liu
Abstract:
Today, the electric scooter is a trendy personal mobility vehicle. The rising demand and opportunities attract ride-share services. A common problem of such services is abandoned e-scooters. An autonomous e-scooter capable of moving to the charging station is a solution. This paper focuses on maintaining balance for these riderless e-scooters. The paper presents a nonlinear model for an e-scooter moving with simultaneously varying speed and steering. A PD and a feedback-linearized PD controller stabilize the model. The stability analysis shows that the controllers are ultimately bounded even with parameter uncertainties and measurement inaccuracy. Simulations on a realistic e-scooter with a general demanding path to follow verify the ultimate boundedness of the controllers. In addition, the feedback-linearized PD controller outperforms the PD controller because it has narrower ultimate bounds. Future work focuses on experiments using a self-balancing mechanism installed on an e-scooter.
Submitted 12 July, 2024;
originally announced July 2024.
-
Physical encryption and decryption for secure data transmission in optical networks leveraging the temporal Talbot effect and microwave photonics
Authors:
Chulun Lin,
Taixia Shi,
Yiqing Liu,
Yang Chen
Abstract:
A novel microwave photonic scheme for secure data transmission in optical networks is proposed. The security of the scheme is guaranteed by physical encryption and decryption via the temporal Talbot effect in dispersive mediums. First, the original data is randomized in the digital domain by performing an exclusive OR operation using a random matrix. Subsequently, a time-varying multi-tone electrical signal, which represents the randomized data matrix, is modulated onto an optical carrier. The optical signal after modulation is then phase-modulated by a temporal Talbot array illuminator (TAI) signal, and the optical signal after discrete quadratic phase modulation will lose its original appearance in the frequency domain and be further dispersed in the first dispersive medium. Due to the dispersion that does not match the TAI signal exactly, the waveform after the first dispersive medium is a noise-like signal. Hence, the physical encryption of the original data is successfully achieved. As the optical signal passes a second dispersive medium that makes the total dispersion match the TAI signal, the temporal waveform of the noise-like signal after photodetection is transformed into pulses. "1" and "0" in the randomized data matrix are represented through the presence and absence of pulses, and the physical decryption is achieved. By further processing the recovered data matrix using the random matrix, the original data can be recovered. The physical layer security of the proposed scheme and its fiber transmission capability are demonstrated. 8-Gbit/s data is transmitted, encrypted, and decrypted using two dispersive mediums and an optical fiber of 10 to 200 km, and error-free transmission is achieved. Many factors that affect the encryption, decryption, and transmission performance of the system have been analyzed.
Submitted 12 July, 2024;
originally announced July 2024.
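The entry above describes a digital-domain step in which the data is randomized by an exclusive OR with a random matrix and later recovered using the same matrix. A minimal sketch of that round trip follows; it covers only the XOR step, not the optical Talbot-effect encryption. The matrix size, the seeded generator, and the helper names `random_bit_matrix` and `xor_matrices` are assumptions for illustration.

```python
import random


def random_bit_matrix(rows, cols, seed):
    """Build a reproducible rows x cols matrix of random bits (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def xor_matrices(a, b):
    """Element-wise XOR of two equally sized bit matrices."""
    return [[x ^ y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


if __name__ == "__main__":
    data = [[1, 0, 1, 1], [0, 0, 1, 0]]           # hypothetical payload bits
    key = random_bit_matrix(2, 4, seed=7)          # shared random matrix
    scrambled = xor_matrices(data, key)            # randomization before modulation
    recovered = xor_matrices(scrambled, key)       # same XOR undoes it at the receiver
    print(recovered == data)
```

Because XOR is its own inverse, applying the same matrix twice returns the original bits, which is why the receiver needs only the shared random matrix for this step.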