-
Compartment-specific estimation of T2 and T2* with diffusion-PEPTIDE MRI
Authors:
Ting Gong,
Merlin J. Fair,
Kawin Setsompop,
Hui Zhang
Abstract:
We present a microstructure imaging technique for estimating compartment-specific T2 and T2* simultaneously in the human brain. Microstructure imaging with diffusion MRI (dMRI) has enabled the separate modelling of intra-neurite and extra-neurite diffusion signals, allowing for the estimation of compartment-specific tissue properties. These compartment-specific properties have been widely used in clinical studies. However, conventional dMRI cannot disentangle differences in relaxation between tissue compartments, causing biased estimates of diffusion measures that also change with TE. To solve this problem, combined relaxometry-diffusion imaging methods have been developed in recent years, providing compartmental T2-diffusion or T2*-diffusion imaging respectively, but not T2 and T2* together. As they provide complementary information, a technique that can estimate both jointly with diffusion is appealing for neuroimaging studies. The aim of this work is to develop a method to map compartmental T2-T2*-diffusion simultaneously. Using an advanced MRI acquisition called diffusion-PEPTIDE, a novel microstructure model is proposed and a multi-step fitting method is developed to estimate the parameters of interest. We demonstrate for the first time that compartmental T2 and T2* can be estimated simultaneously from in vivo data. We further show the accuracy and precision of parameter estimation with simulation.
Submitted 19 August, 2024;
originally announced August 2024.
-
Prescribed-time Convergent Distributed Multiobjective Optimization with Dynamic Event-triggered Communication
Authors:
Tengyang Gong,
Zhongguo Li,
Yiqiao Xu,
Zhengtao Ding
Abstract:
This paper addresses distributed constrained multiobjective resource allocation problems (DCMRAPs) within multi-agent networks, where each agent has multiple, potentially conflicting local objectives, constrained by both local and global constraints. By reformulating the DCMRAP as a single-objective weighted $L_p$ problem, a distributed solution is enabled, which eliminates the need for the predetermined weighting factors or centralized decision-making required by traditional methods. Leveraging prescribed-time control and dynamic event-triggered mechanisms (ETMs), novel distributed algorithms are proposed to achieve Pareto optimality within a prescribed settling time through sampled communication. Using generalized time-based generators (TBGs), these algorithms provide more flexibility in optimizing accuracy and control smoothness without the constraints of initial conditions. Novel dynamic ETMs are designed to work with generalized TBGs to promote communication efficiency, adjusting to both local error metrics and network-based disagreements. Zeno behavior is excluded. Validated by Lyapunov analysis and simulations, our method demonstrates superior control performance and efficiency compared to existing methods, advancing distributed optimization in complex environments.
Submitted 18 August, 2024;
originally announced August 2024.
-
V3rified: Revelation vs Non-Revelation Mechanisms for Decentralized Verifiable Computation
Authors:
Tiantian Gong,
Aniket Kate,
Alexandros Psomas,
Athina Terzoglou
Abstract:
In the era of Web3, decentralized technologies have emerged as the cornerstone of a new digital paradigm. Backed by a decentralized blockchain architecture, the Web3 space aims to democratize all aspects of the web. From data-sharing to learning models, outsourcing computation is an established, prevalent practice. Verifiable computation makes this practice trustworthy as clients/users can now efficiently validate the integrity of a computation. As verifiable computation gets considered for applications in the Web3 space, decentralization is crucial for system reliability, ensuring that no single entity can suppress clients. At the same time, however, decentralization needs to be balanced with efficiency: clients want their computations done as quickly as possible.
Motivated by these issues, we study the trade-off between decentralization and efficiency when outsourcing computational tasks to strategic, rational solution providers. Specifically, we examine this trade-off when the client employs (1) revelation mechanisms, i.e. auctions, where solution providers bid their desired reward for completing the task by a specific deadline and then the client selects which of them will do the task and how much they will be rewarded, and (2) simple, non-revelation mechanisms, where the client commits to the set of rules she will use to map solutions at specific times to rewards and then solution providers decide whether they want to do the task or not. We completely characterize the power and limitations of revelation and non-revelation mechanisms in our model.
Submitted 13 August, 2024;
originally announced August 2024.
-
Distributed Feedback-Feedforward Algorithms for Time-Varying Resource Allocation
Authors:
Yiqiao Xu,
Tengyang Gong,
Zhengtao Ding,
Alessandra Parisio
Abstract:
In this paper, we address the distributed Time-Varying Resource Allocation (TVRA) problem, where the local cost functions, global equality constraint, and Local Feasibility Constraints (LFCs) vary with time. To track the optimal trajectories, algorithms that mimic the structure of feedback-feedforward control systems are proposed. We begin with their conceptual design in the absence of LFCs, developing a feedback-feedforward algorithm that is fixed-time convergent. For cases with LFCs, existing approaches predominantly rely on constructing a time-dependent barrier function, which may impede the design of fixed-time convergent algorithms. Therefore, by exploring the connection between projection and penalty functions, switched feedforward laws are tailored to handle LFCs, with projection used in conjunction. Based on this, we develop a projection-based feedback-feedforward algorithm, which converges to the exact optimal trajectories, possibly along with a number of switching instants, while exhibiting fixed-time convergence between consecutive switching instants. Numerical experiments verify the effectiveness of the proposed algorithms.
Submitted 7 August, 2024;
originally announced August 2024.
-
Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization
Authors:
Changtao Miao,
Qi Chu,
Tao Gong,
Zhentao Tan,
Zhenchao Jin,
Wanyi Zhuang,
Man Luo,
Honggang Hu,
Nenghai Yu
Abstract:
With the advancement of face manipulation technology, forged images in multi-face scenarios are gradually becoming a more complex and realistic challenge. Despite this, detection and localization methods for such multi-face manipulations remain underdeveloped. Traditional manipulation localization methods either indirectly derive detection results from localization masks, resulting in limited detection performance, or employ a naive two-branch structure to simultaneously obtain detection and localization results, which cannot effectively benefit the localization capability due to limited interaction between the two tasks. This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization. MoNFAP primarily introduces two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM). The FUP integrates detection and localization tasks using a token learning strategy and multiple forgery-aware transformers, which facilitates the use of classification information to enhance localization capability. In addition, motivated by the crucial role of noise information in forgery detection, the MNM leverages multiple noise extractors based on the concept of the mixture of experts to enhance the general RGB features, further boosting the performance of our framework. Finally, we establish a comprehensive benchmark for multi-face detection and localization, and the proposed MoNFAP achieves significant performance. The code will be made available.
Submitted 5 August, 2024;
originally announced August 2024.
-
Three-dimensional solitons supported by the spin-orbit coupling and Rydberg-Rydberg interactions in PT-symmetric potentials
Authors:
Yuan Zhao,
Qihong Huang,
Tixian Gong,
Siliu Xu,
Zeping Li,
Boris A. Malomed
Abstract:
Excited states (ESs) of two- and three-dimensional (2D and 3D) solitons of the semivortex (SV) and mixed-mode (MM) types, supported by the interplay of the spin-orbit coupling (SOC) and local nonlinearity in binary Bose-Einstein condensates, are unstable, in contrast to the stability of the SV and MM solitons in their fundamental states. We propose a stabilization strategy for these states in 3D, combining SOC and long-range Rydberg-Rydberg interactions (RRI), in the presence of a spatially periodic potential that may include a parity-time (PT)-symmetric component. ESs of the SV solitons, which carry integer vorticities S and S+1 in their two components, exhibit robustness up to S = 4. ESs of MM solitons feature an interwoven necklace-like structure, with the components carrying opposite fractional values of the orbital angular momentum. Regions of the effective stability of the 3D solitons of the SV and MM types (both fundamental ones and ESs) are identified as functions of the imaginary component of the PT-symmetric potential and the strengths of the SOC and RRI terms.
Submitted 28 July, 2024;
originally announced July 2024.
-
Joint Active and Passive Beamforming Design for IRS-aided MIMO ISAC Based on Sensing Mutual Information
Authors:
Jin Li,
Gui Zhou,
Tantao Gong,
Nan Liu,
Rui Zhang
Abstract:
In this paper, we investigate the intelligent reflecting surface (IRS)/reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) system based on sensing mutual information (MI). Specifically, the base station (BS) perceives the sensing target via the sensing signal reflected by the IRS, while communicating with the users simultaneously. Our aim is to maximize the sensing MI, subject to the quality of service (QoS) constraints for all communication users, the transmit power constraint at the BS, and the unit-modulus constraint on the IRS's passive reflection. We solve this problem under two cases: a simplified case assuming a line-of-sight (LoS) channel between the BS and IRS and no clutter interference to sensing, and a generalized case considering the Rician fading channel of the BS-IRS link and the presence of clutter interference to sensing. For the first case, we show that the dedicated sensing beamformer cannot enhance the sensing MI if the BS-user direct links are blocked, and develop a low-complexity iterative algorithm to jointly optimize the BS and IRS active/passive beamformers. In contrast, for the second case, we propose an alternative iterative algorithm, which can also be applied to the first case, to solve the beamforming design problem under the general setup. Numerical results are provided to validate the performance of the proposed algorithms, as compared to various benchmark schemes.
Submitted 23 July, 2024;
originally announced July 2024.
-
Homotopy Types Of Toric Orbifolds From Weyl Polytopes
Authors:
Tao Gong
Abstract:
Given a reduced crystallographic root system with a fixed simple system, there are an associated Weyl group $W$, parabolic subgroups $W_K$, and a polytope $P$ which is the convex hull of a dominant weight. The quotient $P/W_K$ can be identified with a polytope. The polytopes $P$ and $P/W_K$ are associated to toric varieties $X_P$ and $X_{P/W_K}$ respectively. It turns out that the underlying topological spaces $X_P/W_K$ and $X_{P/W_K}$ are homotopy equivalent when the polytopes are considered in the real span of the root lattice or of the weight lattice.
Submitted 22 July, 2024;
originally announced July 2024.
-
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Authors:
Hyungjun Yoon,
Biniyam Aschalew Tolera,
Taesik Gong,
Kimin Lee,
Sung-Ju Lee
Abstract:
Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts and reducing token costs by 15.8x. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks.
Submitted 14 July, 2024;
originally announced July 2024.
-
Microstructure.jl: a Julia Package for Probabilistic Microstructure Model Fitting with Diffusion MRI
Authors:
Ting Gong,
Anastasia Yendiki
Abstract:
Microstructure.jl is a Julia package designed for probabilistic microstructure imaging using diffusion and combined diffusion-relaxometry MRI techniques. It provides a flexible and extendable framework for compartment models and includes robust and unified estimators for parameter fitting and uncertainty quantification. The package incorporates several established models from the literature, such as the spherical mean technique and soma and neurite density imaging (SANDI), along with their extensions for analyzing combined diffusion and T2 mapping data acquired at multiple echo times. For parameter estimation, it features methods like Markov Chain Monte Carlo (MCMC) sampling and Monte Carlo dropout with neural networks, which provide probabilistic estimates by approximating the posteriors of modelling parameters. Microstructure.jl is applicable to both in vivo and ex vivo imaging data. We are currently testing and optimizing the package and are pleased to share its major functionalities, design, and documentation progress.
Submitted 8 July, 2024;
originally announced July 2024.
-
Interplay between MRI-based axon diameter and myelination estimates in macaque and human brain
Authors:
Ting Gong,
Chiara Maffei,
Evan Dann,
Hong-Hsi Lee,
Hansol Lee,
Jean C. Augustinack,
Susie Y. Huang,
Suzanne N. Haber,
Anastasia Yendiki
Abstract:
Axon diameter and myelin thickness are closely related microstructural tissue properties that affect the conduction velocity of action potentials in the nervous system. Imaging them non-invasively with MRI-based methods is thus valuable for studying brain microstructure and function. However, the relationship between MRI-based axon diameter and myelination measures has not been investigated across the brain, mainly due to methodological limitations in estimating axon diameters. In recent years, studies using ultra-high gradient strength diffusion MRI (dMRI) have demonstrated improved estimation of axon diameter across white-matter (WM) tracts in the human brain, making such investigations feasible. In this study, we aim to investigate relationships between tissue microstructure properties with MRI-based methods and compare the imaging findings to histological evidence from the literature. We collected dMRI with ultra-high gradient strength and multi-echo spin-echo MRI on ex vivo macaque and human brain samples on a preclinical scanner. From these data, we estimated axon diameter, intra-axonal signal fraction, myelin water fraction (MWF) and aggregate g-ratio and investigated their correlations. We found that the microstructural imaging parameters exhibited consistent patterns across WM tracts and species. Overall, the findings suggest that MRI-based axon geometry and myelination measures can provide complementary information about fiber morphology, and the relationships between these measures agree with prior histological evidence.
Submitted 2 July, 2024;
originally announced July 2024.
-
Distribution-Free Online Change Detection for Low-Rank Images
Authors:
Tingnan Gong,
Seong-Hee Kim,
Yao Xie
Abstract:
We present a distribution-free CUSUM procedure designed for online change detection in a time series of low-rank images, particularly when the change causes a mean shift. We represent images as matrix data and allow for temporal dependence, in addition to inherent spatial dependence, before and after the change. The marginal distributions are assumed to be general, not limited to any specific parametric distribution. We propose new monitoring statistics that utilize the low-rank structure of the in-control mean matrix. Additionally, we study the properties of the proposed detection procedure, assessing whether the monitoring statistics effectively capture a mean shift and evaluating the rate of increase in average run length relative to the control limit in both in-control and out-of-control cases. The effectiveness of our procedure is demonstrated through simulated and real data experiments.
Submitted 23 June, 2024;
originally announced June 2024.
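As a rough illustration of the CUSUM idea mentioned in the abstract above, here is a minimal sketch of the classic one-sided scalar CUSUM for an upward mean shift. It is not the distribution-free, low-rank procedure the authors propose; the function name `cusum_alarm` and the parameters `mu0` (in-control mean), `k` (slack) and `h` (control limit) are assumptions chosen for this example.

```python
def cusum_alarm(xs, mu0, k, h):
    """Classic one-sided CUSUM (illustrative only, not the paper's statistic).

    xs  : iterable of scalar observations
    mu0 : assumed in-control mean
    k   : slack value subtracted from each deviation
    h   : control limit; an alarm is raised when the statistic exceeds it
    Returns the index of the first alarm, or None if no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(xs):
        # Accumulate deviations above mu0 + k, resetting at zero.
        s = max(0.0, s + (x - mu0 - k))
        if s > h:
            return t
    return None


# Toy stream: five in-control samples, then an upward mean shift.
stream = [0.0] * 5 + [2.0] * 5
alarm = cusum_alarm(stream, mu0=0.0, k=0.5, h=3.0)
print(alarm)
```

In this toy stream the statistic stays at zero during the in-control segment and crosses the limit a few samples after the shift. Raising `h` makes false alarms rarer at the cost of slower detection.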
-
How Does Distribution Matching Help Domain Generalization: An Information-theoretic Analysis
Authors:
Yuxin Dong,
Tieliang Gong,
Hong Chen,
Shuangyong Song,
Weizhan Zhang,
Chen Li
Abstract:
Domain generalization aims to learn invariance across multiple training domains, thereby enhancing generalization against out-of-distribution data. While gradient or representation matching algorithms have achieved remarkable success, these methods generally lack generalization guarantees or depend on strong assumptions, leaving a gap in understanding the underlying mechanism of distribution matching. In this work, we formulate domain generalization from a novel probabilistic perspective, ensuring robustness while avoiding overly conservative solutions. Through comprehensive information-theoretic analysis, we provide key insights into the roles of gradient and representation matching in promoting generalization. Our results reveal the complementary relationship between these two components, indicating that existing works focusing solely on either gradient or representation alignment are insufficient to solve the domain generalization problem. In light of these theoretical findings, we introduce IDM to simultaneously align the inter-domain gradients and representations. Integrated with the proposed PDM method for complex distribution matching, IDM achieves superior performance over various baseline methods.
Submitted 14 June, 2024;
originally announced June 2024.
-
Electromagnetic Information Theory for Holographic MIMO Communications
Authors:
Li Wei,
Tierui Gong,
Chongwen Huang,
Zhaoyang Zhang,
Wei E. I. Sha,
Zhi Ning Chen,
Linglong Dai,
Merouane Debbah,
Chau Yuen
Abstract:
Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enabling higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, traditional Shannon information theory falls short in addressing these inquiries because it focuses only on the information itself while neglecting the underlying carrier, electromagnetic (EM) waves, and environmental interactions. To bridge the gap between theoretical analysis and practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, involving Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide the optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present approximate and exact channel modeling approaches in the near-/far-field zones. Furthermore, we discuss both traditional Shannon information theory, employing the probabilistic method, and Kolmogorov information theory, utilizing functional analysis, for HMIMO-oriented EIT systems.
Submitted 25 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay
Authors:
Hyungjun Yoon,
Jaehyun Kwak,
Biniyam Aschalew Tolera,
Gaole Dai,
Mo Li,
Taesik Gong,
Kimin Lee,
Sung-Ju Lee
Abstract:
Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine-tuned in heterogeneous domains. To address the issue, we propose ADAPT^2, a few-shot domain adaptation framework for personalizing self-supervised models. ADAPT^2 proposes self-supervised meta-learning for initial model pre-training, followed by a user-side model adaptation by replaying the self-supervision with user-specific data. This allows models to adjust their pre-trained representations to the user with only a few samples. Evaluation with four benchmarks demonstrates that ADAPT^2 outperforms existing baselines by an average F1-score of 8.8%p. Our on-device computational overhead analysis on a commodity off-the-shelf (COTS) smartphone shows that ADAPT^2 completes adaptation within an unobtrusive latency (in three minutes) with only a 9.54% memory consumption, demonstrating the computational efficiency of the proposed method.
Submitted 29 March, 2024;
originally announced April 2024.
-
OneActor: Consistent Character Generation via Cluster-Conditioned Guidance
Authors:
Jiahao Wang,
Caixia Yan,
Haonan Lin,
Weizhan Zhang,
Mengmeng Wang,
Tieliang Gong,
Guang Dai,
Hao Sun
Abstract:
Text-to-image diffusion models benefit artists with high-quality image generation. Yet their stochastic nature hinders artists from creating consistent images of the same subject. Existing methods try to tackle this challenge and generate consistent content in various ways. However, they either depend on external restricted data or require expensive tuning of the diffusion model. To address this issue, we propose a novel one-shot tuning paradigm, termed OneActor. It efficiently performs consistent subject generation driven solely by prompts, via a learned semantic guidance that bypasses the laborious backbone tuning. We lead the way in formalizing the objective of consistent subject generation from a clustering perspective, and thus design a cluster-conditioned model. To mitigate the overfitting challenge shared by one-shot tuning pipelines, we augment the tuning with auxiliary samples and devise two inference strategies: semantic interpolation and cluster guidance. These techniques are later verified to significantly enhance the generation quality. Comprehensive experiments show that our method outperforms a variety of baselines with satisfactory subject consistency, superior prompt conformity, and high image quality. Our method is capable of multi-subject generation and compatible with popular diffusion extensions. In addition, we achieve a 4 times faster tuning speed than tuning-based baselines and, if desired, avoid increasing inference time. Furthermore, to the best of our knowledge, we are the first to prove that the semantic space of the diffusion model has the same interpolation property as the latent space does. This property can serve as another promising tool for fine generation control.
Submitted 12 July, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Authors:
Taeckyung Lee,
Sorn Chottananurak,
Taesik Gong,
Sung-Ju Lee
Abstract:
Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data. However, TTA faces challenges of adaptation failures due to its reliance on blind adaptation to unknown test samples in dynamic scenarios. Traditional methods for out-of-distribution performance estimation are limited by unrealistic assumptions in the TTA context, such as requiring labeled data or re-training models. To address this issue, we propose AETTA, a label-free accuracy estimation algorithm for TTA. We propose the prediction disagreement as the accuracy estimate, calculated by comparing the target model prediction with dropout inferences. We then improve the prediction disagreement to extend the applicability of AETTA under adaptation failures. Our extensive evaluation with four baselines and six TTA methods demonstrates that AETTA shows an average of 19.8%p more accurate estimation compared with the baselines. We further demonstrate the effectiveness of accuracy estimation with a model recovery case study, showcasing the practicality of our model recovery based on accuracy estimation. The source code is available at https://github.com/taeckyung/AETTA.
Submitted 1 April, 2024;
originally announced April 2024.
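The prediction-disagreement idea described in the AETTA abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name, the input layout, and the choice to report the mean agreement rate between the target model's predictions and the dropout-inference predictions as the accuracy estimate are all assumptions made for illustration. The improved estimator for adaptation failures mentioned in the abstract is not reproduced here.

```python
from typing import Sequence


def disagreement_accuracy_estimate(
    target_preds: Sequence[int],
    dropout_preds: Sequence[Sequence[int]],
) -> float:
    """Estimate accuracy from agreement with dropout inferences.

    target_preds: predicted class index per test sample (no labels needed).
    dropout_preds: one sequence of predicted class indices per dropout
        forward pass, each aligned with target_preds.

    Hypothetical helper: returns the fraction of (sample, dropout pass)
    pairs on which the dropout prediction matches the target prediction.
    """
    if not target_preds or not dropout_preds:
        raise ValueError("need at least one sample and one dropout pass")

    agree = 0
    total = 0
    for run in dropout_preds:
        if len(run) != len(target_preds):
            raise ValueError("each dropout pass must cover every sample")
        for t, d in zip(target_preds, run):
            agree += int(t == d)
            total += 1
    return agree / total
```

For example, with four test samples and two dropout passes that each disagree with the target model on one sample, the sketch returns 0.75.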
-
An AI-Native Runtime for Multi-Wearable Environments
Authors:
Chulhong Min,
Utku Günay Acer,
SiYoung Jang,
Sangwon Choi,
Diana A. Vasile,
Taesik Gong,
Juheon Yi,
Fahim Kawsar
Abstract:
The miniaturization of AI accelerators is paving the way for next-generation wearable applications. We introduce Mojito, an AI-native runtime with advanced MLOps designed to facilitate the development and deployment of these applications on wearable devices. Mojito emphasizes the necessity of dynamic orchestration of distributed resources equipped with ultra-low-power AI accelerators to overcome challenges associated with unpredictable runtime environments. Through its innovative approaches, Mojito demonstrates how future wearable technologies can evolve to be more autonomous.
Submitted 26 March, 2024;
originally announced March 2024.
-
Near-Field Channel Modeling for Holographic MIMO Communications
Authors:
Tierui Gong,
Li Wei,
Chongwen Huang,
George C. Alexandropoulos,
Mérouane Debbah,
Chau Yuen
Abstract:
Empowered by the latest progress on innovative metamaterials/metasurfaces and advanced antenna technologies, holographic multiple-input multiple-output (H-MIMO) emerges as a promising technology to fulfill the extreme goals of the sixth-generation (6G) wireless networks. The antenna arrays utilized in H-MIMO comprise massive (possibly extremely large) numbers of antenna elements, spaced less than half a wavelength apart and integrated into a compact space, realizing an almost continuous aperture. Thanks to the expected low cost, size, weight, and power consumption, such apertures are expected to be largely fabricated for near-field communications. In addition, the physical features of H-MIMO enable manipulations directly on the electromagnetic (EM) wave domain and spatial multiplexing. To fully leverage this potential, near-field H-MIMO channel modeling, especially from the EM perspective, is of paramount significance. In this article, we overview near-field H-MIMO channel models, elaborating on the various modeling categories and respective features, as well as their challenges and evaluation criteria. We also present EM-domain channel models that address the inherent computational and measurement complexities. Finally, the article concludes with a set of future research directions on the topic.
Submitted 16 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Casimir repulsion with biased semiconductors
Authors:
Benjamin Spreng,
Calum Shelden,
Tao Gong,
Jeremy N. Munday
Abstract:
Quantum and thermal fluctuations are fundamental to a plethora of phenomena within quantum optics, including the Casimir effect that acts between closely separated surfaces typically found in MEMS and NEMS devices. Particularly promising for engineering and harnessing these forces are systems out of thermal equilibrium. Recently, semiconductors with external bias have been proposed to study the nonequilibrium Casimir force. Here, we explore systems involving moderately biased semiconductors that exhibit strong repulsive Casimir forces, and we determine the effects of bias voltage, semiconductor bandgap energy, and separation for experimentally accessible configurations. Modes emitted from the semiconductors exert a repulsive force on a near surface that overcomes the attractive equilibrium Casimir force contribution at submicron distances. For the geometry of two parallel planes, those modes undergo Fabry-Pérot interference resulting in an oscillatory force behavior as a function of separation. Utilizing the proximity-force approximation, we predict that the repulsive force exerted on a gold sphere is well within the accuracy of typical Casimir force experiments. Our work opens up new possibilities of controlling forces at the nano- and micrometer scale with applications in sensing and actuation in nanotechnology.
Submitted 13 March, 2024;
originally announced March 2024.
-
MiM-ISTD: Mamba-in-Mamba for Efficient Infrared Small Target Detection
Authors:
Tianxiang Chen,
Zi Ye,
Zhentao Tan,
Tao Gong,
Yue Wu,
Qi Chu,
Bin Liu,
Nenghai Yu,
Jieping Ye
Abstract:
Recently, infrared small target detection (ISTD) has made significant progress, thanks to the development of basic models. Specifically, the models combining CNNs with transformers can successfully extract both local and global features. However, the disadvantage of the transformer is also inherited, i.e., computational complexity that is quadratic in sequence length. Inspired by Mamba, a recent basic model with linear complexity for long-distance modeling, we explore the potential of this state space model for the ISTD task in terms of effectiveness and efficiency in this paper. However, directly applying Mamba yields suboptimal performance due to the insufficient harnessing of local features, which are imperative for detecting small targets. Instead, we tailor a nested structure, Mamba-in-Mamba (MiM-ISTD), for efficient ISTD. It consists of Outer and Inner Mamba blocks to adeptly capture both global and local features. Specifically, we treat the local patches as "visual sentences" and use the Outer Mamba to explore the global information. We then decompose each visual sentence into sub-patches as "visual words" and use the Inner Mamba to further explore the local information among words in the visual sentence with negligible computational costs. By aggregating the visual word and visual sentence features, our MiM-ISTD can effectively explore both global and local information. Experiments on NUAA-SIRST and IRSTD-1k show the superior accuracy and efficiency of our method. Specifically, MiM-ISTD is $8 \times$ faster than the SOTA method and reduces GPU memory usage by 62.2$\%$ when testing on $2048 \times 2048$ images, overcoming the computation and memory constraints on high-resolution infrared images.
Submitted 24 June, 2024; v1 submitted 4 March, 2024;
originally announced March 2024.
-
OTFS vs OFDM: Which is Superior in Multiuser LEO Satellite Communications
Authors:
Yu Liu,
Ming Chen,
Cunhua Pan,
Tantao Gong,
Jinhong Yuan,
Jiangzhou Wang
Abstract:
Orthogonal time frequency space (OTFS) modulation, a delay-Doppler (DD) domain communication scheme exhibiting strong robustness against Doppler shifts, has the potential to be employed in LEO satellite communications. However, the performance comparison with orthogonal frequency division multiplexing (OFDM) modulation and the resource allocation scheme for multiuser OTFS-based LEO satellite communication systems have rarely been investigated. In this paper, we conduct a performance comparison under various channel conditions between the OTFS and OFDM modulations, encompassing evaluations of sum-rate and bit error ratio (BER). Additionally, we investigate the joint optimal allocation of power and delay-Doppler resource blocks aiming at maximizing sum-rate for multiuser downlink OTFS-based LEO satellite communication systems. Unlike conventional modulations relying on complex input-output relations within the time-frequency (TF) domain, the OTFS modulation exploits both time and frequency diversities, i.e., delay and Doppler shifts remain constant during an OTFS frame, which facilitates a simple DD domain input-output relation for our investigation. We transform the resulting non-convex and combinatorial optimization problem into an equivalent difference of convex problem by decoupling the conditional constraints, and solve the transformed problem via the penalty convex-concave procedure algorithm. Simulation results demonstrate that the OTFS modulation is robust to carrier frequency offsets (CFO) caused by the high mobility of LEO satellites, and has superior performance to the OFDM modulation. Moreover, numerical results indicate that our proposed resource allocation scheme has a higher sum-rate than existing schemes for the OTFS modulation, such as delay divided multiple access and Doppler divided multiple access, especially in the high signal-to-noise ratio (SNR) regime.
Submitted 4 March, 2024;
originally announced March 2024.
-
Bootstrapping Audio-Visual Segmentation by Strengthening Audio Cues
Authors:
Tianxiang Chen,
Zhentao Tan,
Tao Gong,
Qi Chu,
Yue Wu,
Bin Liu,
Le Lu,
Jieping Ye,
Nenghai Yu
Abstract:
How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual segmentation (AVS) task has been proposed, aiming to segment the sounding objects in video frames under the guidance of audio cues. However, most existing AVS methods are hindered by a modality imbalance where the visual features tend to dominate those of the audio modality, due to a unidirectional and insufficient integration of audio cues. This imbalance skews the feature representation towards the visual aspect, impeding the learning of joint audio-visual representations and potentially causing segmentation inaccuracies. To address this issue, we propose AVSAC. Our approach features a Bidirectional Audio-Visual Decoder (BAVD) with integrated bidirectional bridges, enhancing audio cues and fostering continuous interplay between audio and visual modalities. This bidirectional interaction narrows the modality imbalance, facilitating more effective learning of integrated audio-visual representations. Additionally, we present a strategy for audio-visual frame-wise synchrony as fine-grained guidance of BAVD. This strategy enhances the share of auditory components in visual features, contributing to a more balanced audio-visual representation learning. Extensive experiments show that our method attains new benchmarks in AVS performance.
Submitted 6 February, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Synergy: Towards On-Body AI via Tiny AI Accelerator Collaboration on Wearables
Authors:
Taesik Gong,
Si Young Jang,
Utku Günay Acer,
Fahim Kawsar,
Chulhong Min
Abstract:
The advent of tiny artificial intelligence (AI) accelerators enables AI to run at the extreme edge, offering reduced latency, lower power cost, and improved privacy. When integrated into wearable devices, these accelerators open exciting opportunities, allowing various AI apps to run directly on the body. We present Synergy that provides AI apps with best-effort performance via system-driven holistic collaboration over AI accelerator-equipped wearables. To achieve this, Synergy provides device-agnostic programming interfaces to AI apps, giving the system visibility and controllability over the app's resource use. Then, Synergy maximizes the inference throughput of concurrent AI models by creating various execution plans for each app considering AI accelerator availability and intelligently selecting the best set of execution plans. Synergy further improves throughput by leveraging parallelization opportunities over multiple computation units. Our evaluations with 7 baselines and 8 models demonstrate that, on average, Synergy achieves a 23.0 times improvement in throughput, while reducing latency by 73.9% and power consumption by 15.8%, compared to the baselines.
Submitted 2 July, 2024; v1 submitted 11 December, 2023;
originally announced January 2024.
-
Tunable terahertz photodetector using ferroelectric-integrated graphene plasmonics for portable spectrometer
Authors:
Lin Lin,
Junxiong Guo,
Shangdong Li,
Tianxun Gong,
Juan Xia,
Zenghui Wang,
Jun Tang,
Yang Zhang,
Jinxing Zhang,
Yuan Lin,
Wen Huang,
Xiaosheng Zhang
Abstract:
Terahertz (THz) detectors have great potential for use in imaging, spectroscopy, and communications due to the fascinating interactions between THz radiation and matter. However, current THz detection devices are limited in sensitivity and operating frequency range and have bulky footprints. While recent ferroelectric-integrated graphene plasmonic devices show promise in overcoming these limitations, they have not yet been extended to the THz range. Here, we propose a wavelength-sensitive terahertz detector that uses single-layer graphene integrated onto a ferroelectric thin film with patterned polarization domains. This device works at room temperature, with high responsivity and detectivity, by coupling graphene plasmons with THz frequencies through spatial modulation of carrier behaviors using ferroelectric polarization, without requiring additional local electrodes. By reconfiguring an interweaving squared ferroelectric domain array with alternating upward and downward polarizations to highly confine graphene surface plasmon polaritons, our device achieves an ultrahigh responsivity of 1717 A W$^{-1}$ and a normalized detectivity of $1.07 \times 10^{13}$ Jones at a resonance frequency of 6.30 THz and a 0.3 V bias voltage. We also show that, combined with mathematical algorithms, the device enables spectrum reconstruction for portable spectrometer applications.
Submitted 11 January, 2024;
originally announced January 2024.
-
High-resolution myelin-water fraction and quantitative relaxation mapping using 3D ViSTa-MR fingerprinting
Authors:
Congyu Liao,
Xiaozhi Cao,
Siddharth Srinivasan Iyer,
Sophie Schauman,
Zihan Zhou,
Xiaoqian Yan,
Quan Chen,
Zhitao Li,
Nan Wang,
Ting Gong,
Zhe Wu,
Hongjian He,
Jianhui Zhong,
Yang Yang,
Adam Kerr,
Kalanit Grill-Spector,
Kawin Setsompop
Abstract:
Purpose: This study aims to develop a high-resolution whole-brain multi-parametric quantitative MRI approach for simultaneous mapping of myelin-water fraction (MWF), T1, T2, and proton-density (PD), all within a clinically feasible scan time.
Methods: We developed 3D ViSTa-MRF, which combines the Visualization of Short Transverse relaxation time component (ViSTa) technique with MR Fingerprinting (MRF), to achieve high-fidelity whole-brain MWF and T1/T2/PD mapping on a clinical 3T scanner. To achieve fast acquisition and memory-efficient reconstruction, the ViSTa-MRF sequence leverages an optimized 3D tiny-golden-angle-shuffling spiral-projection acquisition and joint spatial-temporal subspace reconstruction with an optimized preconditioning algorithm. With the proposed ViSTa-MRF approach, high-fidelity direct MWF mapping was achieved without the need for multi-compartment fitting, which could introduce bias and/or noise from additional assumptions or priors.
Results: The in-vivo results demonstrate the effectiveness of the proposed acquisition and reconstruction framework to provide fast multi-parametric mapping with high SNR and good quality. The in-vivo results of 1mm- and 0.66mm-iso datasets indicate that the MWF values measured by the proposed method are consistent with standard ViSTa results that are 30x slower with lower SNR. Furthermore, we applied the proposed method to enable 5-minute whole-brain 1mm-iso assessment of MWF and T1/T2/PD mappings for infant brain development and for post-mortem brain samples.
Conclusions: In this work, we have developed a 3D ViSTa-MRF technique that enables the acquisition of whole-brain MWF, quantitative T1, T2, and PD maps at 1mm and 0.66mm isotropic resolution in 5 and 15 minutes, respectively. This advancement allows for quantitative investigations of myelination changes in the brain.
Submitted 20 December, 2023;
originally announced December 2023.
-
Towards More Unified In-context Visual Understanding
Authors:
Dianmo Sheng,
Dongdong Chen,
Zhentao Tan,
Qiankun Liu,
Qi Chu,
Jianmin Bao,
Tao Gong,
Bin Liu,
Shengwei Xu,
Nenghai Yu
Abstract:
The rapid advancement of large language models (LLMs) has accelerated the emergence of in-context learning (ICL) as a cutting-edge approach in the natural language processing domain. Recently, ICL has been employed in visual understanding tasks, such as semantic segmentation and image captioning, yielding promising results. However, existing visual ICL frameworks cannot produce content across multiple modalities, which limits their potential usage scenarios. To address this issue, we present a new ICL framework for visual understanding with multi-modal output enabled. First, we quantize and embed both text and visual prompts into a unified representational space, structured as interleaved in-context sequences. Then a decoder-only sparse transformer architecture is employed to perform generative modeling on them, facilitating in-context learning. Thanks to this design, the model is capable of handling in-context vision understanding tasks with multimodal output in a unified pipeline. Experimental results demonstrate that our model achieves competitive performance compared with specialized models and previous ICL baselines. Overall, our research takes a further step toward unified multimodal in-context learning.
Submitted 16 March, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
SoTTA: Robust Test-Time Adaptation on Noisy Data Streams
Authors:
Taesik Gong,
Yewon Kim,
Taeckyung Lee,
Sorn Chottananurak,
Sung-Ju Lee
Abstract:
Test-time adaptation (TTA) aims to address distributional shifts between training and testing data using only unlabeled test data streams for continual model adaptation. However, most TTA methods assume benign test streams, while test samples could be unexpectedly diverse in the wild. For instance, an unseen object or noise could appear in autonomous driving. This leads to a new threat to existing TTA algorithms; we found that prior TTA algorithms suffer from those noisy test samples as they blindly adapt to incoming samples. To address this problem, we present Screening-out Test-Time Adaptation (SoTTA), a novel TTA algorithm that is robust to noisy samples. The key enabler of SoTTA is two-fold: (i) input-wise robustness via high-confidence uniform-class sampling that effectively filters out the impact of noisy samples and (ii) parameter-wise robustness via entropy-sharpness minimization that improves the robustness of model parameters against large gradients from noisy samples. Our evaluation with standard TTA benchmarks with various noisy scenarios shows that our method outperforms state-of-the-art TTA methods under the presence of noisy samples and achieves comparable accuracy to those methods without noisy samples. The source code is available at https://github.com/taeckyung/SoTTA .
Submitted 16 October, 2023;
originally announced October 2023.
-
Quick and Consistent Sparsity Estimation for Streaming Images with Noise
Authors:
Tingnan Gong
Abstract:
Despite fruitful work in image monitoring, there is a lack of data-driven tools to guide practitioners in selecting proper monitoring procedures. The potential model mismatch caused by arbitrary selection could make the empirical detection delay deviate from its theoretical analysis and bias the prognosis. In image monitoring, the sparsity of the underlying anomaly is one of the attributes on which the development of many monitoring procedures is heavily based. This paper proposes a computationally friendly sparsity index, the corrected Hoyer index, to estimate the sparsity of the underlying anomaly corrupted by noise. We theoretically prove the consistency of the constructed sparsity index. We use simulations to validate the consistency and demonstrate the robustness against noise. We also provide insights on how to guide real applications with the proposed sparsity index.
Submitted 2 October, 2023;
originally announced October 2023.
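For readers unfamiliar with the index that the abstract above builds on, the standard (uncorrected) Hoyer sparsity measure of a vector x with n entries is (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), which is 1 for a one-hot vector and 0 for a constant vector. The sketch below computes only this standard measure; the noise correction that the paper proposes is not described in the abstract and is not reproduced here, and the function name is an assumption for illustration.

```python
import math
from typing import Sequence


def hoyer_sparsity(x: Sequence[float]) -> float:
    """Standard Hoyer sparsity measure of a flattened image or vector.

    Returns a value in [0, 1]: 1 when a single entry is nonzero,
    0 when all entries have the same magnitude.
    This is the classical index, not the paper's corrected version.
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0.0:
        raise ValueError("index is undefined for the all-zero vector")
    root_n = math.sqrt(n)
    return (root_n - l1 / l2) / (root_n - 1.0)
```

Because additive noise makes every pixel nonzero, applying this classical measure directly to a noisy image tends to understate the sparsity of the underlying anomaly, which is the motivation the abstract gives for a corrected index.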
-
Prototype-guided Cross-modal Completion and Alignment for Incomplete Text-based Person Re-identification
Authors:
Tiantian Gong,
Guodong Du,
Junsheng Wang,
Yongkang Ding,
Liyan Zhang
Abstract:
Traditional text-based person re-identification (ReID) techniques heavily rely on fully matched multi-modal data, which is an ideal scenario. However, because data are inevitably lost or corrupted during the collection and processing of cross-modal data, incomplete data commonly arise in real-world applications. Therefore, we consider a more practical task termed the incomplete text-based ReID task, where person images and text descriptions are not completely matched and contain partially missing modality data. To this end, we propose a novel Prototype-guided Cross-modal Completion and Alignment (PCCA) framework to handle the aforementioned issues for incomplete text-based ReID. Specifically, person images cannot be retrieved directly from a text query when modality data are missing. Therefore, we propose a cross-modal nearest neighbor construction strategy for missing data that computes the cross-modal similarity between existing images and texts, which provides key guidance for the completion of missing modal features. Furthermore, to efficiently complete the missing modal features, we construct relation graphs from the aforementioned cross-modal nearest neighbor sets of missing modal data and the corresponding prototypes, which can further enhance the generated missing modal features. Additionally, for tighter fine-grained alignment between images and texts, we introduce a prototype-aware cross-modal alignment loss that can effectively reduce the modality heterogeneity gap in the common space. Extensive experimental results on several benchmarks with different missing ratios demonstrate that our method consistently outperforms state-of-the-art text-image ReID approaches.
Submitted 2 October, 2023; v1 submitted 29 September, 2023;
originally announced September 2023.
-
LanSER: Language-Model Supported Speech Emotion Recognition
Authors:
Taesik Gong,
Josh Belanich,
Krishna Somandepalli,
Arsha Nagrani,
Brian Eoff,
Brendan Jou
Abstract:
Speech emotion recognition (SER) models typically rely on costly human-labeled data for training, making scaling methods to large speech datasets and nuanced emotion taxonomies difficult. We present LanSER, a method that enables the use of unlabeled data by inferring weak emotion labels via pre-trained large language models through weakly-supervised learning. For inferring weak labels constrained to a taxonomy, we use a textual entailment approach that selects an emotion label with the highest entailment score for a speech transcript extracted via automatic speech recognition. Our experimental results show that models pre-trained on large datasets with this weak supervision outperform other baseline models on standard SER datasets when fine-tuned, and show improved label efficiency. Despite being pre-trained on labels derived only from text, we show that the resulting representations appear to model the prosodic content of speech.
Submitted 7 September, 2023;
originally announced September 2023.
-
A Transmit-Receive Parameter Separable Electromagnetic Channel Model for LoS Holographic MIMO
Authors:
Tierui Gong,
Chongwen Huang,
Jiguang He,
Marco Di Renzo,
Mérouane Debbah,
Chau Yuen
Abstract:
To support the extremely high spectral efficiency and energy efficiency requirements, and emerging applications of future wireless communications, holographic multiple-input multiple-output (H-MIMO) technology is envisioned as one of the most promising enablers. It can potentially bring extra degrees-of-freedom for communications and signal processing, including spatial multiplexing in line-of-sight (LoS) channels and electromagnetic (EM) field processing performed using specialized devices, to attain the fundamental limits of wireless communications. In this context, EM-domain channel modeling is critical to harvest the benefits offered by H-MIMO. Existing EM-domain channel models are built based on the tensor Green function, which require prior knowledge of the global position and/or the relative distances and directions of the transmit/receive antenna elements. Such knowledge may be difficult to acquire in real-world applications due to extensive measurements needed for obtaining this data. To overcome this limitation, we propose a transmit-receive parameter separable channel model methodology in which the EM-domain (or holographic) channel can be simply acquired from the distance/direction measured between the center-points between the transmit and receive surfaces, and the local positions between the transmit and receive elements, thus avoiding extensive global parameter measurements. Analysis and numerical results showcase the effectiveness of the proposed channel modeling approach in approximating the H-MIMO channel, and achieving the theoretical channel capacity.
Submitted 29 August, 2023; v1 submitted 28 August, 2023;
originally announced August 2023.
-
AsdKB: A Chinese Knowledge Base for the Early Screening and Diagnosis of Autism Spectrum Disorder
Authors:
Tianxing Wu,
Xudong Cao,
Yipeng Zhu,
Feiyue Wu,
Tianling Gong,
Yuxiang Wang,
Shenqi Jing
Abstract:
To easily obtain the knowledge about autism spectrum disorder and help its early screening and diagnosis, we create AsdKB, a Chinese knowledge base on autism spectrum disorder. The knowledge base is built on top of various sources, including 1) the disease knowledge from SNOMED CT and ICD-10 clinical descriptions on mental and behavioural disorders, 2) the diagnostic knowledge from DSM-5 and different screening tools recommended by social organizations and medical institutes, and 3) the expert knowledge on professional physicians and hospitals from the Web. AsdKB contains both ontological and factual knowledge, and is accessible as Linked Data at https://w3id.org/asdkb/. The potential applications of AsdKB are question answering, auxiliary diagnosis, and expert recommendation, and we illustrate them with a prototype which can be accessed at http://asdkb.org.cn/.
Submitted 2 August, 2023; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Experimental study of a cryogenic power supply for superconducting DC devices
Authors:
Lauro Ferreira,
Yasmine Baazizi,
Simon Meunier,
Tanguy Phulpin,
Richard Beljio,
Frédéric Trillaud,
Tian-Yong Gong,
Gustavo Henn,
Loïc Quéval
Abstract:
Although a superconductor has no DC losses, a superconducting system does have significant losses, especially when it comes to power supply. Here, we study two different power supply systems. The first, a conventional one, consists of a transformer and a diode bridge operating at room temperature, plus current leads that allow the current to flow from the room-temperature medium to the cryogenic medium. The second consists of a transformer with a superconducting secondary winding, combined with a diode bridge operating at cryogenic temperature, thus dispensing with the need for current leads. We experimentally compare the performance of conventional and superconducting transformers, as well as the performance of a diode bridge at ambient and cryogenic temperatures. Our results indicate that the prototype superconducting transformer developed has lower winding resistance and secondary leakage inductance than the conventional transformer. In addition, we found that only certain diodes are suitable for operation at cryogenic temperatures. Finally, the diode bridge made from adapted diodes shows reduced losses at cryogenic temperature. This experimental work is the first step towards the realization of a complete power supply system for a superconducting device.
Submitted 20 July, 2023;
originally announced July 2023.
-
Patch-CNN: Training data-efficient deep learning for high-fidelity diffusion tensor estimation from minimal diffusion protocols
Authors:
Tobias Goodwin-Allcock,
Ting Gong,
Robert Gray,
Parashkev Nachev,
Hui Zhang
Abstract:
We propose a new method, Patch-CNN, for diffusion tensor (DT) estimation from only six-direction diffusion weighted images (DWI). Deep learning-based methods have been recently proposed for dMRI parameter estimation, using either voxel-wise fully-connected neural networks (FCN) or image-wise convolutional neural networks (CNN). In the acute clinical context -- where pressure of time limits the number of imaged directions to a minimum -- existing approaches either require an infeasible number of training image volumes (image-wise CNNs), or do not estimate the fibre orientations (voxel-wise FCNs) required for tractogram estimation. To overcome these limitations, we propose Patch-CNN, a neural network with a minimal (non-voxel-wise) convolutional kernel (3$\times$3$\times$3). Compared with voxel-wise FCNs, this has the advantage of allowing the network to leverage local anatomical information. Compared with image-wise CNNs, the minimal kernel vastly reduces training data demand. Evaluated against both conventional model fitting and a voxel-wise FCN, Patch-CNN, trained with a single subject, is shown to improve the estimation of both scalar dMRI parameters and fibre orientation from six-direction DWIs. The improved fibre orientation estimation is shown to produce improved tractograms.
Submitted 3 July, 2023;
originally announced July 2023.
-
CloudBrain-MRS: An Intelligent Cloud Computing Platform for in vivo Magnetic Resonance Spectroscopy Preprocessing, Quantification, and Analysis
Authors:
Xiaodie Chen,
Jiayu Li,
Dicheng Chen,
Yirong Zhou,
Zhangren Tu,
Meijin Lin,
Taishan Kang,
Jianzhong Lin,
Tao Gong,
Liuhong Zhu,
Jianjun Zhou,
Lin Ou-yang,
Jiefeng Guo,
Jiyang Dong,
Di Guo,
Xiaobo Qu
Abstract:
Magnetic resonance spectroscopy (MRS) is an important clinical imaging method for diagnosis of diseases. The MRS spectrum is used to observe the signal intensity of metabolites or further infer their concentrations. Although magnetic resonance vendors commonly provide basic functions for spectra plots and metabolite quantification, widespread clinical research with MRS is still limited by the lack of easy-to-use processing software or platforms. To address this issue, we have developed CloudBrain-MRS, a cloud-based online platform that provides powerful hardware and advanced algorithms. The platform can be accessed simply through a web browser, without any program installation on the user side. CloudBrain-MRS also integrates the classic LCModel and advanced artificial intelligence algorithms and supports batch preprocessing, quantification, and analysis of MRS data from different vendors. Additionally, the platform offers useful functions: 1) Automatic statistical analysis to find biomarkers for diseases; 2) Consistency verification between the classic and artificial intelligence quantification algorithms; 3) Colorful three-dimensional visualization for easy observation of individual metabolite spectra. Finally, both healthy and mild cognitive impairment patient data are used to demonstrate the functions of the platform. To the best of our knowledge, this is the first cloud computing platform for in vivo MRS with artificial intelligence processing. We have shared our cloud platform at MRSHub, providing free access and service for two years. Please visit https://mrshub.org/software_all/#CloudBrain-MRS or https://csrc.xmu.edu.cn/CloudBrain.html.
Submitted 6 September, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
Front-running Attack in Sharded Blockchains and Fair Cross-shard Consensus
Authors:
Jianting Zhang,
Wuhui Chen,
Sifu Luo,
Tiantian Gong,
Zicong Hong,
Aniket Kate
Abstract:
Sharding is a prominent technique for scaling blockchains. By dividing the network into smaller components known as shards, a sharded blockchain can process transactions in parallel without introducing inconsistencies through the coordination of intra-shard and cross-shard consensus protocols. However, we observe a critical security issue with sharded systems: transaction ordering manipulations can occur when coordinating intra-shard and cross-shard consensus protocols, leaving the system vulnerable to attack. Specifically, we identify a novel security issue known as finalization fairness, which can be exploited through a front-running attack. This attack allows an attacker to manipulate the execution order of transactions, even if the victim's transaction has already been processed and added to the blockchain by a fair intra-shard consensus.
To address the issue, we offer Haechi, a novel cross-shard protocol that is immune to front-running attacks. Haechi introduces an ordering phase between transaction processing and execution, ensuring that the execution order of transactions is the same as the processing order and achieving finalization fairness. To accommodate different consensus speeds among shards, Haechi incorporates a finalization fairness algorithm to achieve a globally fair order with minimal performance loss. By providing a global order, Haechi ensures strong consistency among shards, enabling better parallelism in handling conflicting transactions across shards. These features make Haechi a promising solution for supporting popular smart contracts in the real world. To evaluate Haechi's performance, we implemented the protocol using Tendermint and conducted extensive experiments on a geo-distributed AWS environment. Our results demonstrate that Haechi achieves finalization fairness with little performance sacrifice compared to existing cross-shard consensus protocols.
Submitted 8 September, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
Multi-spectral Class Center Network for Face Manipulation Detection and Localization
Authors:
Changtao Miao,
Qi Chu,
Zhentao Tan,
Zhenchao Jin,
Tao Gong,
Wanyi Zhuang,
Yue Wu,
Bin Liu,
Honggang Hu,
Nenghai Yu
Abstract:
As deepfake content proliferates online, advancing face manipulation forensics has become crucial. To combat this emerging threat, previous methods mainly focus on studying how to distinguish authentic and manipulated face images. Although impressive, image-level classification lacks explainability and is limited to specific application scenarios, spurring recent research on pixel-level prediction for face manipulation forensics. However, existing forgery localization methods suffer from exploring frequency-based forgery traces in the localization network. In this paper, we observe that multi-frequency spectrum information is effective for identifying tampered regions. To this end, a novel Multi-Spectral Class Center Network (MSCCNet) is proposed for face manipulation detection and localization. Specifically, we design a Multi-Spectral Class Center (MSCC) module to learn more generalizable and multi-frequency features. Based on the features of different frequency bands, the MSCC module collects multi-spectral class centers and computes pixel-to-class relations. Applying multi-spectral class-level representations suppresses the semantic information of the visual concepts which is insensitive to manipulated regions of forgery images. Furthermore, we propose a Multi-level Features Aggregation (MFA) module to employ more low-level forgery artifacts and structural textures. Meanwhile, we conduct a comprehensive localization benchmark based on pixel-level FF++ and Dolos datasets. Experimental results quantitatively and qualitatively demonstrate the effectiveness and superiority of the proposed MSCCNet. We expect this work to inspire more studies on pixel-level face manipulation localization. The codes are available (https://github.com/miaoct/MSCCNet).
Submitted 13 July, 2024; v1 submitted 18 May, 2023;
originally announced May 2023.
-
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
Authors:
Tao Gong,
Chengqi Lyu,
Shilong Zhang,
Yudong Wang,
Miao Zheng,
Qian Zhao,
Kuikun Liu,
Wenwei Zhang,
Ping Luo,
Kai Chen
Abstract:
We present a vision and language model named MultiModal-GPT to conduct multi-round dialogue with humans. MultiModal-GPT can follow various instructions from humans, such as generating a detailed caption, counting the number of objects of interest, and answering general questions from users. MultiModal-GPT is parameter-efficiently fine-tuned from OpenFlamingo, with Low-rank Adapter (LoRA) added both in the cross-attention part and the self-attention part of the language model. We first construct instruction templates with vision and language data for multi-modality instruction tuning to make the model understand and follow human instructions. We find that the quality of training data is vital for dialogue performance: even a few samples containing short answers can lead the model to give short responses to any instruction. To further enhance MultiModal-GPT's ability to chat with humans, we utilize language-only instruction-following data to train MultiModal-GPT jointly. The joint training of language-only and visual-language instructions with the same instruction template effectively improves dialogue performance. Various demos show MultiModal-GPT's ability to hold continuous dialogue with humans. Code, dataset, and demo are at https://github.com/open-mmlab/Multimodal-GPT
Submitted 13 June, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
Understanding the Generalization Ability of Deep Learning Algorithms: A Kernelized Renyi's Entropy Perspective
Authors:
Yuxin Dong,
Tieliang Gong,
Hong Chen,
Chen Li
Abstract:
Recently, information theoretic analysis has become a popular framework for understanding the generalization behavior of deep neural networks. It allows a direct analysis for stochastic gradient/Langevin descent (SGD/SGLD) learning algorithms without strong assumptions such as Lipschitz or convexity conditions. However, the current generalization error bounds within this framework are still far from optimal, while substantial improvements on these bounds are quite challenging due to the intractability of high-dimensional information quantities. To address this issue, we first propose a novel information theoretical measure: kernelized Renyi's entropy, by utilizing operator representation in Hilbert space. It inherits the properties of Shannon's entropy and can be effectively calculated via simple random sampling, while remaining independent of the input dimension. We then establish the generalization error bounds for SGD/SGLD under kernelized Renyi's entropy, where the mutual information quantities can be directly calculated, enabling evaluation of the tightness of each intermediate step. We show that our information-theoretical bounds depend on the statistics of the stochastic gradients evaluated along with the iterates, and are rigorously tighter than the current state-of-the-art (SOTA) results. The theoretical findings are also supported by large-scale empirical studies.
Submitted 1 May, 2023;
originally announced May 2023.
-
Holographic thermodynamics of rotating black holes
Authors:
Ting-Feng Gong,
Jie Jiang,
Ming Zhang
Abstract:
We provide mass/energy formulas for the extended thermodynamics, mixed thermodynamics, and holographic conformal field theory (CFT) thermodynamics for the charged and rotating Kerr-Newman Anti-de Sitter black holes. Then for the CFT thermal states dual to the black hole, we find the first-order phase transitions and criticality phenomena in the canonical ensemble with fixed angular momentum, volume, and central charge. We observe that the CFT states cannot be analogous to the Van der Waals fluids, despite the critical exponents falling into the universality class predicted by the mean field theory. Additionally, we examine the (de)confinement phase transitions within the grand canonical ensemble with fixed angular velocity, volume, and central charge of the CFT. Our findings suggest that the near zero temperature (de)confinement phase transitions can occur with the angular velocity of the CFT that solely depends on the CFT volume.
Submitted 24 June, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
-
Holographic MIMO Communications with Arbitrary Surface Placements: Near-Field LoS Channel Model and Capacity Limit
Authors:
Tierui Gong,
Li Wei,
Chongwen Huang,
Zhijia Yang,
Jiguang He,
Mérouane Debbah,
Chau Yuen
Abstract:
Envisioned as one of the most promising technologies, holographic multiple-input multiple-output (H-MIMO) has recently attracted notable research interest for its great potential in expanding wireless possibilities and achieving fundamental wireless limits. Empowered by nearly continuous, large and energy-efficient surfaces with powerful electromagnetic (EM) wave control capabilities, H-MIMO opens up the opportunity for signal processing in a more fundamental EM-domain, paving the way for realizing holographic imaging level communications in supporting the extremely high spectral efficiency and energy efficiency in future networks. In this article, we aim to establish a generalized EM-domain near-field channel model and study the capacity limit of point-to-point H-MIMO systems equipped with arbitrarily placed surfaces in a line-of-sight (LoS) environment. Two effective and computationally efficient channel models are derived from their integral counterpart: one has a more sophisticated formula but is more accurate, and the other is concise at a slight cost in precision. Furthermore, we unveil the capacity limit using our channel model, and derive a tight upper bound based upon an elaborately built analytical framework. Our result reveals that the capacity limit grows logarithmically with the product of transmit element area, receive element area, and the combined effects of $1/{{d}_{mn}^2}$, $1/{{d}_{mn}^4}$, and $1/{{d}_{mn}^6}$ over all transmit and receive antenna elements, where $d_{mn}$ indicates the distance between each pair of transmit and receive elements. Numerical evaluations validate the effectiveness of our channel models, and showcase the slight disparity between the upper bound and the exact capacity, which is beneficial for predicting practical system performance.
Submitted 29 November, 2023; v1 submitted 11 April, 2023;
originally announced April 2023.
-
STCF Conceptual Design Report: Volume 1 -- Physics & Detector
Authors:
M. Achasov,
X. C. Ai,
R. Aliberti,
L. P. An,
Q. An,
X. Z. Bai,
Y. Bai,
O. Bakina,
A. Barnyakov,
V. Blinov,
V. Bobrovnikov,
D. Bodrov,
A. Bogomyagkov,
A. Bondar,
I. Boyko,
Z. H. Bu,
F. M. Cai,
H. Cai,
J. J. Cao,
Q. H. Cao,
Z. Cao,
Q. Chang,
K. T. Chao,
D. Y. Chen,
H. Chen
, et al. (413 additional authors not shown)
Abstract:
The Super $τ$-Charm facility (STCF) is an electron-positron collider proposed by the Chinese particle physics community. It is designed to operate in a center-of-mass energy range from 2 to 7 GeV with a peak luminosity of $0.5\times 10^{35}{\rm cm}^{-2}{\rm s}^{-1}$ or higher. The STCF will produce a data sample about a factor of 100 larger than that by the present $τ$-Charm factory -- the BEPCII, providing a unique platform for exploring the asymmetry of matter-antimatter (charge-parity violation), in-depth studies of the internal structure of hadrons and the nature of non-perturbative strong interactions, as well as searching for exotic hadrons and physics beyond the Standard Model. The STCF project in China is under development with an extensive R\&D program. This document presents the physics opportunities at the STCF, describes conceptual designs of the STCF detector system, and discusses future plans for detector R\&D and physics case studies.
Submitted 5 October, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
A Generalized Electromagnetic-Domain Channel Modeling for LOS Holographic MIMO with Arbitrary Surface Placements
Authors:
Tierui Gong,
Li Wei,
Zhijia Yang,
Mérouane Debbah,
Chau Yuen
Abstract:
Holographic multiple-input multiple-output (H-MIMO) is considered one of the most promising technologies to enable future wireless communications in supporting the expected extreme requirements, such as high energy and spectral efficiency. Empowered by its powerful capability in electromagnetic (EM) wave manipulation, H-MIMO has the potential to reach the fundamental limit of the wireless environment, and opens up the possibility of signal processing in the EM-domain, which needs to be depicted carefully from an EM perspective, especially the wireless channel. To this end, we study line-of-sight (LOS) H-MIMO communications with arbitrary surface placements and establish an exact expression of the wireless channel in the EM-domain. To further obtain more explicit and computationally efficient channel models, we solve the implicit integrals of the exact channel model with moderate and reasonable assumptions. Numerical studies are executed and the results show good agreement between our approximate channel models and the exact channel model.
Submitted 15 March, 2023;
originally announced March 2023.
-
On the Stability and Generalization of Triplet Learning
Authors:
Jun Chen,
Hong Chen,
Xue Jiang,
Bin Gu,
Weifu Li,
Tieliang Gong,
Feng Zheng
Abstract:
Triplet learning, i.e., learning from triplet data, has attracted much attention in computer vision tasks with an extremely large number of categories, e.g., face recognition and person re-identification. Despite rapid progress in designing and applying triplet learning algorithms, a theoretical understanding of their generalization performance is still lacking. To fill this gap, this paper investigates the generalization guarantees of triplet learning by leveraging stability analysis. Specifically, we establish the first general high-probability generalization bound for triplet learning algorithms satisfying uniform stability, and then obtain excess risk bounds of the order $O(n^{-\frac{1}{2}} \log n)$ for both stochastic gradient descent (SGD) and regularized risk minimization (RRM), where $2n$ is approximately equal to the number of training samples. Moreover, an optimistic generalization bound in expectation as fast as $O(n^{-1})$ is derived for RRM in a low noise case via the on-average stability analysis. Finally, our results are applied to triplet metric learning to characterize its theoretical underpinning.
Submitted 20 February, 2023;
originally announced February 2023.
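For readers unfamiliar with learning from triplet data, the following is a minimal sketch of a per-triplet loss. The abstract does not state which loss the paper analyses; the margin-based hinge form, the squared Euclidean distance, and the name `triplet_hinge_loss` are assumptions chosen for illustration only.

```python
import numpy as np


def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (an assumed, commonly used form).

    Each argument is an array of embeddings with shape (..., dim).
    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (in squared distance).
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, margin + d_pos - d_neg)


# A well-separated triplet incurs no loss; a violating one does.
anchor = np.array([0.0, 0.0])
positive = np.array([0.0, 1.0])
far_negative = np.array([3.0, 0.0])
near_negative = np.array([1.0, 0.0])
loss_far = triplet_hinge_loss(anchor, positive, far_negative)
loss_near = triplet_hinge_loss(anchor, positive, near_negative)
```

Averaging such a loss over all triplets formed from the training set gives the empirical risk whose gap to the population risk a generalization bound controls.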
-
Order but Not Execute in Order
Authors:
Tiantian Gong,
Aniket Kate
Abstract:
This work aims to address the general order manipulation issue in blockchain-based decentralized exchanges (DEX) by exploring the benefits of employing a novel combination of order-fair atomic broadcast (of-ABC) mechanisms for transaction ordering and frequent batch auction (FBA) for breaking the order while executing those transactions. In of-ABC, transactions submitted to a sufficient number of blockchain validators are ordered before or along with later transactions. FBA then executes transactions with a uniform price double auction that prioritizes price instead of transaction order within the same committed batch.
To demonstrate the merits of our order-but-not-execute-in-order design, we compare the welfare loss and liquidity provision in DEX under FBA and its continuous counterpart, the Continuous Limit Order Book (CLOB). Assuming that the exchange is realized over an of-ABC protocol, we find that (1) FBA always achieves better social welfare when no party is privately informed about asset valuations. Even otherwise, FBA incurs less welfare loss compared to CLOB when (2) price takers and public information reflecting asset value changes arrive sufficiently frequently compared to private information, or (3) the priority fees are small, or (4) the market is more balanced on both sides. Our empirical analysis on dYdX transactions indicates an additional $8\%-83\%$ in costs when transactions are executed continuously. Further, our findings also indicate that liquidity provision is better under FBA under similar conditions.
Submitted 27 September, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
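The idea of a uniform price double auction that ignores arrival order within a batch can be sketched as follows. This is a hedged illustration, not the paper's mechanism: the function name `clear_batch`, the volume-maximizing rule, the restriction of candidate prices to submitted limit prices, and the lowest-price tie-break are all assumptions.

```python
def clear_batch(bids, asks):
    """Clear one batch at a single uniform price.

    bids, asks: lists of (limit_price, quantity) tuples.
    Returns (clearing_price, traded_volume). The price is chosen to
    maximize traded volume; ties go to the lowest candidate price
    (an assumed tie-break). Returns (None, 0) if nothing can trade.
    """
    # Candidate prices: every limit price that appears in the batch.
    candidates = sorted({p for p, _ in bids} | {p for p, _ in asks})
    best_price, best_volume = None, 0
    for price in candidates:
        # Buyers willing to pay at least `price`.
        demand = sum(q for p, q in bids if p >= price)
        # Sellers willing to accept at most `price`.
        supply = sum(q for p, q in asks if p <= price)
        volume = min(demand, supply)
        if volume > best_volume:
            best_price, best_volume = price, volume
    return best_price, best_volume


bids = [(10, 5), (9, 5), (8, 5)]
asks = [(7, 4), (9, 4), (11, 4)]
result = clear_batch(bids, asks)
# Reordering orders inside the batch leaves the outcome unchanged.
result_reordered = clear_batch(bids[::-1], asks[::-1])
```

Because the outcome depends only on the set of orders in the batch, reordering transactions inside a committed batch cannot change the clearing price in this sketch, which is the property that separates batch execution from continuous execution.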
-
Machine-learning-informed parameter estimation improves the reliability of spinal cord diffusion MRI
Authors:
Ting Gong,
Francesco Grussu,
Claudia A. M. Gandini Wheeler-Kingshott,
Daniel C Alexander,
Hui Zhang
Abstract:
Purpose: We address the challenge of inaccurate parameter estimation in diffusion MRI when the signal-to-noise ratio (SNR) is very low, as in the spinal cord. The accuracy of conventional maximum-likelihood estimation (MLE) depends highly on initialisation. Unfavourable choices could result in suboptimal parameter estimates. Current methods to address this issue, such as grid search (GS), can increase computation time substantially. Methods: We propose a machine learning (ML) informed MLE approach that combines conventional MLE with ML approaches synergistically. ML-based methods have been developed recently to improve the speed and precision of parameter estimation. However, they can generate high systematic bias in estimated parameters when SNR is low. In the proposed ML-MLE approach, an artificial neural network model is trained to provide sensible initialisation for MLE efficiently, with the final solution determined by MLE, avoiding biases typically affecting pure ML estimations. Results: Using parameter estimation of neurite orientation dispersion and density imaging as an example, simulation and in vivo experiments suggest that the ML-MLE method can reduce outlier estimates from conventional MLE in white matter voxels affected by CSF contamination. It also accelerates computation compared to GS-MLE. Conclusion: The ML-MLE method can improve the reliability of parameter estimation with reduced computation time compared to GS-MLE, making it a practical tool for diffusion datasets with low SNR.
Submitted 28 January, 2023;
originally announced January 2023.
-
Holographic MIMO Communications: Theoretical Foundations, Enabling Technologies, and Future Directions
Authors:
Tierui Gong,
Panagiotis Gavriilidis,
Ran Ji,
Chongwen Huang,
George C. Alexandropoulos,
Li Wei,
Zhaoyang Zhang,
Mérouane Debbah,
H. Vincent Poor,
Chau Yuen
Abstract:
Future wireless systems are envisioned to create an endogenously holography-capable, intelligent, and programmable radio propagation environment, that will offer unprecedented capabilities for high spectral and energy efficiency, low latency, and massive connectivity. A potential and promising technology for supporting the expected extreme requirements of the sixth-generation (6G) communication systems is the concept of the holographic multiple-input multiple-output (HMIMO), which will actualize holographic radios with reasonable power consumption and fabrication cost. The HMIMO is facilitated by ultra-thin, extremely large, and nearly continuous surfaces that incorporate reconfigurable and sub-wavelength-spaced antennas and/or metamaterials. Such surfaces comprising dense electromagnetic (EM) excited elements are capable of recording and manipulating impinging fields with utmost flexibility and precision, as well as with reduced cost and power consumption, thereby shaping arbitrary-intended EM waves with high energy efficiency. The powerful EM processing capability of HMIMO opens up the possibility of wireless communications of holographic imaging level, paving the way for signal processing techniques realized in the EM-domain, possibly in conjunction with their digital-domain counterparts. However, in spite of the significant potential, the studies on HMIMO communications are still at an initial stage, its fundamental limits remain to be unveiled, and a certain number of critical technical challenges need to be addressed. In this survey, we present a comprehensive overview of the latest advances in the HMIMO communications paradigm, with a special focus on their physical aspects, their theoretical foundations, as well as the enabling technologies for HMIMO systems. We also compare the HMIMO with existing multi-antenna technologies, especially the massive MIMO, present various...
Submitted 28 August, 2023; v1 submitted 2 December, 2022;
originally announced December 2022.
-
Robust and Fast Measure of Information via Low-rank Representation
Authors:
Yuxin Dong,
Tieliang Gong,
Shujian Yu,
Hong Chen,
Chen Li
Abstract:
The matrix-based Rényi's entropy allows us to directly quantify information measures from given data, without explicit estimation of the underlying probability distribution. This intriguing property makes it widely applied in statistical inference and machine learning tasks. However, this information-theoretic quantity is not robust against noise in the data, and is computationally prohibitive in large-scale applications. To address these issues, we propose a novel measure of information, termed low-rank matrix-based Rényi's entropy, based on low-rank representations of infinitely divisible kernel matrices. The proposed entropy functional inherits the ability of the original definition to directly quantify information from data, but enjoys additional advantages including robustness and efficient computation. Specifically, our low-rank variant is more sensitive to informative perturbations induced by changes in underlying distributions, while being insensitive to uninformative ones caused by noise. Moreover, low-rank Rényi's entropy can be efficiently approximated by random projection and Lanczos iteration techniques, reducing the overall complexity from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2 s)$ or even $\mathcal{O}(ns^2)$, where $n$ is the number of data samples and $s \ll n$. We conduct large-scale experiments to evaluate the effectiveness of this new information measure, demonstrating superior results compared to matrix-based Rényi's entropy in terms of both performance and computational efficiency.
Submitted 30 November, 2022;
originally announced November 2022.
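As a rough illustration of the quantity discussed in the abstract above, the sketch below computes the standard matrix-based Rényi entropy, $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \sum_i \lambda_i(A)^\alpha$, from a trace-normalized Gram matrix $A$. The function `low_rank_renyi_entropy` is only an assumed, illustrative low-rank variant (keep the largest `rank` eigenvalues and spread the remaining trace uniformly); the abstract does not give the authors' definition, so it should not be read as their method. The names `gaussian_gram`, `sigma`, and `rank` are hypothetical, and the sketch uses a full eigendecomposition rather than the random projection or Lanczos approximations the abstract mentions.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for samples stored as rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))


def _normalized_eigenvalues(k):
    """Eigenvalues (descending) of the trace-normalized Gram matrix."""
    a = k / np.trace(k)
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)
    return lam[::-1]


def renyi_entropy(k, alpha=2.0):
    """Matrix-based Renyi entropy of order alpha (alpha != 1), in bits."""
    lam = _normalized_eigenvalues(k)
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)


def low_rank_renyi_entropy(k, rank, alpha=2.0):
    """Illustrative low-rank variant (an assumption, not the paper's
    definition): keep the top `rank` eigenvalues and distribute the
    remaining trace uniformly over the other n - rank positions."""
    lam = _normalized_eigenvalues(k)
    n = lam.size
    top = lam[:rank]
    if rank < n:
        rest = max(1.0 - top.sum(), 0.0)
        tail = np.full(n - rank, rest / (n - rank))
    else:
        tail = np.empty(0)
    lam_lr = np.concatenate([top, tail])
    return np.log2(np.sum(lam_lr ** alpha)) / (1.0 - alpha)
```

Under these assumptions, the smoothed tail no longer depends on how the small eigenvalues are individually arranged, which is one simple way to see why a low-rank measure might be less sensitive to small perturbations of the data.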
-
A Framework for Mutual Information-based MIMO Integrated Sensing and Communication Beamforming Design
Authors:
Jin Li,
Gui Zhou,
Tantao Gong,
Nan Liu
Abstract:
Integrated sensing and communication (ISAC) unifies sensing and communication, and improves the efficiency of the spectrum, energy, and hardware. In this work, we investigate the ISAC beamforming design to maximize the mutual information between the target response matrix of a point radar target and the echo signals, while ensuring the data rate requirements of the communication users. In order to study the impact of the echo interference caused by communication users on sensing performance, we study two scenarios: a single communication user and multiple communication users. For the case of a single communication user, we consider three types of echo interference: no interference, point interference, and extended interference. For the case of multiple communication users, the interference is also an extended one, and furthermore, each user's communication rate requirement needs to be satisfied. To find the optimal beamforming design in these problems, we provide a closed-form solution with low complexity, a semidefinite relaxation (SDR) method, a low-complexity algorithm based on the Majorization-Minimization (MM) method and the successive convex approximation (SCA) method, and an algorithm based on the MM and SCA methods, respectively. Numerical results demonstrate that, compared to the ISAC beamforming schemes based on the Cramér-Rao bound (CRB) metric and the beampattern metric, the proposed mutual information metric achieves a better beampattern and root mean square error (RMSE) of angle estimation. Furthermore, our proposed schemes designed based on the mutual information metric can effectively suppress the echo interference from the communication users.
Submitted 11 January, 2023; v1 submitted 14 November, 2022;
originally announced November 2022.