Search | arXiv e-print repository

$r$-Minimal Codes with Respect to Rank Metric

Authors: Yang Xu, Haibin Kan, Guangyue Han

Abstract: In this paper, we propose and study $r$-minimal codes, a natural extension of minimal codes which have been extensively studied with respect to Hamming metric, rank metric and sum-rank metric. We first propose $r$-minimal codes in a general setting where the ambient space is a finite dimensional left module over a division ring and is supported on a lattice. We characterize minimal subcodes and $r$-minimal codes, derive a general Singleton bound, and give existence results for $r$-minimal codes by using combinatorial arguments. We then consider $r$-minimal rank metric codes over a field extension $\mathbb{E}/\mathbb{F}$ of degree $m$, where $\mathbb{E}$ can be infinite. We characterize these codes in terms of cutting $r$-blocking sets, generalized rank weights of the codes and those of the dual codes, and classify codes whose $r$-dimensional subcodes have constant rank support weight. Next, with the help of the evasiveness property of cutting $r$-blocking sets and some upper bounds for the dimensions of evasive subspaces, we derive several lower and upper bounds for the minimal length of $r$-minimal codes. Furthermore, when $\mathbb{E}$ is finite, we establish a general upper bound which generalizes and improves the counterpart for minimal codes in the literature. As a corollary, we show that if $m=3$, then for any $k\geqslant2$, the minimal length of $k$-dimensional minimal codes is equal to $2k$. To the best of our knowledge, when $m\geqslant3$, there was no known explicit formula for the minimal length of $k$-dimensional minimal codes for arbitrary $k$ in the literature.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2407.16574 [pdf, other]

TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback

Authors: Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, Daejin Jo, Kyoung-Woon On, Mark A. Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo

Abstract: Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human essence. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.

Submitted 23 July, 2024; originally announced July 2024.

Comments: ACL2024 Findings

arXiv:2407.11291 [pdf, ps, other]

Normal forms of elements in the Weyl algebra and Dixmier Conjecture

Authors: Gang Han, Zhennan Pan, Yulin Chen

Abstract: A result of A. Joseph says that any nilpotent or semisimple element $z$ in the Weyl algebra $A_1$ over some algebraically closed field $K$ of characteristic 0 has a normal form up to the action of the automorphism group of $A_1$. It is shown in this note that the normal form corresponds to some unique pair of integers $(k,n)$ with $k\ge n\ge 0$, and will be called the Joseph norm form of $z$. Similar results for the symplectic Poisson algebra $S_1$ are obtained. The Dixmier conjecture can be reformulated as follows: For any nilpotent element $z\in A_1$ whose Joseph norm corresponds to $(k,n)$ with $k>n\ge 1$, there exists no $w\in A_1$ with $[z,w]=1$. It is known to hold true if $k$ and $n$ are coprime. In this note we show that the assertion also holds if $k$ or $n$ is prime. Analogous results for the Jacobian conjecture for $K[X,Y]$ are obtained.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.04064 [pdf, other]

Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e., causal representation disentanglement, which can identify the causal and non-causal factors in representations. After that, we only pass causal factors for subsequent policy learning and thus explicitly eliminate the influence of non-causal factors, which effectively improves the generalization ability of DRL models. Experimental results show that our proposed method can achieve robust navigation performance and effective collision avoidance especially in unseen scenarios, which significantly outperforms existing SOTA algorithms.

Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.04056 [pdf, other]

Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in performance degradation in unseen environments. To address this issue, we investigate the cause of weak generalization ability in DRL and propose a novel causal feature selection module. This module can be integrated into the policy network and effectively filters out non-causal factors in representations, thereby reducing the influence of spurious correlations between non-causal factors and action predictions. Experimental results demonstrate that our proposed method can achieve robust navigation performance and effective collision avoidance especially in scenarios with unseen backgrounds and obstacles, which significantly outperforms existing state-of-the-art algorithms.

Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03231 [pdf]

doi 10.1021/acs.nanolett.4c01536

Dimensionality Engineering of Magnetic Anisotropy from Anomalous Hall Effect in Synthetic SrRuO3 Crystals

Authors: Seung Gyo Jeong, Seong Won Cho, Sehwan Song, Jin Young Oh, Do Gyeom Jeong, Gyeongtak Han, Hu Young Jeong, Ahmed Yousef Mohamed, Woo-suk Noh, Sungkyun Park, Jong Seok Lee, Suyoun Lee, Young-Min Kim, Deok-Yong Cho, Woo Seok Choi

Abstract: Magnetic anisotropy in atomically thin correlated heterostructures is essential for exploring quantum magnetic phases for next-generation spintronics. Whereas previous studies have mostly focused on van der Waals systems, here, we investigate the impact of dimensionality of epitaxially-grown correlated oxides down to the monolayer limit on structural, magnetic, and orbital anisotropies. By designing oxide superlattices with a correlated ferromagnetic SrRuO3 and nonmagnetic SrTiO3 layers, we observed modulated ferromagnetic behavior with the change of the SrRuO3 thickness. Especially, for three-unit-cell-thick layers, we observe a significant 1,500% improvement of coercive field in the anomalous Hall effect, which cannot be solely attributed to the dimensional crossover in ferromagnetism. The atomic-scale heterostructures further reveal the systematic modulation of anisotropy for the lattice structure and orbital hybridization, explaining the enhanced magnetic anisotropy. Our findings provide valuable insights into engineering the anisotropic hybridization of synthetic magnetic crystals, offering a tunable spin order for various applications.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 23 pages

Journal ref: published 2024

arXiv:2406.02013 [pdf, other]

Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

Authors: Jiahang Cao, Qiang Zhang, Ziqing Wang, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu

Abstract: Sequential modeling has demonstrated remarkable capabilities in offline reinforcement learning (RL), with Decision Transformer (DT) being one of the most notable representatives, achieving significant success. However, RL trajectories possess unique properties to be distinguished from the conventional sequence (e.g., text or audio): (1) local correlation, where the next states in RL are theoretically determined solely by current states and actions based on the Markov Decision Process (MDP), and (2) global correlation, where each step's features are related to long-term historical information due to the time-continuous nature of trajectories. In this paper, we propose a novel action sequence predictor, named Mamba Decision Maker (MambaDM), where Mamba is expected to be a promising alternative for sequence modeling paradigms, owing to its efficient modeling of multi-scale dependencies. In particular, we introduce a novel mixer module that proficiently extracts and integrates both global and local features of the input sequence, effectively capturing interrelationships in RL datasets. Extensive experiments demonstrate that MambaDM achieves state-of-the-art performance in Atari and OpenAI Gym datasets. Furthermore, we empirically investigate the scaling laws of MambaDM, finding that increasing model size does not bring performance improvement, but scaling the dataset amount by 2x for MambaDM can obtain up to 33.7% score improvement on Atari dataset. This paper delves into the sequence modeling capabilities of MambaDM in the RL domain, paving the way for future advancements in robust and efficient decision-making systems. Our code will be available at https://github.com/AndyCao1125/MambaDM.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures

arXiv:2405.18405 [pdf, other]

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization

Authors: Jiawei Ma, Yulei Niu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

Abstract: Language has been useful in extending the vision encoder to data from diverse distributions without empirical discovery in training domains. However, as the image description is mostly at a coarse-grained level and ignores visual details, the resulting embeddings are still ineffective in overcoming the complexity of domains at inference time. We present a self-supervision framework WIDIn, Wording Images for Domain-Invariant representation, to disentangle discriminative visual representation, by only leveraging data in a single domain and without any test prior. Specifically, for each image, we first estimate the language embedding with fine-grained alignment, which can be consequently used to adaptively identify and then remove the domain-specific counterpart from the raw visual embedding. WIDIn can be applied to both pretrained vision-language models like CLIP, and separately trained uni-modal models like MoCo and BERT. Experimental studies on three domain generalization datasets demonstrate the effectiveness of our approach.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.07052 [pdf, other]

Length-Aware Multi-Kernel Transformer for Long Document Classification

Authors: Guangzeng Han, Jack Tsao, Xiaolei Huang

Abstract: Lengthy documents pose a unique challenge to neural language models due to substantial memory consumption. While existing state-of-the-art (SOTA) models segment long texts into equal-length snippets (e.g., 128 tokens per snippet) or deploy sparse attention networks, these methods have new challenges of context fragmentation and generalizability due to sentence boundaries and varying text lengths. For example, our empirical analysis has shown that SOTA models consistently overfit one set of lengthy documents (e.g., 2000 tokens) while performing worse on texts with other lengths (e.g., 1000 or 4000). In this study, we propose a Length-Aware Multi-Kernel Transformer (LAMKIT) to address the new challenges for the long document classification. LAMKIT encodes lengthy documents by diverse transformer-based kernels for bridging context boundaries and vectorizes text length by the kernels to promote model robustness over varying document lengths. Experiments on five standard benchmarks from health and law domains show LAMKIT outperforms SOTA models up to an absolute 10.9% improvement. We conduct extensive ablation analyses to examine model robustness and effectiveness over varying document lengths.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Accepted to *SEM 2024

arXiv:2405.06983 [pdf, other]

ISAC-Assisted Wireless Rechargeable Sensor Networks with Multiple Mobile Charging Vehicles

Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Adeel Ahmed

Abstract: As IoT-based wireless sensor networks (WSNs) become more prevalent, the issue of energy shortages becomes more pressing. One potential solution is the use of wireless power transfer (WPT) technology, which is the key to building a new shape of wireless rechargeable sensor networks (WRSNs). However, efficient charging and scheduling are critical for WRSNs to function properly. Motivated by the fact that probabilistic techniques can help enhance the effectiveness of charging scheduling for WRSNs, this article addresses the aforementioned issue and proposes a novel ISAC-assisted WRSN protocol. In particular, our proposed protocol considers several factors to balance the charging load on each mobile charging vehicle (MCV), uses an efficient charging factor strategy to partially charge network devices, and employs the ISAC concept to reduce the traveling cost of each MCV and prevent charging conflicts. Simulation results demonstrate that this protocol outperforms other classic, cutting-edge protocols in multiple areas.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Accepted for publication in the Special Issue Q1'2024, "Integrating Sensing and Communication for Ubiquitous Internet of Things," IEEE Internet of Things Magazine

arXiv:2404.13654 [pdf, other]

Multi-AUV Cooperative Underwater Multi-Target Tracking Based on Dynamic-Switching-enabled Multi-Agent Reinforcement Learning

Authors: Shengbo Wang, Chuan Lin, Guangjie Han, Shengchao Zhu, Zhixian Li, Zhenyu Wang

Abstract: With the rapid development of underwater communication, sensing, automation, and robot technologies, autonomous underwater vehicle (AUV) swarms are gradually becoming popular and have been widely promoted in ocean exploration and underwater tracking or surveillance, etc. However, the complex underwater environment poses significant challenges for AUV swarm-based accurate tracking of underwater moving targets. In this paper, we aim at proposing a multi-AUV cooperative underwater multi-target tracking algorithm, especially when the real underwater factors are taken into account. We first give a normal modelling approach for the underwater sonar-based detection and the ocean current interference on the target tracking process. Then, we regard the AUV swarm as an underwater ad-hoc network and propose a novel Multi-Agent Reinforcement Learning (MARL) architecture towards the AUV swarm based on Software-Defined Networking (SDN). It enhances the flexibility and scalability of the AUV swarm through centralized management and distributed operations. Based on the proposed MARL architecture, we propose the "dynamic-attention switching" and "dynamic-resampling switching" mechanisms, to enhance the efficiency and accuracy of AUV swarm cooperation during task execution. Finally, based on a proposed AUV classification method, we propose an efficient cooperative tracking algorithm called ASMA. Evaluation results demonstrate that our proposed tracking algorithm can perform precise underwater multi-target tracking, compared with many recent research products in terms of convergence speed and tracking accuracy.

Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.07950 [pdf, other]

Reinforcement Learning with Generalizable Gaussian Splatting

Authors: Jiaxu Wang, Qiang Zhang, Jingkai Sun, Jiahang Cao, Gang Han, Wen Zhao, Weining Zhang, Yecheng Shao, Yijie Guo, Renjing Xu

Abstract: An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a "black box", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.

Submitted 5 August, 2024; v1 submitted 18 March, 2024; originally announced April 2024.

Comments: 7 pages, 2 figures

arXiv:2404.04863 [pdf]

Microscopic Insights into Fatigue Mechanism in Wurtzite Ferroelectric Al$_{0.65}$Sc$_{0.35}$N: Oxygen Infiltration Enabled Grain Amorphization Spanning Boundary to Bulk

Authors: Ruiqing Wang, Danyang Yao, Jiuren Zhou, Yang Li, Zhi Jiang, Dongliang Chen, Xu Ran, Yu Gao, Zixuan Cheng, Yong Wang, Yan Liu, Yue Hao, Genquan Han

Abstract: For the first time, the fatigue behavior involving external oxygen in highly Sc-doped AlN ferroelectric film was observed using transmission electron microscope techniques. Although increasing the Sc composition in AlScN film contributes to reducing the device operation voltage, the inherent affinity of Sc for oxygen introduces instability in device performance. In this study, oxygen incorporation at top electrode edges and grain boundaries, accompanied by an increase in current leakage and the disappearance of ferroelectric properties, was observed at the nanoscale after long-term field cycling. This observation indicates the emergence of non-ferroelectric and even amorphous states. The presented work revealed solid experimental evidence of an oxygen-involved fatigue mechanism, providing valuable insights into the physical nature of the ferroelectric properties of AlScN films.

Submitted 7 April, 2024; originally announced April 2024.

Comments: 2 pages, 7 figures

arXiv:2404.04656 [pdf, other]

Binary Classifier Optimization for Large Language Model Alignment

Authors: Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

Abstract: Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-completion pair. In this paper, we present theoretical foundations to explain the successful alignment achieved through these binary signals. Our analysis uncovers a new perspective: optimizing a binary classifier, whose logit is a reward, implicitly induces minimizing the Direct Preference Optimization (DPO) loss. In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching. Consequently, we propose a new algorithm, Binary Classifier Optimization, that integrates the techniques. We validate our methodology in two settings: first, on a paired preference dataset, where our method performs on par with DPO and KTO; and second, on binary signal datasets simulating real-world conditions with divergent underlying distributions between thumbs-up and thumbs-down data. Our model consistently demonstrates effective and robust alignment across two base LLMs and three different binary signal datasets, showcasing the strength of our approach to learning from binary feedback.

Submitted 6 April, 2024; originally announced April 2024.

Comments: 18 pages, 9 figures

arXiv:2404.04480 [pdf]

Possible charge density wave induced lattice distortion in ferromagnetic FeGe film

Authors: Guangdong Nie, Guanghui Han, Erfa S. Z., Shijian Chen, Hao Ding, Fangdong Tang, Licong Peng, Young Sun, Deshun Hong

Abstract: Binary compound FeGe hosts multiple structures, where a skyrmion lattice emerges in the chiral B20 phase and an antiferromagnet with charge density wave shows up in the hexagonal phase. Here, we synthesized monoclinic FeGe films which are ferromagnetic with a Curie temperature as high as 800 K. Using a low-temperature transmission electron microscope, lattice reconstructions in both real and reciprocal space were captured at 100 K, whereas no transition was observed in either transport or magnetic characterizations. We infer the lattice distortion may be induced by a charge density wave. Our work suggests FeGe films as an ideal platform for understanding the intertwining of charge density wave, lattice distortion and magnetism, and paves the way to tuning the charge density wave by means of lattice engineering.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.02838 [pdf, other]

I-Design: Personalized LLM Interior Designer

Authors: Ata Çelen, Guo Han, Konrad Schindler, Luc Van Gool, Iro Armeni, Anton Obukhov, Xi Wang

Abstract: Interior design allows us to be who we are and live how we want - each design is as unique as our distinct personality. However, it is not trivial for non-professionals to express and materialize this since it requires aligning functional and visual expectations with the constraints of physical space; this renders interior design a luxury. To make it more accessible, we present I-Design, a personalized interior designer that allows users to generate and visualize their design goals through natural language communication. I-Design starts with a team of large language model agents that engage in dialogues and logical reasoning with one another, transforming textual user input into feasible scene graph designs with relative object relationships. Subsequently, an effective placement algorithm determines optimal locations for each object within the scene. The final design is then constructed in 3D by retrieving and integrating assets from an existing object database. Additionally, we propose a new evaluation protocol that utilizes a vision-language model and complements the design pipeline. Extensive quantitative and qualitative experiments show that I-Design outperforms existing methods in delivering high-quality 3D design solutions and aligning with abstract concepts that match user input, showcasing its advantages across detailed 3D arrangement and conceptual fidelity.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.13786 [pdf, other]

Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts

Authors: Guangzeng Han, Weisi Liu, Xiaolei Huang, Brian Borsari

Abstract: Automatically coding patient behaviors is essential to support decision making for psychotherapists during motivational interviewing (MI), a collaborative communication intervention approach to address psychiatric issues, such as alcohol and drug addiction. While the behavior coding task has rapidly adapted machine learning to predict patient states during MI sessions, a lack of domain-specific knowledge and overlooked patient-therapist interactions are major challenges in developing and deploying those models in real practice. To counter those challenges, we introduce the Chain-of-Interaction (CoI) prompting method, which aims to contextualize large language models (LLMs) for psychiatric decision support through dyadic interactions. The CoI prompting approach systematically breaks down the coding task into three key reasoning steps: extracting patient engagement, learning therapist question strategies, and integrating dyadic interactions between patients and therapists. This approach enables large language models to leverage the coding scheme, patient state, and domain knowledge for patient behavioral coding. Experiments on real-world datasets demonstrate the effectiveness and flexibility of our prompting method with multiple state-of-the-art LLMs over existing prompting baselines. We have conducted extensive ablation analysis and demonstrate the critical role of dyadic interactions in applying LLMs for psychotherapy behavior understanding.

Submitted 23 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to IEEE ICHI 2024

arXiv:2403.10945 [pdf, other]

Zero-Inflated Stochastic Volatility Model for Disaggregated Inflation Data with Exact Zeros

Authors: Geonhee Han, Kaoru Irie

Abstract: The disaggregated time-series data for Consumer Price Index often exhibits frequent instances of exact zero price changes, stemming from measurement errors inherent in the data collection process. However, the currently prominent stochastic volatility model of trend inflation is designed for aggregate measures of price inflation, where exact zero price changes rarely occur. We propose a zero-inflated stochastic volatility model applicable to such nonstationary real-valued multivariate time-series data with exact zeros, by a Bayesian dynamic generalized linear model that jointly specifies the dynamic zero-generating process. We also provide an efficient custom Gibbs sampler that leverages the Pólya-Gamma augmentation. Applying the model to disaggregated Japanese Consumer Price Index data, we find that the zero-inflated model provides more sensible and informative estimates of time-varying trend and volatility. Through an out-of-sample forecasting exercise, we find that the zero-inflated model provides improved point forecasts when zero-inflation is prominent, and better coverage of interval forecasts of the non-zero data by the non-zero distributional component.

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.10492 [pdf, other]

Mitigating Dialogue Hallucination for Large Vision Language Models via Adversarial Instruction Tuning

Authors: Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim

Abstract: Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhance their reliability for general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dialogues. To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues powered by our novel Adversarial Question Generator (AQG), which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LVLMs. On our benchmark, the zero-shot performance of state-of-the-art LVLMs drops significantly for both the VQA and Captioning tasks. Next, we further reveal this hallucination is mainly due to the prediction bias toward preceding dialogues rather than visual content. To reduce this bias, we propose Adversarial Instruction Tuning (AIT) that robustly fine-tunes LVLMs against hallucinatory dialogues. Extensive experiments show our proposed approach successfully reduces dialogue hallucination while maintaining performance.

Submitted 25 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.18294 [pdf, other]

Whole-body Humanoid Robot Locomotion with Human Reference

Authors: Qiang Zhang, Peter Cui, David Yan, Jingkai Sun, Yiqun Duan, Gang Han, Wen Zhao, Weining Zhang, Yijie Guo, Arthur Zhang, Renjing Xu

Abstract: Recently, humanoid robots have made significant advances in their ability to perform challenging tasks due to the deployment of Reinforcement Learning (RL). However, the inherent complexity of humanoid robots, including the difficulty of designing complicated reward functions and training entire sophisticated systems, still poses a notable challenge. To conquer these challenges, after many iterations and in-depth investigations, we have meticulously developed a full-size humanoid robot, "Adam", whose innovative structural design greatly improves the efficiency and effectiveness of the imitation learning process. In addition, we have developed a novel imitation learning framework based on an adversarial motion prior, which applies not only to Adam but also to humanoid robots in general. Using the framework, Adam can exhibit unprecedented human-like characteristics in locomotion tasks. Our experimental results demonstrate that the proposed framework enables Adam to achieve human-comparable performance in complex locomotion tasks, marking the first time that human locomotion data has been used for imitation learning in a full-size humanoid robot.

Submitted 26 August, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: 7 pages, 7 figures

arXiv:2402.17774 [pdf]

doi 10.1021/acsnano.4c02434

A paper-based multiplexed serological test to monitor immunity against SARS-CoV-2 using machine learning

Authors: Merve Eryilmaz, Artem Goncharov, Gyeo-Re Han, Hyou-Arm Joung, Zachary S. Ballard, Rajesh Ghosh, Yijie Zhang, Dino Di Carlo, Aydogan Ozcan

Abstract: The rapid spread of SARS-CoV-2 caused the COVID-19 pandemic and accelerated vaccine development to prevent the spread of the virus and control the disease. Given the sustained high infectivity and evolution of SARS-CoV-2, there is an ongoing interest in developing COVID-19 serology tests to monitor population-level immunity. To address this critical need, we designed a paper-based multiplexed vertical flow assay (xVFA) using five structural proteins of SARS-CoV-2, detecting IgG and IgM antibodies to monitor changes in COVID-19 immunity levels. Our platform not only tracked longitudinal immunity levels but also categorized COVID-19 immunity into three groups: protected, unprotected, and infected, based on the levels of IgG and IgM antibodies. We operated two xVFAs in parallel to detect IgG and IgM antibodies using a total of 40 uL of human serum sample in <20 min per test. After the assay, images of the paper-based sensor panel were captured using a mobile phone-based custom-designed optical reader and then processed by a neural network-based serodiagnostic algorithm. The trained serodiagnostic algorithm was blindly tested with serum samples collected before and after vaccination or infection, achieving an accuracy of 89.5%.
The competitive performance of the xVFA, along with its portability, cost-effectiveness, and rapid operation, makes it a promising computational point-of-care (POC) serology test for monitoring COVID-19 immunity, aiding in timely decisions on the administration of booster vaccines and general public health policies to protect vulnerable populations.

Submitted 18 February, 2024; originally announced February 2024.

Comments: 19 Pages, 4 Figures

Journal ref: ACS Nano (2024)

arXiv:2402.11195 [pdf]

Deep learning-enhanced paper-based vertical flow assay for high-sensitivity troponin detection using nanoparticle amplification

Authors: Gyeo-Re Han, Artem Goncharov, Merve Eryilmaz, Hyou-Arm Joung, Rajesh Ghosh, Geon Yim, Nicole Chang, Minsoo Kim, Kevin Ngo, Marcell Veszpremi, Kun Liao, Omai B. Garner, Dino Di Carlo, Aydogan Ozcan

Abstract: Successful integration of point-of-care testing (POCT) into clinical settings requires improved assay sensitivity and precision to match laboratory standards. Here, we show how innovations in amplified biosensing, imaging, and data processing, coupled with deep learning, can help improve POCT. To demonstrate the performance of our approach, we present a rapid and cost-effective paper-based high-sensitivity vertical flow assay (hs-VFA) for quantitative measurement of cardiac troponin I (cTnI), a biomarker widely used for measuring acute cardiac damage and assessing cardiovascular risk. The hs-VFA includes a colorimetric paper-based sensor, a portable reader with time-lapse imaging, and computational algorithms for digital assay validation and outlier detection. Operating at the level of a rapid at-home test, the hs-VFA enabled the accurate quantification of cTnI using 50 uL of serum within 15 min per test and achieved a detection limit of 0.2 pg/mL, enabled by gold ion amplification chemistry and time-lapse imaging. It also achieved high precision with a coefficient of variation of < 7% and a very large dynamic range, covering cTnI concentrations over six orders of magnitude, up to 100 ng/mL, satisfying clinical requirements. In blinded testing, this computational hs-VFA platform accurately quantified cTnI levels in patient samples and showed a strong correlation with the ground truth values obtained by a benchtop clinical analyzer.
This nanoparticle amplification-based computational hs-VFA platform can democratize access to high-sensitivity point-of-care diagnostics and provide a cost-effective alternative to laboratory-based biomarker testing.

Submitted 17 February, 2024; originally announced February 2024.

Comments: 23 Pages, 4 Figures, 1 Table

arXiv:2402.10873 [pdf, ps, other]

Probabilistic On-Demand Charging Scheduling for ISAC-Assisted WRSNs with Multiple Mobile Charging Vehicles

Authors: Muhammad Umar Farooq Qaisar, Weijie Yuan, Paolo Bellavista, Guangjie Han, Rabiu Sale Zakariyya, Adeel Ahmed

Abstract: The internet of things (IoT) based wireless sensor networks (WSNs) face an energy shortage challenge that could be overcome by the novel wireless power transfer (WPT) technology. The combination of WSNs and WPT is known as wireless rechargeable sensor networks (WRSNs), with the charging efficiency and charging scheduling being the primary concerns. Therefore, this paper proposes a probabilistic on-demand charging scheduling for integrated sensing and communication (ISAC)-assisted WRSNs with multiple mobile charging vehicles (MCVs) that addresses three parts. First, it considers the four attributes with their probability distributions to balance the charging load on each MCV. The distributions are residual energy of charging node, distance from MCV to charging node, degree of charging node, and charging node betweenness centrality. Second, it considers the efficient charging factor strategy to partially charge network nodes. Finally, it employs the ISAC concept to efficiently utilize the wireless resources to reduce the traveling cost of each MCV and to avoid the charging conflicts between them. The simulation results show that the proposed protocol outperforms cutting-edge protocols in terms of energy usage efficiency, charging delay, survival rate, and travel distance.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted for publication at the IEEE Global Communications Conference (GLOBECOM) 2023

arXiv:2401.16941 [pdf, ps, other]

Deformed Laurent series rings and completions of the Weyl division ring

Authors: Gang Han, Yulin Chen, Zhennan Pan

Abstract: Let $L((T^{-1}))$ be the space of (inverse) Laurent series with coefficients in some field $L$. It has a standard degree map and the induced topology. With its usual addition and a new product on this space which is continuous and preserves the standard degree map, it becomes a complete topological division ring, called a deformed Laurent series ring. Under mild restrictions, we give the necessary and sufficient conditions for a product on $L((T^{-1}))$ to make it a deformed Laurent series ring. Then we apply the above theory to construct the completions of the Weyl division ring $D_1$, over some field of characteristic 0, with respect to a class of discrete valuations on it. Such completions are topological division rings with nice properties. For instance, their valuation rings are non-commutative Henselian rings; the centralizer of each element not in the center is commutative.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.08121 [pdf, other]

CycLight: learning traffic signal cooperation with a cycle-level strategy

Authors: Gengyue Han, Xiaohan Liu, Xianyue Peng, Hao Wang, Yu Han

Abstract: This study introduces CycLight, a novel cycle-level deep reinforcement learning (RL) approach for network-level adaptive traffic signal control (NATSC) systems. Unlike most traditional RL-based traffic controllers that focus on step-by-step decision making, CycLight adopts a cycle-level strategy, optimizing cycle length and splits simultaneously using the Parameterized Deep Q-Networks (PDQN) algorithm. This cycle-level approach effectively reduces the computational burden associated with frequent data communication while enhancing the practicality and safety of real-world applications. A decentralized framework is formulated for multi-agent cooperation, while an attention mechanism is integrated to accurately assess the impact of the surroundings on the current intersection. CycLight is tested in a large synthetic traffic grid using the microscopic traffic simulation tool, SUMO. Experimental results not only demonstrate the superiority of CycLight over other state-of-the-art approaches but also showcase its robustness against information transmission delays.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2312.12423 [pdf, other]

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a single framework. In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework. VistaLLM utilizes an instruction-guided image tokenizer that filters global embeddings using task descriptions to extract compressed and refined features from numerous images. Moreover, VistaLLM employs a gradient-aware adaptive sampling technique to represent binary segmentation masks as sequences, significantly improving over previously used uniform sampling. To bolster the desired capability of VistaLLM, we curate CoinIt, a comprehensive coarse-to-fine instruction tuning dataset with 6.8M samples. We also address the lack of multi-image grounding datasets by introducing a novel task, AttCoSeg (Attribute-level Co-Segmentation), which boosts the model's reasoning and grounding capability over multiple input images. Extensive experiments on a wide range of V- and VL tasks demonstrate the effectiveness of VistaLLM by achieving consistent state-of-the-art performance over strong baselines across all downstream tasks.
Our project page can be found at https://shramanpramanick.github.io/VistaLLM/.

Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight

arXiv:2312.12227 [pdf, other]

HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

Authors: Gaoge Han, Shaoli Huang, Mingming Gong, Jinglei Tang

Abstract: We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhancing the quality of motion generation. Furthermore, our findings reveal that utilizing few-shot feedback can yield performance levels on par with those attained through extensive human feedback. This discovery emphasizes the potential and efficiency of incorporating few-shot human-guided optimization within latent diffusion models for personalized and style-aware human motion generation applications. The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024 Main Track

arXiv:2311.15013 [pdf, ps, other]

Inequalities and asymptotics for hook numbers in restricted partitions

Authors: William Craig, Madeline Locus Dawsey, Guo-Niu Han

Abstract: In this paper, we consider the asymptotic properties of hook numbers of partitions in restricted classes. More specifically, we compare the frequency with which partitions into odd parts and partitions into distinct parts have hook numbers equal to $h \geq 1$ by deriving an asymptotic formula for the total number of hooks equal to $h$ that appear among partitions into odd and distinct parts, respectively. We use these asymptotic formulas to prove a recent conjecture of the first author and collaborators that for $h \geq 2$ and $n \gg 0$, partitions into odd parts have, on average, more hooks equal to $h$ than do partitions into distinct parts. We also use our asymptotics to prove certain probabilistic statements about how hooks distribute in the rows of partitions.

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.14264 [pdf, ps, other]

An ADMM-Based Geometric Configuration Optimization in RSSD-Based Source Localization By UAVs with Spread Angle Constraint

Authors: Xin Cheng, Guangjie Han, Jinlin Peng, Jinfang Jiang, Yu He, Weiqiang Zhu, Feng Shu, Jiangzhou Wang

Abstract: Deploying multiple unmanned aerial vehicles (UAVs) to locate a signal-emitting source covers a wide range of military and civilian applications like rescue and target tracking. It is well known that the UAVs-source (sensors-target) geometry, namely geometric configuration, significantly affects the final localization accuracy. This paper focuses on the geometric configuration optimization for received signal strength difference (RSSD)-based passive source localization by drone swarm. Different from prior works, this paper considers a general measuring condition where the spread angle of drone swarm centered on the source is constrained. Subject to this constraint, a geometric configuration optimization problem with the aim of maximizing the determinant of Fisher information matrix (FIM) is formulated. After transforming this problem using matrix theory, an alternating direction method of multipliers (ADMM)-based optimization framework is proposed. To solve the subproblems in this framework, two global optimal solutions based on the Von Neumann matrix trace inequality theorem and majorize-minimize (MM) algorithm are proposed respectively. Finally, the effectiveness as well as the practicality of the proposed ADMM-based optimization algorithm are demonstrated by extensive simulations.

Submitted 17 July, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.01018 [pdf, other]

Expanding Expressiveness of Diffusion Models with Limited Data via Self-Distillation based Fine-Tuning

Authors: Jiwan Hur, Jaehyun Choi, Gyojin Han, Dong-Jae Lee, Junmo Kim

Abstract: Training diffusion models on limited datasets poses challenges in terms of limited generation capacity and expressiveness, leading to unsatisfactory results in various downstream tasks utilizing pretrained diffusion models, such as domain translation and text-guided image manipulation. In this paper, we propose Self-Distillation for Fine-Tuning diffusion models (SDFT), a methodology to address these challenges by leveraging diverse features from diffusion models pretrained on large source datasets. SDFT distills more general features (shape, colors, etc.) and less domain-specific features (texture, fine details, etc.) from the source model, allowing successful knowledge transfer without disturbing the training process on target datasets. The proposed method is not constrained by the specific architecture of the model and thus can be generally adopted to existing frameworks. Experimental results demonstrate that SDFT enhances the expressiveness of the diffusion model with limited datasets, resulting in improved generation capabilities across various downstream tasks.

Submitted 2 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2310.10856 [pdf]

Joint Optimization of Traffic Signal Control and Vehicle Routing in Signalized Road Networks using Multi-Agent Deep Reinforcement Learning

Authors: Xianyue Peng, Hang Gao, Gengyue Han, Hao Wang, Michael Zhang

Abstract: Urban traffic congestion is a critical predicament that plagues modern road networks. To alleviate this issue and enhance traffic efficiency, traffic signal control and vehicle routing have proven to be effective measures. In this paper, we propose a joint optimization approach for traffic signal control and vehicle routing in signalized road networks. The objective is to enhance network performance by simultaneously controlling signal timings and route choices using Multi-Agent Deep Reinforcement Learning (MADRL). Signal control agents (SAs) are employed to establish signal timings at intersections, whereas vehicle routing agents (RAs) are responsible for selecting vehicle routes. By establishing relevance between agents and enabling them to share observations and rewards, interaction and cooperation among agents are fostered, which enhances individual training. The Multi-Agent Advantage Actor-Critic algorithm is used to handle multi-agent environments, and Deep Neural Network (DNN) structures are designed to facilitate the algorithm's convergence. Notably, our work is the first to utilize MADRL in determining the optimal joint policy for signal control and vehicle routing. Numerical experiments conducted on the modified Sioux network demonstrate that our integration of signal control and vehicle routing outperforms controlling signal timings or vehicles' routes alone in enhancing traffic efficiency.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.06404 [pdf, other]

Hexa: Self-Improving for Knowledge-Grounded Dialogue System

Authors: Daejin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim

Abstract: A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches. However, data for such steps are often inaccessible compared to those of dialogue responses as they are unobservable in an ordinary dialogue. To fill in the absence of these data, we develop a self-improving method to improve the generative performances of intermediate steps without the ground truth data. In particular, we propose a novel bootstrapping scheme with a guided prompt and a modified loss function to enhance the diversity of appropriate self-generated responses. Through experiments on various benchmark datasets, we empirically demonstrate that our method successfully leverages a self-improving mechanism in generating intermediate and final responses and improves the performances on the task of knowledge-grounded dialogue generation.

Submitted 2 April, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2309.03509 [pdf, other]

BroadCAM: Outcome-agnostic Class Activation Mapping for Small-scale Weakly Supervised Applications

Authors: Jiatai Lin, Guoqiang Han, Xuemiao Xu, Changhong Liang, Tien-Tsin Wong, C. L. Philip Chen, Zaiyi Liu, Chu Han

Abstract: Class activation mapping (CAM), a visualization technique for interpreting deep learning models, is now commonly used for weakly supervised semantic segmentation (WSSS) and object localization (WSOL). It is the weighted aggregation of the feature maps by activating the high class-relevance ones. Current CAM methods achieve it relying on the training outcomes, such as predicted scores (forward information), gradients (backward information), etc. However, with small-scale data, unstable training may lead to less effective model outcomes and generate unreliable weights, finally resulting in incorrect activation and noisy CAM seeds. In this paper, we propose an outcome-agnostic CAM approach, called BroadCAM, for small-scale weakly supervised applications. Since the broad learning system (BLS) is independent of the model learning, BroadCAM can avoid the weights being affected by the unreliable model outcomes with small-scale data. By evaluating BroadCAM on VOC2012 (natural images) and BCSS-WSSS (medical images) for WSSS and OpenImages30k for WSOL, BroadCAM demonstrates performance superior to existing CAM methods with small-scale data (less than 5%) in different CNN architectures. It also achieves SOTA performance with large-scale training data. Extensive qualitative comparisons are conducted to demonstrate how BroadCAM activates the high class-relevance feature maps and generates reliable CAMs with small-scale training data.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.00783 [pdf, other]

Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking

Authors: Mingzhan Yang, Guangxin Han, Bin Yan, Wenhua Zhang, Jinqing Qi, Huchuan Lu, Dong Wang

Abstract: Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish the task by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which exhibit powerful instance-level discrimination. However, when object occlusion and clustering occur, spatial and appearance information will become ambiguous simultaneously due to the high overlap among objects. In this paper, we demonstrate this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues to compensate for strong cues. Along with velocity direction, we introduce the confidence and height state as potential weak cues. With superior performance, our method still maintains Simple, Online and Real-Time (SORT) characteristics. Also, our method shows strong generalization for diverse trackers and scenarios in a plug-and-play and training-free manner. Significant and consistent improvements are observed when applying our method to 5 different representative trackers. Further, with both strong and weak cues, our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack where interaction and severe occlusion frequently happen with complex motions. The code and models are available at https://github.com/ymzis69/HybridSORT.

Submitted 20 January, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted to AAAI 2024

arXiv:2307.08671 [pdf, other]

Deep Cross-Modal Steganography Using Neural Representations

Authors: Gyojin Han, Dong-Jae Lee, Jiwan Hur, Jaehyun Choi, Junmo Kim

Abstract: Steganography is the process of embedding secret data into another message or data, in such a way that it is not easily noticeable. With the advancement of deep learning, Deep Neural Networks (DNNs) have recently been utilized in steganography. However, existing deep steganography techniques are limited in scope, as they focus on specific data types and are not effective for cross-modal steganography. Therefore, we propose a deep cross-modal steganography framework using Implicit Neural Representations (INRs) to hide secret data of various formats in cover images. The proposed framework employs INRs to represent the secret data, which can handle data of various modalities and resolutions. Experiments on various secret datasets of diverse types demonstrate that the proposed approach is expandable and capable of accommodating different modalities.

Submitted 7 October, 2023; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: ICIP 2023 Oral

arXiv:2307.05889 [pdf, other]

Rethinking Mitosis Detection: Towards Diverse Data and Feature Representation

Authors: Hao Wang, Jiatai Lin, Danyi Li, Jing Wang, Bingchao Zhao, Zhenwei Shi, Xipeng Pan, Huadeng Wang, Bingbing Li, Changhong Liang, Guoqiang Han, Li Liang, Chu Han, Zaiyi Liu

Abstract: Mitosis detection is one of the fundamental tasks in computational pathology, which is extremely challenging due to the heterogeneity of mitotic cells. Most of the current studies address the heterogeneity from a technical angle by increasing the model complexity. However, the lack of biological knowledge and the complex model design may lead to overfitting while limiting the generalizability of the detection model. In this paper, we systematically study the morphological appearances in different mitotic phases as well as the ambiguous non-mitotic cells and identify that balancing the data and feature diversity can achieve better generalizability. Based on this observation, we propose a novel generalizable framework (MitDet) for mitosis detection. The data diversity is considered by the proposed diversity-guided sample balancing (DGSB), and the feature diversity is preserved by the inter- and intra-class feature diversity-preserved module (InCDP). A stain enhancement (SE) module is introduced to enhance the domain-relevant diversity of both data and features simultaneously. Extensive experiments have demonstrated that our proposed model outperforms all the SOTA approaches in several popular mitosis detection datasets in both internal and external test sets using minimal annotation efforts with point annotations only. Comprehensive ablation studies have also proven the effectiveness of the rethinking of data and feature diversity balancing. By analyzing the results quantitatively and qualitatively, we believe that our proposed model not only achieves SOTA performance but also might inspire future studies from new perspectives. Source code is at https://github.com/Onehour0108/MitDet.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.10079 [pdf, other]

doi 10.1145/3580305.3599862

M3PT: A Multi-Modal Model for POI Tagging

Authors: Jingsong Yang, Guanzhou Han, Deqing Yang, Jingping Liu, Yanghua Xiao, Xiang Xu, Baohua Wu, Shenghua Ni

Abstract: POI tagging aims to annotate a point of interest (POI) with some informative tags, which facilitates many services related to POIs, including search, recommendation, and so on. Most of the existing solutions neglect the significance of POI images and seldom fuse the textual and visual features of POIs, resulting in suboptimal tagging performance. In this paper, we propose a novel Multi-Modal Model for POI Tagging, namely M3PT, which achieves enhanced POI tagging through fusing the target POI's textual and visual features, and the precise matching between the multi-modal representations. Specifically, we first devise a domain-adaptive image encoder (DIE) to obtain the image embeddings aligned to their gold tags' semantics. Then, in M3PT's text-image fusion module (TIF), the textual and visual representations are fully fused into the POIs' content embeddings for the subsequent matching. In addition, we adopt a contrastive learning strategy to further bridge the gap between the representations of different modalities. To evaluate the tagging models' performance, we have constructed two high-quality POI tagging datasets from the real-world business scenario of Ali Fliggy. On these datasets, we conducted extensive experiments to demonstrate our model's advantage over the uni-modal and multi-modal baselines, and to verify the effectiveness of important components in M3PT, including DIE, TIF and the contrastive learning strategy.

Submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted by KDD 2023

ACM Class: H.3.0

arXiv:2306.07289 [pdf, other]

Multi-Interactive-Modality based Modeling for Myopia Pro-Gression of Adolescent Student

Authors: Xiangyu Yan, Gongen Han, Can Fang, Xuan Jing

Abstract: Myopia is a common visual disorder that affects millions of people worldwide and its prevalence has been increasing in recent years. Environmental factors, such as reading time, viewing distance, and ambient lighting, have been identified as potential factors in the development of myopia. In this study, we investigated the relationship between three major factors and myopia in 120 adolescents. By collecting environmental images of the adolescents in the learning state as well as retinal fundus images, we proposed an environmental visual load (EVL) model to extract the potential information in these images. Through experimental data analysis, we found that these three major factors are closely related to the severity of myopia, and that the simultaneous exacerbation of these factors sharply increases the myopia of the eye. Our results suggest that interventions targeting these environmental factors may help prevent and manage myopia.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 9 pages, 5 figures

arXiv:2306.02393 [pdf, other]

Accessible Robot Control in Mixed Reality

Authors: Ganlin Zhang, Deheng Zhang, Longteng Duan, Guo Han

Abstract: A novel method to control the Spot robot of Boston Dynamics by Hololens 2 is proposed. This method is mainly designed for people with physical disabilities: users can control the robot's movement and robot arm without using their hands. The eye gaze tracking and head motion tracking technologies of Hololens 2 are utilized for sending control commands. The movement of the robot follows the eye gaze, and the robot arm mimics the pose of the user's head. In our experiment, our method is comparable with the traditional joystick control method in both time efficiency and user experience. A demo can be found on our project webpage: https://zhangganlin.github.io/Holo-Spot-Page/index.html

Submitted 4 June, 2023; originally announced June 2023.

Comments: Course Project of Mixed Reality at ETH Zurich

arXiv:2305.13973 [pdf, other]

Effortless Integration of Memory Management into Open-Domain Conversation Systems

Authors: Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, Daejin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo

Abstract: Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach. One of the limitations of the system, however, is the absence of management capability for external memory. In this paper, we propose a simple method to improve BlenderBot3 by integrating memory management ability into it. Since no training data exists for this purpose, we propose automated dataset creation for memory management. Our method 1) requires little cost for data construction, 2) does not affect performance in other tasks, and 3) reduces external memory. We show that our proposed model BlenderBot3-M^3, which is multi-task trained with memory management, outperforms BlenderBot3 with a relative 4% performance gain in terms of F1 score.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.07701 [pdf, ps, other]

A Gröbner Basis Approach to Combinatorial Nullstellensatz

Authors: Yang Xu, Haibin Kan, Guangyue Han

Abstract: In this paper, using some conditions that arise naturally in Alon's combinatorial Nullstellensatz as well as its various extensions and generalizations, we characterize Gröbner bases consisting of monic polynomials, which helps us to establish a Nullstellensatz from a Gröbner basis perspective. As corollaries of this general Nullstellensatz, we establish four special Nullstellensatz, which, among others, include a common generalization of the Nullstellensatz for multisets established in Kós, Rónyai and Mészáros \cite{23,24} and the Nullstellensatz with multiplicity established in Ball and Serra \cite{9}, and include a punctured Nullstellensatz, generalizing several existing results in the literature. As applications of our punctured Nullstellensatz, we extend some results on hyperplane covering in \cite{9,23,24} to wider settings, and give an alternative proof of the generalized Alon-Füredi theorem established in Bishnoi, Clark, Potukuchi and Schmitt \cite{12}. Unless specified otherwise, all our results are established over an arbitrary commutative ring $R$.

Submitted 16 April, 2023; originally announced April 2023.

arXiv:2304.04625 [pdf, other]

Reinforcement Learning-Based Black-Box Model Inversion Attacks

Authors: Gyojin Han, Jaehyun Choi, Haeil Lee, Junmo Kim

Abstract: Model inversion attacks are a type of privacy attack that reconstructs private data used to train a machine learning model, solely by accessing the model. Recently, white-box model inversion attacks leveraging Generative Adversarial Networks (GANs) to distill knowledge from public datasets have been receiving great attention because of their excellent attack performance. On the other hand, current black-box model inversion attacks that utilize GANs suffer from issues such as being unable to guarantee the completion of the attack process within a predetermined number of query accesses or achieve the same level of performance as white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. We formulate the latent space search as a Markov Decision Process (MDP) problem and solve it with reinforcement learning. Our method utilizes the confidence scores of the generated images to provide rewards to an agent. Finally, the private data can be reconstructed using the latent vectors found by the agent trained in the MDP. The experiment results on various datasets and models demonstrate that our attack successfully recovers the private information of the target model by achieving state-of-the-art attack performance. We emphasize the importance of studies on privacy-preserving machine learning by proposing a more advanced black-box model inversion attack.

Submitted 10 April, 2023; originally announced April 2023.

Comments: CVPR 2023, Accepted

arXiv:2303.15466 [pdf, other]

Supervised Masked Knowledge Distillation for Few-Shot Transformers

Authors: Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang

Abstract: Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a few labeled data, ViT tends to overfit and suffers from severe performance degradation due to its absence of CNN-alike inductive bias. Previous works in FSL avoid such problem either through the help of self-supervised auxiliary losses, or through the dexterous uses of label information under supervised settings. But the gap between self-supervised and supervised few-shot Transformers is still unfilled. Inspired by recent advances in self-supervised knowledge distillation and masked image modeling (MIM), we propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers which incorporates label information into self-distillation frameworks. Compared with previous self-supervised methods, we allow intra-class knowledge distillation on both class and patch tokens, and introduce the challenging task of masked patch tokens reconstruction across intra-class images. Experimental results on four few-shot classification benchmark datasets show that our method with simple design outperforms previous methods by a large margin and achieves a new state-of-the-art. Detailed ablation studies confirm the effectiveness of each component of our model. Code for this paper is available here: https://github.com/HL-hanlin/SMKD.

Submitted 28 March, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: To appear in CVPR 2023

arXiv:2303.09674 [pdf, other]

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

Authors: Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang

Abstract: Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data. Existing approaches enhance few-shot generalization with the sacrifice of base-class performance, or maintain high precision in base-class detection with limited improvement in novel-class adaptation. In this paper, we point out the reason is insufficient Discriminative feature learning for all of the classes. As such, we propose a new training framework, DiGeo, to learn Geometry-aware features of inter-class separation and intra-class compactness. To guide the separation of feature clusters, we derive an offline simplex equiangular tight frame (ETF) classifier whose weights serve as class centers and are maximally and equally separated. To tighten the cluster for each class, we include adaptive class-specific margins into the classification loss and encourage the features close to the class centers. Experimental studies on two few-shot benchmark datasets (VOC, COCO) and one long-tail dataset (LVIS) demonstrate that, with a single model, our method can effectively improve generalization on novel classes without hurting the detection of base classes.

Submitted 16 March, 2023; originally announced March 2023.

Comments: CVPR 2023 Camera Ready (Supp Attached). Code Link: https://github.com/Phoenix-V/DiGeo

arXiv:2303.07683 [pdf, other]

Recovering Arrhythmic EEG Transients from Their Stochastic Interference

Authors: Javier Díaz, Hiroyasu Ando, GoEun Han, Olga Malyshevskaya, Xifang Hayashi, Juan-Carlos Letelier, Masashi Yanagisawa, Kaspar E. Vogt

Abstract: Traditionally, the neuronal dynamics underlying electroencephalograms (EEG) have been understood as arising from \textit{rhythmic oscillators with varying degrees of synchronization}. This dominant metaphor employs frequency domain EEG analysis to identify the most prominent populations of neuronal current sources in terms of their frequency and spectral power. However, emerging perspectives on EEG highlight its arrhythmic nature, which is primarily inferred from broadband EEG properties like the ubiquitous $1/f$ spectrum. In the present study, we use an \textit{arrhythmic superposition of pulses} as a metaphor to explain the origin of EEG. This conceptualization has a fundamental problem because the interference produced by the superpositions of pulses generates colored Gaussian noise, masking the temporal profile of the generating pulse. We solved this problem by developing a mathematical method involving the derivative of the autocovariance function to recover excellent approximations of the underlying pulses, significantly extending the analysis of this type of stochastic processes. When the method is applied to spontaneous mouse EEG sampled at $5$ kHz during the sleep-wake cycle, specific patterns -- called $\Psi$-patterns -- characterizing NREM sleep, REM sleep, and wakefulness are revealed. $\Psi$-patterns can be understood theoretically as \textit{power density in the time domain} and correspond to combinations of generating pulses at different time scales. Remarkably, we report the first EEG wakefulness-specific feature, which corresponds to an ultra-fast ($\sim 1$ ms) transient component of the observed patterns. By shifting the paradigm of EEG genesis from oscillators to random pulse generators, our theoretical framework pushes the boundaries of traditional Fourier-based EEG analysis, paving the way for new insights into the arrhythmic components of neural dynamics.

Submitted 14 March, 2023; originally announced March 2023.

Comments: Original research manuscript in PDF format, 46 pages long, with 13 figures and one table

arXiv:2302.14139 [pdf, other]

Scalable End-to-End ML Platforms: from AutoML to Self-serve

Authors: Igor L. Markov, Pavlos A. Apostolopoulos, Mia R. Garrard, Tanya Qie, Yin Huang, Tanvi Gupta, Anika Li, Cesar Cardoso, George Han, Ryan Maghsoudian, Norm Zhou

Abstract: ML platforms help enable intelligent data-driven applications and maintain them with limited engineering effort. Upon sufficiently broad adoption, such platforms reach economies of scale that bring greater component reuse while improving efficiency of system development and maintenance. For an end-to-end ML platform with broad adoption, scaling relies on pervasive ML automation and system integration to reach the quality we term self-serve that we define with ten requirements and six optional capabilities. With this in mind, we identify long-term goals for platform development, discuss related tradeoffs and future work. Our reasoning is illustrated on two commercially-deployed end-to-end ML platforms that host hundreds of real-time use cases -- one general-purpose and one specialized.

Submitted 3 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 1 figure, 2 tables

arXiv:2302.13073 [pdf, other]

Feedback Capacity of the Continuous-Time ARMA(1,1) Gaussian Channel

Authors: Jun Su, Guangyue Han, Shlomo Shamai

Abstract: We consider the continuous-time ARMA(1,1) Gaussian channel and derive its feedback capacity in closed form. More specifically, the channel is given by $\boldsymbol{y}(t) =\boldsymbol{x}(t) +\boldsymbol{z}(t)$, where the channel input $\{\boldsymbol{x}(t) \}$ satisfies average power constraint $P$ and the noise $\{\boldsymbol{z}(t)\}$ is a first-order {\em autoregressive moving average} (ARMA(1,1)) Gaussian process satisfying $$ \boldsymbol{z}^\prime(t)+\kappa\boldsymbol{z}(t)=(\kappa+\lambda)\boldsymbol{w}(t)+\boldsymbol{w}^\prime(t), $$ where $\kappa>0,~\lambda\in\mathbb{R}$ and $\{\boldsymbol{w}(t) \}$ is a white Gaussian process with unit double-sided spectral density. We show that the feedback capacity of this channel is equal to the unique positive root of the equation $$ P(x+\kappa)^2 = 2x(x+\vert \kappa+\lambda\vert)^2 $$ when $-2\kappa<\lambda<0$ and is equal to $P/2$ otherwise. Among many others, this result shows that, as opposed to a discrete-time additive Gaussian channel, feedback may not increase the capacity of a continuous-time additive Gaussian channel even if the noise process is colored. The formula enables us to conduct a thorough analysis of the effect of feedback on the capacity for such a channel. We characterize when the feedback capacity equals or doubles the non-feedback capacity; moreover, we disprove continuous-time analogues of the half-bit bound and Cover's $2P$ conjecture for discrete-time additive Gaussian channels.
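
The closed-form rule quoted in this abstract can be evaluated numerically. The following is a minimal sketch, not the authors' code: the function name `feedback_capacity`, its parameter names, and the use of bisection are all illustrative assumptions. It relies only on the abstract's statement that the equation has a unique positive root when $-2\kappa<\lambda<0$.

```python
def feedback_capacity(P, kappa, lam, tol=1e-12):
    """Evaluate the rule stated in the abstract (hypothetical helper).

    Returns P/2 unless -2*kappa < lam < 0, in which case it returns the
    positive root of P*(x + kappa)**2 == 2*x*(x + |kappa + lam|)**2.
    """
    if P <= 0 or kappa <= 0:
        raise ValueError("P and kappa must be positive")
    if not (-2.0 * kappa < lam < 0.0):
        return P / 2.0

    a = abs(kappa + lam)

    def f(x):
        # f(0) = -P*kappa**2 < 0 and f grows like 2*x**3, so a sign change exists.
        return 2.0 * x * (x + a) ** 2 - P * (x + kappa) ** 2

    # Bracket the root by doubling the upper end until f becomes positive.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0

    # Plain bisection (an assumption of this sketch; any root finder would do).
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For example, `feedback_capacity(1.0, 1.0, 1.0)` falls outside the interval and returns `P/2`, while `feedback_capacity(1.0, 1.0, -1.0)` solves the cubic.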

Submitted 10 April, 2024; v1 submitted 25 February, 2023; originally announced February 2023.

arXiv:2302.12662 [pdf, other]

FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-jing Guo, C. L. Philip Chen, Chu Han

Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by keeping training samples locally, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication which hinder their practicability in the real-world clinical scenario. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one-round communication. By simply associating a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms the competitors with only one-round communication and limited training samples, while it even achieves comparable performance with the ones under multiple-round communications. Furthermore, due to the lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6GB to only 276.5KB per client using the ResNet-50 backbone at 50-round training. Since no data or deep models are shared across different clients, the privacy issue is well-solved and the model security is guaranteed with no model inversion attack risk. Code is available at https://github.com/tianpeng-deng/FedDBL.

Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2301.10934 [pdf]

doi 10.1002/smll.202300617

Deep learning-enabled multiplexed point-of-care sensor using a paper-based fluorescence vertical flow assay

Authors: Artem Goncharov, Hyou-Arm Joung, Rajesh Ghosh, Gyeo-Re Han, Zachary S. Ballard, Quinn Maloney, Alexandra Bell, Chew Tin Zar Aung, Omai B. Garner, Dino Di Carlo, Aydogan Ozcan

Abstract: We demonstrate multiplexed computational sensing with a point-of-care serodiagnosis assay to simultaneously quantify three biomarkers of acute cardiac injury. This point-of-care sensor includes a paper-based fluorescence vertical flow assay (fxVFA) processed by a low-cost mobile reader, which quantifies the target biomarkers through trained neural networks, all within <15 min of test time using 50 microliters of serum sample per patient. This fxVFA platform is validated using human serum samples to quantify three cardiac biomarkers, i.e., myoglobin, creatine kinase-MB (CK-MB) and heart-type fatty acid binding protein (FABP), achieving less than 0.52 ng/mL limit-of-detection for all three biomarkers with minimal cross-reactivity. Biomarker concentration quantification using the fxVFA that is coupled to neural network-based inference is blindly tested using 46 individually activated cartridges, which showed a high correlation with the ground truth concentrations for all three biomarkers achieving > 0.9 linearity and < 15 % coefficient of variation. The competitive performance of this multiplexed computational fxVFA along with its inexpensive paper-based design and handheld footprint make it a promising point-of-care sensor platform that could expand access to diagnostics in resource-limited settings.

Submitted 25 January, 2023; originally announced January 2023.

Comments: 17 Pages, 6 Figures

Journal ref: Small (2023)

arXiv:2301.02044 [pdf, ps, other]

Simultaneously Transmitting and Reflecting (STAR) RIS Assisted Over-the-Air Computation Systems

Authors: Xiongfei Zhai, Guojun Han, Yunlong Cai, Yuanwei Liu, Lajos Hanzo

Abstract: The performance of over-the-air computation (AirComp) systems degrades due to the hostile channel conditions of wireless devices (WDs), which can be significantly improved by the employment of reconfigurable intelligent surfaces (RISs). However, the conventional RISs require that the WDs have to be located in the half-plane of the reflection space, which restricts their potential benefits. To address this issue, the novel family of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) is considered in AirComp systems to improve the computation accuracy across a wide coverage area. To minimize the computation mean-squared-error (MSE) in STAR-RIS assisted AirComp systems, we propose a joint beamforming design for optimizing both the transmit power at the WDs, as well as the passive reflect and transmit beamforming matrices at the STAR-RIS, and the receive beamforming vector at the fusion center (FC). Specifically, in the updates of the passive reflect and transmit beamforming matrices, closed-form solutions are derived by introducing an auxiliary variable and exploiting the coupled binary phase-shift conditions. Moreover, by assuming that the number of antennas at the FC and that of elements at the STAR-RIS/RIS are sufficiently high, we theoretically prove that the STAR-RIS assisted AirComp systems provide higher computation accuracy than the conventional RIS assisted systems. Our numerical results show that the proposed beamforming design outperforms the benchmark schemes relying on random phase-shift constraints and the deployment of conventional RIS. Moreover, its performance is close to the lower bound achieved by the beamforming design based on the STAR-RIS dispensing with coupled phase-shift constraints.

Submitted 5 January, 2023; originally announced January 2023.

Showing 1–50 of 261 results for author: Han, G