Search | arXiv e-print repository

Imaginary-time Mpemba effect in quantum many-body systems

Authors: Wei-Xuan Chang, Shuai Yin, Shi-Xin Zhang, Zi-Xiang Li

Abstract: Various exotic phenomena emerge in non-equilibrium quantum many-body systems. The Mpemba effect, denoting the situation where a hot system freezes faster than a colder one, is a counterintuitive non-equilibrium phenomenon that has attracted enduring interest for more than half a century. In this Letter, we report a novel phenomenon of the Mpemba effect in the imaginary-time relaxation dynamics of quantum many-body systems, dubbed the imaginary-time Mpemba effect (ITME). Through numerically exact quantum Monte-Carlo (QMC) simulation, we unambiguously demonstrate that in different classes of interacting quantum models, initial states with higher energy relax faster than lower-energy initial states in the process of imaginary-time relaxation. The emergence of ITME is intimately associated with the low-energy excitations in quantum many-body systems. More crucially, since imaginary-time dynamics is broadly applied in numerical simulation of quantum many-body ground states, the discovery of ITME potentially provides a new pathway to expedite quantum many-body computation, particularly for QMC involving the sign problem.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 4.5+8 pages, 4+6 figures

arXiv:2409.05323 [pdf]

Preventing overfitting in infrared ellipsometry using temperature dependence: fused silica as a case study

Authors: Shenwei Yin, Jin-Woo Cho, Demeng Feng, Hongyan Mei, Tanuj Kumar, Chenghao Wan, Yeonghoon Jin, Minjeong Kim, Mikhail A. Kats

Abstract: The dispersive linear optical properties of materials are frequently described using oscillator models, where the oscillators represent interactions between light and various material resonances (vibrational, free-carrier, interband, etc.). The state-of-the-art measurement of the complex refractive index is variable-angle spectroscopic ellipsometry (VASE), where additional measurement angles and the measured depolarization of light provide much more information compared to simpler measurements such as single-angle reflectance and transmittance. Nevertheless, even state-of-the-art VASE data can be hard to uniquely fit using oscillator models, and the resulting models may be hard to interpret physically. Here, we demonstrate the use of an additional degree of freedom, temperature, to improve the accuracy, uniqueness, and physicality of oscillator models of materials. Our approach relies on the well-understood temperature dependence of material resonances, and in particular vibrational resonances in amorphous SiO2, which are expected to change monotonically from room temperature to hundreds of degrees C. We performed VASE measurements at different temperatures, independently fitted the data at each temperature, and then confirmed that our models are unique and physical by monitoring the temperature dependence of the resulting fitting parameters. Using this technique, we generated highly accurate and precise data sets and material models describing the mid-infrared complex refractive index of three different grades of fused SiO2, which can then be used for modeling of mid-infrared optical components such as thermal emitters.

Submitted 10 September, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

Comments: Main text + supplementary (pdf updated to fix figure rendering)

arXiv:2409.00803 [pdf]

Broadband light extraction from near-surface NV centers using crystalline-silicon antennas

Authors: Minjeong Kim, Maryam Zahedian, Wenxin Wu, Chengyu Fang, Zhaoning Yu, Raymond A. Wambold, Shenwei Yin, David A. Czaplewski, Jennifer T. Choy, Mikhail A. Kats

Abstract: We use crystalline silicon (Si) antennas to efficiently extract broadband single-photon fluorescence from shallow nitrogen-vacancy (NV) centers in diamond into free space. Our design features relatively easy-to-pattern high-index Si resonators on the diamond surface to boost photon extraction by overcoming total internal reflection and Fresnel reflection at the diamond-air interface, and providing modest Purcell enhancement, without etching or otherwise damaging the diamond surface. In simulations, ~20 times more single photons are collected from a single NV center compared to the case without the antenna; in experiments, we observe an enhancement of ~4 times, limited by spatial alignment between the NV and the antenna. Our approach can be readily applied to other color centers in diamond, and more generally to the extraction of light from quantum emitters in wide-bandgap materials.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Main text + supplementary

arXiv:2408.15548 [pdf, other]

ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model

Authors: Lifan Jiang, Zhihui Wang, Siqi Yin, Guangxiao Ma, Peng Zhang, Boxi Wu

Abstract: Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame. Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios. However, these methods still face challenges such as poor noise resistance and frequent ID switches. In this research, we propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on perturbed bounding boxes. This progressive denoising strategy significantly improves the model's noise resistance. During the training phase, paired object boxes within two adjacent frames are diffused from ground-truth boxes to a random distribution, and then the model learns to detect and track by reversing this process. In inference, the model refines randomly generated boxes into detection and tracking results through minimal denoising steps. ConsistencyTrack also introduces an innovative target association strategy to address target occlusion. Experiments on the MOT17 and DanceTrack datasets demonstrate that ConsistencyTrack outperforms the other compared methods, and in particular surpasses DiffusionTrack in inference speed and other performance metrics. Our code is available at https://github.com/Tankowa/ConsistencyTrack.

Submitted 28 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2308.09905 by other authors

arXiv:2408.07750 [pdf, other]

Quantum Mpemba effects in many-body localization systems

Authors: Shuo Liu, Hao-Kai Zhang, Shuai Yin, Shi-Xin Zhang, Hong Yao

Abstract: The nonequilibrium dynamics of quantum many-body systems have attracted growing attention due to various intriguing phenomena absent in equilibrium physics. One famous example is the quantum Mpemba effect, where the subsystem symmetry is restored faster under a symmetric quench from a more asymmetric initial state. The quantum Mpemba effect has been extensively studied in integrable and chaotic systems. In this Letter, we investigate symmetry restoration and the quantum Mpemba effect in many-body localized systems with various initial states. We reveal that the symmetry can still be fully restored in many-body localization phases without approaching thermal equilibrium. Furthermore, we demonstrate that the presence of the quantum Mpemba effect is universal for any initial tilted product state, in contrast to chaotic systems, where the presence of the quantum Mpemba effect relies on the choice of initial states. We also provide a theoretical analysis of symmetry restoration and quantum Mpemba effects with the help of the effective model for many-body localization. This Letter not only sheds light on extending the quantum Mpemba effect to more non-equilibrium settings but also contributes to a deeper understanding of many-body localization.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 23 pages (including supplemental materials), 11 figures

arXiv:2408.06138 [pdf, other]

Nonequilibrium Critical Dynamics with Emergent Supersymmetry

Authors: Zhi Zeng, Yin-Kai Yu, Zi-Xiang Li, Shuai Yin

Abstract: Proposed as an elegant symmetry relating bosons and fermions, spacetime supersymmetry (SUSY) has been actively pursued in both particle physics and emergent phenomena at quantum critical points (QCPs) of topological quantum materials. However, how SUSY sheds light on nonequilibrium dynamics remains open. In this Letter, we investigate the Kibble-Zurek dynamics across a QCP with emergent $\mathcal{N}=2$ spacetime SUSY between the Dirac semimetal and a superconductor through large-scale quantum Monte Carlo simulation. The scaling behaviors in the whole driven process are found to satisfy the full finite-time scaling (FTS) forms. More crucially, we demonstrate that the emergent SUSY manifests in the intimate relation between the FTS behaviors of fermionic and bosonic observables, namely that the fermions and bosons acquire identical anomalous dimensions. Our work not only brings a fundamentally new ingredient into the critical theory with SUSY, but also provides theoretical guidance for the experimental detection of QCPs with emergent SUSY from the perspectives of the Kibble-Zurek mechanism and FTS.

Submitted 12 August, 2024; originally announced August 2024.

Comments: 7+2 pages, 3 figures

arXiv:2408.05417 [pdf, ps, other]

A Class of Analytical Models for Black holes Surrounded by Dark Matter Halos

Authors: Zibo Shen, Anzhong Wang, Shaoyu Yin

Abstract: We present a class of analytic models for a dark matter halo surrounding a Schwarzschild black hole sitting at the center of a galaxy, with a variable inner radius $r_{\text{in}}$, at which the density profile of the dark matter halo vanishes. We examine in detail how the three energy conditions are satisfied in such models. In particular, we find that the three energy conditions are satisfied when $r_{\text{in}} > 5M/2$, where $M$ denotes the mass of the black hole. These solutions expressed explicitly in closed form are particularly valuable for the studies of the gravitational waveforms of extreme/intermediate mass ratio inspirals and the nature of dark matter in galaxies.

Submitted 9 August, 2024; originally announced August 2024.

Comments: 5 pages, no figures

arXiv:2408.02484 [pdf, other]

Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection

Authors: Ting Lei, Shaofeng Yin, Yuxin Peng, Yang Liu

Abstract: Zero-shot Human-Object Interaction (HOI) detection has emerged as a frontier topic due to its capability to detect HOIs beyond a predefined set of categories. This task entails not only identifying the interactiveness of human-object pairs and localizing them but also recognizing both seen and unseen interaction categories. In this paper, we introduce a novel framework for zero-shot HOI detection using Conditional Multi-Modal Prompts, namely CMMP. This approach enhances the generalization of large foundation models, such as CLIP, when fine-tuned for HOI detection. Unlike traditional prompt-learning methods, we propose learning decoupled vision and language prompts for interactiveness-aware visual feature extraction and generalizable interaction classification, respectively. Specifically, we integrate prior knowledge of different granularity into conditional vision prompts, including an input-conditioned instance prior and a global spatial pattern prior. The former encourages the image encoder to treat instances belonging to seen or potentially unseen HOI concepts equally while the latter provides representative plausible spatial configuration of the human and object under interaction. Besides, we employ language-aware prompt learning with a consistency constraint to preserve the knowledge of the large foundation model to enable better generalization in the text branch. Extensive experiments demonstrate the efficacy of our detector with conditional multi-modal prompts, outperforming previous state-of-the-art on unseen classes of various zero-shot settings. The code and models are available at https://github.com/ltttpku/CMMP.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2408.00377 [pdf, other]

Rogers-Ramanujan type identities involving double sums

Authors: Dandan Chen, Siyu Yin

Abstract: For a given integer $k$, an identity of the following shape: finite sum of \begin{align*} \sum_{(i_1,\cdots,i_k)\in S}\frac{(-1)^{t(i_1,\cdots,i_k)}q^{Q(i_1,\cdots,i_k)}}{(q^{n_1};q^{n_1})_{i_1}\cdots(q^{n_k};q^{n_k})_{i_k}}=\prod_{(a,n)\in P}(q^a;q^n)_{\infty}^{r(a,n)} \end{align*} is defined as a Rogers-Ramanujan type identity of index $(n_1,n_2,\cdots,n_k)$, where $t(i_1,\cdots,i_k)$ is an integer-valued function, $Q(i_1,\cdots,i_k)$ is a rational polynomial in the variables $i_1,\cdots,i_k$, $n_1,\cdots,n_k$ are positive integers with $\gcd(n_1,n_2,\cdots,n_k)=1$, $S$ is a subset of $\mathbb{Z}^k$, $P$ is a finite subset of $\mathbb{Q}^2$ and $r(a,n)$ is an integer-valued function. We construct some Rogers-Ramanujan type identities by using the constant term method.

Submitted 13 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.17717 [pdf, other]

A set of $q$-orthogonal functions

Authors: Dandan Chen, Siyu Yin

Abstract: In this paper we establish a $q$-orthogonality relation for the continuous $q$-ultraspherical polynomials, which was considered by Gasper. A new $q$-beta integral with seven parameters is evaluated.

Submitted 8 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.13279 [pdf, other]

Analyzing and Bridging the Gap between Maximizing Total Reward and Discounted Reward in Deep Reinforcement Learning

Authors: Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

Abstract: In deep reinforcement learning applications, maximizing discounted reward is often employed instead of maximizing total reward to ensure the convergence and stability of algorithms, even though the performance metric for evaluating the policy remains the total reward. However, the optimal policies corresponding to these two objectives may not always be consistent. To address this issue, we analyzed the suboptimality of the policy obtained through maximizing discounted reward in relation to the policy that maximizes total reward and identified the influence of hyperparameters. Additionally, we proposed sufficient conditions for aligning the optimal policies of these two objectives under various settings. The primary contributions are as follows: We theoretically analyzed the factors influencing performance when using discounted reward as a proxy for total reward, thereby enhancing the theoretical understanding of this scenario. Furthermore, we developed methods to align the optimal policies of the two objectives in certain situations, which can improve the performance of reinforcement learning algorithms.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.13168 [pdf, other]

SciCode: A Research Coding Benchmark Curated by Scientists

Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, et al. (5 additional authors not shown)

Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we created a scientist-curated coding benchmark, SciCode. The problems in SciCode naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems. It offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards becoming helpful scientific assistants and sheds light on the development and evaluation of scientific AI in the future.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 25 pages, 9 figures, 7 tables

arXiv:2407.10416 [pdf, other]

SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively handle LTPP, as they solely focus on separate stage optimization, with most efforts confined to computational enhancements. By re-examining the end-to-end flow of dynamic sparse acceleration, we pinpoint an ever-overlooked opportunity that the LTPP can exploit the intrinsic coordination among stages to avoid excessive memory access and redundant computation. Motivated by our observation, we present SOFA, a cross-stage compute-memory efficient algorithm-hardware co-design, which is tailored to tackle the challenges posed by LTPP of Transformer inference effectively. We first propose a novel leading zero computing paradigm, which predicts attention sparsity by using log-based add-only operations to avoid the significant overhead of prediction. Then, a distributed sorting and a sorted updating FlashAttention mechanism are proposed with a cross-stage coordinated tiling principle, which enables fine-grained and lightweight coordination among stages, helping optimize memory access and latency. Further, we propose a SOFA accelerator to support these optimizations efficiently. Extensive experiments on 20 benchmarks show that SOFA achieves $9.5\times$ speed up and $71.5\times$ higher energy efficiency than Nvidia A100 GPU. Compared to 8 SOTA accelerators, SOFA achieves an average $15.8\times$ energy efficiency, $10.3\times$ area efficiency and $9.3\times$ speed up, respectively.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.07896 [pdf, other]

Pentagonal Photonic Crystal Mirrors: Scalable Lightsails with Enhanced Acceleration via Neural Topology Optimization

Authors: L. Norder, S. Yin, M. J. de Jong, F. Stallone, H. Aydogmus, P. M. Sberna, M. A. Bessa, R. A. Norte

Abstract: The Starshot Breakthrough Initiative aims to send one-gram microchip probes to Alpha Centauri within 20 years, using gram-scale lightsails propelled by laser-based radiation pressure, reaching velocities nearing a fifth of light speed. This mission requires lightsail materials that challenge the fundamentals of nanotechnology, requiring innovations in optics, material science and structural engineering. Unlike the microchip payload, which must be minimized in every dimension, such lightsails need meter-scale dimensions with nanoscale thickness and billions of nanoscale holes to enhance reflectivity and reduce mass. Our study employs neural topology optimization, revealing a novel pentagonal lattice-based photonic crystal (PhC) reflector. The optimized designs shorten acceleration times, therefore lowering launch costs significantly. Crucially, these designs also enable lightsail material fabrication with orders-of-magnitude reduction in costs. We have fabricated a 60 x 60 mm$^2$, 200 nm thick, single-layer reflector perforated with over a billion nanoscale features; the highest aspect-ratio nanophotonic element to date. We achieve this with nearly 9,000 times cost reduction per m$^2$. Starshot lightsails will have several stringent requirements but will ultimately be driven by costs to build at scale. Here we highlight challenges and possible solutions in developing lightsail materials - showcasing the potential of scaling nanophotonics for cost-effective next-generation space exploration.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.03966 [pdf, other]

Serialized Output Training by Learned Dominance

Authors: Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

Abstract: Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on either the Permutation Invariant Training (PIT) or the time-based First-In-First-Out (FIFO) rule. This study presents a model-based serialization strategy that incorporates an auxiliary module into the Attention Encoder-Decoder architecture, autonomously identifying the crucial factors to order the output sequence of the speech components in multi-talker speech. Experiments conducted on the LibriSpeech and LibriMix databases reveal that our approach significantly outperforms the PIT and FIFO baselines in both 2-mix and 3-mix scenarios. Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.

Submitted 4 July, 2024; originally announced July 2024.

Comments: accepted by INTERSPEECH 2024

arXiv:2406.10635 [pdf, other]

ROSfs: A User-Level File System for ROS

Authors: Zijun Xu, Xuanjun Wen, Yanjie Song, Shu Yin

Abstract: We present ROSfs, a novel user-level file system for the Robot Operating System (ROS). ROSfs interprets a robot file as a group of sub-files, with each having a distinct label. ROSfs applies a time index structure to enhance the flexible data query while the data file is under modification. It provides multi-robot systems (MRS) with prompt cross-robot data acquisition and collaboration. We implemented a ROSfs prototype and integrated it into a mainstream ROS platform. We then applied and evaluated ROSfs on real-world UAVs and data servers. Evaluation results show that compared with traditional ROS storage methods, ROSfs improves the offline query performance by up to 129x and reduces inter-robot online data query latency under a wireless network by up to 7x.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08148 [pdf, other]

Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation

Authors: Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

Abstract: Semi-gradient Q-learning is applied in many fields, but due to the absence of an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper introduces the Fokker--Planck equation and employs partial data obtained through sampling to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualization reveals how the global minima in the loss landscape can transform into saddle points in the effective loss landscape, as well as the implicit bias of the semi-gradient method. Additionally, we demonstrate that saddle points, originating from the global minima in the loss landscape, still exist in the effective loss landscape under high-dimensional parameter spaces and neural network settings. This paper develops a novel approach for probing implicit bias in semi-gradient Q-learning.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07421 [pdf, other]

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this study, we delve into two speaker augmentation approaches: speed perturbation (SP) and vocal tract length perturbation (VTLP). Despite the empirical utilization of both methods, a comprehensive investigation into their efficacy is lacking. Our study, conducted using two public datasets, VoxCeleb and CN-Celeb, revealed that both SP and VTLP are proficient at generating new speakers, leading to significant performance improvements in speaker recognition. Furthermore, they exhibit distinct properties in sensitivity to perturbation factors and data complexity, hinting at the potential benefits of their fusion. Our research underscores the substantial potential of speaker augmentation, highlighting the importance of in-depth exploration and analysis.

Submitted 11 June, 2024; originally announced June 2024.

Comments: to be published in INTERSPEECH 2024

arXiv:2406.03868 [pdf, other]

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex design space. Moreover, conducting actual training experiments to find optimal configurations is impractical due to time constraints. Hence, predicting the optimal mapping of various parallelisms to such tiled system architectures becomes crucial. In this study, leveraging an analysis of existing mainstream DL model training strategies, we introduce a performance simulator named PALM. PALM targets both the training and inference processes for tiled accelerators, aiming to inspire the design of current and future accelerators. Specifically, (i) we establish a scheduling mechanism among tiled accelerators based on an event-driven framework; (ii) we support user-configurable pipeline, tensor, and data parallelism on tiled accelerators, determining the absolute performance throughput under these parallelism strategies; (iii) we model the interaction of on-chip SRAM, NoC, and off-chip DRAM during operator execution. This work is available here: https://github.com/fangjh21/PALM.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2405.18132 [pdf, other]

EG4D: Explicit Generation of 4D Object without Score Distillation

Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and the Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose DG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporal-consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative results and user preference study demonstrate that our framework outperforms the baselines in generation quality by a considerable margin. Code will be released at https://github.com/jasongzy/EG4D.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17221 [pdf, other]

Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticated workflows. Specifically, we find that the intrinsic Dual Dynamicity of Orchestrated AI Workflows, namely dynamic execution times and frequencies of Task Blocks, can be effectively represented using the Orchestrated Workflow Graph. Furthermore, the intrinsic Dual Dynamicity poses challenges to existing spatial architecture, namely Indiscriminate Resource Allocation, Reactive Load Rebalancing, and Contagious PEA Idleness. To overcome these challenges, we present Octopus, a scale-out spatial architecture and a suite of advanced scheduling strategies optimized for executing Orchestrated AI Workflows, such as the Discriminate Dual-Scheduling Mechanism, Adaptive TBU Scheduling Strategy, and Proactive Cluster Scheduling Strategy. Our evaluations demonstrate that Octopus significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AI Workflows, and possesses robust scalability on large-scale hardware such as wafer-scale chips.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.15223 [pdf, other]

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Authors: Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long

Abstract: World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity poses challenges in harnessing recent advancements in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals--visual observations, actions, and rewards--into a sequence of tokens, facilitating an interactive experience of agents via next-token prediction. iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations. Leveraging its scalable architecture, we are able to pre-train iVideoGPT on millions of human and robotic manipulation trajectories, establishing a versatile foundation that is adaptable to serve as interactive world models for a wide range of downstream tasks. These include action-conditioned video prediction, visual planning, and model-based reinforcement learning, where iVideoGPT achieves competitive performance compared with state-of-the-art methods. Our work advances the development of interactive general world models, bridging the gap between generative video models and practical model-based reinforcement learning applications.

Submitted 2 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: Project website: https://thuml.github.io/iVideoGPT

arXiv:2405.10463 [pdf, other]

Single-shot volumetric fluorescence imaging with neural fields

Authors: Oumeng Zhang, Haowen Zhou, Brandon Y. Feng, Elin M. Larsson, Reinaldo E. Alcalde, Siyuan Yin, Catherine Deng, Changhuei Yang

Abstract: Single-shot volumetric fluorescence (SVF) imaging offers a significant advantage over traditional imaging methods that require scanning across multiple axial planes as it can capture biological processes with high temporal resolution across a large field of view. The key challenges in SVF imaging include requiring sparsity constraints to meet the multiplexing requirements of compressed sensing, eliminating depth ambiguity in the reconstruction, and maintaining high resolution across a large field of view. In this paper, we introduce the QuadraPol point spread function (PSF) combined with neural fields, a novel approach for SVF imaging. This method utilizes a custom polarizer at the back focal plane and a polarization camera to detect fluorescence, effectively encoding the 3D scene within a compact PSF without depth ambiguity. Additionally, we propose a reconstruction algorithm based on the neural fields technique that provides improved reconstruction quality and addresses the inaccuracies of phase retrieval methods used to correct imaging system aberrations. This algorithm combines the accuracy of experimental PSFs with the long depth of field of computationally generated retrieved PSFs. QuadraPol PSF, combined with neural fields, significantly reduces the acquisition time of a conventional fluorescence microscope by approximately 20 times and captures a 100 mm$^3$ cubic volume in one shot. We validate the effectiveness of both our hardware and algorithm through all-in-focus imaging of bacterial colonies on sand surfaces and visualization of plant root morphology. Our approach offers a powerful tool for advancing biological research and ecological studies.

Submitted 4 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.07551 [pdf, other]

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

Abstract: The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we first include new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them. The open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to get the resulting models, MuMath-Code ($μ$-Math-Code). During the inference phase, our MuMath-Code generates code and interacts with the external Python interpreter to get the execution results. Therefore, MuMath-Code leverages the advantages of both the external tool and data augmentation. To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. Our MuMath-Code-7B achieves 83.8 on GSM8K and 52.4 on MATH, while the MuMath-Code-70B model achieves new state-of-the-art performance among open methods -- achieving 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.

Submitted 13 May, 2024; originally announced May 2024.

Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

arXiv:2405.06887 [pdf, other]

FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Authors: Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng

Abstract: Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they suffer severely from low credibility and interpretability, and are thus insufficient for stringent applications, such as Olympic diving events. We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique. Based on this insight, we propose a new fine-grained spatial-temporal action parser named FineParser. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in time and space to minimize the impact of invalid backgrounds during the assessment. In addition, we construct fine-grained annotations of human-centric foreground action masks for the FineDiving dataset, called FineDiving-HM. With refined annotations on diverse target action procedures, FineDiving-HM can promote the development of real-world AQA systems. Through extensive experiments, we demonstrate the effectiveness of FineParser, which outperforms state-of-the-art methods while supporting more tasks of fine-grained action understanding. Data and code are available at https://github.com/PKU-ICST-MIPL/FineParser_CVPR2024.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR 2024

arXiv:2405.05722 [pdf, other]

A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and SO(3)-equivariant quantities and their representations. We first construct theoretical SO(3)-invariant quantities derived from the SO(3)-equivariant regression targets, and use these invariant quantities as supervisory labels to guide the learning of high-quality SO(3)-invariant features. Given that SO(3)-invariance is preserved under non-linear operations, the encoding process for invariant features can extensively utilize non-linear mappings, thereby fully capturing the non-linear patterns inherent in physical systems. Building on this foundation, we propose a gradient-based mechanism to induce SO(3)-equivariant encodings of various degrees from the learned SO(3)-invariant features. This mechanism can incorporate non-linear expressive capabilities into SO(3)-equivariant representations, while theoretically preserving their equivariant properties as we prove. We apply our theory and method to the electronic-structure Hamiltonian prediction tasks; experimental results on eight benchmark databases covering multiple types of elements and challenging scenarios show dramatic breakthroughs on the state-of-the-art prediction accuracy, with improvements of up to 40% in predicting Hamiltonians and up to 76% in predicting downstream physical quantities such as occupied orbital energy. Our approach goes beyond handling physical systems and offers a promising general solution to the critical dilemma between equivariance and non-linear expressiveness for the deep learning paradigm.

Submitted 18 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02155 [pdf, other]

Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification

Authors: Siqi Yin, Lifan Jiang

Abstract: This paper introduces a novel framework for zero-shot learning (ZSL), i.e., to recognize new categories that are unseen during training, by using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to enhance the model's performance to handle ZSL: 1) Utilizing the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to create reference images that can precisely describe unseen categories and classification boundaries, thereby alleviating the information bottleneck issue; 2) Integrating the results of text-image alignment and image-image alignment from CLIP, along with the image-image alignment results from DINO, to achieve more accurate predictions; 3) Introducing an adaptive weighting mechanism based on confidence levels to aggregate the outcomes from different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model can significantly improve classification accuracy compared to single-model approaches, achieving AUROC scores above 96% across all test datasets, and notably surpassing 99% on the CIFAR-10 dataset.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.18612 [pdf]

Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

Abstract: Environment awareness is crucial for enhancing the walking safety and stability of amputees wearing powered prostheses when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prostheses only provide terrain types and corresponding parameters, which fails to prevent potential collisions when crossing uneven terrains and may lead to falls and other severe consequences. In this paper, a visual-inertial motion estimation approach is proposed for a prosthesis to perceive its movement and the changes in the spatial relationship between the prosthesis and uneven terrain when traversing it. To achieve this, we estimate the knee motion by utilizing a depth camera to perceive the environment and align feature points extracted from stairs and obstacles. Subsequently, an error-state Kalman filter is incorporated to fuse the inertial data into visual estimations to reduce the feature extraction error and obtain a more robust estimation. The motions of the prosthetic joint and toe are derived using the prosthesis model parameters. Experiments conducted on our collected dataset and stair walking trials with a powered prosthesis show that the proposed method can accurately track the motion of the human leg and prosthesis, with an average root-mean-square error of the toe trajectory of less than 5 cm. The proposed method is expected to enable environment-adaptive control for prostheses, thereby enhancing amputees' safety and mobility in uneven terrains.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.13378 [pdf, other]

doi 10.1109/TIV.2024.3352180

Social Force Embedded Mixed Graph Convolutional Network for Multi-class Trajectory Prediction

Authors: Quancheng Du, Xiao Wang, Shouguo Yin, Lingxi Li, Huansheng Ning

Abstract: Accurate prediction of agent motion trajectories is crucial for autonomous driving, contributing to the reduction of collision risks in human-vehicle interactions and ensuring ample response time for other traffic participants. Current research predominantly focuses on traditional deep learning methods, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These methods leverage relative distances to forecast the motion trajectories of a single class of agents. However, in complex traffic scenarios, the motion patterns of various types of traffic participants exhibit inherent randomness and uncertainty. Relying solely on relative distances may not adequately capture the nuanced interaction patterns between different classes of road users. In this paper, we propose a novel multi-class trajectory prediction method named the social force embedded mixed graph convolutional network (SFEM-GCN). SFEM-GCN comprises three graph topologies: the semantic graph (SG), position graph (PG), and velocity graph (VG). These graphs encode various social force relationships among different classes of agents in complex scenes. Specifically, SG utilizes one-hot encoding of agent-class information to guide the construction of graph adjacency matrices based on semantic information. PG and VG create adjacency matrices to capture motion interaction relationships between different classes of agents. These graph structures are then integrated into a mixed graph, where learning is conducted using a spatiotemporal graph convolutional neural network (ST-GCNN). To further enhance prediction performance, we adopt temporal convolutional networks (TCNs) to generate the predicted trajectory with fewer parameters. Experimental results on publicly available datasets demonstrate that SFEM-GCN surpasses state-of-the-art methods in terms of accuracy and robustness.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 figures, published in IEEE Transactions on Intelligent Vehicles

arXiv:2404.12104 [pdf, other]

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools without necessitating internal model revision. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions by refining user commands and rectifying model outputs. Systematic evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess alignment capability. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models like DALLE 3, ensuring user-generated content adheres to ethical standards while maintaining image quality. This study indicates the potential of Ethical-Lens to ensure the sustainable development of open-source text-to-image tools and their beneficial integration into society. Our code is available at https://github.com/yuzhu-cai/Ethical-Lens.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 42 pages, 17 figures, 29 tables

arXiv:2404.06762 [pdf, other]

Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

Abstract: Intelligent Tutoring Systems (ITSs) can provide a personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantly enhance student engagement and learning efficiency. However, characterizing and simulating students' personas remains challenging in training and evaluating conversational ITSs. In this work, we propose a framework to construct profiles of different student groups by refining and integrating both cognitive and noncognitive aspects, and leverage LLMs for personality-aware student simulation in a language learning scenario. We further enhance the framework with multi-aspect validation, and conduct extensive analysis from both teacher and student perspectives. Our experimental results show that state-of-the-art LLMs can produce diverse student responses according to the given language ability and personality traits, and trigger teachers' adaptive scaffolding strategies.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06194 [pdf, other]

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Authors: Ting Lei, Shaofeng Yin, Yang Liu

Abstract: Open-vocabulary human-object interaction (HOI) detection, which is concerned with the problem of detecting novel HOIs guided by natural language, is crucial for understanding human-centric scenes. However, prior zero-shot HOI detectors often employ the same levels of feature maps to model HOIs with varying distances, leading to suboptimal performance in scenes containing human-object pairs with a wide range of distances. In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone. In this paper, we introduce a novel end-to-end open vocabulary HOI detection framework with conditional multi-level decoding and fine-grained semantic enhancement (CMD-SE), harnessing the potential of Visual-Language Models (VLMs). Specifically, we propose to model human-object pairs with different distances with different levels of feature maps by incorporating a soft constraint during the bipartite matching process. Furthermore, by leveraging large language models (LLMs) such as GPT models, we exploit their extensive world knowledge to generate descriptions of human body part states for various interactions. Then we integrate the generalizable and fine-grained semantics of human body parts to improve interaction recognition. Experimental results on two datasets, SWIG-HOI and HICO-DET, demonstrate that our proposed method achieves state-of-the-art results in open vocabulary HOI detection. The code and models are available at https://github.com/ltttpku/CMD-SE-release.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05412 [pdf]

Valley edge states as bound states in the continuum

Authors: Shunda Yin, Liping Ye, Hailong He, Xueqin Huang, Manzhu Ke, Weiyin Deng, Jiuyang Lu, Zhengyou Liu

Abstract: Bound states in the continuum (BICs) are spatially localized states with energy embedded in the continuum spectrum of extended states. The combination of BIC physics and nontrivial band topology theory gives rise to topological BICs, which are robust against disorder while retaining the merits of conventional BICs, and has recently attracted wide attention. Here, we report valley edge states as topological BICs, which appear at the domain wall between two distinct valley topological phases. The robustness of such BICs is demonstrated. The simulations and experiments show great agreement. Our findings of valley-related topological BICs shed light on both BICs and valley physics, and may foster innovative applications of topological acoustic devices.

Submitted 8 April, 2024; originally announced April 2024.

Comments: A revised version has been accepted by Science Bulletin

arXiv:2404.04449 [pdf]

Self-referencing photothermal common-path interferometry to measure absorption of Si3N4 membranes for laser-light sails

Authors: Demeng Feng, Tanuj Kumar, Shenwei Yin, Merlin Mah, Phyo Lin, Margaret Fortman, Gabriel R. Jaffe, Chenghao Wan, Hongyan Mei, Yuzhe Xiao, Ron Synowicki, Ronald J. Warzoha, Victor W. Brar, Joseph J. Talghader, Mikhail A. Kats

Abstract: Laser-light sails are a spacecraft concept wherein lightweight "sails" are propelled to high speeds by lasers with high intensities. The sails must comprise materials with low optical loss, to minimize the risk of laser damage. Stoichiometric silicon nitride (Si$_3$N$_4$) is a candidate material with low loss in the near infrared, but the precise absorption coefficient has not been characterized in the membrane form-factor needed for sails. We use photothermal common-path interferometry (PCI), a sensitive pump-probe technique, to measure the absorption coefficient of stoichiometric and nonstoichiometric silicon nitride. To calibrate PCI measurements of membranes, we developed a self-referencing technique where a measurement is performed twice: once on a bare membrane, and a second time with a monolayer of graphene deposited on the membrane. The absorption of the sample with graphene can be measured by both PCI and more-conventional spectroscopic techniques, enabling the calibration of the PCI measurement. We find that with an absorption coefficient of (2.09 $\pm$ 0.76) $\times$ 10$^{-2}$ cm$^{-1}$ at 1064 nm, Si$_3$N$_4$ is a suitable laser-sail material for laser intensities as high as ~10 GW/m$^{2}$, which have been proposed for some laser-sail missions, while silicon-rich SiN$_x$ (x~1), with an absorption coefficient of 7.94 $\pm$ 0.50 cm$^{-1}$, is unlikely to survive such high laser intensities.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Main text + supplementary

arXiv:2404.03681 [pdf, other]

Muon beamtest results of high-density glass scintillator tiles

Authors: Dejing Du, Yong Liu, Hua Cai, Danping Chen, Zhehao Hua, Jifeng Han, Jifeng Han, Baohua Qi, Sen Qian, Jing Ren, Xinyuan Sun, Xinyuan Sun, Dong Yang, Shenghua Yin, Minghui Zhang

Abstract: To achieve the physics goal of precisely measuring the Higgs, Z, W bosons and the top quark, future electron-positron colliders require that their detector system has excellent jet energy resolution. One feasible technical option is high-granularity calorimetry based on the particle flow algorithm (PFA). A new high-granularity hadronic calorimeter with glass scintillator tiles (GSHCAL) has been proposed, which focuses on the significant improvement of hadronic energy resolution with a notable increase of the energy sampling fraction by using high-density glass scintillator tiles. The minimum ionizing particle (MIP) response of a glass scintillator tile is crucial to the hadronic calorimeter, so a dedicated beamtest setup was developed for testing the first batch of large-size glass scintillators. The maximum MIP response of the first batch of glass scintillator tiles can reach up to 107 p.e./MIP, which essentially meets the design requirements of the CEPC GSHCAL. An optical simulation model of a single glass scintillator tile has been established, and the simulation results are consistent with the beamtest results.

Submitted 9 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.03429 [pdf, other]

Scaffolding Language Learning via Multi-modal Tutoring Systems with Pedagogical Instructions

Authors: Zhengyuan Liu, Stella Xin Yin, Carolyn Lee, Nancy F. Chen

Abstract: Intelligent tutoring systems (ITSs) that imitate human tutors and aim to provide immediate and customized instructions or feedback to learners have shown their effectiveness in education. With the emergence of generative artificial intelligence, large language models (LLMs) further entitle the systems to complex and coherent conversational interactions. These systems would be of great help in language education as it involves developing skills in communication, which, however, drew relatively less attention. Additionally, due to the complicated cognitive development at younger ages, more endeavors are needed for practical uses. Scaffolding refers to a teaching technique where teachers provide support and guidance to students for learning and developing new concepts or skills. It is an effective way to support diverse learning needs, goals, processes, and outcomes. In this work, we investigate how pedagogical instructions facilitate the scaffolding in ITSs, by conducting a case study on guiding children to describe images for language learning. We construct different types of scaffolding tutoring systems grounded in four fundamental learning theories: knowledge construction, inquiry-based learning, dialogic teaching, and zone of proximal development. For qualitative and quantitative analyses, we build and refine a seven-dimension rubric to evaluate the scaffolding process. In our experiment on GPT-4V, we observe that LLMs demonstrate strong potential to follow pedagogical instructions and achieve self-paced learning in different student groups. Moreover, we extend our evaluation framework from a manual to an automated approach, paving the way to benchmark various conversational tutoring systems.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.19258 [pdf, other]

Finite-time Scaling beyond the Kibble-Zurek Prerequisite: Driven Critical Dynamics in Strongly Interacting Dirac Systems

Authors: Zhi Zeng, Yin-Kai Yu, Zhi-Xuan Li, Zi-Xiang Li, Shuai Yin

Abstract: In conventional quantum critical point (QCP) characterized by order parameter fluctuations, the celebrated Kibble-Zurek mechanism (KZM) and finite-time scaling (FTS) theory provide universal descriptions of the driven critical dynamics. However, in strongly correlated fermionic systems where gapless fermions are usually present in vicinity of QCP, the driven dynamics has rarely been explored. In this Letter, we investigate the driven critical dynamics in two-dimensional Dirac systems, which harbor semimetal and Mott insulator phases separated by the QCP triggered by the interplay between fluctuations of gapless Dirac fermions and order-parameter bosons. By studying the evolution of physical quantities for different driving rates through large-scale quantum Monte Carlo simulation, we confirm that the driven dynamics is described by the FTS form. Accordingly, our results significantly generalize the KZM theory by relaxing its requirement for a gapped initial state to the system accommodating gapless Dirac fermionic excitation. Through successfully extending the KZM and FTS theory to Dirac QCP, our work not only brings new fundamental perspective into the nonequilibrium critical dynamics, but also provides a novel theoretical approach to fathom quantum critical properties in fermionic systems.

Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 9+3 pages, 5+2 figures

arXiv:2403.09084 [pdf, other]

doi 10.1103/PhysRevB.109.184303

Imaginary-time relaxation quantum critical dynamics in two-dimensional dimerized Heisenberg model

Authors: Jia-Qi Cai, Yu-Rong Shu, Xue-Qing Rao, Shuai Yin

Abstract: We study the imaginary-time relaxation critical dynamics of the Neel-paramagnetic quantum phase transition in the two-dimensional (2D) dimerized S = 1/2 Heisenberg model. We focus on the scaling correction in the short-time region. A unified scaling form including both short-time and finite-size corrections is proposed. According to this full scaling form, improved short-imaginary-time scaling relations are obtained. We numerically verify the scaling form and the improved short-time scaling relations for different initial states using projector quantum Monte Carlo algorithm.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 10 pages, 8 figures

Journal ref: Phys. Rev. B 109, 184303 (2024)

arXiv:2403.08459 [pdf, other]

Symmetry restoration and quantum Mpemba effect in symmetric random circuits

Authors: Shuo Liu, Hao-Kai Zhang, Shuai Yin, Shi-Xin Zhang

Abstract: Entanglement asymmetry, which serves as a diagnostic tool for symmetry breaking and a proxy for thermalization, has recently been proposed and studied in the context of symmetry restoration for quantum many-body systems undergoing a quench. In this Letter, we investigate symmetry restoration in various symmetric random quantum circuits, particularly focusing on the U(1) symmetry case. In contrast to non-symmetric random circuits where the U(1) symmetry of a small subsystem can always be restored at late times, we reveal that symmetry restoration can fail in U(1)-symmetric circuits for certain weak symmetry-broken initial states in finite-size systems. In the early-time dynamics, we observe an intriguing quantum Mpemba effect implying that symmetry is restored faster when the initial state is more asymmetric. Furthermore, we also investigate the entanglement asymmetry dynamics for SU(2) and $Z_{2}$ symmetric circuits and identify the presence and absence of the quantum Mpemba effect for the corresponding symmetries, respectively. A unified understanding of these results is provided through the lens of quantum thermalization with conserved charges.

Submitted 16 August, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

Comments: 4.5 pages, 5 figures, and Supplemental Material

arXiv:2403.06770 [pdf, other]

Estimates on the convergence of expansions at finite baryon chemical potentials

Authors: Rui Wen, Shi Yin, Wei-jie Fu

Abstract: Convergence of three different expansion schemes at finite baryon chemical potentials, including the conventional Taylor expansion, the Padé approximants, and the $T'$ expansion proposed recently in lattice QCD simulations, has been investigated in a low energy effective theory within the fRG approach. It is found that the $T'$ expansion or the Padé approximants would hardly improve the convergence of expansion in comparison to the conventional Taylor expansion, within the expansion orders considered in this work. Furthermore, we find that the consistent regions of the three different expansions are in agreement with the convergence radius of the Lee-Yang edge singularities.

Submitted 19 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures

arXiv:2403.04481 [pdf, other]

Do Large Language Model Understand Multi-Intent Spoken Language ?

Authors: Shangjian Yin, Peijie Huang, Yuhong Xu, Haojing Huang, Jiatian Chen

Abstract: This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of Sub-Intent Instruction (SII) to amplify the analysis and interpretation of complex, multi-intent communications, which further supports the creation of the ENSI-LLM models series. Our novel datasets, identified as LM-MixATIS and LM-MixSNIPS, are synthesized from existing benchmarks. The study evidences that LLMs may match or even surpass the performance of the current best multi-intent SLU models. We also scrutinize the performance of LLMs across a spectrum of intent configurations and dataset distributions. On top of this, we present two revolutionary metrics - Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA) - to facilitate a detailed assessment of LLM competence in this multifaceted field. Our code and datasets are available at https://github.com/SJY8460/SLM.

Submitted 15 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03742 [pdf, other]

Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design

Authors: Ruotong Zou, Shuyu Yin, Tianqi Song, Peinuan Qin, Yi-Chieh Lee

Abstract: As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape room game. To ensure consistent and reliable performance data of older players, our experimenters simulated older participants following specific guidelines. After completing the game, we found a significant reduction in ageism among younger participants. Furthermore, we introduce a new game mechanism that encourages intergenerational collaboration. Our research highlights the potential of VR collaborative games as a practical tool for mitigating ageism. It provides valuable insights for designing immersive VR experiences that foster enhanced intergenerational collaboration.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.00019 [pdf, other]

Transformer-based Parameter Estimation in Statistics

Authors: Xiaoxin Yin, David S. Yin

Abstract: Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for Gaussian distribution), or by iterative numerical methods such as Newton-Raphson method when closed-form solution does not exist (e.g., for Beta distribution). In this paper we propose a transformer-based approach to parameter estimation. Compared with existing solutions, our approach does not require a closed-form solution or any mathematical derivations. It does not even require knowing the probability density function, which is needed by numerical methods. After the transformer model is trained, only a single inference is needed to estimate the parameters of the underlying distribution based on a sample of observations. In the empirical study we compared our approach with maximum likelihood estimation on commonly used distributions such as normal distribution, exponential distribution and beta distribution. It is shown that our approach achieves similar or better accuracy as measured by mean-square-errors.

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2402.16899 [pdf, other]

A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning

Authors: Shuyu Yin, Qixuan Zhou, Fei Wen, Tao Luo

Abstract: Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, are unable to directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method that is applicable to all such problems where the transition function satisfies semi-group and Lipschitz properties. Under this method, we can directly analyze the a priori generalization error of the Bellman optimal loss. The core of this method lies in two transformations of the loss function. To complete the transformation, we propose a decomposition method for the maximum operator. Additionally, this analysis method does not require a boundedness assumption. Finally, we obtain an a priori generalization error without the curse of dimensionality.

Submitted 7 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.16272 [pdf, other]

Mass production and performance study on the 20-inch PMT acrylic protection covers in JUNO

Authors: Miao He, Zhonghua Qin, Diru Wu, Meihang Xu, Wan Xie, Fang Chen, Xiaoping Jing, Genhua Yin, Shengjiong Yin, Linhua Gu, Xiaofeng Xia, Qinchang Wang

Abstract: The Jiangmen Underground Neutrino Observatory is a neutrino experiment that incorporates 20,012 20-inch photomultiplier tubes (PMTs) and 25,600 3-inch PMTs. A dedicated system was designed to protect the PMTs from an implosion chain reaction underwater. As a crucial element of the protection system, over 20,000 acrylic covers were manufactured through injection molding, ensuring high dimensional precision, mechanical strength, and transparency. This paper presents the manufacturing technology, mass production process, and performance characteristics of the acrylic covers.

Submitted 25 February, 2024; originally announced February 2024.

Comments: 12 pages, 10 figures

arXiv:2402.14634 [pdf, other]

doi 10.1145/3636534.3649376

GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame

Authors: Ke Li, Ruidong Zhang, Boao Chen, Siyuan Chen, Sicheng Yin, Saif Mahmud, Qikang Liang, François Guimbretière, Cheng Zhang

Abstract: In this paper, we present GazeTrak, the first acoustic-based eye tracking system on glasses. Our system only needs one speaker and four microphones attached to each side of the glasses. These acoustic sensors capture the formations of the eyeballs and the surrounding areas by emitting encoded inaudible sound towards eyeballs and receiving the reflected signals. These reflected signals are further processed to calculate the echo profiles, which are fed to a customized deep learning pipeline to continuously infer the gaze position. In a user study with 20 participants, GazeTrak achieves an accuracy of 3.6° within the same remounting session and 4.9° across different sessions with a refreshing rate of 83.3 Hz and a power signature of 287.9 mW. Furthermore, we report the performance of our gaze tracking system fully implemented on an MCU with a low-power CNN accelerator (MAX78002). In this configuration, the system runs at up to 83.3 Hz and has a total power signature of 95.4 mW with a 30 Hz FPS.

Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 16 pages, 5 figures, 7 tables, The 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2024)

arXiv:2402.12823 [pdf, other]

The influence of hadronic rescatterings on the net-baryon number fluctuations

Authors: Qian Chen, Rui Wen, Shi Yin, Wei-jie Fu, Zi-Wei Lin, Guo-Liang Ma

Abstract: Fluctuations of conserved charges, such as the net-baryon number fluctuations, are influenced by different dynamical evolution processes. In this paper, we investigate the influence of hadronic rescatterings on different orders of cumulants of the net-baryon number distribution. At the start of hadronic rescatterings, we introduce net-baryon number distributions reconstructed based on net-baryon cumulants of different orders obtained from computation in functional renormalization group (FRG), where the distributions were constructed using the maximum entropy method. This way we introduce the critical fluctuations of Quantum Chromodynamics (QCD) into the AMPT model. Firstly, we find that hadronic rescatterings have distinct effects on cumulant ratios of different orders for the net-baryon number. Secondly, we observe that the effect of hadronic rescatterings is more significant for critical fluctuations than dynamical fluctuations, because the two-, three- and four-particle correlation functions due to critical fluctuations are weakened more significantly by hadronic rescatterings.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 10 pages, 8 figures

arXiv:2402.10534 [pdf, other]

Using Left and Right Brains Together: Towards Vision and Language Planning

Authors: Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

Abstract: Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision making capabilities on a variety of tasks. However, they inherently operate planning within the language space, lacking the vision and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. The results demonstrate the superior performance of our approach, indicating that the integration of visual and language planning yields better contextually aware task execution.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 19 pages, 13 figures

arXiv:2402.02140 [pdf, other]

Generative Visual Compression: A Review

Authors: Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye

Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promising applications in ultra-low bitrate communication, user-specified reconstruction/filtering, and intelligent machine analysis. In particular, we review the visual data compression methodologies with deep generative models, and summarize how compact representation and high-fidelity reconstruction could be actualized via generative techniques. In addition, we generalize related generative compression technologies for machine vision and intelligent analytics. Finally, we discuss the fundamental challenges on generative visual compression techniques and envision their future research directions.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.01271 [pdf, other]

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

Authors: Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

Abstract: Recently, neural networks have proven to be effective in performing speech coding task at low bitrates. However, under-utilization of intra-frame correlations and the error of quantizer specifically degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20ms with no additional latency, which is suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3kbps with Opus at 12kbps.

Submitted 2 February, 2024; originally announced February 2024.

Comments: INTERSPEECH 2023

Showing 1–50 of 307 results for author: Yin, S