-
CogVLM2: Visual Language Models for Image and Video Understanding
Authors:
Wenyi Hong,
Weihan Wang,
Ming Ding,
Wenmeng Yu,
Qingsong Lv,
Yan Wang,
Yean Cheng,
Shiyu Huang,
Junhui Ji,
Zhao Xue,
Lei Zhao,
Zhuoyi Yang,
Xiaotao Gu,
Xiaohan Zhang,
Guanyu Feng,
Da Yin,
Zihan Wang,
Ji Qi,
Xixuan Song,
Peng Zhang,
Debing Liu,
Bin Xu,
Juanzi Li,
Yuxiao Dong,
Jie Tang
Abstract:
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation of visual language models for image and video understanding including CogVLM2, CogVLM2-Video and GLM-4V. As an image understanding model, CogVLM2 inherits the visual expert architecture with improved training recipes in both pre-training and post-training stages, supporting input resolution up to $1344 \times 1344$ pixels. As a video understanding model, CogVLM2-Video integrates multi-frame input with timestamps and proposes automated temporal grounding data construction. Notably, CogVLM2 family has achieved state-of-the-art results on benchmarks like MMBench, MM-Vet, TextVQA, MVBench and VCGBench. All models are open-sourced in https://github.com/THUDM/CogVLM2 and https://github.com/THUDM/GLM-4, contributing to the advancement of the field.
Submitted 29 August, 2024;
originally announced August 2024.
-
Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation
Authors:
Tri Ton,
Ji Woo Hong,
SooHwan Eom,
Jun Yeop Shim,
Junyeong Kim,
Chang D. Yoo
Abstract:
Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask proposal models; consequently, the information that could come from 2D association to 3D was not fully exploited. This bias towards 3D data, while effective for familiar indoor objects, limits the system's adaptability to new and varied object types, where 2D models offer greater utility. Addressing this gap, we introduce Zero-Shot Dual-Path Integration Framework that equally values the contributions of both 3D and 2D modalities. Our framework comprises three components: 3D pathway, 2D pathway, and Dual-Path Integration. 3D pathway generates spatially accurate class-agnostic mask proposals of common indoor objects from 3D point cloud data using a pre-trained 3D model, while 2D pathway utilizes pre-trained open-vocabulary instance segmentation model to identify a diverse array of object proposals from multi-view RGB-D images. In Dual-Path Integration, our Conditional Integration process, which operates in two stages, filters and merges the proposals from both pathways adaptively. This process harmonizes output proposals to enhance segmentation capabilities. Our framework, utilizing pre-trained models in a zero-shot manner, is model-agnostic and demonstrates superior performance on both seen and unseen data, as evidenced by comprehensive evaluations on the ScanNet200 and qualitative results on ARKitScenes datasets.
Submitted 16 August, 2024;
originally announced August 2024.
-
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Authors:
Xiao Liu,
Tianjie Zhang,
Yu Gu,
Iat Long Iong,
Yifan Xu,
Xixuan Song,
Shudan Zhang,
Hanyu Lai,
Xinyi Liu,
Hanlin Zhao,
Jiadai Sun,
Xinyue Yang,
Yu Yang,
Zehan Qi,
Shuntian Yao,
Xueqiao Sun,
Siyi Cheng,
Qinkai Zheng,
Hao Yu,
Hanchen Zhang,
Wenyi Hong,
Ming Ding,
Lihang Pan,
Xiaotao Gu,
Aohan Zeng
, et al. (5 additional authors not shown)
Abstract:
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligence. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments. To address this gap, we introduce VisualAgentBench (VAB), a comprehensive and pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents across diverse scenarios, including Embodied, Graphical User Interface, and Visual Design, with tasks formulated to probe the depth of LMMs' understanding and interaction capabilities. Through rigorous testing across nine proprietary LMM APIs and eight open models, we demonstrate the considerable yet still developing agent capabilities of these models. Additionally, VAB provides a trajectory training set constructed through hybrid methods including Program-based Solvers, LMM Agent Bootstrapping, and Human Demonstrations, promoting substantial performance improvements in LMMs through behavior cloning. Our work not only aims to benchmark existing models but also provides a solid foundation for future development into visual foundation agents. Code, train & test data, and part of fine-tuned open LMMs are available at https://github.com/THUDM/VisualAgentBench.
Submitted 12 August, 2024;
originally announced August 2024.
-
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Authors:
Zhuoyi Yang,
Jiayan Teng,
Wendi Zheng,
Ming Ding,
Shiyu Huang,
Jiazheng Xu,
Yuanming Yang,
Wenyi Hong,
Xiaohan Zhang,
Guanyu Feng,
Da Yin,
Xiaotao Gu,
Yuxuan Zhang,
Weihan Wang,
Yean Cheng,
Ting Liu,
Bin Xu,
Yuxiao Dong,
Jie Tang
Abstract:
We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep fusion between the two modalities. By employing a progressive training technique, CogVideoX is adept at producing coherent, long-duration videos characterized by significant motions. In addition, we develop an effective text-video data processing pipeline that includes various data preprocessing strategies and a video captioning method. It significantly helps enhance the performance of CogVideoX, improving both generation quality and semantic alignment. Results show that CogVideoX demonstrates state-of-the-art performance across both multiple machine metrics and human evaluations. The model weights of both the 3D Causal VAE and CogVideoX are publicly available at https://github.com/THUDM/CogVideo.
Submitted 12 August, 2024;
originally announced August 2024.
-
FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
Authors:
Gwanhyeong Koo,
Sunjae Yoon,
Ji Woo Hong,
Chang D. Yoo
Abstract:
Current image editing methods primarily utilize DDIM Inversion, employing a two-branch diffusion approach to preserve the attributes and layout of the original image. However, these methods encounter challenges with non-rigid edits, which involve altering the image's layout or structure. Our comprehensive analysis reveals that the high-frequency components of DDIM latent, crucial for retaining the original image's key features and layout, significantly contribute to these limitations. Addressing this, we introduce FlexiEdit, which enhances fidelity to input text prompts by refining DDIM latent, by reducing high-frequency components in targeted editing areas. FlexiEdit comprises two key components: (1) Latent Refinement, which modifies DDIM latent to better accommodate layout adjustments, and (2) Edit Fidelity Enhancement via Re-inversion, aimed at ensuring the edits more accurately reflect the input text prompts. Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits, showcasing its enhanced capability through comparative experiments.
Submitted 25 July, 2024;
originally announced July 2024.
-
Necklace-like pattern of vortex bound states
Authors:
Zhiyong Hou,
Kailun Chen,
Wenshan Hong,
Da Wang,
Wen Duan,
Huan Yang,
Shiliang Li,
Huiqian Luo,
Qiang-Hua Wang,
Tao Xiang,
Hai-Hu Wen
Abstract:
Vortex is a topological defect in the superconducting condensate when a magnetic field is applied to a type-II superconductor, as elucidated by the Ginzburg-Landau theory. Due to the confinement of the quasiparticles by a vortex, it exhibits a circular shaped pattern of bound states with discrete energy levels, as predicted by the Caroli-de Gennes-Matricon theory in 1964. Here, however, we report a completely new type of vortex pattern which is necklace-like in an iron-based superconductor KCa2Fe4As4F2. Our theoretical analysis shows that this necklace-like vortex pattern arises from selective off-shell interference between vortex bound states of opposite angular momenta in the presence of rotational symmetry breaking due to disorders. This fascinating effect can be observed in a system with a small Fermi energy and wave vector, conditions fortuitously met in our samples. Our results not only disclose a novel vortex structure but also provide insights into comprehending the physics of the superconducting condensate.
Submitted 11 July, 2024;
originally announced July 2024.
-
Regularization by Nonlinear Noise for PDEs: Well-posedness and Finite Time Extinction
Authors:
Wei Hong,
Shihu Li,
Wei Liu
Abstract:
This work focuses on the regularization by nonlinear noise for a class of partial differential equations that may only have local solutions. In particular, we obtain the global existence, uniqueness and the Feller property for stochastic 3D Navier-Stokes equations, which provide positive answers to some longstanding open problems in this field.
Moreover, we discover a new phenomenon that for a potentially explosive deterministic system, an appropriate intervention of nonlinear noise can not only prevent blow-up but also lead to the finite time extinction of the associated stochastic system. Our main results have broad applications, including stochastic $p$-Laplace equations with heat sources, stochastic surface growth models and stochastic quasi-geostrophic equations.
Submitted 30 August, 2024; v1 submitted 9 July, 2024;
originally announced July 2024.
-
Actuation system of the inertial sensor for high-precision space missions using torsion pendulum
Authors:
Fangchao Yang,
Yan Zhu,
Xiaofei Jin,
Yujie Zhao,
Shixun Pei,
Wei Hong
Abstract:
Precision space inertial sensors are imperative to Earth geodesy missions, gravitational wave observations and several fundamental physics experiments in space. In these missions, the residual acceleration noise of the test mass (TM) caused by the forces from inertial sensor components and environment is supposed to be kept below a certain level. As a number of forces contributing to residual acceleration are related to the actuation system, developing a precise actuation system to exclude any erroneous force and obtain an ultra-sensitive value for TM acceleration noise is essential. However, it is difficult to test the actuation system on the ground. In this paper, a torsion pendulum is established to test the influence of the actuation system on TM torque noise, and a closed-loop control system combining the torsion pendulum with parts of the actuation modules is designed to assess the performance of the actuation control algorithm. The experimental results show that the parameters in an actuation system will introduce additional torque noise and the maximum noise can reach as much as $10^{-13}$ N m/Hz$^{1/2}$ at 1 mHz. The stable tracking error for the closed-loop system is about $10^{-7}$, indicating that the combined system achieves good tracking performance and robustness for TM rotation control in different conditions of inertial sensors.
Submitted 10 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Asymptotic behavior for the quenched survival probability of a supercritical branching random walk in random environment with a barrier
Authors:
You Lv,
Wenming Hong
Abstract:
We introduce a random barrier to a supercritical branching random walk in an i.i.d. random environment $\{\mathcal{L}_n\}$ indexed by time $n,$ i.e., in each generation, only the individuals born below the barrier can survive and reproduce. At generation $n$ ($n\in\mathbb{N}$), the barrier is set as $χ_n+\varepsilon n,$ where $\{χ_n\}$ is a random walk determined by the random environment. Lv & Hong (2024) showed that for almost every $\mathcal{L}:=\{\mathcal{L}_n\},$ the quenched survival probability (denoted by $\varrho_{\mathcal{L}}(\varepsilon)$) of the particle system is 0 (resp., positive) when $\varepsilon\leq 0$ (resp., $\varepsilon>0$). In the present paper, we prove that $\sqrt{\varepsilon}\log\varrho_\mathcal{L}(\varepsilon)$ converges in probability, almost surely, or in $L^p$ to an explicit negative constant (depending on the environment) as $\varepsilon\downarrow 0$, under respective integrability conditions. This result extends the scope of the result of Gantert et al. (2011) to the random environment case.
Submitted 21 June, 2024;
originally announced June 2024.
-
LVBench: An Extreme Long Video Understanding Benchmark
Authors:
Weihan Wang,
Zehai He,
Wenyi Hong,
Yean Cheng,
Xiaohan Zhang,
Ji Qi,
Shiyu Huang,
Bin Xu,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the demands of real-world applications such as embodied intelligence for long-term decision-making, in-depth movie reviews and discussions, and live sports commentary, all of which require comprehension of long videos spanning several hours. To address this gap, we introduce LVBench, a benchmark specifically designed for long video understanding. Our dataset comprises publicly sourced videos and encompasses a diverse set of tasks aimed at long video comprehension and information extraction. LVBench is designed to challenge multimodal models to demonstrate long-term memory and extended comprehension capabilities. Our extensive evaluations reveal that current multimodal models still underperform on these demanding long video understanding tasks. Through LVBench, we aim to spur the development of more advanced models capable of tackling the complexities of long video comprehension. Our data and code are publicly available at: https://lvbench.github.io.
Submitted 12 June, 2024;
originally announced June 2024.
-
Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents
Authors:
John L. Zhou,
Weizhe Hong,
Jonathan C. Kao
Abstract:
Emergent cooperation among self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents. Instead, naïve reinforcement learning algorithms typically converge to Pareto-dominated outcomes in even the simplest of social dilemmas. An emerging class of opponent-shaping methods have demonstrated the ability to reach prosocial outcomes by influencing the learning of other agents. However, they rely on higher-order derivatives through the predicted learning step of other agents or learning meta-game dynamics, which in turn rely on stringent assumptions over opponent learning rules or exponential sample complexity, respectively. To provide a learning rule-agnostic and sample-efficient alternative, we introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of an opponent's actions on their returns. This approach effectively seeks to modify other agents' $Q$-values by increasing their return following beneficial actions (with respect to the Reciprocator) and decreasing it after detrimental actions, guiding them towards mutually beneficial actions without attempting to directly shape policy updates. We show that Reciprocators can be used to promote cooperation in a variety of temporally extended social dilemmas during simultaneous learning.
Submitted 3 June, 2024;
originally announced June 2024.
-
Global-Local Detail Guided Transformer for Sea Ice Recognition in Optical Remote Sensing Images
Authors:
Zhanchao Huang,
Wenjun Hong,
Hua Su
Abstract:
The recognition of sea ice is of great significance for reflecting climate change and ensuring the safety of ship navigation. Recently, many deep learning based methods have been proposed and applied to segment and recognize sea ice regions. However, the diverse scales of sea ice areas, the zigzag and fine edge contours, and the difficulty in distinguishing different types of sea ice pose challenges to existing sea ice recognition models. In this paper, a Global-Local Detail Guided Transformer (GDGT) method is proposed for sea ice recognition in optical remote sensing images. In GDGT, a global-local feature fusion mechanism is designed to fuse global structural correlation features and local spatial detail features. Furthermore, a detail-guided decoder is developed to retain more high-resolution detail information during feature reconstruction for improving the performance of sea ice recognition. Experiments on the produced sea ice dataset demonstrated the effectiveness and advancement of GDGT.
Submitted 21 May, 2024;
originally announced May 2024.
-
A Flat Dual-Polarized Millimeter-Wave Luneburg Lens Antenna Using Transformation Optics with Reduced Anisotropy and Impedance Mismatch
Authors:
Yuanyan Su,
Teng Li,
Wei Hong,
Zhi Ning Chen,
Anja K. Skrivervik
Abstract:
In this paper, a compact wideband dual-polarized Luneburg lens antenna (LLA) with reduced anisotropy and improved impedance matching is proposed in Ka band with a wide 2D beamscanning capability. Based on transformation optics, the spherical Luneburg lens is compressed into a cylindrical one, while the merits of high gain, broad band, wide scanning, and free polarization are preserved. A trigonometric function is applied to the material property of the flattened Luneburg lens with reduced anisotropy, which effectively alleviates the strong reflection, the high sidelobes, and the back radiation at no cost in antenna weight or volume. Furthermore, a light, thin, wideband 7-by-1 metasurface phased array is studied as the primary feed for the LLA. The proposed metantenna, short for metamaterial-based antenna, has high potential for B5G, future wireless communication, and radar sensing as an onboard system.
Submitted 20 May, 2024;
originally announced May 2024.
-
Precise large deviation for stationary sequence of branching process with immigration
Authors:
Jiayan Guo,
Wenming Hong
Abstract:
It is known that, under some conditions, a stationary sequence of the branching process with immigration $\{X_{n}\}_{n\in\mathbb{Z}}$ exists when the offspring is critical or subcritical (Foster and Williamson (1971)). A precise large deviation probability for the partial sum $S_{n}=X_{1}+\cdots+X_{n}$ is specified, and a significant difference between the critical and subcritical cases is revealed.
Submitted 8 May, 2024;
originally announced May 2024.
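To make the object in the abstract above concrete, here is a minimal sketch of a subcritical branching process with immigration, $X_{n+1}=\sum_{i=1}^{X_n}\xi_i+\eta_{n+1}$, together with its partial sum $S_n$. The Bernoulli offspring and immigration laws, the parameter values, and the function names are illustrative assumptions and are not taken from the paper. In the subcritical case the stationary mean is $E\eta/(1-E\xi)$, which the simulation should approximately reproduce.

```python
import random


def simulate_bpi(steps, p_offspring=0.5, p_immigrant=0.5, burn_in=1000, seed=0):
    """Simulate X_{n+1} = (offspring of the X_n individuals) + immigrants.

    Assumed laws (hypothetical, for illustration only):
    each individual leaves one child with probability p_offspring, else none;
    one immigrant arrives per generation with probability p_immigrant.
    Returns the path after a burn-in period, as a proxy for the stationary regime.
    """
    rng = random.Random(seed)
    x = 0
    path = []
    for n in range(burn_in + steps):
        offspring = sum(1 for _ in range(x) if rng.random() < p_offspring)
        immigrants = 1 if rng.random() < p_immigrant else 0
        x = offspring + immigrants
        if n >= burn_in:
            path.append(x)
    return path


def stationary_mean(mean_offspring, mean_immigration):
    """Mean of the stationary law in the subcritical case (mean_offspring < 1)."""
    return mean_immigration / (1.0 - mean_offspring)


path = simulate_bpi(200_000)
partial_sum = sum(path)                 # S_n = X_1 + ... + X_n
empirical_mean = partial_sum / len(path)
print(empirical_mean, stationary_mean(0.5, 0.5))
```

The sketch only illustrates the law of large numbers scale $S_n \approx n\,E X_1$; the large deviation probabilities studied in the paper concern the tails of $S_n$ around that value and are not estimated here.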
-
Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Authors:
Zhuoyi Yang,
Heyang Jiang,
Wenyi Hong,
Jiayan Teng,
Wendi Zheng,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory when generating ultra-high-resolution images (e.g. 4096*4096), the resolution of generated images is often limited to 1024*1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096*4096 images. The project URL is https://github.com/THUDM/Inf-DiT.
Submitted 8 May, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Quasi-stationary distributions for subcritical branching Markov chains
Authors:
Wenming Hong,
Dan Yao
Abstract:
Consider a subcritical branching Markov chain. Let $Z_n$ denote the counting measure of particles of generation $n$. Under some conditions, we give a probabilistic proof for the existence of the Yaglom limit of $(Z_n)_{n\in\mathbb{N}}$ by the moment method, based on the spinal decomposition and the many-to-few formula. As a result, we give explicit integral representations of all quasi-stationary distributions of $(Z_n)_{n\in\mathbb{N}}$, whose proofs are direct and probabilistic, and don't rely on Martin boundary theory.
Submitted 7 May, 2024;
originally announced May 2024.
-
Precise Large Deviations For The Total Population Of Heavy-Tailed Subcritical Branching Process With Immigration
Authors:
Jiayan Guo,
Wenming Hong
Abstract:
In this article we focus on the partial sum $S_{n}=X_{1}+\cdots+X_{n}$ of the subcritical branching process with immigration $\{X_{n}\}_{n\in\mathbb{N_{+}}}$, under the condition that one of the offspring $ξ$ or immigration $η$ is regularly varying. The tail distribution of $S_n$ is heavily dependent on that of $ξ$ and $η$, and a precise large deviation probability for $S_{n}$ is specified. (i) When the tail of offspring $ξ$ is lighter than immigration $η$, uniformly for $x\geq x_{n}$, $P(S_{n}-ES_{n}>x)\sim c_{1}nP(η>x)$ with some constant $c_{1}$ and sequence $\{x_{n}\}$, where $c_{1}$ is only related to the mean of offspring; (ii) When the tail of immigration $η$ is not heavier than offspring $ξ$, uniformly for $x\geq x_{n}$, $P(S_{n}-ES_{n}>x)\sim c_{2}nP(ξ>x)$ with some constant $c_{2}$ and sequence $\{x_{n}\}$, where $c_{2}$ is related to both the mean of offspring and the mean of immigration.
Submitted 7 May, 2024;
originally announced May 2024.
-
End-to-End Verifiable Decentralized Federated Learning
Authors:
Chaehyeon Lee,
Jonathan Heiss,
Stefan Tai,
James Won-Ki Hong
Abstract:
Verifiable decentralized federated learning (FL) systems combining blockchains and zero-knowledge proofs (ZKP) make the computational integrity of local learning and global aggregation verifiable across workers. However, they are not end-to-end: data can still be corrupted prior to the learning. In this paper, we propose a verifiable decentralized FL system for end-to-end integrity and authenticity of data and computation extending verifiability to the data source. Addressing an inherent conflict of confidentiality and transparency, we introduce a two-step proving and verification (2PV) method that we apply to central system procedures: a registration workflow that enables non-disclosing verification of device certificates and a learning workflow that extends existing blockchain and ZKP-based FL systems through non-disclosing data authenticity proofs. Our evaluation on a prototypical implementation demonstrates the technical feasibility with only marginal overheads to state-of-the-art solutions.
Submitted 19 April, 2024;
originally announced April 2024.
-
Primary Rate Maximization in Movable Antennas Empowered Symbiotic Radio Communications
Authors:
Bin Lyu,
Hao Liu,
Wenqing Hong,
Shimin Gong,
Feng Tian
Abstract:
In this paper, we propose a movable antenna (MA) empowered scheme for symbiotic radio (SR) communication systems. Specifically, multiple antennas at the primary transmitter (PT) can be flexibly moved to favorable locations to boost the channel conditions of the primary and secondary transmissions. The primary transmission is achieved by the active transmission from the PT to the primary user (PU), while the backscatter device (BD) takes a ride over the incident signal from the PT to passively send the secondary signal to the PU. Under this setup, we consider a primary rate maximization problem by jointly optimizing the transmit beamforming and the positions of MAs at the PT under a practical bit error rate constraint on the secondary transmission. Then, an alternating optimization framework with the utilization of the successive convex approximation, semi-definite processing and simulated annealing (SA) modified particle swarm optimization (SA-PSO) methods is proposed to find the solution of the transmit beamforming and MAs' positions. Finally, numerical results are provided to demonstrate the performance improvement provided by the proposed MA empowered scheme and the proposed algorithm.
Submitted 22 March, 2024;
originally announced March 2024.
-
CSST large-scale structure analysis pipeline: I. constructing reference mock galaxy redshift surveys
Authors:
Yizhou Gu,
Xiaohu Yang,
Jiaxin Han,
Yirong Wang,
Qingyang Li,
Zhenlin Tan,
Wenkang Jiang,
Yaru Wang,
Jiaqi Wang,
Antonios Katsianis,
Xiaoju Xu,
Haojie Xu,
Wensheng Hong,
Houjun Mo,
Run Wen,
Xianzhong Zheng,
Feng Shi,
Pengjie Zhang,
Zhongxu Zhai,
Chengze Liu,
Wenting Wang,
Ying Zu,
Hong Guo,
Youcai Zhang,
Yi Lu
, et al. (7 additional authors not shown)
Abstract:
In this paper, we set out to construct a set of reference mock galaxy redshift surveys (MGRSs) for the future Chinese Space-station Survey Telescope (CSST) observation, where subsequent survey selection effects can be added and evaluated. This set of MGRSs is generated using the dark matter subhalos extracted from a high-resolution Jiutian $N$-body simulation of the standard $Λ$CDM cosmogony with $Ω_m=0.3111$, $Ω_Λ=0.6889$, and $σ_8=0.8102$. The simulation has a boxsize of $1~h^{-1} {\rm Gpc}$, and consists of $6144^3$ particles with mass resolution $3.723 \times 10^{8} h^{-1} M_\odot $. In order to take into account the effect of redshift evolution, we first use all 128 snapshots in the Jiutian simulation to generate a light-cone halo/subhalo catalog. Next, galaxy luminosities are assigned to the main and subhalo populations using the subhalo abundance matching (SHAM) method with the DESI $z$-band luminosity functions at different redshifts. Multi-band photometries, as well as images, are then assigned to each mock galaxy using a 3-dimensional parameter space nearest neighbor sampling of the DESI LS observational galaxies and groups. Finally, the CSST and DESI LS survey geometry and magnitude limit cuts are applied to generate the required MGRSs. As we have checked, this set of MGRSs can generally reproduce the observed galaxy luminosity/mass functions within 0.1 dex for galaxies with $L > 10^8 L_\odot$ (or $M_* > 10^{8.5} M_\odot$) and within 1-$σ$ level for galaxies with $L < 10^8L_\odot$ (or $M_* < 10^{8.5} M_\odot$). Together with the CSST slitless spectra and redshifts for our DESI LS seed galaxies that are under construction, we will set out to test various slitless observational selection effects in subsequent probes.
Submitted 15 March, 2024;
originally announced March 2024.
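The simulation numbers quoted in the abstract above can be cross-checked with a short calculation. This is a minimal sketch, assuming the particles carry the full matter density $Ω_m$ and using the standard critical density of about $2.775 \times 10^{11}\,h^2 M_\odot\,{\rm Mpc}^{-3}$; the function name `particle_mass` is hypothetical and not taken from the paper.

```python
# Consistency check: particle mass implied by box size, particle count and Omega_m.
# Assumption: all matter (Omega_m) is carried by the simulation particles.

RHO_CRIT = 2.77536627e11  # critical density today, in h^2 M_sun / Mpc^3


def particle_mass(omega_m, box_mpc_h, n_per_side):
    """Return the particle mass in h^-1 M_sun.

    omega_m    -- matter density parameter
    box_mpc_h  -- box side length in h^-1 Mpc
    n_per_side -- number of particles per side (total is n_per_side**3)
    """
    total_mass = omega_m * RHO_CRIT * box_mpc_h ** 3  # h^-1 M_sun
    return total_mass / n_per_side ** 3


# Values quoted in the abstract: Omega_m = 0.3111, 1 h^-1 Gpc box, 6144^3 particles.
m_p = particle_mass(0.3111, 1000.0, 6144)
print(f"particle mass ~ {m_p:.3e} h^-1 M_sun")
```

Under these assumptions the result agrees with the quoted mass resolution of $3.723 \times 10^{8} h^{-1} M_\odot$ to well within 0.1 per cent.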
-
Evolution of cold streams in hot gaseous halos
Authors:
WenSheng Hong,
Weishan Zhu,
TianRui Wang,
Xiaohu Yang,
LongLong Feng
Abstract:
In the prevailing model of galaxy formation and evolution, the process of gas accretion onto central galaxies undergoes a transition from cold-dominated to hot-dominated modes. This shift occurs when the mass of the parent dark matter halos exceeds a critical threshold known as $M_{shock}$. Moreover, cold gas usually flows onto central galaxies through filamentary structures, currently referred to as cold streams. However, the evolution of cold streams in halos with masses around $M_{shock}$, particularly how they are disrupted, remains unclear. To address this issue, we conduct a set of idealised hydrodynamic simulations. Our simulations show that (1) for a gas metallicity $Z=0.001-0.1Z_{\odot}$, cold streams with an inflow rate of $\sim 3\, \rm{M_{\odot}}/yr$ each can persist and effectively transport cold and cool gas to the central region ($< 0.2$ virial radius) in halos with mass $10^{12}\, \rm{M_{\odot}}$, but are disrupted at a radius around $0.2$ virial radius due to compression heating in halos with mass $3 \times 10^{12}\, \rm{M_{\odot}}$. (2) At $z\sim 2$, the maximum halo mass that is capable of hosting and sustaining cold streams, $M_{stream}$, is between $1\times 10^{12} \rm{M_{\odot}}$ and $1.5\times 10^{12}\rm{M_{\odot}}$ for gas metallicity $Z=0.001Z_{\odot}$, while for a higher gas metallicity $Z=0.1Z_{\odot}$, this value increases to $\sim 1.5\times 10^{12}\rm{M_{\odot}}$. (3) The evolution and ultimate fate of cold streams are determined primarily by the rivalry between radiative cooling and compression. Stronger heating due to compression in halos more massive than $M_{stream}$ can surpass cooling and heat the gas in cold streams to the hot ($\geq 10^6\,$ K) phase.
Submitted 13 March, 2024;
originally announced March 2024.
-
Identifying Black Holes Through Space Telescopes and Deep Learning
Authors:
Yeqi Fang,
Wei Hong,
Jun Tao
Abstract:
The EHT has captured a series of images of black holes. These images could provide valuable information about the gravitational environment near the event horizon. However, accurate detection and parameter estimation for candidate black holes are necessary. This paper explores the potential for identifying black holes in the ultraviolet band using space telescopes. We establish a data pipeline for generating simulated observations and present an ensemble neural network model for black hole detection and parameter estimation. The model achieves mean average precision [0.5] values of 0.9176 even when reaching the imaging FWHM ($θ_c$) and maintains the detection ability until $0.54θ_c$. The parameter estimation is also accurate. These results indicate that our methodology enables super-resolution recognition. Moreover, the model successfully detects the shadow of M87* from background noise and other celestial bodies and estimates its inclination and positional angle. Our work demonstrates the feasibility of detecting black holes in the ultraviolet band and provides a new method for black hole detection and further parameter estimation.
Submitted 15 August, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
Large Deviation Principle for Multi-Scale Fully Local Monotone Stochastic Dynamical Systems with Multiplicative Noise
Authors:
Wei Hong,
Wei Liu,
Luhan Yang
Abstract:
This paper is devoted to proving the small noise asymptotic behaviour, particularly the large deviation principle, for multi-scale stochastic dynamical systems with fully local monotone coefficients driven by multiplicative noise. The main techniques are based on a combination of the weak convergence approach, the time discretization technique and the theory of pseudo-monotone operators. The main results derived in this paper have broad applicability to various multi-scale models, where the slow component could be, for example, stochastic porous medium equations, stochastic Cahn-Hilliard equations or stochastic 2D liquid crystal equations.
Submitted 8 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Detailed Report on the Measurement of the Positive Muon Anomalous Magnetic Moment to 0.20 ppm
Authors:
D. P. Aguillard,
T. Albahri,
D. Allspach,
A. Anisenkov,
K. Badgley,
S. Baeßler,
I. Bailey,
L. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
E. Barzi,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
S. Braun,
M. Bressler,
G. Cantatore,
R. M. Carey,
B. C. K. Casey
, et al. (168 additional authors not shown)
Abstract:
We present details on a new measurement of the muon magnetic anomaly, $a_μ= (g_μ-2)/2$. The result is based on positive muon data taken at Fermilab's Muon Campus during the 2019 and 2020 accelerator runs. The measurement uses $3.1$ GeV$/c$ polarized muons stored in a $7.1$-m-radius storage ring with a $1.45$ T uniform magnetic field. The value of $ a_μ$ is determined from the measured difference between the muon spin precession frequency and its cyclotron frequency. This difference is normalized to the strength of the magnetic field, measured using Nuclear Magnetic Resonance (NMR). The ratio is then corrected for small contributions from beam motion, beam dispersion, and transient magnetic fields. We measure $a_μ= 116 592 057 (25) \times 10^{-11}$ (0.21 ppm). This is the world's most precise measurement of this quantity and represents a factor of $2.2$ improvement over our previous result based on the 2018 dataset. In combination, the two datasets yield $a_μ(\text{FNAL}) = 116 592 055 (24) \times 10^{-11}$ (0.20 ppm). Combining this with the measurements from Brookhaven National Laboratory for both positive and negative muons, the new world average is $a_μ$(exp) $ = 116 592 059 (22) \times 10^{-11}$ (0.19 ppm).
Submitted 22 May, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations
Authors:
Ji Qi,
Ming Ding,
Weihan Wang,
Yushi Bai,
Qingsong Lv,
Wenyi Hong,
Bin Xu,
Lei Hou,
Juanzi Li,
Yuxiao Dong,
Jie Tang
Abstract:
Vision-Language Models (VLMs) have demonstrated their broad effectiveness thanks to extensive training in aligning visual instructions to responses. However, such training of conclusive alignment leads models to ignore essential visual reasoning, further resulting in failures in meticulous visual problems and unfaithful responses. Drawing inspiration from human cognition in solving visual problems (e.g., marking, zoom in), this paper introduces Chain of Manipulations, a mechanism that enables VLMs to solve problems step-by-step with evidence. After training, models can solve various visual problems by eliciting intrinsic manipulations (e.g., grounding, zoom in) with results (e.g., boxes, image) actively without involving external tools, while also allowing users to trace error causes. We study the roadmap to implement this mechanism, including (1) a flexible design of manipulations upon extensive analysis, (2) an efficient automated data generation pipeline, (3) a compatible VLM architecture capable of multi-turn multi-image, and (4) a model training process for versatile capabilities. With the design, we also manually annotate 6K high-quality samples for the challenging graphical mathematical problems. Our trained model, CogCoM, equipped with this mechanism with 17B parameters achieves state-of-the-art performance across 9 benchmarks from 4 categories, demonstrating the effectiveness while preserving the interpretability. Our code, model weights, and collected data are publicly available at https://github.com/THUDM/CogCoM.
Submitted 22 May, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Using digital twins for managing change in complex projects
Authors:
Jennifer Whyte,
Ranjith Soman,
Rafael Sacks,
Neda Mohammadi,
Nader Naderpajouh,
Wei-Ting Hong,
Ghang Lee
Abstract:
Complex systems are not entirely decomposable, hence interdependences arise at the interfaces in complex projects. When changes occur, significant risks arise at these interfaces as it is hard to identify, manage and visualise the systemic consequences of changes. Particularly problematic are the interfaces in which there are multiple interdependencies, which occur where the boundaries between design components, contracts and organisation coincide, such as between design disciplines. In this paper, we propose an approach to digital twin-based interface management, through an underpinning state-of-the-art review of the existing technical literature and a small pilot to identify the characteristics of future data-driven solutions. We set out an approach to digital twin-based interface management and an agenda for research on advanced methodologies for managing change in complex projects. This agenda includes the need to integrate work on identifying systems interfaces, change propagation and visualisation, and the potential to significantly extend the limitations of existing solutions by using developments in the digital twin, such as linked data, semantic enrichment, network analyses, natural language processing (NLP)-enhanced ontology and machine learning.
Submitted 30 May, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Unprecedentedly large superconducting gap in HgBa$_2$Ca$_2$Cu$_3$O$_{8+δ}$ with the highest $T_c$ at ambient pressure
Authors:
Chuanhao Wen,
Zhiyong Hou,
Alireza Akbari,
Kailun Chen,
Wenshan Hong,
Huan Yang,
Ilya Eremin,
Yuan Li,
Hai-Hu Wen
Abstract:
In cuprate superconductors, the highest superconducting transition temperature $T_c$ is possessed by the HgBa$_2$Ca$_2$Cu$_3$O$_{8+δ}$ (Hg-1223) system at ambient pressure, but the reason remains elusive. Here we report the scanning tunneling microscope measurements on the Hg-1223 single crystals with $T_c$ = 134 K. The observed superconducting gaps determined from the tunneling spectra can be categorized into two groups: the smaller gap $Δ_1$ ranges from about 45 to 70 meV, while the larger gap $Δ_2$ ranges from about 65 to 98 meV. The observed unprecedentedly large gap value gives a straightforward explanation for the highest $T_c$ in the Hg-1223 system. The largest gap observed here is comparable to the magnetic superexchange energy and excludes any possibility of using phonon pictures to interpret the superconductivity. Interestingly, an extremely strong particle-hole asymmetry is observed in association with a very robust coherence peak at the bias of the larger gap in the hole branch of the Bogoliubov dispersion. We propose that the observed asymmetry results from the interplay of a flat band (van Hove singularity) in the electronic spectrum and the large superconducting gap in the underdoped layer. This could be the main reason for the strong pairing, and significant enhancement of the density of states in the hole branch of the Bogoliubov band yielding strong phase coherence of Cooper pairs. A scenario based on a trilayer model with an interlayer coupling can give a reasonable explanation. Our results provide deep insight into understanding the mechanism of superconductivity in cuprate superconductors.
Submitted 30 January, 2024;
originally announced January 2024.
-
Impact of Flexible and Bidirectional Charging in Medium- and Heavy-Duty Trucks on California's Decarbonization Pathway
Authors:
Osten Anderson,
Wanshi Hong,
Bin Wang,
Nanpeng Yu
Abstract:
California has committed to ambitious decarbonization targets across multiple sectors, including decarbonizing the electrical grid by 2045. In addition, the medium- and heavy-duty truck fleets are expected to see rapid electrification over the next two decades. Considering these two pathways in tandem is critical for ensuring cost optimality and reliable power system operation. In particular, we examine the potential cost savings of electrical generation infrastructure by enabling flexible charging and bidirectional charging for these trucks. We also examine costs adjacent to enabling these services, such as charger upgrades and battery degradation. We deploy a large mixed-integer decarbonization planning model to quantify the costs associated with the electric generation decarbonization pathway. Example scenarios governing truck driving and charging behaviors are implemented to reveal the sensitivity of temporal driving patterns. Our experiments show that cost savings on the order of multiple billions of dollars are possible by enabling flexible and bidirectional charging in medium- and heavy-duty trucks in California.
Submitted 18 January, 2024;
originally announced January 2024.
-
Measuring the conditional luminosity and stellar mass functions of galaxies by combining the DESI LS DR9, SV3 and Y1 data
Authors:
Yirong Wang,
Xiaohu Yang,
Yizhou Gu,
Xiaoju Xu,
Haojie Xu,
Yuyu Wang,
Antonios Katsianis,
Jiaxin Han,
Min He,
Yunliang Zheng,
Qingyang Li,
Yaru Wang,
Wensheng Hong,
Jiaqi Wang,
Zhenlin Tan,
Hu Zou,
Johannes Ulf Lange,
ChangHoon Hahn,
Peter Behroozi,
Jessica Nicole Aguilar,
Steven Ahlen,
David Brooks,
Todd Claybaugh,
Shaun Cole,
Axel de la Macorra
, et al. (20 additional authors not shown)
Abstract:
In this investigation, we leverage the combination of Dark Energy Spectroscopic Instrument Legacy imaging Surveys Data Release 9 (DESI LS DR9), Survey Validation 3 (SV3), and Year 1 (Y1) data sets to estimate the conditional luminosity and stellar mass functions (CLFs & CSMFs) of galaxies across various halo mass bins and redshift ranges. To support our analysis, we utilize a realistic DESI Mock Galaxy Redshift Survey (MGRS) generated from a high-resolution Jiutian simulation. An extended halo-based group finder is applied to both MGRS catalogs and DESI observation. By comparing the r and z-band luminosity functions (LFs) and stellar mass functions (SMFs) derived using both photometric and spectroscopic data, we quantified the impact of photometric redshift (photo-z) errors on the galaxy LFs and SMFs, especially in the low redshift bin at low luminosity/mass end. By conducting prior evaluations of the group finder using MGRS, we successfully obtain a set of CLF and CSMF measurements from observational data. We find that at low redshift the faint end slopes of CLFs and CSMFs below $10^{9}h^{-2}L_{\odot}$ (or $h^{-2}M_{\odot}$) evince a compelling concordance with the subhalo mass functions. After correcting the cosmic variance effect of our local Universe following arXiv:1809.00523, the faint end slopes of the LFs/SMFs turn out to be also in good agreement with the slope of the halo mass function.
Submitted 22 June, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
CogAgent: A Visual Language Model for GUI Agents
Authors:
Wenyi Hong,
Weihan Wang,
Qingsong Lv,
Jiazheng Xu,
Wenmeng Yu,
Junhui Ji,
Yan Wang,
Zihan Wang,
Yuxuan Zhang,
Juanzi Li,
Bin Xu,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggle to understand and interact with GUIs, thus limiting their potential to increase automation levels. In this paper, we introduce CogAgent, an 18-billion-parameter visual language model (VLM) specializing in GUI understanding and navigation. By utilizing both low-resolution and high-resolution image encoders, CogAgent supports input at a resolution of 1120*1120, enabling it to recognize tiny page elements and text. As a generalist visual language model, CogAgent achieves the state of the art on five text-rich and four general VQA benchmarks, including VQAv2, OK-VQA, Text-VQA, ST-VQA, ChartQA, infoVQA, DocVQA, MM-Vet, and POPE. CogAgent, using only screenshots as input, outperforms LLM-based methods that consume extracted HTML text on both PC and Android GUI navigation tasks -- Mind2Web and AITW, advancing the state of the art. The model and codes are available at https://github.com/THUDM/CogVLM .
Submitted 21 December, 2023; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Neutral Editing Framework for Diffusion-based Video Editing
Authors:
Sunjae Yoon,
Gwanhyeong Koo,
Ji Woo Hong,
Chang D. Yoo
Abstract:
Text-conditioned image editing has succeeded in various types of editing based on a diffusion framework. Unfortunately, this success has not carried over to video, which continues to be challenging. Existing video editing systems are still limited to rigid-type editing such as style transfer and object overlay. To this end, this paper proposes the Neutral Editing (NeuEdit) framework to enable complex non-rigid editing by changing the motion of a person/object in a video, which has never been attempted before. NeuEdit introduces a concept of `neutralization' that enhances a tuning-editing process of diffusion-based editing systems in a model-agnostic manner by leveraging input video and text without any other auxiliary aids (e.g., visual masks, video captions). Extensive experiments on numerous videos demonstrate the adaptability and effectiveness of the NeuEdit framework. The website of our work is available here: https://neuedit.github.io
Submitted 10 December, 2023;
originally announced December 2023.
-
Model Evaluation for Domain Identification of Unknown Classes in Open-World Recognition: A Proposal
Authors:
Gusti Ahmad Fanshuri Alfarisy,
Owais Ahmed Malik,
Ong Wee Hong
Abstract:
Open-World Recognition (OWR) is an emerging field that makes a machine learning model competent in rejecting the unknowns, managing them, and incrementally adding novel samples to the base knowledge. However, this broad objective is not practical for an agent that works on a specific task. Not all rejected samples will be used for learning continually in the future. Some novel images in the open environment may not belong to the domain of interest. Hence, identifying the unknown in the domain of interest is essential for a machine learning model to learn merely the important samples. In this study, we propose an evaluation protocol for estimating a model's capability in separating unknown in-domain (ID) and unknown out-of-domain (OOD). We evaluated using three approaches with an unknown domain and demonstrated the possibility of identifying the domain of interest using the pre-trained parameters through traditional transfer learning, Automated Machine Learning (AutoML), and Nearest Class Mean (NCM) classifier with First Integer Neighbor Clustering Hierarchy (FINCH). We experimented with five different domains: garbage, food, dogs, plants, and birds. The results show that all approaches can be used as an initial baseline yielding a good accuracy. In addition, a Balanced Accuracy (BACCU) score from a pre-trained model indicates a tendency to excel in one or more domains of interest. We observed that MobileNetV3 yielded the highest BACCU score for the garbage domain and surpassed complex models such as the transformer network. Meanwhile, our results also suggest that a strong representation in the pre-trained model is important for identifying unknown classes in the same domain. This study could open the bridge toward open-world recognition in domain-specific tasks where the relevancy of the unknown classes is vital.
Submitted 8 December, 2023;
originally announced December 2023.
-
Electronic Phase Propagation Speed in BaFe$_2$As$_2$ Revealed by Dilatometry
Authors:
Xin Qin,
Xingyu Wang,
Wenshan Hong,
Mengqiao Geng,
Yuan Li,
Huiqian Luo,
Shiliang Li,
Yang Liu
Abstract:
Thermal expansion offers deep insights into phase transitions in condensed matter physics. Utilizing an advanced AC-temperature dilatometer with picometer resolution, this study clearly resolves the antiferromagnetic and structural transition in BaFe$_2$As$_2$. The implementation of temperature oscillation reveals a hysteresis near the transition temperature $T_\mathrm{N}$ with unprecedented resolution. Unexpectedly, we find that the hysteretic width exhibits a universal dependence on the parameters of temperature oscillation and the sample's longitudinal dimension, which in turn reveals a finite transition speed. Our quantitative analysis shows that this phase boundary propagates at a mere 188 $μ$m/s - a speed seven orders of magnitude slower than acoustic waves. It suggests a hidden thermodynamic constraint imposed by the electronic degrees of freedom. Our research not only sheds light on the dynamics of phase transitions between different correlated phases, but also establishes high precision dilatometry as a powerful tool for material studies. This measurement technique, when properly modified, can be extended to studies of other material properties such as piezoelectricity, magnetostriction, elastic modulus, etc.
Submitted 26 March, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Joint Design of ISAC Waveform under PAPR Constraints
Authors:
Yating Chen,
Cai Wen,
Yan Huang,
Le Liang,
Jie Li,
Hui Zhang,
Wei Hong
Abstract:
In this paper, we formulate the precoding problem of integrated sensing and communication (ISAC) waveform as a non-convex quadratically constrained quadratic program (QCQP), in which the weighted sum of communication multi-user interference (MUI) and the gap between dual-use waveform and ideal radar waveform is minimized with peak-to-average power ratio (PAPR) constraints. We propose an efficient algorithm based on the alternating direction method of multipliers (ADMM), which is able to decouple multiple variables and provide a closed-form solution for each subproblem. In addition, to improve the sensing performance in both spatial and temporal domains, we propose a new criterion to design the ideal radar waveform, in which the beam pattern is made similar to the ideal one and the integrated sidelobe level of the ambiguity function in each target direction is minimized in the region of interest. The limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm is applied to the design of the ideal radar waveform which works as a reference in the design of the dual-function waveform. Numerical results indicate that the designed dual-function waveform is capable of offering good communication quality of service (QoS) and sensing performance.
Submitted 11 February, 2024; v1 submitted 20 November, 2023;
originally announced November 2023.
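For readers unfamiliar with the PAPR constraint mentioned in the abstract above, the quantity itself is easy to compute for a sampled waveform. The following is a minimal sketch of the standard definition (peak instantaneous power divided by mean power), not the paper's ADMM algorithm; the function names and the two example waveforms are illustrative assumptions.

```python
import cmath
import math


def papr(samples):
    """Peak-to-average power ratio (linear scale) of a complex waveform."""
    powers = [abs(s) ** 2 for s in samples]
    return max(powers) / (sum(powers) / len(powers))


def papr_db(samples):
    """PAPR expressed in decibels."""
    return 10.0 * math.log10(papr(samples))


# A constant-modulus (phase-only) waveform: every sample has unit magnitude,
# so peak power equals mean power and the PAPR is 1 (0 dB).
constant_modulus = [cmath.exp(1j * 0.3 * k * k) for k in range(64)]

# A waveform with one strong sample: 63 samples of amplitude 1 and one of amplitude 3.
# Mean power is (63 + 9) / 64 = 1.125 and peak power is 9, so the PAPR is 8.
peaky = [1.0 + 0j] * 63 + [3.0 + 0j]

print(papr(constant_modulus), papr(peaky), papr_db(peaky))
```

A PAPR constraint in a waveform design problem bounds this ratio from above; a bound of 1 corresponds to the constant-modulus case.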
-
Conditional central limit theorem for critical branching random walk
Authors:
Wenming Hong,
Shengli Liang
Abstract:
Consider a critical branching random walk on $\mathbb{R}$. Let $Z^{(n)}(A)$ be the number of individuals in the $n$-th generation located in $A\in \mathcal{B}(\mathbb{R})$ and $Z_{n}:=Z^{(n)}(\mathbb{R})$ denote the population of the $n$-th generation. We prove that, under some conditions, for all $x\in \mathbb{R}$, as $n\to \infty$, $$\mathcal{L}\left(\frac{Z^{(n)}(-\infty, \sqrt{n} x]}{n} ~\bigg |~ Z_{n}>0\right) \Longrightarrow\mathcal{L}\left(Y(x)\right),$$ where $\Rightarrow$ means weak convergence and $Y(x)$ is a random variable whose distribution is specified by its moments.
Submitted 18 November, 2023;
originally announced November 2023.
-
The NeurIPS 2022 Neural MMO Challenge: A Massively Multiagent Competition with Specialization and Trade
Authors:
Enhong Liu,
Joseph Suarez,
Chenhui You,
Bo Wu,
Bingcheng Chen,
Jun Hu,
Jiaxin Chen,
Xiaolong Zhu,
Clare Zhu,
Julian Togelius,
Sharada Mohanty,
Weijun Hong,
Rui Du,
Yibing Zhang,
Qinwen Wang,
Xinhang Li,
Zheng Yuan,
Xiang Li,
Yuejia Huang,
Kun Zhang,
Hanhui Yang,
Shiqi Tang,
Phillip Isola
Abstract:
In this paper, we present the results of the NeurIPS-2022 Neural MMO Challenge, which attracted 500 participants and received over 1,600 submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved agents from 16 populations surviving in procedurally generated worlds by collecting resources and defeating opponents. This year's competition runs on the latest v1.6 Neural MMO, which introduces new equipment, combat, trading, and a better scoring system. These elements combine to pose additional robustness and generalization challenges not present in previous competitions. This paper summarizes the design and results of the challenge, explores the potential of this environment as a benchmark for learning methods, and presents some practical reinforcement learning training approaches for complex tasks with sparse rewards. Additionally, we have open-sourced our baselines, including environment wrappers, benchmarks, and visualization tools for future research.
Submitted 6 November, 2023;
originally announced November 2023.
-
CogVLM: Visual Expert for Pretrained Language Models
Authors:
Weihan Wang,
Qingsong Lv,
Wenmeng Yu,
Wenyi Hong,
Ji Qi,
Yan Wang,
Junhui Ji,
Zhuoyi Yang,
Lei Zhao,
Xixuan Song,
Jiazheng Xu,
Bin Xu,
Juanzi Li,
Yuxiao Dong,
Ming Ding,
Jie Tang
Abstract:
We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of the language model, CogVLM bridges the gap between the frozen pretrained language model and image encoder by a trainable visual expert module in the attention and FFN layers. As a result, CogVLM enables deep fusion of vision language features without sacrificing any performance on NLP tasks. CogVLM-17B achieves state-of-the-art performance on 10 classic cross-modal benchmarks, including NoCaps, Flickr30k captioning, RefCOCO, RefCOCO+, RefCOCOg, Visual7W, GQA, ScienceQA, VizWiz VQA and TDIUC, and ranks 2nd on VQAv2, OKVQA, TextVQA, COCO captioning, etc., surpassing or matching PaLI-X 55B. Codes and checkpoints are available at https://github.com/THUDM/CogVLM.
Submitted 4 February, 2024; v1 submitted 6 November, 2023;
originally announced November 2023.
-
Reconfigurable Intelligent Surface & Edge -- An Introduction of an EM manipulation structure on obstacles' edge
Authors:
Tianqi Xiang,
Zhiwei Jiang,
Weijun Hong,
Xin Zhang,
Yuehong Gao
Abstract:
Reconfigurable Intelligent Surface (RIS) or metasurface is one of the important enabling technologies in mobile cellular networks that can effectively enhance the signal coverage performance in obstructed regions, and it is generally deployed on surfaces different from obstacles to redirect electromagnetic (EM) waves by reflection, or covered on objects' surfaces to manipulate EM waves by refraction. In this paper, Reconfigurable Intelligent Surface & Edge (RISE) is proposed to extend RIS' abilities of reflection and refraction over surfaces to diffraction around obstacles' edge for better adaptation to specific coverage scenarios. Based on that, this paper analyzes the performance of several different deployment locations and EM manipulation structure designs for different coverage scenarios. Then a novel EM manipulation structure deployed at the obstacles' edge is proposed to achieve static EM environment modification. Simulations validate the preference of the schemes for different scenarios and the new structure achieves better coverage performance than other typical structures in the static scheme.
Submitted 3 November, 2023;
originally announced November 2023.
-
Deconfined quantum critical point lost in pressurized SrCu2(BO3)2
Authors:
Jing Guo,
Pengyu Wang,
Cheng Huang,
Bin-Bin Chen,
Wenshan Hong,
Shu Cai,
Jinyu Zhao,
Jinyu Han,
Xintian Chen,
Yazhou Zhou,
Shiliang Li,
Qi Wu,
Zi Yang Meng,
Liling Sun
Abstract:
In the field of correlated electron materials, the relation between the resonating spin singlet and antiferromagnetic states has long been an attractive topic for understanding of the interesting macroscopic quantum phenomena, such as the ones emerging from magnetic frustrated materials, antiferromagnets and high-temperature superconductors. SrCu2(BO3)2 is a well-known quantum magnet, and it is theoretically expected to be the candidate of correlated electron material for clarifying the existence of a pressure-induced deconfined quantum critical point (DQCP), featured by a continuous quantum phase transition, between the plaquette-singlet (PS) valence bond solid phase and the antiferromagnetic (AF) phase. However, the real nature of the transition is yet to be identified experimentally due to the technical challenge. Here we show the experimental results for the first time, through the state-of-the-art high-pressure heat capacity measurement, that the PS-AF phase transition of the pressurized SrCu2(BO3)2 at zero field is clearly a first-order one. Our result clarifies the more than two-decade long debates about this key issue, and resonates nicely with the recent quantum entanglement understanding that the theoretically predicted DQCPs in representative lattice models are actually a first-order transition. Intriguingly, we also find that the transition temperatures of the PS and AF phase meet at the same pressure-temperature point, which signifies a bi-critical point as those observed in Fe-based superconductor and heavy-fermion compound, and constitutes the first experimental discovery of the pressure-induced bi-critical point in frustrated magnets. Our results provide fresh information for understanding the evolution among different spin states of correlated electron materials under pressure.
Submitted 30 October, 2023;
originally announced October 2023.
-
Multiband superconductivity and a deep gap minimum from the specific heat in KCa$_2$(Fe$_{1-x}$Ni$_x$)$_4$As$_4$F$_2$ ($x$ = 0, 0.05, 0.13)
Authors:
Yiwen Li,
Zhengyan Zhu,
Yongze Ye,
Wenshan Hong,
Yang Li,
Shiliang Li,
Huiqian Luo,
Hai-Hu Wen
Abstract:
Specific heat can explore low-energy quasiparticle excitations of superconductors, so it is a powerful tool for bulk measurement on the superconducting gap structure and pairing symmetry. Here, we report an in-depth investigation on the specific heat of the multiband superconductors KCa$_2$(Fe$_{1-x}$Ni$_x$)$_4$As$_4$F$_2$ ($x$ = 0, 0.05, 0.13) single crystals and the overdoped non-superconducting one with $x$ = 0.17. Clear specific heat anomalies can be observed at the superconducting transition temperature of 33.6 K and 28.8 K for the samples with $x$ = 0 and $x$ = 0.05, respectively. For the two samples, the magnetic field induced specific heat coefficient $Δγ(H)$ in the low-temperature limit increases rapidly below 2 T, then it rises slowly above 2 T. Using the non-superconducting sample with $x$ = 0.17 as a reference, the specific heat of phonon background for various superconducting samples can be obtained and subtracted, which allows us to extract the electronic specific heat of the superconducting samples. Through comparative analyses, it is found that the energy gap structure including two $s$-wave gaps and an extended $s$-wave gap with large anisotropy can reasonably describe the electronic specific heat data. According to these results, we suggest that at least one anisotropic superconducting gap with a deep gap minimum should exist in this multiband system. With the doping of Ni, the superconducting transition temperature of the sample decreases along with the decrease of the large $s$-wave gap, but the extended $s$-wave gap increases due to the enlarged electron pockets via adding more electrons. Despite these changes, the general properties of the gap structure remain unchanged versus doping Ni.
Submitted 19 January, 2024; v1 submitted 29 October, 2023;
originally announced October 2023.
-
BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning
Authors:
Tianle Huang,
Nitish Sontakke,
K. Niranjan Kumar,
Irfan Essa,
Stefanos Nikolaidis,
Dennis W. Hong,
Sehoon Ha
Abstract:
Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.
Submitted 16 October, 2023;
originally announced October 2023.
-
Evaluating residual acceleration noise for TianQin gravitational waves observatory with an empirical magnetic field model
Authors:
Wei Su,
Ze-Bing Zhou,
Yan Wang,
Chen Zhou,
P. F. Chen,
Wei Hong,
J. H. Peng,
Yun Yang,
Y. W. Ni
Abstract:
The TianQin (TQ) project plans to deploy three satellites in space around the Earth to measure the displacement change of test masses caused by gravitational waves via laser interferometry. The requirement of the acceleration noise of the test mass is on the order of $10^{-15}~\,{\rm m}\,{\rm s}^{-2}\,{\rm Hz}^{-1/2}$ in the sensitive frequency range of TQ, which is so stringent that the acceleration noise caused by the interaction of the space magnetic field with the test mass needs to be investigated. In this work, by using the Tsyganenko model, a data-based empirical space magnetic field model, we obtain the magnetic field distribution around TQ's orbit spanning two solar cycles in 23 years from 1998 to 2020. With the obtained space magnetic field, we derive the distribution and amplitude spectral densities (ASDs) of the acceleration noise of TQ in 23 years. Our results reveal that the average values of the ratio of the acceleration noise caused by the space magnetic field to the requirements of TQ at 1 mHz ($R_{\rm 1mHz}$) and 6 mHz ($R_{\rm 6mHz}$) are 0.123$\pm$0.052 and 0.027$\pm$0.013, respectively. The occurrence probabilities of $R_{\rm 1mHz}>0.2$ and $>0.3$ are only 7.9% and 1.2%, respectively, and $R_{\rm 6mHz}$ never exceeds 0.2.
Submitted 30 November, 2023; v1 submitted 15 October, 2023;
originally announced October 2023.
-
Order-disorder phase transition and elastic-to-plastic vortex creep crossover in a triclinic iron pnictide superconductor (Ca0.85La0.15)10(Pt3As8)(Fe2As2)5
Authors:
Shyam Sundar,
P. V. Lopes,
S. Salem-Sugui, Jr.,
Z. -Z. Li,
W. -S. Hong,
H. -Q. Luo,
S. -L. Li,
L. Ghivelder
Abstract:
Vortex matter in layered high-$T_c$ superconductors, including iron-pnictides, undergoes several thermodynamic phase transitions due to the complex interplay of pinning energy, thermal energy and elastic energy. Moreover, the presence of anisotropy makes their vortex physics even more intriguing. Here, we report a detailed vortex dynamics study, using dc magnetization measurements, in a triclinic iron-pnictide superconductor (Ca$_{0.85}$La$_{0.15}$)$_{10}$(Pt$_3$As$_8$)(Fe$_2$As$_2$)$_5$, with a superconducting transition temperature, T$_c$ $\sim$ 31 K. A second magnetization peak (SMP) feature is observed for magnetic field perpendicular ($H$$\parallel$$c$) and parallel ($H$$\parallel$$ab$) to the crystal plane. However, its fundamental origin is quite different in the two directions. For $H$$\parallel$$c$, the SMP can be well explained using an elastic-to-plastic vortex creep crossover, using collective creep theory. In addition, a possible rhombic-to-square vortex lattice phase transition is also observed for fields in between the onset-field and peak-field related to the SMP. On the other hand, for $H$$\parallel$$ab$, a clear signature of an order-disorder vortex phase transition is observed in the isothermal $M$($H$) measurements at $T$ $\geq$ 6 K. The disordered phase exhibits the characteristics of an entangled pinned vortex-liquid. We construct a comprehensive vortex phase diagram by displaying characteristic temperatures and magnetic fields for both crystal geometries in this unique superconducting compound. Our study sheds light on the intricate vortex dynamics and pinning in an iron-pnictide superconductor with triclinic symmetry.
Submitted 14 October, 2023;
originally announced October 2023.
-
Neural Collapse for Unconstrained Feature Model under Cross-entropy Loss with Imbalanced Data
Authors:
Wanli Hong,
Shuyang Ling
Abstract:
Recent years have witnessed the huge success of deep neural networks (DNNs) in various tasks of computer vision and text processing. Interestingly, these DNNs with a massive number of parameters share similar structural properties on their feature representation and last-layer classifier at the terminal phase of training (TPT). Specifically, if the training data are balanced (each class shares the same number of samples), it is observed that the feature vectors of samples from the same class converge to their corresponding in-class mean features and their pairwise angles are the same. This fascinating phenomenon is known as Neural Collapse (NC), first termed by Papyan, Han, and Donoho in 2019. Many recent works manage to theoretically explain this phenomenon by adopting the so-called unconstrained feature model (UFM). In this paper, we study the extension of the NC phenomenon to imbalanced data under the cross-entropy loss function in the context of the unconstrained feature model. Our contribution is multi-fold compared with the state-of-the-art results: (a) we show that the feature vectors exhibit the collapse phenomenon, i.e., the features within the same class collapse to the same mean vector; (b) the mean feature vectors no longer form an equiangular tight frame. Instead, their pairwise angles depend on the sample size; (c) we also precisely characterize the sharp threshold at which the minority collapse (the feature vectors of the minority groups collapse to one single vector) will take place; (d) finally, we argue that the effect of the imbalance in data size diminishes as the sample size grows. Our results provide a complete picture of the NC under the cross-entropy loss for imbalanced data. Numerical experiments confirm our theoretical analysis.
Submitted 24 October, 2023; v1 submitted 18 September, 2023;
originally announced September 2023.
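For reference, the balanced-data geometry that this abstract contrasts against can be sketched in a few lines. The snippet below is a minimal illustration, not code from the paper: it builds the standard simplex equiangular tight frame for K classes and checks that every pair of class-mean directions has the same cosine, -1/(K-1). The function names `simplex_etf` and `pairwise_cosines` are hypothetical helpers introduced here.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Return a K x K matrix whose columns form a simplex equiangular tight frame."""
    k = num_classes
    # Centering projection I - (1/K) 11^T, rescaled so every column has unit norm.
    centering = np.eye(k) - np.ones((k, k)) / k
    return np.sqrt(k / (k - 1)) * centering


def pairwise_cosines(means: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of distinct columns of `means`."""
    unit = means / np.linalg.norm(means, axis=0, keepdims=True)
    gram = unit.T @ unit
    off_diagonal = ~np.eye(means.shape[1], dtype=bool)
    return gram[off_diagonal]


# With K = 4 balanced classes, all pairwise cosines equal -1/(K-1) = -1/3.
etf_cosines = pairwise_cosines(simplex_etf(4))
```

According to the abstract, under imbalanced data the pairwise angles instead depend on the per-class sample sizes, so a check of this kind would no longer return a single common value.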
-
The Nanoplasmonic Purcell Effect in Ultrafast and High-Light-Yield Perovskite Scintillators
Authors:
Wenzheng Ye,
Zhihua Yong,
Michael Go,
Dominik Kowal,
Francesco Maddalena,
Liliana Tjahjana,
Wang Hong,
Arramel Arramel,
Christophe Dujardin,
Muhammad Danang Birowosuto,
Liang Jie Wong
Abstract:
The development of X-ray scintillators with ultrahigh light yields and ultrafast response times is a long sought-after goal. In this work, we theoretically predict and experimentally demonstrate a fundamental mechanism that pushes the frontiers of ultrafast X-ray scintillator performance: the use of nanoscale-confined surface plasmon polariton modes to tailor the scintillator response time via the Purcell effect. By incorporating nanoplasmonic materials in scintillator devices, this work predicts over 10-fold enhancement in decay rate and 38% reduction in time resolution even with only a simple planar design. We experimentally demonstrate the nanoplasmonic Purcell effect using perovskite scintillators, enhancing the light yield by over 120% to 88 $\pm$ 11 ph/keV, and the decay rate by over 60% to 2.0 $\pm$ 0.2 ns for the average decay time, and 0.7 $\pm$ 0.1 ns for the ultrafast decay component, in good agreement with the predictions of our theoretical framework. We perform proof-of-concept X-ray imaging experiments using nanoplasmonic scintillators, demonstrating 182% enhancement in the modulation transfer function at 4 line pairs per millimeter spatial frequency. This work highlights the enormous potential of nanoplasmonics in optimizing ultrafast scintillator devices for applications including time-of-flight X-ray imaging and photon-counting computed tomography.
Submitted 12 September, 2023;
originally announced September 2023.
-
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis
Authors:
Jiayan Teng,
Wendi Zheng,
Ming Ding,
Wenyi Hong,
Jianqiao Wangni,
Zhuoyi Yang,
Jie Tang
Abstract:
Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that the same noise level on a higher resolution results in a higher Signal-to-Noise Ratio in the frequency domain. In this work, we present Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for diffusion model via blurring diffusion and block noise. Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256$\times$256, surpassing previous works such as ADM, LDM and DiT by a large margin. All the codes and checkpoints are open-sourced at https://github.com/THUDM/RelayDiffusion.
Submitted 4 September, 2023;
originally announced September 2023.
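The observation about noise level and resolution can be illustrated with a small numerical sketch. This is not the paper's DCT analysis; it swaps in block averaging as a simple stand-in for "keeping the low frequencies", and all names (`block_mean`, `scale`, `low_res_side`) are assumptions made for the example. An upsampled image keeps its signal after block averaging, while independent per-pixel noise of the same standard deviation is averaged down, so the low-frequency signal-to-noise ratio rises with resolution.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0        # same per-pixel noise level at both resolutions
scale = 4          # high resolution is `scale` times larger per side
low_res_side = 64


def block_mean(image: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2-D array."""
    height, width = image.shape
    blocks = image.reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))


# A random low-resolution "image" and its nearest-neighbour upsampling.
signal = rng.normal(0.0, 1.0, (low_res_side, low_res_side))
upsampled = np.kron(signal, np.ones((scale, scale)))

# Independent Gaussian noise with the same sigma at each resolution.
low_res_noise = rng.normal(0.0, sigma, signal.shape)
high_res_noise = rng.normal(0.0, sigma, upsampled.shape)

# Projecting back to the low-resolution grid recovers the signal,
# but averages the noise over scale**2 pixels.
recovered_signal = block_mean(upsampled, scale)
projected_noise = block_mean(high_res_noise, scale)

# Noise power on the shared low-frequency band, high-res relative to low-res.
variance_ratio = projected_noise.var() / low_res_noise.var()
```

In this sketch `variance_ratio` should come out near 1/scale**2, meaning the low-frequency content is roughly scale**2 times less corrupted at the higher resolution for the same nominal noise level.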
-
Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO
Authors:
Yangkun Chen,
Joseph Suarez,
Junjie Zhang,
Chenghui Yu,
Bo Wu,
Hanmo Chen,
Hengman Zhu,
Rui Du,
Shanliang Qian,
Shuai Liu,
Weijun Hong,
Jinke He,
Yibing Zhang,
Liang Zhao,
Clare Zhu,
Julian Togelius,
Sharada Mohanty,
Jiaxin Chen,
Xiu Li,
Xiaolong Zhu,
Phillip Isola
Abstract:
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions. This competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponents not seen during training. The competition combines relatively complex environment design with large numbers of agents in the environment. The top submissions demonstrate strong success on this task using mostly standard reinforcement learning (RL) methods combined with domain-specific engineering. We summarize the competition design and results and suggest that, as an academic community, competitions may be a powerful approach to solving hard problems and establishing a solid benchmark for algorithms. We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
Submitted 30 August, 2023;
originally announced August 2023.
-
Proprioceptive Learning with Soft Polyhedral Networks
Authors:
Xiaobo Liu,
Xudong Han,
Wei Hong,
Fang Wan,
Chaoyang Song
Abstract:
Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of adaptive kinesthesia and viscoelastic proprioception by learning kinetic features. This design enables passive adaptations to omni-directional interactions, visually captured by a miniature high-speed motion tracking system embedded inside for proprioceptive learning. The results show that the soft network can infer real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions. We also incorporate viscoelasticity in proprioception during static adaptation by adding a creep and relaxation modifier to refine the predicted results. The proposed soft network combines simplicity in design, omni-adaptation, and proprioceptive sensing with high accuracy, making it a versatile solution for robotics at a low cost with more than 1 million use cycles for tasks such as sensitive and competitive grasping, and touch-based geometry reconstruction. This study offers new insights into vision-based proprioception for soft robots in adaptive grasping, soft manipulation, and human-robot interaction.
Submitted 27 July, 2024; v1 submitted 16 August, 2023;
originally announced August 2023.
-
Measurement of the Positive Muon Anomalous Magnetic Moment to 0.20 ppm
Authors:
D. P. Aguillard,
T. Albahri,
D. Allspach,
A. Anisenkov,
K. Badgley,
S. Baeßler,
I. Bailey,
L. Bailey,
V. A. Baranov,
E. Barlas-Yucel,
T. Barrett,
E. Barzi,
F. Bedeschi,
M. Berz,
M. Bhattacharya,
H. P. Binney,
P. Bloom,
J. Bono,
E. Bottalico,
T. Bowcock,
S. Braun,
M. Bressler,
G. Cantatore,
R. M. Carey,
B. C. K. Casey
, et al. (166 additional authors not shown)
Abstract:
We present a new measurement of the positive muon magnetic anomaly, $a_μ\equiv (g_μ- 2)/2$, from the Fermilab Muon $g\!-\!2$ Experiment using data collected in 2019 and 2020. We have analyzed more than 4 times the number of positrons from muon decay than in our previous result from 2018 data. The systematic error is reduced by more than a factor of 2 due to better running conditions, a more stable beam, and improved knowledge of the magnetic field weighted by the muon distribution, $\tildeω'^{}_p$, and of the anomalous precession frequency corrected for beam dynamics effects, $ω_a$. From the ratio $ω_a / \tildeω'^{}_p$, together with precisely determined external parameters, we determine $a_μ= 116\,592\,057(25) \times 10^{-11}$ (0.21 ppm). Combining this result with our previous result from the 2018 data, we obtain $a_μ\text{(FNAL)} = 116\,592\,055(24) \times 10^{-11}$ (0.20 ppm). The new experimental world average is $a_μ(\text{Exp}) = 116\,592\,059(22)\times 10^{-11}$ (0.19 ppm), which represents a factor of 2 improvement in precision.
Submitted 4 October, 2023; v1 submitted 11 August, 2023;
originally announced August 2023.
-
NTIRE 2023 Quality Assessment of Video Enhancement Challenge
Authors:
Xiaohong Liu,
Xiongkuo Min,
Wei Sun,
Yulun Zhang,
Kai Zhang,
Radu Timofte,
Guangtao Zhai,
Yixuan Gao,
Yuqin Cao,
Tengchuan Kou,
Yunlong Dong,
Ziheng Jia,
Yilin Li,
Wei Wu,
Shuming Hu,
Sibin Deng,
Pengxiang Xiao,
Ying Chen,
Kai Li,
Kai Zhao,
Kun Yuan,
Ming Sun,
Heng Cong,
Hao Wang,
Lingzhi Fu
, et al. (47 additional authors not shown)
Abstract:
This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge addresses a major problem in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were made by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance.
Submitted 18 July, 2023;
originally announced July 2023.