Search | arXiv e-print repository

Approximately Invertible Neural Network for Learned Image Compression

Authors: Yanbo Gao, Meng Fu, Shuai Li, Chong Lv, Xun Cai, Hui Yuan, Mao Ye

Abstract: Learned image compression have attracted considerable interests in recent years. It typically comprises an analysis transform, a synthesis transform, quantization and an entropy coding model. The analysis transform and synthesis transform are used to encode an image to latent feature and decode the quantized feature to reconstruct the image, and can be regarded as coupled transforms. However, the… ▽ More Learned image compression have attracted considerable interests in recent years. It typically comprises an analysis transform, a synthesis transform, quantization and an entropy coding model. The analysis transform and synthesis transform are used to encode an image to latent feature and decode the quantized feature to reconstruct the image, and can be regarded as coupled transforms. However, the analysis transform and synthesis transform are designed independently in the existing methods, making them unreliable in high-quality image compression. Inspired by the invertible neural networks in generative modeling, invertible modules are used to construct the coupled analysis and synthesis transforms. Considering the noise introduced in the feature quantization invalidates the invertible process, this paper proposes an Approximately Invertible Neural Network (A-INN) framework for learned image compression. It formulates the rate-distortion optimization in lossy image compression when using INN with quantization, which differentiates from using INN for generative modelling. Generally speaking, A-INN can be used as the theoretical foundation for any INN based lossy compression method. Based on this formulation, A-INN with a progressive denoising module (PDM) is developed to effectively reduce the quantization noise in the decoding. Moreover, a Cascaded Feature Recovery Module (CFRM) is designed to learn high-dimensional feature recovery from low-dimensional ones to further reduce the noise in feature channel compression. In addition, a Frequency-enhanced Decomposition and Synthesis Module (FDSM) is developed by explicitly enhancing the high-frequency components in an image to address the loss of high-frequency information inherent in neural network based image compression. Extensive experiments demonstrate that the proposed A-INN outperforms the existing learned image compression methods. △ Less

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2408.11558 [pdf, other]

GSTran: Joint Geometric and Semantic Coherence for Point Cloud Segmentation

Authors: Abiao Li, Chenlei Lv, Guofeng Mei, Yifan Zuo, Jian Zhang, Yuming Fang

Abstract: Learning meaningful local and global information remains a challenge in point cloud segmentation tasks. When utilizing local information, prior studies indiscriminately aggregates neighbor information from different classes to update query points, potentially compromising the distinctive feature of query points. In parallel, inaccurate modeling of long-distance contextual dependencies when utilizi… ▽ More Learning meaningful local and global information remains a challenge in point cloud segmentation tasks. When utilizing local information, prior studies indiscriminately aggregates neighbor information from different classes to update query points, potentially compromising the distinctive feature of query points. In parallel, inaccurate modeling of long-distance contextual dependencies when utilizing global information can also impact model performance. To address these issues, we propose GSTran, a novel transformer network tailored for the segmentation task. The proposed network mainly consists of two principal components: a local geometric transformer and a global semantic transformer. In the local geometric transformer module, we explicitly calculate the geometric disparity within the local region. This enables amplifying the affinity with geometrically similar neighbor points while suppressing the association with other neighbors. In the global semantic transformer module, we design a multi-head voting strategy. This strategy evaluates semantic similarity across the entire spatial range, facilitating the precise capture of contextual dependencies. Experiments on ShapeNetPart and S3DIS benchmarks demonstrate the effectiveness of the proposed method, showing its superiority over other algorithms. The code is available at https://github.com/LAB123-tech/GSTran. △ Less

Submitted 21 August, 2024; originally announced August 2024.

Comments: ICPR 2024

arXiv:2408.05132 [pdf, other]

Hidden curved spaces in Bosonic Kitaev model

Authors: Chenwei Lv, Qi Zhou

Abstract: Quantum matter in curved spaces exhibits remarkable properties unattainable in flat spaces. To access curved spaces in laboratories, the conventional wisdom is that physical distortions need to be implemented into a system. In contrast to this belief, here, we show that two hyperbolic surfaces readily exist in bosonic Kitaev model in the absence of any physical distortions and give rise to a range… ▽ More Quantum matter in curved spaces exhibits remarkable properties unattainable in flat spaces. To access curved spaces in laboratories, the conventional wisdom is that physical distortions need to be implemented into a system. In contrast to this belief, here, we show that two hyperbolic surfaces readily exist in bosonic Kitaev model in the absence of any physical distortions and give rise to a range of intriguing phenomena, such as chiral quantum transport or chiral reaction-diffusion. A finite chemical potential couples these two hyperbolic surfaces, delivering a quantum sensor whose sensitivity grows exponentially with the size of the system. Our results provide experimentalists with an unprecedented opportunity to explore intriguing quantum phenomena in curve spaces without distortion or access non-Hermitian phenomena without dissipation. Our work also suggests a new class of quantum sensors in which geometry amplifies small signals. △ Less

Submitted 9 August, 2024; originally announced August 2024.

Comments: 7 pages, 3 figures

arXiv:2408.03552 [pdf, other]

A deeper investigation of the Primordial Binary Cluster

Authors: Qingshun Hu, Songmei Qin, Chunyan Li, Chenglong Lv, Yang Pan, Yangping Luo

Abstract: We hereby reported a new physical binary cluster (ASCC~19 and ASCC~21) near the Orion star-forming complex based on the data in the literature. Analysis of the results shows that it is a primordial binary cluster. It is possible that this binary cluster is undergoing two-body relaxation by inspecting the radial velocity anomalies of its member stars. In addition, based on the analysis of its metal… ▽ More We hereby reported a new physical binary cluster (ASCC~19 and ASCC~21) near the Orion star-forming complex based on the data in the literature. Analysis of the results shows that it is a primordial binary cluster. It is possible that this binary cluster is undergoing two-body relaxation by inspecting the radial velocity anomalies of its member stars. In addition, based on the analysis of its metal abundances, we found that the components of this binary cluster may have been formed by the deep fusion of multiple subclusters. Finally, we investigated the 3D morphology of this binary cluster, simulated the trajectories of its components in the galactic disk, and concluded that its components may not merge into a single cluster. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: 15 pages, 16 figures, 3 tables

arXiv:2407.19208 [pdf, other]

WindPoly: Polygonal Mesh Reconstruction via Winding Numbers

Authors: Xin He, Chenlei Lv, Pengdi Huang, Hui Huang

Abstract: Polygonal mesh reconstruction of a raw point cloud is a valuable topic in the field of computer graphics and 3D vision. Especially to 3D architectural models, polygonal mesh provides concise expressions for fundamental geometric structures while effectively reducing data volume. However, there are some limitations of traditional reconstruction methods: normal vector dependency, noisy points and de… ▽ More Polygonal mesh reconstruction of a raw point cloud is a valuable topic in the field of computer graphics and 3D vision. Especially to 3D architectural models, polygonal mesh provides concise expressions for fundamental geometric structures while effectively reducing data volume. However, there are some limitations of traditional reconstruction methods: normal vector dependency, noisy points and defective parts sensitivity, and internal geometric structure lost, which reduce the practicality in real scene. In this paper, we propose a robust and efficient polygonal mesh reconstruction method to address the issues in architectural point cloud reconstruction task. It is an iterative adaptation process to detect planar shapes from scattered points. The initial structural polygonal mesh can be established in the constructed convex polyhedral space without assistance of normal vectors. Then, we develop an efficient polygon-based winding number strategy to orient polygonal mesh with global consistency. The significant advantage of our method is to provide a structural reconstruction for architectural point clouds and avoid point-based normal vector analysis. It effectively improves the robustness to noisy points and defective parts. More geometric details can be preserved in the reconstructed polygonal mesh. Experimental results show that our method can reconstruct concise, oriented and faithfully polygonal mesh that are better than results of state-of-the-art methods. More results and details can be found on https://vcc.tech/research/2024/WindPoly △ Less

Submitted 27 July, 2024; originally announced July 2024.

Comments: European Conference on Computer Vision (Proceedings of ECCV 2024)

arXiv:2407.13480 [pdf]

Risk-Aware Vehicle Trajectory Prediction Under Safety-Critical Scenarios

Authors: Qingfan Wang, Dongyang Xu, Gaoyuan Kuang, Chen Lv, Shengbo Eben Li, Bingbing Nie

Abstract: Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in aut… ▽ More Trajectory prediction is significant for intelligent vehicles to achieve high-level autonomous driving, and a lot of relevant research achievements have been made recently. Despite the rapid development, most existing studies solely focused on normal safe scenarios while largely neglecting safety-critical scenarios, particularly those involving imminent collisions. This oversight may result in autonomous vehicles lacking the essential predictive ability in such situations, posing a significant threat to safety. To tackle these, this paper proposes a risk-aware trajectory prediction framework tailored to safety-critical scenarios. Leveraging distinctive hazardous features, we develop three core risk-aware components. First, we introduce a risk-incorporated scene encoder, which augments conventional encoders with quantitative risk information to achieve risk-aware encoding of hazardous scene contexts. Next, we incorporate endpoint-risk-combined intention queries as prediction priors in the decoder to ensure that the predicted multimodal trajectories cover both various spatial intentions and risk levels. Lastly, an auxiliary risk prediction task is implemented for the ultimate risk-aware prediction. Furthermore, to support model training and performance evaluation, we introduce a safety-critical trajectory prediction dataset and tailored evaluation metrics. We conduct comprehensive evaluations and compare our model with several SOTA models. Results demonstrate the superior performance of our model, with a significant improvement in most metrics. This prediction advancement enables autonomous vehicles to execute correct collision avoidance maneuvers under safety-critical scenarios, eventually enhancing road traffic safety. △ Less

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.11585 [pdf, other]

QVD: Post-training Quantization for Video Diffusion Models

Authors: Shilong Tian, Hong Chen, Chengtao Lv, Yu Liu, Jinyang Guo, Xianglong Liu, Shengxi Li, Hao Yang, Tao Xie

Abstract: Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effe… ▽ More Recently, video diffusion models (VDMs) have garnered significant attention due to their notable advancements in generating coherent and realistic video content. However, processing multiple frame features concurrently, coupled with the considerable model size, results in high latency and extensive memory consumption, hindering their broader application. Post-training quantization (PTQ) is an effective technique to reduce memory footprint and improve computational efficiency. Unlike image diffusion, we observe that the temporal features, which are integrated into all frame features, exhibit pronounced skewness. Furthermore, we investigate significant inter-channel disparities and asymmetries in the activation of video diffusion models, resulting in low coverage of quantization levels by individual channels and increasing the challenge of quantization. To address these issues, we introduce the first PTQ strategy tailored for video diffusion models, dubbed QVD. Specifically, we propose the High Temporal Discriminability Quantization (HTDQ) method, designed for temporal features, which retains the high discriminability of quantized features, providing precise temporal guidance for all video frames. In addition, we present the Scattered Channel Range Integration (SCRI) method which aims to improve the coverage of quantization levels across individual channels. Experimental validations across various models, datasets, and bit-width settings demonstrate the effectiveness of our QVD in terms of diverse metrics. In particular, we achieve near-lossless performance degradation on W8A8, outperforming the current methods by 205.12 in FVD. △ Less

Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

Comments: accepted by ACMMM2024

arXiv:2407.09359 [pdf, other]

A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization

Authors: Qiyu Chen, Huiyuan Luo, Chengkan Lv, Zhengtao Zhang

Abstract: Anomaly synthesis strategies can effectively enhance unsupervised anomaly detection. However, existing strategies have limitations in the coverage and controllability of anomaly synthesis, particularly for weak defects that are very similar to normal regions. In this paper, we propose Global and Local Anomaly co-Synthesis Strategy (GLASS), a novel unified framework designed to synthesize a broader… ▽ More Anomaly synthesis strategies can effectively enhance unsupervised anomaly detection. However, existing strategies have limitations in the coverage and controllability of anomaly synthesis, particularly for weak defects that are very similar to normal regions. In this paper, we propose Global and Local Anomaly co-Synthesis Strategy (GLASS), a novel unified framework designed to synthesize a broader coverage of anomalies under the manifold and hypersphere distribution constraints of Global Anomaly Synthesis (GAS) at the feature level and Local Anomaly Synthesis (LAS) at the image level. Our method synthesizes near-in-distribution anomalies in a controllable way using Gaussian noise guided by gradient ascent and truncated projection. GLASS achieves state-of-the-art results on the MVTec AD (detection AUROC of 99.9\%), VisA, and MPDD datasets and excels in weak defect detection. The effectiveness and efficiency have been further validated in industrial applications for woven fabric defect detection. The code and dataset are available at: \url{https://github.com/cqylunlun/GLASS}. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.01219 [pdf, other]

Searching for Best Practices in Retrieval-Augmented Generation

Authors: Xiaohua Wang, Zhenghua Wang, Xuan Gao, Feiran Zhang, Yixin Wu, Zhibo Xu, Tianyuan Shi, Zhengyuan Wang, Shizheng Li, Qi Qian, Ruicheng Yin, Changze Lv, Xiaoqing Zheng, Xuanjing Huang

Abstract: Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolong… ▽ More Retrieval-augmented generation (RAG) techniques have proven to be effective in integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains. While many RAG approaches have been proposed to enhance large language models through query-dependent retrievals, these approaches still suffer from their complex implementation and prolonged response times. Typically, a RAG workflow involves multiple processing steps, each of which can be executed in various ways. Here, we investigate existing RAG approaches and their potential combinations to identify optimal RAG practices. Through extensive experiments, we suggest several strategies for deploying RAG that balance both performance and efficiency. Moreover, we demonstrate that multimodal retrieval techniques can significantly enhance question-answering capabilities about visual inputs and accelerate the generation of multimodal content using a "retrieval as generation" strategy. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19230 [pdf, other]

Spiking Convolutional Neural Networks for Text Classification

Authors: Changze Lv, Jianhan Xu, Xiaoqing Zheng

Abstract: Spiking neural networks (SNNs) offer a promising pathway to implement deep neural networks (DNNs) in a more energy-efficient manner since their neurons are sparsely activated and inferences are event-driven. However, there have been very few works that have demonstrated the efficacy of SNNs in language tasks partially because it is non-trivial to represent words in the forms of spikes and to deal… ▽ More Spiking neural networks (SNNs) offer a promising pathway to implement deep neural networks (DNNs) in a more energy-efficient manner since their neurons are sparsely activated and inferences are event-driven. However, there have been very few works that have demonstrated the efficacy of SNNs in language tasks partially because it is non-trivial to represent words in the forms of spikes and to deal with variable-length texts by SNNs. This work presents a "conversion + fine-tuning" two-step method for training SNNs for text classification and proposes a simple but effective way to encode pre-trained word embeddings as spike trains. We show empirically that after fine-tuning with surrogate gradients, the converted SNNs achieve comparable results to their DNN counterparts with much less energy consumption across multiple datasets for both English and Chinese. We also show that such SNNs are more robust to adversarial attacks than DNNs. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.16062 [pdf, other]

Towards Biologically Plausible Computing: A Comprehensive Comparison

Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl… ▽ More Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, global error computation, and dual-phase training. To address this long-standing challenge, many studies have endeavored to devise biologically plausible training algorithms. However, a fully biologically plausible algorithm for training multilayer neural networks remains elusive, and interpretations of biological plausibility vary among researchers. In this study, we establish criteria for biological plausibility that a desirable learning algorithm should meet. Using these criteria, we evaluate a range of existing algorithms considered to be biologically plausible, including Hebbian learning, spike-timing-dependent plasticity, feedback alignment, target propagation, predictive coding, forward-forward algorithm, perturbation learning, local losses, and energy-based learning. Additionally, we empirically evaluate these algorithms across diverse network architectures and datasets. We compare the feature representations learned by these algorithms with brain activity recorded by non-invasive devices under identical stimuli, aiming to identify which algorithm can most accurately replicate brain activity patterns. We are hopeful that this study could inspire the development of new biologically plausible algorithms for training multilayer networks, thereby fostering progress in both the fields of neuroscience and machine learning. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.10976 [pdf, other]

Promoting Data and Model Privacy in Federated Learning through Quantized LoRA

Authors: JianHao Zhu, Changze Lv, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Zixuan Ling, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Abstract: Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers… ▽ More Conventional federated learning primarily aims to secure the privacy of data distributed across multiple edge devices, with the global model dispatched to edge devices for parameter updates during the learning process. However, the development of large language models (LLMs) requires substantial data and computational resources, rendering them valuable intellectual properties for their developers and owners. To establish a mechanism that protects both data and model privacy in a federated learning context, we introduce a method that just needs to distribute a quantized version of the model's parameters during training. This method enables accurate gradient estimations for parameter updates while preventing clients from accessing a model whose performance is comparable to the centrally hosted one. Moreover, we combine this quantization strategy with LoRA, a popular and parameter-efficient fine-tuning method, to significantly reduce communication costs in federated learning. The proposed framework, named \textsc{FedLPP}, successfully ensures both data and model privacy in the federated learning context. Additionally, the learned central model exhibits good generalization and can be trained in a resource-efficient manner. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.02993 [pdf, other]

Dual-color Q-switched mode-locking in an Erbium-doped fiber laser

Authors: Chenyue Lv, Baole Lu, Jintao Bai

Abstract: Q-switched mode-locking (QML) has been widely observed in various lasers, but its generation mechanism in passive mode-locking remains unclear. In this paper, we build up a dual-color QML Erbium-doped fiber laser and find a bound-state-like envelope on the optical spectrum for the first time. Theoretically, the formation mechanism of QML is numerically investigated using the coupled Ginzburg-Landa… ▽ More Q-switched mode-locking (QML) has been widely observed in various lasers, but its generation mechanism in passive mode-locking remains unclear. In this paper, we build up a dual-color QML Erbium-doped fiber laser and find a bound-state-like envelope on the optical spectrum for the first time. Theoretically, the formation mechanism of QML is numerically investigated using the coupled Ginzburg-Landau equations. In addition, we demonstrated the existence of two QML pulse evolution patterns with gain or polarization state variations in simulation. Our results deepen the understanding of QML pulses in mode-locked fiber lasers and provide a foundation for studying mode-locking nonlinear evolutionary paths. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.14362 [pdf, other]

Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators

Authors: Changze Lv, Dongqi Han, Yansen Wang, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

Abstract: Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing i… ▽ More Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing inspiration from the central pattern generators (CPGs) in the human brain, which produce rhythmic patterned outputs without requiring rhythmic inputs, we propose a novel PE technique for SNNs, termed CPG-PE. We demonstrate that the commonly used sinusoidal PE is mathematically a specific solution to the membrane potential dynamics of a particular CPG. Moreover, extensive experiments across various domains, including time-series forecasting, natural language processing, and image classification, show that SNNs with CPG-PE outperform their conventional counterparts. Additionally, we perform analysis experiments to elucidate the mechanism through which SNNs encode positional information and to explore the function of CPGs in the human brain. This investigation may offer valuable insights into the fundamental principles of neural computation. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14226 [pdf, other]

Variational Delayed Policy Optimization

Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Chao Huang

Abstract: In environments with delayed observation, state augmentation by including actions within the delay window is adopted to retrieve Markovian property to enable reinforcement learning (RL). However, state-of-the-art (SOTA) RL techniques with Temporal-Difference (TD) learning frameworks often suffer from learning inefficiency, due to the significant expansion of the augmented state space with the dela… ▽ More In environments with delayed observation, state augmentation by including actions within the delay window is adopted to retrieve Markovian property to enable reinforcement learning (RL). However, state-of-the-art (SOTA) RL techniques with Temporal-Difference (TD) learning frameworks often suffer from learning inefficiency, due to the significant expansion of the augmented state space with the delay. To improve learning efficiency without sacrificing performance, this work introduces a novel framework called Variational Delayed Policy Optimization (VDPO), which reformulates delayed RL as a variational inference problem. This problem is further modelled as a two-step iterative optimization problem, where the first step is TD learning in the delay-free environment with a small state space, and the second step is behaviour cloning which can be addressed much more efficiently than TD learning. We not only provide a theoretical analysis of VDPO in terms of sample complexity and performance, but also empirically demonstrate that VDPO can achieve consistent performance with SOTA methods, with a significant enhancement of sample efficiency (approximately 50\% less amount of samples) in the MuJoCo benchmark. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.11186 [pdf, other]

Compact Spin-Polarized Positron Acceleration in Multi-Layer Microhole Array Films

Authors: Zhen-Ke Dou, Chong Lv, Yousef I. Salamin, Nan Zhang, Feng Wan, Zhong-Feng Xu, Jian-Xing Li

Abstract: Compact spin-polarized positron accelerators play a major role in promoting significant positron application research, which typically require high acceleration gradients and polarization degree, both of which, however, are still great challenging. Here, we put forward a novel spin-polarized positron acceleration method which employs an ultrarelativistic high-density electron beam passing through… ▽ More Compact spin-polarized positron accelerators play a major role in promoting significant positron application research, which typically require high acceleration gradients and polarization degree, both of which, however, are still great challenging. Here, we put forward a novel spin-polarized positron acceleration method which employs an ultrarelativistic high-density electron beam passing through any hole of multi-layer microhole array films to excite strong electrostatic and transition radiation fields. Positrons in the polarized electron-positron pair plasma, filled in the front of the multi-layer films, can be captured, accelerated, and focused by the electrostatic and transition radiation fields, while maintaining high polarization of above 90% and high acceleration gradient of about TeV/m. Multi-layer design allows for capturing more positrons and achieving cascade acceleration. Our method offers a promising solution for accelerator miniaturization, positron injection, and polarization maintaining, and also can be used to accelerate other charged particles. △ Less

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.10157 [pdf]

Incorporating ESO into Deep Koopman Operator Modelling for Control of Autonomous Vehicles

Authors: Hao Chen, Chen Lv

Abstract: Koopman operator theory is a kind of data-driven modelling approach that accurately captures the nonlinearities of mechatronic systems such as vehicles against physics-based methods. However, the infinite-dimensional Koopman operator is impossible to implement in real-world applications. To approximate the infinite-dimensional Koopman operator through collection dataset rather than manual trial an… ▽ More Koopman operator theory is a kind of data-driven modelling approach that accurately captures the nonlinearities of mechatronic systems such as vehicles against physics-based methods. However, the infinite-dimensional Koopman operator is impossible to implement in real-world applications. To approximate the infinite-dimensional Koopman operator through collection dataset rather than manual trial and error, we adopt deep neural networks (DNNs) to extract basis functions by offline training and map the nonlinearities of vehicle planar dynamics into a linear form in the lifted space. Besides, the effects of the dimensions of basis functions on the model accuracy are explored. Further, the extended state observer (ESO) is introduced to online estimate the total disturbance in the lifted space and compensate for the modelling errors and residuals of the learned deep Koopman operator (DK) while also improving its generalization. Then, the proposed model is applied to predict vehicle states within prediction horizons and later formulates the constrained finite-time optimization problem of model predictive control (MPC), i.e., ESO-DKMPC. In terms of the trajectory tracking of autonomous vehicles, the ESO-DKMPC generates the wheel steering angle to govern lateral motions based on the decoupling control structure. The various conditions under the double-lane change scenarios are built on the CarSim/Simulink co-simulation platform, and extensive comparisons are conducted with the linear MPC (LMPC) and nonlinear MPC (NMPC) informed by the physics-based model. The results indicate that the proposed ESO-DKMPC has better tracking performance and moderate efficacy both within linear and nonlinear regions. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.10145 [pdf]

Deep Koopman Operator-Informed Safety Command Governor for Autonomous Vehicles

Authors: Hao Chen, Xiangkun He, Shuo Cheng, Chen Lv

Abstract: Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the… ▽ More Modeling of nonlinear behaviors with physical-based models poses challenges. However, Koopman operator maps the original nonlinear system into an infinite-dimensional linear space to achieve global linearization of the nonlinear system through input and output data, which derives an absolute equivalent linear representation of the original state space. Due to the impossibility of implementing the infinite-dimensional Koopman operator, finite-dimensional kernel functions are selected as an approximation. Given its flexible structure and high accuracy, deep learning is initially employed to extract kernel functions from data and acquire a linear evolution dynamic of the autonomous vehicle in the lifted space. Additionally, the control barrier function (CBF) converts the state constraints to the constraints on the input to render safety property. Then, in terms of the lateral stability of the in-wheel motor driven vehicle, the CBF conditions are incorporated with the learned deep Koopman model. Because of the linear fashion of the deep Koopman model, the quadratic programming problem is formulated to generate the applied driving torque with minimal perturbation to the original driving torque as a safety command governor. In the end, to validate the fidelity of the deep Koopman model compared to other mainstream approaches and demonstrate the lateral improvement achieved by the proposed safety command governor, data collection and safety testing scenarios are conducted on a hardware-in-the-loop platform. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08263 [pdf, other]

Palette-based Color Transfer between Images

Authors: Chenlei Lv, Dan Zhang

Abstract: As an important subtopic of image enhancement, color transfer aims to enhance the color scheme of a source image according to a reference one while preserving the semantic context. To implement color transfer, the palette-based color mapping framework was proposed. \textcolor{black}{It is a classical solution that does not depend on complex semantic analysis to generate a new color scheme. However… ▽ More As an important subtopic of image enhancement, color transfer aims to enhance the color scheme of a source image according to a reference one while preserving the semantic context. To implement color transfer, the palette-based color mapping framework was proposed. \textcolor{black}{It is a classical solution that does not depend on complex semantic analysis to generate a new color scheme. However, the framework usually requires manual settings, blackucing its practicality.} The quality of traditional palette generation depends on the degree of color separation. In this paper, we propose a new palette-based color transfer method that can automatically generate a new color scheme. With a redesigned palette-based clustering method, pixels can be classified into different segments according to color distribution with better applicability. {By combining deep learning-based image segmentation and a new color mapping strategy, color transfer can be implemented on foreground and background parts independently while maintaining semantic consistency.} The experimental results indicate that our method exhibits significant advantages over peer methods in terms of natural realism, color consistency, generality, and robustness. △ Less

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.06426 [pdf, other]

Generation of Ultra-Collimated Polarized Attosecond $γ-$Rays via Beam Instabilities

Authors: Li-Jie Cui, Ke-Jia Wei, Chong Lv, Feng Wan, Yousef I. Salamin, Lei-Feng Cao, Jian-Xing Li

Abstract: Polarized attosecond $γ-$rays may offer excitation and hyperfine tracking of reactions relevant to nuclear physics, astrophysics, high-energy physics, etc. However, unfortunately, generation of a feasible and easy-to-deploy source is still a great challenge. Here, we put forward a novel method for producing ultra-collimated high-brilliance polarized attosecond $γ-$rays via the interaction of an un… ▽ More Polarized attosecond $γ-$rays may offer excitation and hyperfine tracking of reactions relevant to nuclear physics, astrophysics, high-energy physics, etc. However, unfortunately, generation of a feasible and easy-to-deploy source is still a great challenge. Here, we put forward a novel method for producing ultra-collimated high-brilliance polarized attosecond $γ-$rays via the interaction of an unpolarized electron beam with a solid-density plasma. As a relativistic electron beam enters a solid-density plasma, it can be modulated into high-density clusters via the self-modulation instability of itself and further into attosecond slices due to its own hosing instability. This is accompanied by the generation of similar pulse-width $γ-$slices via nonlinear Compton scattering. The severe hosing instability breaks the symmetry of the excited electromagnetic fields, resulting in net linear polarization of $γ-$slices, which challenges the conventional perception that the interaction of an axially symmetric unpolarized electron beam with a uniform plasma cannot generate polarized radiation. In addition, we also obtain high-quality electron microbunches which may serve as an alternative source for prebunched free-electron lasers. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.06001 [pdf, other]

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Authors: Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chentao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

Abstract: Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements limit the widespread adoption. Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating LLMs, albeit w… ▽ More Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements limit the widespread adoption. Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating LLMs, albeit with potential risks to accuracy. Numerous studies have aimed to minimize the accuracy loss associated with quantization. However, their quantization configurations vary from each other and cannot be fairly compared. In this paper, we present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization. LLMC integrates dozens of algorithms, models, and hardwares, offering high extensibility from integer to floating-point quantization, from LLM to vision-language (VLM) model, from fixed-bit to mixed precision, and from quantization to sparsification. Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats, providing novel insights and detailed analyses for further research and practical guidance for users. Our toolkit is available at \href{LLMC}{https://github.com/ModelTC/llmc}. △ Less

Submitted 20 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.03144 [pdf, other]

PTQ4SAM: Post-Training Quantization for Segment Anything

Authors: Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu

Abstract: Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to th… ▽ More Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to the bimodal distribution in post-Key-Linear activations. We analyze its characteristics from both per-tensor and per-channel perspectives, and propose a Bimodal Integration strategy, which utilizes a mathematically equivalent sign operation to transform the bimodal distribution into a relatively easy-quantized normal distribution offline. Second, SAM encompasses diverse attention mechanisms (i.e., self-attention and two-way cross-attention), resulting in substantial variations in the post-Softmax distributions. Therefore, we introduce an Adaptive Granularity Quantization for Softmax through searching the optimal power-of-two base, which is hardware-friendly. Extensive experimental results across various vision tasks (instance segmentation, semantic segmentation and object detection), datasets and model variants show the superiority of PTQ4SAM. For example, when quantizing SAM-L to 6-bit, we achieve lossless accuracy for instance segmentation, about 0.5\% drop with theoretical 3.9$\times$ acceleration. The code is available at \url{https://github.com/chengtao-lv/PTQ4SAM}. △ Less

Submitted 5 May, 2024; originally announced May 2024.

Comments: CVPR 2024

arXiv:2404.15348 [pdf]

High-Linearity PAM-4 Silicon Micro-ring Transmitter Architecture with Electronic-Photonic Hybrid DAC

Authors: Zheng Li, Chengyang Lv, Min Tan

Abstract: This paper presents a high linearity PAM-4 transmitter (TX) architecture, consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude (PAM-4) modulation signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexib… ▽ More This paper presents a high linearity PAM-4 transmitter (TX) architecture, consisting of a three-segment micro-ring modulator (MRM) and a matched CMOS driver. This architecture can drive a high-linearity 4-level pulse amplitude (PAM-4) modulation signal, thereby extending the tunable operating wavelength range for achieving linear PAM-4 output. We use the three-segment MRM to increase design flexibility so that the linearity of PAM-4 output can be optimized with another degree of freedom. Each phase shift region is directly driven by the independently amplitude-tunable Non-Return-to-Zero (NRZ) signal. The three-segment modulator can achieve an adjustable wavelength range of approximately 0.037 nm within the high linearity PAM-4 output limit when the driving voltage varies from 1.5 V to 3 V, simultaneously achieving an adjustable insertion loss (IL) range of approximately 2 dB, roughly four times that of the two-segment MRM with a similar design. The driver circuit with adjustable driving voltage is co-designed to adjust the eye height to improve PAM-4 linearity. In this article, the high linearity PAM-4 silicon micro-ring architecture can be employed in optical transmitters to adjust PAM-4 eye-opening size and maximize the PAM-4 output linearity, thus offering the potential for high-performance and low-power overhead transmitters. △ Less

Submitted 14 April, 2024; originally announced April 2024.

Comments: 14 pages, 11 figures

arXiv:2404.14047 [pdf, other]

An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs

Authors: Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

Abstract: The LLaMA family has become one of the most powerful open-source Large Language Models (LLMs) and the popular LLM backbones of Multimodal Large Language Models (MLLMs), widely applied in Computer Vision (CV) and Natural Language Understanding (NLU) tasks. Notably, LLaMA3 models have recently been released and achieve impressive performance across various with super-large scale pre-training on over… ▽ More The LLaMA family has become one of the most powerful open-source Large Language Models (LLMs) and the popular LLM backbones of Multimodal Large Language Models (MLLMs), widely applied in Computer Vision (CV) and Natural Language Understanding (NLU) tasks. Notably, LLaMA3 models have recently been released and achieve impressive performance across various with super-large scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when quantized to low bit-width. This exploration can potentially unveil new insights and challenges for low-bit quantization of LLaMA3 and other forthcoming LLMs, especially in addressing performance degradation problems that suffer in LLM compression. Specifically, we comprehensively evaluate the 10 existing post-training quantization and LoRA-finetuning methods of LLaMA3 on 1-8 bits and diverse datasets to reveal LLaMA3's low-bit quantization performance. To uncover the capabilities of low-bit quantized MLLM, we assessed the performance of the LLaMA3-based LLaVA-Next-8B model under 2-4 ultra-low bits with post-training quantization methods. Our experimental results indicate that LLaMA3 still suffers non-negligent degradation in linguistic and visual contexts, particularly under ultra-low bit widths. This highlights the significant performance gap under low bit-width that needs to be bridged in future developments. We expect that this empirical study will prove valuable in advancing future models, driving LLMs and MLLMs to achieve higher accuracy at lower bit to enhance practicality. △ Less

Submitted 19 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.14037 [pdf, other]

doi 10.1145/3664647.3681675

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Authors: Hongyun Yu, Zhan Qu, Qihang Yu, Jianchuan Chen, Zhonghua Jiang, Zhiwen Chen, Shengyu Zhang, Jimin Xu, Fei Wu, Chengfei Lv, Gang Yu

Abstract: Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method… ▽ More Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip motion generation. Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose, delivering stable and realistic rendered videos. Extensive experimental results suggest that GaussianTalker outperforms existing state-of-the-art methods in talking head synthesis, delivering precise lip synchronization and exceptional visual quality. Our method achieves rendering speeds of 130 FPS on NVIDIA RTX4090 GPU, significantly exceeding the threshold for real-time rendering performance, and can potentially be deployed on other hardware platforms. △ Less

Submitted 9 August, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by ACM MM 2024. Project page: https://yuhongyun777.github.io/GaussianTalker/

arXiv:2404.11593 [pdf, other]

IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

Authors: Xi Chen, Sida Peng, Dongchen Yang, Yuan Liu, Bowen Pan, Chengfei Lv, Xiaowei Zhou

Abstract: This paper aims to recover object materials from posed images captured under an unknown static lighting condition. Recent methods solve this task by optimizing material parameters through differentiable physically based rendering. However, due to the coupling between object geometry, materials, and environment lighting, there is inherent ambiguity during the inverse rendering process, preventing p… ▽ More This paper aims to recover object materials from posed images captured under an unknown static lighting condition. Recent methods solve this task by optimizing material parameters through differentiable physically based rendering. However, due to the coupling between object geometry, materials, and environment lighting, there is inherent ambiguity during the inverse rendering process, preventing previous methods from obtaining accurate results. To overcome this ill-posed problem, our key idea is to learn the material prior with a generative model for regularizing the optimization process. We observe that the general rendering equation can be split into diffuse and specular shading terms, and thus formulate the material prior as diffusion models of albedo and specular. Thanks to this design, our model can be trained using the existing abundant 3D object data, and naturally acts as a versatile tool to resolve the ambiguity when recovering material representations from RGB images. In addition, we develop a coarse-to-fine training strategy that leverages estimated materials to guide diffusion models to satisfy multi-view consistent constraints, leading to more stable and accurate results. Extensive experiments on real-world and synthetic datasets demonstrate that our approach achieves state-of-the-art performance on material recovery. The code will be available at https://zju3dv.github.io/IntrinsicAnything. △ Less

Submitted 22 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Project page: https://zju3dv.github.io/IntrinsicAnything

arXiv:2404.02524 [pdf, other]

Versatile Scene-Consistent Traffic Scenario Generation as Optimization with Diffusion

Authors: Zhiyu Huang, Zixu Zhang, Ameya Vaidya, Yuxiao Chen, Chen Lv, Jaime Fernández Fisac

Abstract: Generating realistic and controllable agent behaviors in traffic simulation is crucial for the development of autonomous vehicles. This problem is often formulated as imitation learning (IL) from real-world driving data by either directly predicting future trajectories or inferring cost functions with inverse optimal control. In this paper, we draw a conceptual connection between IL and diffusion-… ▽ More Generating realistic and controllable agent behaviors in traffic simulation is crucial for the development of autonomous vehicles. This problem is often formulated as imitation learning (IL) from real-world driving data by either directly predicting future trajectories or inferring cost functions with inverse optimal control. In this paper, we draw a conceptual connection between IL and diffusion-based generative modeling and introduce a novel framework Versatile Behavior Diffusion (VBD) to simulate interactive scenarios with multiple traffic participants. Our model not only generates scene-consistent multi-agent interactions but also enables scenario editing through multi-step guidance and refinement. Experimental evaluations show that VBD achieves state-of-the-art performance on the Waymo Sim Agents benchmark. In addition, we illustrate the versatility of our model by adapting it to various applications. VBD is capable of producing scenarios conditioning on priors, integrating with model-based optimization, sampling multi-modal scene-consistent scenarios by fusing marginal predictions, and generating safety-critical scenarios when combined with a game-theoretic solver. △ Less

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.17297 [pdf, other]

InternLM2 Technical Report

Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang , et al. (75 additional authors not shown)

Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.12642 [pdf]

Interlayer Dzyaloshinskii-Moriya interaction in synthetic ferrimagnets

Authors: Shen Li, Mouad Fattouhi, Tianxun Huang, Chen Lv, Mark C. H. de Jong, Pingzhi Li, Xiaoyang Lin, Felipe Garcia-Sanchez, Eduardo Martinez, Stéphane Mangin, Bert Koopmans, Weisheng Zhao, Reinoud Lavrijsen

Abstract: The antisymmetric interlayer exchange interaction, i.e., interlayer Dzyaloshinskii-Moriya interaction (IL-DMI) has attracted significant interest since this long-range chiral spin interaction provides a new dimension for controlling spin textures and dynamics. However, the role of IL-DMI in the field induced and spin-orbit torque (SOT) induced switching of synthetic ferrimagnets (SFi) has not been… ▽ More The antisymmetric interlayer exchange interaction, i.e., interlayer Dzyaloshinskii-Moriya interaction (IL-DMI) has attracted significant interest since this long-range chiral spin interaction provides a new dimension for controlling spin textures and dynamics. However, the role of IL-DMI in the field induced and spin-orbit torque (SOT) induced switching of synthetic ferrimagnets (SFi) has not been uncovered. Here, we exploit interlayer chiral exchange bias fields in SFi to address both the sign and magnitude of the IL-DMI. Depending on the degree of imbalance between the two magnetic moments of the SFi, the amount of asymmetry, addressed via loop shifts of the hysteresis loops under an in-plane field reveals a unidirectional and chiral nature of the IL-DMI. The devices are then tested with SOT switching experiments and the process is examined via both transient state and steady state detection. In addition to field-free SOT switching, we find that the combination of IL-DMI and SOT give rise to multi-resistance states, which provides a possible direction for the future design of neuromorphic computing devices based on SOT. This work is a step towards characterizing and understanding the IL-DMI for spintronic applications. △ Less

Submitted 19 March, 2024; originally announced March 2024.

Comments: no

MSC Class: no

arXiv:2403.11183 [pdf, other]

Decoding Continuous Character-based Language from Non-invasive Brain Recordings

Authors: Cenyuan Zhang, Xiaoqing Zheng, Ruicheng Yin, Shujie Geng, Jianhan Xu, Xuan Gao, Changze Lv, Zixuan Ling, Xuanjing Huang, Miao Cao, Jianfeng Feng

Abstract: Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to… ▽ More Deciphering natural language from brain activity through non-invasive devices remains a formidable challenge. Previous non-invasive decoders either require multiple experiments with identical stimuli to pinpoint cortical regions and enhance signal-to-noise ratios in brain activity, or they are limited to discerning basic linguistic elements such as letters and words. We propose a novel approach to decoding continuous language from single-trial non-invasive fMRI recordings, in which a three-dimensional convolutional network augmented with information bottleneck is developed to automatically identify responsive voxels to stimuli, and a character-based decoder is designed for the semantic reconstruction of continuous language characterized by inherent character structures. The resulting decoder can produce intelligible textual sequences that faithfully capture the meaning of perceived speech both within and across subjects, while existing decoders exhibit significantly inferior performance in cross-subject contexts. The ability to decode continuous language from single trials across subjects demonstrates the promising applications of non-invasive language brain-computer interfaces in both healthcare and neuroscience. △ Less

Submitted 19 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

arXiv:2403.10206 [pdf, other]

A Data-Driven Approach for Mitigating Dark Current Noise and Bad Pixels in Complementary Metal Oxide Semiconductor Cameras for Space-based Telescopes

Authors: Peng Jia, Chao Lv, Yushan Li, Yongyang Sun, Shu Niu, Zhuoxiao Wang

Abstract: In recent years, there has been a gradual increase in the performance of Complementary Metal Oxide Semiconductor (CMOS) cameras. These cameras have gained popularity as a viable alternative to charge-coupled device (CCD) cameras in a wide range of applications. One particular application is the CMOS camera installed in small space telescopes. However, the limited power and spatial resources availa… ▽ More In recent years, there has been a gradual increase in the performance of Complementary Metal Oxide Semiconductor (CMOS) cameras. These cameras have gained popularity as a viable alternative to charge-coupled device (CCD) cameras in a wide range of applications. One particular application is the CMOS camera installed in small space telescopes. However, the limited power and spatial resources available on satellites present challenges in maintaining ideal observation conditions, including temperature and radiation environment. Consequently, images captured by CMOS cameras are susceptible to issues such as dark current noise and defective pixels. In this paper, we introduce a data-driven framework for mitigating dark current noise and bad pixels for CMOS cameras. Our approach involves two key steps: pixel clustering and function fitting. During pixel clustering step, we identify and group pixels exhibiting similar dark current noise properties. Subsequently, in the function fitting step, we formulate functions that capture the relationship between dark current and temperature, as dictated by the Arrhenius law. Our framework leverages ground-based test data to establish distinct temperature-dark current relations for pixels within different clusters. The cluster results could then be utilized to estimate the dark current noise level and detect bad pixels from real observational data. To assess the effectiveness of our approach, we have conducted tests using real observation data obtained from the Yangwang-1 satellite, equipped with a near-ultraviolet telescope and an optical telescope. The results show a considerable improvement in the detection efficiency of space-based telescopes. △ Less

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by the AJ, comments are welcome. The complete code could be downloaded from: DOI: 10.12149/101387

arXiv:2403.00436 [pdf, other]

Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Authors: Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, Tat-Seng Chua

Abstract: We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video dif… ▽ More We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video diffusion to understand accident cause-effect chains for safe driving. With MM-AU, we present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven by an abductive CLIP model. This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, accident frames with the corresponding text descriptions, such as accident reasons, prevention advice, and accident categories. OAVD enforces the causal region learning while fixing the content of the original frame background in video generation, to find the dominant cause-effect chain for certain accidents. Extensive experiments verify the abductive ability of AdVersa-SD and the superiority of OAVD against the state-of-the-art diffusion models. Additionally, we provide careful benchmark evaluations for object detection and accident reason answering since AdVersa-SD relies on precise object and accident reason information. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024. This is not the camera-ready version. The Project page: http://www.lotvsmmau.net

arXiv:2402.19013 [pdf, other]

Ultraviolet Positioning via TDOA: Error Analysis and System Prototype

Authors: Shihui Yu, Chubing Lv, Yueke Yang, Yuchen Pan, Lei Sun, Juliang Cao, Ruihang Yu, Chen Gong, Wenqi Wu, Zhengyuan Xu

Abstract: This work performs the design, real-time hardware realization, and experimental evaluation of a positioning system by ultra-violet (UV) communication under photon-level signal detection. The positioning is based on time-difference of arrival (TDOA) principle. Time division-based transmission of synchronization sequence from three transmitters with known positions is applied. We investigate the pos… ▽ More This work performs the design, real-time hardware realization, and experimental evaluation of a positioning system by ultra-violet (UV) communication under photon-level signal detection. The positioning is based on time-difference of arrival (TDOA) principle. Time division-based transmission of synchronization sequence from three transmitters with known positions is applied. We investigate the positioning error via decomposing it into two parts, the transmitter-side timing error and the receiver-side synchronization error. The theoretical average error matches well with the simulation results, which indicates that theoretical fitting can provide reliable guidance and prediction for hardware experiments. We also conduct real-time hardware realization of the TDOA-based positioning system using Field Programmable Gate Array (FPGA), which is experimentally evaluated via outdoor experiments. Experimental results match well with the theoretical and simulation results. △ Less

Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.15179 [pdf, other]

Advancing Parameter Efficiency in Fine-tuning via Representation Editing

Authors: Muling Wu, Wenhao Liu, Xiaohua Wang, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Abstract: Parameter Efficient Fine-Tuning (PEFT) techniques have drawn significant attention due to their ability to yield competitive results while updating only a small portion of the adjustable parameters. However, existing PEFT methods pose challenges in hyperparameter selection, such as choosing the rank for LoRA or Adapter, or specifying the length of soft prompts. To address these challenges, we prop… ▽ More Parameter Efficient Fine-Tuning (PEFT) techniques have drawn significant attention due to their ability to yield competitive results while updating only a small portion of the adjustable parameters. However, existing PEFT methods pose challenges in hyperparameter selection, such as choosing the rank for LoRA or Adapter, or specifying the length of soft prompts. To address these challenges, we propose a novel fine-tuning approach for neural models, named Representation EDiting (RED), which modifies the representations generated at some layers through the application of scaling and biasing operations. While existing PEFT methods still demonstrate over-parameterization that could potentially undermine the generalization ability acquired from pre-training, RED can substantially reduce the number of trainable parameters by a factor of 25, 700 compared to full parameter fine-tuning and by a factor of 32 relative to LoRA. Remarkably, RED achieves results comparable or superior to both full parameter fine-tuning and other PEFT methods. Extensive experiments across various model architectures and scales, including RoBERTa, GPT-2, T5, and LLaMA-2, have demonstrated the effectiveness and efficiency of RED1, thereby positioning it as a promising PEFT strategy for large-scale neural models. △ Less

Submitted 2 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.11960 [pdf, other]

DB-LLM: Accurate Dual-Binarization for Efficient LLMs

Authors: Hong Chen, Chengtao Lv, Liang Ding, Haotong Qin, Xiabin Zhou, Yifu Ding, Xuebo Liu, Min Zhang, Jinyang Guo, Xianglong Liu, Dacheng Tao

Abstract: Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we e… ▽ More Large language models (LLMs) have significantly advanced the field of natural language processing, while the expensive memory and computation consumption impede their practical deployment. Quantization emerges as one of the most effective methods for improving the computational efficiency of LLMs. However, existing ultra-low-bit quantization always causes severe accuracy drops. In this paper, we empirically relieve the micro and macro characteristics of ultra-low bit quantization and present a novel Dual-Binarization method for LLMs, namely DB-LLM. For the micro-level, we take both the accuracy advantage of 2-bit-width and the efficiency advantage of binarization into account, introducing Flexible Dual Binarization (FDB). By splitting 2-bit quantized weights into two independent sets of binaries, FDB ensures the accuracy of representations and introduces flexibility, utilizing the efficient bitwise operations of binarization while retaining the inherent high sparsity of ultra-low bit quantization. For the macro-level, we find the distortion that exists in the prediction of LLM after quantization, which is specified as the deviations related to the ambiguity of samples. We propose the Deviation-Aware Distillation (DAD) method, enabling the model to focus differently on various samples. Comprehensive experiments show that our DB-LLM not only significantly surpasses the current State-of-The-Art (SoTA) in ultra-low bit quantization (eg, perplexity decreased from 9.64 to 7.23), but also achieves an additional 20\% reduction in computational consumption compared to the SOTA method under the same bit-width. Our code will be released soon. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.03141 [pdf, other]

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

Abstract: Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary ta… ▽ More Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary tasks involving short delays to accelerate RL with long delays, without compromising performance in stochastic environments. Specifically, AD-RL learns a value function for short delays and uses bootstrapping and policy improvement techniques to adjust it for long delays. We theoretically show that this can greatly reduce the sample complexity. On deterministic and stochastic benchmarks, our method significantly outperforms the SOTAs in both sample efficiency and policy performance. Code is available at https://github.com/QingyuanWuNothing/AD-RL. △ Less

Submitted 5 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2402.02426 [pdf, other]

Hybrid-Prediction Integrated Planning for Autonomous Driving

Authors: Haochen Liu, Zhiyu Huang, Wenhui Huang, Haohan Yang, Xiaoyu Mo, Chen Lv

Abstract: Autonomous driving systems require the ability to fully understand and predict the surrounding environment to make informed decisions in complex scenarios. Recent advancements in learning-based systems have highlighted the importance of integrating prediction and planning modules. However, this integration has brought forth three major challenges: inherent trade-offs by sole prediction, consistenc… ▽ More Autonomous driving systems require the ability to fully understand and predict the surrounding environment to make informed decisions in complex scenarios. Recent advancements in learning-based systems have highlighted the importance of integrating prediction and planning modules. However, this integration has brought forth three major challenges: inherent trade-offs by sole prediction, consistency between prediction patterns, and social coherence in prediction and planning. To address these challenges, we introduce a hybrid-prediction integrated planning (HPP) system, which possesses three novelly designed modules. First, we introduce marginal-conditioned occupancy prediction to align joint occupancy with agent-wise perceptions. Our proposed MS-OccFormer module achieves multi-stage alignment per occupancy forecasting with consistent awareness from agent-wise motion predictions. Second, we propose a game-theoretic motion predictor, GTFormer, to model the interactive future among individual agents with their joint predictive awareness. Third, hybrid prediction patterns are concurrently integrated with Ego Planner and optimized by prediction guidance. HPP achieves state-of-the-art performance on the nuScenes dataset, demonstrating superior accuracy and consistency for end-to-end paradigms in prediction and planning. Moreover, we test the long-term open-loop and closed-loop performance of HPP on the Waymo Open Motion Dataset and CARLA benchmark, surpassing other integrated prediction and planning pipelines with enhanced accuracy and compatibility. △ Less

Submitted 4 February, 2024; originally announced February 2024.

arXiv:2402.01533 [pdf, other]

Efficient and Effective Time-Series Forecasting with Spiking Neural Networks

Authors: Changze Lv, Yansen Wang, Dongqi Han, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

Abstract: Spiking neural networks (SNNs), inspired by the spiking behavior of biological neurons, provide a unique pathway for capturing the intricacies of temporal data. However, applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. In this paper, we pro… ▽ More Spiking neural networks (SNNs), inspired by the spiking behavior of biological neurons, provide a unique pathway for capturing the intricacies of temporal data. However, applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. In this paper, we propose a framework for SNNs in time-series forecasting tasks, leveraging the efficiency of spiking neurons in processing temporal information. Through a series of experiments, we demonstrate that our proposed SNN-based approaches achieve comparable or superior results to traditional time-series forecasting methods on diverse benchmarks with much less energy consumption. Furthermore, we conduct detailed analysis experiments to assess the SNN's capacity to capture temporal dependencies within time-series data, offering valuable insights into its nuanced strengths and effectiveness in modeling the intricate dynamics of temporal data. Our study contributes to the expanding field of SNNs and offers a promising alternative for time-series forecasting tasks, presenting a pathway for the development of more biologically inspired and temporally aware forecasting models. Our code is available at https://github.com/microsoft/SeqSNN. △ Less

Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

arXiv:2401.15315 [pdf, other]

Learning Online Belief Prediction for Efficient POMDP Planning in Autonomous Driving

Authors: Zhiyu Huang, Chen Tang, Chen Lv, Masayoshi Tomizuka, Wei Zhan

Abstract: Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online belief-update-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a Transformer-based prediction model, enhanced with a recurrent neural memory model, to dynamically… ▽ More Effective decision-making in autonomous driving relies on accurate inference of other traffic agents' future behaviors. To achieve this, we propose an online belief-update-based behavior prediction model and an efficient planner for Partially Observable Markov Decision Processes (POMDPs). We develop a Transformer-based prediction model, enhanced with a recurrent neural memory model, to dynamically update latent belief state and infer the intentions of other agents. The model can also integrate the ego vehicle's intentions to reflect closed-loop interactions among agents, and it learns from both offline data and online interactions. For planning, we employ a Monte-Carlo Tree Search (MCTS) planner with macro actions, which reduces computational complexity by searching over temporally extended action steps. Inside the MCTS planner, we use predicted long-term multi-modal trajectories to approximate future updates, which eliminates iterative belief updating and improves the running efficiency. Our approach also incorporates deep Q-learning (DQN) as a search prior, which significantly improves the performance of the MCTS planner. Experimental results from simulated environments validate the effectiveness of our proposed method. The online belief update model can significantly enhance the accuracy and temporal consistency of predictions, leading to improved decision-making performance. Employing DQN as a search prior in the MCTS planner considerably boosts its performance and outperforms an imitation learning-based prior. Additionally, we show that the MCTS planning with macro actions substantially outperforms the vanilla method in terms of performance and efficiency. △ Less

Submitted 17 June, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: IEEE Robotics and Automation Letters

arXiv:2401.14075 [pdf, other]

Generation of High-Brilliance Polarized $γ$-Rays via Vacuum Dichroism-assisted Vacuum Birefringence

Authors: Chong Lv, Feng Wan, Yousef I. Salamin, Qian Zhao, Mamutjan Ababekri, Ruirui Xu, Jian-Xing Li

Abstract: We put forward a novel method to generate high-brilliance polarized $γ$-photon beams via vacuum dichroism (VD)-assisted vacuum birefringence (VB) effect. We split a linearly polarized (LP) laser pulse into two subpulses with the first one colliding with a dense unpolarized electron beam to generate LP $γ$ photons (via nonlinear Compton scattering), which then further collide with the second subpul… ▽ More We put forward a novel method to generate high-brilliance polarized $γ$-photon beams via vacuum dichroism (VD)-assisted vacuum birefringence (VB) effect. We split a linearly polarized (LP) laser pulse into two subpulses with the first one colliding with a dense unpolarized electron beam to generate LP $γ$ photons (via nonlinear Compton scattering), which then further collide with the second subpulse and are partially transformed into circularly polarized ones via the VB effect. We find that by manipulating the relative polarization of two subpulses, one can ``purify'' (i.e., enhance) the polarization of the $γ$-photon beam via the VD effect. Due to the VD assistance, the VB effect reaches optimal when the relative polarization is nearly $30^\circ$, not the widely used $45^\circ$ in the common VB detection methods. In addition, our method can be used to efficiently confirm the well-known VB effect itself, which has not been directly observed in experiments yet. △ Less

Submitted 30 April, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.06824 [pdf, other]

Rethinking Jailbreaking through the Lens of Representation Engineering

Authors: Tianlong Li, Shihan Dou, Wenhao Liu, Muling Wu, Changze Lv, Rui Zheng, Xiaoqing Zheng, Xuanjing Huang

Abstract: The recent surge in jailbreaking methods has revealed the vulnerability of Large Language Models (LLMs) to malicious inputs. While earlier research has primarily concentrated on increasing the success rates of jailbreaking attacks, the underlying mechanism for safeguarding LLMs remains underexplored. This study investigates the vulnerability of safety-aligned LLMs by uncovering specific activity p… ▽ More The recent surge in jailbreaking methods has revealed the vulnerability of Large Language Models (LLMs) to malicious inputs. While earlier research has primarily concentrated on increasing the success rates of jailbreaking attacks, the underlying mechanism for safeguarding LLMs remains underexplored. This study investigates the vulnerability of safety-aligned LLMs by uncovering specific activity patterns within the representation space generated by LLMs. Such ``safety patterns'' can be identified with only a few pairs of contrastive queries in a simple method and function as ``keys'' (used as a metaphor for security defense capability) that can be used to open or lock Pandora's Box of LLMs. Extensive experiments demonstrate that the robustness of LLMs against jailbreaking can be lessened or augmented by attenuating or strengthening the identified safety patterns. These findings deepen our understanding of jailbreaking phenomena and call for the LLM community to address the potential misuse of open-source LLMs. △ Less

Submitted 6 August, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 21 pages, 20 figures, 6 tables

arXiv:2401.06439 [pdf, other]

Ordering-Flexible Multi-Robot Coordination for MovingTarget Convoying Using Long-TermTask Execution

Authors: Bin-Bin Hu, Yanxin Zhou, Henglai Wei, Yan Wang, Chen Lv

Abstract: In this paper, we propose a cooperative long-term task execution (LTTE) algorithm for protecting a moving target into the interior of an ordering-flexible convex hull by a team of robots resiliently in the changing environments. Particularly, by designing target-approaching and sensing-neighbor collision-free subtasks, and incorporating these subtasks into the constraints rather than the tradition… ▽ More In this paper, we propose a cooperative long-term task execution (LTTE) algorithm for protecting a moving target into the interior of an ordering-flexible convex hull by a team of robots resiliently in the changing environments. Particularly, by designing target-approaching and sensing-neighbor collision-free subtasks, and incorporating these subtasks into the constraints rather than the traditional cost function in an online constraint-based optimization framework, the proposed LTTE can systematically guarantee long-term target convoying under changing environments in the n-dimensional Euclidean space. Then, the introduction of slack variables allow for the constraint violation of different subtasks; i.e., the attraction from target-approaching constraints and the repulsion from time-varying collision-avoidance constraints, which results in the desired formation with arbitrary spatial ordering sequences. Rigorous analysis is provided to guarantee asymptotical convergence with challenging nonlinear couplings induced by time-varying collision-free constraints. Finally, 2D experiments using three autonomous mobile robots (AMRs) are conducted to validate the effectiveness of the proposed algorithm, and 3D simulations tackling changing environmental elements, such as different initial positions, some robots suddenly breakdown and static obstacles are presented to demonstrate the multi-dimensional adaptability, robustness and the ability of obstacle avoidance of the proposed method. △ Less

Submitted 12 January, 2024; originally announced January 2024.

Journal ref: Published on Automatica, 2024

arXiv:2401.05268 [pdf, other]

AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

Authors: Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor Jiang, Chengfei Lv, Huajun Chen

Abstract: Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agen… ▽ More Language agents have achieved considerable performance on various complex question-answering tasks by planning with external tools. Despite the incessant exploration in this field, existing language agent systems still struggle with costly, non-reproducible data reliance and face the challenge of compelling a single model for multiple functions. To this end, we introduce AutoAct, an automatic agent learning framework for QA that does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models (e.g., GPT-4). Given limited data with a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. Then, AutoAct leverages a division-of-labor strategy to automatically differentiate based on the target task information and synthesized trajectories, producing a sub-agent group to complete the task. We conduct comprehensive experiments with different LLMs, which demonstrates that AutoAct yields better or parallel performance compared to various strong baselines. Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct generally outperforming that of others. Code will be available at https://github.com/zjunlp/AutoAct. △ Less

Submitted 26 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: ACL 2024

arXiv:2312.17589 [pdf]

Improving photon number resolvability of a superconducting nanowire detector array using a level comparator circuit

Authors: Jia Huang, Xingyu Zhang, Weijun Zhang, Chaomeng Ding, Yong Wang, Chaolin Lv, Guangzhao Xu, Xiaoyu Liu, Hao Li, Zhen Wang, Lixing You

Abstract: Photon number resolving (PNR) capability is very important in many optical applications, including quantum information processing, fluorescence detection, and few-photon-level ranging and imaging. Superconducting nanowire single-photon detectors (SNSPDs) with a multipixel interleaved architecture give the array an excellent spatial PNR capability. However, the signal-to-noise ratio (SNR) of the ph… ▽ More Photon number resolving (PNR) capability is very important in many optical applications, including quantum information processing, fluorescence detection, and few-photon-level ranging and imaging. Superconducting nanowire single-photon detectors (SNSPDs) with a multipixel interleaved architecture give the array an excellent spatial PNR capability. However, the signal-to-noise ratio (SNR) of the photon number resolution (SNRPNR) of the array will be degraded with increasing the element number due to the electronic noise in the readout circuit, which limits the PNR resolution as well as the maximum PNR number. In this study, a 16-element interleaved SNSPD array was fabricated, and the PNR capability of the array was investigated and analyzed. By introducing a level comparator circuit (LCC), the SNRPNR of the detector array was improved over a factor of four. In addition, we performed a statistical analysis of the photon number on this SNSPD array with LCC, showing that the LCC method effectively enhances the PNR resolution. Besides, the system timing jitter of the detector was reduced from 90 ps to 72 ps due to the improved electrical SNR. △ Less

Submitted 29 December, 2023; originally announced December 2023.

Comments: 8 pages, 7 figures

arXiv:2312.15997 [pdf, other]

Aligning Large Language Models with Human Preferences through Representation Engineering

Authors: Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Abstract: Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often involves employing reinforcement learning from human feedback (RLHF) to fine-tune LLMs based on human labels assessing the relative quality of model responses. Nevert… ▽ More Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often involves employing reinforcement learning from human feedback (RLHF) to fine-tune LLMs based on human labels assessing the relative quality of model responses. Nevertheless, RLHF is susceptible to instability during fine-tuning and presents challenges in implementation.Drawing inspiration from the emerging field of representation engineering (RepE), this study aims to identify relevant representations for high-level human preferences embedded in patterns of activity within an LLM, and achieve precise control of model behavior by transforming its representations. This novel approach, denoted as Representation Alignment from Human Feedback (RAHF), proves to be effective, computationally efficient, and easy to implement.Extensive experiments demonstrate the efficacy of RAHF in not only capturing but also manipulating representations to align with a broad spectrum of human preferences or values, rather than being confined to a singular concept or function (e.g. honesty or bias). RAHF's versatility in accommodating diverse human preferences shows its potential for advancing LLM performance. △ Less

Submitted 3 July, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.12913 [pdf, other]

Produce Once, Utilize Twice for Anomaly Detection

Authors: Shuyuan Wang, Qi Li, Huiyuan Luo, Chengkan Lv, Zhengtao Zhang

Abstract: Visual anomaly detection aims at classifying and locating the regions that deviate from the normal appearance. Embedding-based methods and reconstruction-based methods are two main approaches for this task. However, they are either not efficient or not precise enough for the industrial detection. To deal with this problem, we derive POUTA (Produce Once Utilize Twice for Anomaly detection), which i… ▽ More Visual anomaly detection aims at classifying and locating the regions that deviate from the normal appearance. Embedding-based methods and reconstruction-based methods are two main approaches for this task. However, they are either not efficient or not precise enough for the industrial detection. To deal with this problem, we derive POUTA (Produce Once Utilize Twice for Anomaly detection), which improves both the accuracy and efficiency by reusing the discriminant information potential in the reconstructive network. We observe that the encoder and decoder representations of the reconstructive network are able to stand for the features of the original and reconstructed image respectively. And the discrepancies between the symmetric reconstructive representations provides roughly accurate anomaly information. To refine this information, a coarse-to-fine process is proposed in POUTA, which calibrates the semantics of each discriminative layer by the high-level representations and supervision loss. Equipped with the above modules, POUTA is endowed with the ability to provide a more precise anomaly location than the prior arts. Besides, the representation reusage also enables to exclude the feature extraction process in the discriminative network, which reduces the parameters and improves the efficiency. Extensive experiments show that, POUTA is superior or comparable to the prior methods with even less cost. Furthermore, POUTA also achieves better performance than the state-of-the-art few-shot anomaly detection methods without any special design, showing that POUTA has strong ability to learn representations inherent in the training data. △ Less

Submitted 20 December, 2023; originally announced December 2023.

arXiv:2311.03012 [pdf, other]

The role of viscosity on drop impact forces

Authors: Vatsal Sanjay, Bin Zhang, Cunjing Lv, Detlef Lohse

Abstract: A liquid drop impacting a rigid substrate undergoes deformation and spreading due to normal reaction forces, which are counteracted by surface tension. On a non-wetting substrate, the drop subsequently retracts and takes off. Our recent work (Zhang et al., \textit{Phys. Rev. Lett.}, vol. 129, 2022, 104501) revealed two peaks in the temporal evolution of the normal force $F(t)$--one at impact and a… ▽ More A liquid drop impacting a rigid substrate undergoes deformation and spreading due to normal reaction forces, which are counteracted by surface tension. On a non-wetting substrate, the drop subsequently retracts and takes off. Our recent work (Zhang et al., \textit{Phys. Rev. Lett.}, vol. 129, 2022, 104501) revealed two peaks in the temporal evolution of the normal force $F(t)$--one at impact and another at jump-off. The second peak coincides with a Worthington jet formation, which vanishes at high viscosities due to increased viscous dissipation affecting flow focusing. In this article, using experiments, direct numerical simulations, and scaling arguments, we characterize both the peak amplitude $F_1$ at impact and the one at take off ($F_2$) and elucidate their dependency on the control parameters: the Weber number $We$ (dimensionless impact velocity) and the Ohnesorge number $Oh$ (dimensionless viscosity). \VS{The first peak amplitude $F_1$ and the time $t_1$ to reach it depend on inertial timescales for low viscosity liquids, remaining nearly constant for viscosities up to 100 times that of water. For high viscosity liquids, we balance the rate of change in kinetic energy with viscous dissipation to obtain new scaling laws: $F_1/F_ρ\sim \sqrt{Oh}$ and $t_1/τ_ρ\sim 1/\sqrt{Oh}$, where $F_ρ$ and $τ_ρ$ are the inertial force and time scales, respectively, which are consistent with our data.} The time $t_2$ at which the amplitude $F_2$ appears is set by the inertio-capillary timescale $τ_γ$, independent of both the viscosity and the impact velocity of the drop. However, these properties dictate the magnitude of this amplitude. △ Less

Submitted 30 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: version after first round of the review

arXiv:2310.19559 [pdf, other]

Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning

Authors: Changsheng Lv, Shuai Zhang, Yapeng Tian, Mengshi Qi, Huadong Ma

Abstract: In this paper, we propose a Disentangled Counterfactual Learning~(DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge is how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, an… ▽ More In this paper, we propose a Disentangled Counterfactual Learning~(DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge is how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and lacking causal reasoning ability in models impedes the progress of implicit physical knowledge inferring. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL. △ Less

Submitted 1 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

Comments: To be published in 37th Conference on Neural Information Processing Systems

arXiv:2310.17133 [pdf, other]

Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs

Authors: Yuxin Zuo, Bei Li, Chuanhao Lv, Tong Zheng, Tong Xiao, Jingbo Zhu

Abstract: This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete. Instead, we attribute this phenomenon to insufficient cross-modal interaction, rather than image information redundancy. A novel approach is proposed to generate parallel Visual Ques… ▽ More This paper presents an in-depth study of multimodal machine translation (MMT), examining the prevailing understanding that MMT systems exhibit decreased sensitivity to visual information when text inputs are complete. Instead, we attribute this phenomenon to insufficient cross-modal interaction, rather than image information redundancy. A novel approach is proposed to generate parallel Visual Question-Answering (VQA) style pairs from the source text, fostering more robust cross-modal interaction. Using Large Language Models (LLMs), we explicitly model the probing signal in MMT to convert it into VQA-style data to create the Multi30K-VQA dataset. An MMT-VQA multitask learning framework is introduced to incorporate explicit probing signals from the dataset into the MMT training process. Experimental results on two widely-used benchmarks demonstrate the effectiveness of this novel approach. Our code and data would be available at: \url{https://github.com/libeineu/MMT-VQA}. △ Less

Submitted 26 October, 2023; originally announced October 2023.

Comments: Findings of EMNLP2023

arXiv:2310.16582 [pdf, other]

Tailoring Personality Traits in Large Language Models via Unsupervisedly-Built Personalized Lexicons

Authors: Tianlong Li, Shihan Dou, Changze Lv, Wenhao Liu, Jianhan Xu, Muling Wu, Zixuan Ling, Xiaoqing Zheng, Xuanjing Huang

Abstract: Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is ineffic… ▽ More Personality plays a pivotal role in shaping human expression patterns, thus regulating the personality of large language models (LLMs) holds significant potential in enhancing the user experience of LLMs. Previous methods either relied on fine-tuning LLMs on specific corpora or necessitated manually crafted prompts to elicit specific personalities from LLMs. However, the former approach is inefficient and costly, while the latter cannot precisely manipulate personality traits at a fine-grained level. To address the above challenges, we have employed a novel Unsupervisedly-Built Personalized Lexicons (UBPL) in a pluggable manner during the decoding phase of LLMs to manipulate their personality traits. UBPL is a lexicon built through an unsupervised approach from a situational judgment test dataset (SJTs4LLM). Users can utilize UBPL to adjust the probability vectors of predicted words in the decoding phase of LLMs, thus influencing the personality expression of LLMs. Extensive experimentation demonstrates the remarkable effectiveness and pluggability of our method for fine-grained manipulation of LLM's personality. △ Less

Submitted 6 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: Work in progress

Showing 1–50 of 232 results for author: Lv, C