Search | arXiv e-print repository

On-Chip Optical Skyrmionic Beam Generators

Authors: Wenbo Lin, Yasutomo Ota, Yasuhiko Arakawa, Satoshi Iwamoto

Abstract: Optical skyrmion beams, which encompass two-dimensional topology in their spatial structures, are promising for ultra-dense optical communications and advanced matter manipulation. Generating such light beams via a chip-based approach will vastly broaden their applications and promote the advancement of untapped fundamental science. Here, we present a breakthrough in chip-based technology by experimentally demonstrating on-chip devices capable of generating optical skyrmions with tailored topological invariants. These devices, fabricated with high precision, exhibit behavior that closely aligns with theoretical predictions and numerical simulations. The realization of on-chip optical skyrmion beam generators ushers in a new dawn in optical and material science.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 8 pages, 4 figures

arXiv:2408.15861 [pdf, other]

Fusing Pruned and Backdoored Models: Optimal Transport-based Data-free Backdoor Mitigation

Authors: Weilin Lin, Li Liu, Jianze Li, Hui Xiong

Abstract: Backdoor attacks present a serious security threat to deep neural networks (DNNs). Although numerous effective defense techniques have been proposed in recent years, they inevitably rely on the availability of either clean or poisoned data. In contrast, data-free defense techniques have evolved slowly and still lag significantly in performance. To address this issue, different from the traditional approach of pruning followed by fine-tuning, we propose a novel data-free defense method named Optimal Transport-based Backdoor Repairing (OTBR) in this work. This method, based on our findings on neuron weight changes (NWCs) of random unlearning, uses optimal transport (OT)-based model fusion to combine the advantages of both pruned and backdoored models. Specifically, we first demonstrate our findings that the NWCs of random unlearning are positively correlated with those of poison unlearning. Based on this observation, we propose a random-unlearning NWC pruning technique to eliminate the backdoor effect and obtain a backdoor-free pruned model. Then, motivated by the OT-based model fusion, we propose the pruned-to-backdoored OT-based fusion technique, which fuses pruned and backdoored models to combine the advantages of both, resulting in a model that demonstrates high clean accuracy and a low attack success rate. To our knowledge, this is the first work to apply OT and model fusion techniques to backdoor defense. Extensive experiments show that our method successfully defends against all seven backdoor attacks across three benchmark datasets, outperforming both state-of-the-art (SOTA) data-free and data-dependent methods. The code implementation and Appendix are provided in the Supplementary Material.

Submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15252 [pdf, other]

Generative AI on SpectrumNet: An Open Benchmark of Multiband 3D Radio Maps

Authors: Shuhang Zhang, Shuai Jiang, Wanjie Lin, Zheng Fang, Kangjun Liu, Hongliang Zhang, Ke Chen

Abstract: A radio map is an efficient way to visually display the wireless signal coverage within a certain region. It has been considered increasingly helpful for the future sixth generation (6G) of wireless networks, as wireless nodes are becoming more crowded and complicated. However, the construction of high-resolution radio maps is very challenging due to the sparse sampling in practical systems. Generative artificial intelligence (AI), which is capable of creating synthetic data to fill in gaps in real-world measurements, is an effective technique to construct high-precision radio maps. Currently, generative models for radio map construction are trained with two-dimensional (2D) single-band radio maps in urban scenarios, which leads to poor generalization across diverse terrain scenarios, spectrum bands, and heights. To tackle this problem, we provide a multiband three-dimensional (3D) radio map dataset with consideration of terrain and climate information, named SpectrumNet. It is the largest radio map dataset in terms of dimensions and scale, containing radio maps of 3 spatial dimensions, 5 frequency bands, 11 terrain scenarios, and 3 climate scenarios. We introduce the parameters and settings for the SpectrumNet dataset generation, and evaluate three baseline methods for radio map construction based on the SpectrumNet dataset. Experiments show the necessity of the SpectrumNet dataset for training models with strong generalization in the spatial, frequency, and scenario domains. Future work on the SpectrumNet dataset is also discussed, including dataset expansion and calibration, as well as extended studies on generative models for radio map construction based on the SpectrumNet dataset.

Submitted 9 August, 2024; originally announced August 2024.

Comments: 30 pages, 15 figures

arXiv:2408.14968 [pdf, other]

MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce

Authors: Hao Jiang, Haoxiang Zhang, Qingshan Hou, Chaofeng Chen, Weisi Lin, Jingchang Zhang, Annan Wang

Abstract: Providing high-quality item recall for text queries is crucial in large-scale e-commerce search systems. Current Embedding-based Retrieval Systems (ERS) embed queries and items into a shared low-dimensional space, but uni-modality ERS rely too heavily on textual features, making them unreliable in complex contexts. While multi-modality ERS incorporate various data sources, they often overlook individual preferences for different modalities, leading to suboptimal results. To address these issues, we propose MRSE, a Multi-modality Retrieval System that integrates text, item images, and user preferences through lightweight mixture-of-expert (LMoE) modules to better align features across and within modalities. MRSE also builds user profiles at a multi-modality level and introduces a novel hybrid loss function that enhances consistency and robustness using hard negative sampling. Experiments on a large-scale dataset from Shopee and online A/B testing show that MRSE achieves an 18.9% improvement in offline relevance and a 3.7% gain in online core metrics compared to Shopee's state-of-the-art uni-modality system.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14180 [pdf, other]

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Authors: Yiwei Ma, Jiayi Ji, Ke Ye, Weihuang Lin, Zhibin Wang, Yonghan Zheng, Qiang Zhou, Xiaoshuai Sun, Rongrong Ji

Abstract: Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark for accurately assessing editing results and providing valuable insights for its further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of edited images produced by IIE models from multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions that cover both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: To ensure the alignment of our benchmark with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: By analyzing the advantages and disadvantages of existing IIE models across the 16 dimensions, we offer valuable research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models. The code, dataset, and generated images from all IIE models are provided on GitHub: https://github.com/cocoshe/I2EBench.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Tech report, 39 pages, 41 figures

arXiv:2408.13380 [pdf, other]

The MUSE Beamline Calorimeter

Authors: W. Lin, T. Rostomyan, R. Gilman, S. Strauch, C. Meier, C. Nestler, M. Ali, H. Atac, J. C. Bernauer, W. J. Briscoe, A. Christopher Ndukwe, E. W. Cline, K. Deiters, S. Dogra, E. J. Downie, Z. Duan, I. P. Fernando, A. Flannery, D. Ghosal, A. Golossanov, J. Guo, N. S. Ifat, Y. Ilieva, M. Kohl, I. Lavrukhin, et al. (18 additional authors not shown)

Abstract: The MUon Scattering Experiment (MUSE) was motivated by the proton radius puzzle arising from the discrepancy between muonic hydrogen spectroscopy and electron-proton measurements. The MUSE physics goals also include testing lepton universality, precisely measuring two-photon exchange contribution, and testing radiative corrections. MUSE addresses these physics goals through simultaneous measurement of high precision cross sections for electron-proton and muon-proton scattering using a mixed-species beam. The experiment will run at both positive and negative beam polarities. Measuring precise cross sections requires understanding both the incident beam energy and the radiative corrections. For this purpose, a lead-glass calorimeter was installed at the end of the beam line in the MUSE detector system. In this article we discuss the detector specifications, calibration and performance. We demonstrate that the detector performance is well reproduced by simulation, and meets experimental requirements.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12867 [pdf, other]

Semantic Alignment for Multimodal Large Language Models

Authors: Tao Wu, Mengze Li, Jingyuan Chen, Wei Ji, Wang Lin, Jinyang Gao, Kun Kuang, Zhou Zhao, Fei Wu

Abstract: Research on Multi-modal Large Language Models (MLLMs) towards multi-image cross-modal instruction has received increasing attention and made significant progress, particularly in scenarios involving closely resembling images (e.g., change captioning). Existing MLLMs typically follow a two-step process in their pipelines: first, extracting visual tokens independently for each input image, and then aligning these visual tokens from different images with the Large Language Model (LLM) in its textual feature space. However, the independent extraction of visual tokens for each image may result in different semantics being prioritized for different images in the first step, leading to a lack of preservation of linking information among images for subsequent LLM analysis. This issue becomes more serious in scenarios where significant variations exist among the images (e.g., visual storytelling). To address this challenge, we introduce Semantic Alignment for Multi-modal large language models (SAM). By involving bidirectional semantic guidance between different images in the visual-token extraction process, SAM aims to enhance the preservation of linking information for coherent analysis and align the semantics of different images before feeding them into the LLM. As the test bed, we propose a large-scale dataset named MmLINK consisting of 69K samples. Different from most existing datasets for MLLM fine-tuning, our MmLINK dataset comprises multi-modal instructions with significantly diverse images. Extensive experiments on the group captioning task and the storytelling task prove the effectiveness of our SAM model, surpassing the state-of-the-art methods by a large margin (+37% for group captioning and +22% for storytelling on CIDEr score). Project page: https://mccartney01.github.io/SAM.

Submitted 23 August, 2024; originally announced August 2024.

Comments: Accepted by MM 2024

arXiv:2408.12104 [pdf, other]

Minute-Cadence Observations of the LAMOST Fields with the TMTS: IV -- Catalog of Cataclysmic Variables from the First 3-yr Survey

Authors: Qichun Liu, Jie Lin, Xiaofeng Wang, Zhibin Dai, Yongkang Sun, Gaobo Xi, Jun Mo, Jialian Liu, Shengyu Yan, Alexei V. Filippenko, Thomas G. Brink, Yi Yang, Kishore C. Patra, Yongzhi Cai, Zhihao Chen, Liyang Chen, Fangzhou Guo, Xiaojun Jiang, Gaici Li, Wenxiong Li, Weili Lin, Cheng Miao, Xiaoran Ma, Haowei Peng, Qiqi Xia, et al. (2 additional authors not shown)

Abstract: The Tsinghua University–Ma Huateng Telescopes for Survey (TMTS) started to monitor the LAMOST plates in 2020, leading to the discovery of numerous short-period eclipsing binaries, peculiar pulsators, flare stars, and other variable objects. Here, we present the uninterrupted light curves for a sample of 64 cataclysmic variables (CVs) observed/discovered using the TMTS during its first three years of observations, and we introduce new CVs and new light-variation periods (from known CVs) revealed through the TMTS observations. Thanks to the high-cadence observations of TMTS, diverse light variations, including superhumps, quasi-periodic oscillations, large-amplitude orbital modulations, and rotational modulations, can be detected in our CV sample, providing key observational clues for understanding the fast-developing physical processes in various CVs. All of these short-timescale light-curve features help further classify the subtypes of CV systems. We highlight the light-curve features observed in our CV sample and discuss further implications of minute-cadence light curves for CV identifications and classifications. Moreover, we examine the Hα emission lines in the spectra from our nonmagnetic CV sample (i.e., dwarf novae and nova-like subclasses) and find that the distribution of Hα emission strength shows significant differences between the sources with orbital periods above and below the period gap, which agrees with the trend seen in the SDSS nonmagnetic CV sample.

Submitted 21 August, 2024; originally announced August 2024.

Comments: 27 pages, 12 figures in main text, accepted for the publication in Universe

arXiv:2408.11523 [pdf, other]

doi 10.1145/3640457.3688135

LARR: Large Language Model Aided Real-time Scene Recommendation with Semantic Understanding

Authors: Zhizhong Wan, Bin Yin, Junjie Xie, Fei Jiang, Xiang Li, Wei Lin

Abstract: Click-Through Rate (CTR) prediction is crucial for Recommendation Systems (RS), which aim to provide personalized recommendation services for users in many areas such as food delivery and e-commerce. However, traditional RS rely on collaborative signals and lack semantic understanding of real-time scenes. We also noticed that a major challenge in utilizing Large Language Models (LLMs) for practical recommendation purposes is their efficiency in dealing with long text input. To address these problems, we propose Large Language Model Aided Real-time Scene Recommendation (LARR), which adopts LLMs for semantic understanding and utilizes real-time scene information in RS without requiring the LLM to process the entire real-time scene text directly, thereby enhancing the efficiency of LLM-based CTR modeling. Specifically, recommendation domain-specific knowledge is injected into the LLM, and the RS then employs an aggregation encoder to build real-time scene information from the LLM's separate outputs. Firstly, an LLM is continually pretrained on a corpus built from recommendation data with the aid of special tokens. Subsequently, the LLM is fine-tuned via contrastive learning on three kinds of sample construction strategies. Through this step, the LLM is transformed into a text embedding model. Finally, the LLM's separate outputs for different scene features are aggregated by an encoder and aligned to collaborative signals in RS, enhancing the performance of the recommendation model.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11393 [pdf, other]

First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

Authors: Chi Ma, Mincong Huang, Ying Zhang, Chao Wang, Yujie Wang, Lei Yu, Chuan Liu, Wei Lin

Abstract: Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have demonstrated their potential to significantly enhance the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation (TDA) method that leverages sequence information to exploit the inherent sparsity of models across various architectures. This method is designed to accelerate generation speed by 18-25% without significantly compromising task performance, thereby addressing the limitations of existing DA techniques. Moreover, we delve into the root causes of LLM sparsity and theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia. Our comprehensive analyses not only provide a robust theoretical foundation for DA methods but also offer valuable insights to guide future research in optimizing LLMs for greater efficiency and effectiveness.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11003 [pdf, other]

DEEPEAST technique to enhance power in two-sample tests via the same-attraction function

Authors: Yiting Chen, Min Gao, Wei Lin, Andrew Jirasek, Kirsty Milligan, Xiaoping Shi

Abstract: Data depth has emerged as an invaluable nonparametric measure for the ranking of multivariate samples. The main contribution of depth-based two-sample comparisons is the introduction of the Q statistic (Liu and Singh, 1993), a quality index. Unlike traditional methods, data depth does not require the assumption of normal distributions and adheres to four fundamental properties. Many existing two-sample homogeneity tests, which assess mean and/or scale changes in distributions, often suffer from low statistical power or indeterminate asymptotic distributions. To overcome these challenges, we introduce a DEEPEAST (depth-explored same-attraction sample-to-sample central-outward ranking) technique for improving statistical power in two-sample tests via the same-attraction function. We propose two novel and powerful depth-based test statistics: the sum test statistic and the product test statistic, which are rooted in Q statistics, share a "common attractor", and are applicable across all depth functions. We further prove the asymptotic distribution of these statistics for various depth functions. To assess the power gain, we apply three depth functions: Mahalanobis depth (Liu and Singh, 1993), Spatial depth (Brown, 1958; Gower, 1974), and Projection depth (Liu, 1992). Through two-sample simulations, we demonstrate that our sum and product statistics, which utilize a strategic block permutation algorithm, exhibit superior power performance and compare favourably with popular methods in the literature. Our tests are further validated through analysis of Raman spectral data acquired from cellular and tissue samples, highlighting the effective discrimination between healthy and cancerous samples.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.08926 [pdf, other]

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

Authors: Andy K. Zhang, Neil Perry, Riya Dulepet, Eliot Jones, Justin W. Lin, Joey Ji, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, et al. (2 additional authors not shown)

Abstract: Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetration testing. Toward that end, we introduce Cybench, a framework for specifying cybersecurity tasks and evaluating agents on those tasks. We include 40 professional-level Capture the Flag (CTF) tasks from 4 distinct CTF competitions, chosen to be recent, meaningful, and spanning a wide range of difficulties. Each task includes its own description, starter files, and is initialized in an environment where an agent can execute bash commands and observe outputs. Since many tasks are beyond the capabilities of existing LM agents, we introduce subtasks, which break down a task into intermediary steps for more gradated evaluation; we add subtasks for 17 of the 40 tasks. To evaluate agent capabilities, we construct a cybersecurity agent and evaluate 7 models: GPT-4o, Claude 3 Opus, Claude 3.5 Sonnet, Mixtral 8x22b Instruct, Gemini 1.5 Pro, Llama 3 70B Chat, and Llama 3.1 405B Instruct. Without guidance, we find that agents are able to solve only the easiest complete tasks that took human teams up to 11 minutes to solve, with Claude 3.5 Sonnet and GPT-4o having the highest success rates. Finally, subtasks provide more signal for measuring performance compared to unguided runs, with models achieving a 3.2% higher success rate on complete tasks with subtask-guidance than without subtask-guidance. All code and data are publicly available at https://cybench.github.io

Submitted 15 August, 2024; originally announced August 2024.

Comments: 86 pages, 7 figures

arXiv:2408.08586 [pdf, other]

Rubick: Exploiting Job Reconfigurability for Deep Learning Cluster Scheduling

Authors: Xinyi Zhang, Hanyu Zhao, Wencong Xiao, Xianyan Jia, Fei Xu, Yong Li, Wei Lin, Fangming Liu

Abstract: The era of large deep learning models has given rise to advanced training strategies such as 3D parallelism and the ZeRO series. These strategies enable various (re-)configurable execution plans for a training job, which exhibit remarkably different requirements of multiple resource types. Existing cluster scheduling systems, however, treat such reconfigurable training jobs as black boxes: they rely on users to choose execution plans statically, and then make resource allocations without awareness of the chosen plans and their resource requirements. This approach results in mismatches between execution plans and resources, making both training performance and cluster utilization far from optimal. We introduce Rubick, a cluster scheduling system for deep learning training that exploits the reconfigurability to improve job performance and cluster efficiency. Rubick incorporates the job execution planning as a new dimension in cluster scheduling, by continuously reconfiguring jobs' execution plans and tuning multi-resource allocations across jobs jointly. Such a co-optimization is navigated by a performance model that understands the diverse resource requirements and performance characteristics of different jobs and execution plans. Rubick exploits such a model to make performance-aware scheduling decisions to maximize cluster throughput while providing performance guarantees to individual jobs. Evaluations on a 64-GPU high-performance training cluster show that Rubick improves average job completion time and makespan by up to 3.2x and 1.4x, respectively, compared against state-of-the-art systems.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08315 [pdf, other]

Segment Anything for Videos: A Systematic Survey

Authors: Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan

Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the recently released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first to review the progress of SAM for videos, this work focuses on its applications to various tasks by discussing its recent advances and the innovation opportunities of developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, as well as insightful analysis, are offered. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond.

Submitted 30 July, 2024; originally announced August 2024.

Comments: https://github.com/983632847/SAM-for-Videos

arXiv:2408.08044 [pdf, other]

Crystalline Material Discovery in the Era of Artificial Intelligence

Authors: Zhenzhong Wang, Haowei Hua, Wanyu Lin, Ming Yang, Kay Chen Tan

Abstract: Crystalline materials, with their symmetrical and periodic structures, possess a diverse array of properties and have been widely used in various fields, ranging from electronic devices to energy applications. To discover crystalline materials, traditional experimental and computational approaches are often time-consuming and expensive. In recent years, thanks to the explosive growth of crystalline materials data, great interest has been given to data-driven materials discovery. Particularly, recent advancements have exploited the expressive representation ability of deep learning to model the highly complex atomic systems within crystalline materials, opening up new avenues for fast and accurate materials discovery. These works typically focus on four types of tasks: physicochemical property prediction, crystalline material synthesis, aiding characterization, and accelerating theoretical computations. Despite the remarkable progress, there is still a lack of systematic research to summarize their correlations, distinctions, and limitations. To fill this gap, we systematically investigate the progress made in deep learning-based material discovery in recent years. We first introduce several data representations of the crystalline materials. Based on the representations, we summarize various fundamental deep learning models and their tailored usages in material discovery tasks. We also point out the remaining challenges and propose several future directions. This review offers comprehensive and valuable insights, and fosters progress in the intersection of artificial intelligence and material science.

Submitted 23 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

arXiv:2408.07299 [pdf, ps, other]

On the singularity of Lie-transform perturbation approach to the guiding-center problem

Authors: W. H. Lin, J. Garcia, J. Q. Li

Abstract: We present a novel scheme for carrying out the Lie-transform perturbation for the guiding-center motion, with the aim of directly addressing the problem of singularity which exists intrinsically in the determining equation for the generating vector, and which gives rise to the formidable gauge functions in the pure oscillating part of the Lie transformation. Whereas in most applications of Lie-transform perturbation such gauge functions must be approximately solved from some partial differential equations, our scheme, characterized by a staggered determination of the generating vectors, naturally produces the gauge functions through explicit integrals over the gyro-angle, leaving no unaccountable error of high order in all the succeeding transformations. Based on this scheme, a formalism of guiding-center transformation has been derived in a unified manner, retaining the effects of the strong ExB shearing as well as those of electromagnetic fluctuations.

Submitted 14 August, 2024; originally announced August 2024.

Comments: 12 pages

arXiv:2408.06969 [pdf, ps, other]

IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforcement learning (DRL) method to optimize the phase shift of the IRS, in order to maximize the received signal power. Moreover, this paper presents results of the simulations conducted to evaluate the performance of the DRL-based method. The simulation results indicate that the outage probability of the considered system increases significantly with more correlated channel coefficients. Moreover, the performance gap between the DRL method and the theoretical limit increases with higher transmit power and/or larger distortion requirement.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.06608 [pdf, other]

Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture

Authors: Yu Feng, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Han Zhao, Xiaofeng Hou, Jieru Zhao, Yuhao Zhu

Abstract: Neural Radiance Field (NeRF) has emerged as a promising alternative for photorealistic rendering. Despite recent algorithmic advancements, achieving real-time performance on today's resource-constrained devices remains challenging. In this paper, we identify the primary bottlenecks in current NeRF algorithms and introduce a unified algorithm-architecture co-design, Potamoi, designed to accommodate various NeRF algorithms. Specifically, we introduce a runtime system featuring a plug-and-play algorithm, SpaRW, which significantly reduces the per-frame computational workload and alleviates compute inefficiencies. Furthermore, our unified streaming pipeline coupled with customized hardware support effectively tames both SRAM and DRAM inefficiencies by minimizing repetitive DRAM access and completely eliminating SRAM bank conflicts. When evaluated against a baseline utilizing a dedicated DNN accelerator, our framework demonstrates a speed-up and energy reduction of 53.1$\times$ and 67.7$\times$, respectively, all while maintaining high visual quality with less than a 1.0 dB reduction in peak signal-to-noise ratio.

Submitted 12 August, 2024; originally announced August 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2404.11852

arXiv:2408.05631 [pdf, other]

PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer

Authors: Libo Zhang, Yuxuan Han, Wenbin Lin, Jingwang Ling, Feng Xu

Abstract: We present PRTGaussian, a real-time relightable novel-view synthesis method made possible by combining 3D Gaussians and Precomputed Radiance Transfer (PRT). By fitting relightable Gaussians to multi-view OLAT data, our method enables real-time, free-viewpoint relighting. By estimating the radiance transfer based on high-order spherical harmonics, we achieve a balance between capturing detailed relighting effects and maintaining computational efficiency. We utilize a two-stage process: in the first stage, we reconstruct a coarse geometry of the object from multi-view images. In the second stage, we initialize 3D Gaussians with the obtained point cloud, then simultaneously refine the coarse geometry and learn the light transport for each Gaussian. Extensive experiments on synthetic datasets show that our approach can achieve fast and high-quality relighting for general objects. Code and data are available at https://github.com/zhanglbthu/PRTGaussian.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.05112 [pdf, other]

Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

Authors: Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Rui Li, Wenchi Cheng, Zhu Han

Abstract: Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scenarios. This system leverages deep generative models to establish a new paradigm in SC. Specifically, at the transmitter end, it employs a joint source-channel coding mechanism based on the Swin Transformer for efficient semantic feature extraction and compression. At the receiver end, an advanced Diffusion Model (DM) reconstructs high-quality images from degraded signals, enhancing perceptual details. Additionally, we present a Multi-User Generative Semantic Communication (MU-GSC) system utilizing an asynchronous processing model. This model effectively manages multiple user requests and optimally utilizes system resources for parallel processing. Simulation results on public datasets demonstrate that our generative AI semantic communication systems achieve superior transmission efficiency and enhanced communication content quality across various channel conditions. Compared to CNN-based DeepJSCC, our methods improve the Peak Signal-to-Noise Ratio (PSNR) by 17.75% in Additive White Gaussian Noise (AWGN) channels and by 20.86% in Rayleigh channels.

Submitted 31 July, 2024; originally announced August 2024.

arXiv:2408.05019 [pdf, other]

Instruction Tuning-free Visual Token Complement for Multimodal LLMs

Authors: Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang

Abstract: As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (VTC) that helps MLLMs regain the missing visual features and thus improve response accuracy. Specifically, our VTC integrates text-to-image generation as a guide to identifying the text-irrelevant features, and a visual selector is then developed to generate complementary visual tokens to enrich the original visual input. Moreover, an iterative strategy is further designed to extract more visual information by iteratively using the visual selector without any additional training. Notably, the training pipeline requires no additional image-text pairs, resulting in a desired instruction tuning-free property. Both qualitative and quantitative experiments demonstrate the superiority and efficiency of our VTC.

Submitted 9 August, 2024; originally announced August 2024.

Comments: Accepted by ECCV 2024 (20 pages)

arXiv:2408.04947 [pdf, other]

Revealing the Fate of Exoplanet Systems: Asteroseismic Identification of Host Star in the Red Clump or Red Giant Branch

Authors: Wen-Xu Lin, Sheng-Bang Qian, Li-Ying Zhu

Abstract: Determining the evolutionary stage of stars is crucial for understanding the evolution of exoplanetary systems. In this context, Red Giant Branch (RGB) and Red Clump (RC) stars, stages in the later evolution of stars situated before and after the helium flash, harbor critical clues to unveiling the evolution of planets. The first step in revealing these clues is to confirm the evolutionary stage of the host stars through asteroseismology. However, up to now, host stars confirmed to be RGB or RC stars are extremely rare. In this investigation, we present a comprehensive asteroseismic analysis of two evolved stars, HD 120084 and HD 29399, known to harbor exoplanets, using data from the Transiting Exoplanet Survey Satellite (TESS). We have discovered for the first time that HD 120084 is a Red Clump star in the helium-core burning phase, and confirmed that HD 29399 is a Red Giant Branch star in the hydrogen-shell burning phase. Through the precise measurement of asteroseismic parameters such as $ν_{max}$, $Δν$ and $ΔΠ_{1}$ we have determined the evolutionary states of these stars and derived their fundamental stellar parameters. The significance of this study lies in the application of automated techniques to measure asymptotic period spacings in red giants, which provides critical insights into the evolutionary outcomes of exoplanet systems. We demonstrate that asteroseismology is a potent tool for probing the internal structures of stars, thereby offering a window into the past and future dynamics of planetary orbits. The presence of a long-period giant planet orbiting HD 120084, in particular, raises intriguing questions about the potential engulfment of inner planets during the host star's expansion, a hypothesis that warrants further investigation.

Submitted 9 August, 2024; originally announced August 2024.

arXiv:2408.04158 [pdf, other]

Efficient Single Image Super-Resolution with Entropy Attention and Receptive Field Augmentation

Authors: Xiaole Zhao, Linze Li, Chengxing Xie, Xiaoming Zhang, Ting Jiang, Wenjie Lin, Shuaicheng Liu, Tianrui Li

Abstract: Transformer-based deep models for single image super-resolution (SISR) have greatly improved the performance of lightweight SISR tasks in recent years. However, they often suffer from heavy computational burden and slow inference due to the complex calculation of multi-head self-attention (MSA), seriously hindering their practical application and deployment. In this work, we present an efficient SR model to mitigate the dilemma between model efficiency and SR performance, which is dubbed Entropy Attention and Receptive Field Augmentation network (EARFA), and composed of a novel entropy attention (EA) and a shifting large kernel attention (SLKA). From the perspective of information theory, EA increases the entropy of intermediate features conditioned on a Gaussian distribution, providing more informative input for subsequent reasoning. On the other hand, SLKA extends the receptive field of SR models with the assistance of channel shifting, which also helps boost the diversity of hierarchical features. Since the implementation of EA and SLKA does not involve complex computations (such as extensive matrix multiplications), the proposed method can achieve faster nonlinear inference than Transformer-based SR models while maintaining better SR performance. Extensive experiments show that the proposed model can significantly reduce the delay of model inference while achieving SR performance comparable with other advanced models.

Submitted 7 August, 2024; originally announced August 2024.

Comments: Accepted to ACM MM 2024

arXiv:2408.03790 [pdf, other]

Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection

Authors: Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger

Abstract: Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated data, which is expensive to obtain and restricted to predefined object categories. To mitigate manual labeling efforts, recent unsupervised object detection approaches generate class-agnostic pseudo-labels for moving objects, subsequently serving as a supervision signal to bootstrap a detector. Despite promising results, these approaches do not provide class labels or generalize well to static objects. Furthermore, they are mostly restricted to data containing multiple drives from the same scene or images from a precisely calibrated and synchronized camera setup. To overcome these limitations, we propose a vision-language-guided unsupervised 3D detection approach that operates exclusively on LiDAR point clouds. We transfer CLIP knowledge to classify point clusters of static and moving objects, which we discover by exploiting the inherent spatio-temporal information of LiDAR point clouds for clustering, tracking, as well as box and label refinement. Our approach outperforms state-of-the-art unsupervised 3D object detectors on the Waymo Open Dataset ($+23~\text{AP}_{3D}$) and Argoverse 2 ($+7.9~\text{AP}_{3D}$) and provides class labels not solely based on object size assumptions, marking a significant advancement in the field.

Submitted 7 August, 2024; originally announced August 2024.

Comments: Accepted to BMVC 2024

arXiv:2408.02657 [pdf, other]

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Authors: Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao

Abstract: We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. Unlike existing autoregressive image generation approaches, Lumina-mGPT employs a pretrained decoder-only transformer as a unified framework for modeling multimodal token sequences. Our key insight is that a simple decoder-only transformer with multimodal Generative PreTraining (mGPT), utilizing the next-token prediction objective on massive interleaved text-image sequences, can learn broad and general multimodal capabilities, thereby illuminating photorealistic text-to-image generation. Building on these pretrained models, we propose Flexible Progressive Supervised Finetuning (FP-SFT) on high-quality image-text pairs to fully unlock their potential for high-aesthetic image synthesis at any resolution while maintaining their general multimodal capabilities. Furthermore, we introduce Ominiponent Supervised Finetuning (Omni-SFT), transforming Lumina-mGPT into a foundation model that seamlessly achieves omnipotent task unification. The resulting model demonstrates versatile multimodal capabilities, including visual generation tasks like flexible text-to-image generation and controllable generation, visual recognition tasks like segmentation and depth estimation, and vision-language tasks like multiturn visual question answering. Additionally, we analyze the differences and similarities between diffusion-based and autoregressive methods in a direct comparison.

Submitted 5 August, 2024; originally announced August 2024.

Comments: Code available at: https://github.com/Alpha-VLLM/Lumina-mGPT

arXiv:2408.02230 [pdf, other]

doi 10.3847/1538-4357/ad6982

Mock Observations: Three Different Types of Galaxy Alignment in TNG100 Simulations

Authors: Yanyao Lan, Lin Tang, Weipeng Lin, Junyu Gong

Abstract: In this study, galaxy samples have been generated using mock observation techniques based on the results of TNG100-1 simulations to investigate three forms of intrinsic alignment: satellite-central alignment between the orientation of the brightest group galaxies (BGG) and the spatial distribution of their satellites, radial alignment between the satellites' orientation and the direction towards their BGG, as well as direct alignment between the orientation of BGG and that of its satellites. Overall, the predictions of galaxy alignment generally agree with observations, although minor discrepancies have been identified. For satellite-central alignment, the alignment strength and color-dependence trends are well replicated by the mock observations. Regarding radial alignment, the signals are weak but discernible, with no apparent color dependence. As for direct alignment, no signal is detected, nor is there any color dependence. We also investigate the dependence of alignment on halo and BGG properties, and the proximity effect. For satellite-central alignment, the predicted alignment signal shows a positive correlation with halo and BGG mass, consistent with observations and previous predictions. Similar correlations have also been observed with the BGG age and metallicity, which merit future observational analysis for confirmation. Proximity effects have been observed for all three types of alignment, with satellites closer to the BGG exhibiting stronger alignment signals. The influence of galaxy definition and shape determination on alignment studies is also analyzed. This study underscores the importance of employing mock observation techniques for a fair comparison between predictions and observations.

Submitted 6 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

Comments: 18 pages, 10 figures, 2 tables, ApJ accepted. As suggested by the TNG team, we have changed "IllustrisTNG100" to "TNG100"

arXiv:2407.21507 [pdf, other]

FSSC: Federated Learning of Transformer Neural Networks for Semantic Image Communication

Authors: Yuna Yan, Xin Zhang, Lixin Li, Wensheng Lin, Rui Li, Wenchi Cheng, Zhu Han

Abstract: In this paper, we address the problem of image semantic communication in a multi-user deployment scenario and propose a federated learning (FL) strategy for a Swin Transformer-based semantic communication system (FSSC). Firstly, we demonstrate that the adoption of a Swin Transformer for joint source-channel coding (JSCC) effectively extracts semantic information in the communication system. Next, the FL framework is introduced to collaboratively learn a global model by aggregating local model parameters, rather than directly sharing clients' data. This approach enhances user privacy protection and reduces the workload on the server or mobile edge. Simulation evaluations indicate that our method outperforms the typical JSCC algorithm and traditional separate-based communication algorithms. Particularly after integrating local semantics, the global aggregation model further increases the Peak Signal-to-Noise Ratio (PSNR) by more than 2 dB, demonstrating the effectiveness of our algorithm.

Submitted 31 July, 2024; originally announced July 2024.
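The FSSC abstract above describes learning a global model by aggregating local model parameters instead of sharing client data. A minimal sketch of one common aggregation rule, sample-count-weighted averaging (FedAvg), is given below. The abstract does not state which rule the authors use, so the weighting and the function name `federated_average` are assumptions for illustration only.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of ints, the number of local samples per client.
    Hypothetical helper; not taken from the FSSC paper.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two clients; the second holds three times as much data as the first.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only the parameter vectors and sample counts leave the clients here, which is the privacy argument the abstract makes.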

arXiv:2407.21142 [pdf, other]

Candidate Distant Trans-Neptunian Objects Detected by the New Horizons Subaru TNO Survey

Authors: Wesley C. Fraser, Simon B. Porter, Lowell Peltier, JJ Kavelaars, Anne J. Verbiscer, Marc W. Buie, S. Alan Stern, John R. Spencer, Susan D. Benecchi, Tsuyoshi Terai, Takashi Ito, Fumi Yoshida, David W. Gerdes, Kevin J. Napier, Hsing Wen Lin, Stephen D. J. Gwyn, Hayden Smotherman, Sebastien Fabbro, Kelsi N. Singer, Amanda M. Alexander, Ko Arimatsu, Maria E. Banks, Veronica J. Bray, Mohamed Ramy El-Maarry, Chelsea L. Ferrell, et al. (19 additional authors not shown)

Abstract: We report the detection of 239 trans-Neptunian Objects discovered through the on-going New Horizons survey for distant minor bodies being performed with the Hyper Suprime-Cam mosaic imager on the Subaru Telescope. These objects were discovered in images acquired with either the r2 or the recently commissioned EB-gri filter using shift and stack routines. Due to the extremely high stellar density of the search region downstream of the spacecraft, new machine learning techniques had to be developed to manage the extremely high false positive rate of bogus candidates produced from the shift and stack routines. We report discoveries as faint as r2$\sim26.5$. We highlight an overabundance of objects found at heliocentric distances $R\gtrsim70$ au compared to expectations from modelling of the known outer Solar System. If confirmed, these objects betray the presence of a heretofore unrecognized abundance of distant objects that can help explain a number of other observations that otherwise remain at odds with the known Kuiper Belt, including detections of serendipitous stellar occultations, and recent results from the Student Dust Counter on-board the New Horizons spacecraft.

Submitted 30 July, 2024; originally announced July 2024.

Comments: Accepted for publication in the Planetary Science Journal, 28 pages, 7 figures, 3 tables

arXiv:2407.21118 [pdf, other]

Palu: Compressing KV-Cache with Low-Rank Projection

Authors: Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu

Abstract: KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matrices, caches the smaller intermediate states, and reconstructs the full keys and values on the fly. To improve accuracy, compression rate, and efficiency, Palu further encompasses (1) a medium-grained low-rank decomposition scheme, (2) an efficient rank search algorithm, (3) a low-rank-aware quantization algorithm, and (4) matrix fusion with optimized GPU kernels. Our extensive experiments with popular LLMs show that Palu can compress KV-Cache by more than 91.25% while maintaining a significantly better accuracy (up to 1.19 lower perplexity) than state-of-the-art KV-Cache quantization methods at a similar or even higher memory usage. When compressing KV-Cache by 50%, Palu delivers up to 1.61x end-to-end speedup for the attention module. Our code is publicly available at https://github.com/shadowpa0327/Palu.

Submitted 30 July, 2024; originally announced July 2024.
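The Palu abstract above describes factoring a projection into low-rank matrices, caching the smaller intermediate state, and rebuilding full keys on the fly. The toy sketch below illustrates only that caching idea, under the assumption that the key projection is already available as a product `W = A·B` (the factors are given, not computed). All names and matrices are hypothetical, and the paper's decomposition scheme, rank search, quantization, and GPU kernels are not represented.

```python
def matmul(P, Q):
    """Plain-Python matrix product of two lists-of-rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


# Hypothetical rank-2 factors of a 4x4 key projection W = A @ B.
A = [[1, 0], [0, 1], [1, 1], [2, -1]]   # down-projection, 4 x 2
B = [[1, 2, 0, 1], [0, 1, 3, -1]]       # up-projection,   2 x 4
W = matmul(A, B)                        # full projection, 4 x 4

# Three token hidden states of width 4.
X = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 0, 1, 0]]

latent_cache = matmul(X, A)             # what gets cached: 3 x 2 instead of 3 x 4
keys = matmul(latent_cache, B)          # full keys reconstructed when needed

cached = sum(len(row) for row in latent_cache)
full = sum(len(row) for row in matmul(X, W))
print(cached, full)
```

Because the toy projection is exactly rank 2, the reconstructed keys match `X·W` exactly; with a real weight matrix the truncation would introduce an approximation error that the methods listed in the abstract aim to control.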

arXiv:2407.20956 [pdf, other]

An Effective Dynamic Gradient Calibration Method for Continual Learning

Authors: Weichen Lin, Jiaxiang Chen, Ruomin Huang, Hu Ding

Abstract: Continual learning (CL) is a fundamental topic in machine learning, where the goal is to train a model with continuously incoming data and tasks. Due to the memory limit, we cannot store all the historical data, and therefore confront the "catastrophic forgetting" problem, i.e., the performance on the previous tasks can substantially decrease because of the missing information in the latter period. Though a number of elegant methods have been proposed, the catastrophic forgetting phenomenon still cannot be well avoided in practice. In this paper, we study the problem from the gradient perspective, where our aim is to develop an effective algorithm to calibrate the gradient in each updating step of the model; namely, our goal is to guide the model to be updated in the right direction under the situation that a large amount of historical data are unavailable. Our idea is partly inspired by the seminal stochastic variance reduction methods (e.g., SVRG and SAGA) for reducing the variance of gradient estimation in stochastic gradient descent algorithms. Another benefit is that our approach can be used as a general tool, which can be incorporated into several existing popular CL methods to achieve better performance. We also conduct a set of experiments on several benchmark datasets to evaluate the performance in practice.

Submitted 30 July, 2024; originally announced July 2024.
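The continual-learning abstract above cites stochastic variance reduction methods such as SVRG as partial inspiration. For readers unfamiliar with that idea, here is a minimal sketch of the standard SVRG gradient estimate on a one-dimensional least-squares problem. This is the textbook SVRG estimator, not the paper's calibration method, and the data and function names are hypothetical.

```python
def grad_i(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def full_grad(w, data):
    """Average gradient over the whole dataset."""
    return sum(grad_i(w, x, y) for x, y in data) / len(data)


def svrg_gradient(w, w_snapshot, mu_snapshot, sample):
    """SVRG estimate: per-sample gradient, corrected by the same sample's
    gradient at a stored snapshot plus the snapshot's full gradient."""
    x, y = sample
    return grad_i(w, x, y) - grad_i(w_snapshot, x, y) + mu_snapshot


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # hypothetical samples with y = 2x
w_snapshot = 0.0
mu_snapshot = full_grad(w_snapshot, data)      # computed once per snapshot

w = 1.0
estimates = [svrg_gradient(w, w_snapshot, mu_snapshot, s) for s in data]
mean_estimate = sum(estimates) / len(estimates)
print(mean_estimate, full_grad(w, data))
```

Averaged over samples, the corrected estimate equals the full gradient at `w`, so it stays unbiased while its variance shrinks as `w` approaches the snapshot.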

arXiv:2407.20119 [pdf, ps, other]

Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

Authors: Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan

Abstract: We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friendly feature representations by an enhanced graph auto-encoder with a contrastive learning technique. It further leverages the clustering results adaptively obtained by robust continuous clustering (RCC) to generate prototypes for negative sampling, which can further contribute to promoting consistency among positive pairs and enlarging the gap between positive and negative samples. ASRC obtains the final clustering results by applying RCC to the learned feature representations with their consistent graph structure and edge weights. Extensive experiments conducted on seven benchmark datasets demonstrate the efficacy of ASRC and its superior performance over other popular clustering models. Notably, ASRC even outperforms methods that rely on prior knowledge of the number of clusters, highlighting its effectiveness in addressing the challenges of clustering unstructured data.

Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19727 [pdf, other]

Adaptive Utilization of Cross-scenario Information for Multi-scenario Recommendation

Authors: Xiufeng Shu, Ruidong Han, Xiang Li, Wei Lin

Abstract: The recommender system of an e-commerce platform usually serves multiple business scenarios. Multi-scenario Recommendation (MSR) is an important topic that improves ranking performance by leveraging information from different scenarios. Recent methods for MSR mostly construct scenario-shared or scenario-specific modules to model commonalities and differences among scenarios. However, when the amount of data among scenarios is skewed or data in some scenarios is extremely sparse, it is difficult to learn scenario-specific parameters well. Besides, simple sharing of information from other scenarios may result in negative transfer. In this paper, we propose a unified model named Cross-Scenario Information Interaction (CSII) to serve all scenarios by a mixture of scenario-dominated experts. Specifically, we propose a novel method to select highly transferable features in data instances. Then, we propose an attention-based aggregator module, which can adaptively extract relevant knowledge across scenarios. Experiments on the production dataset verify the superiority of our method. An online A/B test in the Meituan Waimai APP also shows a significant performance gain, leading to an average improvement in GMV (Gross Merchandise Value) of 1.0% for overall scenarios.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19704 [pdf, other]

UNQA: Unified No-Reference Quality Assessment for Audio, Image, Video, and Audio-Visual Content

Authors: Yuqin Cao, Xiongkuo Min, Yixuan Gao, Wei Sun, Weisi Lin, Guangtao Zhai

Abstract: As multimedia data flourishes on the Internet, quality assessment (QA) of multimedia data becomes paramount for digital media applications. Since multimedia data includes multiple modalities including audio, image, video, and audio-visual (A/V) content, researchers have developed a range of QA methods to evaluate the quality of different modality data. While they exclusively focus on addressing the single modality QA issues, a unified QA model that can handle diverse media across multiple modalities is still missing, whereas the latter can better resemble human perception behaviour and also have a wider range of applications. In this paper, we propose the Unified No-reference Quality Assessment model (UNQA) for audio, image, video, and A/V content, which tries to train a single QA model across different media modalities. To tackle the issue of inconsistent quality scales among different QA databases, we develop a multi-modality strategy to jointly train UNQA on multiple QA databases. Based on the input modality, UNQA selectively extracts the spatial features, motion features, and audio features, and calculates a final quality score via the four corresponding modality regression modules. Compared with existing QA methods, UNQA has two advantages: 1) the multi-modality training strategy makes the QA model learn more general and robust quality-aware feature representation, as evidenced by the superior performance of UNQA compared to state-of-the-art QA methods; 2) UNQA reduces the number of models required to assess multimedia data across different modalities and is friendly to deploy in practical applications.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19658 [pdf, other]

doi 10.1145/3627673.3679914

Enhancing CTR Prediction through Sequential Recommendation Pre-training: Introducing the SRP4CTR Framework

Authors: Ruidong Han, Qianzhong Li, He Jiang, Rui Li, Yurou Zhao, Xiang Li, Wei Lin

Abstract: Understanding user interests is crucial for Click-Through Rate (CTR) prediction tasks. In sequential recommendation, pre-training from user historical behaviors through self-supervised learning can better comprehend user dynamic preferences, presenting the potential for direct integration with CTR tasks. Previous methods have integrated pre-trained models into downstream tasks with the sole purpose of extracting semantic information or well-represented user features, which are then incorporated as new features. However, these approaches tend to ignore the additional inference costs to the downstream tasks, and they do not consider how to transfer the effective information from the pre-trained models for specific estimated items in CTR prediction. In this paper, we propose a Sequential Recommendation Pre-training framework for CTR prediction (SRP4CTR) to tackle the above problems. Initially, we discuss the impact of introducing pre-trained models on inference costs. Subsequently, we introduce a pre-training method to encode sequence side information concurrently. During the fine-tuning process, we incorporate a cross-attention block to establish a bridge between estimated items and the pre-trained model at a low cost. Moreover, we develop a querying transformer technique to facilitate the knowledge transfer from the pre-trained model to industrial CTR models. Offline and online experiments show that our method outperforms previous baseline models.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.19482 [pdf]

Bistability in spatiotemporal mode-locking with dynamic multimode gain

Authors: Zhijin Xiong, Yuankai Guo, Wei Lin, Hao Xiu, Yuncong Ma, Xuewen Chen, Zhaoheng Liang, Lin Ling, Tao Liu, Xiaoming Wei, Zhongmin Yang

Abstract: The three-dimensional (3D) dissipative soliton existing in spatiotemporal mode-locked (STML) multimode fiber lasers has been demonstrated to be a promising formalism for generating high-energy femtosecond pulses, which unfortunately exhibit diverse spatiotemporal dynamics that have not been fully understood. Completely modeling the STML multimode fiber lasers can shed new light on the underlying physics of the spatiotemporal dynamics and thus help to better manipulate the generation of high-quality energetic femtosecond pulses, a need that is still largely unmet. To this end, here we theoretically investigate a dynamic multimode gain model of the STML multimode fiber laser by exploring the multimode rate equation (MMRE) in the framework of the generalized multimode nonlinear Schrödinger equation. Using this dynamic multimode gain model, the attractor dissection theory is revisited to understand the dominant effects that determine the modal composition of the 3D dissipative soliton. Specifically, by varying the numerical aperture of the multimode gain fiber (MMGF), different gain dynamics that correspond to distinct types of gain attractors are observed. As a result, two distinct STML operation regimes, respectively governed by the multimode gain effect and spatiotemporal saturable absorption, are identified. In the latter regime, especially, 3D dissipative solitons present bistability, in that there exist bifurcated solutions with two different linearly polarized (LP) mode compositions. To verify the theoretical findings, the experimental implementation shows that the state of STML can be switched between different LP modes, and confirms the presence of bistability. In particular, the 3D-soliton shaping mechanism that is governed by the multimode gain effect is verified for the first time, to the best of our knowledge.

Submitted 30 July, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.19387 [pdf, other]

BPS Chaos

Authors: Yiming Chen, Henry W. Lin, Stephen H. Shenker

Abstract: Black holes are chaotic quantum systems that are expected to exhibit random matrix statistics in their finite energy spectrum. Lin, Maldacena, Rozenberg and Shan (LMRS) have proposed a related characterization of chaos for the ground states of BPS black holes with finite area horizons. On a separate front, the "fuzzball program" has uncovered large families of horizon-free geometries that account for the entropy of holographic BPS systems, but only in situations with sufficient supersymmetry to exclude finite area horizons. The highly structured, non-random nature of these solutions seems in tension with strong chaos. We verify this intuition by performing analytic and numerical calculations of the LMRS diagnostic in the corresponding boundary quantum system. In particular we examine the 1/2 and 1/4-BPS sectors of $\mathcal{N}=4$ SYM, and the two charge sector of the D1-D5 CFT. We find evidence that these systems are only weakly chaotic, with a Thouless time determining the onset of chaos that grows as a power of $N$. In contrast, finite horizon area BPS black holes should be strongly chaotic, with a Thouless time of order one. In this case, finite energy chaotic states become BPS as $N$ is decreased through the recently discovered "fortuity" mechanism. Hence they can plausibly retain their strongly chaotic character.

Submitted 28 July, 2024; originally announced July 2024.

Comments: 52 pages plus appendices, 23 figures

arXiv:2407.19272 [pdf, other]

Isovolumetric Energy Minimization for Ball-Shaped Volume-Preserving Parameterizations of 3-Manifolds

Authors: Shu-Yung Liu, Tsung-Ming Huang, Wen-Wei Lin, Mei-Heng Yueh

Abstract: A volume-preserving parameterization is a bijective mapping that maps a 3-manifold onto a specified canonical domain that preserves the local volume. This paper formulates the computation of ball-shaped volume-preserving parameterizations as an isovolumetric energy minimization (IEM) problem with the boundary points constrained on a unit sphere. In addition, we develop a new preconditioned nonlinear conjugate gradient algorithm for solving the IEM problem with guaranteed theoretical convergence and significantly improved accuracy and computational efficiency compared to other state-of-the-art algorithms. Applications to solid shape registration and deformation are presented to highlight the usefulness of the proposed algorithm.

Submitted 27 July, 2024; originally announced July 2024.

Comments: 23 pages, 10 figures

MSC Class: 65D18; 68U05; 68U01; 65D17

arXiv:2407.17779 [pdf, other]

doi 10.1145/3664647.3680859

DAC: 2D-3D Retrieval with Noisy Labels via Divide-and-Conquer Alignment and Correction

Authors: Chaofan Gan, Yuanpeng Tu, Yuxi Li, Weiyao Lin

Abstract: With the recent burst of 2D and 3D data, cross-modal retrieval has attracted increasing attention. However, manual labeling by non-experts will inevitably introduce corrupted annotations given ambiguous 2D/3D content. Though previous works have addressed this issue by designing a naive division strategy with hand-crafted thresholds, their performance generally exhibits great sensitivity to the threshold value. Besides, they fail to fully utilize the valuable supervisory signals within each divided subset. To tackle this problem, we propose a Divide-and-conquer 2D-3D cross-modal Alignment and Correction framework (DAC), which comprises Multimodal Dynamic Division (MDD) and Adaptive Alignment and Correction (AAC). Specifically, the former performs accurate sample division by adaptive credibility modeling for each sample based on the compensation information within multimodal loss distribution. Then in AAC, samples in distinct subsets are exploited with different alignment strategies to fully enhance the semantic compactness and meanwhile alleviate over-fitting to noisy labels, where a self-correction strategy is introduced to improve the quality of representation. Moreover, to evaluate the effectiveness in real-world scenarios, we introduce a challenging noisy benchmark, namely Objaverse-N200, which comprises 200k-level samples annotated with 1156 realistic noisy labels. Extensive experiments on both traditional and the newly proposed benchmarks demonstrate the generality and superiority of our DAC, where DAC outperforms state-of-the-art models by a large margin (i.e., with +5.9% gain on ModelNet40 and +5.8% on Objaverse-N200).

Submitted 25 July, 2024; originally announced July 2024.

Comments: accepted by ACM MM 2024

arXiv:2407.17035 [pdf, other]

Q-Ground: Image Quality Grounding with Large Multi-modality Models

Authors: Chaofeng Chen, Sensen Yang, Haoning Wu, Liang Liao, Zicheng Zhang, Annan Wang, Wenxiu Sun, Qiong Yan, Weisi Lin

Abstract: Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content. However, these advancements are mostly focused on overall quality assessment, and the detailed examination of local quality, which is crucial for comprehensive visual understanding, is still largely unexplored. In this work, we introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding by combining large multi-modality models with detailed visual quality analysis. Central to our contribution is the introduction of the QGround-100K dataset, a novel resource containing 100k triplets of (image, quality text, distortion segmentation) to facilitate deep investigations into visual quality. The dataset comprises two parts: one with human-labeled annotations for accurate quality assessment, and another labeled automatically by LMMs such as GPT4V, which helps improve the robustness of model training while also reducing the costs of data collection. With the QGround-100K dataset, we propose an LMM-based method equipped with multi-scale feature learning to learn models capable of performing both image quality answering and distortion segmentation based on text prompts. This dual-capability approach not only refines the model's understanding of region-aware image quality but also enables it to interactively respond to complex, text-based queries about image quality and specific distortions. Q-Ground takes a step towards sophisticated visual quality analysis at a finer scale, establishing a new benchmark for future research in the area. Codes and dataset are available at https://github.com/Q-Future/Q-Ground.

Submitted 24 July, 2024; originally announced July 2024.

Comments: ACM Multimedia 2024 (Oral)

arXiv:2407.16198 [pdf, other]

INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model

Authors: Yiwei Ma, Zhibin Wang, Xiaoshuai Sun, Weihuang Lin, Qiang Zhou, Jiayi Ji, Rongrong Ji

Abstract: With advancements in data availability and computing resources, Multimodal Large Language Models (MLLMs) have showcased capabilities across various fields. However, the quadratic complexity of the vision encoder in MLLMs constrains the resolution of input images. Most current approaches mitigate this issue by cropping high-resolution images into smaller sub-images, which are then processed independently by the vision encoder. Despite capturing sufficient local details, these sub-images lack global context and fail to interact with one another. To address this limitation, we propose a novel MLLM, INF-LLaVA, designed for effective high-resolution image perception. INF-LLaVA incorporates two innovative components. First, we introduce a Dual-perspective Cropping Module (DCM), which ensures that each sub-image contains continuous details from a local perspective and comprehensive information from a global perspective. Second, we introduce a Dual-perspective Enhancement Module (DEM) to enable the mutual enhancement of global and local features, allowing INF-LLaVA to effectively process high-resolution images by simultaneously capturing detailed local information and comprehensive global context. Extensive ablation studies validate the effectiveness of these components, and experiments on a diverse set of benchmarks demonstrate that INF-LLaVA outperforms existing MLLMs. Code and pretrained model are available at https://github.com/WeihuangLin/INF-LLaVA.

Submitted 23 July, 2024; originally announced July 2024.

arXiv:2407.15783 [pdf, other]

24 days-stable CNOT-gate on fluxonium qubits with over 99.9% fidelity

Authors: Wei-Ju Lin, Hyunheung Cho, Yinqi Chen, Maxim G. Vavilov, Chen Wang, Vladimir E. Manucharyan

Abstract: The fluxonium qubit is a promising building block for quantum information processing due to its long coherence time and strong anharmonicity. In this paper, we realize a 60 ns direct CNOT-gate on two inductively-coupled fluxonium qubits using the selective darkening approach, resulting in a gate fidelity as high as 99.94%. The fidelity remains above 99.9% for 24 days without any recalibration between randomized benchmarking measurements. Compared with the 99.96% fidelity of a 60 ns identity gate, our data bring the investigation of the non-decoherence-related errors during gate operations down to $2 \times 10^{-4}$. The present result adds a simple and robust two-qubit gate into the still relatively small family of "the beyond three nines" demonstrations on superconducting qubits.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15450 [pdf, other]

Verifying the analogy between transversely coupled spin-1/2 systems and inductively-coupled fluxoniums

Authors: Wei-Ju Lin, Hyunheung Cho, Yinqi Chen, Maxim G. Vavilov, Chen Wang, Vladimir E. Manucharyan

Abstract: We report a detailed characterization of two inductively coupled superconducting fluxonium qubits for implementing high-fidelity cross-resonance gates. Our circuit stands out because it behaves very closely to the case of two transversely coupled spin-1/2 systems. In particular, the generally unwanted static ZZ-term due to the non-computational transitions is nearly absent despite a strong qubit-qubit hybridization. Spectroscopy of the non-computational transitions reveals a spurious LC-mode arising from the combination of the coupling inductance and the capacitive links between the terminals of the two qubit circuits. Such a mode has a minor effect on our specific device, but it must be carefully considered for optimizing future designs.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.14829 [pdf, other]

Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its own dataset and baseline model. In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions. In this paper, we present the results of the challenge and a summary of the systems, highlighting commonalities and innovations among participating systems. Datasets and baseline models of the AI-Debater 2023 Challenge have already been released and can be accessed through the official website of the challenge.

Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.14697 [pdf, ps, other]

Single-proton removal reaction in the IQMD+GEMINI model benchmarked by elemental fragmentation cross sections of $^{29-33}\mathrm{Si}$ on carbon at $\sim$230~MeV/nucleon

Authors: Guang-Shuai Li, Jun Su, Satoru Terashima, Jian-Wei Zhao, Er-Xi Xiao, Ji-Chao Zhang, Liu-Chun He, Ge Guo, Wei-Ping Lin, Wen-Jian Lin, Chuan-Ye Liu, Chen-Gui Lu, Bo Mei, Dan-Yang Pang, Ye-Lei Sun, Zhi-Yu Sun, Meng Wang, Feng Wang, Jing Wang, Shi-Tao Wang, Xiu-Lin Wei, Xiao-Dong Xu, Jun-Yao Xu, Li-Hua Zhu, Yong Zheng, et al. (2 additional authors not shown)

Abstract: We report on the first measurement of the elemental fragmentation cross sections (EFCSs) of $^{29-33}\mathrm{Si}$ on a carbon target at $\sim$230~MeV/nucleon. The experimental data covering charge changes of $\Delta Z$ = 1-4 are reproduced well by the isospin-dependent quantum molecular dynamics (IQMD) coupled with the evaporation GEMINI (IQMD+GEMINI) model. We further explore the mechanisms underlying the single-proton removal reaction in this model framework. We conclude that the cross sections from direct proton knockout exhibit an overall weak dependence on the mass number of $\mathrm{Si}$ projectiles. The proton evaporation induced after the projectile excitation significantly affects the cross sections for neutron-deficient $\mathrm{Si}$ isotopes, while neutron evaporation plays a crucial role in the reactions of neutron-rich $\mathrm{Si}$ isotopes. We show that the relative magnitude of one-proton and one-neutron separation energies is an essential factor that influences evaporation processes.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 7 pages, 4 figures

arXiv:2407.13664 [pdf, other]

Decision Focused Causal Learning for Direct Counterfactual Marketing Optimization

Authors: Hao Zhou, Rongxiao Huang, Shaoming Li, Guibin Jiang, Jiaqi Zheng, Bing Cheng, Wei Lin

Abstract: Marketing optimization plays an important role in enhancing user engagement in online Internet platforms. Existing studies usually formulate this problem as a budget allocation problem and solve it by utilizing two fully decoupled stages, i.e., machine learning (ML) and operations research (OR). However, the learning objective in ML does not take account of the downstream optimization task in OR, so the prediction accuracy in ML may not be positively related to the decision quality. Decision Focused Learning (DFL) integrates ML and OR into an end-to-end framework, which takes the objective of the downstream task as the decision loss function and guarantees the consistency of the optimization direction between ML and OR. However, deploying DFL in marketing is non-trivial due to multiple technological challenges. Firstly, the budget allocation problem in marketing is a 0-1 integer stochastic programming problem and the budget is uncertain and fluctuates a lot in real-world settings, which is beyond the general problem background in DFL. Secondly, because of the counterfactual in marketing, the decision loss cannot be directly computed and the optimal solution can never be obtained, both of which disable the common gradient-estimation approaches in DFL. Thirdly, the OR solver is called frequently to compute the decision loss during model training in DFL, which produces huge computational cost and cannot support large-scale training data. In this paper, we propose a decision focused causal learning framework (DFCL) for direct counterfactual marketing optimization, which overcomes the above technological challenges. Both offline experiments and online A/B testing demonstrate the effectiveness of DFCL over the state-of-the-art methods. Currently, DFCL has been deployed in several marketing scenarios in Meituan, one of the largest online food delivery platforms in the world.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted by KDD 2024

arXiv:2407.13274 [pdf, other]

Aligning Explanations for Recommendation with Rating and Feature via Maximizing Mutual Information

Authors: Yurou Zhao, Yiding Sun, Ruidong Han, Fei Jiang, Lu Guan, Xiang Li, Wei Lin, Weizhi Ma, Jiaxin Mao

Abstract: Providing natural language-based explanations to justify recommendations helps to improve users' satisfaction and gain users' trust. However, as current explanation generation methods are commonly trained with an objective to mimic existing user reviews, the generated explanations are often not aligned with the predicted ratings or some important features of the recommended items, and thus are suboptimal in helping users make informed decisions on the recommendation platform. To tackle this problem, we propose a flexible model-agnostic method named MMI (Maximizing Mutual Information) framework to enhance the alignment between the generated natural language explanations and the predicted rating/important item features. Specifically, we propose to use mutual information (MI) as a measure for the alignment and train a neural MI estimator. Then, we treat a well-trained explanation generation model as the backbone model and further fine-tune it through reinforcement learning with guidance from the MI estimator, which rewards a generated explanation that is more aligned with the predicted rating or a pre-defined feature of the recommended item. Experiments on three datasets demonstrate that our MMI framework can boost different backbone models, enabling them to outperform existing baselines in terms of alignment with predicted ratings and item features. Additionally, user studies verify that MI-enhanced explanations indeed facilitate users' decisions and are favorable compared with other baselines due to their better alignment properties.

Submitted 20 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by CIKM 2024, and the code repository will be updated soon

arXiv:2407.10648 [pdf, other]

Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Authors: Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, Weiyao Lin

Abstract: Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulation using a simple point-mass physics model and a depth rendering engine. Despite this simplicity, our method excels in challenging tasks for both multi-agent and single-agent applications with zero-shot sim-to-real transfer. In multi-agent scenarios, our system demonstrates self-organized behavior, enabling autonomous coordination without communication or centralized planning - an achievement not seen in existing traditional or learning-based methods. In single-agent scenarios, our system achieves a 90% success rate in navigating through complex environments, significantly surpassing the 60% success rate of the previous state-of-the-art approach. Our system can operate without state estimation and adapt to dynamic obstacles. In real-world forest environments, it navigates at speeds up to 20 m/s, doubling the speed of previous imitation learning-based solutions. Notably, all these capabilities are deployed on a budget-friendly $21 computer, costing less than 5% of a GPU-equipped board used in existing systems. Video demonstrations are available at https://youtu.be/LKg9hJqc2cc.

Submitted 15 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10199 [pdf, other]

Charge radii of $^{11-16}$C, $^{13-17}$N and $^{15-18}$O determined from their charge-changing cross-sections and the mirror-difference charge radii

Authors: J. W. Zhao, B.-H. Sun, I. Tanihata, J. Y. Xu, K. Y. Zhang, A. Prochazka, L. H. Zhu, S. Terashima, J. Meng, L. C. He, C. Y. Liu, G. S. Li, C. G. Lu, W. J. Lin, W. P. Lin, Z. Liu, P. P. Ren, Z. Y. Sun, F. Wang, J. Wang, M. Wang, S. T. Wang, X. L. Wei, X. D. Xu, J. C. Zhang, et al. (2 additional authors not shown)

Abstract: Charge-changing cross-sections (CCCSs) of $^{11-16}$C, $^{13-17}$N and $^{15-18}$O on a carbon target have been determined at energies around 300 MeV/nucleon. A correction factor that depends on the nucleon separation energy has been introduced into the Glauber model calculation for extracting the nuclear charge radii from the experimental CCCSs. The charge radii of $^{11}$C, $^{13,16}$N and $^{15}$O were thus determined for the first time. With the new radii, we studied the experimental mirror-difference charge radii ($ΔR_{\text {ch}}^{\text {mirror}}$) of $^{11}$B-$^{11}$C, $^{13}$C-$^{13}$N, $^{15}$N-$^{15}$O, $^{17}$N-$^{17}$Ne pairs for the first time. We find that the $ΔR_{\text {ch}}^{\text {mirror}}$, including both bound and weakly bound proton-rich mirror partners, are reproduced by the empirical relation to the isospin asymmetry predicted by the ab initio calculations.

Submitted 4 August, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

Comments: 3 figures, submitted to Physics Letters B

arXiv:2407.06766 [pdf, other]

Relational Perspective on Graph Query Languages

Authors: Diego Figueira, Anthony W. Lin, Liat Peterfreund

Abstract: We study a relational perspective of graph database querying. Such a perspective underlies various graph database systems but very few theoretical investigations have been conducted on it. This perspective offers a powerful and unified framework to study graph database querying, by which algorithms and complexity follow from classical results. We provide two concrete applications. The first is querying property graphs. The property graph data model supersedes previously proposed graph models and underlies the new standard GQL for graph query languages. We show that this standard can be, by and large, expressed by extensions of relational calculus with transitive closure operators (FO[TC]) and existential second-order quantifiers (ESO). With this, we obtain optimal data complexity bounds, along with extensions including schema validation. The second application is incorporating data from concrete domains (e.g., numbers) in graph database querying. We use embedded finite model theory and, by exploiting a generic Restricted Quantifier Collapse (RQC) result for FO[TC] and ESO, we obtain optimal data complexity bounds for GQL with arithmetics and comparisons. Moreover, we show that Regular Data Path Querying with operations on data (i.e. using register automata formalisms) can be captured in FO[TC] over embedded finite graphs while preserving nondeterministic logspace data complexity.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05382 [pdf, other]

Rethinking Unsupervised Outlier Detection via Multiple Thresholding

Authors: Zhonghang Liu, Panzhong Lu, Guoyang Xie, Zhichao Lu, Wen-Yan Lin

Abstract: In the realm of unsupervised image outlier detection, assigning outlier scores holds greater significance than its subsequent task: thresholding for predicting labels. This is because determining the optimal threshold on non-separable outlier score functions is an ill-posed problem. However, the lack of predicted labels not only hinders some real applications of current outlier detectors but also prevents these methods from being enhanced by leveraging the dataset's self-supervision. To advance existing scoring methods, we propose a multiple thresholding (Multi-T) module. It generates two thresholds that isolate inliers and outliers from the unlabelled target dataset; the outliers are used to obtain a better feature representation, while the inliers provide an uncontaminated manifold. Extensive experiments verify that Multi-T can significantly improve proposed outlier scoring methods. Moreover, Multi-T enables a naive distance-based method to reach state-of-the-art performance.

Submitted 14 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

Showing 1–50 of 1,229 results for author: Lin, W