-
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection
Authors:
Mengya Hu,
Rui Xu,
Deren Lei,
Yaxi Li,
Mingyu Wang,
Emily Ching,
Eslam Kamal,
Alex Deng
Abstract:
Large language models (LLMs) are highly capable but face latency challenges in real-time applications, such as conducting online hallucination detection. To overcome this issue, we propose a novel framework that leverages a small language model (SLM) classifier for initial detection, followed by an LLM acting as a constrained reasoner to generate detailed explanations for detected hallucinated content. This study optimizes real-time interpretable hallucination detection by introducing effective prompting techniques that align LLM-generated explanations with SLM decisions. Experimental results demonstrate its effectiveness, thereby enhancing the overall user experience.
Submitted 22 August, 2024;
originally announced August 2024.
-
DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
C. Andreopoulos,
M. Andreotti
, et al. (1347 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the European Strategy for Particle Physics. While construction of DUNE Phase I is well underway, this White Paper focuses on DUNE Phase II planning. DUNE Phase II consists of a third and fourth far detector (FD) module, an upgraded near detector complex, and an enhanced 2.1 MW beam. The fourth FD module is conceived as a "Module of Opportunity", aimed at expanding the physics opportunities, in addition to supporting the core DUNE science program, with more advanced technologies. This document highlights the increased science opportunities offered by the DUNE Phase II near and far detectors, including long-baseline neutrino oscillation physics, neutrino astrophysics, and physics beyond the standard model. It describes the DUNE Phase II near and far detector technologies and detector design concepts that are currently under consideration. A summary of key R&D goals and prototyping phases needed to realize the Phase II detector technical designs is also provided. DUNE's Phase II detectors, along with the increased beam power, will complete the full scope of DUNE, enabling a multi-decadal program of groundbreaking science with neutrinos.
Submitted 22 August, 2024;
originally announced August 2024.
-
Dissipation and Interaction-Controlled Non-Hermitian Skin Effects
Authors:
Yang Li,
Zhao-Fan Cai,
Tao Liu,
Franco Nori
Abstract:
Non-Hermitian skin effects (NHSEs) have recently been investigated extensively at the single-particle level. When many-body interactions become dominant, novel non-Hermitian physical phenomena can emerge. In this work, we theoretically study NHSEs controlled by dissipation and interaction. We consider a 1D zigzag Bose-Hubbard lattice, subject to magnetic flux, staggered onsite single-particle loss, and uniform onsite two-particle loss. When the two-particle loss is small, two-body bound eigenstates (i.e., doublons) are all localized at the same boundary due to the interplay of the magnetic flux and staggered single-particle loss. For strong two-particle loss, however, the localization direction of doublons is unexpectedly reversed. This is attributed to the effective strong nonreciprocal hopping of doublons arising from the virtual second-order and third-order hopping processes of particle pairs in combination with the magnetic flux, the strong two-particle loss, and the many-body interaction. Moreover, a two-particle gain can induce the same skin-localization of doublons, which can be utilized to dynamically observe the NHSE of doublons and its interaction-controlled reversal. Our results open up a new avenue for exploring novel non-Hermitian phenomena in many-body systems.
Submitted 24 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
Dataset | Mindset = Explainable AI | Interpretable AI
Authors:
Caesar Wu,
Rajkumar Buyya,
Yuan Fang Li,
Pascal Bouvry
Abstract:
We often use "explainable Artificial Intelligence (XAI)" and "interpretable AI (IAI)" interchangeably when we apply various XAI tools for a given dataset to explain the reasons that underpin machine learning (ML) outputs. However, these notions can sometimes be confusing because interpretation often has a subjective connotation, while explanations lean towards objective facts. We argue that XAI is a subset of IAI. The concept of IAI is beyond the sphere of a dataset. It includes the domain of a mindset. At the core of this ambiguity is the duality of reasons, in which we can reason either outwards or inwards. When directed outwards, we want the reasons to make sense through the laws of nature. When turned inwards, we want the reasons to be happy, guided by the laws of the heart. While XAI and IAI share reason as the common notion for the goal of transparency, clarity, fairness, reliability, and accountability in the context of ethical AI and trustworthy AI (TAI), their differences lie in that XAI emphasizes the post-hoc analysis of a dataset, and IAI requires an a priori mindset of abstraction. This hypothesis can be proved by empirical experiments based on an open dataset and harnessed by High-Performance Computing (HPC). The demarcation of XAI and IAI is indispensable because it would be impossible to determine regulatory policies for many AI applications, especially in healthcare, human resources, banking, and finance. We aim to clarify these notions and lay the foundation of XAI, IAI, EAI, and TAI for many practitioners and policymakers in future AI applications and research.
Submitted 22 August, 2024;
originally announced August 2024.
-
BIPeC: A Combined Change-Point Analyzer to Identify Performance Regressions in Large-scale Database Systems
Authors:
Zhan Lyu,
Thomas Bach,
Yong Li,
Nguyen Minh Le,
Lars Hoemke
Abstract:
Performance testing in large-scale database systems like SAP HANA is a crucial yet labor-intensive task, involving extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, making early detection of performance regressions challenging. We address these issues by proposing an automated approach to detect performance regressions in such measurements. Our approach integrates Bayesian inference with the Pruned Exact Linear Time (PELT) algorithm, detecting change points and performance regressions with higher precision and efficiency than previous approaches. Our method minimizes false negatives and ensures the reliability and performance quality of the SAP HANA system. The proposed solution can accelerate testing and contribute to more sustainable performance management practices in large-scale data management environments.
Submitted 22 August, 2024;
originally announced August 2024.
-
Cell-ontology guided transcriptome foundation model
Authors:
Xinyu Yuan,
Zhihao Zhan,
Zuobai Zhang,
Manqi Zhou,
Jianan Zhao,
Boyu Han,
Yue Li,
Jian Tang
Abstract:
Transcriptome foundation models (TFMs) hold great promise for deciphering the transcriptomic language that dictates diverse cell functions by self-supervised learning on large-scale single-cell gene expression data, and ultimately unraveling the complex mechanisms of human diseases. However, current TFMs treat cells as independent samples and ignore the taxonomic relationships between cell types, which are available in cell ontology graphs. We argue that effectively leveraging this ontology information during TFM pre-training can improve learning of biologically meaningful gene co-expression patterns while preserving the TFM as a general-purpose foundation model for downstream zero-shot and fine-tuning tasks. To this end, we present the single cell, Cell-ontology guided TFM (scCello). We introduce a cell-type coherence loss and an ontology alignment loss, which are minimized along with the masked gene expression prediction loss during pre-training. The novel loss components guide scCello to learn the cell-type-specific representation and the structural relation between cell types from the cell ontology graph, respectively. We pre-trained scCello on 22 million cells from the CellxGene database, leveraging their cell-type labels mapped to the cell ontology graph from the Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses.
Submitted 22 August, 2024;
originally announced August 2024.
-
Basis-independent quantum coherence and its distribution under relativistic motion
Authors:
Ming-Ming Du,
Hong-Wei Li,
Zhen Tao,
Shu-Ting Shen,
Xiao-Jing Yan,
Xi-Yun Li,
Wei Zhong,
Yu-Bo Sheng,
Lan Zhou
Abstract:
Recent studies have increasingly focused on the effect of relativistic motion on quantum coherence. Prior research predominantly examined the influence of relative motion on basis-dependent quantum coherence, underscoring its susceptibility to decoherence under accelerated conditions. Yet, the effect of relativistic motion on basis-independent quantum coherence, which is critical for understanding the intrinsic quantum features of a system, remains an interesting open question. This paper addresses this question by examining how total, collective, and localized coherence are affected by acceleration and coupling strength. Our analysis reveals that both total and collective coherence significantly decrease with increasing acceleration and coupling strength, ultimately vanishing at high levels of acceleration. This underscores the profound impact of Unruh thermal noise. Conversely, localized coherence exhibits relative stability, decreasing to zero only under the extreme condition of infinite acceleration. Moreover, we demonstrate that collective, localized, and basis-independent coherence collectively satisfy the triangle inequality. These findings are crucial for enhancing our understanding of quantum information dynamics in environments subjected to high acceleration and offer valuable insights on the behavior of quantum coherence under relativistic conditions.
Submitted 22 August, 2024;
originally announced August 2024.
-
MedDiT: A Knowledge-Controlled Diffusion Transformer Framework for Dynamic Medical Image Generation in Virtual Simulated Patient
Authors:
Yanzeng Li,
Cheng Zeng,
Jinchao Zhang,
Jie Zhou,
Lei Zou
Abstract:
Medical education relies heavily on Simulated Patients (SPs) to provide a safe environment for students to practice clinical skills, including medical image analysis. However, the high cost of recruiting qualified SPs and the lack of diverse medical imaging datasets have presented significant challenges. To address these issues, this paper introduces MedDiT, a novel knowledge-controlled conversational framework that can dynamically generate plausible medical images aligned with simulated patient symptoms, enabling diverse diagnostic skill training. Specifically, MedDiT integrates various patient Knowledge Graphs (KGs), which describe the attributes and symptoms of patients, to dynamically prompt Large Language Models' (LLMs) behavior and control the patient characteristics, mitigating hallucination during medical conversation. Additionally, a well-tuned Diffusion Transformer (DiT) model is incorporated to generate medical images according to the specified patient attributes in the KG. In this paper, we present the capabilities of MedDiT through a practical demonstration, showcasing its ability to act in diverse simulated patient cases and generate the corresponding medical images. This can provide an abundant and interactive learning experience for students, advancing medical education by offering an immersive simulation platform for future healthcare professionals. The work sheds light on the feasibility of incorporating advanced technologies like LLM, KG, and DiT in education applications, highlighting their potential to address the challenges faced in simulated patient-based medical education.
Submitted 22 August, 2024;
originally announced August 2024.
-
Prescribing positive curvature with conical singularities on $\mathbb S^2$
Authors:
Jingyi Chen,
Yuxiang Li,
Yunqing Wu
Abstract:
For conformal metrics with conical singularities and positive curvature on $\mathbb S^2$, we prove a convergence theorem and apply it to obtain a criterion for nonexistence in an open region of the prescribing data. The core of our study is a fine analysis of the bubble trees and an area identity in the convergence process.
Submitted 22 August, 2024;
originally announced August 2024.
-
Prescribing negative curvature with cusps and conical singularities on compact surface
Authors:
Jingyi Chen,
Yuxiang Li,
Yunqing Wu
Abstract:
On a compact surface, we prove existence and uniqueness of the conformal metric whose curvature is prescribed by a negative function away from finitely many points where the metric has prescribed angles presenting cusps or conical singularities.
Submitted 22 August, 2024;
originally announced August 2024.
-
Rebalancing Multi-Label Class-Incremental Learning
Authors:
Kaile Du,
Yifan Zhou,
Fan Lyu,
Yuyang Li,
Junzhou Xie,
Yixi Shen,
Fuyuan Hu,
Guangcan Liu
Abstract:
Multi-label class-incremental learning (MLCIL) is essential for real-world multi-label applications, allowing models to learn new labels while retaining previously learned knowledge continuously. However, recent MLCIL approaches can only achieve suboptimal performance due to the oversight of the positive-negative imbalance problem, which manifests at both the label and loss levels because of the task-level partial label issue. The imbalance at the label level arises from the substantial absence of negative labels, while the imbalance at the loss level stems from the asymmetric contributions of the positive and negative loss parts to the optimization. To address the issue above, we propose a Rebalance framework for both the Loss and Label levels (RebLL), which integrates two key modules: asymmetric knowledge distillation (AKD) and online relabeling (OR). AKD is proposed to rebalance at the loss level by emphasizing the negative label learning in classification loss and down-weighting the contribution of overconfident predictions in distillation loss. OR is designed for label rebalance, which restores the original class distribution in memory by online relabeling the missing classes. Our comprehensive experiments on the PASCAL VOC and MS-COCO datasets demonstrate that this rebalancing strategy significantly improves performance, achieving new state-of-the-art results even with a vanilla CNN backbone.
Submitted 22 August, 2024;
originally announced August 2024.
-
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM
Authors:
Zhaochen Su,
Jun Zhang,
Xiaoye Qu,
Tong Zhu,
Yanshu Li,
Jiashuo Sun,
Juntao Li,
Min Zhang,
Yu Cheng
Abstract:
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. Only a few studies have explored the conflicts between the inherent knowledge of LLMs and the retrieved contextual knowledge. However, a thorough assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we present ConflictBank, the first comprehensive benchmark developed to systematically evaluate knowledge conflicts from three aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation delves into four model families and twelve LLM instances, meticulously analyzing conflicts stemming from misinformation, temporal discrepancies, and semantic divergences. Based on our proposed novel construction framework, we create 7,453,853 claim-evidence pairs and 553,117 QA pairs. We present numerous findings on model scale, conflict causes, and conflict types. We hope our ConflictBank benchmark will help the community better understand model behavior in conflicts and develop more reliable LLMs.
Submitted 21 August, 2024;
originally announced August 2024.
-
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results
Authors:
Maksim Smirnov,
Aleksandr Gushchin,
Anastasia Antsiferova,
Dmitry Vatolin,
Radu Timofte,
Ziheng Jia,
Zicheng Zhang,
Wei Sun,
Jiaying Qian,
Yuqin Cao,
Yinan Sun,
Yuxin Zhu,
Xiongkuo Min,
Guangtao Zhai,
Kanjar De,
Qing Luo,
Ao-Xiang Zhang,
Peng Zhang,
Haibo Lei,
Linyan Jiang,
Yaqing Li,
Wenhui Meng,
Xiaoheng Tan,
Haiqiang Wang,
Xiaozhong Xu
, et al. (11 additional authors not shown)
Abstract:
Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge; we report the results of the 6 teams that submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.
Submitted 28 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Parallel Speculative Decoding with Adaptive Draft Length
Authors:
Tianyu Liu,
Yun Li,
Qitan Lv,
Kai Liu,
Jianchen Zhu,
Winston Hu
Abstract:
Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these tokens in parallel, has shown great power for LLM inference acceleration. However, existing SD methods suffer from the mutual waiting problem, i.e., the target model gets stuck when the draft model is guessing tokens, and vice versa. This problem is directly incurred by the asynchronous execution of the draft model and the target model, and is exacerbated by the fixed draft length in speculative decoding. To address these challenges, we propose a conceptually simple, flexible, and general framework to boost speculative decoding, namely Parallel spEculative decoding with Adaptive dRaft Length (PEARL). Specifically, PEARL proposes pre-verify to verify the first draft token in advance during the drafting phase, and post-verify to generate more draft tokens during the verification phase. PEARL parallelizes the drafting phase and the verification phase by applying the two strategies, and achieves adaptive draft length for different scenarios, which effectively alleviates the mutual waiting problem. Moreover, we theoretically demonstrate that the mean number of accepted tokens of PEARL exceeds that of existing draft-then-verify works. Experiments on various text generation benchmarks demonstrate the effectiveness of PEARL, leading to a superior speedup of up to $3.79\times$ and $1.52\times$ compared to auto-regressive decoding and vanilla speculative decoding, respectively.
Submitted 4 September, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
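The draft-then-verify loop that the abstract above takes as its baseline can be sketched as follows. This is a minimal illustration of vanilla greedy speculative decoding with a fixed draft length, not PEARL and not the authors' code: the toy models, the function names, and the `gamma` parameter are all assumptions made for the example, and the verification loop stands in for what a real system would do in one batched forward pass of the target model.

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Toy target model (assumption): counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model (assumption): agrees with the target except after token 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


def speculative_step(prefix: List[int], draft: Model, target: Model, gamma: int) -> List[int]:
    """One draft-then-verify round; returns the tokens accepted this round."""
    # Drafting phase: the draft model proposes gamma tokens autoregressively.
    proposed: List[int] = []
    ctx = list(prefix)
    for _ in range(gamma):
        tok = draft(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # Verification phase: compare each proposal with the target's greedy choice.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # All drafts accepted: the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


def speculative_generate(prefix: List[int], draft: Model, target: Model,
                         gamma: int, max_new: int) -> List[int]:
    """Repeat draft-then-verify rounds until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, gamma))
    return out[: len(prefix) + max_new]


def greedy_generate(prefix: List[int], model: Model, max_new: int) -> List[int]:
    """Plain autoregressive greedy decoding, used as the reference output."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(model(out))
    return out


if __name__ == "__main__":
    print(speculative_generate([0], toy_draft, toy_target, gamma=4, max_new=12))
    print(greedy_generate([0], toy_target, max_new=12))
```

In this sketch the two phases run strictly one after the other, so each model idles while the other works. That idle time is the mutual waiting problem the abstract describes, and the fixed `gamma` is the fixed draft length it refers to.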
-
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Authors:
Yinghao Aaron Li,
Xilin Jiang,
Jordan Darefsky,
Ge Zhu,
Nima Mesgarani
Abstract:
The rapid advancement of large language models (LLMs) has significantly propelled the development of text-based chatbots, demonstrating their capability to engage in coherent and contextually relevant dialogues. However, extending these advancements to enable end-to-end speech-to-speech conversation bots remains a formidable challenge, primarily due to the extensive dataset and computational resources required. The conventional approach of cascading automatic speech recognition (ASR), LLM, and text-to-speech (TTS) models in a pipeline, while effective, suffers from unnatural prosody because it lacks direct interactions between the input audio and its transcribed text and the output audio. These systems are also limited by their inherent latency from the ASR process for real-time applications. This paper introduces Style-Talker, an innovative framework that fine-tunes an audio LLM alongside a style-based TTS model for fast spoken dialog generation. Style-Talker takes user input audio and uses transcribed chat history and speech styles to generate both the speaking style and text for the response. Subsequently, the TTS model synthesizes the speech, which is then played back to the user. While the response speech is being played, the input speech undergoes ASR processing to extract the transcription and speaking style, serving as the context for the ensuing dialogue turn. This novel pipeline accelerates the traditional cascade ASR-LLM-TTS systems while integrating rich paralinguistic information from input speech. Our experimental results show that Style-Talker significantly outperforms the conventional cascade and speech-to-speech baselines in terms of both dialogue naturalness and coherence while being more than 50% faster.
Submitted 13 August, 2024;
originally announced August 2024.
-
Editable Fairness: Fine-Grained Bias Mitigation in Language Models
Authors:
Ruizhe Chen,
Yichen Li,
Jianfei Yang,
Joey Tianyi Zhou,
Zuozhu Liu
Abstract:
Generating fair and accurate predictions plays a pivotal role in deploying large language models (LLMs) in the real world. However, existing debiasing methods inevitably generate unfair or incorrect predictions as they are designed and evaluated to achieve parity across different social groups but leave aside individual commonsense facts, resulting in modified knowledge that elicits unreasonable or undesired predictions. In this paper, we first establish a new bias mitigation benchmark, BiaScope, which systematically assesses performance by leveraging newly constructed datasets and metrics on knowledge retention and generalization. Then, we propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases. FAST identifies the decisive layer responsible for storing social biases and then calibrates its outputs by integrating a small modular network, considering both bias mitigation and knowledge-preserving demands. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with superior debiasing performance while not compromising the overall model capability for knowledge retention and downstream predictions. This highlights the potential of fine-grained debiasing strategies to achieve fairness in LLMs. Code will be publicly available.
Submitted 7 August, 2024;
originally announced August 2024.
-
AppAgent v2: Advanced Agent for Flexible Mobile Interactions
Authors:
Yanda Li,
Chi Zhang,
Wanqi Yang,
Bin Fu,
Pei Cheng,
Xin Chen,
Ling Chen,
Yunchao Wei
Abstract:
With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal agent framework for mobile devices. This framework, capable of navigating mobile devices, emulates human-like interactions. Our agent constructs a flexible action space that enhances adaptability across various applications including parser, text and vision descriptions. The agent operates through two main phases: exploration and deployment. During the exploration phase, functionalities of user interface elements are documented either through agent-driven or manual explorations into a customized structured knowledge base. In the deployment phase, RAG technology enables efficient retrieval and update from this knowledge base, thereby empowering the agent to perform tasks effectively and accurately. This includes performing complex, multi-step operations across various applications, thereby demonstrating the framework's adaptability and precision in handling customized task workflows. Our experimental results across various benchmarks demonstrate the framework's superior performance, confirming its effectiveness in real-world scenarios. Our code will be open source soon.
Submitted 23 August, 2024; v1 submitted 5 August, 2024;
originally announced August 2024.
-
Variational autoencoder inverse mapper for extraction of Compton form factors: Benchmarks and conditional learning
Authors:
Fayaz Hossen,
Douglas Adams,
Joshua Bautista,
Yaohang Li,
Gia-Wei Chern,
Simonetta Liuti,
Marie Boer,
Marija Cuic,
Gari R. Goldstein,
Michael Engelhardt,
Huey-Wen Li
Abstract:
Deeply virtual exclusive scattering processes (DVES) serve as precise probes of nucleon quark and gluon distributions in coordinate space. These distributions are derived from generalized parton distributions (GPDs) via Fourier transform relative to proton momentum transfer. QCD factorization theorems enable DVES to be parameterized by Compton form factors (CFFs), which are convolutions of GPDs with perturbatively calculable kernels. Accurate extraction of CFFs from deeply virtual Compton scattering (DVCS), which benefits from interference with the Bethe-Heitler (BH) process and a simpler final state structure, is essential for inferring GPDs. This paper focuses on extracting CFFs from DVCS data using a variational autoencoder inverse mapper (VAIM) and its constrained variant (C-VAIM). VAIM is shown to be consistent with Markov Chain Monte Carlo (MCMC) methods in extracting multiple CFF solutions for given kinematics, while C-VAIM effectively captures correlations among CFFs across different kinematic values, providing more constrained solutions. This study represents a crucial first step towards a comprehensive analysis pipeline for the extraction of GPDs.
Submitted 21 August, 2024;
originally announced August 2024.
-
Low-Light Object Tracking: A Benchmark
Authors:
Pengzhi Zhong,
Xiaoyu Guo,
Defeng Huang,
Xiaojun Peng,
Yian Li,
Qijun Zhao,
Shuiwang Li
Abstract:
In recent years, the field of visual tracking has made significant progress with the application of large-scale training datasets. These datasets have supported the development of sophisticated algorithms, enhancing the accuracy and stability of visual object tracking. However, most research has primarily focused on favorable illumination circumstances, neglecting the challenges of tracking in low-light environments. In low-light scenes, lighting may change dramatically, targets may lack distinct texture features, and in some scenarios, targets may not be directly observable. These factors can lead to a severe decline in tracking performance. To address this issue, we introduce LLOT, a benchmark specifically designed for Low-Light Object Tracking. LLOT comprises 269 challenging sequences with a total of over 132K frames, each carefully annotated with bounding boxes. This specially designed dataset aims to promote innovation and advancement in object tracking techniques for low-light conditions, addressing challenges not adequately covered by existing benchmarks. To assess the performance of existing methods on LLOT, we conducted extensive tests on 39 state-of-the-art tracking algorithms. The results highlight a considerable gap in low-light tracking performance. In response, we propose H-DCPT, a novel tracker that incorporates historical and darkness clue prompts to set a stronger baseline. H-DCPT outperformed all 39 evaluated methods in our experiments, demonstrating significant improvements. We hope that our benchmark and H-DCPT will stimulate the development of novel and accurate methods for tracking objects in low-light conditions. The LLOT and code are available at https://github.com/OpenCodeGithub/H-DCPT.
Submitted 21 August, 2024;
originally announced August 2024.
-
Enabling Small Models for Zero-Shot Classification through Model Label Learning
Authors:
Jia Zhang,
Zhi Zhou,
Lan-Zhe Guo,
Yu-Feng Li
Abstract:
Vision-language models (VLMs) like CLIP have demonstrated impressive zero-shot ability in image classification tasks by aligning text and images but suffer from inferior performance compared with task-specific expert models. In contrast, expert models excel in their specialized domains but lack zero-shot ability for new tasks. How to obtain both the high performance of expert models and zero-shot ability is an important research direction. In this paper, we attempt to demonstrate that by constructing a model hub and aligning models with their functionalities using model labels, new tasks can be solved in a zero-shot manner by effectively selecting and reusing models in the hub. We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities through a Semantic Directed Acyclic Graph (SDAG) and leverages an algorithm, Classification Head Combination Optimization (CHCO), to select capable models for new tasks. Compared with the foundation model paradigm, it is less costly and more scalable, i.e., the zero-shot ability grows with the size of the model hub. Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL, demonstrating that expert models can be effectively reused for zero-shot tasks. Our code will be released publicly.
Submitted 21 August, 2024;
originally announced August 2024.
-
T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval
Authors:
Yili Li,
Jing Yu,
Keke Gai,
Bang Liu,
Gang Xiong,
Qi Wu
Abstract:
Current text-video retrieval methods mainly rely on cross-modal matching between queries and videos to calculate their similarity scores, which are then sorted to obtain retrieval results. This method considers the matching between each candidate video and the query, but it incurs a significant time cost that increases notably with the number of candidates. Generative models are common in natural language processing and computer vision, and have been successfully applied in document retrieval, but their application in multimodal retrieval remains unexplored. To enhance retrieval efficiency, in this paper, we introduce a model-based video indexer named T2VIndexer, which is a sequence-to-sequence generative model directly generating video identifiers and retrieving candidate videos with constant time complexity. T2VIndexer aims to reduce retrieval time while maintaining high accuracy. To achieve this goal, we propose video identifier encoding and query-identifier augmentation approaches to represent videos as short sequences while preserving their semantic information. Our method consistently enhances the retrieval efficiency of current state-of-the-art models on four standard datasets. It enables baselines with only 30%-50% of the original retrieval time to achieve better retrieval performance on MSR-VTT (+1.0%), MSVD (+1.8%), ActivityNet (+1.5%), and DiDeMo (+0.2%). The code is available at https://github.com/Lilidamowang/T2VIndexer-generativeSearch.
Submitted 21 August, 2024;
originally announced August 2024.
-
AS-LIO: Spatial Overlap Guided Adaptive Sliding Window LiDAR-Inertial Odometry for Aggressive FOV Variation
Authors:
Tianxiang Zhang,
Xuanxuan Zhang,
Zongbo Liao,
Xin Xia,
You Li
Abstract:
LiDAR-Inertial Odometry (LIO) demonstrates outstanding accuracy and stability in general low-speed and smooth motion scenarios. However, in high-speed and intense motion scenarios, such as sharp turns, two primary challenges arise: firstly, due to the limitations of IMU frequency, the error in estimating significantly non-linear motion states escalates; secondly, drastic changes in the Field of View (FOV) may diminish the spatial overlap between LiDAR frame and pointcloud map (or between frames), leading to insufficient data association and constraint degradation.
To address these issues, we propose a novel Adaptive Sliding window LIO framework (AS-LIO) guided by the Spatial Overlap Degree (SOD). Initially, we assess the SOD between the LiDAR frames and the registered map, directly evaluating the adverse impact of current FOV variation on pointcloud alignment. Subsequently, we design an adaptive sliding window to manage the continuous LiDAR stream and control state updates, dynamically adjusting the update step according to the SOD. This strategy enables our odometry to adaptively adopt higher update frequency to precisely characterize trajectory during aggressive FOV variation, thus effectively reducing the non-linear error in positioning. Meanwhile, the historical constraints within the sliding window reinforce the frame-to-map data association, ensuring the robustness of state estimation. Experiments show that our AS-LIO framework can quickly perceive and respond to challenging FOV change, outperforming other state-of-the-art LIO frameworks in terms of accuracy and robustness.
Submitted 21 August, 2024;
originally announced August 2024.
-
Full-Duplex ISAC-Enabled D2D Underlaid Cellular Networks: Joint Transceiver Beamforming and Power Allocation
Authors:
Tao Jiang,
Ming Jin,
Qinghua Guo,
Yinhong Liu,
Yaming Li
Abstract:
Integrating device-to-device (D2D) communication into cellular networks can significantly reduce the transmission burden on base stations (BSs). Besides, integrated sensing and communication (ISAC) is envisioned as a key feature in future wireless networks. In this work, we consider a full-duplex ISAC-based D2D underlaid system, and propose a joint beamforming and power allocation scheme to improve the performance of the coexisting ISAC and D2D networks. To enhance spectral efficiency, a sum rate maximization problem is formulated for the full-duplex ISAC-based D2D underlaid system, which is non-convex. To solve the non-convex optimization problem, we propose a successive convex approximation (SCA)-based iterative algorithm and prove its convergence. Numerical results are provided to validate the effectiveness of the proposed scheme with the iterative algorithm, demonstrating that the proposed scheme outperforms state-of-the-art ones in both communication and sensing performance.
Submitted 21 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Towards a first principles light-front Hamiltonian for the nucleon
Authors:
Siqi Xu,
Yiping Liu,
Chandan Mondal,
Jiangshan Lan,
Xingbo Zhao,
Yang Li,
James P. Vary
Abstract:
We solve the nucleon's wave functions from the eigenstates of the light-front quantum chromodynamics Hamiltonian for the first time, using a fully relativistic and nonperturbative approach based on light-front quantization, without an explicit confining potential. These eigenstates are determined for the three-quark, three-quark-gluon, and three-quark-quark-antiquark Fock representations, making them suitable for low-resolution probes. From this, we calculate the nucleon's quark and gluon matter densities, helicity, and transversity distributions, which show qualitative consistency with experimental extractions. We also compute the contributions of quark and gluon helicity to the proton spin and the tensor charges. The obtained light-front wave functions represent a significant advancement towards a unified description of various hadron distribution functions in both longitudinal and transverse momentum space.
Submitted 20 August, 2024;
originally announced August 2024.
-
Microsatellite-based real-time quantum key distribution
Authors:
Yang Li,
Wen-Qi Cai,
Ji-Gang Ren,
Chao-Ze Wang,
Meng Yang,
Liang Zhang,
Hui-Ying Wu,
Liang Chang,
Jin-Cai Wu,
Biao Jin,
Hua-Jian Xue,
Xue-Jiao Li,
Hui Liu,
Guang-Wen Yu,
Xue-Ying Tao,
Ting Chen,
Chong-Fei Liu,
Wen-Bin Luo,
Jie Zhou,
Hai-Lin Yong,
Yu-Huai Li,
Feng-Zhi Li,
Cong Jiang,
Hao-Ze Chen,
Chao Wu
, et al. (16 additional authors not shown)
Abstract:
A quantum network provides an infrastructure connecting quantum devices with revolutionary computing, sensing, and communication capabilities. As the best-known application of a quantum network, quantum key distribution (QKD) shares secure keys guaranteed by the laws of quantum mechanics. A quantum satellite constellation offers a solution to facilitate the quantum network on a global scale. The Micius satellite has verified the feasibility of satellite quantum communications; however, scaling up quantum satellite constellations is challenging, requiring small lightweight satellites, portable ground stations and real-time secure key exchange. Here we tackle these challenges and report the development of a quantum microsatellite capable of performing space-to-ground QKD using portable ground stations. The quantum microsatellite features a payload weighing approximately 23 kg, while the portable ground station weighs about 100 kg. These weights represent reductions by more than an order and two orders of magnitude, respectively, compared to the Micius satellite. Additionally, we multiplex bidirectional satellite-ground optical communication with quantum communication, enabling key distillation and secure communication in real time. Using the microsatellite and the portable ground stations, we demonstrate satellite-based QKD with multiple ground stations and achieve the sharing of up to 0.59 million bits of secure keys during a single satellite pass. The compact quantum payload can be readily assembled on existing space stations or small satellites, paving the way for a satellite-constellation-based quantum and classical network for widespread real-life applications.
Submitted 20 August, 2024;
originally announced August 2024.
-
GRANDlib: A simulation pipeline for the Giant Radio Array for Neutrino Detection (GRAND)
Authors:
GRAND Collaboration,
Rafael Alves Batista,
Aurélien Benoit-Lévy,
Teresa Bister,
Martina Bohacova,
Mauricio Bustamante,
Washington Carvalho,
Yiren Chen,
LingMei Cheng,
Simon Chiche,
Jean-Marc Colley,
Pablo Correa,
Nicoleta Cucu Laurenciu,
Zigao Dai,
Rogerio M. de Almeida,
Beatriz de Errico,
Sijbrand de Jong,
João R. T. de Mello Neto,
Krijn D. de Vries,
Valentin Decoene,
Peter B. Denton,
Bohao Duan,
Kaikai Duan,
Ralph Engel,
William Erba
, et al. (90 additional authors not shown)
Abstract:
The operation of upcoming ultra-high-energy cosmic-ray, gamma-ray, and neutrino radio-detection experiments, like the Giant Radio Array for Neutrino Detection (GRAND), poses significant computational challenges involving the production of numerous simulations of particle showers and their detection, and a high data throughput. GRANDlib is an open-source software tool designed to meet these challenges. Its primary goal is to perform end-to-end simulations of the detector operation, from the interaction of ultra-high-energy particles, through -- by interfacing with external air-shower simulations -- the ensuing particle shower development and its radio emission, to its detection by antenna arrays and its processing by data-acquisition systems. Additionally, GRANDlib manages the visualization, storage, and retrieval of experimental and simulated data. We present an overview of GRANDlib to serve as the basis of future GRAND analyses.
Submitted 20 August, 2024;
originally announced August 2024.
-
Unveiling the jet angular broadening with $γ-$jet in high-energy nuclear collisions
Authors:
Sa Wang,
Yao Li,
Jin-Wen Kang,
Ben-Wei Zhang
Abstract:
Medium modification of jet substructure within the hot and dense nuclear matter has attracted enormous interest from the heavy-ion physics community in recent years. Measurements of inclusive jets show angular narrowing in nucleus-nucleus collisions, while the recent CMS results of the photon-tagged jets ($γ-$jet) indicate hints of broadening. In this work, we conduct a theoretical study on the angular structure of inclusive jet and $γ-$jet with a transport approach considering the jet energy loss and the medium response in the quark-gluon plasma. We calculate the girth modification of $γ-$jet in $0-30\%$ PbPb collisions at $\sqrt{s_{NN}}=$ 5.02 TeV, which shows a satisfactory agreement with the recent CMS measurement. We explore the connection between the selection bias and the jet kinematics when choosing different $x_{jγ}=p_T^{\rm jet}/p_T^γ$ thresholds. Importantly, we quantitatively demonstrate that $γ-$jet provides significant advantages in reducing the selection bias and can effectively collect jets sufficiently quenched in PbPb collisions compared to the inclusive jet, which is critical to capture the jet angular broadening observed by CMS. We further estimate the contributions of the medium-induced gluon radiation and the medium response to the broadening of the jet angular substructure.
Submitted 20 August, 2024;
originally announced August 2024.
-
ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
Authors:
Qi Ma,
Yue Li,
Bin Ren,
Nicu Sebe,
Ender Konukoglu,
Theo Gevers,
Luc Van Gool,
Danda Pani Paudel
Abstract:
3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU.
We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce Gaussian-MAE, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.
Submitted 20 August, 2024;
originally announced August 2024.
-
Revisiting the measurements and interpretations of DLVO forces
Authors:
Bo Feng,
Xiantang Liu,
Xinmin Liu,
Yingli Li,
Hang Li
Abstract:
The DLVO theory and electrical double layer (EDL) theory are the foundation of colloid and interface science. With the invention and development of surface forces apparatus (SFA) and atomic force microscope (AFM), the measurements and interpretations of DLVO forces (i.e., mainly measuring the EDL force (electrostatic force) FEDL and van der Waals force FvdW, and interpreting the potential ψ, charge density σ, and Hamaker constant H) can be greatly facilitated by various surface force measurement techniques, and would have been very promising in advancing the DLVO theory, EDL theory, and colloid and interface science. However, although numerous studies have been conducted, pervasive anomalous results can be identified throughout the literature, mainly including: (1) the fitted ψ/σ is normally extremely small (ψ can be close to or (much) smaller than ψζ (zeta potential)) and varies greatly; (2) the fitted ψ/σ can exceed the allowable range of calculation; and (3) the measured FvdW and the fitted H vary greatly. Based on rigorous and comprehensive arguments, we have reasonably explained the pervasive anomalous results in the literature and further speculated that the pervasive anomalous results exist but have gone unnoticed and unquestioned owing to two important aspects: (1) the pervasive unreasonable understandings of EDL theory and (2) the commonly neglected systematic errors. Consequently, we believe that the related studies have been seriously hampered. We therefore call for re-examination and re-analysis of related experimental results and theoretical understandings by careful consideration of the EDL theory and systematic errors. On these bases, we can interpret the experimental results properly and promote the development of EDL theory, colloid and interface science, and many related fields.
Submitted 20 August, 2024;
originally announced August 2024.
-
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Authors:
Xin Qi,
Ruibo Fu,
Zhengqi Wen,
Jianhua Tao,
Shuchen Shi,
Yi Lu,
Zhiyong Wang,
Xiaopeng Wang,
Yuankun Xie,
Yukun Liu,
Guanjun Li,
Xuefei Liu,
Yongwei Li
Abstract:
In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into the pre-introduced conditional parts of the speech models. This fixes the position of LoRA, limiting the flexibility and scalability of its application. Therefore, we propose the Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech (EELE) method. Starting from a general neutral speech model, we do not pre-introduce emotional information but instead use the LoRA plugin to design a flexible adaptive scheme that endows the model with emotional generation capabilities. Specifically, we initially train the model using only neutral speech data. After training is complete, we insert LoRA into different modules and fine-tune the model with emotional speech data to find the optimal insertion scheme. Through experiments, we compare and test the effects of inserting LoRA at different positions within the model and assess LoRA's ability to learn various emotions, effectively proving the validity of our method. Additionally, we explore the impact of the rank size of LoRA and the difference compared to directly fine-tuning the entire model.
Submitted 20 August, 2024;
originally announced August 2024.
-
A Noval Feature via Color Quantisation for Fake Audio Detection
Authors:
Zhiyong Wang,
Xiaopeng Wang,
Yuankun Xie,
Ruibo Fu,
Zhengqi Wen,
Jianhua Tao,
Yukun Liu,
Guanjun Li,
Xin Qi,
Yi Lu,
Xuefei Liu,
Yongwei Li
Abstract:
In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model distinguish fake audio. However, the disadvantage lies in poor interpretability, meaning it is hard to intuitively present the differences between deepfake and real audio. This paper proposes a novel feature extraction method via color quantisation which constrains the reconstruction to use a limited number of colors for the spectral image-like input. The proposed method ensures reconstructed input differs from the original, which allows for intuitive observation of the focus areas in the spectral reconstruction. Experiments conducted on the ASVspoof2019 dataset demonstrate that the proposed method achieves better classification performance compared to using the original spectral as input, and that pretraining the recolor network can also benefit fake audio detection.
Submitted 20 August, 2024;
originally announced August 2024.
-
Adversarial Attack for Explanation Robustness of Rationalization Models
Authors:
Yuankai Zhang,
Lingxiao Kong,
Haozhao Wang,
Ruixuan Li,
Jun Wang,
Yuhua Li,
Wei Liu
Abstract:
Rationalization models, which select a subset of input text as the rationale (crucial for humans to understand and trust predictions), have recently emerged as a prominent research area in eXplainable Artificial Intelligence. However, most previous studies focus on improving the quality of the rationale, ignoring its robustness to malicious attack. Specifically, whether the rationalization models can still generate high-quality rationale under the adversarial attack remains unknown. To explore this, this paper proposes UAT2E, which aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users. UAT2E employs the gradient-based search on triggers and then inserts them into the original input to conduct both the non-target and target attack. Experimental results on five datasets reveal the vulnerability of rationalization models in terms of explanation, where they tend to select more meaningless tokens under attacks. Based on this, we make a series of recommendations for improving rationalization models in terms of explanation.
Submitted 20 August, 2024;
originally announced August 2024.
-
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection
Authors:
Tri Cao,
Chengyu Huang,
Yuexin Li,
Huilin Wang,
Amy He,
Nay Oo,
Bryan Hooi
Abstract:
Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also encounter notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowledge bases with Multimodal Large Language Models (MLLMs). This combination leads to broader brand coverage, which enhances brand recognition and recall. Furthermore, we propose a multimodal information retrieval framework designed to extract the top k relevant items from offline knowledge bases, utilizing all available information from a webpage, including logos, HTML, and URLs. Our empirical results, based on three real-world datasets, demonstrate that the proposed framework significantly enhances detection accuracy and reduces both false positives and false negatives, while maintaining model efficiency. Additionally, PhishAgent shows strong resilience against various types of adversarial attacks.
Submitted 20 August, 2024;
originally announced August 2024.
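The retrieval step mentioned in the PhishAgent abstract above (extracting the top k relevant items from an offline knowledge base) can be illustrated with a minimal sketch. It assumes that webpage signals and knowledge-base entries have already been embedded as vectors and that cosine similarity is the ranking score. The function names `cosine` and `top_k_items`, the toy embeddings, and the brand labels are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_k_items(query, knowledge_base, k=3):
    """Return the k (label, score) pairs with the highest cosine similarity."""
    scored = [(label, cosine(query, vec)) for label, vec in knowledge_base.items()]
    # Sort by score, best match first, then keep the first k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy data: three brand embeddings and one webpage embedding.
knowledge_base = {
    "brand_a": [1.0, 0.0],
    "brand_b": [0.0, 1.0],
    "brand_c": [1.0, 1.0],
}
query = [0.9, 0.1]

results = top_k_items(query, knowledge_base, k=2)
print([label for label, _ in results])  # ['brand_a', 'brand_c']
```

In a multimodal setting, one such ranking could plausibly be run per signal (logo, HTML, URL) and the candidate lists merged, but how the paper combines them is not stated in the abstract.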
-
A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning
Authors:
Deyu Li,
Longfei Xiao,
Handi Wei,
Yan Li,
Binghua Zhang
Abstract:
The accurate measurement of the wave field and its spatiotemporal evolution is essential in many hydrodynamic experiments and engineering applications. The binocular stereo imaging technique has been widely used to measure waves. However, the optical properties of indoor water surfaces, including transparency, specular reflection, and texture absence, pose challenges for image processing and stereo reconstruction. This study proposed a novel technique that combined thermal stereography and deep learning to achieve fully noncontact wave measurements. The optical imaging properties of water in the long-wave infrared spectrum were found to be suitable for stereo matching, effectively avoiding the issues in the visible-light spectrum. After capturing wave images using thermal stereo cameras, a reconstruction strategy involving deep learning techniques was proposed to improve stereo matching performance. A generative approach was employed to synthesize a dataset with ground-truth disparity from unannotated infrared images. This dataset was then fed to a pretrained stereo neural network for fine-tuning to achieve domain adaptation. Wave flume experiments were conducted to validate the feasibility and accuracy of the proposed technique. The final reconstruction results indicated great agreement and high accuracy with a mean bias of less than 2.1% compared with the measurements obtained using wave probes, suggesting that the novel technique effectively measures the spatiotemporal distribution of wave surface in hydrodynamic experiments.
Submitted 20 August, 2024;
originally announced August 2024.
-
Learning Instruction-Guided Manipulation Affordance via Large Models for Embodied Robotic Tasks
Authors:
Dayou Li,
Chenkun Zhao,
Shuo Yang,
Lin Ma,
Yibin Li,
Wei Zhang
Abstract:
We study the task of language instruction-guided robotic manipulation, in which an embodied robot is supposed to manipulate the target objects based on the language instructions. In previous studies, the predicted manipulation regions of the target object typically do not change with the specification given in the language instructions, which means that language perception and manipulation prediction are separate. However, in human behavioral patterns, the manipulation regions of the same object change for different language instructions. In this paper, we propose Instruction-Guided Affordance Net (IGANet) for predicting affordance maps of instruction-guided robotic manipulation tasks by utilizing powerful priors from vision and language encoders pre-trained on large-scale datasets. We develop a Vision-Language-Model (VLM)-based data augmentation pipeline, which can generate a large amount of data automatically for model training. In addition, with the help of Large Language Models (LLMs), actions can be effectively executed to finish the tasks defined by instructions. A series of real-world experiments revealed that our method can achieve better performance with generated data. Moreover, our model can generalize better to scenarios with unseen objects and language instructions.
Submitted 20 August, 2024;
originally announced August 2024.
-
Cores and weights of multipartitions and blocks of Ariki-Koike algebras
Authors:
Yanbo Li,
Kai Meng Tan
Abstract:
Let $e$ be an integer at least two. We define the $e$-core and the $e$-weight of a multipartition associated with a multicharge as the $e$-core and the $e$-weight of its image under the Uglov map. We do not place any restriction on the multicharge for these definitions. We show how these definitions lead to the definition of the $e$-core and the $e$-weight of a block of an Ariki-Koike algebra with quantum parameter $e$, and an analogue of Nakayama's `Conjecture' that classifies these blocks. Our definition of $e$-weight of such a block coincides with that first defined by Fayers. We further generalise the notion of a $[w:k]$-pair for Iwahori-Hecke algebra of type $A$ to the Ariki-Koike algebras, and obtain a sufficient condition for such a pair to be Scopes equivalent.
Submitted 28 August, 2024; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Vision Calorimeter for Anti-neutron Reconstruction: A Baseline
Authors:
Hongtian Yu,
Yangu Li,
Mingrui Wu,
Letian Shen,
Yue Liu,
Yunxuan Song,
Qixiang Ye,
Xiaorui Lyu,
Yajun Mao,
Yangheng Zheng,
Yunfan Liu
Abstract:
In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this poses significant instrumental challenges for the electromagnetic calorimeter (EMC), a typical experimental sensor that recovers insufficient information about the incident $\bar{n}$. In this study, we introduce Vision Calorimeter (ViC), a baseline method for anti-neutron reconstruction that leverages deep learning detectors to analyze the implicit relationships between EMC responses and incident $\bar{n}$ characteristics. Our motivation is that the energy distributions of $\bar{n}$ samples deposited in the EMC cell arrays embody rich contextual information. Converted to 2-D images, such contextual energy distributions can be used to predict the status of $\bar{n}$ (i.e., incident position and momentum) through a deep learning detector along with pseudo bounding boxes and a specified training objective. Experimental results demonstrate that ViC substantially outperforms the conventional reconstruction approach, reducing the prediction error of incident position by 42.81% (from 17.31$^{\circ}$ to 9.90$^{\circ}$). More importantly, this study realizes the measurement of incident $\bar{n}$ momentum for the first time, underscoring the potential of deep learning detectors for particle reconstruction. Code is available at https://github.com/yuhongtian17/ViC.
Submitted 20 August, 2024;
originally announced August 2024.
-
Where to Fetch: Extracting Visual Scene Representation from Large Pre-Trained Models for Robotic Goal Navigation
Authors:
Yu Li,
Dayou Li,
Chenkun Zhao,
Ruifeng Wang,
Ran Song,
Wei Zhang
Abstract:
To complete a complex task where a robot navigates to a goal object and fetches it, the robot needs to have a good understanding of the instructions and the surrounding environment. Large pre-trained models have shown capabilities to interpret tasks defined via language descriptions. However, previous methods attempting to integrate large pre-trained models with daily tasks fall short in many robotic goal navigation tasks due to poor understanding of the environment. In this work, we present a visual scene representation built with large-scale visual language models to form a feature representation of the environment capable of handling natural language queries. Combined with large language models, this method can parse language instructions into action sequences for a robot to follow, and accomplish goal navigation by querying the scene representation. Experiments demonstrate that our method enables the robot to follow a wide range of instructions and complete complex goal navigation tasks.
Submitted 20 August, 2024;
originally announced August 2024.
-
Generative Diffusion Models for High Dimensional Channel Estimation
Authors:
Xingyu Zhou,
Le Liang,
Jing Zhang,
Peiwen Jiang,
Yong Li,
Shi Jin
Abstract:
Along with the prosperity of generative artificial intelligence (AI), its potential for solving conventional challenges in wireless communications has also surfaced. Inspired by this trend, we investigate the application of the advanced diffusion models (DMs), a representative class of generative AI models, to high dimensional wireless channel estimation. By capturing the structure of multiple-input multiple-output (MIMO) wireless channels via a deep generative prior encoded by DMs, we develop a novel posterior inference method for channel reconstruction. We further adapt the proposed method to recover channel information from low-resolution quantized measurements. Additionally, to enhance the over-the-air viability, we integrate the DM with the unsupervised Stein's unbiased risk estimator to enable learning from noisy observations and circumvent the requirements for ground truth channel data that is hardly available in practice. Results reveal that the proposed estimator achieves high-fidelity channel recovery while reducing estimation latency by a factor of 10 compared to state-of-the-art schemes, facilitating real-time implementation. Moreover, our method outperforms existing estimators while reducing the pilot overhead by half, showcasing its scalability to ultra-massive antenna arrays.
Submitted 19 August, 2024;
originally announced August 2024.
-
Interplay of Quantum Resources in Nonlocality Tests
Authors:
Hai-Hao Dong,
Yuwei Zhu,
Su-Yi Cheng,
Xingjian Zhang,
Cheng-Long Li,
Ying-Zhao Li,
Hao Li,
Lixing You,
Xiongfeng Ma,
Qiang Zhang,
Jian-Wei Pan
Abstract:
Nonlocality, evidenced by the violation of Bell inequalities, not only signifies entanglement but also highlights measurement incompatibility in quantum systems. Utilizing the generalized Clauser-Horne-Shimony-Holt (CHSH) Bell inequality, our high-efficiency optical setup achieves a loophole-free violation of $2.0132$. This result provides a device-independent lower bound on entanglement, quantified as the entanglement of formation at $0.0159$. Moreover, by tuning the parameters of the generalized Bell inequality, we enhance the estimation of measurement incompatibility, which is quantified by an effective overlap of $4.3883 \times 10^{-5}$. To explore the intricate interplay among nonlocality, entanglement, and measurement incompatibility, we generate mixed states, allowing for flexible modulation of entanglement via fast switching among the four Bell states using Pockels cells, achieving a fidelity above $99.10\%$. Intriguingly, our results reveal a counterintuitive relationship where increasing incompatibility initially boosts nonlocality but eventually leads to its reduction. Typically, maximal nonlocality does not coincide with maximal incompatibility. This experimental study sheds light on the optimal management of quantum resources for Bell-inequality-based quantum information processing.
Submitted 19 August, 2024;
originally announced August 2024.
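As background for the nonlocality entry above, the sketch below evaluates the standard (not the generalized) CHSH combination. It shows that deterministic local strategies never exceed 2, while a maximally entangled state reaches $2\sqrt{2}$. The textbook singlet correlation $E(a,b) = -\cos(a-b)$ and the function names are assumptions made for illustration; the generalized inequality and the optical setup used in the paper are not reproduced here.

```python
import math
from itertools import product

def correlation(a, b):
    """Textbook singlet-state correlation for analyser angles a and b (radians)."""
    return -math.cos(a - b)

def chsh_value(a, a_prime, b, b_prime):
    """Standard CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return (correlation(a, b) - correlation(a, b_prime)
            + correlation(a_prime, b) + correlation(a_prime, b_prime))

# Local bound: enumerate all deterministic assignments of outcomes +1/-1.
local_max = max(
    abs(A * B - A * Bp + Ap * B + Ap * Bp)
    for A, Ap, B, Bp in product([-1, 1], repeat=4)
)

# Angles that maximise |S| for the singlet state.
s = chsh_value(0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)

print(local_max)  # 2
print(abs(s) > local_max)  # True
```

Any measured value above 2, such as the 2.0132 quoted in the abstract, therefore cannot be explained by a local deterministic model of this form.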
-
Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model
Authors:
Yu Dian Lim,
Hong Yu Li,
Simon Chun Kiat Goh,
Xiangyu Wang,
Peng Zhao,
Chuan Seng Tan
Abstract:
Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits in the ion trap quantum computing community. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transformer models to recognize the corresponding height categories of beam profiles of light from SiPh gratings. The model is trained using two techniques: (1) input patches, and (2) input sequence. The model trained with input patches achieved a recognition accuracy of 0.938, while the model trained with input sequence showed a lower accuracy of 0.895. However, when the model training was repeated for 150 cycles, the model trained with input patches showed inconsistent accuracy, ranging from 0.445 to 0.959, while the model trained with input sequence exhibited higher accuracy values between 0.789 and 0.936. The obtained outcomes can be expanded to various applications, including auto-focusing of light beams and auto-adjustment of the z-axis stage to acquire desired beam profiles.
Submitted 22 August, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Authors:
Aviv Bick,
Kevin Y. Li,
Eric P. Xing,
J. Zico Kolter,
Albert Gu
Abstract:
Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise, but have been pretrained with substantially fewer computational resources than the strongest Transformer models. In this work, we present a method that is able to distill a pretrained Transformer architecture into alternative architectures such as state space models (SSMs). The key idea of our approach is that we can view both Transformers and SSMs as applying different forms of mixing matrices over the token sequences. We can thus progressively distill the Transformer architecture by matching different degrees of granularity in the SSM: first matching the mixing matrices themselves, then the hidden units at each block, and finally the end-to-end predictions. Our method, called MOHAWK, is able to distill a Mamba-2 variant based on the Phi-1.5 architecture (Phi-Mamba) using only 3B tokens and a hybrid version (Hybrid Phi-Mamba) using 5B tokens. Despite using less than 1% of the training data typically used to train models from scratch, Phi-Mamba boasts substantially stronger performance compared to all past open-source non-Transformer models. MOHAWK allows models like SSMs to leverage computational resources invested in training Transformer-based architectures, highlighting a new avenue for building such models.
Submitted 19 August, 2024;
originally announced August 2024.
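The "mixing matrix" view in the MOHAWK abstract above can be made concrete with a small sketch. It builds a causal softmax attention matrix and the matrix form of a scalar linear recurrence, then measures their Frobenius distance, which is the kind of quantity a matrix-matching stage could minimise. Everything here is an illustrative assumption: the time-invariant scalar recurrence is far simpler than the Mamba-2 parameterisation, and the function names and the choice of Frobenius distance are not taken from the paper.

```python
import math

def causal_attention_matrix(scores):
    """Row-wise softmax over the lower triangle of a square score matrix."""
    n = len(scores)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Token i may only attend to positions 0..i.
        exps = [math.exp(scores[i][j]) for j in range(i + 1)]
        total = sum(exps)
        for j in range(i + 1):
            matrix[i][j] = exps[j] / total
    return matrix

def scalar_ssm_matrix(n, a, b, c):
    """Matrix form of h_t = a*h_{t-1} + b*x_t, y_t = c*h_t: M[t][j] = c * a**(t-j) * b."""
    return [[c * (a ** (i - j)) * b if j <= i else 0.0 for j in range(n)]
            for i in range(n)]

def frobenius_distance(m1, m2):
    """Square root of the sum of squared element-wise differences."""
    return math.sqrt(sum((x - y) ** 2
                         for r1, r2 in zip(m1, m2)
                         for x, y in zip(r1, r2)))

# Hypothetical toy setting: 3 tokens, all-zero attention scores.
attention = causal_attention_matrix([[0.0] * 3 for _ in range(3)])
ssm = scalar_ssm_matrix(3, a=0.5, b=1.0, c=1.0)
print(frobenius_distance(attention, ssm))
```

Both matrices are lower triangular and map an input sequence to an output sequence by matrix multiplication, which is what makes a direct comparison between the two architectures possible.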
-
LoopSplat: Loop Closure by Registering 3D Gaussian Splats
Authors:
Liyuan Zhu,
Yue Li,
Erik Sandström,
Shengyu Huang,
Konrad Schindler,
Iro Armeni
Abstract:
Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. Code is available at loopsplat.github.io.
Submitted 19 August, 2024; v1 submitted 19 August, 2024;
originally announced August 2024.
-
Finite dimensional 2-cyclic Jacobian algebras
Authors:
Yiyu Li,
Liangang Peng
Abstract:
In this paper, we start with a class of quivers containing only 2-cycles and loops, referred to as 2-cyclic quivers. We prove that there exists a potential on these quivers that ensures the resulting quiver with potential is Jacobian-finite. As an application, we first demonstrate through covering theory that a Jacobian-finite potential exists on a class of 2-acyclic quivers. Secondly, by using the 2-cyclic Caldero-Chapoton formula defined in Section 4.2, the $τ$-rigid modules obtained from the Jacobian algebras of our proven Jacobian-finite 2-cyclic quiver with potential can categorify Paquette-Schiffler's generalized cluster algebras in three specific cases: one for a disk with two marked points and one 3-puncture, one for a sphere with one puncture, one 3-puncture and one orbifold point, and another for a sphere with one puncture and two 3-punctures.
Submitted 19 August, 2024;
originally announced August 2024.
-
P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders
Authors:
Xuechao Chen,
Ying Chen,
Jialin Li,
Qiang Nie,
Yong Liu,
Qixing Huang,
Yang Li
Abstract:
3D pre-training is crucial to 3D perception tasks. However, limited by the difficulties in collecting clean 3D data, 3D pre-training has consistently faced data scaling challenges. Inspired by semi-supervised learning, which leverages limited labeled data and a large amount of unlabeled data, in this work we propose a novel self-supervised pre-training framework utilizing real 3D data and pseudo-3D data lifted from images by a large depth estimation model. Another challenge lies in efficiency. Previous methods such as Point-BERT and Point-MAE employ k nearest neighbors to embed 3D tokens, requiring quadratic time complexity. To efficiently pre-train on such a large amount of data, we propose a linear-time-complexity token embedding strategy and a training-efficient 2D reconstruction target. Our method achieves state-of-the-art performance in 3D classification and few-shot learning while maintaining high pre-training and downstream fine-tuning efficiency.
Submitted 19 August, 2024;
originally announced August 2024.
-
Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype
Authors:
Yadong Lu,
Shitian Zhao,
Boxiang Yun,
Dongsheng Jiang,
Yin Li,
Qingli Li,
Yan Wang
Abstract:
Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLM), failing to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to the Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL: forgetting old knowledge and maintaining zero-shot capabilities, as well as the confusions caused by category-relatedness between domains. In this paper, we propose a simple yet effective solution: leveraging intra-domain category-aware prototypes for ODCL in CLIP (DPeCLIP), where the prototype is the key to bridging the above two processes. Concretely, we propose a training-free Task-ID discriminator method, by utilizing prototypes as classifiers for identifying Task-IDs. Furthermore, to maintain the knowledge corresponding to each domain, we incorporate intra-domain category-aware prototypes as domain prior prompts into the training process. Extensive experiments conducted on 11 different datasets demonstrate the effectiveness of our approach, achieving 2.37% and 1.14% average improvement in class-incremental and task-incremental settings, respectively.
Submitted 19 August, 2024;
originally announced August 2024.
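The training-free Task-ID discriminator described in the ODCL abstract above (prototypes used as classifiers) can be sketched as a nearest-prototype rule. The sketch assumes that each prototype is the mean feature vector of one category within one domain and that cosine similarity is the matching score. The function names, the nested-dictionary layout, and the toy 2-D features are hypothetical; in the paper the features would come from CLIP.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def build_prototypes(features_by_task):
    """Map {task_id: {category: [features]}} to {(task_id, category): prototype}."""
    return {(task, cat): mean_vector(vecs)
            for task, cats in features_by_task.items()
            for cat, vecs in cats.items()}

def predict_task_id(feature, prototypes):
    """Return the task ID of the prototype most similar to the feature."""
    best_key = max(prototypes, key=lambda key: cosine(feature, prototypes[key]))
    return best_key[0]

# Hypothetical toy features for two domains (tasks 0 and 1).
features_by_task = {
    0: {"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.8, 0.3]]},
    1: {"car": [[0.0, 1.0], [0.1, 0.9]]},
}
prototypes = build_prototypes(features_by_task)

print(predict_task_id([0.2, 0.8], prototypes))  # 1
print(predict_task_id([1.0, 0.2], prototypes))  # 0
```

Because the prototypes are plain averages, no gradient updates are needed to add a new domain, which is the sense in which such a discriminator is training-free.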
-
Privacy Technologies for Financial Intelligence
Authors:
Yang Li,
Thilina Ranbaduge,
Kee Siong Ng
Abstract:
Financial crimes like terrorism financing and money laundering can have real impacts on society, including the abuse and mismanagement of public funds, increase in societal problems such as drug trafficking and illicit gambling with attendant economic costs, and loss of innocent lives in the case of terrorism activities. Complex financial crimes can be hard to detect primarily because data related to different pieces of the overall puzzle is usually distributed across a network of financial institutions, regulators, and law-enforcement agencies and they cannot be easily shared due to privacy constraints. Recent advances in Privacy-Preserving Data Matching and Machine Learning provide an opportunity for regulators and the financial industry to come together to solve the risk-discovery problem with technology. This paper provides a survey of the financial intelligence landscape and where opportunities lie for privacy technologies to improve the state-of-the-art in financial-crime detection.
Submitted 19 August, 2024;
originally announced August 2024.
-
Enhancing quantum phase synchronization through squeezed-reservoir engineering
Authors:
Xing Xiao,
Tian-Xiang Lu,
Wo-Jun Zhong,
Yan-Ling Li
Abstract:
We investigate the enhancement of quantum phase synchronization in a two-level system (TLS) coupled to a squeezed reservoir. Our study reveals that the squeezed reservoir induces a stable limit cycle in the TLS, enhancing the quantum phase synchronization. We utilize the Husimi $Q$-function to describe the phase portrait of the driven TLS, and the $S$-function to quantitatively illustrate the effects of signal strength and detuning on phase synchronization. Remarkably, we demonstrate that the squeezed reservoir imparts its squeezing characteristics to the TLS, leading to a more localized and pronounced synchronization. Additionally, we observe typical features of the Arnold tongue in the synchronization regions. The experimental feasibility of our findings is discussed in the context of a circuit QED system, suggesting that squeezed-reservoir engineering is an effective approach for achieving quantum phase synchronization.
Submitted 19 August, 2024;
originally announced August 2024.
-
Predicting Long-term Dynamics of Complex Networks via Identifying Skeleton in Hyperbolic Space
Authors:
Ruikun Li,
Huandong Wang,
Jinghua Piao,
Qingmin Liao,
Yong Li
Abstract:
Learning complex network dynamics is fundamental for understanding, modeling, and controlling real-world complex systems. Though great efforts have been made to predict the future states of nodes on networks, the capability of capturing long-term dynamics remains largely limited. This is because existing methods overlook the fact that long-term dynamics in complex networks are predominantly governed by their inherent low-dimensional manifolds, i.e., skeletons. Therefore, we propose the Dynamics-Invariant Skeleton Neural Network (DiskNet), which identifies skeletons of complex networks based on the renormalization group structure in hyperbolic space to preserve both topological and dynamical properties. Specifically, we first condense complex networks with various dynamics into simple skeletons through physics-informed hyperbolic embeddings. Further, we design graph neural ordinary differential equations to capture the condensed dynamics on the skeletons. Finally, we recover the skeleton networks and dynamics to the original ones using a degree-based super-resolution module. Extensive experiments across three representative dynamics as well as five real-world and two synthetic networks demonstrate the superior performance of the proposed DiskNet, which outperforms the state-of-the-art baselines by an average of 10.18% in terms of long-term prediction accuracy. Code for reproduction is available at: https://github.com/tsinghua-fib-lab/DiskNet.
Submitted 19 August, 2024;
originally announced August 2024.
-
TDNetGen: Empowering Complex Network Resilience Prediction with Generative Augmentation of Topology and Dynamics
Authors:
Chang Liu,
Jingtao Ding,
Yiwen Song,
Yong Li
Abstract:
Predicting the resilience of complex networks, which represents the ability to retain fundamental functionality amidst external perturbations or internal failures, plays a critical role in understanding and improving real-world complex systems. Traditional theoretical approaches grounded in nonlinear dynamical systems rely on prior knowledge of network dynamics. On the other hand, data-driven approaches frequently encounter the challenge of insufficient labeled data, a predicament commonly observed in real-world scenarios. In this paper, we introduce a novel resilience prediction framework for complex networks, designed to tackle this issue through generative data augmentation of network topology and dynamics. The core idea is the strategic utilization of the inherent joint distribution present in unlabeled network data, facilitating the learning process of the resilience predictor by illuminating the relationship between network topology and dynamics. Experimental results on three network datasets demonstrate that our proposed framework, TDNetGen, can achieve prediction accuracy of up to 85%-95%. Furthermore, the framework still demonstrates a pronounced augmentation capability in extreme low-data regimes, underscoring its utility and robustness in enhancing the prediction of network resilience. Our code is open-sourced at https://github.com/tsinghua-fib-lab/TDNetGen.
Submitted 19 August, 2024;
originally announced August 2024.