-
Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift
Authors:
Qingyuan Zeng,
Yunpeng Gong,
Min Jiang
Abstract:
Studying adversarial attacks on artificial intelligence (AI) systems helps discover model shortcomings, enabling the construction of a more robust system. Most existing adversarial attack methods only concentrate on single-task single-model or single-task cross-model scenarios, overlooking the multi-task characteristic of artificial intelligence systems. As a result, most of the existing attacks d…
▽ More
Studying adversarial attacks on artificial intelligence (AI) systems helps discover model shortcomings, enabling the construction of a more robust system. Most existing adversarial attack methods only concentrate on single-task single-model or single-task cross-model scenarios, overlooking the multi-task characteristic of artificial intelligence systems. As a result, most of the existing attacks do not pose a practical threat to a comprehensive and collaborative AI system. However, implementing cross-task attacks is highly demanding and challenging due to the difficulty in obtaining the real labels of different tasks for the same picture and harmonizing the loss functions across different tasks. To address this issue, we propose a self-supervised Cross-Task Attack framework (CTA), which utilizes co-attention and anti-attention maps to generate cross-task adversarial perturbation. Specifically, the co-attention map reflects the area to which different visual task models pay attention, while the anti-attention map reflects the area that different visual task models neglect. CTA generates cross-task perturbations by shifting the attention area of samples away from the co-attention map and closer to the anti-attention map. We conduct extensive experiments on multiple vision tasks and the experimental results confirm the effectiveness of the proposed design for adversarial attacks.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models
Authors:
Qingcheng Zeng,
Mingyu Jin,
Qinkai Yu,
Zhenting Wang,
Wenyue Hua,
Zihao Zhou,
Guangyan Sun,
Yanda Meng,
Shiqing Ma,
Qifan Wang,
Felix Juefei-Xu,
Kaize Ding,
Fan Yang,
Ruixiang Tang,
Yongfeng Zhang
Abstract:
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates…
▽ More
Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates the fragility of uncertainty estimation and explores potential attacks. We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output. Specifically, the proposed backdoor attack method can alter an LLM's output probability distribution, causing the probability distribution to converge towards an attacker-predefined distribution while ensuring that the top-1 prediction remains unchanged. Our experimental results demonstrate that this attack effectively undermines the model's self-evaluation reliability in multiple-choice questions. For instance, we achieved a 100 attack success rate (ASR) across three different triggering strategies in four models. Further, we investigate whether this manipulation generalizes across different prompts and domains. This work highlights a significant threat to the reliability of LLMs and underscores the need for future defenses against such attacks. The code is available at https://github.com/qcznlp/uncertainty_attack.
△ Less
Submitted 16 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Graph-Guided Test-Time Adaptation for Glaucoma Diagnosis using Fundus Photography
Authors:
Qian Zeng,
Le Zhang,
Yipeng Liu,
Ce Zhu,
Fan Zhang
Abstract:
Glaucoma is a leading cause of irreversible blindness worldwide. While deep learning approaches using fundus images have largely improved early diagnosis of glaucoma, variations in images from different devices and locations (known as domain shifts) challenge the use of pre-trained models in real-world settings. To address this, we propose a novel Graph-guided Test-Time Adaptation (GTTA) framework…
▽ More
Glaucoma is a leading cause of irreversible blindness worldwide. While deep learning approaches using fundus images have largely improved early diagnosis of glaucoma, variations in images from different devices and locations (known as domain shifts) challenge the use of pre-trained models in real-world settings. To address this, we propose a novel Graph-guided Test-Time Adaptation (GTTA) framework to generalize glaucoma diagnosis models to unseen test environments. GTTA integrates the topological information of fundus images into the model training, enhancing the model's transferability and reducing the risk of learning spurious correlation. During inference, GTTA introduces a novel test-time training objective to make the source-trained classifier progressively adapt to target patterns with reliable class conditional estimation and consistency regularization. Experiments on cross-domain glaucoma diagnosis benchmarks demonstrate the superiority of the overall framework and individual components under different backbone networks.
△ Less
Submitted 9 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Authors:
Yunheng Li,
ZhongYu Li,
Quansheng Zeng,
Qibin Hou,
Ming-Ming Cheng
Abstract:
Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily utilize visual features from the last layer to align with text embeddings, while they neglect the crucial information in intermediate layers that contain rich object details. However, we find that directly aggregating the multi-level visual fea…
▽ More
Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily utilize visual features from the last layer to align with text embeddings, while they neglect the crucial information in intermediate layers that contain rich object details. However, we find that directly aggregating the multi-level visual features weakens the zero-shot ability for novel classes. The large differences between the visual features from different layers make these features hard to align well with the text embeddings. We resolve this problem by introducing a series of independent decoders to align the multi-level visual features with the text embeddings in a cascaded way, forming a novel but simple framework named Cascade-CLIP. Our Cascade-CLIP is flexible and can be easily applied to existing zero-shot semantic segmentation methods. Experimental results show that our simple Cascade-CLIP achieves superior zero-shot performance on segmentation benchmarks, like COCO-Stuff, Pascal-VOC, and Pascal-Context. Our code is available at: https://github.com/HVision-NKU/Cascade-CLIP
△ Less
Submitted 6 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Large Language Models Can Self-Correct with Minimal Effort
Authors:
Zhenyu Wu,
Qingkai Zeng,
Zhihan Zhang,
Zhaoxuan Tan,
Chao Shen,
Meng Jiang
Abstract:
Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current…
▽ More
Intrinsic self-correct was a method that instructed large language models (LLMs) to verify and correct their responses without external feedback. Unfortunately, the study concluded that the LLMs could not self-correct reasoning yet. We find that a simple yet effective verification method can unleash inherent capabilities of the LLMs. That is to mask a key condition in the question, add the current response to construct a verification question, and predict the condition to verify the response. The condition can be an entity in an open-domain question or a numeric value in a math question, which requires minimal effort (via prompting) to identify. We propose an iterative verify-then-correct framework to progressively identify and correct (probably) false responses, named ProCo. We conduct experiments on three reasoning tasks. On average, ProCo, with GPT-3.5-Turbo as the backend LLM, yields $+6.8$ exact match on four open-domain question answering datasets, $+14.1$ accuracy on three arithmetic reasoning datasets, and $+9.6$ accuracy on a commonsense reasoning dataset, compared to Self-Correct.
△ Less
Submitted 23 June, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies
Authors:
Amine Marzouki,
Zhuxian Guo,
Qinghe Zeng,
Camille Kurtz,
Nicolas Loménie
Abstract:
Efficient and precise quantification of lymphocytes in histopathology slides is imperative for the characterization of the tumor microenvironment and immunotherapy response insights. We developed a data-centric optimization pipeline that attain great lymphocyte detection performance using an off-the-shelf YOLOv5 model, without any architectural modifications. Our contribution that rely on strategi…
▽ More
Efficient and precise quantification of lymphocytes in histopathology slides is imperative for the characterization of the tumor microenvironment and immunotherapy response insights. We developed a data-centric optimization pipeline that attain great lymphocyte detection performance using an off-the-shelf YOLOv5 model, without any architectural modifications. Our contribution that rely on strategic dataset augmentation strategies, includes novel biological upsampling and custom visual cohesion transformations tailored to the unique properties of tissue imagery, and enables to dramatically improve model performances. Our optimization reveals a pivotal realization: given intensive customization, standard computational pathology models can achieve high-capability biomarker development, without increasing the architectural complexity. We showcase the interest of this approach in the context of breast cancer where our strategies lead to good lymphocyte detection performances, echoing a broadly impactful paradigm shift. Furthermore, our data curation techniques enable crucial histological analysis benchmarks, highlighting improved generalizable potential.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
GAD-Generative Learning for HD Map-Free Autonomous Driving
Authors:
Weijian Sun,
Yanbo Jia,
Qi Zeng,
Zihao Liu,
Jiang Liao,
Yue Li,
Xianfeng Li
Abstract:
Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic progra…
▽ More
Deep-learning-based techniques have been widely adopted for autonomous driving software stacks for mass production in recent years, focusing primarily on perception modules, with some work extending this method to prediction modules. However, the downstream planning and control modules are still designed with hefty handcrafted rules, dominated by optimization-based methods such as quadratic programming or model predictive control. This results in a performance bottleneck for autonomous driving systems in that corner cases simply cannot be solved by enumerating hand-crafted rules. We present a deep-learning-based approach that brings prediction, decision, and planning modules together with the attempt to overcome the rule-based methods' deficiency in real-world applications of autonomous driving, especially for urban scenes. The DNN model we proposed is solely trained with 10 hours of human driver data, and it supports all mass-production ADAS features available on the market to date. This method is deployed onto a Jiyue test car with no modification to its factory-ready sensor set and compute platform. the feasibility, usability, and commercial potential are demonstrated in this article.
△ Less
Submitted 31 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Multi-View Subgraph Neural Networks: Self-Supervised Learning with Scarce Labeled Data
Authors:
Zhenzhong Wang,
Qingyuan Zeng,
Wanyu Lin,
Min Jiang,
Kay Chen Tan
Abstract:
While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not p…
▽ More
While graph neural networks (GNNs) have become the de-facto standard for graph-based node classification, they impose a strong assumption on the availability of sufficient labeled samples. This assumption restricts the classification performance of prevailing GNNs on many real-world applications suffering from low-data regimes. Specifically, features extracted from scarce labeled nodes could not provide sufficient supervision for the unlabeled samples, leading to severe over-fitting. In this work, we point out that leveraging subgraphs to capture long-range dependencies can augment the representation of a node with homophily properties, thus alleviating the low-data regime. However, prior works leveraging subgraphs fail to capture the long-range dependencies among nodes. To this end, we present a novel self-supervised learning framework, called multi-view subgraph neural networks (Muse), for handling long-range dependencies. In particular, we propose an information theory-based identification mechanism to identify two types of subgraphs from the views of input space and latent space, respectively. The former is to capture the local structure of the graph, while the latter captures the long-range dependencies among nodes. By fusing these two views of subgraphs, the learned representations can preserve the topological properties of the graph at large, including the local structure and long-range dependencies, thus maximizing their expressiveness for downstream node classification tasks. Experimental results show that Muse outperforms the alternative methods on node classification tasks with limited labeled data.
△ Less
Submitted 18 April, 2024;
originally announced April 2024.
-
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Authors:
Mingyu Jin,
Qinkai Yu,
Jingyuan Huang,
Qingcheng Zeng,
Zhenting Wang,
Wenyue Hua,
Haiyan Zhao,
Kai Mei,
Yanda Meng,
Kaize Ding,
Fan Yang,
Mengnan Du,
Yongfeng Zhang
Abstract:
Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are ty…
▽ More
Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the hypothesis that LLMs process concepts of varying complexities in different layers, introducing the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers. Specifically, we categorize concepts based on their level of abstraction, defining them in the order of increasing complexity within factual, emotional, and inferential tasks. We conduct extensive probing experiments using layer-wise representations across various LLM families (Gemma, LLaMA, QWen) on various datasets spanning the three domains of tasks. Our findings reveal that models could efficiently conduct probing for simpler tasks in shallow layers, and more complex tasks typically necessitate deeper layers for accurate understanding. Additionally, we examine how external factors, such as adding noise to the input and quantizing the model weights, might affect layer-wise representations. Our findings suggest that these factors can impede the development of a conceptual understanding of LLMs until deeper layers are explored. We hope that our proposed concept and experimental insights will enhance the understanding of the mechanisms underlying LLMs. Our codes are available at https://github.com/Luckfort/CD.
△ Less
Submitted 30 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Towards Atomic MIMO Receivers
Authors:
Mingyao Cui,
Qunsong Zeng,
Kaibin Huang
Abstract:
The advancement of Rydberg atoms in quantum sensing is driving a paradigm shift from classical receivers to atomic receivers. Capitalizing on the extreme sensitivity of Rydberg atoms to external disturbance, atomic receivers can measure radio-waves more precisely than classical receivers to support high-performance wireless communication and sensing. Although the atomic receiver is developing rapi…
▽ More
The advancement of Rydberg atoms in quantum sensing is driving a paradigm shift from classical receivers to atomic receivers. Capitalizing on the extreme sensitivity of Rydberg atoms to external disturbance, atomic receivers can measure radio-waves more precisely than classical receivers to support high-performance wireless communication and sensing. Although the atomic receiver is developing rapidly in quantum-sensing domain, its integration with wireless communications is still at a nascent stage. Particularly, systematic methods to enhance communication performance through this integration are largely uncharted. Motivated by this observation, we propose to incorporate atomic receivers into multiple-input-multiple-output (MIMO) communication to implement atomic-MIMO receivers. Specifically, we establish the framework of atomic-MIMO receivers exploiting the principle of quantum sensing, and reveal that its signal detection is intrinsically a non-linear biased phase-retrieval (PR) problem, as opposed to the linear model in classical MIMO systems. To this end, we modify the Gerchberg-Saxton (GS) algorithm, a typical PR solver, with a biased GS algorithm to solve the discovered biased PR problem. Moreover, we propose an Expectation-Maximization-GS (EM-GS) algorithm by introducing a high-pass filter constructed by Bessel functions into the iteration of GS, which improves the detection accuracy efficiently. Finally, the effectiveness of atomic MIMO receivers is demonstrated by theoretical analysis and numerical simulation.
△ Less
Submitted 7 April, 2024;
originally announced April 2024.
-
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques
Authors:
Rui Yang,
Haoran Liu,
Edison Marrese-Taylor,
Qingcheng Zeng,
Yu He Ke,
Wanxin Li,
Lechao Cheng,
Qingyu Chen,
James Caverlee,
Yutaka Matsuo,
Irene Li
Abstract:
Large language models (LLMs) have demonstrated impressive generative capabilities with the potential to innovate in medicine. However, the application of LLMs in real clinical settings remains challenging due to the lack of factual consistency in the generated content. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) along with ranking an…
▽ More
Large language models (LLMs) have demonstrated impressive generative capabilities with the potential to innovate in medicine. However, the application of LLMs in real clinical settings remains challenging due to the lack of factual consistency in the generated content. In this work, we develop an augmented LLM framework, KG-Rank, which leverages a medical knowledge graph (KG) along with ranking and re-ranking techniques, to improve the factuality of long-form question answering (QA) in the medical domain. Specifically, when receiving a question, KG-Rank automatically identifies medical entities within the question and retrieves the related triples from the medical KG to gather factual information. Subsequently, KG-Rank innovatively applies multiple ranking techniques to refine the ordering of these triples, providing more relevant and precise information for LLM inference. To the best of our knowledge, KG-Rank is the first application of KG combined with ranking models in medical QA specifically for generating long answers. Evaluation on four selected medical QA datasets demonstrates that KG-Rank achieves an improvement of over 18% in ROUGE-L score. Additionally, we extend KG-Rank to open domains, including law, business, music, and history, where it realizes a 14% improvement in ROUGE-L score, indicating the effectiveness and great potential of KG-Rank.
△ Less
Submitted 4 July, 2024; v1 submitted 9 March, 2024;
originally announced March 2024.
-
Consistent and Asymptotically Statistically-Efficient Solution to Camera Motion Estimation
Authors:
Guangyang Zeng,
Qingcheng Zeng,
Xinghan Li,
Biqiang Mu,
Jiming Chen,
Ling Shi,
Junfeng Wu
Abstract:
Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental issue in the computer vision community. The existing works generally set out from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we dive into the original measurement model with respect to the rotation matrix and no…
▽ More
Given 2D point correspondences between an image pair, inferring the camera motion is a fundamental issue in the computer vision community. The existing works generally set out from the epipolar constraint and estimate the essential matrix, which is not optimal in the maximum likelihood (ML) sense. In this paper, we dive into the original measurement model with respect to the rotation matrix and normalized translation vector and formulate the ML problem. We then propose a two-step algorithm to solve it: In the first step, we estimate the variance of measurement noises and devise a consistent estimator based on bias elimination; In the second step, we execute a one-step Gauss-Newton iteration on manifold to refine the consistent estimate. We prove that the proposed estimate owns the same asymptotic statistical properties as the ML estimate: The first is consistency, i.e., the estimate converges to the ground truth as the point number increases; The second is asymptotic efficiency, i.e., the mean squared error of the estimate converges to the theoretical lower bound -- Cramer-Rao bound. In addition, we show that our algorithm has linear time complexity. These appealing characteristics endow our estimator with a great advantage in the case of dense point correspondences. Experiments on both synthetic data and real images demonstrate that when the point number reaches the order of hundreds, our estimator outperforms the state-of-the-art ones in terms of estimation accuracy and CPU time.
△ Less
Submitted 2 March, 2024;
originally announced March 2024.
-
Anatomy-guided fiber trajectory distribution estimation for cranial nerves tractography
Authors:
Lei Xie,
Qingrun Zeng,
Huajun Zhou,
Guoqiang Xie,
Mingchu Li,
Jiahao Huang,
Jianan Cui,
Hao Chen,
Yuanjing Feng
Abstract:
Diffusion MRI tractography is an important tool for identifying and analyzing the intracranial course of cranial nerves (CNs). However, the complex environment of the skull base leads to ambiguous spatial correspondence between diffusion directions and fiber geometry, and existing diffusion tractography methods of CNs identification are prone to producing erroneous trajectories and missing true po…
▽ More
Diffusion MRI tractography is an important tool for identifying and analyzing the intracranial course of cranial nerves (CNs). However, the complex environment of the skull base leads to ambiguous spatial correspondence between diffusion directions and fiber geometry, and existing diffusion tractography methods of CNs identification are prone to producing erroneous trajectories and missing true positive connections. To overcome the above challenge, we propose a novel CNs identification framework with anatomy-guided fiber trajectory distribution, which incorporates anatomical shape prior knowledge during the process of CNs tracing to build diffusion tensor vector fields. We introduce higher-order streamline differential equations for continuous flow field representations to directly characterize the fiber trajectory distribution of CNs from the tract-based level. The experimental results on the vivo HCP dataset and the clinical MDM dataset demonstrate that the proposed method reduces false-positive fiber production compared to competing methods and produces reconstructed CNs (i.e. CN II, CN III, CN V, and CN VII/VIII) that are judged to better correspond to the known anatomy.
△ Less
Submitted 29 February, 2024;
originally announced February 2024.
-
ChatEL: Entity Linking with Chatbots
Authors:
Yifan Ding,
Qingkai Zeng,
Tim Weninger
Abstract:
Entity Linking (EL) is an essential and challenging task in natural language processing that seeks to link some text representing an entity within a document or sentence with its corresponding entry in a dictionary or knowledge base. Most existing approaches focus on creating elaborate contextual models that look for clues the words surrounding the entity-text to help solve the linking problem. Al…
▽ More
Entity Linking (EL) is an essential and challenging task in natural language processing that seeks to link some text representing an entity within a document or sentence with its corresponding entry in a dictionary or knowledge base. Most existing approaches focus on creating elaborate contextual models that look for clues the words surrounding the entity-text to help solve the linking problem. Although these fine-tuned language models tend to work, they can be unwieldy, difficult to train, and do not transfer well to other domains. Fortunately, Large Language Models (LLMs) like GPT provide a highly-advanced solution to the problems inherent in EL models, but simply naive prompts to LLMs do not work well. In the present work, we define ChatEL, which is a three-step framework to prompt LLMs to return accurate results. Overall the ChatEL framework improves the average F1 performance across 10 datasets by more than 2%. Finally, a thorough error analysis shows many instances with the ground truth labels were actually incorrect, and the labels predicted by ChatEL were actually correct. This indicates that the quantitative results presented in this paper may be a conservative estimate of the actual performance. All data and code are available as an open-source package on GitHub at https://github.com/yifding/In_Context_EL.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
InfoNet: Neural Estimation of Mutual Information without Test-Time Optimization
Authors:
Zhengyang Hu,
Song Kang,
Qunsong Zeng,
Kaibin Huang,
Yanchao Yang
Abstract:
Estimating mutual correlations between random variables or data streams is essential for intelligent behavior and decision-making. As a fundamental quantity for measuring statistical relationships, mutual information has been extensively studied and utilized for its generality and equitability. However, existing methods often lack the efficiency needed for real-time applications, such as test-time…
▽ More
Estimating mutual correlations between random variables or data streams is essential for intelligent behavior and decision-making. As a fundamental quantity for measuring statistical relationships, mutual information has been extensively studied and utilized for its generality and equitability. However, existing methods often lack the efficiency needed for real-time applications, such as test-time optimization of a neural network, or the differentiability required for end-to-end learning, like histograms. We introduce a neural network called InfoNet, which directly outputs mutual information estimations of data streams by leveraging the attention mechanism and the computational efficiency of deep learning infrastructures. By maximizing a dual formulation of mutual information through large-scale simulated training, our approach circumvents time-consuming test-time optimization and offers generalization ability. We evaluate the effectiveness and generalization of our proposed mutual information estimation scheme on various families of distributions and applications. Our results demonstrate that InfoNet and its training process provide a graceful efficiency-accuracy trade-off and order-preserving properties. We will make the code and models available as a comprehensive toolbox to facilitate studies in different fields requiring real-time mutual information estimation.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Progress in artificial intelligence applications based on the combination of self-driven sensors and deep learning
Authors:
Weixiang Wan,
Wenjian Sun,
Qiang Zeng,
Linying Pan,
Jingyu Xu,
Bo Liu
Abstract:
In the era of Internet of Things, how to develop a smart sensor system with sustainable power supply, easy deployment and flexible use has become a difficult problem to be solved. The traditional power supply has problems such as frequent replacement or charging when in use, which limits the development of wearable devices. The contact-to-separate friction nanogenerator (TENG) was prepared by usin…
▽ More
In the era of Internet of Things, how to develop a smart sensor system with sustainable power supply, easy deployment and flexible use has become a difficult problem to be solved. The traditional power supply has problems such as frequent replacement or charging when in use, which limits the development of wearable devices. The contact-to-separate friction nanogenerator (TENG) was prepared by using polychotomy thy lene (PTFE) and aluminum (AI) foils. Human motion energy was collected by human body arrangement, and human motion posture was monitored according to the changes of output electrical signals. In 2012, Academician Wang Zhong lin and his team invented the triboelectric nanogenerator (TENG), which uses Maxwell displacement current as a driving force to directly convert mechanical stimuli into electrical signals, so it can be used as a self-driven sensor. Teng-based sensors have the advantages of simple structure and high instantaneous power density, which provides an important means for building intelligent sensor systems. At the same time, machine learning, as a technology with low cost, short development cycle, strong data processing ability and prediction ability, has a significant effect on the processing of a large number of electrical signals generated by TENG, and the combination with TENG sensors will promote the rapid development of intelligent sensor networks in the future. Therefore, this paper is based on the intelligent sound monitoring and recognition system of TENG, which has good sound recognition capability, and aims to evaluate the feasibility of the sound perception module architecture in ubiquitous sensor networks.
△ Less
Submitted 12 March, 2024; v1 submitted 30 January, 2024;
originally announced February 2024.
-
Generalizing across Temporal Domains with Koopman Operators
Authors:
Qiuhao Zeng,
Wei Wang,
Fan Zhou,
Gezheng Xu,
Ruizhi Pu,
Changjian Shui,
Christian Gagne,
Shichun Yang,
Boyu Wang,
Charles X. Ling
Abstract:
In the field of domain generalization, the task of constructing a predictive model capable of generalizing to a target domain without access to target data remains challenging. This problem becomes further complicated when considering evolving dynamics between domains. While various approaches have been proposed to address this issue, a comprehensive understanding of the underlying generalization…
▽ More
In the field of domain generalization, the task of constructing a predictive model capable of generalizing to a target domain without access to target data remains challenging. This problem becomes further complicated when considering evolving dynamics between domains. While various approaches have been proposed to address this issue, a comprehensive understanding of the underlying generalization theory is still lacking. In this study, we contribute novel theoretic results that aligning conditional distribution leads to the reduction of generalization bounds. Our analysis serves as a key motivation for solving the Temporal Domain Generalization (TDG) problem through the application of Koopman Neural Operators, resulting in Temporal Koopman Networks (TKNets). By employing Koopman Operators, we effectively address the time-evolving distributions encountered in TDG using the principles of Koopman theory, where measurement functions are sought to establish linear transition relations between evolving domains. Through empirical evaluations conducted on synthetic and real-world datasets, we validate the effectiveness of our proposed approach.
△ Less
Submitted 15 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Chain-of-Layer: Iteratively Prompting Large Language Models for Taxonomy Induction from Limited Examples
Authors:
Qingkai Zeng,
Yuyang Bai,
Zhaoxuan Tan,
Shangbin Feng,
Zhenwen Liang,
Zhihan Zhang,
Meng Jiang
Abstract:
Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer which is an in-context learning framework designed to induct taxonomies from a given set of entities. Chain-of-Layer breaks down…
▽ More
Automatic taxonomy induction is crucial for web search, recommendation systems, and question answering. Manual curation of taxonomies is expensive in terms of human effort, making automatic taxonomy construction highly desirable. In this work, we introduce Chain-of-Layer which is an in-context learning framework designed to induct taxonomies from a given set of entities. Chain-of-Layer breaks down the task into selecting relevant candidate entities in each layer and gradually building the taxonomy from top to bottom. To minimize errors, we introduce the Ensemble-based Ranking Filter to reduce the hallucinated content generated at each iteration. Through extensive experiments, we demonstrate that Chain-of-Layer achieves state-of-the-art performance on four real-world benchmarks.
△ Less
Submitted 11 February, 2024;
originally announced February 2024.
-
EntGPT: Linking Generative Large Language Models with Knowledge Bases
Authors:
Yifan Ding,
Amrit Poudel,
Qingkai Zeng,
Tim Weninger,
Balaji Veeramani,
Sanmitra Bhattacharya
Abstract:
The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED perform…
▽ More
The ability of Large Language Models (LLMs) to generate factually correct output remains relatively unexplored due to the lack of fact-checking and knowledge grounding during training and inference. In this work, we aim to address this challenge through the Entity Disambiguation (ED) task. We first consider prompt engineering, and design a three-step hard-prompting method to probe LLMs' ED performance without supervised fine-tuning (SFT). Overall, the prompting method improves the micro-F_1 score of the original vanilla models by a large margin, on some cases up to 36% and higher, and obtains comparable performance across 10 datasets when compared to existing methods with SFT. We further improve the knowledge grounding ability through instruction tuning (IT) with similar prompts and responses. The instruction-tuned model not only achieves higher micro-F1 score performance as compared to several baseline methods on supervised entity disambiguation tasks with an average micro-F_1 improvement of 2.1% over the existing baseline models, but also obtains higher accuracy on six Question Answering (QA) tasks in the zero-shot setting. Our methodologies apply to both open- and closed-source LLMs.
△ Less
Submitted 9 February, 2024;
originally announced February 2024.
-
Efficient Invariant Kalman Filter for Inertial-based Odometry with Large-sample Environmental Measurements
Authors:
Xinghan Li,
Haoying Li,
Guangyang Zeng,
Qingcheng Zeng,
Xiaoqiang Ren,
Chao Yang,
Junfeng Wu
Abstract:
A filter for inertial-based odometry is a recursive method used to estimate the pose from measurements of ego-motion and relative pose. Currently, there is no known filter that guarantees the computation of a globally optimal solution for the non-linear measurement model. In this paper, we demonstrate that an innovative filter, with the state being $SE_2(3)$ and the $\sqrt{n}$-\textit{consistent}…
▽ More
A filter for inertial-based odometry is a recursive method used to estimate the pose from measurements of ego-motion and relative pose. Currently, there is no known filter that guarantees the computation of a globally optimal solution for the non-linear measurement model. In this paper, we demonstrate that an innovative filter, with the state being $SE_2(3)$ and the $\sqrt{n}$-\textit{consistent} pose as the initialization, efficiently achieves \textit{asymptotic optimality} in terms of minimum mean square error. This approach is tailored for real-time SLAM and inertial-based odometry applications.
Our first contribution is that we propose an iterative filtering method based on the Gauss-Newton method on Lie groups which is numerically to solve the estimation of states from a priori and non-linear measurements. The filtering stands out due to its iterative mechanism and adaptive initialization. Second, when dealing with environmental measurements of the surroundings, we utilize a $\sqrt{n}$-consistent pose as the initial value for the update step in a single iteration. The solution is closed in form and has computational complexity $O(n)$. Third, we theoretically show that the approach can achieve asymptotic optimality in the sense of minimum mean square error from the a priori and virtual relative pose measurements (see Problem~\ref{prob:new update problem}). Finally, to validate our method, we carry out extensive numerical and experimental evaluations. Our results consistently demonstrate that our approach outperforms other state-of-the-art filter-based methods, including the iterated extended Kalman filter and the invariant extended Kalman filter, in terms of accuracy and running time.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning
Authors:
Zhaoxuan Tan,
Qingkai Zeng,
Yijun Tian,
Zheyuan Liu,
Bing Yin,
Meng Jiang
Abstract:
Personalization in large language models (LLMs) is increasingly important, aiming to align LLM's interactions, content, and recommendations with individual user preferences. Recent advances in LLM personalization have spotlighted effective prompt design, by enriching user queries with non-parametric knowledge through behavior history retrieval and textual profiles. However, these approaches were l…
▽ More
Personalization in large language models (LLMs) is increasingly important, aiming to align LLM's interactions, content, and recommendations with individual user preferences. Recent advances in LLM personalization have spotlighted effective prompt design, by enriching user queries with non-parametric knowledge through behavior history retrieval and textual profiles. However, these approaches were limited due to a lack of model ownership, resulting in constrained customization and privacy issues. Moreover, they often failed to accurately capture user behavior patterns, especially in cases where user data were complex and dynamic. To address these shortcomings, we introduce One PEFT Per User (OPPU), which employs personalized parameter-efficient fine-tuning (PEFT) modules, to store user-specific behavior patterns and preferences. By plugging in users' personal PEFT parameters, they can own and use their LLMs personally. OPPU integrates parametric user knowledge in the personal PEFT parameters with the non-parametric knowledge acquired through retrieval and profile. This integration adapts individual LLMs to user behavior shifts. Experimental results demonstrate that OPPU significantly outperforms existing prompt-based methods across seven diverse tasks in the LaMP benchmark. Further in-depth studies reveal OPPU's enhanced capabilities in handling user behavior shifts, modeling users at different active levels, maintaining robustness across various user history formats, and displaying versatility with different PEFT methods.
△ Less
Submitted 6 February, 2024;
originally announced February 2024.
-
When eBPF Meets Machine Learning: On-the-fly OS Kernel Compartmentalization
Authors:
Zicheng Wang,
Tiejin Chen,
Qinrun Dai,
Yueqi Chen,
Hua Wei,
Qingkai Zeng
Abstract:
Compartmentalization effectively prevents initial corruption from turning into a successful attack. This paper presents O2C, a pioneering system designed to enforce OS kernel compartmentalization on the fly. It not only provides immediate remediation for sudden threats but also maintains consistent system availability through the enforcement process.
O2C is empowered by the newest advancements o…
▽ More
Compartmentalization effectively prevents initial corruption from turning into a successful attack. This paper presents O2C, a pioneering system designed to enforce OS kernel compartmentalization on the fly. It not only provides immediate remediation for sudden threats but also maintains consistent system availability through the enforcement process.
O2C is empowered by the newest advancements of the eBPF ecosystem which allows to instrument eBPF programs that perform enforcement actions into the kernel at runtime. O2C takes the lead in embedding a machine learning model into eBPF programs, addressing unique challenges in on-the-fly compartmentalization. Our comprehensive evaluation shows that O2C effectively confines damage within the compartment. Further, we validate that decision tree is optimally suited for O2C owing to its advantages in processing tabular data, its explainable nature, and its compliance with the eBPF ecosystem. Last but not least, O2C is lightweight, showing negligible overhead and excellent sacalability system-wide.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
D3Former: Jointly Learning Repeatable Dense Detectors and Feature-enhanced Descriptors via Saliency-guided Transformer
Authors:
Junjie Gao,
Pengfei Wang,
Qiujie Dong,
Qiong Zeng,
Shiqing Xin,
Caiming Zhang
Abstract:
Establishing accurate and representative matches is a crucial step in addressing the point cloud registration problem. A commonly employed approach involves detecting keypoints with salient geometric features and subsequently mapping these keypoints from one frame of the point cloud to another. However, methods within this category are hampered by the repeatability of the sampled keypoints. In thi…
▽ More
Establishing accurate and representative matches is a crucial step in addressing the point cloud registration problem. A commonly employed approach involves detecting keypoints with salient geometric features and subsequently mapping these keypoints from one frame of the point cloud to another. However, methods within this category are hampered by the repeatability of the sampled keypoints. In this paper, we introduce a saliency-guided trans\textbf{former}, referred to as \textit{D3Former}, which entails the joint learning of repeatable \textbf{D}ense \textbf{D}etectors and feature-enhanced \textbf{D}escriptors. The model comprises a Feature Enhancement Descriptor Learning (FEDL) module and a Repetitive Keypoints Detector Learning (RKDL) module. The FEDL module utilizes a region attention mechanism to enhance feature distinctiveness, while the RKDL module focuses on detecting repeatable keypoints to enhance matching capabilities. Extensive experimental results on challenging indoor and outdoor benchmarks demonstrate that our proposed method consistently outperforms state-of-the-art point cloud matching methods. Notably, tests on 3DLoMatch, even with a low overlap ratio, show that our method consistently outperforms recently published approaches such as RoReg and RoITr. For instance, with the number of extracted keypoints reduced to 250, the registration recall scores for RoReg, RoITr, and our method are 64.3\%, 73.6\%, and 76.5\%, respectively.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
MCANet: Medical Image Segmentation with Multi-Scale Cross-Axis Attention
Authors:
Hao Shao,
Quansheng Zeng,
Qibin Hou,
Jufeng Yang
Abstract:
Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because of the various sizes and shapes of the lesion regions or organs. In this paper, we present Multi-scale Cross-axis Attention (MCA) to solve the above challenging issues based on the efficient axial attention. Instead of simply connecting axial attentio…
▽ More
Efficiently capturing multi-scale information and building long-range dependencies among pixels are essential for medical image segmentation because of the various sizes and shapes of the lesion regions or organs. In this paper, we present Multi-scale Cross-axis Attention (MCA) to solve the above challenging issues based on the efficient axial attention. Instead of simply connecting axial attention along the horizontal and vertical directions sequentially, we propose to calculate dual cross attentions between two parallel axial attentions to capture global information better. To process the significant variations of lesion regions or organs in individual sizes and shapes, we also use multiple convolutions of strip-shape kernels with different kernel sizes in each axial attention path to improve the efficiency of the proposed MCA in encoding spatial information. We build the proposed MCA upon the MSCAN backbone, yielding our network, termed MCANet. Our MCANet with only 4M+ parameters performs even better than most previous works with heavy backbones (e.g., Swin Transformer) on four challenging tasks, including skin lesion segmentation, nuclei segmentation, abdominal multi-organ segmentation, and polyp segmentation. Code is available at https://github.com/haoshao-nku/medical_seg.
△ Less
Submitted 19 December, 2023; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Cross Domain LifeLong Sequential Modeling for Online Click-Through Rate Prediction
Authors:
Ruijie Hou,
Zhaoyang Yang,
Yu Ming,
Hongyu Lu,
Zhuobin Zheng,
Yu Chen,
Qinsong Zeng,
Ming Chen
Abstract:
Deep neural networks (DNNs) that incorporated lifelong sequential modeling (LSM) have brought great success to recommendation systems in various social media platforms. While continuous improvements have been made in domain-specific LSM, limited work has been done in cross-domain LSM, which considers modeling of lifelong sequences of both target domain and source domain. In this paper, we propose…
▽ More
Deep neural networks (DNNs) that incorporated lifelong sequential modeling (LSM) have brought great success to recommendation systems in various social media platforms. While continuous improvements have been made in domain-specific LSM, limited work has been done in cross-domain LSM, which considers modeling of lifelong sequences of both target domain and source domain. In this paper, we propose Lifelong Cross Network (LCN) to incorporate cross-domain LSM to improve the click-through rate (CTR) prediction in the target domain. The proposed LCN contains a LifeLong Attention Pyramid (LAP) module that comprises of three levels of cascaded attentions to effectively extract interest representations with respect to the candidate item from lifelong sequences. We also propose Cross Representation Production (CRP) module to enforce additional supervision on the learning and alignment of cross-domain representations so that they can be better reused on learning of the CTR prediction in the target domain. We conducted extensive experiments on WeChat Channels industrial dataset as well as on benchmark dataset. Results have revealed that the proposed LCN outperforms existing work in terms of both prediction accuracy and online performance.
△ Less
Submitted 17 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
Ascle: A Python Natural Language Processing Toolkit for Medical Text Generation
Authors:
Rui Yang,
Qingcheng Zeng,
Keen You,
Yujie Qiao,
Lucas Huang,
Chia-Chun Hsieh,
Benjamin Rosand,
Jeremy Goldwasser,
Amisha D Dave,
Tiarnan D. L. Keenan,
Emily Y Chew,
Dragomir Radev,
Zhiyong Lu,
Hua Xu,
Qingyu Chen,
Irene Li
Abstract:
This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing f…
▽ More
This study introduces Ascle, a pioneering natural language processing (NLP) toolkit designed for medical text generation. Ascle is tailored for biomedical researchers and healthcare professionals with an easy-to-use, all-in-one solution that requires minimal programming expertise. For the first time, Ascle evaluates and provides interfaces for the latest pre-trained language models, encompassing four advanced and challenging generative functions: question-answering, text summarization, text simplification, and machine translation. In addition, Ascle integrates 12 essential NLP functions, along with query and search capabilities for clinical databases. The toolkit, its models, and associated data are publicly available via https://github.com/Yale-LILY/MedGen.
△ Less
Submitted 9 December, 2023; v1 submitted 28 November, 2023;
originally announced November 2023.
-
Segment Together: A Versatile Paradigm for Semi-Supervised Medical Image Segmentation
Authors:
Qingjie Zeng,
Yutong Xie,
Zilin Lu,
Mengkang Lu,
Yicheng Wu,
Yong Xia
Abstract:
Annotation scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation, restricting their deployment in clinical scenarios. To address it, semi-supervised learning by exploiting abundant unlabeled data is highly desirable to boost the model training. However, most existing works still focus on limited medical tasks and underestimate the potential…
▽ More
Annotation scarcity has become a major obstacle for training powerful deep-learning models for medical image segmentation, restricting their deployment in clinical scenarios. To address it, semi-supervised learning by exploiting abundant unlabeled data is highly desirable to boost the model training. However, most existing works still focus on limited medical tasks and underestimate the potential of learning across diverse tasks and multiple datasets. Therefore, in this paper, we introduce a \textbf{Ver}satile \textbf{Semi}-supervised framework (VerSemi) to point out a new perspective that integrates various tasks into a unified model with a broad label space, to exploit more unlabeled data for semi-supervised medical image segmentation. Specifically, we introduce a dynamic task-prompted design to segment various targets from different datasets. Next, this unified model is used to identify the foreground regions from all labeled data, to capture cross-dataset semantics. Particularly, we create a synthetic task with a cutmix strategy to augment foreground targets within the expanded label space. To effectively utilize unlabeled data, we introduce a consistency constraint. This involves aligning aggregated predictions from various tasks with those from the synthetic task, further guiding the model in accurately segmenting foreground regions during training. We evaluated our VerSemi model on four public benchmarking datasets. Extensive experiments demonstrated that VerSemi can consistently outperform the second-best method by a large margin (e.g., an average 2.69\% Dice gain on four datasets), setting new SOTA performance for semi-supervised medical image segmentation. The code will be released.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Authors:
Zhihan Zhang,
Shuohang Wang,
Wenhao Yu,
Yichong Xu,
Dan Iter,
Qingkai Zeng,
Yang Liu,
Chenguang Zhu,
Meng Jiang
Abstract:
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the necessity of task-specific fine-tuning. Unfortunately, the performance of LLMs is greatly influenced by the quality of these instructions, and manually writing effective instructions for each task is a laborious and subjective process. In this paper, we introduce Auto-Instruct, a…
▽ More
Large language models (LLMs) can perform a wide range of tasks by following natural language instructions, without the necessity of task-specific fine-tuning. Unfortunately, the performance of LLMs is greatly influenced by the quality of these instructions, and manually writing effective instructions for each task is a laborious and subjective process. In this paper, we introduce Auto-Instruct, a novel method to automatically improve the quality of instructions provided to LLMs. Our method leverages the inherent generative ability of LLMs to produce diverse candidate instructions for a given task, and then ranks them using a scoring model trained on a variety of 575 existing NLP tasks. In experiments on 118 out-of-domain tasks, Auto-Instruct surpasses both human-written instructions and existing baselines of LLM-generated instructions. Furthermore, our method exhibits notable generalizability even with other LLMs that are not incorporated into its training process.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Semi-Aerodynamic Model Aided Invariant Kalman Filtering for UAV Full-State Estimation
Authors:
Xiaoyu Ye,
Fujun Song,
Zongyu Zhang,
Rui Zhang,
Qinghua Zeng
Abstract:
Due to the state trajectory-independent features of invariant Kalman filtering (InEKF), it has attracted widespread attention in the research community for its significantly improved state estimation accuracy and convergence under disturbance. In this paper, we formulate the full-source data fusion navigation problem for fixed-wing unmanned aerial vehicle (UAV) within a framework based on error st…
▽ More
Due to the state trajectory-independent features of invariant Kalman filtering (InEKF), it has attracted widespread attention in the research community for its significantly improved state estimation accuracy and convergence under disturbance. In this paper, we formulate the full-source data fusion navigation problem for fixed-wing unmanned aerial vehicle (UAV) within a framework based on error state right-invariant extended Kalman filtering (ES-RIEKF) on Lie groups. We merge measurements from a multi-rate onboard sensor network on UAVs to achieve real-time estimation of pose, air flow angles, and wind speed. Detailed derivations are provided, and the algorithm's convergence and accuracy improvements over established methods like Error State EKF (ES-EKF) and Nonlinear Complementary Filter (NCF) are demonstrated using real-flight data from UAVs. Additionally, we introduce a semi-aerodynamic model fusion framework that relies solely on ground-measurable parameters. We design and train an Long Short Term Memory (LSTM) deep network to achieve drift-free prediction of the UAV's angle of attack (AOA) and side-slip angle (SA) using easily obtainable onboard data like control surface deflections, thereby significantly reducing dependency on GNSS or complicated aerodynamic model parameters. Further, we validate the algorithm's robust advantages under GNSS denied, where flight data shows that the maximum positioning error stays within 30 meters over a 130-second denial period. To the best of our knowledge, this study is the first to apply ES-RIEKF to full-source navigation applications for fixed-wing UAVs, aiming to provide engineering references for designers. Our implementations using MATLAB/Simulink will open source.
△ Less
Submitted 3 October, 2023;
originally announced October 2023.
-
Discrepancy Matters: Learning from Inconsistent Decoder Features for Consistent Semi-supervised Medical Image Segmentation
Authors:
Qingjie Zeng,
Yutong Xie,
Zilin Lu,
Mengkang Lu,
Yong Xia
Abstract:
Semi-supervised learning (SSL) has been proven beneficial for mitigating the issue of limited labeled data especially on the task of volumetric medical image segmentation. Unlike previous SSL methods which focus on exploring highly confident pseudo-labels or developing consistency regularization schemes, our empirical findings suggest that inconsistent decoder features emerge naturally when two de…
▽ More
Semi-supervised learning (SSL) has been proven beneficial for mitigating the issue of limited labeled data especially on the task of volumetric medical image segmentation. Unlike previous SSL methods which focus on exploring highly confident pseudo-labels or developing consistency regularization schemes, our empirical findings suggest that inconsistent decoder features emerge naturally when two decoders strive to generate consistent predictions. Based on the observation, we first analyze the treasure of discrepancy in learning towards consistency, under both pseudo-labeling and consistency regularization settings, and subsequently propose a novel SSL method called LeFeD, which learns the feature-level discrepancy obtained from two decoders, by feeding the discrepancy as a feedback signal to the encoder. The core design of LeFeD is to enlarge the difference by training differentiated decoders, and then learn from the inconsistent information iteratively. We evaluate LeFeD against eight state-of-the-art (SOTA) methods on three public datasets. Experiments show LeFeD surpasses competitors without any bells and whistles such as uncertainty estimation and strong constraints, as well as setting a new state-of-the-art for semi-supervised medical image segmentation. Code is available at \textcolor{cyan}{https://github.com/maxwell0027/LeFeD}
△ Less
Submitted 26 September, 2023;
originally announced September 2023.
-
UMMAFormer: A Universal Multimodal-adaptive Transformer Framework for Temporal Forgery Localization
Authors:
Rui Zhang,
Hongxia Wang,
Mingshan Du,
Hanqing Liu,
Yang Zhou,
Qiang Zeng
Abstract:
The emergence of artificial intelligence-generated content (AIGC) has raised concerns about the authenticity of multimedia content in various fields. However, existing research for forgery content detection has focused mainly on binary classification tasks of complete videos, which has limited applicability in industrial settings. To address this gap, we propose UMMAFormer, a novel universal trans…
▽ More
The emergence of artificial intelligence-generated content (AIGC) has raised concerns about the authenticity of multimedia content in various fields. However, existing research for forgery content detection has focused mainly on binary classification tasks of complete videos, which has limited applicability in industrial settings. To address this gap, we propose UMMAFormer, a novel universal transformer framework for temporal forgery localization (TFL) that predicts forgery segments with multimodal adaptation. Our approach introduces a Temporal Feature Abnormal Attention (TFAA) module based on temporal feature reconstruction to enhance the detection of temporal differences. We also design a Parallel Cross-Attention Feature Pyramid Network (PCA-FPN) to optimize the Feature Pyramid Network (FPN) for subtle feature enhancement. To evaluate the proposed method, we contribute a novel Temporal Video Inpainting Localization (TVIL) dataset specifically tailored for video inpainting scenes. Our experiments show that our approach achieves state-of-the-art performance on benchmark datasets, including Lav-DF, TVIL, and Psynd, significantly outperforming previous methods. The code and data are available at https://github.com/ymhzyj/UMMAFormer/.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
Authors:
Fan Gao,
Hang Jiang,
Rui Yang,
Qingcheng Zeng,
Jinghui Lu,
Moritz Blum,
Dairui Liu,
Tianwei She,
Yuang Jiang,
Irene Li
Abstract:
Educational materials such as survey articles in specialized fields like computer science traditionally require tremendous expert inputs and are therefore expensive to create and update. Recently, Large Language Models (LLMs) have achieved significant success across various general tasks. However, their effectiveness and limitations in the education domain are yet to be fully explored. In this wor…
▽ More
Educational materials such as survey articles in specialized fields like computer science traditionally require tremendous expert inputs and are therefore expensive to create and update. Recently, Large Language Models (LLMs) have achieved significant success across various general tasks. However, their effectiveness and limitations in the education domain are yet to be fully explored. In this work, we examine the proficiency of LLMs in generating succinct survey articles specific to the niche field of NLP in computer science, focusing on a curated list of 99 topics. Automated benchmarks reveal that GPT-4 surpasses its predecessors, inluding GPT-3.5, PaLM2, and LLaMa2 by margins ranging from 2% to 20% in comparison to the established ground truth. We compare both human and GPT-based evaluation scores and provide in-depth analysis. While our findings suggest that GPT-created surveys are more contemporary and accessible than human-authored ones, certain limitations were observed. Notably, GPT-4, despite often delivering outstanding content, occasionally exhibited lapses like missing details or factual errors. At last, we compared the rating behavior between humans and GPT-4 and found systematic bias in using GPT evaluation.
△ Less
Submitted 23 May, 2024; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning
Authors:
Xiang Yuan,
Gong Cheng,
Kebing Yan,
Qinghua Zeng,
Junwei Han
Abstract:
The past few years have witnessed the immense success of object detection, while current excellent detectors struggle on tackling size-limited instances. Concretely, the well-known challenge of low overlaps between the priors and object regions leads to a constrained sample pool for optimization, and the paucity of discriminative information further aggravates the recognition. To alleviate the afo…
▽ More
The past few years have witnessed the immense success of object detection, while current excellent detectors struggle on tackling size-limited instances. Concretely, the well-known challenge of low overlaps between the priors and object regions leads to a constrained sample pool for optimization, and the paucity of discriminative information further aggravates the recognition. To alleviate the aforementioned issues, we propose CFINet, a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning. Firstly, we introduce Coarse-to-fine RPN (CRPN) to ensure sufficient and high-quality proposals for small objects through the dynamic anchor selection strategy and cascade regression. Then, we equip the conventional detection head with a Feature Imitation (FI) branch to facilitate the region representations of size-limited instances that perplex the model in an imitation manner. Moreover, an auxiliary imitation loss following supervised contrastive learning paradigm is devised to optimize this branch. When integrated with Faster RCNN, CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A, underscoring its superiority over baseline detector and other mainstream detection approaches.
△ Less
Submitted 18 August, 2023;
originally announced August 2023.
-
Efficient Multiuser AI Downloading via Reusable Knowledge Broadcasting
Authors:
Hai Wu,
Qunsong Zeng,
Kaibin Huang
Abstract:
For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcastin…
▽ More
For the 6G mobile networks, in-situ model downloading has emerged as an important use case to enable real-time adaptive artificial intelligence on edge devices. However, the simultaneous downloading of diverse and high-dimensional models to multiple devices over wireless links presents a significant communication bottleneck. To overcome the bottleneck, we propose the framework of model broadcasting and assembling (MBA), which represents the first attempt on leveraging reusable knowledge, referring to shared parameters among tasks, to enable parameter broadcasting to reduce communication overhead. The MBA framework comprises two key components. The first, the MBA protocol, defines the system operations including parameter selection from a model library, power control for broadcasting, and model assembling at devices. The second component is the joint design of parameter-selection-and-power-control (PS-PC), which provides guarantees on devices' model performance and minimizes the downloading latency. The corresponding optimization problem is simplified by decomposition into the sequential PS and PC sub-problems without compromising its optimality. The PS sub-problem is solved efficiently by designing two efficient algorithms. On one hand, the low-complexity algorithm of greedy parameter selection features the construction of candidate model sets and a selection metric, both of which are designed under the criterion of maximum reusable knowledge among tasks. On the other hand, the optimal tree-search algorithm gains its efficiency via the proposed construction of a compact binary tree pruned using model architecture constraints and an intelligent branch-and-bound search. Given optimal PS, the optimal PC policy is derived in closed form. Extensive experiments demonstrate the substantial reduction in downloading latency achieved by the proposed MBA compared to traditional model downloading.
△ Less
Submitted 28 July, 2023;
originally announced July 2023.
-
MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning
Authors:
Zhenwen Liang,
Dian Yu,
Xiaoman Pan,
Wenlin Yao,
Qingkai Zeng,
Xiangliang Zhang,
Dong Yu
Abstract:
Reasoning in mathematical domains remains a significant challenge for relatively small language models (LMs). Many current methods focus on specializing LMs in mathematical reasoning and rely heavily on knowledge distillation from powerful but inefficient large LMs (LLMs). In this work, we explore a new direction that avoids over-reliance on LLM teachers, introducing a multi-view fine-tuning metho…
▽ More
Reasoning in mathematical domains remains a significant challenge for relatively small language models (LMs). Many current methods focus on specializing LMs in mathematical reasoning and rely heavily on knowledge distillation from powerful but inefficient large LMs (LLMs). In this work, we explore a new direction that avoids over-reliance on LLM teachers, introducing a multi-view fine-tuning method that efficiently exploits existing mathematical problem datasets with diverse annotation styles. Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model. By postpending distinct instructions to input questions, models can learn to generate solutions in diverse formats in a flexible manner. Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches that utilize knowledge distillation, as well as carefully established baselines. Additionally, the proposed method grants the models promising generalization ability across various views and datasets, and the capability to learn from inaccurate or incomplete noisy data. We hope our multi-view training paradigm could inspire future studies in other machine reasoning domains.
△ Less
Submitted 16 July, 2023;
originally announced July 2023.
-
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Authors:
Liliang Ren,
Mankeerat Sidhu,
Qi Zeng,
Revanth Gangi Reddy,
Heng Ji,
ChengXiang Zhai
Abstract:
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the us…
▽ More
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system. Consequently, they often correlate poorly with human evaluations. To address this issue, we propose a novel model-agnostic approach that leverages Conditional Pointwise Mutual Information (C-PMI) to measure the turn-level interaction between the system and the user based on a given evaluation dimension. Experimental results on the widely used FED dialogue evaluation dataset demonstrate that our approach significantly improves the correlation with human judgment compared with existing evaluation systems. By replacing the negative log-likelihood-based scorer with our proposed C-PMI scorer, we achieve a relative 62.6% higher Spearman correlation on average for the FED evaluation metric. Our code is publicly available at https://github.com/renll/C-PMI.
△ Less
Submitted 1 September, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Towards Fairness in Personalized Ads Using Impression Variance Aware Reinforcement Learning
Authors:
Aditya Srinivas Timmaraju,
Mehdi Mashayekhi,
Mingliang Chen,
Qi Zeng,
Quintin Fettes,
Wesley Cheung,
Yihan Xiao,
Manojkumar Rangasamy Kannadasan,
Pushkar Tripathi,
Sean Gahagan,
Miranda Bogen,
Rob Roudani
Abstract:
Variances in ad impression outcomes across demographic groups are increasingly considered to be potentially indicative of algorithmic bias in personalized ads systems. While there are many definitions of fairness that could be applicable in the context of personalized systems, we present a framework which we call the Variance Reduction System (VRS) for achieving more equitable outcomes in Meta's a…
▽ More
Variances in ad impression outcomes across demographic groups are increasingly considered to be potentially indicative of algorithmic bias in personalized ads systems. While there are many definitions of fairness that could be applicable in the context of personalized systems, we present a framework which we call the Variance Reduction System (VRS) for achieving more equitable outcomes in Meta's ads systems. VRS seeks to achieve a distribution of impressions with respect to selected protected class (PC) attributes that more closely aligns the demographics of an ad's eligible audience (a function of advertiser targeting criteria) with the audience who sees that ad, in a privacy-preserving manner. We first define metrics to quantify fairness gaps in terms of ad impression variances with respect to PC attributes including gender and estimated race. We then present the VRS for re-ranking ads in an impression variance-aware manner. We evaluate VRS via extensive simulations over different parameter choices and study the effect of the VRS on the chosen fairness metric. We finally present online A/B testing results from applying VRS to Meta's ads systems, concluding with a discussion of future work. We have deployed the VRS to all users in the US for housing ads, resulting in significant improvement in our fairness metric. VRS is the first large-scale deployed framework for pursuing fairness for multiple PC attributes in online advertising.
△ Less
Submitted 8 June, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
Large Language Models Are Partially Primed in Pronoun Interpretation
Authors:
Suet-Ying Lam,
Qingcheng Zeng,
Kexun Zhang,
Chenyu You,
Rob Voigt
Abstract:
While a large body of literature suggests that large language models (LLMs) acquire rich linguistic representations, little is known about whether they adapt to linguistic biases in a human-like way. The present study probes this question by asking whether LLMs display human-like referential biases using stimuli and procedures from real psycholinguistic experiments. Recent psycholinguistic studies…
▽ More
While a large body of literature suggests that large language models (LLMs) acquire rich linguistic representations, little is known about whether they adapt to linguistic biases in a human-like way. The present study probes this question by asking whether LLMs display human-like referential biases using stimuli and procedures from real psycholinguistic experiments. Recent psycholinguistic studies suggest that humans adapt their referential biases with recent exposure to referential patterns; closely replicating three relevant psycholinguistic experiments from Johnson & Arnold (2022) in an in-context learning (ICL) framework, we found that InstructGPT adapts its pronominal interpretations in response to the frequency of referential patterns in the local discourse, though in a limited fashion: adaptation was only observed relative to syntactic but not semantic biases. By contrast, FLAN-UL2 fails to generate meaningful patterns. Our results provide further evidence that contemporary LLMs discourse representations are sensitive to syntactic patterns in the local context but less so to semantic patterns. Our data and code are available at \url{https://github.com/zkx06111/llm_priming}.
△ Less
Submitted 26 May, 2023;
originally announced May 2023.
-
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation
Authors:
Qi Zeng,
Mankeerat Sidhu,
Ansel Blume,
Hou Pong Chan,
Lu Wang,
Heng Ji
Abstract:
Opinions in scientific research papers can be divergent, leading to controversies among reviewers. However, most existing datasets for opinion summarization are centered around product reviews and assume that the analyzed opinions are non-controversial, failing to account for the variability seen in other contexts such as academic papers, political debates, or social media discussions. To address…
▽ More
Opinions in scientific research papers can be divergent, leading to controversies among reviewers. However, most existing datasets for opinion summarization are centered around product reviews and assume that the analyzed opinions are non-controversial, failing to account for the variability seen in other contexts such as academic papers, political debates, or social media discussions. To address this gap, we propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews. To facilitate this task, we introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences. Furthermore, we propose the Checklist-guided Iterative Introspection approach, which breaks down scientific opinion summarization into several stages, iteratively refining the summary under the guidance of questions from a checklist. Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria such as depth of discussion, and identifying consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing the opinions and can be applied to other complex text generation using black-box LLMs.
△ Less
Submitted 15 June, 2024; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization
Authors:
Hou Pong Chan,
Qi Zeng,
Heng Ji
Abstract:
Existing factual consistency evaluation approaches for text summarization provide binary predictions and limited insights into the weakness of summarization systems. Therefore, we propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we prop…
▽ More
Existing factual consistency evaluation approaches for text summarization provide binary predictions and limited insights into the weakness of summarization systems. Therefore, we propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact, which explicitly represents the facts in the documents and summaries with semantic frames extracted by semantic role labeling, and highlights the related semantic frames to predict inconsistency. The highlighted semantic frames help verify predicted error types and correct inconsistent summaries. Experiment results demonstrate that our model outperforms strong baselines and provides evidence to support or refute the summary.
△ Less
Submitted 23 May, 2023;
originally announced May 2023.
-
Adaptive Privacy-Preserving Coded Computing With Hierarchical Task Partitioning
Authors:
Qicheng Zeng,
Zhaojun Nan,
Sheng Zhou
Abstract:
Distributed computing is known as an emerging and efficient technique to support various intelligent services, such as large-scale machine learning. However, privacy leakage and random delays from straggling servers pose significant challenges. To address these issues, coded computing, a promising solution that combines coding theory with distributed computing, recovers computation tasks with resu…
▽ More
Distributed computing is known as an emerging and efficient technique to support various intelligent services, such as large-scale machine learning. However, privacy leakage and random delays from straggling servers pose significant challenges. To address these issues, coded computing, a promising solution that combines coding theory with distributed computing, recovers computation tasks with results from a subset of workers. In this paper, we propose the adaptive privacy-preserving coded computing (APCC) strategy, which can adaptively provide accurate or approximated results according to the form of computation functions, so as to suit diverse types of computation tasks. We prove that APCC achieves complete data privacy preservation and demonstrate its optimality in terms of encoding rate, defined as the ratio between the computation loads of tasks before and after encoding. To further alleviate the straggling effect and reduce delay, we integrate hierarchical task partitioning and task cancellation into the coding design of APCC. The corresponding partitioning problems are formulated as mixed-integer nonlinear programming (MINLP) problems with the objective of minimizing task completion delay. We propose a low-complexity maximum value descent (MVD) algorithm to optimally solve these problems. Simulation results show that APCC can reduce task completion delay by a range of 20.3% to 47.5% when compared to other state-of-the-art benchmarks.
△ Less
Submitted 30 October, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Improved Naive Bayes with Mislabeled Data
Authors:
Qianhan Zeng,
Yingqiu Zhu,
Xuening Zhu,
Feifei Wang,
Weichen Zhao,
Shuning Sun,
Meng Su,
Hansheng Wang
Abstract:
Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generatin…
▽ More
Labeling mistakes are frequently encountered in real-world applications. If not treated well, the labeling mistakes can deteriorate the classification performances of a model seriously. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and free of subjective judgements on the correct and incorrect labels. By specifying the generating mechanism of incorrect labels, we optimize the corresponding log-likelihood function iteratively by using an EM algorithm. Our simulation and experiment results show that the improved Naive Bayes method greatly improves the performances of the Naive Bayes method with mislabeled data.
△ Less
Submitted 13 April, 2023;
originally announced April 2023.
-
Regression-based Deep-Learning predicts molecular biomarkers from pathology slides
Authors:
Omar S. M. El Nahhas,
Chiara M. L. Loeffler,
Zunamys I. Carrero,
Marko van Treeck,
Fiona R. Kolbinger,
Katherine J. Hewitt,
Hannah S. Muti,
Mara Graziani,
Qinghe Zeng,
Julien Calderaro,
Nadina Ortiz-Brüchle,
Tanwei Yuan,
Michael Hoffmeister,
Hermann Brenner,
Alexander Brobeil,
Jorge S. Reis-Filho,
Jakob Nikolas Kather
Abstract:
Deep Learning (DL) can predict biomarkers from cancer histopathology. Several clinically approved applications use this technology. Most approaches, however, predict categorical labels, whereas biomarkers are often continuous measurements. We hypothesized that regression-based DL outperforms classification-based DL. Therefore, we developed and evaluated a new self-supervised attention-based weakly…
▽ More
Deep Learning (DL) can predict biomarkers from cancer histopathology. Several clinically approved applications use this technology. Most approaches, however, predict categorical labels, whereas biomarkers are often continuous measurements. We hypothesized that regression-based DL outperforms classification-based DL. Therefore, we developed and evaluated a new self-supervised attention-based weakly supervised regression method that predicts continuous biomarkers directly from images in 11,671 patients across nine cancer types. We tested our method for multiple clinically and biologically relevant biomarkers: homologous repair deficiency (HRD) score, a clinically used pan-cancer biomarker, as well as markers of key biological processes in the tumor microenvironment. Using regression significantly enhances the accuracy of biomarker prediction, while also improving the interpretability of the results over classification. In a large cohort of colorectal cancer patients, regression-based prediction scores provide a higher prognostic value than classification-based scores. Our open-source regression approach offers a promising alternative for continuous biomarker analysis in computational pathology.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
DecentRAN: Decentralized Radio Access Network for 5.5G and beyond
Authors:
Hao Xu,
Xun Liu,
Qinghai Zeng,
Qiang Li,
Shibin Ge,
Guohua Zhou,
Raymond Forbes
Abstract:
Radio Access Network faces challenges from privacy and flexible wide area and local area network access. RAN is limited from providing local service directly due to centralized design of cellular network and concerns of user privacy and data security. DecentRAN or Decentralized Radio Access Network offers an alternative perspective to cope with the emerging demands of 5G Non-public Network and the…
▽ More
Radio Access Network faces challenges from privacy and flexible wide area and local area network access. RAN is limited from providing local service directly due to centralized design of cellular network and concerns of user privacy and data security. DecentRAN or Decentralized Radio Access Network offers an alternative perspective to cope with the emerging demands of 5G Non-public Network and the hybrid deployment of 5GS and Wi-Fi in the campus network. Starting from Public key as an Identity, independent mutual authentication between UE and RAN are made possible in a privacy-preserving manner. With the introduction of decentralized architecture and network functions using blockchain and smart contracts, DecentRAN has ability to provide users with locally managed, end-to-end encrypted 5G NPN and the potential connectivity to Local Area Network via campus routers. Furthermore, the performance regarding throughput and latency are discussed, offering the deployment guidance for DecentRAN.
△ Less
Submitted 30 March, 2023;
originally announced March 2023.
-
SmartBook: AI-Assisted Situation Report Generation for Intelligence Analysts
Authors:
Revanth Gangi Reddy,
Daniel Lee,
Yi R. Fung,
Khanh Duy Nguyen,
Qi Zeng,
Manling Li,
Ziqi Wang,
Clare Voss,
Heng Ji
Abstract:
Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-bui…
▽ More
Timely and comprehensive understanding of emerging events is crucial for effective decision-making; automating situation report generation can significantly reduce the time, effort, and cost for intelligence analysts. In this work, we identify intelligence analysts' practices and preferences for AI assistance in situation report generation to guide the design strategies for an effective, trust-building interface that aligns with their thought processes and needs. Next, we introduce SmartBook, an automated framework designed to generate situation reports from large volumes of news data, creating structured reports by automatically discovering event-related strategic questions. These reports include multiple hypotheses (claims), summarized and grounded to sources with factual evidence, to promote in-depth situation understanding. Our comprehensive evaluation of SmartBook, encompassing a user study alongside a content review with an editing study, reveals SmartBook's effectiveness in generating accurate and relevant situation reports. Qualitative evaluations indicate over 80% of questions probe for strategic information, and over 90% of summaries produce tactically useful content, being consistently favored over summaries from a large language model integrated with web search. The editing study reveals that minimal information is removed from the generated text (under 2.5%), suggesting that SmartBook provides analysts with a valuable foundation for situation reports
△ Less
Submitted 27 May, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Turning Noises to Fingerprint-Free "Credentials": Secure and Usable Drone Authentication
Authors:
Chuxiong Wu,
Qiang Zeng
Abstract:
Drones have been widely used in various services, such as delivery and surveillance. Authentication forms the foundation of the security of these services. However, drones are expensive and may carry important payloads. To avoid being captured by attackers, drones should keep a safe distance from the verifier before authentication succeeds. This makes authentication methods that only work in very…
▽ More
Drones have been widely used in various services, such as delivery and surveillance. Authentication forms the foundation of the security of these services. However, drones are expensive and may carry important payloads. To avoid being captured by attackers, drones should keep a safe distance from the verifier before authentication succeeds. This makes authentication methods that only work in very close proximity not applicable. Our work leverages drone noises for authentication. While using sounds for authentication is highly usable, how to handle various attacks that manipulate sounds is an \emph{unresolved challenge}. It is also unclear how to ensure robustness under various environmental sounds. Being the first in the literature, we address the two major challenges by exploiting unique characteristics of drone noises. We thereby build an authentication system that does \emph{not} rely on any drone sound fingerprints, keeps resilient to attacks, and is robust under environmental sounds. An extensive evaluation demonstrates its security and usability.
△ Less
Submitted 10 April, 2024; v1 submitted 17 February, 2023;
originally announced February 2023.
-
Foresee What You Will Learn: Data Augmentation for Domain Generalization in Non-stationary Environment
Authors:
Qiuhao Zeng,
Wei Wang,
Fan Zhou,
Charles Ling,
Boyu Wang
Abstract:
Existing domain generalization aims to learn a generalizable model to perform well even on unseen domains. For many real-world machine learning applications, the data distribution often shifts gradually along domain indices. For example, a self-driving car with a vision system drives from dawn to dusk, with the sky darkening gradually. Therefore, the system must be able to adapt to changes in ambi…
▽ More
Existing domain generalization aims to learn a generalizable model to perform well even on unseen domains. For many real-world machine learning applications, the data distribution often shifts gradually along domain indices. For example, a self-driving car with a vision system drives from dawn to dusk, with the sky darkening gradually. Therefore, the system must be able to adapt to changes in ambient illumination and continue to drive safely on the road. In this paper, we formulate such problems as Evolving Domain Generalization, where a model aims to generalize well on a target domain by discovering and leveraging the evolving pattern of the environment. We then propose Directional Domain Augmentation (DDA), which simulates the unseen target features by mapping source data as augmentations through a domain transformer. Specifically, we formulate DDA as a bi-level optimization problem and solve it through a novel meta-learning approach in the representation space. We evaluate the proposed method on both synthetic datasets and realworld datasets, and empirical results show that our approach can outperform other existing methods.
△ Less
Submitted 8 March, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.
-
Towards Transcervical Ultrasound Image Guidance for Transoral Robotic Surgery
Authors:
Wanwen Chen,
Megha Kalia,
Qi Zeng,
Emily H. T. Pang,
Razeyeh Bagherinasab,
Thomas D. Milner,
Farahna Sabiq,
Eitan Prisman,
Septimiu E. Salcudean
Abstract:
Purpose: Trans-oral robotic surgery (TORS) using the da Vinci surgical robot is a new minimally-invasive surgery method to treat oropharyngeal tumors, but it is a challenging operation. Augmented reality (AR) based on intra-operative ultrasound (US) has the potential to enhance the visualization of the anatomy and cancerous tumors to provide additional tools for decision-making in surgery. Methods…
▽ More
Purpose: Trans-oral robotic surgery (TORS) using the da Vinci surgical robot is a new minimally-invasive surgery method to treat oropharyngeal tumors, but it is a challenging operation. Augmented reality (AR) based on intra-operative ultrasound (US) has the potential to enhance the visualization of the anatomy and cancerous tumors to provide additional tools for decision-making in surgery. Methods: We propose and carry out preliminary evaluations of a US-guided AR system for TORS, with the transducer placed on the neck for a transcervical view. Firstly, we perform a novel MRI-transcervical 3D US registration study. Secondly, we develop a US-robot calibration method with an optical tracker and an AR system to display the anatomy mesh model in the real-time endoscope images inside the surgeon console. Results: Our AR system reaches a mean projection error of 26.81 and 27.85 pixels for the projection from the US to stereo cameras in a water bath experiment. The average target registration error for MRI to 3D US is 8.90 mm for the 3D US transducer and 5.85 mm for freehand 3D US, and the average distance between the vessel centerlines is 2.32 mm. Conclusion: We demonstrate the first proof-of-concept transcervical US-guided AR system for TORS and the feasibility of trans-cervical 3D US-MRI registration. Our results show that trans-cervical 3D US is a promising technique for TORS image guidance.
△ Less
Submitted 31 March, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost
Authors:
Qingcheng Zeng,
Lucas Garay,
Peilin Zhou,
Dading Chong,
Yining Hua,
Jiageng Wu,
Yikang Pan,
Han Zhou,
Rob Voigt,
Jie Yang
Abstract:
Large pre-trained models have revolutionized natural language processing (NLP) research and applications, but high training costs and limited data resources have prevented their benefits from being shared equally amongst speakers of all the world's languages. To address issues of cross-linguistic access to such models and reduce energy consumption for sustainability during large-scale model traini…
▽ More
Large pre-trained models have revolutionized natural language processing (NLP) research and applications, but high training costs and limited data resources have prevented their benefits from being shared equally amongst speakers of all the world's languages. To address issues of cross-linguistic access to such models and reduce energy consumption for sustainability during large-scale model training, this study proposes an effective and energy-efficient framework called GreenPLM that uses bilingual lexicons to directly "translate" pre-trained language models of one language into another at almost no additional cost. We validate this approach in 18 languages' BERT models and show that this framework is comparable to, if not better than, other heuristics with high training costs. In addition, given lightweight continued pre-training on limited data where available, this framework outperforms the original monolingual language models in six out of seven tested languages with up to 200x less pre-training efforts. Aiming at the Leave No One Behind Principle (LNOB), our approach manages to reduce inequalities between languages and energy consumption greatly. We make our codes and models publicly available here: \url{https://github.com/qcznlp/GreenPLMs}
△ Less
Submitted 26 May, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Filter and evolve: progressive pseudo label refining for semi-supervised automatic speech recognition
Authors:
Zezhong Jin,
Dading Zhong,
Xiao Song,
Zhaoyi Liu,
Naipeng Ye,
Qingcheng Zeng
Abstract:
Fine tuning self supervised pretrained models using pseudo labels can effectively improve speech recognition performance. But, low quality pseudo labels can misguide decision boundaries and degrade performance. We propose a simple yet effective strategy to filter low quality pseudo labels to alleviate this problem. Specifically, pseudo-labels are produced over the entire training set and filtered…
▽ More
Fine tuning self supervised pretrained models using pseudo labels can effectively improve speech recognition performance. But, low quality pseudo labels can misguide decision boundaries and degrade performance. We propose a simple yet effective strategy to filter low quality pseudo labels to alleviate this problem. Specifically, pseudo-labels are produced over the entire training set and filtered via average probability scores calculated from the model output. Subsequently, an optimal percentage of utterances with high probability scores are considered reliable training data with trustworthy labels. The model is iteratively updated to correct the unreliable pseudo labels to minimize the effect of noisy labels. The process above is repeated until unreliable pseudo abels have been adequately corrected. Extensive experiments on LibriSpeech show that these filtered samples enable the refined model to yield more correct predictions, leading to better ASR performances under various experimental settings.
△ Less
Submitted 28 October, 2022;
originally announced October 2022.