-
Debias Can be Unreliable: Mitigating Bias Issue in Evaluating Debiasing Recommendation
Authors:
Chengbing Wang,
Wentao Shi,
Jizhi Zhang,
Wenjie Wang,
Hang Pan,
Fuli Feng
Abstract:
Recent work has improved recommendation models remarkably by equipping them with debiasing methods. Due to the unavailability of fully-exposed datasets, most existing approaches resort to randomly-exposed datasets as a proxy for evaluating debiased models, employing the traditional evaluation scheme to represent the recommendation performance. However, in this study, we reveal that the traditional evaluation scheme is not suitable for randomly-exposed datasets, leading to inconsistency between the Recall performance obtained using randomly-exposed datasets and that obtained using fully-exposed datasets. Such inconsistency indicates the potential unreliability of experimental conclusions on previous debiasing techniques and calls for unbiased Recall evaluation using randomly-exposed datasets. To bridge the gap, we propose the Unbiased Recall Evaluation (URE) scheme, which adjusts the utilization of randomly-exposed datasets to unbiasedly estimate the true Recall performance on fully-exposed datasets. We provide theoretical evidence to demonstrate the rationality of URE and perform extensive experiments on real-world datasets to validate its soundness.
Submitted 7 September, 2024;
originally announced September 2024.
-
The Giant Radio Array for Neutrino Detection (GRAND) Collaboration -- Contributions to the 10th International Workshop on Acoustic and Radio EeV Neutrino Detection Activities (ARENA 2024)
Authors:
Rafael Alves Batista,
Aurélien Benoit-Lévy,
Teresa Bister,
Martina Bohacova,
Mauricio Bustamante,
Washington Carvalho,
Yiren Chen,
LingMei Cheng,
Simon Chiche,
Jean-Marc Colley,
Pablo Correa,
Nicoleta Cucu Laurenciu,
Zigao Dai,
Rogerio M. de Almeida,
Beatriz de Errico,
Sijbrand de Jong,
João R. T. de Mello Neto,
Krijn D de Vries,
Valentin Decoene,
Peter B. Denton,
Bohao Duan,
Kaikai Duan,
Ralph Engel,
William Erba,
Yizhong Fan
, et al. (100 additional authors not shown)
Abstract:
This is an index of the contributions by the Giant Radio Array for Neutrino Detection (GRAND) Collaboration to the 10th International Workshop on Acoustic and Radio EeV Neutrino Detection Activities (ARENA 2024, University of Chicago, June 11-14, 2024). The contributions include an overview of GRAND in its present and future incarnations, methods of radio-detection that are being developed for them, and ongoing joint work between the GRAND and BEACON experiments.
Submitted 5 September, 2024;
originally announced September 2024.
-
An Efficient Two-Dimensional Functional Mixed-Effect Model Framework for Repeatedly Measured Functional Data
Authors:
Cheng Cao,
Jiguo Cao,
Hao Pan,
Yunting Zhang,
Fan Jiang,
Xinyue Li
Abstract:
With the rapid development of wearable device technologies, accelerometers can record minute-by-minute physical activity for consecutive days, which provides important insight into a dynamic association between the intensity of physical activity and mental health outcomes for large-scale population studies. Using a Shanghai school adolescent cohort, we estimate the effect of health assessment results on physical activity profiles recorded by accelerometers throughout a week, which are recognized as repeatedly measured functional data. To achieve this goal, we propose an innovative two-dimensional functional mixed-effect model (2dFMM) for the specialized data, which smoothly varies over longitudinal day observations with covariate-dependent mean and covariance functions. The modeling framework characterizes the longitudinal and functional structures while incorporating two-dimensional fixed effects for covariates of interest. We also develop a fast three-stage estimation procedure to provide accurate fixed-effect inference for model interpretability and improve computational efficiency when encountering large datasets. We find strong evidence of intraday and interday varying significant associations between physical activity and mental health assessments among our cohort population, which sheds light on possible intervention strategies targeting daily physical activity patterns to improve school adolescent mental health. We also apply our method to environmental data to illustrate its wide applicability. Supplementary materials for this article are available online.
Submitted 5 September, 2024;
originally announced September 2024.
-
DiffCSG: Differentiable CSG via Rasterization
Authors:
Haocheng Yuan,
Adrien Bousseau,
Hao Pan,
Chengquan Zhang,
Niloy J. Mitra,
Changjian Li
Abstract:
Differentiable rendering is a key ingredient for inverse rendering and machine learning, as it allows scene parameters (shape, materials, lighting) to be optimized to best fit target images. Differentiable rendering requires that each scene parameter relates to pixel values through differentiable operations. While 3D mesh rendering algorithms have been implemented in a differentiable way, these algorithms do not directly extend to Constructive-Solid-Geometry (CSG), a popular parametric representation of shapes, because the underlying boolean operations are typically performed with complex black-box mesh-processing libraries. We present an algorithm, DiffCSG, to render CSG models in a differentiable manner. Our algorithm builds upon CSG rasterization, which displays the result of boolean operations between primitives without explicitly computing the resulting mesh and, as such, bypasses black-box mesh processing. We describe how to implement CSG rasterization within a differentiable rendering pipeline, taking special care to apply antialiasing along primitive intersections to obtain gradients in such critical areas. Our algorithm is simple and fast, can be easily incorporated into modern machine learning setups, and enables a range of applications for computer-aided design, including direct and image-based editing of CSG primitives. Code and data: https://yyyyyhc.github.io/DiffCSG/.
Submitted 9 September, 2024; v1 submitted 2 September, 2024;
originally announced September 2024.
-
Sparse Mamba: Reinforcing Controllability In Structural State Space Models
Authors:
Emadeldeen Hamdan,
Hongyi Pan,
Ahmet Enis Cetin
Abstract:
In this article, we introduce the concepts of controllability and observability to the Mamba architecture in our Sparse-Mamba (S-Mamba) for natural language processing (NLP) applications. Structured state space models (SSMs) developed in recent studies, such as Mamba and Mamba2, have outperformed transformers and large language models (LLMs) and resolved their computational inefficiency on longer sequences in small to medium NLP tasks. The Mamba SSM architecture removes the need for the attention layers or MLP blocks of transformers. However, current Mamba models do not reinforce controllability of the state space equations when calculating the A, B, C, and D matrices at each time step, which increases the complexity and the computational cost. In this article, we show that the number of parameters can be significantly decreased by reinforcing controllability in the state space equations in the proposed Sparse-Mamba (S-Mamba), while maintaining the performance. The controllable n x n state matrix A is sparse and has only n free parameters. Our novel approach will ensure a controllable system and could be the gateway to Mamba 3.
Submitted 31 August, 2024;
originally announced September 2024.
-
PolypDB: A Curated Multi-Center Dataset for Development of AI Algorithms in Colonoscopy
Authors:
Debesh Jha,
Nikhil Kumar Tomar,
Vanshali Sharma,
Quoc-Huy Trinh,
Koushik Biswas,
Hongyi Pan,
Ritika K. Jha,
Gorkem Durak,
Alexander Hann,
Jonas Varkey,
Hang Viet Dao,
Long Van Dao,
Binh Phuc Nguyen,
Khanh Cong Pham,
Quang Trung Tran,
Nikolaos Papachrysos,
Brandon Rieders,
Peter Thelin Schmidt,
Enrik Geissler,
Tyler Berzin,
Pål Halvorsen,
Michael A. Riegler,
Thomas de Lange,
Ulas Bagci
Abstract:
Colonoscopy is the primary method for examination, detection, and removal of polyps. Regular screening helps detect and prevent colorectal cancer at an early curable stage. However, challenges such as variation in endoscopists' skills, bowel preparation quality, and the complex nature of the large intestine cause a high polyp miss rate. These missed polyps can develop into cancer later on, which underscores the importance of improving detection methods. A computer-aided diagnosis system can support physicians by assisting in detecting overlooked polyps. However, one of the important challenges for developing novel deep learning models for automatic polyp detection and segmentation is the lack of publicly available, multi-center, large, and diverse datasets. To address this gap, we introduce PolypDB, a large-scale publicly available dataset that contains 3934 still polyp images and their corresponding ground truth from real colonoscopy videos to design efficient polyp detection and segmentation architectures. The dataset has been developed and verified by a team of 10 gastroenterologists. PolypDB comprises images from five modalities: Blue Light Imaging (BLI), Flexible Imaging Color Enhancement (FICE), Linked Color Imaging (LCI), Narrow Band Imaging (NBI), and White Light Imaging (WLI), and from three medical centers in Norway, Sweden, and Vietnam. Thus, we split the dataset based on modality and medical center for modality-wise and center-wise analysis. We provide a benchmark on each modality using eight popular segmentation methods and six standard benchmark polyp detection methods. Furthermore, we also provide a center-wise benchmark under federated learning settings. Our dataset is public and can be downloaded at https://osf.io/pr7ms/.
Submitted 19 August, 2024;
originally announced September 2024.
-
Deep extragalactic HI survey of the COSMOS field with FAST
Authors:
Hengxing Pan,
Matt J. Jarvis,
Ming Zhu,
Yin-Zhe Ma,
Mario G. Santos,
Anastasia A. Ponomareva,
Ian Heywood,
Yingjie Jing,
Chen Xu,
Ziming Liu,
Yogesh Chandola,
Yipeng Jing
Abstract:
We present a deep HI survey at L-band conducted with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) over the COSMOS field. This survey is strategically designed to overlap with the MIGHTEE COSMOS field, aiming to combine the sensitivity of FAST with the high resolution of MeerKAT. We observed the field with FAST for 11 hours covering $\sim$2 square degrees, and reduced the raw data to HI spectral cubes over the frequency range 1310-1420 MHz. The FAST-HI data reach a median 3$\sigma$ column density of $N_{\rm HI}\sim2\times10^{17}$ cm$^{-2}$ over a 5 km s$^{-1}$ channel width, allowing for studies of the distribution of HI gas in various environments, such as in galaxies, the Circum-Galactic Medium (CGM) and the Intergalactic Medium (IGM). We visually searched the spectral cubes for HI sources, and found a total of 80 HI detections, of which 56 have been cross-matched with the MIGHTEE-HI catalogue. With the cross-matched sources, we compare their HI masses and find that the total HI mass fraction in the IGM and CGM surrounding the galaxy pairs is statistically higher than the HI fraction surrounding the isolated galaxies by a difference of 13$\pm$4%, indicating that the CGM and IGM associated with interacting systems are richer in neutral hydrogen compared to those around isolated galaxies in the local Universe. We also describe several FAST-MeerKAT synergy projects, highlighting the full potential of exploiting both single-dish and interferometric observations to study the distribution and evolution of the diffuse HI gas.
Submitted 29 August, 2024;
originally announced August 2024.
-
BURExtract-Llama: An LLM for Clinical Concept Extraction in Breast Ultrasound Reports
Authors:
Yuxuan Chen,
Haoyan Yang,
Hengkai Pan,
Fardeen Siddiqui,
Antonio Verdone,
Qingyang Zhang,
Sumit Chopra,
Chen Zhao,
Yiqiu Shen
Abstract:
Breast ultrasound is essential for detecting and diagnosing abnormalities, with radiology reports summarizing key findings like lesion characteristics and malignancy assessments. Extracting this critical information is challenging due to the unstructured nature of these reports, with varied linguistic styles and inconsistent formatting. While proprietary LLMs like GPT-4 are effective, they are costly and raise privacy concerns when handling protected health information. This study presents a pipeline for developing an in-house LLM to extract clinical information from radiology reports. We first use GPT-4 to create a small labeled dataset, then fine-tune a Llama3-8B model on it. Evaluated on clinician-annotated reports, our model achieves an average F1 score of 84.6%, which is on par with GPT-4. Our findings demonstrate the feasibility of developing an in-house LLM that not only matches GPT-4's performance but also offers cost reductions and enhanced data privacy.
Submitted 21 August, 2024;
originally announced August 2024.
-
GRANDlib: A simulation pipeline for the Giant Radio Array for Neutrino Detection (GRAND)
Authors:
GRAND Collaboration,
Rafael Alves Batista,
Aurélien Benoit-Lévy,
Teresa Bister,
Martina Bohacova,
Mauricio Bustamante,
Washington Carvalho,
Yiren Chen,
LingMei Cheng,
Simon Chiche,
Jean-Marc Colley,
Pablo Correa,
Nicoleta Cucu Laurenciu,
Zigao Dai,
Rogerio M. de Almeida,
Beatriz de Errico,
Sijbrand de Jong,
João R. T. de Mello Neto,
Krijn D. de Vries,
Valentin Decoene,
Peter B. Denton,
Bohao Duan,
Kaikai Duan,
Ralph Engel,
William Erba
, et al. (90 additional authors not shown)
Abstract:
The operation of upcoming ultra-high-energy cosmic-ray, gamma-ray, and neutrino radio-detection experiments, like the Giant Radio Array for Neutrino Detection (GRAND), poses significant computational challenges involving the production of numerous simulations of particle showers and their detection, and a high data throughput. GRANDlib is an open-source software tool designed to meet these challenges. Its primary goal is to perform end-to-end simulations of the detector operation, from the interaction of ultra-high-energy particles, through -- by interfacing with external air-shower simulations -- the ensuing particle shower development and its radio emission, to its detection by antenna arrays and its processing by data-acquisition systems. Additionally, GRANDlib manages the visualization, storage, and retrieval of experimental and simulated data. We present an overview of GRANDlib to serve as the basis of future GRAND analyses.
Submitted 20 August, 2024;
originally announced August 2024.
-
Fusion Rules of Majorana-Kramer-Pairs in Time-Reversal-Invariant Topological Superconductors
Authors:
Hongfa Pan,
Jinxiong Jia,
Zhenhua Qiao
Abstract:
We theoretically investigate the fusion rules of Majorana Kramers pairs in time-reversal-invariant topological superconductors. We find that the fusion of Majorana Kramers pairs is a process in which Ising anyons fuse independently in the two distinct time-reversal sectors. Considering the full fusion process, including the initialization and the fusion itself, we explore the observation of a supersymmetry that emerges in time-reversal-invariant topological superconductors, and design schemes for the nontrivial fusion and the trivial fusion to show the non-Abelian statistics of Majorana Kramers pairs. We also show the possible influence of local adiabatic mixing on the fusion, and that the differentiation between distinct fusion processes remains feasible even in the presence of such mixing. Our proposals are applied in $d_{x^2-y^2}$-wave topological superconductors, and the theoretical framework can be extended to the fusion of multiple Majorana zero modes protected by unitary symmetry.
Submitted 18 August, 2024;
originally announced August 2024.
-
Physically Aware Synthesis Revisited: Guiding Technology Mapping with Primitive Logic Gate Placement
Authors:
Hongyang Pan,
Cunqing Lan,
Yiting Liu,
Zhiang Wang,
Li Shang,
Xuan Zeng,
Fan Yang,
Keren Zhu
Abstract:
A typical VLSI design flow is divided into separate front-end logic synthesis and back-end physical design (PD) stages, which often require costly iterations between these stages to achieve design closure. Existing approaches face significant challenges, notably in utilizing feedback from physical metrics to better adapt and refine synthesis operations, and in establishing a unified and comprehensive metric. This paper introduces a new Primitive logic gate placement guided technology MAPping (PigMAP) framework to address these challenges. By approximating technology-independent spatial information, we develop a novel wirelength (WL) driven mapping algorithm to produce PD-friendly netlists. PigMAP is equipped with two schemes: a performance mode that focuses on optimizing the critical path WL to achieve high performance, and a power mode that aims to minimize the total WL, resulting in balanced power and performance outcomes. We evaluate our framework using the EPFL benchmark suites with ASAP7 technology, using the OpenROAD tool for place-and-route. Compared with OpenROAD flow scripts, performance mode reduces delay by 14% while increasing power consumption by only 6%. Meanwhile, power mode achieves a 3% improvement in delay and a 9% reduction in power consumption.
Submitted 14 August, 2024;
originally announced August 2024.
-
TaPS: A Performance Evaluation Suite for Task-based Execution Frameworks
Authors:
J. Gregory Pauloski,
Valerie Hayot-Sasson,
Maxime Gonthier,
Nathaniel Hudson,
Haochen Pan,
Sicheng Zhou,
Ian Foster,
Kyle Chard
Abstract:
Task-based execution frameworks, such as parallel programming libraries, computational workflow systems, and function-as-a-service platforms, enable the composition of distinct tasks into a single, unified application designed to achieve a computational goal. Task-based execution frameworks abstract the parallel execution of an application's tasks on arbitrary hardware. Research into these task executors has accelerated as computational sciences increasingly need to take advantage of parallel compute and/or heterogeneous hardware. However, the lack of evaluation standards makes it challenging to compare and contrast novel systems against existing implementations. Here, we introduce TaPS, the Task Performance Suite, to support continued research in parallel task executor frameworks. TaPS provides (1) a unified, modular interface for writing and evaluating applications using arbitrary execution frameworks and data management systems and (2) an initial set of reference synthetic and real-world science applications. We discuss how the design of TaPS supports the reliable evaluation of frameworks and demonstrate TaPS through a survey of benchmarks using the provided reference applications.
Submitted 13 August, 2024;
originally announced August 2024.
-
Reformulating Conversational Recommender Systems as Tri-Phase Offline Policy Learning
Authors:
Gangyi Zhang,
Chongming Gao,
Hang Pan,
Runzhe Teng,
Ruizhe Li
Abstract:
Existing Conversational Recommender Systems (CRS) predominantly utilize user simulators for training and evaluating recommendation policies. These simulators often oversimplify the complexity of user interactions by focusing solely on static item attributes, neglecting the rich, evolving preferences that characterize real-world user behavior. This limitation frequently leads to models that perform well in simulated environments but falter in actual deployment. Addressing these challenges, this paper introduces the Tri-Phase Offline Policy Learning-based Conversational Recommender System (TCRS), which significantly reduces dependency on real-time interactions and mitigates overfitting issues prevalent in traditional approaches. TCRS integrates a model-based offline learning strategy with a controllable user simulation that dynamically aligns with both personalized and evolving user preferences. Through comprehensive experiments, TCRS demonstrates enhanced robustness, adaptability, and accuracy in recommendations, outperforming traditional CRS models in diverse user scenarios. This approach not only provides a more realistic evaluation environment but also facilitates a deeper understanding of user behavior dynamics, thereby refining the recommendation process.
Submitted 7 September, 2024; v1 submitted 13 August, 2024;
originally announced August 2024.
-
Data Generation Scheme for Thermal Modality with Edge-Guided Adversarial Conditional Diffusion Model
Authors:
Guoqing Zhu,
Honghu Pan,
Qiang Wang,
Chao Tian,
Chao Yang,
Zhenyu He
Abstract:
In challenging low light and adverse weather conditions, thermal vision algorithms, especially object detection, have exhibited remarkable potential, contrasting with the frequent struggles encountered by visible vision algorithms. Nevertheless, the efficacy of thermal vision algorithms driven by deep learning models remains constrained by the paucity of available training data samples. To this end, this paper introduces a novel approach termed the edge guided conditional diffusion model (ECDM). This framework aims to produce meticulously aligned pseudo thermal images at the pixel level, leveraging edge information extracted from visible images. By utilizing edges as contextual cues from the visible domain, the diffusion model achieves meticulous control over the delineation of objects within the generated images. To alleviate the impact of visible-specific edge information that should not appear in the thermal domain, a two-stage modality adversarial training strategy is proposed to filter it out from the generated images by differentiating the visible and thermal modalities. Extensive experiments on LLVIP demonstrate ECDM's superiority over existing state-of-the-art approaches in terms of image generation quality.
Submitted 7 August, 2024;
originally announced August 2024.
-
NeurDB: On the Design and Implementation of an AI-powered Autonomous Database
Authors:
Zhanhao Zhao,
Shaofeng Cai,
Haotian Gao,
Hexiang Pan,
Siqi Xiang,
Naili Xing,
Gang Chen,
Beng Chin Ooi,
Yanyan Shen,
Yuncheng Wu,
Meihui Zhang
Abstract:
Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper introduces NeurDB, an AI-powered autonomous database that deepens the fusion of AI and databases with adaptability to data and workload drift. NeurDB establishes a new in-database AI ecosystem that seamlessly integrates AI workflows within the database. This integration enables efficient and effective in-database AI analytics and fast-adaptive learned system components. Empirical evaluations demonstrate that NeurDB substantially outperforms existing solutions in managing AI analytics tasks, with the proposed learned components more effectively handling environmental dynamism than state-of-the-art approaches.
Submitted 6 August, 2024;
originally announced August 2024.
-
Study of Wide-Field-of-View X-ray Observations of the Virgo Cluster Using the Lobster Eye Imager for Astronomy
Authors:
Wen-Cheng Feng,
Shu-Mei Jia,
Hai-Hui Zhao,
Heng Yu,
Hai-Wu Pan,
Cheng-Kui Li,
Yu-Lin Cheng,
Shan-Shan Weng,
Yong Chen,
Yuan Liu,
Zhi-Xing Ling,
Chen Zhang
Abstract:
The Lobster Eye Imager for Astronomy (LEIA) is the pathfinder of the wide-field X-ray telescope used in the Einstein Probe mission. In this study, we present an image of the Virgo Cluster taken by LEIA in the 0.5-4.5 keV band with an exposure time of $\sim$17.3 ks in the central region. This extended emission is generally consistent with the results obtained by ROSAT. However, the field is affected by bright point sources due to the instrument's Point Spread Function (PSF) effect. Through fitting of the LEIA spectrum of the Virgo Cluster, we obtained a temperature of $2.1^{+0.3}_{-0.1}$ keV, which is consistent with the XMM-Newton results ($\sim$2.3 keV). Above 1.6 keV, the spectrum is dominated by the X-ray background. In summary, this study validates LEIA's extended source imaging and spectral resolution capabilities for the first time.
Submitted 31 July, 2024;
originally announced August 2024.
-
Braiding Induced by Finite-Size Effect in One-Dimensional Topological Superconductors
Authors:
Hongfa Pan,
Zhengtian Li,
Jinxiong Jia,
Zhenhua Qiao
Abstract:
We investigate the transport properties of Majorana zero modes (MZMs) and Majorana Kramers pairs (MKPs) in one-dimensional topological superconductors. An effective model is established for braiding of MZMs and MKPs. We employ the $d_{x^{2}-y^{2}}$-wave topological superconductors to embody the effective model for braiding of MKPs by utilizing finite-size effects and locally tunable coupling parameters. We show how to construct the state initialization and readout via gate control. We also use this method for braiding MZMs in s-wave topological superconductors. Our proposal presents a promising avenue for experimentally verifying the non-Abelian statistical properties of MZMs and MKPs, with implications for topological quantum computing.
Submitted 25 July, 2024;
originally announced July 2024.
-
Rabi and Ramsey oscillations of a Majorana qubit in a quantum dot-superconductor array
Authors:
Haining Pan,
Sankar Das Sarma,
Chun-Xiao Liu
Abstract:
The Kitaev chain can be engineered within a quantum dot-superconductor array, hosting Majorana zero modes at fine-tuned sweet spots. In this work, we propose and simulate the occurrence of Rabi and Ramsey oscillations to feasibly construct a minimal Majorana qubit in the quantum dot setup. Our real-time results incorporate realistic effects, e.g., charge noise and leakage, reflecting the latest experimental progress. We demonstrate that Majorana qubits with larger energy gaps exhibit significantly enhanced performance -- longer dephasing times, higher quality factors, reduced leakage probabilities, and improved visibilities -- compared to those with smaller gaps and with conventional quantum dot-based charge qubits. We introduce a method for reading out Majorana qubits via quantum capacitance measurements. Our work paves the way for future experiments on realizing Majorana qubits in quantum dot-superconductor arrays.
Submitted 23 July, 2024;
originally announced July 2024.
-
Observation of Ferromagnetic Phase in the Second Moiré Band of Twisted MoTe2
Authors:
Liheng An,
Haiyang Pan,
Wen-Xuan Qiu,
Naizhou Wang,
Shihao Ru,
Qinghai Tan,
Xuran Dai,
Xiangbin Cai,
Qiuyu Shang,
Xiufang Lu,
Hao Jiang,
Xiaodan Lyu,
Kenji Watanabe,
Takashi Taniguchi,
Fengcheng Wu,
Wei-bo Gao
Abstract:
Flat bands and electron correlation in moiré lattices give rise to many exotic phases, including Mott insulators, superconductivity, and topological states. Within the first moiré band, integer and fractional quantum anomalous Hall effects have been observed in twisted bilayer MoTe2 (tMoTe2) at one hole doping and fractional doping per moiré unit cell, respectively. When the second moiré band is fully hole doped, quantum spin Hall insulator has also been reported in tMoTe2 at a certain twist angle. Exotic topological states together with ferromagnetic (FM) states in the high moiré band can potentially exist as well. In this study, we report the observation of a FM phase in the second moiré band in tMoTe2. The FM phase can be tuned by both the doping level and displacement field. At filling around 2.58 holes per moiré unit cell, the FM phase reaches a Curie temperature of 3.5 K. A large displacement field can suppress the FM phase, like the FM phase at the filling of -1. Our results demonstrate the realization of time-reversal symmetry-breaking states in the higher moiré bands in tMoTe2.
Submitted 18 July, 2024;
originally announced July 2024.
-
X-ray Sources Classification Using Machine Learning: A Study with EP-WXT Pathfinder LEIA
Authors:
Xiaoxiong Zuo,
Yihan Tao,
Yuan Liu,
Yunfei Xu,
Wenda Zhang,
Haiwu Pan,
Hui Sun,
Zhen Zhang,
Chenzhou Cui,
Weimin Yuan
Abstract:
X-ray observations play a crucial role in time-domain astronomy. The Einstein Probe (EP), a recently launched X-ray astronomical satellite, emerges as a forefront player in the field of time-domain astronomy and high-energy astrophysics. With a focus on systematic surveys in the soft X-ray band, EP aims to discover high-energy transients and monitor variable sources in the universe. To achieve these objectives, a quick and reliable classification of observed sources is essential. In this study, we developed a machine learning classifier for autonomous source classification using data from the EP-WXT Pathfinder Lobster Eye Imager for Astronomy (LEIA) and EP-WXT simulations. The proposed Random Forest classifier, built on selected features derived from light curves, energy spectra, and location information, achieves an accuracy of approximately 95% on EP simulation data and 98% on LEIA observational data. The classifier is integrated into the LEIA data processing pipeline, serving as a tool for manual validation and rapid classification during observations. This paper presents an efficient method for the classification of X-ray sources based on single observations, along with implications of most effective features for the task. This work facilitates rapid source classification for the EP mission and also provides valuable insights into feature selection and classification techniques for enhancing the efficiency and accuracy of X-ray source classification that can be adapted to other X-ray telescope data.
Submitted 16 July, 2024;
originally announced July 2024.
-
Octopus: Experiences with a Hybrid Event-Driven Architecture for Distributed Scientific Computing
Authors:
Haochen Pan,
Ryan Chard,
Sicheng Zhou,
Alok Kamatar,
Rafael Vescovi,
Valerie Hayot-Sasson,
André Bauer,
Maxime Gonthier,
Kyle Chard,
Ian Foster
Abstract:
Scientific research increasingly relies on distributed computational resources, storage systems, networks, and instruments, ranging from HPC and cloud systems to edge devices. Event-driven architecture (EDA) benefits applications targeting distributed research infrastructures by enabling the organization, communication, processing, reliability, and security of events generated from many sources. To support the development of scientific EDA, we introduce Octopus, a hybrid, cloud-to-edge event fabric designed to link many local event producers and consumers with cloud-hosted brokers. Octopus can be scaled to meet demand, permits the deployment of highly available Triggers for automatic event processing, and enforces fine-grained access control. We identify requirements in self-driving laboratories, scientific data automation, online task scheduling, epidemic modeling, and dynamic workflow management use cases, and present results demonstrating Octopus' ability to meet those requirements. Octopus supports producing and consuming events at a rate of over 4.2 M and 9.6 M events per second, respectively, from distributed clients.
Submitted 16 July, 2024;
originally announced July 2024.
-
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding
Authors:
Huitong Pan,
Qi Zhang,
Cornelia Caragea,
Eduard Dragut,
Longin Jan Latecki
Abstract:
Flowcharts are graphical tools for representing complex concepts in concise visual representations. This paper introduces the FlowLearn dataset, a resource tailored to enhance the understanding of flowcharts. FlowLearn contains complex scientific flowcharts and simulated flowcharts. The scientific subset contains 3,858 flowcharts sourced from scientific literature and the simulated subset contains 10,000 flowcharts created using a customizable script. The dataset is enriched with annotations for visual components, OCR, Mermaid code representation, and VQA question-answer pairs. Despite the proven capabilities of Large Vision-Language Models (LVLMs) in various visual understanding tasks, their effectiveness in decoding flowcharts - a crucial element of scientific communication - has yet to be thoroughly investigated. The FlowLearn test set is crafted to assess the performance of LVLMs in flowchart comprehension. Our study thoroughly evaluates state-of-the-art LVLMs, identifying existing limitations and establishing a foundation for future enhancements in this relatively underexplored domain. For instance, in tasks involving simulated flowcharts, GPT-4V achieved the highest accuracy (58%) in counting the number of nodes, while Claude recorded the highest accuracy (83%) in OCR tasks. Notably, no single model excels in all tasks within the FlowLearn framework, highlighting significant opportunities for further development.
Submitted 9 July, 2024; v1 submitted 6 July, 2024;
originally announced July 2024.
-
Enhanced Long-Tailed Recognition with Contrastive CutMix Augmentation
Authors:
Haolin Pan,
Yong Guo,
Mianjie Yu,
Jian Chen
Abstract:
Real-world data often follows a long-tailed distribution, where a few head classes account for most of the data and a large number of tail classes contain only very limited samples. In practice, deep models often generalize poorly on tail classes due to the imbalanced distribution. To tackle this, data augmentation has become an effective remedy that synthesizes new samples for tail classes. One popular approach is CutMix, which explicitly mixes images of tail classes with others and constructs the labels according to the ratio of the areas cropped from the two images. However, the area-based labels entirely ignore the inherent semantic information of the augmented samples, often leading to misleading training signals. To address this issue, we propose Contrastive CutMix (ConCutMix), which constructs augmented samples with semantically consistent labels to boost the performance of long-tailed recognition. Specifically, we compute the similarities between samples in the semantic space learned by contrastive learning, and use them to rectify the area-based labels. Experiments show that ConCutMix significantly improves the accuracy on tail classes as well as the overall performance. For example, based on ResNeXt-50, we improve the overall accuracy on ImageNet-LT by 3.0% thanks to the significant improvement of 3.3% on tail classes. We highlight that the improvement also generalizes well to other benchmarks and models. Our code and pretrained models are available at https://github.com/PanHaulin/ConCutMix.
Submitted 5 July, 2024;
originally announced July 2024.
-
Small $x$ asymptotics for special function solutions of Painlevé III equation
Authors:
Hao Pan,
Andrei Prokhorov
Abstract:
In this paper we compute the small $x$ asymptotics of the special function solutions of the Painlevé-III equation. We use a representation of the solutions in terms of Hankel determinants of Bessel functions, which seems to be new. The Hankel determinants can be rewritten as multiple contour integrals using the Andréief identity. Finally, the small $x$ asymptotics are obtained using elementary asymptotic methods.
Submitted 5 July, 2024;
originally announced July 2024.
-
Neutral atomic and molecular gas dynamics in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496
Authors:
Sebastian Laudage,
Cosima Eibensteiner,
Frank Bigiel,
Adam K. Leroy,
Sharon Meidt,
Eva Schinnerer,
W. J. G. de Blok,
Miguel Querejeta,
Sophia Stuber,
Dario Colombo,
Erik Rosolowsky,
D. J. Pisano,
Dyas Utomo,
Rebecca C. Levy,
Ralf Klessen,
Yixian Cao,
Eric W. Koch,
Sushma Kurapati,
Patricia Sanchez-Blazquez,
Justus Neumann,
Lukas Neumann,
Hsi-An Pan,
Thomas G. Williams
Abstract:
Neutral atomic gas (HI) effectively traces galactic dynamics across mid to large galactocentric radii. However, its limitations in observing small-scale changes within the central few kiloparsecs, coupled with the often observed HI deficit in galactic centers, necessitate using molecular gas emission as the preferred tracer in these regions. Understanding the dynamics of both neutral atomic and molecular gas is crucial for a more complete understanding of how galaxies evolve, funnel gas from the outer disk into their central parts, and eventually form stars. In this work we aim to quantify the dynamics of both the neutral atomic and the molecular gas in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496 using new MeerKAT-HI observations together with ALMA CO (2-1) observations from the PHANGS collaboration. We use the analysis tool 3D-Barolo to fit tilted ring models to the HI and CO observations. We test a combined approach that uses the HI to constrain the true disk orientation parameters before applying these to the CO datasets. This paper sets expectations for the results of the upcoming high-resolution HI coverage of many galaxies in the PHANGS-ALMA sample using MeerKAT or VLA, to establish a robust methodology for characterizing galaxy orientations and deriving dynamics from combining new HI with existing CO data.
Submitted 5 July, 2024;
originally announced July 2024.
-
PHANGS-MeerKAT and MHONGOOSE HI observations of nearby spiral galaxies: physical drivers of the molecular gas fraction, $R_{\mathrm{mol}}$
Authors:
Cosima Eibensteiner,
Jiayi Sun,
Frank Bigiel,
Adam K. Leroy,
Eva Schinnerer,
Erik Rosolowsky,
Sushma Kurapati,
D. J. Pisano,
W. J. G. de Blok,
Ashley T. Barnes,
Mallory Thorp,
Dario Colombo,
Eric W. Koch,
I-Da Chiang,
Eve C. Ostriker,
Eric J. Murphy,
Nikki Zabel,
Sebastian Laudage,
Filippo M. Maccagni,
Julia Healy,
Srikrishna Sekhar,
Dyas Utomo,
Jakob den Brok,
Yixian Cao,
Mélanie Chevance
, et al. (14 additional authors not shown)
Abstract:
The molecular-to-atomic gas ratio is crucial to the evolution of the interstellar medium in galaxies. We investigate the balance between the atomic ($Σ_{\rm HI}$) and molecular gas ($Σ_{\rm H2}$) surface densities in eight nearby star-forming galaxies using new high-quality observations from MeerKAT and ALMA (for HI and CO, respectively). We define the molecular gas ratio as $R_{\rm mol} = Σ_{\rm H2} / Σ_{\rm HI}$ and measure how it depends on local conditions in the galaxy disks using multi-wavelength observations. We find that, depending on the galaxy, HI is detected at $>3σ$ out to 20-120 kpc in galactocentric radius ($r_{\rm gal}$). The typical radius at which $Σ_{\rm HI}$ reaches 1 $\rm M_\odot~pc^{-2}$ is $r_{\rm HI}\approx22$ kpc, which corresponds to 1-3 times the optical radius ($r_{25}$). $R_{\rm mol}$ correlates best with the dynamical equilibrium pressure, P$_{\rm DE}$, among potential drivers studied, with a median correlation coefficient of $\langle ρ \rangle = 0.89$. Correlations between $R_{\rm mol}$ and star formation rate, total gas and stellar surface density, metallicity, and $Σ_{\rm SFR}$/P$_{\rm DE}$ are present but somewhat weaker. Our results also show a direct correlation between P$_{\rm DE}$ and $Σ_{\rm SFR}$, supporting self-regulation models. Quantitatively, we measure similar scalings as previous works and attribute the modest differences that we find to the effect of varying resolution and sensitivity. At $r_{\rm gal} {\gtrsim}0.4~r_{25}$, atomic gas dominates over molecular gas, and at the balance of these two gas phases, we find that the baryon mass is dominated by stars, with $Σ_{*} > 5~Σ_{\rm gas}$. Our study constitutes an important step in the statistical investigation of how local galaxy properties impact the conversion from atomic to molecular gas in nearby galaxies.
Submitted 1 July, 2024;
originally announced July 2024.
-
Hybrid RAG-empowered Multi-modal LLM for Secure Healthcare Data Management: A Diffusion-based Contract Theory Approach
Authors:
Cheng Su,
Jinbo Wen,
Jiawen Kang,
Yonghua Wang,
Hudan Pan,
M. Shamim Hossain
Abstract:
Secure data management and effective data sharing have become paramount in the rapidly evolving healthcare landscape. The advancement of generative artificial intelligence has positioned Multi-modal Large Language Models (MLLMs) as crucial tools for managing healthcare data. MLLMs can support multi-modal inputs and generate diverse types of content by leveraging large-scale training on vast amounts of multi-modal data. However, critical challenges persist in developing medical MLLMs, including healthcare data security and freshness issues, affecting the output quality of MLLMs. In this paper, we propose a hybrid Retrieval-Augmented Generation (RAG)-empowered medical MLLMs framework for healthcare data management. This framework leverages a hierarchical cross-chain architecture to facilitate secure data training. Moreover, it enhances the output quality of MLLMs through hybrid RAG, which employs multi-modal metrics to filter various unimodal RAG results and incorporates these retrieval results as additional inputs to MLLMs. Additionally, we employ age of information to indirectly evaluate the data freshness impact of MLLMs and utilize contract theory to incentivize healthcare data holders to share fresh data, mitigating information asymmetry in data sharing. Finally, we utilize a generative diffusion model-based reinforcement learning algorithm to identify the optimal contract for efficient data sharing. Numerical results demonstrate the effectiveness of the proposed schemes, which achieve secure and efficient healthcare data management.
Submitted 1 July, 2024;
originally announced July 2024.
-
Supercharging Federated Learning with Flower and NVIDIA FLARE
Authors:
Holger R. Roth,
Daniel J. Beutel,
Yan Cheng,
Javier Fernandez Marques,
Heng Pan,
Chester Chen,
Zhihong Zhang,
Yuhong Wen,
Sean Yang,
Isaac Yang,
Yuan-Ting Hsieh,
Ziyue Xu,
Daguang Xu,
Nicholas D. Lane,
Andrew Feng
Abstract:
Several open-source systems, such as Flower and NVIDIA FLARE, have been developed in recent years while focusing on different aspects of federated learning (FL). Flower is dedicated to implementing a cohesive approach to FL, analytics, and evaluation. Over time, Flower has cultivated extensive strategies and algorithms tailored for FL application development, fostering a vibrant FL community in research and industry. Conversely, FLARE has prioritized the creation of an enterprise-ready, resilient runtime environment explicitly designed for FL applications in production environments. In this paper, we describe our initial integration of both frameworks and show how they can work together to supercharge the FL ecosystem as a whole. Through the seamless integration of Flower and FLARE, applications crafted within the Flower framework can effortlessly operate within the FLARE runtime environment without necessitating any modifications. This initial integration streamlines the process, eliminating complexities and ensuring smooth interoperability between the two platforms, thus enhancing the overall efficiency and accessibility of FL applications.
Submitted 22 July, 2024; v1 submitted 21 May, 2024;
originally announced July 2024.
-
FAST survey of H I and OH absorption towards extragalactic radio sources
Authors:
Yogesh Chandola,
D. J. Saikia,
Yin-Zhe Ma,
Zheng Zheng,
Chao-Wei Tsai,
Di Li,
Denis Tramonte,
Hengxing Pan
Abstract:
Neutral atomic hydrogen and molecular gas in the host galaxies of radio active galactic nuclei (AGN) can be traced using H I 21-cm and OH-1667 MHz absorption lines to understand the fueling and feedback processes. We present the results of an H I and OH absorption survey with the Five-hundred-meter Aperture Spherical radio Telescope (FAST) towards 40 radio sources of low-intermediate radio luminosity ($\sim$10$^{23}$-10$^{26}$ W Hz$^{-1}$ at 1.4 GHz), red mid-infrared color (W2[4.6 $μ$m]$-$W3[12 $μ$m] $>$ 2.5 mag) and redshift up to 0.35. From 13 sources with good data at H I observing frequencies, we report the detection of H I absorption towards 8 sources, 5 of which are new detections including 4 in the redshift range 0.25 to 0.35. Our detection rates are consistent with our previous results with dependence on the star-formation history of the host galaxy reflected in the mid-infrared \textit{WISE} W2$-$W3 colors and the compactness of the radio source. We find no significant dependence of detection rates on radio luminosity or redshift. We also find that H I column densities are anti-correlated with the low-frequency spectral indices ($α_{\rm 150 MHz}^{\rm 1.4 GHz}$, $S_ν\propto ν^{-α}$). We do not have any detection from 23 sources with good data at OH observing frequencies. However, by stacking the spectra we estimate the 3$σ$ upper limit of OH column density to be 2.27$\times$10$^{14}$$T_{\rm ex}$/10 K $\times$1/$f_{\rm c}$ cm$^{-2}$. By stacking the OH spectra for 7 associated H I absorbers, we get a 3$σ$ upper limit of 3.47$\times$10$^{14}$ $T_{\rm ex}$/10 K $\times$1/$f_{\rm c}$ cm$^{-2}$ on OH column density and 1.78$\times$10$^{-7}$ on [OH]/[H I] ratio.
Submitted 28 June, 2024;
originally announced June 2024.
-
GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension
Authors:
Jiafeng Liang,
Shixin Jiang,
Zekun Wang,
Haojie Pan,
Zerui Chen,
Zheng Chu,
Ming Liu,
Ruiji Fu,
Zhongyuan Wang,
Bing Qin
Abstract:
There are numerous instructional videos on the Internet, which provide tutorials for completing various tasks. Existing instructional video datasets only focus on specific steps at the video level and lack experiential guidelines at the task level, which can leave beginners struggling to learn new tasks for want of relevant experience. Moreover, specific steps without guidelines are trivial and unsystematic, making it difficult to provide a clear tutorial. To address these problems, we present the GUIDE (Guideline-Guided) dataset, which contains 3.5K videos of 560 instructional tasks in 8 domains related to our daily life. Specifically, we annotate each instructional task with a guideline, representing a common pattern shared by all task-related videos. On this basis, we annotate systematic specific steps, including their associated guideline steps, specific step descriptions, and timestamps. Our proposed benchmark consists of three sub-tasks to evaluate the comprehension ability of models: (1) Step Captioning: models have to generate captions for specific steps from videos. (2) Guideline Summarization: models have to mine the common pattern in task-related videos and summarize a guideline from them. (3) Guideline-Guided Captioning: models have to generate captions for specific steps under the guidance of a guideline. We evaluate many foundation models with GUIDE and perform in-depth analysis. Given the diversity and practicality of GUIDE, we believe that it can be used as a better benchmark for instructional video comprehension.
Submitted 26 June, 2024;
originally announced June 2024.
-
SciDMT: A Large-Scale Corpus for Detecting Scientific Mentions
Authors:
Huitong Pan,
Qi Zhang,
Cornelia Caragea,
Eduard Dragut,
Longin Jan Latecki
Abstract:
We present SciDMT, an enhanced and expanded corpus for scientific mention detection, offering a significant advancement over existing related resources. SciDMT contains annotated scientific documents for datasets (D), methods (M), and tasks (T). The corpus consists of two components: 1) the SciDMT main corpus, which includes 48 thousand scientific articles with over 1.8 million weakly annotated mentions in the form of in-text spans, and 2) an evaluation set, which comprises 100 scientific articles manually annotated for evaluation purposes. To the best of our knowledge, SciDMT is the largest corpus for scientific entity mention detection. The corpus's scale and diversity are instrumental in developing and refining models for tasks such as indexing scientific papers, enhancing information retrieval, and improving the accessibility of scientific knowledge. We demonstrate the corpus's utility through experiments with advanced deep learning architectures like SciBERT and GPT-3.5. Our findings establish performance baselines and highlight unresolved challenges in scientific mention detection. SciDMT serves as a robust benchmark for the research community, encouraging the development of innovative models to further the field of scientific information extraction.
Submitted 20 June, 2024;
originally announced June 2024.
-
Permeability distribution of gas drainage of borehole with the different moisture content caused polar permeability effect
Authors:
Lei Zhang,
Yao Zhang,
Hongyu Pan,
Yan Cao,
Yuhang Chu,
Shihua Yang
Abstract:
To study the permeability characteristics of regions with different water content and different stress distributions in the radial direction of a borehole after hydraulic measures, an improved LFTD1812 triaxial permeability meter was used to measure the polar permeability characteristics of coal with different water content combinations. The porosity, permeability, pressure gradient, and seepage velocity of the different samples were analyzed, and the relationships among these quantities were discussed. The influence of moisture content on permeability was examined, and the directionality and polarization effect of permeability were identified. The results show that the relationship between permeability and porosity follows two trends, exponential and logarithmic, and that the porosity-permeability (φ-k) plane is divided into three influence regions: super-exponential (I), exponential (II), and logarithmic (III).
Submitted 18 June, 2024;
originally announced June 2024.
-
The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge
Authors:
Hongpeng Pan,
Shifeng Yi,
Shouwei Yang,
Lei Qi,
Bing Hu,
Yi Xu,
Yang Yang
Abstract:
This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging the vision-language model (VLM) for object detection. However, on specific datasets, VLM may encounter the problem where the detected targets are misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of VLM and the application of fine-tuning methods based on pseudo-labels. To address this issue, we propose the VLM+ framework, which integrates the multimodal large language model (MM-LLM). Specifically, we use MM-LLM to generate a series of referential expressions for each category. Based on the VLM predictions and the given annotations, we select the best referential expression for each category by matching the maximum IoU. Subsequently, we use these referential expressions to generate pseudo-labels for all images in the training set and then combine them with the original labeled data to fine-tune the VLM. Additionally, we employ iterative pseudo-label generation and optimization to further enhance the performance of the VLM. Our approach achieves 32.56 mAP in the final test.
Submitted 17 June, 2024;
originally announced June 2024.
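The abstract above selects, for each category, the referential expression whose predictions best match the annotations by maximum IoU. The following is only a minimal sketch of that idea under stated assumptions, not the authors' implementation: boxes are assumed to be axis-aligned `(x1, y1, x2, y2)` tuples, and the helper names `iou` and `best_expression` as well as the scoring rule (mean over annotations of the best IoU) are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def best_expression(predictions: Dict[str, List[Box]],
                    annotations: List[Box]) -> str:
    """Pick the expression whose predicted boxes overlap the annotations best.

    Hypothetical scoring rule: for every annotated box take the highest IoU
    with any predicted box, then average over the annotations.
    """
    def score(boxes: List[Box]) -> float:
        if not annotations:
            return 0.0
        best = [max((iou(gt, p) for p in boxes), default=0.0)
                for gt in annotations]
        return sum(best) / len(annotations)

    return max(predictions, key=lambda expr: score(predictions[expr]))
```

In this sketch, `predictions` maps each candidate expression for one category to the boxes a detector returned when prompted with it; the chosen expression would then be the one reused for pseudo-labelling.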
-
A 260 pc resolution ALMA map of HCN(1-0) in the galaxy NGC 4321
Authors:
Lukas Neumann,
Frank Bigiel,
Ashley T. Barnes,
Molly J. Gallagher,
Adam Leroy,
Antonio Usero,
Erik Rosolowsky,
Ivana Bešlić,
Médéric Boquien,
Yixian Cao,
Mélanie Chevance,
Dario Colombo,
Daniel A. Dale,
Cosima Eibensteiner,
Kathryn Grasha,
Jonathan D. Henshaw,
María J. Jiménez-Donaire,
Sharon Meidt,
Shyam H. Menon,
Eric J. Murphy,
Hsi-An Pan,
Miguel Querejeta,
Toshiki Saito,
Eva Schinnerer,
Sophia K. Stuber
, et al. (2 additional authors not shown)
Abstract:
The star formation rate (SFR) is tightly connected to the amount of dense gas in molecular clouds. However, it is not fully understood how the relationship between dense molecular gas and star formation varies within galaxies and in different morphological environments. In this work, we study dense gas and star formation in the nearby spiral galaxy NGC 4321 to test how the amount of dense gas and its ability to form stars varies with environmental properties at 260 pc scales. We present new ALMA observations of HCN(1-0) line emission. Combined with existing CO(2-1) observations from ALMA, and H-alpha from MUSE, as well as F2100W from JWST to trace the SFR, we measure the HCN/CO line ratio, a proxy for the dense gas fraction and SFR/HCN, a proxy for the star formation efficiency of the dense gas. Towards the centre of the galaxy, HCN/CO systematically increases while SFR/HCN decreases, but these ratios stay roughly constant throughout the disc. Spiral arms, interarm regions, and bar ends show similar HCN/CO and SFR/HCN. On the bar, there is a significantly lower SFR/HCN at a similar HCN/CO. We conclude that the centres of galaxies show the strongest environmental influence on dense gas and star formation, suggesting either that clouds couple strongly to the surrounding pressure or that HCN is tracing more of the bulk molecular gas that is less efficiently converted into stars. On the contrary, across the disc of NGC 4321, where the ISM pressure is typically low, SFR/HCN does not show large variations (< 0.3 dex) in agreement with Galactic observations of molecular clouds. Despite the large variations across environments and physical conditions, HCN/CO is a good predictor of the mean molecular gas surface density at 260 pc scales.
Submitted 17 June, 2024;
originally announced June 2024.
-
SparseDet: A Simple and Effective Framework for Fully Sparse LiDAR-based 3D Object Detection
Authors:
Lin Liu,
Ziying Song,
Qiming Xia,
Feiyang Jia,
Caiyan Jia,
Lei Yang,
Hongyu Pan
Abstract:
LiDAR-based sparse 3D object detection plays a crucial role in autonomous driving applications due to its computational efficiency advantages. Existing methods either use the features of a single central voxel as an object proxy, or treat an aggregated cluster of foreground points as an object proxy. However, the former lacks the ability to aggregate contextual information, resulting in insufficient information expression in object proxies. The latter relies on multi-stage pipelines and auxiliary tasks, which reduce the inference speed. To maintain the efficiency of the sparse framework while fully aggregating contextual information, in this work, we propose SparseDet, which designs sparse queries as object proxies. It introduces two key modules, the Local Multi-scale Feature Aggregation (LMFA) module and the Global Feature Aggregation (GFA) module, aiming to fully capture the contextual information, thereby enhancing the ability of the proxies to represent objects. The LMFA sub-module achieves feature fusion across different scales for sparse key voxels via coordinate transformations and nearest-neighbor relationships to capture object-level details and local contextual information, while the GFA sub-module uses self-attention mechanisms to selectively aggregate the features of the key voxels across the entire scene for capturing scene-level contextual information. Experiments on nuScenes and KITTI demonstrate the effectiveness of our method. Specifically, on nuScenes, SparseDet surpasses the previous best sparse detector VoxelNeXt by 2.2% mAP with 13.5 FPS, and on KITTI, it surpasses VoxelNeXt by 1.12% $\mathbf{AP_{3D}}$ on hard level tasks with 17.9 FPS.
Submitted 16 June, 2024;
originally announced June 2024.
-
Room-temperature tunable tunneling magnetoresistance in Fe3GaTe2/WSe2/Fe3GaTe2 van der Waals heterostructures
Authors:
Haiyang Pan,
Anil Kumar Singh,
Chusheng Zhang,
Xueqi Hu,
Jiayu Shi,
Liheng An,
Naizhou Wang,
Ruihuan Duan,
Zheng Liu,
Stuart S. P. Parkin,
Pritam Deb,
Weibo Gao
Abstract:
The exceptional properties of two-dimensional (2D) magnet materials present a novel approach to fabricate functional magnetic tunnel junctions (MTJ) by constructing full van der Waals (vdW) heterostructures with atomically sharp and clean interfaces. The exploration of vdW MTJ devices with high working temperature and adjustable functionalities holds great potential for advancing the application of 2D materials in magnetic sensing and data storage. Here, we report the observation of highly tunable room-temperature tunneling magnetoresistance through electronic means in a full vdW Fe3GaTe2/WSe2/Fe3GaTe2 MTJ. The spin valve effect of the MTJ can be detected even with the current below 1 nA, both at low and room temperatures, yielding a tunneling magnetoresistance (TMR) of 340% at 2 K and 50% at 300 K, respectively. Importantly, the magnitude and sign of TMR can be modulated by a DC bias current, even at room temperature, a capability that was previously unrealized in full vdW MTJs. This tunable TMR arises from the contribution of energy-dependent localized spin states in the metallic ferromagnet Fe3GaTe2 during tunnel transport when a finite electrical bias is applied. Our work offers a new perspective for designing and exploring room-temperature tunable spintronic devices based on vdW magnet heterostructures.
Submitted 5 June, 2024;
originally announced June 2024.
-
High-resolution Observation of Blowout Jets Regulated by Sunspot Rotation
Authors:
Tingyu Gou,
Rui Liu,
Yang Su,
Astrid M. Veronig,
Hanya Pan,
Runbin Luo,
Weiqun Gan
Abstract:
Coronal jets are believed to be the miniature version of large-scale solar eruptions. In particular, the eruption of a mini-filament inside the base arch is suggested to be the trigger and even driver of blowout jets. Here we propose an alternative triggering mechanism, based on high-resolution H-alpha observations of a blowout jet associated with a mini-filament and an M1.2-class flare. The mini-filament remains largely stationary during the blowout jet, except that it is straddled by flare loops connecting two flare ribbons, indicating that the magnetic arcade embedding the mini-filament has been torn into two parts, with the upper part escaping with the blowout jet. In the wake of the flare, the southern end of the mini-filament fans out like neighboring fibrils, indicative of mass and field exchanges between the mini-filament and the fibrils. The blowout jet is preceded by a standard jet. With H-alpha fibrils moving toward the single-strand spire in a sweeping fashion, the standard jet transitions to the blowout jet. The similar pattern of standard-to-blowout jet transition occurs in an earlier C-class flare before the mini-filament forms. The spiraling morphology and sweeping direction of these fibrils are suggestive of their footpoints being dragged by the leading sunspot that undergoes clockwise rotation for over two days. Soon after the sunspot rotation reaches a peak angular speed as fast as 10 deg/hr, the dormant active region becomes flare-productive, and the mini-filament forms through the interaction of moving magnetic features from the rotating sunspot with satellite spots/pores. Hence, we suggest that the sunspot rotation plays a key role in building up free energy for flares and jets and in triggering blowout jets by inducing sweeping motions of fibrils.
Submitted 4 June, 2024;
originally announced June 2024.
-
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Authors:
Yue Ma,
Hongyu Liu,
Hongfa Wang,
Heng Pan,
Yingqing He,
Junkun Yuan,
Ailing Zeng,
Chengfei Cai,
Heung-Yeung Shum,
Wei Liu,
Qifeng Chen
Abstract:
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equips the powerful Stable Diffusion model with two well-designed technologies. Specifically, we first adopt a new explicit motion signal, namely expression-aware landmark, to guide the animation process. We discover this landmark can not only ensure the accurate motion alignment between the reference portrait and target motion during inference but also increase the ability to portray exaggerated expressions (i.e., large pupil movements) and avoid identity leakage. Then, we propose a facial fine-grained loss to improve the model's ability of subtle expression perception and reference portrait appearance reconstruction by using both expression and facial masks. Accordingly, our method demonstrates significant performance in controlling the expression of freestyle portraits, including real humans, cartoons, sculptures, and even animals. By leveraging a simple and effective progressive generation strategy, we extend our model to stable long-term animation, thus increasing its potential application value. To address the lack of a benchmark for this field, we introduce EmojiBench, a comprehensive benchmark comprising diverse portrait images, driving videos, and landmarks. We show extensive evaluations on EmojiBench to verify the superiority of Follow-Your-Emoji.
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
First results of AUP Nb3Sn quadrupole horizontal tests
Authors:
M. Baldini,
G. Ambrosio,
G. Apollinari,
J. Blowers,
R. Bossert,
R. Carcagno,
G. Chlachidze,
J. DiMarco,
S. Feher,
S. Krave,
V. Lombardo,
L. Martin,
C. Narug,
T. H. Nicol,
V. Nikolic,
A. Nobrega,
V. Marinozzi,
C. Orozco,
T. Page,
S. Stoynev,
T. Strauss,
M. Turenne,
D. Turrioni,
A. Vouris,
M. Yu
, et al. (26 additional authors not shown)
Abstract:
The Large Hadron Collider will soon undergo an upgrade to increase its luminosity by a factor of ~10 [1]. A crucial part of this upgrade will be replacement of the NbTi focusing magnets with Nb3Sn magnets that achieve a ~50% increase in the field strength. This will be the first ever large-scale implementation of Nb3Sn magnets in a particle accelerator. The High-Luminosity LHC Upgrade, HL-LHC is a CERN project with a world-wide collaboration. It is under construction and utilizes Nb3Sn Magnets (named MQXF) as key ingredients to increase tenfold the integrated luminosity delivered to the CMS and ATLAS experiments in the next decade.
The HL-LHC AUP is the US effort to contribute approximately 50% of the low-beta focusing magnets and crab cavities for the HL-LHC.
This paper presents the program to fabricate the Nb3Sn superconducting magnets. We report the status of the HL-LHC AUP project and present the results from horizontal tests of the first fully assembled cryo-assembly.
Submitted 28 May, 2024;
originally announced May 2024.
-
Efficient Visual Fault Detection for Freight Train via Neural Architecture Search with Data Volume Robustness
Authors:
Yang Zhang,
Mingying Li,
Huilin Pan,
Moyun Liu,
Yang Zhou
Abstract:
Deep learning-based fault detection methods have achieved significant success. In visual fault detection of freight trains, components of different classes differ greatly in their characteristics (scale variance), whereas components within a class differ little, which calls for scale-aware detectors. Moreover, the design of task-specific networks heavily relies on human expertise. As a consequence, neural architecture search (NAS), which automates the model design process, has gained considerable attention because of its promising performance. However, NAS is computationally intensive due to the large search space and huge data volume. In this work, we propose an efficient NAS-based framework for visual fault detection of freight trains to search for a task-specific detection head with multi-scale representation capacity. First, we design a scale-aware search space for discovering an effective receptive field in the head. Second, we explore the robustness of data volume to reduce search costs based on the specifically designed search space, and a novel sharing strategy is proposed to reduce memory and further improve search efficiency. Extensive experimental results demonstrate the effectiveness of our method with data volume robustness, which achieves 46.8 and 47.9 mAP on the Bottom View and Side View datasets, respectively. Our framework outperforms the state-of-the-art approaches and linearly decreases the search costs with reduced data volumes.
Submitted 27 May, 2024;
originally announced May 2024.
-
ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection
Authors:
Ziying Song,
Feiyang Jia,
Hongyu Pan,
Yadan Luo,
Caiyan Jia,
Guoxin Zhang,
Lin Liu,
Yang Ji,
Lei Yang,
Li Wang
Abstract:
In the field of 3D object detection tasks, fusing heterogeneous features from LiDAR and camera sensors into a unified Bird's Eye View (BEV) representation is a widely adopted paradigm. However, existing methods are often compromised by imprecise sensor calibration, resulting in feature misalignment in LiDAR-camera BEV fusion. Moreover, such inaccuracies result in errors in depth estimation for the camera branch, ultimately causing misalignment between LiDAR and camera BEV features. In this work, we propose a novel ContrastAlign approach that utilizes contrastive learning to enhance the alignment of heterogeneous modalities, thereby improving the robustness of the fusion process. Specifically, our approach includes the L-Instance module, which directly outputs LiDAR instance features within LiDAR BEV features. Then, we introduce the C-Instance module, which predicts camera instance features through RoI (Region of Interest) pooling on the camera BEV features. We propose the InstanceFusion module, which utilizes contrastive learning to generate similar instance features across heterogeneous modalities. We then use graph matching to calculate the similarity between the neighboring camera instance features and the similarity instance features to complete the alignment of instance features. Our method achieves state-of-the-art performance, with an mAP of 70.3%, surpassing BEVFusion by 1.8% on the nuScenes validation set. Importantly, our method outperforms BEVFusion by 7.3% under conditions with misalignment noise.
Submitted 5 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
-
FastDrag: Manipulate Anything in One Step
Authors:
Xuanjia Zhao,
Jian Guan,
Congyi Fan,
Dongli Xu,
Youtian Lin,
Haiwei Pan,
Pengming Feng
Abstract:
Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt $n$-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step drag-based image editing method, i.e., FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly promotes editing speeds. Meanwhile, null regions emerging after applying LWF are addressed by our proposed bilateral nearest neighbor interpolation (BNNI) strategy. This strategy interpolates these regions using similar features from neighboring areas, thus enhancing semantic integrity. Additionally, a consistency-preserving strategy is introduced to maintain the consistency between the edited and original images by adopting semantic information from the original image, saved as key and value pairs in self-attention module during diffusion inversion, to guide the diffusion sampling. Our FastDrag is validated on the DragBench dataset, demonstrating substantial improvements in processing time over existing methods, while achieving enhanced editing performance. Project page: https://fastdrag-site.github.io/ .
Submitted 6 June, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction
Authors:
Bingchen Yang,
Haiyong Jiang,
Hao Pan,
Peter Wonka,
Jun Xiao,
Guosheng Lin
Abstract:
Reverse engineering CAD models from raw geometry is a classic but challenging research problem. In particular, reconstructing the CAD modeling sequence from point clouds provides great interpretability and convenience for editing. To improve upon this problem, we introduce geometric guidance into the reconstruction network. Our proposed model, PS-CAD, reconstructs the CAD modeling sequence one step at a time. At each step, we provide two forms of geometric guidance. First, we provide the geometry of surfaces where the current reconstruction differs from the complete model as a point cloud. This helps the framework to focus on regions that still need work. Second, we use geometric analysis to extract a set of planar prompts, that correspond to candidate surfaces where a CAD extrusion step could be started. Our framework has three major components. Geometric guidance computation extracts the two types of geometric guidance. Single-step reconstruction computes a single candidate CAD modeling step for each provided prompt. Single-step selection selects among the candidate CAD modeling steps. The process continues until the reconstruction is completed. Our quantitative results show a significant improvement across all metrics. For example, on the dataset DeepCAD, PS-CAD improves upon the best published SOTA method by reducing the geometry errors (CD and HD) by 10%, and the structural error (ECD metric) by about 15%.
Submitted 23 May, 2024;
originally announced May 2024.
-
Local and nonlocal stochastic control of quantum chaos: Measurement- and control-induced criticality
Authors:
Haining Pan,
Sriram Ganeshan,
Thomas Iadecola,
Justin H. Wilson,
J. H. Pixley
Abstract:
We theoretically study the topology of the phase diagram of a family of quantum models inspired by the classical Bernoulli map under stochastic control. The quantum models inherit a control-induced phase transition from the classical model and also manifest an entanglement phase transition intrinsic to the quantum setting. This measurement-induced phase transition has been shown in various settings to either coincide or split off from the control transition, but a systematic understanding of the necessary and sufficient conditions for the two transitions to coincide in this case has so far been lacking. In this work, we generalize the control map to allow for either local or global control action. While this does not affect the classical aspects of the control transition that is described by a random walk, it significantly influences the quantum dynamics, leading to the universality class of the measurement-induced transition being dependent on the locality of the control operation. In the presence of a global control map, the two transitions coincide and the control-induced phase transition dominates the measurement-induced phase transition. Contrarily, the two transitions split in the presence of the local control map or additional projective measurements and generically take on distinct universality classes. For local control, the measurement-induced phase transition recovers the Haar logarithmic conformal field theory universality class found in feedback-free models. However, for global control, a novel universality class with correlation length exponent $ν\approx 0.7$ emerges from the interplay of control and projective measurements. This work provides a more refined understanding of the relationship between the control- and measurement-induced phase transitions.
Submitted 22 August, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
DCT-Based Decorrelated Attention for Vision Transformers
Authors:
Hongyi Pan,
Emadeldeen Hamdan,
Xin Zhu,
Koushik Biswas,
Ahmet Enis Cetin,
Ulas Bagci
Abstract:
Central to the Transformer architectures' effectiveness is the self-attention mechanism, a function that maps queries, keys, and values into a high-dimensional vector space. However, training the attention weights of queries, keys, and values is non-trivial from a state of random initialization. In this paper, we propose two methods. (i) We first address the initialization problem of Vision Transformers by introducing a simple, yet highly innovative, initialization approach utilizing Discrete Cosine Transform (DCT) coefficients. Our proposed DCT-based attention initialization marks a significant gain compared to traditional initialization strategies, offering a robust foundation for the attention mechanism. Our experiments reveal that the DCT-based initialization enhances the accuracy of Vision Transformers in classification tasks. (ii) We also recognize that since DCT effectively decorrelates image information in the frequency domain, this decorrelation is useful for compression because it allows the quantization step to discard many of the higher-frequency components. Based on this observation, we propose a novel DCT-based compression technique for the attention function of Vision Transformers. Since high-frequency DCT coefficients usually correspond to noise, we truncate the high-frequency DCT components of the input patches. Our DCT-based compression reduces the size of weight matrices for queries, keys, and values. While maintaining the same level of accuracy, our DCT-compressed Swin Transformers obtain a considerable decrease in the computational overhead.
Submitted 28 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
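The abstract above truncates the high-frequency DCT components of the input. As a rough one-dimensional illustration only (the paper works on image patches inside a Vision Transformer, and this is not the authors' code), the sketch below implements an orthonormal DCT-II and its inverse in plain Python and keeps only the first `keep` coefficients. The function names `dct2`, `idct2` and `truncate_high_freq` are hypothetical.

```python
import math
from typing import List


def _scale(k: int, n: int) -> float:
    # Orthonormal DCT-II scaling factor for coefficient k of an n-point signal.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a 1-D signal."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                           for i in range(n))
        for k in range(n)
    ]


def idct2(c: List[float]) -> List[float]:
    """Inverse of dct2 (the transpose of the orthonormal DCT-II matrix)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n))
        for i in range(n)
    ]


def truncate_high_freq(x: List[float], keep: int) -> List[float]:
    """Return only the `keep` lowest-frequency DCT coefficients of x."""
    return dct2(x)[:keep]
```

Keeping `keep` of `n` coefficients shrinks the representation from `n` to `keep` values; padding the kept coefficients with zeros and applying `idct2` gives a smoothed approximation of the original signal.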
-
ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models
Authors:
Rui Xu,
Jiepeng Wang,
Hao Pan,
Yang Liu,
Xin Tong,
Shiqing Xin,
Changhe Tu,
Taku Komura,
Wenping Wang
Abstract:
In this paper, we study an under-explored but important factor of diffusion generative models, i.e., the combinatorial complexity. Data samples are generally high-dimensional, and for various structured generation tasks, there are additional attributes which are combined to associate with data samples. We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by the existing training schemes of diffusion generative models, causing degraded test-time performance. We present a simple fix to this problem by constructing stochastic processes that fully exploit the combinatorial structures, hence the name ComboStoc. Using this simple strategy, we show that network training is significantly accelerated across diverse data modalities, including images and 3D structured shapes. Moreover, ComboStoc enables a new way of test-time generation which uses asynchronous time steps for different dimensions and attributes, thus allowing for varying degrees of control over them.
Submitted 24 May, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning
Authors:
Zheyuan Zhang,
Elif Keles,
Gorkem Durak,
Yavuz Taktak,
Onkar Susladkar,
Vandan Gorade,
Debesh Jha,
Asli C. Ormeci,
Alpay Medetalibeyoglu,
Lanhong Yao,
Bin Wang,
Ilkin Sevgi Isler,
Linkai Peng,
Hongyi Pan,
Camila Lopes Vendrami,
Amir Bourhani,
Yury Velichko,
Boqing Gong,
Concetto Spampinato,
Ayis Pyrros,
Pallavi Tiwari,
Derk C. F. Klatte,
Megan Engels,
Sanne Hoogenboom,
Candice W. Bolan
, et al. (13 additional authors not shown)
Abstract:
Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective study, we collected a large dataset (767 scans from 499 participants) of T1-weighted (T1W) and T2-weighted (T2W) abdominal MRI series from five centers between March 2004 and November 2022. We also collected CT scans of 1,350 patients from publicly available sources for benchmarking purposes. We developed a new pancreas segmentation method, called PanSegNet, combining the strengths of nnUNet and a Transformer network with a new linear attention module enabling volumetric computation. We tested PanSegNet's accuracy in cross-modality (a total of 2,117 scans) and cross-center settings with Dice and Hausdorff distance (HD95) evaluation metrics. We used Cohen's kappa statistics for intra and inter-rater agreement evaluation and paired t-tests for volume and Dice comparisons, respectively. For segmentation accuracy, we achieved Dice coefficients of 88.3% (std: 7.2%, at case level) with CT, 85.0% (std: 7.9%) with T1W MRI, and 86.3% (std: 6.4%) with T2W MRI. There was a high correlation for pancreas volume prediction with R^2 of 0.91, 0.84, and 0.85 for CT, T1W, and T2W, respectively. We found moderate inter-observer (0.624 and 0.638 for T1W and T2W MRI, respectively) and high intra-observer agreement scores. All MRI data is made available at https://osf.io/kysnj/. Our source code is available at https://github.com/NUBagciLab/PaNSegNet.
Submitted 25 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
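The abstract above reports segmentation accuracy as Dice coefficients. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice formula, 2|A∩B| / (|A| + |B|), on flattened binary masks. It is not taken from the PanSegNet code base; the function name `dice` and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks (0 or 1 entries)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0
```

A volumetric mask would be flattened to a single sequence before the call; a case-level score such as the ones quoted in the abstract would then be the mean of `dice` over all test cases.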
-
CTS: A Consistency-Based Medical Image Segmentation Model
Authors:
Kejia Zhang,
Lan Zhang,
Haiwei Pan,
Baolong Yu
Abstract:
In medical image segmentation tasks, diffusion models have shown significant potential. However, mainstream diffusion models suffer from drawbacks such as requiring multiple sampling steps and producing predictions slowly. Recently, consistency models, as a standalone generative network, have resolved this issue. Compared to diffusion models, consistency models can reduce sampling to a single step, not only achieving similar generative effects but also significantly speeding up training and prediction. However, they are not directly suitable for image segmentation tasks, and their application in the medical imaging field has not yet been explored. Therefore, this paper applies the consistency model to medical image segmentation tasks, designing multi-scale feature signal supervision modes and loss function guidance to achieve model convergence. Experiments verify that the CTS model obtains better medical image segmentation results with a single sampling step during the test phase.
Submitted 14 May, 2024;
originally announced May 2024.
-
Variable Substitution and Bilinear Programming for Aligning Partially Overlapping Point Sets
Authors:
Wei Lian,
Zhesen Cui,
Fei Ma,
Hang Pan,
Wangmeng Zuo
Abstract:
In many applications, the demand arises for algorithms capable of aligning partially overlapping point sets while remaining invariant to the corresponding transformations. This research presents a method designed to meet such requirements through minimization of the objective function of the robust point matching (RPM) algorithm. First, we show that the RPM objective is a cubic polynomial. Then, through variable substitution, we transform the RPM objective to a quadratic function. Leveraging the convex envelope of bilinear monomials, we proceed to relax the resulting objective function, thus obtaining a lower bound problem that can be conveniently decomposed into distinct linear assignment and low-dimensional convex quadratic program components, both amenable to efficient optimization. Furthermore, a branch-and-bound (BnB) algorithm is devised, which solely branches over the transformation parameters, thereby boosting convergence rate. Empirical evaluations demonstrate better robustness of the proposed methodology against non-rigid deformation, positional noise, and outliers, particularly in scenarios where outliers remain distinct from inliers, when compared with prevailing state-of-the-art approaches.
Submitted 14 May, 2024;
originally announced May 2024.
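The abstract above relies on the convex envelope of bilinear monomials without stating its form. As a hedged illustration, the standard McCormick envelope for a single bilinear term is shown below; the symbols $w$, $x$, $y$ and the bounds $x^L, x^U, y^L, y^U$ are notation assumed here, not taken from the paper, and the paper's actual relaxation may be organized differently. For $w = xy$ with $x \in [x^L, x^U]$ and $y \in [y^L, y^U]$:

```latex
\begin{align}
w &\ge x^L y + x\, y^L - x^L y^L, \\
w &\ge x^U y + x\, y^U - x^U y^U, \\
w &\le x^U y + x\, y^L - x^U y^L, \\
w &\le x^L y + x\, y^U - x^L y^U.
\end{align}
```

Each inequality follows from expanding a product of two nonnegative factors, for example $(x - x^L)(y - y^L) \ge 0$ gives the first one. The two lower bounds together form the convex envelope and the two upper bounds the concave envelope, which is why tightening the box on the transformation parameters during branch-and-bound tightens the lower bound.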
-
Search for solar axions by Primakoff effect with the full dataset of the CDEX-1B Experiment
Authors:
L. T. Yang,
S. K. Liu,
Q. Yue,
K. J. Kang,
Y. J. Li,
H. P. An,
Greeshma C.,
J. P. Chang,
Y. H. Chen,
J. P. Cheng,
W. H. Dai,
Z. Deng,
C. H. Fang,
X. P. Geng,
H. Gong,
Q. J. Guo,
T. Guo,
X. Y. Guo,
L. He,
J. R. He,
J. W. Hu,
H. X. Huang,
T. C. Huang,
L. Jiang,
S. Karmakar
, et al. (61 additional authors not shown)
Abstract:
We present the first limit on the $g_{A\gamma}$ coupling constant using Bragg-Primakoff conversion, based on an exposure of 1107.5 kg days of data from the CDEX-1B experiment at the China Jinping Underground Laboratory. The data are consistent with the null signal hypothesis, and no excess signals are observed. A limit on the coupling of $g_{A\gamma}<2.08\times10^{-9}$ GeV$^{-1}$ (95\% C.L.) is derived for axions with mass up to 100 eV/$c^2$. Within the hadronic KSVZ model, our results exclude axion masses $>5.3~\rm{eV}/c^2$ at 95\% C.L.
Submitted 12 May, 2024;
originally announced May 2024.