-
An Exploratory Deep Learning Approach for Predicting Subsequent Suicidal Acts in Chinese Psychological Support Hotlines
Authors:
Changwei Song,
Qing Zhao,
Jianqiang Li,
Yining Chen,
Yongsheng Tong,
Guanghui Fu
Abstract:
Psychological support hotlines are an effective suicide prevention measure that typically relies on professionals using suicide risk assessment scales to predict individual risk scores. However, the accuracy of scale-based predictive methods for suicide risk assessment can vary widely depending on the expertise of the operator. This limitation underscores the need for more reliable methods, prompting this research's innovative exploration of the use of artificial intelligence to improve the accuracy and efficiency of suicide risk prediction within the context of psychological support hotlines. The study included data from 1,549 subjects from 2015-2017 in China who contacted a psychological support hotline. Each participant was followed for 12 months to identify instances of suicidal behavior. We proposed a novel multi-task learning method that uses the large-scale pre-trained model Whisper for feature extraction and fits psychological scales while predicting the risk of suicide. The proposed method yields a 2.4 percentage-point improvement in F1-score compared to the traditional manual approach based on the psychological scales. Our model demonstrated superior performance compared to eight other popular models. To our knowledge, this study is the first to apply deep learning to long-term speech data to predict suicide risk in China, indicating great potential for clinical applications. The source code is publicly available at: \url{https://github.com/songchangwei/Suicide-Risk-Prediction}.
Submitted 29 August, 2024;
originally announced August 2024.
-
Nonrelativistic limit of Chandrasekhar Variational Model for Neutron Stars
Authors:
Yuanhui Chen,
Qingxuan Wang
Abstract:
In this paper, we consider the nonrelativistic limit of the Chandrasekhar variational model for neutron stars. We show that the minimizer $ρ_{c}$ of the Chandrasekhar energy $E_c(N)$ converges strongly to the minimizer $ρ_{\infty}$ of the limit energy $E_{\infty}(N)$ in $L^1\cap L^{\frac{5}{3}}(\mathbb{R}^3)$ as the speed of light $c\rightarrow\infty$; this is a limit between two free boundary problems. We develop a novel approach to obtain the convergence rates. For the radius $R_c$ of the compact support of $ρ_c(x)$ and the radius $R_\infty$ of the compact support of $ρ_\infty(x)$, we prove that $|R_c-R_\infty|\leq O(\frac{1}{c^2})$ as $c\rightarrow\infty$. Moreover, we show that the $L^\infty$-convergence rate of $|ρ_c-ρ_\infty|$ is not bigger than $O(\frac{1}{c^3})$ in the corner layer $B(R_\infty +\frac{K_1}{c^2})\setminus B(R_\infty -\frac{K_1}{c^2})\, (K_1>0)$, while the $L^\infty$-convergence rate of $|ρ_c-ρ_\infty|$ is not bigger than $O(\frac{1}{c^2})$ inside the corner layer $B(R_\infty -\frac{K_1}{c^2})$ for $N$ large enough.
Submitted 29 August, 2024;
originally announced August 2024.
-
Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach
Authors:
Yifei Chen,
Shenghao Zhu,
Zhaojie Fang,
Chang Liu,
Binfeng Zou,
Yuhe Wang,
Shuo Chang,
Fan Jia,
Feiwei Qin,
Jin Fan,
Yong Peng,
Changmiao Wang
Abstract:
Alzheimer's Disease (AD) is a complex neurodegenerative disorder marked by memory loss, executive dysfunction, and personality changes. Early diagnosis is challenging due to subtle symptoms and varied presentations, often leading to misdiagnosis with traditional unimodal diagnostic methods due to their limited scope. This study introduces an advanced multimodal classification model that integrates clinical, cognitive, neuroimaging, and EEG data to enhance diagnostic accuracy. The model incorporates a feature tagger with a tabular data coding architecture and utilizes the TimesBlock module to capture intricate temporal patterns in electroencephalogram (EEG) data. By employing a Cross-modal Attention Aggregation module, the model effectively fuses Magnetic Resonance Imaging (MRI) spatial information with EEG temporal data, significantly improving the distinction between AD, Mild Cognitive Impairment, and Normal Cognition. Simultaneously, we have constructed the first AD classification dataset that includes three modalities: EEG, MRI, and tabular data. Our innovative approach aims to facilitate early diagnosis and intervention, potentially slowing the progression of AD. The source code and our private ADMC dataset are available at https://github.com/JustlfC03/MSTNet.
Submitted 29 August, 2024;
originally announced August 2024.
-
AdaMotif: Graph Simplification via Adaptive Motif Design
Authors:
Hong Zhou,
Peifeng Lai,
Zhida Sun,
Xiangyuan Chen,
Yang Chen,
Huisi Wu,
Yong Wang
Abstract:
With the increase of graph size, it becomes difficult or even impossible to visualize graph structures clearly within the limited screen space. Consequently, it is crucial to design effective visual representations for large graphs. In this paper, we propose AdaMotif, a novel approach that can capture the essential structure patterns of large graphs and effectively reveal the overall structures via adaptive motif designs. Specifically, our approach involves partitioning a given large graph into multiple subgraphs, then clustering similar subgraphs and extracting similar structural information within each cluster. Subsequently, adaptive motifs representing each cluster are generated and utilized to replace the corresponding subgraphs, leading to a simplified visualization. Our approach aims to preserve as much information as possible from the subgraphs while simplifying the graph efficiently. Notably, our approach successfully visualizes crucial community information within a large graph. We conduct case studies and a user study using real-world graphs to validate the effectiveness of our proposed approach. The results demonstrate the capability of our approach in simplifying graphs while retaining important structural and community information.
Submitted 29 August, 2024;
originally announced August 2024.
-
Model-independent determination of the strong-phase difference between $D^0$ and $\bar{D}^0 \to π^+π^-π^+π^-$ decays
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (647 additional authors not shown)
Abstract:
Measurements of the strong-phase difference between $D^0$ and $\bar{D}^0\toπ^+π^-π^+π^-$ are performed in bins of phase space. The study exploits a sample of quantum-correlated $D\bar{D}$ mesons collected by the BESIII experiment in $e^+e^-$ collisions at a center-of-mass energy of 3.773~GeV, corresponding to an integrated luminosity of 2.93~fb$^{-1}$. Here, $D$ denotes a neutral charm meson in a superposition of flavor eigenstates. The reported results are valuable for measurements of the $C\!P$-violating phase $γ$ (also denoted $φ_3$) in $B^\pm \to DK^\pm$, $D \to π^+π^-π^+π^-$ decays, and the binning schemes are designed to provide good statistical sensitivity to this parameter. The expected uncertainty on $γ$ arising from the precision of the strong-phase measurements, when applied to very large samples of $B$-meson decays, is around $1.5^\circ$ or $2^\circ$, depending on the binning scheme. The binned strong-phase parameters are combined to give a value of $F_+^{4π} = 0.746 \pm 0.010 \pm 0.004$ for the $C\!P$-even fraction of $D^0 \to π^+π^-π^+π^-$ decays, which is around 30\% more precise than the previous best measurement of this quantity.
Submitted 29 August, 2024;
originally announced August 2024.
-
PACiM: A Sparsity-Centric Hybrid Compute-in-Memory Architecture via Probabilistic Approximation
Authors:
Wenlun Zhang,
Shimpei Ando,
Yung-Chin Chen,
Satomi Miyagi,
Shinya Takamaeda-Yamazaki,
Kentaro Yoshioka
Abstract:
Approximate computing emerges as a promising approach to enhance the efficiency of compute-in-memory (CiM) systems in deep neural network processing. However, traditional approximate techniques often significantly trade off accuracy for power efficiency, and fail to reduce data transfer between main memory and CiM banks, which dominates power consumption. This paper introduces a novel probabilistic approximate computation (PAC) method that leverages statistical techniques to approximate multiply-and-accumulation (MAC) operations, reducing approximation error by 4X compared to existing approaches. PAC enables efficient sparsity-based computation in CiM systems by simplifying complex MAC vector computations into scalar calculations. Moreover, PAC enables sparsity encoding and eliminates the transmission of LSB activations, significantly reducing data reads and writes. This sets PAC apart from traditional approximate computing techniques, minimizing not only computation power but also memory accesses by 50%, thereby boosting system-level efficiency. We developed PACiM, a sparsity-centric architecture that fully exploits sparsity to reduce bit-serial cycles by 81% and achieves a peak 8b/8b efficiency of 14.63 TOPS/W in 65 nm CMOS while maintaining high accuracy of 93.85/72.36/66.02% on CIFAR-10/CIFAR-100/ImageNet benchmarks using a ResNet-18 model, demonstrating the effectiveness of our PAC methodology.
Submitted 28 August, 2024;
originally announced August 2024.
-
ReXamine-Global: A Framework for Uncovering Inconsistencies in Radiology Report Generation Metrics
Authors:
Oishi Banerjee,
Agustina Saenz,
Kay Wu,
Warren Clements,
Adil Zia,
Dominic Buensalido,
Helen Kavnoudias,
Alain S. Abi-Ghanem,
Nour El Ghawi,
Cibele Luna,
Patricia Castillo,
Khaled Al-Surimi,
Rayyan A. Daghistani,
Yuh-Min Chen,
Heng-sheng Chao,
Lars Heiliger,
Moon Kim,
Johannes Haubold,
Frederic Jonske,
Pranav Rajpurkar
Abstract:
Given the rapidly expanding capabilities of generative AI models for radiology, there is a need for robust metrics that can accurately measure the quality of AI-generated radiology reports across diverse hospitals. We develop ReXamine-Global, an LLM-powered, multi-site framework that tests metrics across different writing styles and patient populations, exposing gaps in their generalization. First, our method tests whether a metric is undesirably sensitive to reporting style, providing different scores depending on whether AI-generated reports are stylistically similar to ground-truth reports or not. Second, our method measures whether a metric reliably agrees with experts, or whether metric and expert scores of AI-generated report quality diverge for some sites. Using 240 reports from 6 hospitals around the world, we apply ReXamine-Global to 7 established report evaluation metrics and uncover serious gaps in their generalizability. Developers can apply ReXamine-Global when designing new report evaluation metrics, ensuring their robustness across sites. Additionally, our analysis of existing metrics can guide users of those metrics towards evaluation procedures that work reliably at their sites of interest.
Submitted 28 August, 2024;
originally announced August 2024.
-
Six Maxims of Statistical Acumen for Astronomical Data Analysis
Authors:
Hyungsuk Tak,
Yang Chen,
Vinay L. Kashyap,
Kaisey S. Mandel,
Xiao-Li Meng,
Aneta Siemiginowska,
David A. van Dyk
Abstract:
The production of complex astronomical data is accelerating, especially with newer telescopes producing ever more large-scale surveys. The increased quantity, complexity, and variety of astronomical data demand a parallel increase in skill and sophistication in developing, deciding, and deploying statistical methods. Understanding limitations and appreciating nuances in statistical and machine learning methods and the reasoning behind them is essential for improving data-analytic proficiency and acumen. Aiming to facilitate such improvement in astronomy, we delineate cautionary tales in statistics via six maxims, with examples drawn from the astronomical literature. Inspired by the significant quality improvement in business and manufacturing processes by the routine adoption of Six Sigma, we hope the routine reflection on these Six Maxims will improve the quality of both data analysis and scientific findings in astronomy.
Submitted 28 August, 2024;
originally announced August 2024.
-
Forward Ray Tracing and Hot Spots in Kerr Spacetime
Authors:
Lihang Zhou,
Zhen Zhong,
Yifan Chen,
Vitor Cardoso
Abstract:
Hotspots, often characterized as point-like emissions, frequently manifest near black holes, displaying significantly increased luminosity compared to the surrounding accretion flow. It is noteworthy that these hotspots regularly occur outside the black hole at the center of the Milky Way. Light rays emitted from these sources follow complex trajectories around the black hole, ultimately arriving at distinct locations on the observer's image plane. To extract precise spacetime information, including the black hole mass, spin, and inclination angle, it is crucial to accurately resolve both the direct emission and its higher-order images, despite the latter's intensity suppression. To enhance the accuracy and efficiency of modeling and analyzing these hotspots, we introduce a forward ray tracing method, a departure from the traditional backward ray tracing approach. By utilizing conserved quantities in Kerr spacetime, this method initiates geodesics from a specified emission point near the black hole and terminates them at a distant observer, effectively capturing multiple images. By introducing perturbations to these geodesics, we map finite-size emissions to distinct regions on the image plane, allowing for the quantification of image shapes and amplification rates. This approach facilitates efficient spacetime tomography and hotspot localization, leveraging observations from the Event Horizon Telescope and its next-generation upgrades.
Submitted 28 August, 2024;
originally announced August 2024.
-
In-Context Imitation Learning via Next-Token Prediction
Authors:
Letian Fu,
Huang Huang,
Gaurav Datta,
Lawrence Yunliang Chen,
William Chung-Ho Panitch,
Fangchen Liu,
Hui Li,
Ken Goldberg
Abstract:
We explore how to enhance next-token prediction models to perform in-context imitation learning on a real robot, where the robot executes new tasks by interpreting contextual information provided during the input phase, without updating its underlying policy parameters. We propose In-Context Robot Transformer (ICRT), a causal transformer that performs autoregressive prediction on sensorimotor trajectories without relying on any linguistic data or reward function. This formulation enables flexible and training-free execution of new tasks at test time, achieved by prompting the model with sensorimotor trajectories of the new task composed of image observation, action, and state tuples, collected through human teleoperation. Experiments with a Franka Emika robot demonstrate that the ICRT can adapt to new tasks specified by prompts, even in environment configurations that differ from both the prompt and the training data. In a multitask environment setup, ICRT significantly outperforms current state-of-the-art next-token prediction models in robotics on generalizing to unseen tasks. Code, checkpoints and data are available on https://icrt.dev/
Submitted 28 August, 2024;
originally announced August 2024.
-
Explicit Folded Reed-Solomon and Multiplicity Codes Achieve Relaxed Generalized Singleton Bound
Authors:
Yeyuan Chen,
Zihan Zhang
Abstract:
In this paper, we prove that any `appropriate' folded Reed-Solomon and univariate multiplicity codes achieve the relaxed generalized Singleton bound for list size $L\ge1.$ More concretely, we show the following: (1) Any $(s,γ)$-folded RS code over the alphabet $\mathbb{F}_q^s$ of block length $n$ and rate $R$ with pair-wise distinct evaluation points $\{γ^iα_j\}_{(i,j)\in\left(\{0\}\sqcup[s-1],[n]\right)}\subset\mathbb{F}_q$ is $\left(\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right),L\right)$ (average-radius) list-decodable for list size $L\in[s]$. (2) Any $s$-order univariate multiplicity code over the alphabet $\mathbb{F}_p^s$ ($p$ is a prime) of block length $n$ and rate $R$ with pair-wise distinct evaluation points $\{α_i\}_{i\in[n]}\subset\mathbb{F}_p$ is $\left(\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right),L\right)$ (average-radius) list-decodable for list size $L\in[s]$.
Choosing $s=Θ(1/ε^2)$ and $L=O(1/ε)$, our results imply that both explicit folded RS codes and explicit univariate multiplicity codes achieve list decoding capacity $1-R-ε$ with evidently optimal list size $O(1/ε)$, which exponentially improves the previous state-of-the-art $(1/ε)^{O(1/ε)}$ established by Kopparty, Ron-Zewi, Saraf, and Wootters (FOCS 2018 or SICOMP, 2023) and Tamo (IEEE TIT, 2024). In particular, our results on folded Reed--Solomon codes fully resolve a long-standing open problem originally proposed by Guruswami and Rudra (STOC 2006 or IEEE TIT, 2008). Furthermore, our results imply the first explicit constructions of $(1-R-ε,O(1/ε))$ (average-radius) list-decodable codes of rate $R$ with polynomial-sized alphabets in the literature.
Submitted 28 August, 2024;
originally announced August 2024.
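As a small worked illustration of the list-decoding radius quoted in the abstract above, $\frac{L}{L+1}\left(1-\frac{sR}{s-L+1}\right)$, the following sketch evaluates the expression exactly with rational arithmetic. The function name and the sample parameters are chosen here for illustration only and do not come from the paper.

```python
from fractions import Fraction


def list_decoding_radius(rate, s, list_size):
    """Evaluate (L/(L+1)) * (1 - s*R/(s-L+1)) exactly for 1 <= L <= s.

    `rate` is the code rate R, `s` the folding parameter (or multiplicity
    order), and `list_size` the list size L. Names are illustrative.
    """
    if not 1 <= list_size <= s:
        raise ValueError("list size L must satisfy 1 <= L <= s")
    R = Fraction(rate)
    L = list_size
    return Fraction(L, L + 1) * (1 - Fraction(s) * R / (s - L + 1))


# With s = 1 and L = 1 the expression reduces to (1 - R) / 2.
print(list_decoding_radius(Fraction(1, 2), s=1, list_size=1))
# A larger folding parameter and list size move the radius toward 1 - R.
print(list_decoding_radius(Fraction(1, 2), s=100, list_size=9))
```

For a fixed rate, increasing $s$ and $L$ together pushes the value toward $1-R$, which is the trend the abstract's choice of $s=Θ(1/ε^2)$ and $L=O(1/ε)$ relies on.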
-
Statistical QoS Provision in Business-Centric Networks
Authors:
Chang Wu,
Yuang Chen,
Hancheng Lu
Abstract:
More refined resource management and Quality of Service (QoS) provisioning is a critical goal of wireless communication technologies. In this paper, we propose a novel Business-Centric Network (BCN) aimed at enabling scalable QoS provisioning, based on a cross-layer framework that captures the relationship between application, transport parameters, and channels. We investigate both continuous flow and event-driven flow models, presenting key QoS metrics such as throughput, delay, and reliability. By jointly considering power and bandwidth allocation, transmission parameters, and AP network topology across layers, we optimize weighted resource efficiency with statistical QoS provisioning. To address the coupling among parameters, we propose a novel deep reinforcement learning (DRL) framework, which is Collaborative Optimization among Heterogeneous Actors with Experience Sharing (COHA-ES). Power and sub-channel (SC) Actors representing multiple APs are jointly optimized under the unified guidance of a common critic. Additionally, we introduce a novel multithreaded experience-sharing mechanism to accelerate training and enhance rewards. Extensive comparative experiments validate the effectiveness of our DRL framework in terms of convergence and efficiency. Moreover, comparative analyses demonstrate the comprehensive advantages of the BCN structure in enhancing both spectral and energy efficiency.
Submitted 28 August, 2024;
originally announced August 2024.
-
Ray-Distance Volume Rendering for Neural Scene Reconstruction
Authors:
Ruihong Yin,
Yunlu Chen,
Sezer Karaoglu,
Theo Gevers
Abstract:
Existing methods in neural scene reconstruction utilize the Signed Distance Function (SDF) to model the density function. However, in indoor scenes, the density computed from the SDF for a sampled point may not consistently reflect its real importance in volume rendering, often due to the influence of neighboring objects. To tackle this issue, our work proposes a novel approach for indoor scene reconstruction, which instead parameterizes the density function with the Signed Ray Distance Function (SRDF). Firstly, the SRDF is predicted by the network and transformed to a ray-conditioned density function for volume rendering. We argue that the ray-specific SRDF only considers the surface along the camera ray, from which the derived density function is more consistent with the real occupancy than that from the SDF. Secondly, although SRDF and SDF represent different aspects of scene geometries, their values should share the same sign indicating the underlying spatial occupancy. Therefore, this work introduces an SRDF-SDF consistency loss to constrain the signs of the SRDF and SDF outputs. Thirdly, this work proposes a self-supervised visibility task, introducing the physical visibility geometry to the reconstruction task. The visibility task combines priors from the predicted SRDF and SDF as pseudo labels, and contributes to generating more accurate 3D geometry. Our method implemented with different representations has been validated on indoor datasets, achieving improved performance in both reconstruction and view synthesis.
Submitted 28 August, 2024;
originally announced August 2024.
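The abstract above says that SRDF and SDF values at a sampled point should share the same sign, and that a consistency loss constrains this. The abstract does not give the loss formula, so the following is only a hypothetical sketch of one way such a sign constraint could be penalized: a hinge on the negated product, which is zero when the two values agree in sign. The function name and the hinge form are assumptions, not the authors' definition.

```python
def sign_consistency_loss(srdf_values, sdf_values):
    """Hypothetical sign-agreement penalty (not the paper's exact loss).

    For each sampled point, the product srdf * sdf is positive when the two
    predictions agree in sign, so max(0, -srdf * sdf) penalizes only
    disagreements. The result is the mean penalty over all points.
    """
    if len(srdf_values) != len(sdf_values):
        raise ValueError("inputs must have the same length")
    if not srdf_values:
        raise ValueError("inputs must be non-empty")
    penalties = [max(0.0, -a * b) for a, b in zip(srdf_values, sdf_values)]
    return sum(penalties) / len(penalties)


# First point agrees in sign (no penalty); second point disagrees.
print(sign_consistency_loss([0.5, -0.2], [0.3, 0.1]))
```

In a training setup this term would typically be added to the rendering loss with a small weight; that weighting is likewise an assumption here.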
-
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Authors:
Jialin Wu,
Jiangyi Deng,
Shengyuan Pang,
Yanjiao Chen,
Jiayang Xu,
Xinfeng Li,
Wenyuan Xu
Abstract:
Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we reveal for the first time that effective and efficient content moderation can be achieved by extracting conceptual features from chat-oriented LLMs, despite their initial fine-tuning for conversation rather than content moderation. We propose a practical and unified content moderation framework for LLM services, named Legilimens, which features both effectiveness and efficiency. Our red-team model-based data augmentation enhances the robustness of Legilimens against state-of-the-art jailbreaking. Additionally, we develop a framework to theoretically analyze the cost-effectiveness of Legilimens compared to other methods. We have conducted extensive experiments on five host LLMs, seventeen datasets, and nine jailbreaking methods to verify the effectiveness, efficiency, and robustness of Legilimens against normal and adaptive adversaries. A comparison of Legilimens with both commercial and academic baselines demonstrates the superior performance of Legilimens. Furthermore, we confirm that Legilimens can be applied to few-shot scenarios and extended to multi-label classification tasks.
Submitted 5 September, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty
Authors:
Saining Zhang,
Baijun Ye,
Xiaoxue Chen,
Yuantao Chen,
Zongzheng Zhang,
Cheng Peng,
Yongliang Shi,
Hao Zhao
Abstract:
Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes, instead of weighting all pixels equally in 3D-GS training as prior work did. We are the first to introduce cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.
Submitted 27 August, 2024;
originally announced August 2024.
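The abstract above describes weighting the contribution of each pixel to training by an uncertainty value rather than weighting all pixels equally. The exact loss is not stated there, so the sketch below only illustrates the general idea with a per-pixel weighted L1 photometric error; the function name, the choice of L1, and the normalization by total weight are assumptions made for this example.

```python
def uncertainty_weighted_l1(rendered, target, weights):
    """Weighted mean absolute error over pixels (illustrative only).

    `rendered` and `target` are flat sequences of pixel intensities and
    `weights` holds one non-negative weight per pixel, e.g. derived from an
    uncertainty map. Pixels with larger weights contribute more to the loss.
    """
    if not (len(rendered) == len(target) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_error = sum(
        w * abs(r - t) for r, t, w in zip(rendered, target, weights)
    )
    return weighted_error / total_weight


# A zero weight removes the second pixel from the loss entirely.
print(uncertainty_weighted_l1([0.5, 0.0], [0.0, 1.0], [1.0, 0.0]))
```

With all weights equal, this reduces to the ordinary mean absolute error, which corresponds to the equal-pixel weighting the abstract contrasts against.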
-
SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images
Authors:
Zafer Yildiz,
Yuwen Chen,
Maciej A. Mazurowski
Abstract:
Creating annotations for 3D medical data is time-consuming and often requires highly specialized expertise. Various tools have been implemented to aid this process. Segment Anything Model 2 (SAM 2) offers a general-purpose prompt-based segmentation algorithm designed to annotate videos. In this paper, we adapt this model to the annotation of 3D medical images and offer our implementation in the form of an extension to the popular annotation software: 3D Slicer. Our extension allows users to place point prompts on 2D slices to generate annotation masks and propagate these annotations across entire volumes in either single-directional or bi-directional manners. Our code is publicly available on https://github.com/mazurowski-lab/SlicerSegmentWithSAM and can be easily installed directly from the Extension Manager of 3D Slicer as well.
Submitted 27 August, 2024;
originally announced August 2024.
-
A Yebes W band Line Survey towards an Unshocked Molecular Cloud of Supernova Remnant 3C391: Evidence of Cosmic-Ray-Induced Chemistry
Authors:
Tian-Yu Tu,
Prathap Rayalacheruvu,
Liton Majumdar,
Yang Chen,
Ping Zhou,
Miguel Santander-García
Abstract:
Cosmic rays (CRs) have a strong influence on the chemistry of dense molecular clouds (MCs). To study the detailed chemistry induced by CRs, we conducted a Yebes W band line survey towards an unshocked MC (which we name 3C391:NML) associated with supernova remnant (SNR) 3C391. We detected emission lines of 18 molecular species in total and estimated their column densities with local thermodynamic equilibrium (LTE) and non-LTE analysis. Using the abundance ratio N(HCO+)/N(CO) and an upper limit of N(DCO+)/N(HCO+), we estimated the CR ionization rate of 3C391:NML to be $ζ\gtrsim 2.7\times 10^{-14}\rm \ s^{-1}$ using an analytic method. However, we caution against adopting this value because chemical equilibrium, which is a prerequisite for using the equations, is not necessarily reached in 3C391:NML. We observed lower N(HCO+)/N(HOC+), higher N(HCS+)/N(CS), and higher X($l$-C3H+) by an order of magnitude in 3C391:NML than the typical values in quiescent dense MCs. We found that an enhanced CR ionization rate (of order $\sim 10^{-15}$ or $\sim 10^{-14}\rm \ s^{-1}$) is needed to reproduce the observations with a chemical model. This is higher than the values found in typical MCs by 2--3 orders of magnitude.
Submitted 27 August, 2024;
originally announced August 2024.
-
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
Authors:
Baijiong Lin,
Weisen Jiang,
Pengguang Chen,
Shu Liu,
Ying-Cong Chen
Abstract:
Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two types of CTM blocks, namely F-CTM and S-CTM, to enhance cross-task interaction from feature and semantic perspectives, respectively. Experiments on NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based and Transformer-based methods. The code is available at https://github.com/EnVision-Research/MTMamba.
Submitted 27 August, 2024;
originally announced August 2024.
-
Stochastic dominance for super heavy-tailed random variables
Authors:
Yuyu Chen,
Seva Shneer
Abstract:
We introduce a class of super heavy-tailed distributions and establish the inequality that any weighted average of independent and identically distributed super heavy-tailed random variables stochastically dominates one such random variable. We show that many commonly used extremely heavy-tailed (i.e., infinite-mean) distributions, such as the Pareto, Fréchet, and Burr distributions, belong to the class of super heavy-tailed distributions. The established stochastic dominance relation is further generalized to allow negatively dependent or non-identically distributed random variables. In particular, the weighted average of non-identically distributed random variables stochastically dominates their distribution mixtures. Applications of these results in portfolio diversification, goods bundling, and inventory management are discussed. Remarkably, in the presence of super heavy-tailedness, the results that hold for finite-mean models in these applications are flipped.
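The dominance claim above can be checked numerically in one simple case. The following is a minimal Monte Carlo sketch, not the authors' code: it assumes a Pareto law with tail index 1/2 (an infinite-mean member of the family the abstract mentions) and compares the tail probabilities of a single draw against those of an equally weighted average of two independent draws. The function names, the tail index, the weights, and the thresholds are all illustrative assumptions.

```python
import random


def pareto(alpha, rng):
    """Draw from a Pareto law with P(X > t) = t**(-alpha) for t >= 1."""
    # 1 - rng.random() lies in (0, 1], which avoids a zero base.
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def tail_probs(alpha, weights, thresholds, n_samples, seed=0):
    """Estimate P(X_1 > t) and P(sum_i w_i X_i > t) for each threshold t.

    `weights` should be non-negative and sum to one (a weighted average).
    """
    rng = random.Random(seed)
    single = [0] * len(thresholds)
    mixed = [0] * len(thresholds)
    for _ in range(n_samples):
        xs = [pareto(alpha, rng) for _ in weights]
        avg = sum(w * x for w, x in zip(weights, xs))
        for i, t in enumerate(thresholds):
            single[i] += xs[0] > t   # bool counts as 0 or 1
            mixed[i] += avg > t
    return ([s / n_samples for s in single],
            [m / n_samples for m in mixed])


if __name__ == "__main__":
    single, mixed = tail_probs(0.5, [0.5, 0.5], [2.0, 10.0, 100.0], 50_000)
    for s, m in zip(single, mixed):
        print(f"single draw: {s:.4f}   weighted average: {m:.4f}")
```

If the dominance relation holds, every entry in the second column should be at least as large as the matching entry in the first, which is the reverse of what diversification intuition from finite-mean models suggests.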
Submitted 27 August, 2024;
originally announced August 2024.
-
Graph and Sequential Neural Networks in Session-based Recommendation: A Survey
Authors:
Zihao Li,
Chao Yang,
Yakun Chen,
Xianzhi Wang,
Hongxu Chen,
Guandong Xu,
Lina Yao,
Quan Z. Sheng
Abstract:
Recent years have witnessed the remarkable success of recommendation systems (RSs) in alleviating the information overload problem. As a new paradigm of RSs, session-based recommendation (SR) specializes in capturing users' short-term preferences and aims to provide more dynamic and timely recommendations based on the user's ongoing interactions. In this survey, we give a comprehensive overview of recent work on SR. First, we clarify the definitions of various SR tasks and describe how session-based recommendation differs from other recommendation tasks. Then, we summarize the existing methods in two categories: sequential neural network based methods and graph neural network (GNN) based methods. The standard frameworks and technical details are also introduced. Finally, we discuss the challenges of SR and new research directions in this area.
Submitted 27 August, 2024;
originally announced August 2024.
-
Data-driven Effective Modeling of Multiscale Stochastic Dynamical Systems
Authors:
Yuan Chen,
Dongbin Xiu
Abstract:
We present a numerical method for learning the dynamics of slow components of unknown multiscale stochastic dynamical systems. While the governing equations of the systems are unknown, bursts of observation data of the slow variables are available. By utilizing the observation data, our proposed method is capable of constructing a generative stochastic model that can accurately capture the effective dynamics of the slow variables in distribution. We present a comprehensive set of numerical examples to demonstrate the performance of the proposed method.
Submitted 27 August, 2024;
originally announced August 2024.
-
PAT: Pruning-Aware Tuning for Large Language Models
Authors:
Yijiang Liu,
Huanrui Yang,
Youxin Chen,
Rongyu Zhang,
Miao Wang,
Yuan Du,
Li Du
Abstract:
Large language models (LLMs) excel in language tasks, especially with supervised fine-tuning after pre-training. However, their substantial memory and computational requirements hinder practical applications. Structural pruning, which reduces less significant weight dimensions, is one solution. Yet, traditional post-hoc pruning often leads to significant performance loss, with limited recovery from further fine-tuning due to reduced capacity. Since fine-tuning refines the general and chaotic knowledge in pre-trained models, we aim to incorporate structural pruning into fine-tuning and propose the Pruning-Aware Tuning (PAT) paradigm to eliminate model redundancy while preserving model performance to the maximum extent. Specifically, we insert the innovative Hybrid Sparsification Modules (HSMs) between the Attention and FFN components to accordingly sparsify the upstream and downstream linear modules. The HSM comprises a lightweight operator and a globally shared trainable mask. The lightweight operator maintains a training overhead comparable to that of LoRA, while the trainable mask unifies the channels to be sparsified, ensuring structural pruning. Additionally, we propose the Identity Loss, which decouples the transformation and scaling properties of the HSMs to enhance training robustness. Extensive experiments demonstrate that PAT excels in both performance and efficiency. For example, our Llama2-7b model with a 25\% pruning ratio achieves 1.33$\times$ speedup while outperforming the LoRA-finetuned model by up to 1.26\% in accuracy with a similar training cost. Code: https://github.com/kriskrisliu/PAT_Pruning-Aware-Tuning
Submitted 26 August, 2024;
originally announced August 2024.
-
KBSS-InCLOSE I: Design and First Results from the Inner CGM of QSO Line Of Sight Emitting Galaxies at z~2-3
Authors:
Evan Haze Nunez,
Charles C. Steidel,
Evan N. Kirby,
Gwen C. Rudie,
Nikolaus Z. Prusinski,
Yuguang Chen,
Zhuyun Zhuang,
Allison L. Strom,
Dawn K. Erb,
Max Pettini,
Louise Welsh,
Dave S. N. Rupke,
Ryan J. Cooke
Abstract:
We present the design and first results of the Inner Circumgalactic Medium (CGM) of QSO Line of Sight Emitting galaxies at $z\sim 2-3$, KBSS-InCLOSE. The survey will connect galaxy properties (e.g., stellar mass $M_*$, interstellar medium (ISM) metallicity) with the physical conditions of the inner CGM (e.g., kinematics, metallicity) to directly observe the galaxy-scale baryon cycle. We obtain deep Keck/KCWI optical IFU pointings of Keck Baryonic Structure Survey (KBSS) QSOs to discover new star-forming galaxies at small projected distances $b\lesssim12"$ (98 kpc, $\overline{z}=2.3$), then obtain follow-up Keck/MOSFIRE NIR spectra to confirm their redshifts. We leverage KBSS images and Keck/HIRES QSO spectra to model stellar populations and inner CGM absorption. In this paper, we analyze two QSO fields and discover more than 15 new galaxies with KCWI, then use MOSFIRE to follow up two galaxies, Q2343-G1 ($z=2.43$; G1) and Q2233-N1 ($z=3.15$; N1), which are both associated with Damped Lyman Alpha absorbers. We find that G1 has typical $M_*$ and UV/optical emission properties. N1 has lower $M_*$ with very strong nebular emission. We jointly analyze neutral phase CGM and ionized ISM in N/O (for the first time at this $z$), dust extinction, and high-ionization CGM, finding that: G1's CGM is metal poor and less evolved than its ISM, while N1's CGM and ISM abundances are comparable; their CGM shows $\sim1$ dex less dust extinction than the ISM; and G1's CGM has direct evidence of hot, metal-rich galactic outflow ejecta. These findings support a picture in which metals and dust are driven into the CGM by outflows, though they may also originate from, e.g., stripped ISM gas or satellite enrichment. The full KBSS-InCLOSE sample will explore these scenarios.
Submitted 26 August, 2024;
originally announced August 2024.
-
The Impact of Group Discussion and Formation on Student Performance: An Experience Report in a Large CS1 Course
Authors:
Tong Wu,
Xiaohang Tang,
Sam Wong,
Xi Chen,
Clifford A. Shaffer,
Yan Chen
Abstract:
Programming instructors often conduct collaborative learning activities, such as Peer Instruction (PI), to enhance student motivation, engagement, and learning gains. However, the impact of group discussion and formation mechanisms on student performance remains unclear. To investigate this, we conducted an 11-session experiment in a large, in-person CS1 course. We employed both random and expertise-balanced grouping methods to examine the efficacy of different group mechanisms and the impact of expert students' presence on collaborative learning. Our observations revealed complex dynamics within the collaborative learning environment. Among 255 groups, 146 actively engaged in discussions, with 96 of these groups demonstrating improvement for poor-performing students. Interestingly, our analysis revealed that different grouping methods (expertise-balanced or random) did not significantly influence discussion engagement or poor-performing students' improvement. In our deeper qualitative analysis, we found that struggling students often derived benefits from interactions with expert peers, but this positive effect was not consistent across all groups. We identified challenges that expert students face in peer instruction interactions, highlighting the complexity of leveraging expertise within group discussions.
Submitted 26 August, 2024;
originally announced August 2024.
-
Integrated Sensing, Communication, and Powering over Multi-antenna OFDM Systems
Authors:
Yilong Chen,
Chao Hu,
Zixiang Ren,
Han Hu,
Jie Xu,
Lexi Xu,
Lei Liu,
Shuguang Cui
Abstract:
This paper considers a multi-functional orthogonal frequency division multiplexing (OFDM) system with integrated sensing, communication, and powering (ISCAP), in which a multi-antenna base station (BS) transmits OFDM signals to simultaneously deliver information to multiple information receivers (IRs), provide energy supply to multiple energy receivers (ERs), and sense potential targets based on the echo signals. To facilitate ISCAP, the BS employs the joint transmit beamforming design by sending dedicated sensing/energy beams jointly with information beams. Furthermore, we consider the beam scanning for sensing, in which the joint beams scan in different directions over time to sense potential targets. In order to ensure the sensing beam scanning performance and meet the communication and powering requirements, it is essential to properly schedule IRs and ERs and design the resource allocation over time, frequency, and space. More specifically, we optimize the joint transmit beamforming over multiple OFDM symbols and subcarriers, with the objective of minimizing the average beampattern matching error of beam scanning for sensing, subject to the constraints on the average communication rates at IRs and the average harvested power at ERs. We find converged high-quality solutions to the formulated problem by proposing efficient iterative algorithms based on advanced optimization techniques. We also develop various heuristic designs based on the principles of zero-forcing (ZF) beamforming, round-robin user scheduling, and time switching, respectively. Numerical results show that our proposed algorithms adaptively generate information and sensing/energy beams at each time-frequency slot to match the scheduled IRs/ERs with the desired scanning beam, significantly outperforming the heuristic designs.
Submitted 26 August, 2024;
originally announced August 2024.
-
LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection
Authors:
Zhongwen Yu,
Qiu Guan,
Jianmin Yang,
Zhiqiang Yang,
Qianwei Zhou,
Yang Chen,
Feng Chen
Abstract:
Existing medical Region of Interest (ROI) detection lacks an algorithm that achieves both real-time performance and accuracy, and so fails to meet the growing demand for automatic detection in medicine. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision at the same time. To alleviate these problems, we propose a novel model named Lightweight Shunt Matching-YOLO (LSM-YOLO), with Lightweight Adaptive Extraction (LAE) and Multipath Shunt Feature Matching (MSFM). Firstly, by using LAE to refine feature extraction, the model can obtain more contextual information and high-resolution details from multiscale feature maps, thereby extracting detailed features of ROI in medical images while reducing the influence of noise. Secondly, MSFM is utilized to further refine the fusion of high-level semantic features and low-level visual features, enabling better fusion between ROI features and neighboring features, thereby improving the detection rate for better diagnostic assistance. Experimental results demonstrate that LSM-YOLO achieves 48.6% AP on a private dataset of pancreatic tumors, 65.1% AP on the BCCD blood cell detection public dataset, and 73.0% AP on the Br35h brain tumor detection public dataset. Our model achieves state-of-the-art performance with minimal parameter cost on the above three datasets. The source code is available at: https://github.com/VincentYuuuuuu/LSM-YOLO.
Submitted 26 August, 2024;
originally announced August 2024.
-
Riemann-based Multi-scale Attention Reasoning Network for Text-3D Retrieval
Authors:
Wenrui Li,
Wei Han,
Yandu Chen,
Yeyu Chai,
Yidan Lu,
Xingtao Wang,
Xiaopeng Fan
Abstract:
Due to the challenges in acquiring paired Text-3D data and the inherent irregularity of 3D data structures, combined representation learning of 3D point clouds and text remains unexplored. In this paper, we propose a novel Riemann-based Multi-scale Attention Reasoning Network (RMARN) for text-3D retrieval. Specifically, the extracted text and point cloud features are refined by their respective Adaptive Feature Refiner (AFR). Furthermore, we introduce the innovative Riemann Local Similarity (RLS) module and the Global Pooling Similarity (GPS) module. However, as 3D point cloud data and text data often possess complex geometric structures in high-dimensional space, the proposed RLS employs a novel Riemann Attention Mechanism to reflect the intrinsic geometric relationships of the data. Without explicitly defining the manifold, RMARN learns the manifold parameters to better represent the distances between text-point cloud samples. To address the challenges of lacking paired text-3D data, we have created the large-scale Text-3D Retrieval dataset T3DR-HIT, which comprises over 3,380 pairs of text and point cloud data. T3DR-HIT contains coarse-grained indoor 3D scenes and fine-grained Chinese artifact scenes, consisting of 1,380 and over 2,000 text-3D pairs, respectively. Experiments on our custom datasets demonstrate the superior performance of the proposed method. Our code and proposed datasets are available at \url{https://github.com/liwrui/RMARN}.
Submitted 24 August, 2024;
originally announced August 2024.
-
Guided and Fused: Efficient Frozen CLIP-ViT with Feature Guidance and Multi-Stage Feature Fusion for Generalizable Deepfake Detection
Authors:
Yingjian Chen,
Lei Zhang,
Yakun Niu,
Pei Chen,
Lei Tan,
Jing Zhou
Abstract:
The rise of generative models has sparked concerns about image authenticity online, highlighting the urgent need for an effective and general detector. Recent methods leveraging the frozen pre-trained CLIP-ViT model have made great progress in deepfake detection. However, these models often rely on visual-general features directly extracted by the frozen network, which contain excessive information irrelevant to the task, resulting in limited detection performance. To address this limitation, in this paper, we propose an efficient Guided and Fused Frozen CLIP-ViT (GFF), which integrates two simple yet effective modules. The Deepfake-Specific Feature Guidance Module (DFGM) guides the frozen pre-trained model in extracting features specifically for deepfake detection, reducing irrelevant information while preserving its generalization capabilities. The Multi-Stage Fusion Module (FuseFormer) captures low-level and high-level information by fusing features extracted from each stage of the ViT. This dual-module approach significantly improves deepfake detection by fully leveraging CLIP-ViT's inherent advantages. Extensive experiments demonstrate the effectiveness and generalization ability of GFF, which achieves state-of-the-art performance with optimal results in only 5 training epochs. Even when trained on only 4 classes of ProGAN, GFF achieves nearly 99% accuracy on unseen GANs and maintains an impressive 97% accuracy on unseen diffusion models.
Submitted 24 August, 2024;
originally announced August 2024.
-
Quantum error correction below the surface code threshold
Authors:
Rajeev Acharya,
Laleh Aghababaie-Beni,
Igor Aleiner,
Trond I. Andersen,
Markus Ansmann,
Frank Arute,
Kunal Arya,
Abraham Asfaw,
Nikita Astrakhantsev,
Juan Atalaya,
Ryan Babbush,
Dave Bacon,
Brian Ballard,
Joseph C. Bardin,
Johannes Bausch,
Andreas Bengtsson,
Alexander Bilmes,
Sam Blackwell,
Sergio Boixo,
Gina Bortoli,
Alexandre Bourassa,
Jenna Bovaird,
Leon Brill,
Michael Broughton,
David A. Browne
, et al. (224 additional authors not shown)
Abstract:
Quantum error correction provides a path to reach practical quantum computing by combining multiple physical qubits into a logical qubit, where the logical error rate is suppressed exponentially as more qubits are added. However, this exponential suppression only occurs if the physical error rate is below a critical threshold. In this work, we present two surface code memories operating below this threshold: a distance-7 code and a distance-5 code integrated with a real-time decoder. The logical error rate of our larger quantum memory is suppressed by a factor of $Λ$ = 2.14 $\pm$ 0.02 when increasing the code distance by two, culminating in a 101-qubit distance-7 code with 0.143% $\pm$ 0.003% error per cycle of error correction. This logical memory is also beyond break-even, exceeding its best physical qubit's lifetime by a factor of 2.4 $\pm$ 0.3. We maintain below-threshold performance when decoding in real time, achieving an average decoder latency of 63 $μ$s at distance-5 up to a million cycles, with a cycle time of 1.1 $μ$s. To probe the limits of our error-correction performance, we run repetition codes up to distance-29 and find that logical performance is limited by rare correlated error events occurring approximately once every hour, or 3 $\times$ 10$^9$ cycles. Our results present device performance that, if scaled, could realize the operational requirements of large scale fault-tolerant quantum algorithms.
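The figures quoted in the abstract can be cross-checked with a little arithmetic. The block below is a reader's worked check, not a result from the paper: the symbol $\varepsilon_d$ for the logical error per cycle at distance $d$ is notation introduced here, and the first line restates the abstract's description of $Λ$ as the suppression factor gained by increasing the code distance by two.

```latex
% Suppression factor as described in the abstract
% (\varepsilon_d is notation assumed here, not taken from the source):
\Lambda \;=\; \frac{\varepsilon_{d}}{\varepsilon_{d+2}}, \qquad \Lambda = 2.14 \pm 0.02 .

% Real-time decoder latency at distance 5, expressed in error-correction cycles:
\frac{63\ \mu\mathrm{s}}{1.1\ \mu\mathrm{s}\ \text{per cycle}} \;\approx\; 57\ \text{cycles}.

% Wall-clock time spanned by 3 \times 10^{9} cycles:
3\times 10^{9} \times 1.1\ \mu\mathrm{s} \;=\; 3.3\times 10^{3}\ \mathrm{s} \;\approx\; 55\ \mathrm{min},
% which is consistent with "approximately once every hour".
```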
Submitted 24 August, 2024;
originally announced August 2024.
-
Integrating Multi-Head Convolutional Encoders with Cross-Attention for Improved SPARQL Query Translation
Authors:
Yi-Hui Chen,
Eric Jui-Lin Lu,
Kwan-Ho Cheng
Abstract:
The main task of the KGQA system (Knowledge Graph Question Answering) is to convert user input questions into query syntax (such as SPARQL). With the rise of modern popular encoders and decoders like Transformer and ConvS2S, many scholars have shifted the research direction of SPARQL generation to the Neural Machine Translation (NMT) architecture or the generative AI field of Text-to-SPARQL. In NMT-based QA systems, the system treats knowledge base query syntax as a language. It uses NMT-based translation models to translate natural language questions into query syntax. Scholars use popular architectures equipped with cross-attention, such as Transformer, ConvS2S, and BiLSTM, to train translation models for query syntax. To achieve better query results, this paper improved the ConvS2S encoder and added multi-head attention from the Transformer, proposing a Multi-Head Conv encoder (MHC encoder) based on the n-gram language model. The principle is to use convolutional layers to capture local hidden features in the input sequence with different receptive fields, using multi-head attention to calculate dependencies between them. Ultimately, we found that the translation model based on the Multi-Head Conv encoder achieved better performance than other encoders, obtaining 76.52\% and 83.37\% BLEU-1 (BiLingual Evaluation Understudy) on the QALD-9 and LC-QuAD-1.0 datasets, respectively. Additionally, in the end-to-end system experiments on the QALD-9 and LC-QuAD-1.0 datasets, we achieved leading results over other KGQA systems, with Macro F1-measures reaching 52\% and 66\%, respectively. Moreover, the experimental results show that with limited computational resources, if one possesses an excellent encoder-decoder architecture and cross-attention, experts and scholars can achieve outstanding performance equivalent to large pre-trained models using only general embeddings.
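For readers unfamiliar with the BLEU-1 metric reported above, the following is a minimal sketch of sentence-level BLEU-1 against a single reference: clipped unigram precision multiplied by the standard brevity penalty. It is not the authors' evaluation script; reported scores are typically computed at corpus level with an established toolkit, and the function name and whitespace tokenization used here are illustrative assumptions.

```python
import math
from collections import Counter


def bleu1(candidate, reference):
    """Sentence-level BLEU-1 for token lists, with a single reference."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Clip each candidate token's count by its count in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(candidate)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1.0 - len(reference) / len(candidate))
    return bp * precision


if __name__ == "__main__":
    ref = "SELECT ?x WHERE { ?x a dbo:City }".split()
    hyp = "SELECT ?x WHERE { ?x a dbo:Town }".split()
    print(f"BLEU-1 = {bleu1(hyp, ref):.3f}")
```

Because BLEU-1 only counts unigram overlap, a generated query can score highly while still being semantically wrong, which is presumably why the abstract also reports end-to-end Macro F1.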
Submitted 23 August, 2024;
originally announced August 2024.
-
Training-free Long Video Generation with Chain of Diffusion Model Experts
Authors:
Wenhao Li,
Yichao Cao,
Xiu Su,
Xi Lin,
Shan You,
Mingkai Zheng,
Yi Chen,
Chang Xu
Abstract:
Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models incur high computational costs and produce suboptimal results due to the high complexity of the video generation task. In this paper, we propose \textbf{ConFiner}, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure \textbf{con}trol and spatial-temporal re\textbf{fine}ment. It can generate high-quality videos with a chain of off-the-shelf diffusion model experts, each expert responsible for a decoupled subtask. During the refinement, we introduce coordinated denoising, which can merge multiple diffusion experts' capabilities into a single sampling. Furthermore, we design the ConFiner-Long framework, which can generate long, coherent videos with three constraint strategies on ConFiner. Experimental results indicate that with only 10\% of the inference cost, our ConFiner surpasses representative models like Lavie and Modelscope across all objective and subjective metrics. ConFiner-Long can generate high-quality and coherent videos with up to 600 frames.
Submitted 2 September, 2024; v1 submitted 23 August, 2024;
originally announced August 2024.
-
TVG: A Training-free Transition Video Generation Method with Diffusion Models
Authors:
Rui Zhang,
Yaosen Chen,
Yuegen Liu,
Wei Wang,
Xuming Wen,
Hongxia Wang
Abstract:
Transition videos play a crucial role in media production, enhancing the flow and coherence of visual narratives. Traditional methods like morphing often lack artistic appeal and require specialized skills, limiting their effectiveness. Recent advances in diffusion model-based video generation offer new possibilities for creating transitions but face challenges such as poor inter-frame relationship modeling and abrupt content changes. We propose a novel training-free Transition Video Generation (TVG) approach using video-level diffusion models that addresses these limitations without additional training. Our method leverages Gaussian Process Regression ($\mathcal{GPR}$) to model latent representations, ensuring smooth and dynamic transitions between frames. Additionally, we introduce interpolation-based conditional controls and a Frequency-aware Bidirectional Fusion (FBiF) architecture to enhance temporal control and transition reliability. Evaluations on benchmark datasets and custom image pairs demonstrate the effectiveness of our approach in generating high-quality smooth transition videos. The code is provided at https://sobeymil.github.io/tvg.com.
Submitted 23 August, 2024;
originally announced August 2024.
-
Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias
Authors:
Yifan Chen,
Xiaoou Cheng,
Jonathan Niles-Weed,
Jonathan Weare
Abstract:
The unadjusted Langevin algorithm is commonly used to sample probability distributions in extremely high-dimensional settings. However, existing analyses of the algorithm for strongly log-concave distributions suggest that, as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error in the $W_2$ metric scales in proportion to $d$ or $\sqrt{d}$. In this paper, we argue that, despite this poor scaling of the $W_2$ error for the full set of variables, the behavior for a small number of variables can be significantly better: a number of iterations proportional to $K$, up to logarithmic terms in $d$, often suffices for the algorithm to converge to within a desired $W_2$ error for all $K$-marginals. We refer to this effect as delocalization of bias. We show that the delocalization effect does not hold universally and prove its validity for Gaussian distributions and strongly log-concave distributions with certain sparse interactions. Our analysis relies on a novel $W_{2,\ell^\infty}$ metric to measure convergence. A key technical challenge we address is the lack of a one-step contraction property in this metric. Finally, we use asymptotic arguments to explore potential generalizations of the delocalization effect beyond the Gaussian and sparse interactions setting.
Submitted 19 August, 2024;
originally announced August 2024.
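For readers unfamiliar with the algorithm named in the entry above, the following is a minimal sketch of the unadjusted Langevin iteration, not the authors' code. The function names, the standard Gaussian target, and the step size are all assumptions chosen for illustration; the paper's $W_{2,\ell^\infty}$ analysis is not reproduced here.

```python
import math
import random


def ula_chain(grad_potential, x0, step, n_steps, rng):
    """Return the final iterate of an unadjusted Langevin chain started at x0.

    grad_potential maps a list of floats to the gradient of the potential U,
    where the target density is proportional to exp(-U).
    """
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        g = grad_potential(x)
        # Euler-Maruyama step for dX = -grad U(X) dt + sqrt(2) dW,
        # with no Metropolis accept/reject correction (hence "unadjusted").
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def stationary_variance_standard_gaussian(step):
    """Per-coordinate variance of the ULA stationary law for U(x) = |x|^2 / 2.

    The update is x' = (1 - h) x + sqrt(2h) xi, so the variance v solves
    v = (1 - h)^2 v + 2h, which gives v = 1 / (1 - h / 2).
    """
    return 1.0 / (1.0 - step / 2.0)


if __name__ == "__main__":
    # Hypothetical demo: standard Gaussian target in d = 500 dimensions.
    rng = random.Random(0)
    d, h = 500, 0.1
    x = ula_chain(lambda v: v, [0.0] * d, h, 200, rng)
    print("empirical second moment:", sum(xi * xi for xi in x) / d)
    print("biased stationary variance:", stationary_variance_standard_gaussian(h))
```

In this toy Gaussian case the per-coordinate variance bias depends only on the step size and not on the dimension, while the error summed over all coordinates grows with the dimension. This is only a simple illustration of why low-dimensional marginals can behave better than the full vector.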
-
S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points
Authors:
Bing He,
Yunuo Chen,
Guo Lu,
Li Song,
Wenjun Zhang
Abstract:
Recently, dynamic scene reconstruction using Gaussians has garnered increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in the canonical space. However, the inherently low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To overcome these challenges, we introduce a novel approach utilizing discrete 3D control points. This method models local rays physically and establishes a motion-decoupling coordinate system, which effectively merges traditional graphics with learnable pipelines for a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that incorporates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D real-world reconstruction into four independent submodules: 3D segmentation, 3D control points generation, object-wise motion manipulation, and residual compensation. Our experiments demonstrate that this method outperforms existing state-of-the-art 4D Gaussian Splatting techniques on both the Neu3DV and CMU-Panoptic datasets. Our approach also significantly accelerates training, with the optimization of our 3D control points achievable within just 2 seconds per frame on a single NVIDIA 4070 GPU.
Submitted 23 August, 2024;
originally announced August 2024.
-
VCEMO: Multi-Modal Emotion Recognition for Chinese Voiceprints
Authors:
Jinghua Tang,
Liyun Zhang,
Yu Lu,
Dian Ding,
Lanqing Yang,
YiChao Chen,
Minjie Bian,
Xiaoshan Li,
Guangtao Xue
Abstract:
Emotion recognition can enhance humanized machine responses to user commands, while voiceprint-based perception systems can be easily integrated into commonly used devices like smartphones and stereos. Although Chinese has the largest number of speakers, there is a noticeable absence of high-quality corpus datasets for emotion recognition using Chinese voiceprints. Hence, this paper introduces the VCEMO dataset to address this deficiency. The proposed dataset is constructed from everyday conversations and comprises over 100 users and 7,747 textual samples. Furthermore, this paper proposes a multimodal-based model as a benchmark, which effectively fuses speech, text, and external knowledge using a co-attention structure. The system employs contrastive learning-based regulation for the uneven distribution of the dataset and the diversity of emotional expressions. The experiments demonstrate the significant improvement of the proposed model over SOTA on the VCEMO and IEMOCAP datasets. Code and dataset will be released for research.
Submitted 23 August, 2024;
originally announced August 2024.
-
Data-Driven Parametrization of Molecular Mechanics Force Fields for Expansive Chemical Space Coverage
Authors:
Tianze Zheng,
Ailun Wang,
Xu Han,
Yu Xia,
Xingyuan Xu,
Jiawei Zhan,
Yu Liu,
Yang Chen,
Zhi Wang,
Xiaojie Wu,
Sheng Gong,
Wen Yan
Abstract:
A force field is a critical component in molecular dynamics simulations for computational drug discovery. It must achieve high accuracy within the constraints of molecular mechanics' (MM) limited functional forms, which offer high computational efficiency. With the rapid expansion of synthetically accessible chemical space, traditional look-up table approaches face significant challenges. In this study, we address this issue using a modern data-driven approach, developing ByteFF, an Amber-compatible force field for drug-like molecules. To create ByteFF, we generated an expansive and highly diverse molecular dataset at the B3LYP-D3(BJ)/DZVP level of theory. This dataset includes 2.4 million optimized molecular fragment geometries with analytical Hessian matrices, along with 3.2 million torsion profiles. We then trained an edge-augmented, symmetry-preserving molecular graph neural network (GNN) on this dataset, employing a carefully optimized training strategy. Our model predicts all bonded and non-bonded MM force field parameters for drug-like molecules simultaneously across a broad chemical space. ByteFF demonstrates state-of-the-art performance on various benchmark datasets, excelling in predicting relaxed geometries, torsional energy profiles, and conformational energies and forces. Its exceptional accuracy and expansive chemical space coverage make ByteFF a valuable tool for multiple stages of computational drug discovery.
Submitted 22 August, 2024;
originally announced August 2024.
-
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation
Authors:
Xiaowei Mao,
Yan Lin,
Shengnan Guo,
Yubin Chen,
Xingyu Xian,
Haomin Wen,
Qisen Xu,
Youfang Lin,
Huaiyu Wan
Abstract:
Uncertainty quantification in travel time estimation (TTE) aims to estimate the confidence interval for travel time, given the origin (O), destination (D), and departure time (T). Accurately quantifying this uncertainty requires generating the most likely path and assessing travel time uncertainty along the path. This involves two main challenges: 1) predicting a path that aligns with the ground truth, and 2) modeling the impact of travel time in each segment on overall uncertainty under varying conditions. We propose DutyTTE to address these challenges. For the first challenge, we introduce a deep reinforcement learning method to improve alignment between the predicted path and the ground truth, providing more accurate travel time information from road segments to improve TTE. For the second challenge, we propose a mixture-of-experts-guided uncertainty quantification mechanism to better capture travel time uncertainty for each segment under varying contexts. Additionally, we calibrate our results using Hoeffding's upper-confidence bound to provide statistical guarantees for the estimated confidence intervals. Extensive experiments on two real-world datasets demonstrate the superiority of our proposed method.
Submitted 22 August, 2024;
originally announced August 2024.
-
Broad-band X-ray spectral and timing properties of the accreting millisecond X-ray pulsar IGR J17498$-$2921 during the 2023 outburst
Authors:
Zhaosheng Li,
L. Kuiper,
Y. Y. Pan,
M. Falanga,
J. Poutanen,
Y. P. Chen,
R. X. Xu,
M. Y. Ge,
Y. Huang,
L. M. Song,
S. Zhang,
F. J. Lu,
S. N. Zhang
Abstract:
We report on the broadband spectral and timing properties of the accreting millisecond X-ray pulsar IGR J17498$-$2921 during its April 2023 outburst using data from NICER (1$-$10 keV), NuSTAR (3$-$79 keV), Insight-HXMT (2$-$150 keV), and INTEGRAL (30$-$150 keV). We detect significant 401 Hz pulsations across the 0.5$-$150 keV band. The pulse fraction increases from $\sim$2% at 1 keV to $\sim$13% at 66 keV. Five type-I X-ray bursts have been detected, including three photospheric radius expansion bursts, with a rise time of $\sim$2 s and an exponential decay time of $\sim$5 s. The recurrence time is $\sim$9.1 h, which can be explained by unstable thermonuclear burning of hydrogen-deficient material on the neutron star surface. The quasi-simultaneous 1$-$150 keV broadband spectra from NICER, NuSTAR, and INTEGRAL can be well fitted by an absorbed reflection model, relxillCp, and a Gaussian line of instrumental origin. The Comptonized emission from the hot corona is characterized by a photon index $Γ$ of $\sim$1.8 and an electron temperature $kT_{\rm e}$ of $\sim$40 keV. We obtain a low inclination angle $i\sim34^{\circ}$. The accretion disk shows properties of strong ionization, $\log(ξ/{\rm erg~cm~s^{-1}})\sim4.5$, over-solar abundance, $A_{\rm Fe}\sim 7.7$, and high density, $\log(n_{\rm e}/{\rm cm^{-3}})\sim 19.5$. However, a lower disk density with normal abundance and ionization could also be possible. From the inner disk radius $R_{\rm in}=1.67R_{\rm ISCO}$ and the long-term spin-down rate of $-3.1(2)\times10^{-15}~{\rm Hz~s^{-1}}$, we constrain the magnetic field of IGR J17498$-$2921 in the range of $(0.9-2.4)\times10^8$ G.
Submitted 22 August, 2024;
originally announced August 2024.
-
DUNE Phase II: Scientific Opportunities, Detector Concepts, Technological Solutions
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
C. Andreopoulos,
M. Andreotti
, et al. (1347 additional authors not shown)
Abstract:
The international collaboration designing and constructing the Deep Underground Neutrino Experiment (DUNE) at the Long-Baseline Neutrino Facility (LBNF) has developed a two-phase strategy toward the implementation of this leading-edge, large-scale science project. The 2023 report of the US Particle Physics Project Prioritization Panel (P5) reaffirmed this vision and strongly endorsed DUNE Phase I and Phase II, as did the European Strategy for Particle Physics. While the construction of DUNE Phase I is well underway, this White Paper focuses on DUNE Phase II planning. DUNE Phase II consists of a third and fourth far detector (FD) module, an upgraded near detector complex, and an enhanced 2.1 MW beam. The fourth FD module is conceived as a "Module of Opportunity", aimed at expanding the physics opportunities, in addition to supporting the core DUNE science program, with more advanced technologies. This document highlights the increased science opportunities offered by the DUNE Phase II near and far detectors, including long-baseline neutrino oscillation physics, neutrino astrophysics, and physics beyond the standard model. It describes the DUNE Phase II near and far detector technologies and detector design concepts that are currently under consideration. A summary of key R&D goals and prototyping phases needed to realize the Phase II detector technical designs is also provided. DUNE's Phase II detectors, along with the increased beam power, will complete the full scope of DUNE, enabling a multi-decadal program of groundbreaking science with neutrinos.
Submitted 22 August, 2024;
originally announced August 2024.
-
Surrogate Constructed Scalable Circuits ADAPT-VQE in the Schwinger model
Authors:
Erik Gustafson,
Kyle Sherbert,
Adrien Florio,
Karunya Shirali,
Yanzhu Chen,
Henry Lamm,
Semeon Valgushev,
Andreas Weichselbaum,
Sophia E. Economou,
Robert D. Pisarski,
Norm M. Tubman
Abstract:
Inspired by recent advances in simulating periodic systems on quantum computers, we develop a new approach, (SC)$^2$-ADAPT-VQE, to further advance the simulation of these systems. Our approach extends the scalable circuits ADAPT-VQE framework, which builds an ansatz from a pool of coordinate-invariant operators defined for arbitrarily large, though not arbitrarily small, volumes. Our method uses a classically tractable ``Surrogate Constructed'' method to remove irrelevant operators from the pool, reducing the minimum size for which the scalable circuits are defined. Bringing together the scalable circuits and the surrogate constructed approaches forms the core of the (SC)$^2$ methodology. Our approach allows for a wider set of classical computations, on small volumes, which can be used for a more robust extrapolation protocol. While developed in the context of lattice models, the surrogate construction portion is applicable to a wide variety of problems where information about the relative importance of operators in the pool is available. As an example, we use it to compute properties of the Schwinger model - quantum electrodynamics for a single, massive fermion in $1+1$ dimensions - and show that our method can be used to accurately extrapolate to the continuum limit.
Submitted 22 August, 2024;
originally announced August 2024.
-
Weighted Envy-Freeness in House Allocation
Authors:
Sijia Dai,
Yankai Chen,
Xiaowei Wu,
Yicheng Xu,
Yong Zhang
Abstract:
The classic house allocation problem involves assigning $m$ houses to $n$ agents based on their utility functions, ensuring each agent receives exactly one house. A key criterion in these problems is satisfying fairness constraints such as envy-freeness. We extend this problem by considering agents with arbitrary weights, focusing on the concept of weighted envy-freeness, which has been extensively studied in fair division. We present a polynomial-time algorithm to determine whether weighted envy-free allocations exist and, if so, to compute one. Since weighted envy-free allocations do not always exist, we also investigate the potential of achieving such allocations through the use of subsidies. We provide several characterizations for weighted envy-freeable allocations (allocations that can be turned weighted envy-free by introducing subsidies) and show that they do not always exist, which is different from the unweighted setting. Furthermore, we explore the existence of weighted envy-freeable allocations in specific scenarios and outline the conditions under which they exist.
Submitted 22 August, 2024;
originally announced August 2024.
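To make the fairness notion in the entry above concrete, here is a small sketch that checks weighted envy-freeness for a given house assignment. It assumes the definition commonly used in weighted fair division (agent $i$ does not envy agent $j$ when $v_i(h_i)/w_i \ge v_i(h_j)/w_j$), which may differ in detail from the paper's. The exhaustive search is a brute-force stand-in for illustration only and is not the polynomial-time algorithm the abstract refers to; all function names are hypothetical.

```python
from itertools import permutations


def is_weighted_envy_free(values, weights, assignment):
    """Check weighted envy-freeness of a one-house-per-agent assignment.

    values[i][h] is agent i's utility for house h, weights[i] > 0 is agent
    i's weight, and assignment[i] is the house given to agent i.
    """
    n = len(weights)
    for i in range(n):
        own = values[i][assignment[i]]
        for j in range(n):
            if i == j:
                continue
            other = values[i][assignment[j]]
            # Cross-multiplied form of own / w_i >= other / w_j,
            # which avoids floating-point division.
            if own * weights[j] < other * weights[i]:
                return False
    return True


def find_weighted_envy_free(values, weights):
    """Brute-force search over all assignments; exponential, small inputs only."""
    n = len(weights)
    m = len(values[0])
    for assignment in permutations(range(m), n):
        if is_weighted_envy_free(values, weights, assignment):
            return assignment
    return None


if __name__ == "__main__":
    # Hypothetical instance: two agents, two houses, agent 0 has weight 2.
    print(find_weighted_envy_free([[4, 2], [3, 3]], [2, 1]))
    # Both agents strongly prefer house 0, so no envy-free assignment exists.
    print(find_weighted_envy_free([[5, 1], [5, 1]], [1, 1]))
```

The second instance illustrates the abstract's remark that weighted envy-free allocations do not always exist, which is what motivates the study of subsidies.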
-
Formation mechanism of the (2 x 1) reconstruction of calcite (104)
Authors:
Haojun Zhou,
Yingquan Chen,
Mingyue Ding,
Xiaoliang Zhong
Abstract:
Calcite has recently attracted extensive research interest in fields ranging from geoscience to carbon dioxide removal. Although much effort has been made to study the (2x1) reconstruction of the most stable (104) surface, the origin of this reconstruction remains unclear. Here, we carefully investigate the atomic and electronic structures of calcite (104) via density functional theory methods with van der Waals corrections. The results unambiguously show that the driving force for this reconstruction is the intrinsic tendency of surface atoms to increase their coordination numbers. Upon reconstruction, calcite (104) forms four additional Ca-O bonds per (2x1) unit cell. In addition, phonon spectra indicate that both the unreconstructed and reconstructed surfaces are dynamically stable. Finally, by applying the climbing image nudged elastic band method, an energy barrier is predicted for the reconstruction. This work delivers a full picture of the formation of the calcite (104)-(2x1) reconstruction and can greatly advance the understanding of the surface science of calcite.
Submitted 22 August, 2024;
originally announced August 2024.
-
A framework for extracting the rates of photophysical processes from biexponentially decaying photon emission data
Authors:
Jill M. Cleveland,
Tory A. Welsch,
Eric Y. Chen,
D. Bruce Chase,
Matthew F. Doty,
Hanz Y. Ramírez-Gómez
Abstract:
There is strong interest in designing and realizing optically-active semiconductor nanostructures of greater complexity for applications in fields ranging from biomedical engineering to quantum computing. While these increasingly complex nanostructures can implement progressively sophisticated optical functions, the presence of more material constituents and interfaces also leads to increasingly complex exciton dynamics. In particular, the rates of carrier trapping and detrapping in complex heterostructures are critically important for advanced optical functionality, but they can rarely be directly measured. In this work, we develop a model that includes trapping and release of carriers by optically inactive states. The model explains the widely observed biexponential decay of the photoluminescence signal from neutral excitons in low dimensional semiconductor emitters. The model also allows determination of likelihood intervals for all the transition rates involved in the emission dynamics, without the use of approximations. Furthermore, in cases for which the high temperature limit is suitable, the model leads to specific values of such rates, outperforming reduced models previously used to estimate those quantities. We demonstrate the value of this model by applying it to time resolved photoluminescence measurements of CdSeTe/CdS heterostructures. We obtain values not only for the radiative and nonradiative lifetimes, but also for the delayed photoluminescence originating in trapping and release.
Submitted 22 August, 2024;
originally announced August 2024.
-
Self-supervised Learning for Geospatial AI: A Survey
Authors:
Yile Chen,
Weiming Huang,
Kaiqi Zhao,
Yue Jiang,
Gao Cong
Abstract:
The proliferation of geospatial data in urban and territorial environments has significantly facilitated the development of geospatial artificial intelligence (GeoAI) across various urban applications. Given the vast yet inherently sparse labeled nature of geospatial data, there is a critical need for techniques that can effectively leverage such data without heavy reliance on labeled datasets. This requirement aligns with the principles of self-supervised learning (SSL), which has attracted increasing attention for its adoption in geospatial data. This paper conducts a comprehensive and up-to-date survey of SSL techniques applied to or developed for three primary data (geometric) types prevalent in geospatial vector data: points, polylines, and polygons. We systematically categorize various SSL techniques into predictive and contrastive methods, discussing their application with respect to each data type in enhancing generalization across various downstream tasks. Furthermore, we review the emerging trends of SSL for GeoAI, and several task-specific SSL techniques. Finally, we discuss several key challenges in the current research and outline promising directions for future investigation. By presenting a structured analysis of relevant studies, this paper aims to inspire continued advancements in the integration of SSL with GeoAI, encouraging innovative methods for harnessing the power of geospatial data.
Submitted 22 August, 2024;
originally announced August 2024.
-
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization
Authors:
Luyao Cheng,
Hui Wang,
Siqi Zheng,
Yafeng Chen,
Rongjie Huang,
Qinglin Zhang,
Qian Chen,
Xihao Li
Abstract:
Speaker diarization, the process of segmenting an audio stream or transcribed speech content into homogeneous partitions based on speaker identity, plays a crucial role in the interpretation and analysis of human speech. Most existing speaker diarization systems rely exclusively on unimodal acoustic information, making the task particularly challenging due to the innate ambiguities of audio signals. Recent studies have made tremendous efforts towards audio-visual or audio-semantic modeling to enhance performance. However, even the incorporation of up to two modalities often falls short in addressing the complexities of spontaneous and unstructured conversations. To exploit more meaningful dialogue patterns, we propose a novel multimodal approach that jointly utilizes audio, visual, and semantic cues to enhance speaker diarization. Our method elegantly formulates the multimodal modeling as a constrained optimization problem. First, we build insights into the visual connections among active speakers and the semantic interactions within spoken content, thereby establishing abundant pairwise constraints. Then we introduce a joint pairwise constraint propagation algorithm to cluster speakers based on these visual and semantic constraints. This integration effectively leverages the complementary strengths of different modalities, refining the affinity estimation between individual speaker embeddings. Extensive experiments conducted on multiple multimodal datasets demonstrate that our approach consistently outperforms state-of-the-art speaker diarization methods.
Submitted 21 August, 2024;
originally announced August 2024.
-
A Deconfounding Approach to Climate Model Bias Correction
Authors:
Wentao Gao,
Jiuyong Li,
Debo Cheng,
Lin Liu,
Jixue Liu,
Thuc Duy Le,
Xiaojing Du,
Xiongren Chen,
Yanchang Zhao,
Yun Chen
Abstract:
Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate representation of complex climate phenomena. Traditional bias correction methods, which rely on historical observation data and statistical techniques, often neglect unobserved confounders, leading to biased results. This paper proposes a novel bias correction approach that utilizes both GCM and observational data to learn a factor model that captures multi-cause latent confounders. Inspired by recent advances in causality-based time series deconfounding, our method first constructs a factor model to learn latent confounders from historical data and then applies them to enhance the bias correction process using advanced time series forecasting models. The experimental results demonstrate significant improvements in the accuracy of precipitation outputs. By addressing unobserved confounders, our approach offers a robust and theoretically grounded solution for climate model bias correction.
Submitted 21 August, 2024;
originally announced August 2024.
-
Radiation Hydrodynamic Simulations of Massive Stars in Gas-rich Environments: Accretion of AGN Stars Suppressed By Thermal Feedback
Authors:
Yi-Xian Chen,
Yan-Fei Jiang,
Jeremy Goodman,
Douglas N. C. Lin
Abstract:
Massive stars may form in or be captured into AGN disks. Recent 1D studies employing stellar-evolution codes have demonstrated the potential for rapid growth of such stars through accretion up to a few hundred $M_\odot$. We perform 3D radiation hydrodynamic simulations of moderately massive stars' envelopes, in order to determine the rate and critical radius $R_{\rm crit}$ of their accretion process in an isotropic gas-rich environment in the absence of luminosity-driven mass loss. We find that in the ``fast-diffusion'' regime where the characteristic radiative diffusion speed $c/τ$ is faster than the gas sound speed $c_s$, the accretion rate is suppressed by feedback from gravitational and radiative advection energy flux, in addition to the stellar luminosity. Alternatively, in the ``slow-diffusion'' regime where $c/τ<c_s$, due to adiabatic accretion, the stellar envelope expands quickly to become hydrostatic and further net accretion occurs on thermal timescales in the absence of self-gravity. When the radiation entropy of the medium is less than that of the star, however, this hydrostatic envelope can become more massive than the star itself. Within this sub-regime, self-gravity of the envelope excites runaway growth. Applying our results to realistic environments, moderately massive stars ($\lesssim 100M_\odot$) embedded in AGN disks typically accrete in the fast-diffusion regime, leading to a steady-state accretion rate 1-2 orders of magnitude lower than expected from previous 1D calculations and $R_{\rm crit}$ smaller than the disk scale height, except in the opacity window at temperature $T\sim 2000$K. Accretion in the slow-diffusion regime occurs in regions with very high density $ρ\gtrsim 10^{-9}$g/cm$^3$, and needs to be treated with caution in 1D long-term calculations.
Submitted 21 August, 2024;
originally announced August 2024.
-
On the third kind periods for abelian $t$-modules
Authors:
Yen-Tsung Chen,
Changningphaabi Namoijam
Abstract:
Inspired by the relations between periods of elliptic integrals of the third kind and the periods of the extensions of the corresponding elliptic curves by the multiplicative group, we introduce the notion of the third kind periods for abelian $t$-modules and establish an evaluation for these periods that is parallel to the classical setting. When we specialize our result to the case of Drinfeld modules, an explicit formula for these third kind periods is established. We also prove the algebraic independence of periods of the first, the second, and the third kind for Drinfeld modules of arbitrary rank. This generalizes prior results of Chang for rank $2$ Drinfeld modules.
Submitted 21 August, 2024;
originally announced August 2024.
-
Against All Odds: Overcoming Typology, Script, and Language Confusion in Multilingual Embedding Inversion Attacks
Authors:
Yiyi Chen,
Russa Biswas,
Heather Lent,
Johannes Bjerva
Abstract:
Large Language Models (LLMs) are susceptible to malicious influence by cyber attackers through intrusions such as adversarial, backdoor, and embedding inversion attacks. In response, the burgeoning field of LLM Security aims to study and defend against such threats. Thus far, the majority of works in this area have focused on monolingual English models; however, emerging research suggests that multilingual LLMs may be more vulnerable to various attacks than their monolingual counterparts. While previous work has investigated embedding inversion over a small subset of European languages, it is challenging to extrapolate these findings to languages from different linguistic families and with differing scripts. To this end, we explore the security of multilingual LLMs in the context of embedding inversion attacks and investigate cross-lingual and cross-script inversion across 20 languages, spanning over 8 language families and 12 scripts. Our findings indicate that languages written in Arabic script and Cyrillic script are particularly vulnerable to embedding inversion, as are languages within the Indo-Aryan language family. We further observe that inversion models tend to suffer from language confusion, sometimes greatly reducing the efficacy of an attack. Accordingly, we systematically explore this bottleneck for inversion models, uncovering predictable patterns which could be leveraged by attackers. Ultimately, this study aims to further the field's understanding of the outstanding security vulnerabilities facing multilingual LLMs and raise awareness of the languages most at risk of negative impact from these attacks.
Submitted 21 August, 2024;
originally announced August 2024.
-
Existence of complements for foliations
Authors:
Yen-An Chen,
Dongchen Jiao,
Pascale Voegtli
Abstract:
This paper demonstrates the existence of $\mathbb{Q}$-complements for algebraically integrable log-Fano foliations on klt ambient varieties. Additionally, we investigate properties of algebraically integrable Fano foliations such as a partial inversion of adjunction as well as a connectedness principle.
Submitted 21 August, 2024;
originally announced August 2024.