Search | arXiv e-print repository

GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices

Authors: Hanqiu Wang, Zihao Zhan, Haoqi Shan, Siqi Dai, Max Panoff, Shuo Wang

Abstract: The advent and growing popularity of Virtual Reality (VR) and Mixed Reality (MR) solutions have revolutionized the way we interact with digital platforms. The cutting-edge gaze-controlled typing methods, now prevalent in high-end models of these devices, e.g., Apple Vision Pro, have not only improved user experience but also mitigated traditional keystroke inference attacks that relied on hand gestures, head movements and acoustic side-channels. However, this advancement has paradoxically given birth to a new, potentially more insidious cyber threat, GAZEploit. In this paper, we unveil GAZEploit, a novel eye-tracking based attack specifically designed to exploit this eye-tracking information by leveraging the common use of virtual appearances in VR applications. This widespread usage significantly enhances the practicality and feasibility of our attack compared to existing methods. GAZEploit takes advantage of this vulnerability to remotely extract gaze estimations and steal sensitive keystroke information across various typing scenarios, including messages, passwords, URLs, emails, and passcodes. Our research, involving 30 participants, achieved over 80% accuracy in keystroke inference. Alarmingly, our study also identified over 15 top-rated apps in the Apple Store as vulnerable to the GAZEploit attack, emphasizing the urgent need for bolstered security measures for this state-of-the-art VR/MR text entry method.

Submitted 12 September, 2024; originally announced September 2024.

Comments: 15 pages, 20 figures, Accepted at ACM CCS'24

arXiv:2408.13677 [pdf]

Room-temperature polariton condensate in a two-dimensional hybrid perovskite

Authors: Marti Struve, Christoph Bennenhei, Hamid Pashaei Adl, Kok Wee Song, Hangyong Shan, Nadiya Mathukhno, Jens-Christian Drawer, Falk Eilenberger, Naga Pratibha Jasti, David Cahen, Oleksandr Kyriienko, Christian Schneider, Martin Esmann

Abstract: Layered 2D halide perovskites are chemically synthesized realizations of quantum well stacks with giant exciton oscillator strengths, tunable emission spectra and very large exciton binding energies. While these features render 2D halide perovskites a promising platform for room-temperature polaritonics, bosonic condensation and polariton lasing in 2D perovskites have so far remained elusive at ambient conditions. Here, we demonstrate room-temperature cavity exciton-polariton condensation in mechanically exfoliated crystals of the 2D Ruddlesden-Popper iodide perovskite $(BA)_{2}(MA)_{2}Pb_{3}I_{10}$ in an open optical microcavity. We observe a polariton condensation threshold of $P_{th}=6.76 fJ$ per pulse and detect a strong non-linear response. Interferometric measurements confirm the spontaneous emergence of spatial coherence across the condensate with an associated first-order autocorrelation reaching $g^{(1)}\approx 0.6$. Our results lay the foundation for a new class of room-temperature polariton lasers based on 2D halide perovskites with great potential for hetero-integration with other van-der-Waals materials and combination with photonic crystals or waveguides.

Submitted 24 August, 2024; originally announced August 2024.

Comments: 21 pages, 8 figures

arXiv:2407.10315 [pdf, other]

Order parameters and phase transitions of continual learning in deep neural networks

Authors: Haozhe Shan, Qianyi Li, Haim Sompolinsky

Abstract: Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks. It gives rise to order parameters (OPs) that capture how task relations and network architecture influence forgetting and knowledge transfer, as verified by numerical evaluations. We found that the input and rule similarity between tasks have different effects on CL performance. In addition, the theory predicts that increasing the network depth can effectively reduce overlap between tasks, thereby lowering forgetting. For networks with task-specific readouts, the theory identifies a phase transition where CL performance shifts dramatically as tasks become less similar, as measured by the OPs. Sufficiently low similarity leads to catastrophic anterograde interference, where the network retains old tasks perfectly but completely fails to generalize new learning. Our results delineate important factors affecting CL performance and suggest strategies for mitigating forgetting.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 26 pages, 8 figures

arXiv:2407.09857 [pdf, other]

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Authors: Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

Abstract: Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of the advantages offered by the camera modality. This work proposes an instance-level fusion transformer for visual collaborative perception (IFTR), which enhances the detection performance of camera-only collaborative perception systems through the communication and sharing of visual features. To capture the visual information from multiple agents, we design an instance feature aggregation that interacts with the visual features of individual agents using predefined grid-shaped bird's-eye-view (BEV) queries, generating more comprehensive and accurate BEV features. Additionally, we devise a cross-domain query adaptation as a heuristic to fuse 2D priors, implicitly encoding the candidate positions of targets. Furthermore, IFTR optimizes communication efficiency by sending instance-level features, achieving an optimal performance-bandwidth trade-off. We evaluate the proposed IFTR on a real dataset, DAIR-V2X, and two simulated datasets, OPV2V and V2XSet, achieving performance improvements of 57.96%, 9.23% and 12.99% in AP@70 metrics compared to the previous SOTAs, respectively. Extensive experiments demonstrate the superiority of IFTR and the effectiveness of its key components. The code is available at https://github.com/wangsh0111/IFTR.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.09048 [pdf, other]

KUNPENG: An Embodied Large Model for Intelligent Maritime

Authors: Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu

Abstract: Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods. It covers multiple aspects such as smart vessels, route optimization, and safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic maritime environment, along with diverse and heterogeneous large-scale data sources, presents challenges for real-time decision-making in intelligent maritime. In this paper, we propose KUNPENG, the first-ever embodied large model for intelligent maritime in the smart ocean construction, which consists of six systems. The model perceives multi-source heterogeneous data for the cognition of environmental interaction and makes autonomous decision strategies, which are used for intelligent vessels to perform navigation behaviors under safety and emergency guarantees and continuously optimize power to achieve embodied intelligence in maritime. In comprehensive maritime task evaluations, KUNPENG has demonstrated excellent performance.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 9 pages, 3 figures

arXiv:2407.03548 [pdf, other]

HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation

Authors: Tao Chen, Chenhui Wang, Zhihao Chen, Yiming Lei, Hongming Shan

Abstract: Medical image segmentation has been significantly advanced with the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, suffering from an unstable feature space. In this work, we propose to complement discriminative segmentation methods with the knowledge of underlying data distribution from generative models. To that end, we propose a novel hybrid diffusion framework for medical image segmentation, termed HiDiff, which can synergize the strengths of existing discriminative segmentation models and new generative diffusion models. HiDiff comprises two key components: discriminative segmentor and diffusion refiner. First, we utilize any conventionally trained segmentation model as the discriminative segmentor, which can provide a segmentation mask prior for the diffusion refiner. Second, we propose a novel binary Bernoulli diffusion model (BBDM) as the diffusion refiner, which can effectively, efficiently, and interactively refine the segmentation mask by modeling the underlying data distribution. Third, we train the segmentor and BBDM in an alternate-collaborative manner to mutually boost each other. Extensive experimental results on abdomen organ, brain tumor, polyps, and retinal vessels segmentation datasets, covering four widely-used modalities, demonstrate the superior performance of HiDiff over existing medical segmentation algorithms, including the state-of-the-art transformer- and diffusion-based ones. In addition, HiDiff excels at segmenting small objects and generalizing to new datasets. Source codes are made available at https://github.com/takimailto/HiDiff.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Accepted by IEEE Transactions on Medical Imaging 2024

arXiv:2406.06693 [pdf, other]

The measurement of the splashback radius of dark matter halo

Authors: Weiwei Xu, Huanyuan Shan, Ran Li, Ji Yao, Chunxiang Wang, Nan Li, Chaoli Zhang

Abstract: In the hierarchical evolution framework of cosmology, larger halos grow through matter accretion and halo mergers. To clarify the halo evolution, we need to define the halo mass and radius physically. However, the pseudo-evolution problem makes the process difficult. Thus, we aim to measure the splashback radius, a physically defined halo radius, for a large number of halos with various masses and redshifts, and to determine the most important parameters that affect it. We use the typical definition of splashback radius (Rsp) as the radius with the steepest radial density profile. In this work, we measure Rsp of dark matter halos within the mass range of 1e13-3e15 Msun and redshifts spanning 0.08-0.65. This is the measurement of Rsp over the largest range of halo mass and redshift. Using the shear catalog of the DECaLS DR8, we investigate Rsp of halos associated with galaxies and galaxy clusters identified in the various catalogs. Our findings reveal a trend wherein massive halos demonstrate a larger Rsp, and the normalized splashback radius (Rsp/R200m) shows a U-shaped mass evolution. The upturn in these relations mainly comes from the contribution of massive halos with low redshifts. We further find that Rsp increases with the peak height, while Rsp/R200m has a negative relation with the peak height. We also find Rsp >~ R200m for most halos, indicating their low accretion rates. Our result is consistent with previous literature across a wide range of mass, redshift, and peak height, as well as the simulation work from More et al. (2015).

Submitted 10 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures, submitted to ApJ

arXiv:2406.03797 [pdf, other]

Morpho-Photometric Classification of KiDS DR5 Sources Based on Neural Networks: A Comprehensive Star-Quasar-Galaxy Catalog

Authors: Hai-Cheng Feng, Rui Li, Nicola R. Napolitano, Sha-Sha Li, J. M. Bai, Ran Li, H. T. Liu, Kai-Xing Lu, Mario Radovich, Huan-Yuan Shan, Jian-Guo Wang, Wen-Zhe Xi, Ling-Hua Xie, Yang-Wei Zhang

Abstract: We present a novel multimodal neural network for classifying astronomical sources in multiband ground-based observations, from optical to near infrared, to separate sources into stars, galaxies and quasars. Our approach combines a convolutional neural network branch for learning morphological features from $r$-band images with an artificial neural network branch for extracting spectral energy distribution (SED) information. Specifically, we have used 9-band optical ($ugri$) and NIR ($ZYHJK_s$) data from the Kilo-Degree Survey (KiDS) Data Release 5. The two branches of the network are concatenated and fed into fully-connected layers for final classification. We train the network on a spectroscopically confirmed sample from the Sloan Digital Sky Survey cross-matched with KiDS. The trained model achieves 98.76\% overall accuracy on an independent testing dataset, with F1 scores exceeding 95\% for each class. Raising the output probability threshold, we obtain higher purity at the cost of a lower completeness. We have also validated the network using external catalogs cross-matched with KiDS, correctly classifying 99.74\% of a pure star sample selected from Gaia parallaxes and proper motions, and 99.74\% of an external galaxy sample from the Galaxy and Mass Assembly survey, adjusted for low-redshift contamination. We apply the trained network to 27,334,751 KiDS DR5 sources with $r \leqslant 23$ mag to generate a new classification catalog. This multimodal neural network successfully leverages both morphological and SED information to enable efficient and robust classification of stars, quasars, and galaxies in large photometric surveys.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 18 pages, 12 figures, 2 tables, Submitted to ApJS

arXiv:2405.16121 [pdf]

Design and Implementation of an Emotion Analysis System Based on EEG Signals

Authors: Zhang Yutian, Huang Shan, Zhang Jianing, Fan Ci'en

Abstract: Traditional brain-computer systems are complex and expensive, and emotion classification algorithms lack representations of the intrinsic relationships between different channels of electroencephalogram (EEG) signals. There is still room for improvement in accuracy. To lower the research barrier for EEG and harness the rich information embedded in multi-channel EEG, we propose and implement a simple and user-friendly brain-computer system for classifying four emotions: happiness, sorrow, sadness, and tranquility. This system utilizes the fusion of convolutional attention mechanisms and fully pre-activated residual blocks, termed Attention-Convolution-based Pre-Activated Residual Network (ACPA-ResNet). In the hardware acquisition and preprocessing phase, we employ the ADS1299 integrated chip as the analog front-end and utilize the ESP32 microcontroller for initial EEG signal processing. Data is wirelessly transmitted to a PC through the UDP protocol for further preprocessing. In the emotion analysis phase, ACPA-ResNet is designed to automatically extract and learn features from EEG signals, thereby enabling accurate classification of emotional states by learning time-frequency domain characteristics. ACPA-ResNet introduces an attention mechanism on the foundation of residual networks, adaptively assigning different weights to each channel. This allows it to focus on more meaningful EEG signals in both spatial and channel dimensions while avoiding the problems of gradient dispersion and explosion associated with deep network architectures. Through testing on 16 subjects, our system demonstrates stable EEG signal acquisition and transmission. The novel network significantly enhances emotion recognition accuracy, achieving an average emotion classification accuracy of 95.1%.

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.03135 [pdf, other]

CURLING - I. The Influence of Point-like Image Approximation on the Outcomes of Cluster Strong Lens Modeling

Authors: Yushan Xie, Huanyuan Shan, Nan Li, Ran Li, Eric Jullo, Chen Su, Xiaoyue Cao, Jean-Paul Kneib, Ana Acebron, Mengfan He, Ji Yao, Chunxiang Wang, Jiadong Li, Yin Li

Abstract: Cluster-scale strong lensing is a powerful tool for exploring the properties of dark matter and constraining cosmological models. However, due to the complex parameter space, pixelized strong lens modeling in galaxy clusters is computationally expensive, leading to the point-source approximation of strongly lensed extended images, potentially introducing systematic biases. Herein, as the first paper of the ClUsteR strong Lens modelIng for the Next-Generation observations (CURLING) program, we use lensing ray-tracing simulations to quantify the biases and uncertainties arising from the point-like image approximation for JWST-like observations. Our results indicate that the approximation works well for reconstructing the total cluster mass distribution, but can bias the magnification measurements near critical curves and the constraints on the cosmological parameters, the total matter density of the Universe $Ω_{\rm m}$, and dark energy equation of state parameter $w$. To mitigate the biases, we propose incorporating the extended surface brightness distribution of lensed sources into the modeling. This approach reduces the bias in magnification from 46.2 per cent to 0.09 per cent for $μ\sim 1000$. Furthermore, the median values of cosmological parameters align more closely with the fiducial model. In addition to the improved accuracy, we also demonstrate that the constraining power can be substantially enhanced. In conclusion, it is necessary to model cluster-scale strong lenses with pixelized multiple images, especially for estimating the intrinsic luminosity of highly magnified sources and accurate cosmography in the era of high-precision observations.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 12 pages, 8 figures

arXiv:2404.14162 [pdf, other]

FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the stochastic nature of diffusion and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventional latent diffusion process in three major aspects. First, we propose incorporating warped clothes as both the starting point and local condition, supplying the model with faithful clothes priors. Second, we introduce a novel clothes flattening network to constrain generated try-on images, providing clothes-consistent faithful supervision. Third, we devise a clothes-posterior sampling for faithful inference, further enhancing the model performance over conventional clothes-agnostic Gaussian sampling. Extensive experimental results on the benchmark VITON-HD and Dress Code datasets demonstrate that our FLDM-VTON outperforms state-of-the-art baselines and is able to generate photo-realistic try-on images with faithful clothing details.

Submitted 19 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI 2024

arXiv:2404.13596 [pdf, other]

New galaxy UV luminosity constraints on warm dark matter from JWST

Authors: Bin Liu, Huanyuan Shan, Jiajun Zhang

Abstract: We exploit the recent {\it James Webb Space Telescope} (JWST) determination of galaxy UV luminosity functions over the redshift range $z=9-14.5$ to derive constraints on warm dark matter (WDM) models. The delayed structure formation in WDM universes makes high-redshift observations a powerful probe to set limits on the particle mass $m_\mathrm{x}$ of WDM candidates. By integrating these observations with blank-field surveys conducted by the {\it Hubble Space Telescope} (HST) at redshifts $z=4-8$, we impose constraints on both astrophysical parameters ($β$, $γ$, $ε_{\mathrm N}$, $M_c$ for a double-power law star formation efficiency, and $σ_{M_{\mathrm{UV}}}$ for a Gaussian magnitude-halo mass relation) and the WDM parameter (dark matter particle mass $m_\mathrm{x}$) simultaneously. We find a new limit of $m_\mathrm{x} \geq 3.2$ keV for the mass of thermal relic WDM particles at $95\%$ confidence level. This bound is tighter than the most stringent result previously derived using HST data. Future JWST observations could further reduce the observational uncertainties and improve this constraint.

Submitted 20 May, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: 9 pages, 3 figures and 1 Table. Accepted for publication in ApJ. Match the revised version

arXiv:2404.02570 [pdf, other]

MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

Authors: Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko

Abstract: This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect semantic relatedness of two sentences in a given target language without access to direct supervision (i.e. zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two different pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer and select source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer where we compare selecting on all training languages against languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2403.13525 [pdf, ps, other]

Extremal spectral radius of degree-based weighted adjacency matrices of graphs with given order and size

Authors: Chenghao Shen, Haiying Shan

Abstract: The $f$-adjacency matrix is a type of edge-weighted adjacency matrix in which the weight of an edge $ij$ is $f(d_i,d_j)$, where $f$ is a real symmetric function and $d_i,d_j$ are the degrees of vertex $i$ and vertex $j$. The $f$-spectral radius of a graph is the spectral radius of its $f$-adjacency matrix. In this paper, the effect of subdividing an edge on the $f$-spectral radius is discussed. Some necessary conditions of the extremal graph with given order and size are derived. As an example, we obtain the bicyclic graph(s) with the smallest $f$-spectral radius for fixed order $n\geq8$ by applying the generalized Lu-Man method.

Submitted 20 March, 2024; originally announced March 2024.
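
To make the definition in the abstract above concrete, here is a minimal Python sketch that builds the $f$-adjacency matrix of a small graph and estimates its $f$-spectral radius. It is not code from the paper: the helper names `f_adjacency` and `spectral_radius`, the shifted power iteration, and the example graph are assumptions chosen for illustration, and the sketch assumes $f$ takes nonnegative values.

```python
import math


def f_adjacency(n, edges, f):
    """Build the f-adjacency matrix of a simple graph on vertices 0..n-1.

    The entry for an edge ij is f(d_i, d_j), where d_i is the degree of i.
    """
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = float(f(deg[i], deg[j]))
        A[i][j] = w
        A[j][i] = w  # f is symmetric, so the matrix is symmetric
    return A


def spectral_radius(A, iters=500):
    """Estimate the largest eigenvalue of a symmetric nonnegative matrix.

    Illustrative choice: power iteration on A + shift*I, where the shift
    (the maximum row sum) prevents oscillation on bipartite graphs.
    """
    n = len(A)
    shift = max(sum(row) for row in A)
    if shift == 0:
        return 0.0  # no edges
    x = [1.0] * n
    for _ in range(iters):
        y = [shift * x[i] + sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient of A at the (unit-norm) converged vector.
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n))


# Hypothetical example: the star K_{1,3} with centre 0.
star = [(0, 1), (0, 2), (0, 3)]
rho_plain = spectral_radius(f_adjacency(4, star, lambda a, b: 1.0))
rho_sum = spectral_radius(f_adjacency(4, star, lambda a, b: a + b))
```

With the constant weight $f \equiv 1$ the matrix reduces to the ordinary adjacency matrix, so `rho_plain` should be close to $\sqrt{3}$ for this star; with $f(x,y)=x+y$ every edge has weight $4$, which scales the radius by the same factor.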

arXiv:2403.13374 [pdf, other]

Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity

Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek

Abstract: This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on strongly-convex loss functions or homogeneously distributed datasets, we conduct convergence analysis for not only strongly-convex but also non-convex loss functions over heterogeneous datasets. According to our theoretical analysis, as long as the fraction of the dataset from malicious users is less than half, RAGA can achieve convergence at rate $\mathcal{O}({1}/{T^{2/3- δ}})$ where $T$ is the iteration number and $δ\in (0, 2/3)$ for non-convex loss functions, and at linear rate for strongly-convex loss functions. Moreover, a stationary point or global optimal solution is proved to be obtainable as data heterogeneity vanishes. Experimental results corroborate the robustness of RAGA to Byzantine attacks and verify the advantage of RAGA over baselines on convergence performance under various intensities of Byzantine attacks, for heterogeneous datasets.

Submitted 27 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.
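
The abstract above states that RAGA aggregates with the geometric median. As a rough illustration of why that choice resists outliers, the following Python sketch computes a geometric median with Weiszfeld's iteration and compares it with the plain mean. Weiszfeld's method, the function name `geometric_median`, and the toy gradient vectors are assumptions made here; the listing does not say how the paper computes the median.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the geometric median of a list of equal-length vectors.

    Uses Weiszfeld's iteration (an illustrative choice): repeatedly take a
    weighted average with weights 1 / distance to the current estimate.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    z = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.sqrt(sum((p[k] - z[k]) ** 2 for k in range(dim)))
            w = 1.0 / max(d, eps)  # guard against division by zero
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        z = [v / den for v in num]
    return z


# Hypothetical updates: three honest clients near (1, 1), one Byzantine.
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
byzantine = [[100.0, 100.0]]
updates = honest + byzantine

median = geometric_median(updates)
mean = [sum(u[k] for u in updates) / len(updates) for k in range(2)]
```

In this toy setting the mean is dragged far toward the Byzantine update, while the geometric median stays near the honest cluster.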

arXiv:2403.12749 [pdf, other]

Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

Authors: Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs from standard German in lexical distribution, syntactic construction, and entity information. We conduct in-domain, cross-domain, sequential, and joint experiments on two Bavarian and three German corpora and present the first comprehensive NER results on Bavarian. Incorporating knowledge from the larger German NER (sub-)datasets notably improves on bar-wiki and moderately on bar-tweet. Inversely, training first on Bavarian contributes slightly to the seminal German CoNLL 2006 corpus. Moreover, with gold dialect labels on Bavarian tweets, we assess multi-task learning between five NER and two Bavarian-German dialect identification tasks and achieve NER SOTA on bar-wiki. We substantiate the necessity of our low-resource BarNER corpus and the importance of diversity in dialects, genres, and topics in enhancing model performance.

Submitted 19 March, 2024; originally announced March 2024.

Comments: LREC-COLING 2024

arXiv:2403.09376 [pdf, ps, other]

The distance spectral radius of $k$-uniform hypertrees with given number of vertices of maximum degree

Authors: Xiaoqi Liu, Haiying Shan

Abstract: This paper investigates the influence of two graft transformations on the distance spectral radius of connected uniform hypergraphs. Specifically, we study $k$-uniform hypertrees with given size, maximum degree and number of vertices of maximum degree, and give the structure of such a hypergraph with maximum distance spectral radius.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.06128 [pdf, other]

Low-dose CT Denoising with Language-engaged Dual-space Alignment

Authors: Zhihao Chen, Tao Chen, Chenhui Wang, Chuang Niu, Ge Wang, Hongming Shan

Abstract: While various deep learning methods were proposed for low-dose computed tomography (CT) denoising, they often suffer from over-smoothing, blurring, and lack of explainability. To alleviate these issues, we propose a plug-and-play Language-Engaged Dual-space Alignment loss (LEDA) to optimize low-dose CT denoising models. Our idea is to leverage large language models (LLMs) to align denoised CT and normal dose CT images in both the continuous perceptual space and discrete semantic space, which is the first LLM-based scheme for low-dose CT denoising. LEDA involves two steps: the first is to pretrain an LLM-guided CT autoencoder, which can encode a CT image into continuous high-level features and quantize them into a token space to produce semantic tokens derived from the LLM's vocabulary; and the second is to minimize the discrepancy between the denoised CT images and normal dose CT in terms of both encoded high-level features and quantized token embeddings derived by the LLM-guided CT autoencoder. Extensive experimental results on two public LDCT denoising datasets demonstrate that our LEDA can enhance existing denoising models in terms of quantitative metrics and qualitative evaluation, and also provide explainability through language-level image understanding. Source code is available at https://github.com/hao1635/LEDA.

Submitted 10 March, 2024; originally announced March 2024.

Comments: 11 pages, 6 figures

arXiv:2403.05545 [pdf]

Unveiling the influence of behavioural, built environment and socio-economic features on the spatial and temporal variability of bus use using explainable machine learning

Authors: Sui Tao, Francisco Rowe, Hongyu Shan

Abstract: Understanding the variability of people's travel patterns is key to transport planning and policy-making. However, to what extent daily transit use displays geographic and temporal variability, and what the contributing factors are, have not been fully addressed. Drawing on smart card data in Beijing, China, this study seeks to address these deficits by adopting new indices to capture the spatial and temporal variability of bus use during peak hours and investigate their associations with relevant contextual features. Using explainable machine learning, our findings reveal non-linear interaction between spatial and temporal variability and trip frequency. Furthermore, greater distance to the urban centres (>10 kilometres) is associated with increased spatial variability of bus use, while greater separation of trip origins and destinations from the subcentres reduces both spatial and temporal variability. Higher availability of bus routes is linked to higher spatial variability but lower temporal variability. Meanwhile, both lower and higher road density are associated with higher spatial variability of bus use, especially in the morning. These findings indicate that different built environment features moderate the flexibility of travel time and locations. Implications are derived to inform more responsive and reliable operation and planning of transit systems.

Submitted 6 February, 2024; originally announced March 2024.

Comments: 58 pages including supplementary material

arXiv:2402.18266 [pdf]

doi 10.1103/PhysRevLett.131.206901

Second-order temporal coherence of polariton lasers based on an atomically thin crystal in a microcavity

Authors: Hangyong Shan, Jens-Christian Drawer, Meng Sun, Carlos Anton-Solanas, Martin Esmann, Kentaro Yumigeta, Kenji Watanabe, Takashi Taniguchi, Sefaattin Tongay, Sven Höfling, Ivan Savenko, Christian Schneider

Abstract: Bosonic condensation and lasing of exciton-polaritons in microcavities is a fascinating solid-state phenomenon. It provides a versatile platform to study out-of-equilibrium many-body physics and has recently appeared at the forefront of quantum technologies. Here, we study the photon statistics via the second-order temporal correlation function of polariton lasing emerging from an optical microcavity integrated with an atomically thin MoSe2 crystal. Furthermore, we investigate the macroscopic polariton phase transition for varying excitation powers and temperatures. The lower-polariton exhibits photon bunching below the threshold, implying a dominant thermal distribution of the emission, while above the threshold, the second-order correlation transits towards unity, which evidences the formation of a coherent state. Our findings are in agreement with a microscopic numerical model, which explicitly includes scattering with phonons on the quantum level.

Submitted 28 February, 2024; originally announced February 2024.

Comments: This manuscript was published in Phys. Rev. Lett., see https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.131.206901

Journal ref: Physical Review Letters 131, 206901 (2023)

arXiv:2402.18010 [pdf, other]

How to coadd images: II. Anti-aliasing and PSF deconvolution

Authors: Lei Wang, Huanyuan Shan, Lin Nie, Dezi Liu, Zhaojun Yan, Guoliang Li, Cheng Cheng, Yushan Xie, Han Qu, Wenwen Zheng, Xi Kang

Abstract: We have developed a novel method for co-adding multiple under-sampled images that combines the iteratively reweighted least squares and divide-and-conquer algorithms. Our approach not only allows for the anti-aliasing of the images but also enables PSF deconvolution, resulting in enhanced restoration of extended sources, the highest PSNR, and reduced ringing artefacts. To test our method, we conducted numerical simulations that replicated observation runs of the CSST/VST telescope and compared our results to those obtained using previous algorithms. The simulation showed that our method outperforms previous approaches in several ways, such as restoring the profile of extended sources and minimizing ringing artefacts. Additionally, because our method relies on the inherent advantages of least squares fitting, it is more versatile and does not depend on the local uniformity hypothesis for the PSF. However, the new method consumes much more computation than the other approaches.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 16 pages, 5 figures, 2 tables, accepted for publishing on RAA

arXiv:2402.14152 [pdf, other]

ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM

Authors: Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen

Abstract: Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is composed of modular arithmetic, where modular multiplication takes most of the processing time. Computational complexity and memory constraints of ECC limit the performance. Therefore, hardware acceleration on ECC is an active field of research. Processing-in-memory (PIM) is a promising approach to tackle this problem. In this work, we design ModSRAM, the first 8T SRAM PIM architecture to compute large-number modular multiplication efficiently. In addition, we propose R4CSA-LUT, a new algorithm that reduces the cycles for an interleaved algorithm and eliminates carry propagation for addition based on look-up tables (LUT). ModSRAM is co-designed with R4CSA-LUT to support modular multiplication and data reuse in memory with 52% cycle reduction compared to prior works with only 32% area overhead.

Submitted 21 February, 2024; originally announced February 2024.

Comments: DAC 2024

arXiv:2402.11423 [pdf, other]

VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger

Authors: Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier Jin, Shuo Wang

Abstract: Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vectors, enabling adversaries to manipulate the charger and perform a series of attacks. In this paper, we propose VoltSchemer, a set of innovative attacks that grant attackers control over commercial-off-the-shelf wireless chargers merely by modulating the voltage from the power supply. These attacks are the first of their kind, exploiting voltage noise from the power supply to manipulate wireless chargers without necessitating any malicious modifications to the chargers themselves. The significant threats imposed by VoltSchemer are substantiated by three practical attacks, where a charger can be manipulated to: control voice assistants via inaudible voice commands, damage devices being charged through overcharging or overheating, and bypass the Qi-standard specified foreign-object-detection mechanism to damage valuable items exposed to intense magnetic fields. We demonstrate the effectiveness and practicality of the VoltSchemer attacks with successful attacks on 9 top-selling COTS wireless chargers. Furthermore, we discuss the security implications of our findings and suggest possible countermeasures to mitigate potential threats.

Submitted 17 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by the 33rd USENIX Security Symposium

arXiv:2402.02299 [pdf, other]

doi 10.1145/3517810

A Review and Comparison of AI Enhanced Side Channel Analysis

Authors: Max Panoff, Honggang Yu, Haoqi Shan, Yier Jin

Abstract: Side Channel Analysis (SCA) presents a clear threat to privacy and security in modern computing systems. The vast majority of communications are secured through cryptographic algorithms. These algorithms are often provably-secure from a cryptographical perspective, but their implementation on real hardware introduces vulnerabilities. Adversaries can exploit these vulnerabilities to conduct SCA and recover confidential information, such as secret keys or internal states. The threat of SCA has greatly increased as machine learning, and in particular deep learning, enhanced attacks become more common. In this work, we will examine the latest state-of-the-art deep learning techniques for side channel analysis, the theory behind them, and how they are conducted. Our focus will be on profiling attacks using deep learning techniques, but we will also examine some new and emerging methodologies enhanced by deep learning techniques, such as non-profiled attacks, artificial trace generation, and others. Finally, different deep learning enhanced SCA schemes attempted against the ANSSI SCA Database (ASCAD) and their relative performance will be evaluated and compared. This will lead to new research directions to secure cryptographic implementations against the latest SCA attacks.

Submitted 3 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by ACM Journal on Emerging Technologies in Computing Systems (JETC)

arXiv:2402.02227 [pdf, other]

doi 10.1109/SP46214.2022.9833718

Invisible Finger: Practical Electromagnetic Interference Attack on Touchscreen-based Electronic Devices

Authors: Haoqi Shan, Boyi Zhang, Zihao Zhan, Dean Sullivan, Shuo Wang, Yier Jin

Abstract: Touchscreen-based electronic devices such as smart phones and smart tablets are widely used in our daily life. While the security of electronic devices has been heavily investigated recently, the resilience of touchscreens against various attacks has yet to be thoroughly investigated. In this paper, for the first time, we show in a systematic way that touchscreen-based electronic devices are vulnerable to intentional electromagnetic interference (IEMI) attacks, and how to conduct this attack in a practical way. Our contribution lies in not just demonstrating the attack, but also analyzing and quantifying in detail the underlying mechanism allowing the novel IEMI attack on touchscreens. We show how to calculate both the minimum amount of electric field and signal frequency required to induce touchscreen ghost touches. We further analyze our IEMI attack on real touchscreens with different magnitudes, frequencies, duration, and multitouch patterns. The mechanism of controlling the touchscreen-enabled electronic devices with IEMI signals is also elaborated. We design and evaluate an out-of-sight touchscreen locator and touch injection feedback mechanism to assist a practical IEMI attack. Our attack works directly on the touchscreen circuit regardless of the touchscreen scanning mechanism or operating system. Our attack can inject short-tap, long-press, and omni-directional gestures on touchscreens from a distance larger than the average thickness of common tabletops. Compared with the state-of-the-art touchscreen attack, ours can accurately inject different types of touch events without the need for sensing signal synchronization, which makes our attack more robust and practical. In addition, rather than showing a simple proof-of-concept attack, we present and demonstrate the first ready-to-use IEMI based touchscreen attack vector with end-to-end attack scenarios.

Submitted 3 February, 2024; originally announced February 2024.

Comments: This paper has been accepted by 2022 IEEE Symposium on Security and Privacy (SP) and won distinguished paper award

arXiv:2401.17794 [pdf, other]

Influence of sources with a spectral peak in the detection of Cosmic Dawn and Epoch of Reionization

Authors: Mengfan He, Qian Zheng, Quan Guo, Huanyuan Shan, Zhenghao Zhu, Yushan Xie, Yan Huang, Feiyu Zhao

Abstract: Foreground removal is one of the biggest challenges in the detection of the Cosmic Dawn (CD) and Epoch of Reionization (EoR). Various foreground subtraction techniques have been developed based on the spectral smoothness of foregrounds. However, the sources with a spectral peak (SP) at Megahertz may break down the spectral smoothness at low frequencies (< 1000 MHz). In this paper, we cross-match the GaLactic and Extragalactic All-sky Murchison Widefield Array (GLEAM) extragalactic source catalogue with three other radio source catalogues, covering the frequency range from 72 MHz to 1.4 GHz, to search for sources with spectral turnover. 4,423 sources from the GLEAM catalogue are identified as SP sources, representing approximately 3.2 per cent of the GLEAM radio source population. We utilize the properties of SP source candidates obtained from real observations to establish simulations and test the impact of SP sources on the extraction of CD/EoR signals. We statistically compare the differences introduced by SP sources in the residuals after removing the foregrounds with three methods, which are polynomial fitting, Principal Component Analysis (PCA), and fast independent component analysis (FastICA). Our results indicate that the presence of SP sources in the foregrounds has a negligible influence on extracting the CD/EoR signal. After foreground subtraction, the contribution from SP sources to the total power in the two-dimensional (2D) power spectrum within the EoR window is approximately 3 to 4 orders of magnitude lower than the CD/EoR signal.

Submitted 31 January, 2024; originally announced January 2024.

Comments: 14 pages, 14 figures

arXiv:2401.11764 [pdf, other]

Identity-Driven Multimedia Forgery Detection via Reference Assistance

Authors: Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Haijun Shan, Yugang Jiang

Abstract: Recent advancements in "deepfake" techniques have paved the way for generating various media forgeries. In response to the potential hazards of these media forgeries, many researchers engage in exploring detection methods, increasing the demand for high-quality media forgery datasets. Despite this, existing datasets have certain limitations. Firstly, most datasets focus on manipulating visual modality and usually lack diversity, as only a few forgery approaches are considered. Secondly, the quality of media is often inadequate in clarity and naturalness. Meanwhile, the size of the dataset is also limited. Thirdly, it is commonly observed that real-world forgeries are motivated by identity, yet the identity information of the individuals portrayed in these forgeries within existing datasets remains under-explored. For detection, identity information could be an essential clue to boost performance. Moreover, official media concerning relevant identities on the Internet can serve as prior knowledge, aiding both the audience and forgery detectors in determining the true identity. Therefore, we propose an identity-driven multimedia forgery dataset, IDForge, which contains 249,138 video shots sourced from 324 wild videos of 54 celebrities collected from the Internet. The fake video shots involve 9 types of manipulation across visual, audio, and textual modalities. Additionally, IDForge provides extra 214,438 real video shots as a reference set for the 54 celebrities. Correspondingly, we propose the Reference-assisted Multimodal Forgery Detection Network (R-MFDN), aiming at the detection of deepfake videos. Through extensive experiments on the proposed dataset, we demonstrate the effectiveness of R-MFDN on the multimedia detection task.

Submitted 7 August, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.10966 [pdf, other]

doi 10.1109/JBHI.2024.3357453

HOPE: Hybrid-granularity Ordinal Prototype Learning for Progression Prediction of Mild Cognitive Impairment

Authors: Chenhui Wang, Yiming Lei, Tao Chen, Junping Zhang, Yuxin Li, Hongming Shan

Abstract: Mild cognitive impairment (MCI) is often at high risk of progression to Alzheimer's disease (AD). Existing works to identify the progressive MCI (pMCI) typically require MCI subtype labels, pMCI vs. stable MCI (sMCI), determined by whether or not an MCI patient will progress to AD after a long follow-up. However, prospectively acquiring MCI subtype data is time-consuming and resource-intensive; the resultant small datasets could lead to severe overfitting and difficulty in extracting discriminative information. Inspired by the observation that various longitudinal biomarkers and cognitive measurements present an ordinal pathway on AD progression, we propose a novel Hybrid-granularity Ordinal PrototypE learning (HOPE) method to characterize AD ordinal progression for MCI progression prediction. First, HOPE learns an ordinal metric space that enables progression prediction by prototype comparison. Second, HOPE leverages a novel hybrid-granularity ordinal loss to learn the ordinal nature of AD via effectively integrating instance-to-instance ordinality, instance-to-class compactness, and class-to-class separation. Third, to make the prototype learning more stable, HOPE employs an exponential moving average strategy to learn the global prototypes of NC and AD dynamically. Experimental results on the internal ADNI and the external NACC datasets demonstrate the superiority of the proposed HOPE over existing state-of-the-art methods as well as its interpretability. Source code is made available at https://github.com/thibault-wch/HOPE-for-mild-cognitive-impairment.

Submitted 19 January, 2024; originally announced January 2024.

Comments: IEEE Journal of Biomedical and Health Informatics, 2024

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2024

arXiv:2401.06267 [pdf]

Organic room-temperature polariton condensate in a higher-order topological lattice

Authors: Christoph Bennenhei, Hangyong Shan, Marti Struve, Nils Kunte, Falk Eilenberger, Jürgen Ohmer, Utz Fischer, Stefan Schumacher, Xuekai Ma, Christian Schneider, Martin Esmann

Abstract: Organic molecule exciton-polaritons in photonic lattices are a versatile platform to emulate unconventional phases of matter at ambient conditions, including protected interface modes in topological insulators. Here, we investigate bosonic condensation in the most prototypical higher-order topological lattice: a 2D-version of the Su-Schrieffer-Heeger (SSH) model, supporting both 0D and 1D topological modes. We study fluorescent protein-filled, structured microcavities defining a staggered photonic trapping potential and observe the resulting first- and higher-order topologically protected modes via spatially resolved photoluminescence spectroscopy. We account for the spatial mode patterns by tight-binding calculations and theoretically characterize the topological invariants of the lattice. Under strong optical pumping, we observe bosonic condensation into the topological modes. Via interferometric measurements, we map the spatial first-order coherence in the protected 1D modes extending over 10 microns. Our findings pave the way towards organic on-chip polaritonics using higher-order topology as a tool for the generation of robustly confined polaritonic lasing states.

Submitted 11 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures

arXiv:2401.04588 [pdf, other]

Revealing dark exciton signatures in polariton spectra of 2D materials

Authors: Beatriz Ferreira, Hangyong Shan, Roberto Rosati, Jamie M. Fitzgerald, Lukas Lackner, Bo Han, Martin Esmann, Patrick Hays, Gilbert Liebling, Kenji Watanabe, Takashi Taniguchi, Falk Eilenberger, Sefaattin Tongay, Christian Schneider, Ermin Malic

Abstract: Dark excitons in transition metal dichalcogenides (TMD) have been so far neglected in the context of polariton physics due to their lack of oscillator strength. However, in tungsten-based TMDs, dark excitons are known to be the energetically lowest states and could thus provide important scattering partners for polaritons. In this joint theory-experiment work, we investigate the impact of the full exciton energy landscape on polariton absorption and reflectance. By changing the cavity detuning, we vary the polariton energy relative to the unaffected dark excitons in such a way that we open or close specific phonon-driven scattering channels. We demonstrate both in theory and experiment that this controlled switching of scattering channels manifests in characteristic sharp changes in optical spectra of polaritons. These spectral features can be exploited to extract the position of dark excitons. Our work suggests new possibilities for exploiting polaritons for fingerprinting nanomaterials via their unique exciton landscape.

Submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.03825 [pdf, ps, other]

Circumventing the polariton bottleneck via dark excitons in 2D semiconductors

Authors: Jamie M. Fitzgerald, Roberto Rosati, Beatriz Ferreira, Hangyong Shan, Christian Schneider, Ermin Malic

Abstract: Efficient scattering into the exciton polariton ground state is a key prerequisite for generating Bose-Einstein condensates and low-threshold polariton lasing. However, this can be challenging to achieve at low densities due to the polariton bottleneck effect that impedes phonon-driven scattering into low-momentum polariton states. The rich exciton landscape of transition metal dichalcogenides (TMDs) provides potential intervalley scattering pathways via dark excitons to rapidly populate these polaritons. Here, we present a microscopic study exploring the time- and momentum-resolved relaxation of exciton polaritons supported by a MoSe2 monolayer integrated within a Fabry-Perot cavity. By exploiting phonon-assisted transitions between momentum-dark excitons and the lower polariton branch, we demonstrate that it is possible to circumvent the bottleneck region and efficiently populate the polariton ground state. Furthermore, this intervalley pathway is predicted to give rise to, yet unobserved, angle-resolved phonon sidebands in low-temperature photoluminescence spectra that are associated with momentum-dark excitons. This represents a distinctive experimental signature for efficient phonon-mediated polariton-dark-exciton interactions.

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.15663 [pdf, other]

IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models

Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang

Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted an increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) like BLIP-2 and GPT-4 have been intensively investigated, which learn rich vision-language correlation from image-text pairs. However, despite these developments, the application of LLMs and VLMs in image quality assessment (IQA), particularly in medical imaging, remains to be explored, which is valuable for objective performance evaluation and potential supplement or even replacement of radiologists' opinions. To this end, this paper introduces IQAGPT, an innovative image quality assessment system integrating an image quality captioning VLM with ChatGPT for generating quality scores and textual reports. First, we build a CT-IQA dataset for training and evaluation, comprising 1,000 CT slices with diverse quality levels professionally annotated. To better leverage the capabilities of LLMs, we convert annotated quality scores into semantically rich text descriptions using a prompt template. Second, we fine-tune the image quality captioning VLM on the CT-IQA dataset to generate quality descriptions. The captioning model fuses the image and text features through cross-modal attention. Third, based on the quality descriptions, users can talk with ChatGPT to rate image quality scores or produce a radiological quality report. Our preliminary results demonstrate the feasibility of assessing image quality with large models. Remarkably, our IQAGPT outperforms GPT-4 and CLIP-IQA, as well as the multi-task classification and regression models that solely rely on images.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 14 pages, 9 figures

arXiv:2312.13190 [pdf, other]

doi 10.1109/AsianHOST59942.2023.10409305

HeisenTrojans: They Are Not There Until They Are Triggered

Authors: Akshita Reddy Mavurapu, Haoqi Shan, Xiaolong Guo, Orlando Arias, Dean Sullivan

Abstract: The hardware security community has made significant advances in detecting Hardware Trojan vulnerabilities using software fuzzing-inspired automated analysis. However, the Electronic Design Automation (EDA) code base itself remains under-examined by the same techniques. Our experiments in fuzzing EDA tools demonstrate that, indeed, they are prone to software bugs. As a consequence, this paper unveils HeisenTrojan attacks, a new hardware attack that does not generate harmful hardware, but rather, exploits software vulnerabilities in the EDA tools themselves. A key feature of HeisenTrojan attacks is that they are capable of deploying a malicious payload on the system hosting the EDA tools without triggering verification tools because HeisenTrojan attacks do not rely on superfluous or malicious hardware that would otherwise be noticeable. The aim of a HeisenTrojan attack is to execute arbitrary code on the system on which the vulnerable EDA tool is hosted, thereby establishing a permanent presence and providing a beachhead for intrusion into that system. Our analysis reveals 83% of the EDA tools analyzed have exploitable bugs. In what follows, we demonstrate an end-to-end attack and provide analysis on the existing capabilities of fuzzers to find HeisenTrojan attacks in order to emphasize their practicality and the need to secure EDA tools against them.

Submitted 20 December, 2023; originally announced December 2023.

Comments: This paper has been accepted by IEEE Asian Hardware Oriented Security and Trust Symposium (AsianHOST' 2023)

arXiv:2312.13189 [pdf, ps, other]

doi 10.1109/AsianHOST59942.2023.10409308

When Memory Mappings Attack: On the (Mis)use of the ARM Cortex-M FPB Unit

Authors: Haoqi Shan, Dean Sullivan, Orlando Arias

Abstract: In recent years we have seen an explosion in the usage of low-cost, low-power microcontrollers (MCUs) in embedded devices around us due to the popularity of Internet of Things (IoT) devices. Although this is good from an economics perspective, it has also been detrimental for security as microcontroller-based systems are now a viable attack target. In response, researchers have developed various protection mechanisms dedicated to improve security in these resource-constrained embedded systems. We demonstrate in this paper these defenses fall short when we leverage benign memory mapped design-for-debug (DfD) structures added by MCU vendors in their products. In particular, we utilize the Flash Patch and Breakpoint (FPB) unit present in the ARM Cortex-M family to build new attack primitives which can be used to bypass common defenses for embedded devices. Our work serves as a warning and a call in balancing security and debug structures in modern microcontrollers.

Submitted 20 December, 2023; originally announced December 2023.

Comments: This paper has been accepted by IEEE Asian Hardware Oriented Security and Trust Symposium (AsianHOST' 2023) and won Best Paper Award

arXiv:2312.10479 [pdf, other]

A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis

Authors: Jingyi Zhou, Jie Zhou, Jiabao Zhao, Siyin Wang, Haijun Shan, Gui Tao, Qi Zhang, Xuanjing Huang

Abstract: Few-shot text classification has attracted great interest in both academia and industry due to the lack of labeled data in many fields. Different from general text classification (e.g., topic classification), few-shot sentiment classification is more challenging because the semantic distances among the classes are more subtle. For instance, the semantic distances between the sentiment labels in a positive or negative polarity (e.g., "love" and "joy", "remorse" and "sadness") are close, while the distances are large for the sentiment labels in two opposite polarities (e.g., "love" and "sadness"). To address this problem, we propose a Soft Contrastive learning-based Prompt (SCP) model for few-shot sentiment analysis. First, we design a sentiment-aware chain of thought prompt module to guide the model to predict the sentiment from coarse grain to fine grain via a series of intermediate reasoning steps. Then, we propose a soft contrastive learning algorithm to take the correlation of the labels into account. A series of experiments on several sentiment analysis datasets show the great advantages of SCP by comparing it with SOTA baselines (e.g., ChatGPT).

Submitted 16 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP

arXiv:2312.07846 [pdf, other]

Prompted Contextual Transformer for Incomplete-View CT Reconstruction

Authors: Chenglong Ma, Zilong Li, Junjun He, Junping Zhang, Yi Zhang, Hongming Shan

Abstract: Incomplete-view computed tomography (CT) can shorten the data acquisition time and allow scanning of large objects, including sparse-view and limited-angle scenarios, each with various settings, such as different view numbers or angular ranges. However, the reconstructed images present severe, varying artifacts due to different missing projection data patterns. Existing methods tackle these scenarios/settings separately and individually, which are cumbersome and lack the flexibility to adapt to new settings. To enjoy the multi-setting synergy in a single model, we propose a novel Prompted Contextual Transformer (ProCT) for incomplete-view CT reconstruction. The novelties of ProCT lie in two folds. First, we devise a projection view-aware prompting to provide setting-discriminative information, enabling a single ProCT to handle diverse incomplete-view CT settings. Second, we propose artifact-aware contextual learning to sense artifact pattern knowledge from in-context image pairs, making ProCT capable of accurately removing the complex, unseen artifacts. Extensive experimental results on two publicly available clinical CT datasets demonstrate the superior performance of ProCT over state-of-the-art methods -- including single-setting models -- on a wide range of incomplete-view CT settings, strong transferability to unseen datasets and scenarios, and improved performance when sinogram data is available. The code is available at: https://github.com/Masaaki-75/proct

Submitted 11 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.06345 [pdf, other]

The Hubble Deep Hydrogen Alpha (HDH$α$) Project: I. Catalog of Emission-line Galaxies

Authors: Shuairu Zhu, Zhen-Ya Zheng, James Rhoads, Junxian Wang, Linhua Jiang, Chunyan Jiang, Fang-Ting Yuan, P. T. Rahna, Weida Hu, Ruqiu Lin, Huanyuan Shan, Chun Xu, Leopoldo Infante, L. Felipe Barrientos, Xianzhong Zheng, Guanwen Fang, Zhixiong Liang

Abstract: We present the first results of the Hubble Deep Hydrogen Alpha (HDH$α$) project, which analyzes the space-borne deep H$α$ narrowband imaging data in the GOODS-S region. The HDH$α$ data comprises 72 orbits' images taken with the HST ACS/WFC F658N filter. The exposure time varies across a total area of $\sim$76.1 $\rm{arcmin}^2$, adding up to a total exposure time of 195.7 ks, among which 68.8 ks are spent in the deepest region. These images are aligned, reprojected, and combined to have the same pixel grid as the Hubble Legacy Fields (HLF). The scientific goals of the HDH$α$ include establishing a sample of emission-line galaxies (ELGs) including [O III] emitters at $z\sim$ 0.3, [O II] emitters at $z\sim$ 0.8, and Lyman-$α$ emitters (LAEs) at $z \sim 4.4$, studying the line morphology of ELGs with high resolution imaging data, and statistically analyzing the line luminosity functions and line equivalent-width distributions of ELGs selected with HST. Furthermore, the HDH$α$ project enhances the legacy value of the GOODS-S field by contributing the first HST-based narrowband image to the existing data sets, which includes the HST broadband data and other ancillary data from X-ray to radio taken by other facilities. In this paper, we describe the data reduction process of the HDH$α$, select ELGs based on HST's F658N and broadband data, validate the redshifts of the selected candidates by cross matching with the public spectroscopic catalogs in the GOODS-S, and present a final catalog of the confirmed [O III] emitters at $z\sim$ 0.3, [O II] emitters at $z\sim$ 0.8, and LAEs at $z \sim 4.4$.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 27 pages, 14 figures, 9 tables, accepted by ApJS

arXiv:2312.06239 [pdf, other]

CSST Strong Lensing Preparation: Forecasting the galaxy-galaxy strong lensing population for the China Space Station Telescope

Authors: Xiaoyue Cao, Ran Li, Nan Li, Rui Li, Yun Chen, Keyi Ding, Huanyuan Shan, Hu Zhan, Xin Zhang, Wei Du, Shuo Cao

Abstract: Galaxy-galaxy strong gravitational lensing (GGSL) is a powerful probe for the formation and evolution of galaxies and cosmology, while the sample size of GGSLs leads to considerable uncertainties and potential bias. The China Space Station Telescope (CSST, to be launched in late 2026) will conduct observations across 17,500 square degrees of the sky, capturing images in the $ugriz$ bands with a spatial resolution comparable to that of the Hubble Space Telescope. We ran a set of Monte Carlo simulations to predict that the CSST's wide-field survey will observe $\sim$160,000 galaxy-galaxy strong lenses over its lifespan, increasing the number of existing galaxy-galaxy strong lens samples by three orders of magnitude. This is comparable to the capabilities of the $\it Euclid$ telescope but with the added benefit of additional color information. Specifically, the CSST can detect strong lenses with Einstein radii about $0.64\pm0.42^{"}$, corresponding to the velocity dispersions of $217.19 \pm 50.55 \, \text{km/s}$. These lenses exhibit a median magnification of $\sim$5. The apparent magnitude of the unlensed sources in the g-band is $25.87 \pm 1.19$. The signal-to-noise ratio of the lensed images covers a range of $\sim 20$ to $\sim 1000$, allowing us to determine the Einstein radius with an accuracy ranging from $\sim 1 \%$ to $\sim 0.1 \%$, ignoring various modeling systematics. Our estimates indicate that CSST can observe rare systems like double source-plane and spiral galaxy lenses. The above selection functions of the CSST strong lensing observation help optimize the strategy of finding and modeling GGSLs.

Submitted 30 July, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 17 pages, 14 figures. Accepted by MNRAS

arXiv:2312.05038 [pdf, other]

Prompt-In-Prompt Learning for Universal Image Restoration

Authors: Zilong Li, Yiming Lei, Chenglong Ma, Junping Zhang, Hongming Shan

Abstract: Image restoration, which aims to retrieve and enhance degraded images, is fundamental across a wide range of applications. While conventional deep learning approaches have notably improved the image quality across various tasks, they still suffer from (i) the high storage cost needed for various task-specific models and (ii) the lack of interactivity and flexibility, hindering their wider application. Drawing inspiration from the pronounced success of prompts in both linguistic and visual domains, we propose novel Prompt-In-Prompt learning for universal image restoration, named PIP. First, we present two novel prompts, a degradation-aware prompt to encode high-level degradation knowledge and a basic restoration prompt to provide essential low-level information. Second, we devise a novel prompt-to-prompt interaction module to fuse these two prompts into a universal restoration prompt. Third, we introduce a selective prompt-to-feature interaction module to modulate the degradation-related feature. By doing so, the resultant PIP works as a plug-and-play module to enhance existing restoration models for universal image restoration. Extensive experimental results demonstrate the superior performance of PIP on multiple restoration tasks, including image denoising, deraining, dehazing, deblurring, and low-light enhancement. Remarkably, PIP is interpretable, flexible, efficient, and easy-to-use, showing promising potential for real-world applications. The code is available at https://github.com/longzilicart/pip_universal.

Submitted 8 December, 2023; originally announced December 2023.

arXiv:2312.04433 [pdf, other]

DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Authors: Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

Abstract: Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning aims to accurately capture the fine appearance of the subject from provided images, which is achieved by combining textual inversion and fine-tuning of our carefully designed identity adapter. In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern. Combining these two lightweight and efficient adapters allows for flexible customization of any subject with any motion. Extensive experimental results demonstrate the superior performance of our DreamVideo over the state-of-the-art methods for customized video generation. Our project page is at https://dreamvideo-t2v.github.io.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2311.12386 [pdf, other]

Point, Segment and Count: A Generalized Framework for Object Counting

Authors: Zhizhong Huang, Mingliang Dai, Yi Zhang, Junping Zhang, Hongming Shan

Abstract: Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names, a.k.a. few-shot and zero-shot counting. In this paper, we propose a generalized framework for both few-shot and zero-shot object counting based on detection. Our framework combines the superior advantages of two foundation models without compromising their zero-shot capability: (i) SAM to segment all possible objects as mask proposals, and (ii) CLIP to classify proposals to obtain accurate object counts. However, this strategy meets the obstacles of efficiency overhead and the small crowded objects that cannot be localized and distinguished. To address these issues, our framework, termed PseCo, follows three steps: point, segment, and count. Specifically, we first propose a class-agnostic object localization to provide accurate but least point prompts for SAM, which consequently not only reduces computation costs but also avoids missing small objects. Furthermore, we propose a generalized object classification that leverages CLIP image/text embeddings as the classifier, following a hierarchical knowledge distillation to obtain discriminative classifications among hierarchical mask proposals. Extensive experimental results on FSC-147, COCO, and LVIS demonstrate that PseCo achieves state-of-the-art performance in both few-shot/zero-shot object counting/detection. Code: https://github.com/Hzzone/PseCo

Submitted 27 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

Comments: Accepted by CVPR 2024. Camera ready

arXiv:2311.12049 [pdf, other]

Energizing Federated Learning via Filter-Aware Attention

Authors: Ziyuan Yang, Zerui Shao, Huijie Huangfu, Hui Yu, Andrew Beng Jin Teoh, Xiaoxiao Li, Hongming Shan, Yi Zhang

Abstract: Federated learning (FL) is a promising distributed paradigm, eliminating the need for data sharing but facing challenges from data heterogeneity. Personalized parameter generation through a hypernetwork proves effective, yet existing methods fail to personalize local model structures. This leads to redundant parameters struggling to adapt to diverse data distributions. To address these limitations, we propose FedOFA, utilizing personalized orthogonal filter attention for parameter recalibration. The core is the Two-stream Filter-aware Attention (TFA) module, meticulously designed to extract personalized filter-aware attention maps, incorporating Intra-Filter Attention (IntraFa) and Inter-Filter Attention (InterFA) streams. These streams enhance representation capability and explore optimal implicit structures for local models. Orthogonal regularization minimizes redundancy by averting inter-correlation between filters. Furthermore, we introduce an Attention-Guided Pruning Strategy (AGPS) for communication efficiency. AGPS selectively retains crucial neurons while masking redundant ones, reducing communication costs without performance sacrifice. Importantly, FedOFA operates on the server side, incurring no additional computational cost on the client, making it advantageous in communication-constrained scenarios. Extensive experiments validate superior performance over state-of-the-art approaches, with code availability upon paper acceptance.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.11683 [pdf, ps, other]

SIAM: A Simple Alternating Mixer for Video Prediction

Authors: Xin Zheng, Ziang Peng, Yuan Cao, Hongming Shan, Junping Zhang

Abstract: Video prediction, predicting future frames from the previous ones, has broad applications such as autonomous driving and weather forecasting. Existing state-of-the-art methods typically focus on extracting either spatial, temporal, or spatiotemporal features from videos. Different feature focuses, resulting from different network architectures, may make the resultant models excel at some video prediction tasks but perform poorly on others. Towards a more generic video prediction solution, we explicitly model these features in a unified encoder-decoder framework and propose a novel simple alternating Mixer (SIAM). The novelty of SIAM lies in the design of dimension alternating mixing (DaMi) blocks, which can model spatial, temporal, and spatiotemporal features through alternating the dimensions of the feature maps. Extensive experimental results demonstrate the superior performance of the proposed SIAM on four benchmark video datasets covering both synthetic and real-world scenarios.

Submitted 20 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.10951 [pdf, other]

Detecting Cosmic 21 cm Global Signal Using an Improved Polynomial Fitting Algorithm

Authors: Tianyang Liu, Junhua Gu, Quan Guo, Huanyuan Shan, Qian Zheng, Jingying Wang

Abstract: Detecting the cosmic 21 cm signal from Epoch of Reionization (EoR) has always been a difficult task. Although the Galactic foreground can be regarded as a smooth power-law spectrum, due to the chromaticity of the antenna, additional structure will be introduced into the global spectrum, making the polynomial fitting algorithm perform poorly. In this paper, we introduce an improved polynomial fitting algorithm - the Vari-Zeroth-Order Polynomial (VZOP) fitting and use it to fit the simulation data. This algorithm is developed for the upcoming Low-frequency Anechoic Chamber Experiment (LACE), yet it is a general method suitable for application in any single antenna-based global 21 cm signal experiment. VZOP defines a 24-hour averaged beam model that brings information about the antenna beam into the polynomial model. Assuming that the beam can be measured, VZOP can successfully recover the 21 cm absorption feature, even if the beam is extremely frequency-dependent. In real observations, due to various systematics, the corrected measured beam contains residual errors that are not completely random. Assuming the errors are frequency-dependent, VZOP is capable of recovering the 21 cm absorption feature even when the error reaches 10%. Even in the most extreme scenario where the errors are completely random, VZOP can at least give a fitting result that is not worse than the common polynomial fitting. In conclusion, the fitting effect of VZOP depends on the structure of the error and the accuracy of the beam measurement.

Submitted 17 November, 2023; originally announced November 2023.

Comments: 14 pages, 15 figures, Accepted for publication in MNRAS

arXiv:2311.09532 [pdf, other]

LightEMU: Hardware Assisted Fuzzing of Trusted Applications

Authors: Haoqi Shan, Sravani Nissankararao, Yujia Liu, Moyao Huang, Shuo Wang, Yier Jin, Dean Sullivan

Abstract: Trusted Execution Environments (TEEs) are deployed in many CPU designs because of the confidentiality and integrity guarantees they provide. ARM TrustZone is a TEE extensively deployed on smart phones, IoT devices, and notebooks. Specifically, TrustZone is used to separate code execution and data into two worlds, normal world and secure world. However, this separation inherently prevents traditional fuzzing approaches which rely upon coverage-guided feedback and existing fuzzing research is, therefore, extremely limited. In this paper, we present a native and generic method to perform efficient and scalable feedback-driven fuzzing on Trusted Applications (TAs) using ARM CoreSight. We propose LightEMU, a novel fuzzing framework that allows us to fuzz TAs by decoupling them from relied TEE. We argue that LightEMU is a promising first-stage approach for rapidly discovering TA vulnerabilities prior to investing effort in whole system TEE evaluation precisely because the majority of publicly disclosed TrustZone bugs reside in the TA code itself. We implement LightEMU and adapt it to Teegris, Trusty, OP-TEE and QSEE and evaluate 8 real-world TAs while triggering 3 unique crashes and achieving x10 time speedup when fuzzing TAs using the state-of-the-art TrustZone fuzzing framework.

Submitted 15 November, 2023; originally announced November 2023.

Comments: This paper has been accepted by IEEE International Symposium on Hardware Oriented Security and Trust (HOST'2024)

arXiv:2310.15053 [pdf, other]

Weak Lensing Reconstruction by Counting DECaLS Galaxies

Authors: Jian Qin, Pengjie Zhang, Haojie Xu, Yu Yu, Ji Yao, Ruijie Ma, Huanyuan Shan

Abstract: Alternative to weak lensing measurements through cosmic shear, we present a weak lensing convergence $\hatκ$ map reconstructed through cosmic magnification effect in DECaLS galaxies of the DESI imaging surveys DR9. This is achieved by linearly weighing $12$ maps of galaxy number overdensity in different magnitude bins of $grz$ photometry bands. The weight is designed to eliminate the mean galaxy deterministic bias, minimize galaxy shot noise while maintaining the lensing convergence signal. We also perform corrections of imaging systematics in the galaxy number overdensity. The $\hatκ$ map has $8365$ deg$^2$ sky coverage. Given the low number density of DECaLS galaxies, the $\hatκ$ map is overwhelmed by shot noise and the map quality is difficult to evaluate using the lensing auto-correlation. Alternatively, we measure its cross-correlation with the cosmic shear catalogs of DECaLS galaxies of DESI imaging surveys DR8, which has $8365$ deg$^2$ overlap in sky coverage with the $\hatκ$ map. We detect a convergence-shear cross-correlation signal with $S/N\simeq 10$. The analysis also shows that the galaxy intrinsic clustering is suppressed by a factor $\mathcal{O}(10^2)$ and the residual galaxy clustering contamination in the $\hatκ$ map is consistent with zero. Various tests with different galaxy and shear samples, and the Akaike information criterion analysis all support the lensing detection. So is the imaging systematics corrections, which enhance the lensing signal detection by $\sim 30\%$. We discuss various issues for further improvement of the measurements.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.13929 [pdf, other]

doi 10.1093/mnras/stad3429

The Intensity of Diffuse Galactic Emission Reflected by Meteor Trails

Authors: Feiyu Zhao, Ruxi Liang, Zepei Yang, Huanyuan Shan, Qian Zheng, Qiqian Zhang, Quan Guo

Abstract: We calculate the reflection of diffuse galactic emission by meteor trails and investigate its potential relationship to Meteor Radio Afterglow (MRA). The formula to calculate the reflection of diffuse galactic emission is derived from a simplified case, assuming that the signals are mirrored by the cylindrical over-dense ionization trail of meteors. The overall observed reflection is simulated through a ray tracing algorithm together with the diffuse galactic emission modelled by the GSM sky model. We demonstrate that the spectrum of the reflected signal is broadband and follows a power law with a negative spectral index of around -1.3. The intensity of the reflected signal varies with local sidereal time and the brightness of the meteor and can reach 2000 Jy. These results agree with some previous observations of MRAs. Therefore, we think that the reflection of galactic emission by meteor trails can be a possible mechanism causing MRAs, which is worthy of further research.

Submitted 15 November, 2023; v1 submitted 21 October, 2023; originally announced October 2023.

Comments: 15 pages, 10 figures, 2 tables, accepted for publication in MNRAS, 10.1093/mnras/stad3429

arXiv:2310.09821 [pdf, other]

LICO: Explainable Models with Language-Image Consistency

Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan

Abstract: Interpreting the decisions of deep learning models has been actively studied since the explosion of deep neural networks. One of the most convincing interpretation approaches is salience-based visual interpretation, such as Grad-CAM, where the generation of attention maps depends merely on categorical labels. Although existing interpretation methods can provide explainable decision clues, they often yield partial correspondence between image and saliency maps due to the limited discriminative information from one-hot labels. This paper develops a Language-Image COnsistency model for explainable image classification, termed LICO, by correlating learnable linguistic prompts with corresponding visual features in a coarse-to-fine manner. Specifically, we first establish a coarse global manifold structure alignment by minimizing the distance between the distributions of image and language features. We then achieve fine-grained saliency maps by applying optimal transport (OT) theory to assign local feature maps with class-specific prompts. Extensive experimental results on eight benchmark datasets demonstrate that the proposed LICO achieves a significant improvement in generating more explainable attention maps in conjunction with existing interpretation methods such as Grad-CAM. Remarkably, LICO improves the classification performance of existing models without introducing any computational overhead during inference. Source code is made available at https://github.com/ymLeiFDU/LICO.

Submitted 15 October, 2023; originally announced October 2023.

Comments: Accepted by NeurIPS 2023

arXiv:2310.08880 [pdf, other]

On the sum of the first two largest signless Laplacian eigenvalues of a graph

Authors: Zi-Ming Zhou, Chang-Xiang He, Hai-Ying Shan

Abstract: For a graph $G$, let $S_2(G)$ be the sum of the first two largest signless Laplacian eigenvalues of $G$, and $f(G)=e(G)+3-S_2(G)$. Oliveira, Lima, Rama and Carvalho conjectured that $K^+_{1,n-1}$ (the star graph with an additional edge) is the unique graph with minimum value of $f(G)$ on $n$ vertices. In this paper, we prove this conjecture, which also confirms a conjecture for the upper bound of $S_2(G)$ proposed by Ashraf et al.

Submitted 13 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 15 pages, 5 figures
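
The quantity $f(G)=e(G)+3-S_2(G)$ in the entry above can be evaluated numerically for small graphs. The following is a minimal sketch, not the authors' method: it assumes the standard definitions (signless Laplacian $Q = D + A$, and $e(G)$ taken to be the number of edges), and the helper names `signless_laplacian`, `symmetric_eigenvalues`, `f_value`, `star` and `star_plus_edge` are hypothetical. Eigenvalues are computed with a plain cyclic Jacobi iteration so that the example needs only the standard library.

```python
import math

def signless_laplacian(n, edges):
    """Return Q = D + A for a simple graph on vertices 0..n-1."""
    q = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        q[u][u] += 1.0  # degree contribution (D)
        q[v][v] += 1.0
        q[u][v] += 1.0  # adjacency contribution (A)
        q[v][u] += 1.0
    return q

def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < 1e-24:  # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def f_value(n, edges):
    """f(G) = e(G) + 3 - S_2(G), with e(G) assumed to be the edge count."""
    eig = symmetric_eigenvalues(signless_laplacian(n, edges))
    return len(edges) + 3 - (eig[0] + eig[1])

def star(n):
    """Edge list of the star K_{1,n-1} with centre 0."""
    return [(0, v) for v in range(1, n)]

def star_plus_edge(n):
    """Edge list of K^+_{1,n-1}: the star with one extra edge between two leaves."""
    return star(n) + [(1, 2)]

print(f_value(5, star(5)))
print(f_value(4, star_plus_edge(4)))
```

For the plain star the signless Laplacian spectrum is $n, 1, \dots, 1, 0$, so $f = 1$; on four vertices, adding the extra edge lowers the value to $(5-\sqrt{17})/2$, in line with the direction of the conjecture.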

arXiv:2310.06518 [pdf, other]

doi 10.1103/PhysRevD.109.063509

21-cm foreground removal using AI and frequency-difference technique

Authors: Feng Shi, Haoxiang Chang, Le Zhang, Huanyuan Shan, Jiajun Zhang, Suiping Zhou, Ming Jiang, Zitong Wang

Abstract: The deep learning technique has been employed in removing foreground contaminants from 21 cm intensity mapping, but its effectiveness is limited by the large dynamic range of the foreground amplitude. In this study, we develop a novel foreground removal technique grounded in U-Net networks. The essence of this technique lies in introducing an innovative data preprocessing step: specifically, using the temperature difference between neighboring frequency bands as input, which can reduce the dynamic range of foreground amplitudes by approximately two orders of magnitude. This reduction proves to be highly advantageous for the U-Net foreground removal. We observe that the HI signal can be reliably recovered, as indicated by the cross-correlation power spectra showing unity agreement at the scale of $k < 0.3 h^{-1}$Mpc in the absence of instrumental effects. Moreover, accounting for the systematic beam effects, our reconstruction displays consistent auto-correlation and cross-correlation power spectrum ratios at the $1\sigma$ level across scales $k \lesssim 0.1 h^{-1}$Mpc, with only a 10% reduction observed in the cross-correlation power spectrum at $k\simeq0.2 h^{-1}$Mpc. The effects of redshift-space distortion are also reconstructed successfully, as evidenced by the matching quadrupole power spectra. In comparison, our method outperforms the traditional Principal Component Analysis method, whose derived cross-correlation ratios are underestimated by around 60%. We simulated various white noise levels in the map and found that the mean cross-correlation ratio $\bar{R}_\mathrm{cross} \gtrsim 0.8$ when the level of the thermal noise is smaller than or equal to that of the HI signal. We conclude that the proposed frequency-difference technique can significantly enhance network performance by reducing the amplitude range of foregrounds and aiding in the prevention of HI loss.

Submitted 12 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 20 pages, 19 figures

Journal ref: Physical Review D 109, 063509 (2024)
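
The frequency-difference preprocessing described in the entry above can be illustrated on a toy data cube. This is only a sketch under stated assumptions, not the paper's pipeline: the foreground is modelled as a hypothetical smooth power law in frequency with index -2.7, the band spacing of 1 MHz and the pixel amplitudes are made up, and the names `frequency_difference` and `max_abs` are illustrative.

```python
def frequency_difference(cube):
    """Difference between neighbouring frequency bands.

    cube[i][p] is the brightness temperature in band i at pixel p;
    the result has one band fewer than the input.
    """
    return [
        [upper - lower for upper, lower in zip(cube[i + 1], cube[i])]
        for i in range(len(cube) - 1)
    ]

def max_abs(cube):
    """Largest absolute temperature anywhere in the cube."""
    return max(abs(t) for band in cube for t in band)

# Hypothetical toy foreground: smooth power law in frequency (assumption).
freqs_mhz = [900.0 + i for i in range(5)]
amplitudes = [10.0, 20.0, 5.0]  # made-up per-pixel amplitudes
foreground = [[amp * (f / 900.0) ** -2.7 for amp in amplitudes] for f in freqs_mhz]

diff = frequency_difference(foreground)
ratio = max_abs(diff) / max_abs(foreground)
print(f"amplitude ratio after differencing: {ratio:.4f}")
```

Because the toy foreground varies smoothly with frequency, neighbouring bands nearly cancel and the differenced cube has a much smaller amplitude than the input. How much the amplitude shrinks depends on the band spacing and on the assumed spectral shape.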

Showing 1–50 of 277 results for author: Shan, H