Search | arXiv e-print repository

DV-FSR: A Dual-View Target Attack Framework for Federated Sequential Recommendation

Authors: Qitao Qin, Yucong Luo, Mingyue Cheng, Qingyang Mao, Chenyi Lei

Abstract: Federated recommendation (FedRec) preserves user privacy by enabling decentralized training of personalized models, but this architecture is inherently vulnerable to adversarial attacks. Significant research has been conducted on targeted attacks in FedRec systems, motivated by commercial and social influence considerations. However, much of this work has largely overlooked the differential robustness of recommendation models. Moreover, our empirical findings indicate that existing targeted attack methods achieve only limited effectiveness in Federated Sequential Recommendation (FSR) tasks. Driven by these observations, we focus on investigating targeted attacks in FSR and propose a novel dual-view attack framework, named DV-FSR. This attack method uniquely combines a sampling-based explicit strategy with a contrastive learning-based implicit gradient strategy to orchestrate a coordinated attack. Additionally, we introduce a specific defense mechanism tailored for targeted attacks in FSR, aiming to evaluate the mitigation effects of the attack method we proposed. Extensive experiments validate the effectiveness of our proposed approach on representative sequential models.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.04652 [pdf, other]

Privacy-Preserving Race/Ethnicity Estimation for Algorithmic Bias Measurement in the U.S

Authors: Saikrishna Badrinarayanan, Osonde Osoba, Miao Cheng, Ryan Rogers, Sakshi Jain, Rahul Tandra, Natesh S. Pillai

Abstract: AI fairness measurements, including tests for equal treatment, often take the form of disaggregated evaluations of AI systems. Such measurements are an important part of Responsible AI operations. These measurements compare system performance across demographic groups or sub-populations and typically require member-level demographic signals such as gender, race, ethnicity, and location. However, sensitive member-level demographic attributes like race and ethnicity can be challenging to obtain and use due to platform choices, legal constraints, and cultural norms. In this paper, we focus on the task of enabling AI fairness measurements on race/ethnicity for U.S. LinkedIn members in a privacy-preserving manner. We present the Privacy-Preserving Probabilistic Race/Ethnicity Estimation (PPRE) method for performing this task. PPRE combines the Bayesian Improved Surname Geocoding (BISG) model, a sparse LinkedIn survey sample of self-reported demographics, and privacy-enhancing technologies like secure two-party computation and differential privacy to enable meaningful fairness measurements while preserving member privacy. We provide details of the PPRE method and its privacy guarantees. We then illustrate sample measurement operations. We conclude with a review of open research and engineering challenges for expanding our privacy-preserving fairness measurement capabilities.

Submitted 6 September, 2024; originally announced September 2024.

Comments: Saikrishna Badrinarayanan and Osonde Osoba contributed equally to this work

arXiv:2408.16057 [pdf, other]

Corner Charge Fluctuations and Many-Body Quantum Geometry

Authors: Xiao-Chuan Wu, Kang-Le Cai, Meng Cheng, Prashant Kumar

Abstract: In many-body systems with U(1) global symmetry, the charge fluctuations in a subregion reveal important insights into entanglement and other global properties. For subregions with sharp corners, bipartite fluctuations have been predicted to exhibit a universal shape dependence on the corner angle in certain quantum phases and transitions, characterized by a "universal angle function" and a "universal coefficient." However, we demonstrate that this simple formula is insufficient for charge insulators, including composite Fermi liquids. In these systems, the corner contribution may depend on the corner angle, subregion orientation, and other microscopic details. We provide an infinite series representation of the corner term, introducing orientation-resolved universal angle functions with their non-universal coefficients. In the small-angle limit or under orientation averaging, the remaining terms' coefficients are fully determined by the many-body quantum metric, which, while not universal, adheres to both a universal topological lower bound and an energetic upper bound. We also clarify the conditions for bound saturation in (anisotropic) Landau levels, leveraging the generalized Kohn theorem and holomorphic properties of many-body wavefunctions. We find that a broad class of fractional quantum Hall wavefunctions, including unprojected parton states and composite-fermion Fermi sea wavefunctions, saturates the bounds.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 32 pages, 7 figures

arXiv:2408.12654 [pdf, other]

White Dwarf-Black Hole Binary Progenitors of Low Redshift Gamma-ray Bursts

Authors: Nicole M. Lloyd-Ronning, Jarrett L. Johnson, Phoebe R. Upton Sanderbeck, Makana Silva, Roseanne M. Cheng

Abstract: Although there is strong evidence that many long GRBs are associated with the collapse of a massive star, tantalizing results in recent years have upended the direct association of all long GRBs with massive stars. In particular, kilonova signals in some long GRB light curves as well as a suggested uptick in the rate density of long GRBs at low redshifts (deviating significantly from the star formation rate) suggest that compact object mergers may be a non-negligible fraction of the long GRB population. Here we investigate the contribution of white dwarf-black hole mergers to the long GRB population. We present evidence for the deviation of the long GRB rate density from the star formation rate at low redshifts, and provide analytic and numerical arguments for why a white dwarf-black hole merger system may be a viable progenitor to explain this deviation. We show the range of parameter space in which the durations, energetics, and rates of these systems can account for a significant sub-population of low-redshift long GRBs.

Submitted 22 August, 2024; originally announced August 2024.

Report number: LA-UR-24-28685

arXiv:2408.11379 [pdf]

High quality epitaxial piezoelectric and ferroelectric wurtzite Al$_{1-x}$Sc$_x$N thin films

Authors: Yang Zeng, Yihan Lei, Yanghe Wang, Mingqiang Cheng, Luocheng Liao, Xuyang Wang, Jinxin Ge, Zhenghao Liu, Wenjie Ming, Chao Li, Shuhong Xie, Jiangyu Li, Changjian Li

Abstract: Piezoelectric and ferroelectric wurtzite materials are promising to reshape modern microelectronics because they can be easily integrated with mainstream semiconductor technology. Sc doped AlN (Al$_{1-x}$Sc$_x$N) has attracted much attention for its enhanced piezoelectric and emerging ferroelectric properties, yet the commonly used sputtering results in polycrystalline Al$_{1-x}$Sc$_x$N films with high leakage current. Here we report the pulsed laser deposition of single crystalline epitaxial Al$_{1-x}$Sc$_x$N thin films on sapphire and 4H-SiC substrates. Pure wurtzite phase is maintained up to $x = 0.3$ with minimal oxygen contamination. Polarization is estimated to be 140 $μ$C/cm$^2$ via atomic scale microscopy imaging and found to be switchable via a scanning probe. The piezoelectric coefficient is found to be 5 times that of the undoped one when $x = 0.3$, making it desirable for high frequency radiofrequency (RF) filters and three-dimensional nonvolatile memories.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11085 [pdf, other]

GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Zirui Wang, Ming Cheng, Victor Adrian Prisacariu, Tristan Braud

Abstract: We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GSLoc obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D vision foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GSLoc enables efficient pose refinement given a single RGB query and a coarse initial pose estimation. Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving state-of-the-art accuracy on two indoor datasets.

Submitted 20 August, 2024; originally announced August 2024.

Comments: The project page is available at https://gsloc.active.vision

arXiv:2408.08429 [pdf, ps, other]

SLOCC and LU classification of black holes with eight electric and magnetic charges

Authors: Dafa Li, Maggie Cheng, Xiangrong Li, Shuwang Li

Abstract: In \cite{Linde}, Kallosh and Linde discussed the SLOCC classification of black holes. However, the criteria for the SLOCC classification of black holes have not been given. In addition, the LU classification of black holes has not been studied in the past. In this paper we will consider both SLOCC and LU classification of the STU black holes with four integer electric charges $q_{i} $ and four integer magnetic charges $p^{i}$, $i=0,1,2,3$. Two STU black holes with eight charges are considered SLOCC (LU) equivalent if and only if their corresponding states of three qubits are SLOCC (LU) equivalent. Under this definition, we give criteria for the classification of the eight-charge STU black holes under SLOCC and under LU, respectively. We will study the classification of the black holes via the classification of SLOCC and LU entanglement of three qubits. We then identify a set of black holes corresponding to the state W of three qubits, which is of interest since it has the maximal average von Neumann entropy of entanglement. Via von Neumann entanglement entropy, we partition the STU black holes corresponding to pure states of GHZ SLOCC class into five families under LU.

Submitted 15 August, 2024; originally announced August 2024.

Journal ref: Int J theor phys 63, issue 6, 144 (2024)

arXiv:2408.07825 [pdf, other]

SSRFlow: Semantic-aware Fusion with Spatial Temporal Re-embedding for Real-world Scene Flow

Authors: Zhiyang Lu, Qinghan Chen, Zhimin Yuan, Ming Cheng

Abstract: Scene flow, which provides the 3D motion field of the first frame from two consecutive point clouds, is vital for dynamic scene perception. However, contemporary scene flow methods face three major challenges. Firstly, they lack global flow embedding or only consider the context of individual point clouds before embedding, leading to embedded points struggling to perceive the consistent semantic relationship of another frame. To address this issue, we propose a novel approach called Dual Cross Attentive (DCA) for the latent fusion and alignment between two frames based on semantic contexts. This is then integrated into Global Fusion Flow Embedding (GF) to initialize flow embedding based on global correlations in both contextual and Euclidean spaces. Secondly, deformations exist in non-rigid objects after the warping layer, which distort the spatiotemporal relation between the consecutive frames. For a more precise estimation of residual flow at the next level, the Spatial Temporal Re-embedding (STR) module is devised to update the point sequence features at the current level. Lastly, poor generalization is often observed due to the significant domain gap between synthetic and LiDAR-scanned datasets. We leverage novel domain adaptive losses to effectively bridge the gap of motion inference from synthetic to real-world data. Experiments demonstrate that our approach achieves state-of-the-art (SOTA) performance across various datasets, with particularly outstanding results in real-world LiDAR-scanned situations. Our code will be released upon publication.

Submitted 30 July, 2024; originally announced August 2024.

Comments: 19 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2403.07032

arXiv:2408.03984 [pdf, other]

Fractionalization as an alternate to charge ordering in electronic insulators

Authors: Seth Musser, Meng Cheng, T. Senthil

Abstract: Incompressible insulating phases of electronic systems at partial filling of a lattice are often associated with charge ordering that breaks lattice symmetry. The resulting phases have an enlarged unit cell with an effective integer filling. Here we explore the possibility of insulating states - which we dub "Quantum Charge Liquids" (QCL) - at partial lattice filling that preserve lattice translation symmetry. Such QCL phases must necessarily either have gapped fractionally charged excitations and associated topological order or have gapless neutral excitations. We establish some general constraints on gapped fermionic QCL phases that restrict the nature of their topological order. We prove a number of results on the minimal topological order that is consistent with the lattice filling. In particular we show that at rational fillings $ν= p/q$ with $q$ an even integer the minimal ground state degeneracy on a torus of the fermionic QCL is $4q^2$, 4 times larger than that of the bosonic QCL at the same filling. We comment on models and physical systems which may host fermionic QCL phases and discuss the phenomenology of these phases.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 29 pages, 6 figures

arXiv:2408.03934 [pdf, other]

From Words to Worth: Newborn Article Impact Prediction with LLM

Authors: Penghai Zhao, Qinghua Xing, Kairan Dou, Jinyu Tian, Ying Tai, Jian Yang, Ming-Ming Cheng, Xiang Li

Abstract: As the academic landscape expands, the challenge of efficiently identifying potentially high-impact articles among the vast number of newly published works becomes critical. This paper introduces a promising approach, leveraging the capabilities of fine-tuned LLMs to predict the future impact of newborn articles solely based on titles and abstracts. Moving beyond traditional methods heavily reliant on external information, the proposed method discerns the shared semantic features of highly impactful papers from a large collection of title-abstract and potential impact pairs. These semantic features are further utilized to regress an improved metric, TNCSI_SP, which has been endowed with value, field, and time normalization properties. Additionally, a comprehensive dataset has been constructed and released for fine-tuning the LLM, containing over 12,000 entries with corresponding titles, abstracts, and TNCSI_SP. The quantitative results, with an NDCG@20 of 0.901, demonstrate that the proposed approach achieves state-of-the-art performance in predicting the impact of newborn articles when compared to competitive counterparts. Finally, we present a real-world application for predicting the impact of newborn journal articles to demonstrate its noteworthy practical value. Overall, our findings challenge existing paradigms and propose a shift towards a more content-focused prediction of academic impact, offering new insights for assessing newborn article impact.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 7 pages for main sections, plus 3 additional pages for appendices. Code, dataset are released at https://sway.cloud.microsoft/KOH09sPR21Ubojbc
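The abstract above reports ranking quality as NDCG@20. As a rough illustration of what that metric measures, here is a minimal sketch assuming linear gains and a log2 position discount; the paper's exact NDCG variant is not stated in the abstract, and the function names `dcg_at_k` and `ndcg_at_k` are hypothetical helpers, not taken from the released code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_scores, predicted_scores, k=20):
    """NDCG@k: DCG of the predicted ordering divided by DCG of the ideal ordering.

    true_scores      -- ground-truth impact values (assumed non-negative)
    predicted_scores -- model outputs used only to rank the items
    """
    # Rank items by predicted score, highest first.
    order = sorted(range(len(true_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_truth = [true_scores[i] for i in order]
    # The ideal ordering sorts by the ground truth itself.
    ideal_truth = sorted(true_scores, reverse=True)
    ideal_dcg = dcg_at_k(ideal_truth, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items, so the ratio is undefined; report 0
    return dcg_at_k(ranked_truth, k) / ideal_dcg
```

A prediction that orders the articles exactly as the ground truth does scores 1.0, and any other ordering scores lower, so a value of 0.901 indicates the top-20 ranking is close to ideal under whichever gain and discount the authors chose.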

arXiv:2408.02863 [pdf, other]

Enhanced Superconducting Qubit Performance Through Ammonium Fluoride Etch

Authors: Cameron J. Kopas, Dominic P. Goronzy, Thang Pham, Carlos G. Torres Castanedo, Matthew Cheng, Rory Cochrane, Patrick Nast, Ella Lachman, Nikolay Z. Zhelev, Andre Vallieres, Akshay A. Murthy, Jin-su Oh, Lin Zhou, Matthew J. Kramer, Hilal Cansizoglu, Michael J. Bedzyk, Vinayak P. Dravid, Alexander Romanenko, Anna Grassellino, Josh Y. Mutus, Mark C. Hersam, Kameshwar Yadavalli

Abstract: The performance of superconducting qubits is often limited by dissipation and two-level system (TLS) losses. The dominant sources of these losses are believed to originate from amorphous materials and defects at interfaces and surfaces, likely as a result of fabrication processes or ambient exposure. Here, we explore a novel wet chemical surface treatment at the Josephson junction-substrate and the substrate-air interfaces by replacing a buffered oxide etch (BOE) cleaning process with one that uses hydrofluoric acid followed by aqueous ammonium fluoride. We show that the ammonium fluoride etch process results in a statistically significant improvement in median $\text{T}_1$ by $\sim22\%$ ($p=0.002$), and a reduction in the number of strongly-coupled TLS in the tunable frequency range. Microwave resonator measurements on samples treated with the ammonium fluoride etch prior to niobium deposition also show $\sim33\%$ lower TLS-induced loss tangent compared to the BOE treated samples. As the chemical treatment primarily modifies the Josephson junction-substrate interface and substrate-air interface, we perform targeted chemical and structural characterizations to examine materials' differences at these interfaces and identify multiple microscopic changes that could contribute to decreased TLS.

Submitted 5 August, 2024; originally announced August 2024.

arXiv:2408.02046 [pdf, other]

Chiral spin liquid in a generalized Kitaev honeycomb model with $\mathbb{Z}_4$ 1-form symmetry

Authors: Yu-Xin Yang, Meng Cheng, Ji-Yao Chen

Abstract: We explore a large $N$ generalization of the Kitaev model on the honeycomb lattice with a simple nearest-neighbor interacting Hamiltonian. In particular, we focus on the $\mathbb{Z}_4$ case with isotropic couplings, which is characterized by an exact $\mathbb{Z}_4$ one-form symmetry. Guided by symmetry considerations and an analytical study in the single chain limit, on the infinitely long cylinders, we find the model is gapped with an extremely short correlation length. Combined with the $\mathbb{Z}_4$ one-form symmetry, this suggests the model is topologically ordered. To pin down the nature of this phase, we further study the model on both finite and infinitely long strips, where we consistently find a $c=1$ conformal field theory (CFT) description, suggesting the existence of chiral edge modes described by a free boson CFT. Further evidence is found by studying the dimer correlators on infinitely long strips. We find the dimer correlation functions show a power-law decay with the exponent close to 2 on the boundary of the strip, while decaying much faster in the bulk. Combined with the topological entanglement entropy extracted from cylinder geometry, we identify the spin liquid as chiral, supporting a $\mathrm{U}(1)_{-8}$ chiral topological order. A unified perspective for all $\mathbb{Z}_N$ type Kitaev models is also discussed.

Submitted 4 August, 2024; originally announced August 2024.

Comments: 10 pages, 5 figures

arXiv:2408.01695 [pdf]

Transformer for seismic image super-resolution

Authors: Shiqi Dong, Xintong Dong, Kaiyuan Zheng, Ming Cheng, Tie Zhong, Hongzhou Wang

Abstract: Seismic images obtained by stacking or migration are usually characterized by low signal-to-noise ratio (SNR), low dominant frequency, and sparse sampling in both the depth (or time) and offset dimensions. To improve the resolution of seismic images, we propose a deep learning-based method that achieves super-resolution (SR) in a single step, performing denoising, interpolation, and frequency extrapolation at the same time. We design a seismic image super-resolution Transformer (SIST) to extract and fuse local and global features, which focuses more on the energy and extension shapes of effective events (horizons, folds, faults, etc.) in noisy seismic images. We extract the edge images of the input images with the Canny algorithm and use them as masks to generate two-channel input data, which improves amplitude preservation and reduces the interference of noise. Residual groups containing Swin-Transformer blocks and residual connections constitute the backbone of SIST, which extracts global features within a window of preset size while reducing computational cost. Pixel shuffle layers up-sample the output feature maps from the backbone to improve the edges, while the input data are up-sampled through a skip connection to enhance the amplitude preservation of the final images, especially for clarifying weak events.
We create 3-dimensional synthetic seismic volumes with complex geological structures, in which the amplitudes of half of the volumes are mixtures of strong and weak, and then randomly select 2-dimensional slices to generate training datasets that fit field data well for supervised learning. Numerical tests on both synthetic and field data from different exploration regions demonstrate the feasibility of our method.

Submitted 3 August, 2024; originally announced August 2024.

arXiv:2407.16639 [pdf, other]

Distortion Recovery: A Two-Stage Method for Guitar Effect Removal

Authors: Ying-Shuo Lee, Yueh-Po Peng, Jui-Te Wu, Ming Cheng, Li Su, Yi-Hsuan Yang

Abstract: Removing audio effects from electric guitar recordings makes post-production and sound editing easier. An audio distortion recovery model not only improves the clarity of the guitar sounds but also opens up new opportunities for creative adjustments in mixing and mastering. While progress has been made in creating such models, previous efforts have largely focused on synthetic distortions that may be too simplistic to accurately capture the complexities seen in real-world recordings. In this paper, we tackle the task by using a dataset of guitar recordings rendered with commercial-grade audio effect VST plugins. Moreover, we introduce a novel two-stage methodology for audio distortion recovery. The idea is to process the audio signal in the Mel-spectrogram domain in the first stage, and then use a neural vocoder to generate the pristine original guitar sound from the processed Mel-spectrogram in the second stage. We report a set of experiments demonstrating the effectiveness of our approach over existing methods, through both subjective and objective evaluation metrics.

Submitted 23 July, 2024; originally announced July 2024.

Comments: DAFx 2024

arXiv:2407.11510 [pdf, other]

VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Authors: Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang, Ming Li

Abstract: In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. Afterward, we explore the impact of training strategies, data scale, and model complexity on speaker verification and finally establish a new single-model state-of-the-art EER at 0.170% and minDCF at 0.006% on the VoxCeleb1-O test set. Such remarkable results motivate us to explore speaker recognition from a new challenging perspective. We introduce the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. Associated with this task, we design concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io.

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted By InterSpeech2024
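The VoxBlink2 abstract above quotes an equal error rate (EER) on VoxCeleb1-O. As a hedged illustration of how an EER can be estimated from verification scores, the sketch below sweeps each observed score as a threshold and returns the point where false acceptance and false rejection rates are closest. The function name `equal_error_rate` and the simple threshold sweep are assumptions for illustration; the authors' evaluation tooling may interpolate the ROC curve differently.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from similarity scores (higher means more likely same speaker).

    target_scores    -- scores of trials where both utterances share a speaker
    nontarget_scores -- scores of trials with different speakers
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    eer = None
    # Every observed score is a candidate decision threshold.
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection rate: genuine trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: impostor trials scored at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2  # midpoint when the two rates do not cross exactly
    return eer
```

With perfectly separated score lists the estimate is 0.0; the more the two score distributions overlap, the higher the EER, which is why lower values such as the reported 0.170% indicate better verification performance.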

arXiv:2407.10974 [pdf, other]

doi 10.1093/mnras/stae1739

Age and metal gradients in massive quiescent galaxies at $0.6 \lesssim z \lesssim 1.0$: implications for quenching and assembly histories

Authors: Chloe M. Cheng, Mariska Kriek, Aliza G. Beverage, Arjen van der Wel, Rachel Bezanson, Francesco D'Eugenio, Marijn Franx, Pavel E. Mancera Piña, Angelos Nersesian, Martje Slob, Katherine A. Suess, Pieter G. van Dokkum, Po-Feng Wu, Anna Gallazzi, Stefano Zibetti

Abstract: We present spatially resolved, simple stellar population equivalent ages, stellar metallicities, and abundance ratios for 456 massive ($10.3\lesssim\log(\mathrm{M}_*/\mathrm{M}_\odot)\lesssim11.8$) quiescent galaxies at $0.6\lesssim z\lesssim1.0$ from the Large Early Galaxy Astrophysics Census, derived using full-spectrum models. Typically, we find flat age and [Mg/Fe] gradients, and negative [Fe/H] gradients, implying iron-rich cores. We also estimate intrinsic [Fe/H] gradients via forward modelling. We examine the observed gradients in three age bins. Younger quiescent galaxies typically have negative [Fe/H] gradients and positive age gradients, possibly indicating a recent central starburst. Additionally, this finding suggests that photometrically measured flat colour gradients in young quiescent galaxies are the result of the positive age and negative metallicity gradients cancelling each other. For older quiescent galaxies, the age gradients become flat and [Fe/H] gradients weaken, though remain negative. Thus, negative colour gradients at older ages are likely driven by metallicity gradients. The diminishing age gradient may result from the starburst fading. Furthermore, the persistence of the [Fe/H] gradients may suggest that the outskirts are simultaneously built up by mergers with lower metallicity satellites. On the other hand, the gradients could be inherited from the star-forming phase, in which case mergers may not be needed to explain our findings.
This work illustrates the need for resolved spectroscopy, instead of just photometry, to measure stellar population gradients. Extending these measurements to higher redshift is imperative for understanding how stellar populations in quiescent galaxies are assembled over cosmic time.

Submitted 23 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted for publication in MNRAS; minor typesetting corrections after copyediting

Report number: MN-24-1137-MJ

arXiv:2407.04557 [pdf, other]

Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patterns into materials generation remains a challenge. Here, we introduce Structural Constraint Integration in the GENerative model (SCIGEN). Our approach can modify any trained generative diffusion model by strategic masking of the denoised structure with a diffused constrained structure prior to each diffusion step to steer the generation toward constrained outputs. Furthermore, we mathematically prove that SCIGEN effectively performs conditional sampling from the original distribution, which is crucial for generating stable constrained materials. We generate eight million compounds using Archimedean lattices as prototype constraints, with over 10% surviving a multi-staged stability pre-screening. High-throughput density functional theory (DFT) on 26,000 survived compounds shows that over 50% passed structural optimization at the DFT level. Since the properties of quantum materials are closely related to geometric patterns, our results indicate that SCIGEN provides a general framework for generating quantum materials candidates.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 512 pages total, 4 main figures + 218 supplementary figures

arXiv:2407.04305 [pdf, other]

Towards Stable 3D Object Detection

Authors: Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou

Abstract: In autonomous driving, the temporal stability of 3D object detection greatly impacts driving safety. However, detection stability cannot be assessed by existing metrics such as mAP and MOTA, and consequently is less explored by the community. To bridge this gap, this work proposes Stability Index (SI), a new metric that can comprehensively evaluate the stability of 3D detectors in terms of confidence, box localization, extent, and heading. By benchmarking state-of-the-art object detectors on the Waymo Open Dataset, SI reveals interesting properties of object stability that have not been previously discovered by other metrics. To help models improve their stability, we further introduce a general and effective training strategy, called Prediction Consistency Learning (PCL). PCL essentially encourages the prediction consistency of the same objects under different timestamps and augmentations, leading to enhanced detection stability. Furthermore, we examine the effectiveness of PCL with the widely-used CenterPoint, and achieve a remarkable SI of 86.00 for vehicle class, surpassing the baseline by 5.48. We hope our work could serve as a reliable baseline and draw the community's attention to this crucial issue in 3D object detection. Codes will be made publicly available.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04179 [pdf, other]

Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

Authors: Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

Abstract: Textual backdoor attacks present a substantial security risk to Large Language Models (LLMs). Such an attack embeds carefully chosen triggers into a victim model at the training stage, and makes the model erroneously predict inputs containing the same triggers as a certain class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed. To fill this gap, this paper proposes a novel online defense algorithm that effectively counters syntax-based as well as special token-based backdoor attacks. The algorithm replaces semantically meaningful words in sentences with entirely different ones but preserves the syntactic templates or special tokens, and then compares the predicted labels before and after the substitution to determine whether a sentence contains triggers. Experimental results confirm the algorithm's performance against these two types of triggers, offering a comprehensive defense strategy for model integrity.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02556 [pdf, other]

Carbon and Iron Deficiencies in Quiescent Galaxies at z=1-3 from JWST-SUSPENSE: Implications for the Formation Histories of Massive Galaxies

Authors: Aliza G. Beverage, Martje Slob, Mariska Kriek, Charlie Conroy, Guillermo Barro, Rachel Bezanson, Gabriel Brammer, Chloe M. Cheng, Anna de Graaff, Natascha M. Förster Schreiber, Marijn Franx, Brian Lorenz, Pavel E. Mancera Piña, Danilo Marchesini, Adam Muzzin, Andrew B. Newman, Sedona H. Price, Alice E. Shapley, Mauro Stefanon, Katherine A. Suess, Pieter van Dokkum, David Weinberg, Daniel R. Weisz

Abstract: We present the stellar metallicities and multi-element abundances (C, Mg, Si, Ca, Ti, Cr, and Fe) of 15 massive (log M/M$_\odot$=10.2-11.2) quiescent galaxies at z=1-3, derived from ultradeep JWST-SUSPENSE spectra. Compared to quiescent galaxies at z~0, these galaxies exhibit a deficiency of 0.25 dex in [C/H], 0.16 dex in [Fe/H], and 0.07 dex in [Mg/H], implying rapid formation and quenching before significant enrichment from asymptotic giant branch stars and Type Ia supernovae. Additionally, we find that galaxies that form at higher redshift have higher [Mg/Fe] and lower [Fe/H] and [Mg/H], irrespective of their observed redshift. The evolution in [Fe/H] and [C/H] is therefore primarily explained by lower redshift samples naturally including galaxies with longer star-formation timescales. On the other hand, the lower [Mg/H] can be explained by galaxies forming at earlier epochs expelling larger gas reservoirs during their quenching phase. Consequently, the mass-metallicity relation, primarily reflecting [Mg/H], is also lower at z=1-3 compared to the lower redshift relation, though the slopes are similar. Finally, we compare our results to standard stellar population modeling approaches employing solar abundance patterns and non-parametric star-formation histories (using Prospector). Our SSP-equivalent ages agree with the mass-weighted ages from Prospector, while the metallicities disagree significantly. Nonetheless, the metallicities better reflect [Fe/H] than total [Z/H]. We also find that star-formation timescales inferred from elemental abundances are significantly shorter than those from Prospector, and we discuss the resulting implications for the early formation of massive galaxies.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Submitted to ApJ; 18 pages, 6 figures, 1 table

arXiv:2407.00256 [pdf, other]

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability sensitively depends on the quality of prompts, various methods have been explored to automate the instruction design. While these methods demonstrated promising results, they also restricted the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: a region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior arts across several major benchmarks.

Submitted 28 June, 2024; originally announced July 2024.

Comments: ICML 2024. code available at https://github.com/ruocwang/mixture-of-prompts

MSC Class: 68T01

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

arXiv:2406.17806 [pdf, other]

MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

Abstract: Humans are prone to cognitive distortions -- biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts. As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT). Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages -- perception, intent reasoning, and safety judgement -- in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications. We make our project available at https://turningpoint-ai.github.io/MOSSBench/.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.11181 [pdf, other]

General Scintillation for Gaussian Beam Propagating through Oceanic Turbulence and UWOC System Performance Evaluation

Authors: Yuxuan Li, Xiang Yi, Xinyue Tao, Ata Yalçın, Mingjian Cheng, Lu Zhang

Abstract: In this paper, we derive a general and exact closed-form expression of the scintillation index (SI) for a Gaussian beam propagating through weak oceanic turbulence, based on the general oceanic turbulence optical power spectrum (OTOPS) and the Rytov theory. Our universal expression not only includes existing Rytov variances but also accounts for actual cases where the Kolmogorov microscale is non-zero. The correctness and accuracy of our derivation are verified through comparison with the published work under identical conditions. By utilizing our derived expressions, we analyze the impact of various beam, propagation and oceanic turbulence parameters on both SI and bit error rate (BER) performance of underwater wireless optical communication (UWOC) systems. Numerical results demonstrate that the relationship between the Kolmogorov microscale and SI is nonlinear. Additionally, considering that certain oceanic turbulence parameters are related to depth, we use temperature and salinity data from Argo buoys deployed in real oceans to investigate the dependence of SI on depth. Our findings will contribute to the design and optimization of UWOC systems.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.10777 [pdf, other]

RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning

Authors: Haoyu Wang, Tianci Liu, Ruirui Li, Monica Cheng, Tuo Zhao, Jing Gao

Abstract: Pre-trained language models, trained on large-scale corpora, demonstrate strong generalizability across various NLP tasks. Fine-tuning these models for specific tasks typically involves updating all parameters, which is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods, such as the popular LoRA family, introduce low-rank matrices to learn only a few parameters efficiently. However, during inference, the product of these matrices updates all pre-trained parameters, complicating tasks like knowledge editing that require selective updates. We propose a novel PEFT method, which conducts row and column-wise sparse low-rank adaptation (RoseLoRA), to address this challenge. RoseLoRA identifies and updates only the most important parameters for a specific task, maintaining efficiency while preserving other model knowledge. By adding a sparsity constraint on the product of low-rank matrices and converting it to row and column-wise sparsity, we ensure efficient and precise model updates. Our theoretical analysis guarantees the lower bound of the sparsity with respect to the matrix product. Extensive experiments on five benchmarks across twenty datasets demonstrate that RoseLoRA outperforms baselines in both general fine-tuning and knowledge editing tasks.

Submitted 30 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08556 [pdf]

doi 10.1038/s41467-024-49261-6

Macroscopic Tunneling Probe of Moiré Spin Textures in Twisted CrI$_3$

Authors: Bowen Yang, Tarun Patel, Meixin Cheng, Kostyantyn Pichugin, Lin Tian, Nachiket Sherlekar, Shaohua Yan, Yang Fu, Shangjie Tian, Hechang Lei, Michael E. Reimer, Junichi Okamoto, Adam W. Tsen

Abstract: Various noncollinear spin textures and magnetic phases have been predicted in twisted two-dimensional CrI$_3$ due to competing ferromagnetic (FM) and antiferromagnetic (AFM) interlayer exchange from moiré stacking - with potential spintronic applications even when the underlying material possesses a negligible Dzyaloshinskii-Moriya or dipole-dipole interaction. Recent measurements have shown evidence of coexisting FM and AFM layer order in small-twist-angle CrI$_3$ bilayers and double bilayers. Yet, the nature of the magnetic textures remains unresolved and possibilities for their manipulation and electrical readout are unexplored. Here, we use tunneling magnetoresistance to investigate the collective spin states of twisted double-bilayer CrI$_3$ under both out-of-plane and in-plane magnetic fields together with detailed micromagnetic simulations of domain dynamics based on magnetic circular dichroism. Our results capture hysteretic and anisotropic field evolutions of the magnetic states and we further uncover two distinct non-volatile spin textures (out-of-plane and in-plane domains) at $\approx$ 1° twist angle, with a different global tunneling resistance that can be switched by magnetic field.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures

arXiv:2406.04727 [pdf, other]

MMPolymer: A Multimodal Multitask Pretraining Framework for Polymer Property Prediction

Authors: Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

Abstract: Polymers are high-molecular-weight compounds constructed by the covalent bonding of numerous identical or similar monomers so that their 3D structures are complex yet exhibit unignorable regularity. Typically, the properties of a polymer, such as plasticity, conductivity, bio-compatibility, and so on, are highly correlated with its 3D structure. However, existing polymer property prediction methods heavily rely on the information learned from polymer SMILES sequences (P-SMILES strings) while ignoring crucial 3D structural information, resulting in sub-optimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework incorporating polymer 1D sequential and 3D structural information to encourage downstream polymer property prediction tasks. Besides, considering the scarcity of polymer 3D data, we further introduce the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, in addition to predicting masked tokens and recovering clear 3D coordinates, MMPolymer achieves the cross-modal alignment of latent representations. Then we further fine-tune the pretrained MMPolymer for downstream polymer property prediction tasks in the supervised learning paradigm. Experiments show that MMPolymer achieves state-of-the-art performance in downstream property prediction tasks. Moreover, given the pretrained MMPolymer, utilizing merely a single modality in the fine-tuning phase can also outperform existing methods, showcasing the exceptional capability of MMPolymer in polymer feature extraction and utilization.

Submitted 26 July, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by the 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024)

arXiv:2406.02965 [pdf, other]

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative prompts take effect. Our extensive empirical analysis identifies two primary behaviors of negative prompts. Delayed Effect: The impact of negative prompts is observed after positive prompts render corresponding content. Deletion Through Neutralization: Negative prompts delete concepts from the generated image through a mutual cancellation effect in latent space with positive prompts. These insights reveal significant potential real-world applications; for example, we demonstrate that negative prompts can facilitate object inpainting with minimal alterations to the background via a simple adaptive algorithm. We believe our findings will offer valuable insights for the community in capitalizing on the potential of negative prompts.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01970 [pdf, other]

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has been rarely explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images. Notably, these patches are "universal" and can be generalized across various positions, seeds, and prompts. To be specific, extracting these patches from one noise and injecting them into another noise leads to object generation in targeted areas. We identify these patches by analyzing the dispersion of object bounding boxes across generated images, leading to the development of a posterior analysis technique. Furthermore, we create a dataset consisting of Gaussian noises labeled with bounding boxes corresponding to the objects appearing in the generated images and train a detector that identifies these patches from the initial noise. To explain the formation of these patches, we reveal that they are outliers in Gaussian noise, and follow distinct distributions through two-sample tests. Finally, we find the misalignment between prompts and the trigger patch patterns can result in unsuccessful image generations. The study proposes a reject-sampling strategy to obtain optimal noise, aiming to improve prompt adherence and positional diversity in image generation.

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00816 [pdf, other]

Invisible Backdoor Attacks on Diffusion Models

Authors: Sen Li, Junchi Ma, Minhao Cheng

Abstract: In recent years, diffusion models have achieved remarkable success in the realm of high-quality image generation, garnering increased attention. This surge in interest is paralleled by a growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has brought to light the vulnerability of diffusion models to backdoor attacks, enabling the generation of specific target images through corresponding triggers. However, prevailing backdoor attack methods rely on manually crafted trigger generation functions, often manifesting as discernible patterns incorporated into input noise, thus rendering them susceptible to human detection. In this paper, we present an innovative and versatile optimization framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors. Our proposed framework is applicable to both unconditional and conditional diffusion models, and notably, we are the pioneers in demonstrating the backdooring of diffusion models within the context of text-guided image editing and inpainting pipelines. Moreover, we also show that the backdoors in the conditional generation can be directly applied to model watermarking for model ownership verification, which further boosts the significance of the proposed framework. Extensive experiments on various commonly used samplers and datasets verify the efficacy and stealthiness of the proposed framework. Our code is publicly available at https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion

arXiv:2406.00670 [pdf, other]

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Authors: Yunheng Li, ZhongYu Li, Quansheng Zeng, Qibin Hou, Ming-Ming Cheng

Abstract: Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily utilize visual features from the last layer to align with text embeddings, while they neglect the crucial information in intermediate layers that contain rich object details. However, we find that directly aggregating the multi-level visual features weakens the zero-shot ability for novel classes. The large differences between the visual features from different layers make these features hard to align well with the text embeddings. We resolve this problem by introducing a series of independent decoders to align the multi-level visual features with the text embeddings in a cascaded way, forming a novel but simple framework named Cascade-CLIP. Our Cascade-CLIP is flexible and can be easily applied to existing zero-shot semantic segmentation methods. Experimental results show that our simple Cascade-CLIP achieves superior zero-shot performance on segmentation benchmarks, like COCO-Stuff, Pascal-VOC, and Pascal-Context. Our code is available at: https://github.com/HVision-NKU/Cascade-CLIP

Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024

arXiv:2405.20396 [pdf, other]

Using the COSMIC Population Synthesis Code to Investigate How Metallicity Affects the Rates of Interacting Binaries

Authors: Ayanah L. Cason, Nicole M. Lloyd-Ronning, Roseanne M. Cheng

Abstract: We use COSMIC, a galaxy population synthesis code, to investigate how metallicity affects the rate of formation of massive stars with a closely orbiting compact object companion, the suggested progenitors of radio loud long gamma-ray bursts. We present the evolution time of these systems at different metallicities, and how the formation rates of these systems are anti-correlated with metallicity. In particular, these systems occur about 10 times more frequently at metallicities between $Z = 2\times 10^{-4}$ and $2 \times 10^{-3}$, compared to those between $Z = 2\times 10^{-3}$ and $2 \times 10^{-2}$. This work serves as a prerequisite to predicting the global rates of these systems as a function of redshift, ultimately giving crucial insight into our understanding of the progenitors of long gamma-ray bursts and their evolution over cosmic time.

Submitted 30 May, 2024; originally announced May 2024.

Comments: submitted to RNAAS

arXiv:2405.18991 [pdf, other]

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework originally designed for 2D image synthesis to accommodate the complexities of 3D video generation by incorporating a motion module block. It is used to capture temporal dynamics, thereby ensuring the production of consistent frames and seamless motion transitions. The motion module can be adapted to various DiT baseline methods to generate video with different styles. It can also generate videos with different frame rates and resolutions during both training and inference phases, suitable for both images and videos. Moreover, we introduce slice VAE, a novel approach to condense the temporal axis, facilitating the generation of long duration videos. Currently, EasyAnimate exhibits the proficiency to generate videos with 144 frames. We provide a holistic ecosystem for video production based on DiT, encompassing aspects such as data pre-processing, VAE training, DiT models training (both the baseline model and LoRA model), and end-to-end video inference. Code is available at: https://github.com/aigc-apps/EasyAnimate. We are continuously working to enhance the performance of our method.

Submitted 5 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures

arXiv:2405.11430 [pdf, other]

MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this draws into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and found that these might not thoroughly evaluate LLMs' code generation capacities due to limitations in quality, difficulty, and granularity. To resolve this, we introduce the Mostly Hard Python Problems (MHPP) dataset, consisting of 140 unique human-curated problems. By focusing on the combination of natural language and code reasoning, MHPP gauges LLMs' abilities to comprehend specifications and restrictions, engage in multi-step reasoning, and apply coding knowledge effectively. Initial evaluations of 22 LLMs using MHPP showed many high-performing models on HumanEval failed to achieve similar success on MHPP. Moreover, MHPP highlighted various previously undiscovered limitations within various LLMs, leading us to believe that it could pave the way for a better understanding of LLMs' capabilities and limitations. Dataset and code are available at https://github.com/SparksofAGI/MHPP.

Submitted 18 May, 2024; originally announced May 2024.

Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

arXiv:2405.11137 [pdf, ps, other]

Slow entropy and variational dynamical systems

Authors: Minhua Cheng, Carlos Ospina, Kurt Vinhage, Yibo Zhai

Abstract: We define variational properties for dynamical systems with subexponential complexity, and study these properties in certain specific examples. By computing the value of slow entropy directly, we show that Sturmian systems are not variational, while a class of interval exchange transformations are variational.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.11028 [pdf, other]

Simulations of Interacting Binary Systems -- Pathways to Radio Bright GRB Progenitors

Authors: Angel Hernandez, Roseanne M. Cheng, Nicole M. Lloyd-Ronning, Carl E. Fields

Abstract: Although the association of gamma-ray bursts (GRBs) with massive stellar death is on firm footing, the nature of the progenitor system and the key ingredients required for a massive star to produce a gamma-ray burst remain open questions. Here, we investigate the evolution of a massive star with a closely orbiting compact object companion using the stellar evolution code MESA. In particular, we examine how the companion influences the angular momentum and circumstellar environment near the end of the massive star's life. We find that tidal effects can cause the compact object companion to significantly increase the angular momentum of the massive star, for orbital periods in the range of up to $\sim 4$ days. We model the density profile evolution of the massive star and discuss how tidal interactions may also lead to stripping of the outer stellar envelope in a way that can create an environment around the binary system that deviates from a typical $1/r^{2}$ wind density profile. We show how our results depend on the metallicity of the system, initial spin of the star, mass ratio, as well as accretion and dynamo prescriptions in the simulations. We conclude that these systems may be viable progenitors for radio-bright, long gamma-ray bursts.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Submitted to ApJ - comments welcome

Report number: LA-UR-24-22983

arXiv:2405.06975 [pdf, other]

Input Snapshots Fusion for Scalable Discrete Dynamic Graph Neural Networks

Authors: QingGuo Qi, Hongyang Chen, Minhao Cheng, Han Liu

Abstract: Dynamic graphs are ubiquitous in the real world, yet there is a lack of suitable theoretical frameworks to effectively extend existing static graph models into the temporal domain. Additionally, for link prediction tasks on discrete dynamic graphs, the requirement of substantial GPU memory to store embeddings of all nodes hinders the scalability of existing models. In this paper, we introduce an Input Snapshots Fusion based Dynamic Graph Neural Network (SFDyG). By eliminating the partitioning of snapshots within the input window, we obtain a multi-graph (more than one edge between two nodes). Subsequently, by introducing a graph denoising problem with the assumption of temporal decayed smoothing, we integrate Hawkes process theory into Graph Neural Networks to model the generated multi-graph. Furthermore, based on the multi-graph, we propose a scalable three-step mini-batch training method and demonstrate its equivalence to its full-batch training counterpart. Our experiments, conducted on eight distinct dynamic graph datasets for future link prediction tasks, revealed that SFDyG generally surpasses related methods.

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06388 [pdf, other]

Recovery of transversely-isotropic elastic material parameters in induction motor rotors

Authors: Hanz Martin Cheng, Tapio Helin, Ville-Petteri Manninen, Timo Holopainen, Juha Jokinen, Samu Sorvari, Andreas Rupp

Abstract: We propose numerical algorithms for recovering parameters in eigenvalue problems for linear elasticity of transversely isotropic materials. Specifically, the algorithms are used to recover the elastic constants of a rotor core. Numerical tests show that in the noiseless setup, two pairs of bending modes are sufficient for recovering one to four parameters accurately. To recover all five parameters that govern the elastic properties of electric engines accurately, we require three pairs of bending modes and one torsional mode. Moreover, we study the stability of the inversion method against multiplicative noise; for tests in which the data contained multiplicative noise of at most $1\%$, we find that all parameters can be recovered with an error less than $10\%$.

Submitted 10 May, 2024; originally announced May 2024.

MSC Class: 65Z05; 65C20

arXiv:2405.03639 [pdf, other]

Strong-to-Weak Spontaneous Symmetry Breaking in Mixed Quantum States

Authors: Leonardo A. Lessa, Ruochen Ma, Jian-Hao Zhang, Zhen Bi, Meng Cheng, Chong Wang

Abstract: Symmetry in mixed quantum states can manifest in two distinct forms: strong symmetry, where each individual pure state in the quantum ensemble is symmetric with the same charge, and weak symmetry, which applies only to the entire ensemble. This paper explores a novel type of spontaneous symmetry breaking (SSB) where a strong symmetry is broken to a weak one. While the SSB of a weak symmetry is measured by the long-ranged two-point correlation function $\mathrm{Tr}(O_xO^{\dagger}_y\rho)$, the strong-to-weak SSB (SW-SSB) is measured by the fidelity $F(\rho, O_xO^{\dagger}_y\rho O_yO^{\dagger}_x)$, dubbed the fidelity correlator. We prove that SW-SSB is a universal property of mixed-state quantum phases, in the sense that the phenomenon of SW-SSB is robust against symmetric low-depth local quantum channels. We also show that the symmetry breaking is "spontaneous" in the sense that the effect of a local symmetry-breaking measurement cannot be recovered locally. We argue that a thermal state at a nonzero temperature in the canonical ensemble (with fixed symmetry charge) should have spontaneously broken strong symmetry. Additionally, we study non-thermal scenarios where decoherence induces SW-SSB, leading to phase transitions described by classical statistical models with bond randomness. In particular, the SW-SSB transition of a decohered Ising model can be viewed as the "ungauged" version of the celebrated toric code decodability transition.
We confirm that, in the decohered Ising model, the SW-SSB transition defined by the fidelity correlator is the only physical transition in terms of channel recoverability. We also comment on other (inequivalent) definitions of SW-SSB, through correlation functions with higher Rényi indices.

Submitted 3 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 17+6 pages, 4 figures

arXiv:2405.02390 [pdf, other]

Towards a classification of mixed-state topological orders in two dimensions

Authors: Tyler Ellison, Meng Cheng

Abstract: The classification and characterization of topological phases of matter is well understood for ground states of gapped Hamiltonians that are well isolated from the environment. However, decoherence due to interactions with the environment is inevitable -- thus motivating the investigation of topological orders in the context of mixed states. Here, we take a step toward classifying mixed-state topological orders in two spatial dimensions by considering their (emergent) generalized symmetries. We argue that their 1-form symmetries and the associated anyon theories lead to a partial classification under two-way connectivity by quasi-local quantum channels. This allows us to establish mixed-state topological orders that are intrinsically mixed, i.e., that have no ground state counterpart. We provide a wide range of examples based on topological subsystem codes, decohering $G$-graded string-net models, and "classically gauging" symmetry-enriched topological orders. One of our main examples is an Ising string-net model under the influence of dephasing noise. We study the resulting space of locally-indistinguishable states and compute the modular transformations within a particular coherent space. Based on our examples, we identify two possible effects of quasi-local quantum channels on anyon theories: (1) anyons can be incoherently proliferated -- thus reducing to a commutant of the proliferated anyons, or (2) the system can be "classically gauged", resulting in the symmetrization of anyons and an extension by transparent bosons.
Given these two mechanisms, we conjecture that mixed-state topological orders are classified by premodular anyon theories, i.e., those for which the braiding relations may be degenerate.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 33+10 pages, 9 figures

arXiv:2405.01434 [pdf, other]

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

Abstract: For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new way of self-attention calculation, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner. To extend our method to long-range video generation, we further introduce a novel semantic space temporal motion prediction module, named Semantic Motion Predictor. It is trained to estimate the motion conditions between two provided images in the semantic spaces. This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation. By merging these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos encompassing a rich variety of contents. The proposed StoryDiffusion encompasses pioneering explorations in visual story generation with the presentation of images and videos, which we hope could inspire more research from the aspect of architectural modifications. Our code is made publicly available at https://github.com/HVision-NKU/StoryDiffusion.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00390 [pdf, other]

CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Authors: Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen

Abstract: Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both the text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm, by augmenting sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) on multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and mitigate the negative impact posed by potential noise inherent in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods, and markedly exhibits explainability in deciphering sarcasm as well.

Submitted 20 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: ACL 2024

arXiv:2404.16646 [pdf, other]

Improving TAS Adaptability with a Variable Temperature Threshold

Authors: Anthony Dowling, Ming-Cheng Cheng, Yu Liu

Abstract: Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution. These methods aim to avoid issues such as accelerated aging of the device, premature failure and degraded chip performance. In this work, we implement a new TAS algorithm, VTF-TAS, which makes use of a variable temperature threshold to control task execution and thermal dissipation. To enable adequate execution of the tasks to reach their deadlines, this threshold is managed based on the theory of fluid scheduling. Using an evaluation methodology as described in POD-TAS, we evaluate VTF-TAS using a set of 4 benchmarks from the COMBS benchmark suite to examine its ability to minimize chip temperature throughout schedule execution. Through our evaluation, we demonstrate that this new algorithm is able to adaptively manage the temperature threshold such that the peak temperature during schedule execution is lower than POD-TAS, with no requirement for an expensive search procedure to obtain an optimal threshold for scheduling.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13312 [pdf]

Seismic Interpolation Transformer for Consecutively Missing Data: A Case Study in DAS-VSP Data

Authors: Ming Cheng, Jun Lin, Xintong Dong, Shaoping Lu, Tie Zhong

Abstract: Distributed optical fiber acoustic sensing (DAS) is a rapidly developing seismic acquisition technology with the advantages of low cost, high resolution, high sensitivity, and small interval. Nonetheless, consecutively missing cases often appear in real seismic data acquired by DAS systems due to factors including optical fiber damage and inferior coupling between cable and well. Recently, some deep-learning seismic interpolation methods based on convolutional neural networks (CNNs) have shown impressive performance in regular and random missing cases, but the consecutively missing case remains a challenging task. The main reason is that weight sharing makes it difficult for a CNN to capture sufficiently comprehensive features. In this paper, we propose a transformer-based interpolation method, called seismic interpolation transformer (SIT), to deal with the consecutively missing case. The proposed SIT is an encoder-decoder structure connected by some U-shaped swin-transformer blocks. In the encoder and decoder parts, the multi-head self-attention (MSA) mechanism is used to capture global features, which are essential for the reconstruction of consecutively missing traces. The U-shaped swin-transformer blocks are utilized to perform feature extraction operations on feature maps with different resolutions. Moreover, we combine the loss based on structural similarity index (SSIM) and L1 norm to propose a novel loss function for SIT. In experiments, the proposed SIT outperforms U-Net and swin-transformer.
Moreover, ablation studies also demonstrate the advantages of the new network architecture and loss function.

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12605 [pdf, other]

GluMarker: A Novel Predictive Modeling of Glycemic Control Through Digital Biomarkers

Authors: Ziyi Zhou, Ming Cheng, Xingjian Diao, Yanjun Cui, Xiangling Li

Abstract: The escalating prevalence of diabetes globally underscores the need for diabetes management. Recent research highlights the growing focus on digital biomarkers in diabetes management, with innovations in computational frameworks and noninvasive monitoring techniques using personalized glucose metrics. However, they predominantly focus on insulin dosing and specific glucose values, with limited attention given to overall glycemic control. This leaves a gap in expanding the scope of digital biomarkers for overall glycemic control in diabetes management. To address such a research gap, we propose GluMarker -- an end-to-end framework for modeling digital biomarkers using broader factor sources to predict glycemic control. Through the assessment and refinement of various machine learning baselines, GluMarker achieves state-of-the-art performance on Anderson's dataset in predicting next-day glycemic control. Moreover, our research identifies key digital biomarkers for the next day's glycemic control prediction. These identified biomarkers are instrumental in illuminating the daily factors that influence glycemic management, offering vital insights for diabetes care.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12432 [pdf, other]

The JWST-SUSPENSE Ultradeep Spectroscopic Program: Survey Overview and Star-Formation Histories of Quiescent Galaxies at 1 < z < 3

Authors: Martje Slob, Mariska Kriek, Aliza G. Beverage, Katherine A. Suess, Guillermo Barro, Rachel Bezanson, Gabriel Brammer, Chloe M. Cheng, Charlie Conroy, Anna de Graaff, Natascha M. Förster Schreiber, Marijn Franx, Brian Lorenz, Pavel E. Mancera Piña, Danilo Marchesini, Adam Muzzin, Andrew B. Newman, Sedona H. Price, Alice E. Shapley, Mauro Stefanon, Pieter van Dokkum, Daniel R. Weisz

Abstract: We present an overview and first results from the Spectroscopic Ultradeep Survey Probing Extragalactic Near-infrared Stellar Emission (SUSPENSE), executed with NIRSpec on JWST. The primary goal of the SUSPENSE program is to characterize the stellar, chemical, and kinematic properties of massive quiescent galaxies at cosmic noon. In a single deep NIRSpec/MSA configuration, we target 20 distant quiescent galaxy candidates ($z=1-3$, $H_{AB}\le23$), as well as 53 star-forming galaxies at $z=1-4$. With 16 hr of integration and the G140M-F100LP dispersion-filter combination, we observe numerous Balmer and metal absorption lines for all quiescent candidates. We derive stellar masses ($\log M_*/M_{\odot}\sim10.2-11.5$) and detailed star-formation histories (SFHs) and show that all 20 candidate quiescent galaxies indeed have quenched stellar populations. These galaxies show a variety of mass-weighted ages ($0.8-3.3$ Gyr) and star formation timescales ($\sim0.5-4$ Gyr), and four out of 20 galaxies were already quiescent by $z=3$. On average, the $z>1.75$ $[z<1.75]$ galaxies formed 50% of their stellar mass before $z=4$ $[z=3]$. Furthermore, the typical SFHs of galaxies in these two redshift bins ($z_{\text{mean}}=2.2~[1.3]$) indicate that galaxies at higher redshift formed earlier and over shorter star-formation timescales compared to lower redshifts.
Although this evolution is naturally explained by the growth of the quiescent galaxy population over cosmic time, number density calculations imply that mergers and/or late-time star formation also contribute to the evolution. In future work, we will further unravel the early formation, quenching, and late-time evolution of these galaxies by extending this work with studies on their chemical abundances, resolved stellar populations and kinematics.

Submitted 18 July, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted in ApJ; 25 pages, 14 figures, 2 tables (excluding appendices)

arXiv:2404.12400 [pdf, other]

Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction, marking a leap in dimensionality reduction techniques by preserving essential data features. Moreover, the groundbreaking graph construction mechanism and the high-performance lightweight GCN increase embedding extraction speed by up to 36 times. We further offer Efflex in two versions, Efflex-L for scenarios demanding high accuracy, and Efflex-B for environments requiring swift data processing. Comprehensive experimentation with the Porto and Geolife datasets validates our approach, positioning Efflex as the state-of-the-art in the domain. Such enhancements in speed and accuracy highlight the versatility of Efflex, underscoring its wide-ranging potential for deployment in time-sensitive and computationally constrained applications.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.11924 [pdf, other]

Toward Short-Term Glucose Prediction Solely Based on CGM Time Series

Authors: Ming Cheng, Xingjian Diao, Ziyi Zhou, Yanjun Cui, Wenjun Liu, Shitong Cheng

Abstract: The global diabetes epidemic highlights the importance of maintaining good glycemic control. Glucose prediction is a fundamental aspect of diabetes management, facilitating real-time decision-making. Recent research has introduced models focusing on long-term glucose trend prediction, which are unsuitable for real-time decision-making and result in delayed responses. Conversely, models designed to respond to immediate glucose level changes cannot analyze glucose variability comprehensively. Moreover, contemporary research generally integrates various physiological parameters (e.g. insulin doses, food intake, etc.), which inevitably raises data privacy concerns. To bridge such a research gap, we propose TimeGlu -- an end-to-end pipeline for short-term glucose prediction solely based on CGM time series data. We implement four baseline methods to conduct a comprehensive comparative analysis of the model's performance. Through extensive experiments on two contrasting datasets (CGM Glucose and Colas dataset), TimeGlu achieves state-of-the-art performance without the need for additional personal data from patients, providing effective guidance for real-world diabetic glucose management.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10901 [pdf, other]

CrossGP: Cross-Day Glucose Prediction Excluding Physiological Information

Authors: Ziyi Zhou, Ming Cheng, Yanjun Cui, Xingjian Diao, Zhaorui Ma

Abstract: The increasing number of diabetic patients is a serious issue in society today, which has significant negative impacts on people's health and the country's financial expenditures. Because diabetes may develop into potential serious complications, early glucose prediction for diabetic patients is necessary for timely medical treatment. Existing glucose prediction methods typically utilize patients' private data (e.g. age, gender, ethnicity) and physiological parameters (e.g. blood pressure, heart rate) as reference features for glucose prediction, which inevitably leads to privacy protection concerns. Moreover, these models generally focus on either long-term (monthly-based) or short-term (minute-based) predictions. Long-term prediction methods are generally inaccurate because of the external uncertainties that can greatly affect the glucose values, while short-term ones fail to provide timely medical guidance. Based on the above issues, we propose CrossGP, a novel machine-learning framework for cross-day glucose prediction solely based on the patient's external activities without involving any physiological parameters. Meanwhile, we implement three baseline models for comparison. Extensive experiments on Anderson's dataset strongly demonstrate the superior performance of CrossGP and prove its potential for future real-life applications.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09419 [pdf]

Predicting Accurate Hot Spots in a More Than Ten-Thousand-Core GPU with a Million-Time Speedup over FEM Enabled by a Physics-based Learning Algorithm

Authors: Lin Jian, Yu Liu, Ming-Cheng Cheng

Abstract: The classical proper orthogonal decomposition (POD) with the Galerkin projection (GP) has been revised for chip-level thermal simulation of microprocessors with a large number of cores. An ensemble POD-GP methodology (EnPOD-GP) is introduced to significantly improve the training effectiveness and prediction accuracy by dividing a large number of heat sources into heat source blocks (HSBs), each of which may contain one or a very small number of heat sources. Although very accurate, efficient and robust to any power map, EnPOD-GP suffers from intensive training for microprocessors with an enormous number of cores. A local-domain EnPOD-GP model (LEnPOD-GP) is thus proposed to further minimize the training burden. LEnPOD-GP utilizes the concepts of local domain truncation and generic building blocks to reduce the massive training data. LEnPOD-GP has been demonstrated on thermal simulation of NVIDIA Tesla Volta GV100, a GPU with more than 13,000 cores including FP32, FP64, INT32, and Tensor Cores. Due to the domain truncation for LEnPOD-GP, the least square error (LSE) is degraded but is still as small as 1.6% over the entire space and below 1.4% in the device layer when using 4 modes per HSB. When only the maximum temperature of the entire GPU is of interest, LEnPOD-GP offers a computing speed 1.1 million times faster than the FEM with a maximum error near 1.2 degrees over the entire simulation time.

Submitted 14 April, 2024; originally announced April 2024.

Comments: 8 pages, 8 figures

arXiv:2404.09403 [pdf, other]

Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Different from most traditional fusion models that incorporate all modalities identically in neural networks, our model designates a prime modality and regards the remaining modalities as detectors in the information pathway, serving to distill the flow of information. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of multimodal representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks. Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level performance in the multimodal sentiment binary classification task across all evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and Pearson Correlation).

Submitted 22 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by ICLR 2024. Camera Ready Version

Showing 1–50 of 737 results for author: Cheng, M