Search | arXiv e-print repository

No evidence for a significant evolution of $M_{\bullet}$-$M_*$ relation up to z$\sim$4

Authors: Yang Sun, Jianwei Lyu, George H. Rieke, Zhiyuan Ji, Fengwu Sun, Yongda Zhu, Andrew J. Bunker, Phillip A. Cargile, Chiara Circosta, Francesco D'Eugenio, Eiichi Egami, Kevin Hainline, Jakob M. Helton, Pierluigi Rinaldi, Brant E. Robertson, Jan Scholtz, Irene Shivaei, Meredith A. Stone, Sandro Tacchella, Christina C. Williams, Christopher N. A. Willmer, Chris Willott

Abstract: Over the past two decades, tight correlations between black hole masses ($M_\bullet$) and their host galaxy properties have been firmly established at low-$z$ ($z<1$), indicating coevolution of supermassive black holes and galaxies. However, the situation at high-$z$, especially beyond cosmic noon ($z\gtrsim2.5$), is controversial. Combining \emph{JWST} NIRCam wide field slitless spectroscopy (WFSS) from FRESCO and CONGRESS with deep multi-band NIRCam imaging from JADES in the GOODS fields, we study the black hole to galaxy mass relation at $z\sim1$--4. After identifying 18 broad-line active galactic nuclei (BL AGNs) at $0.9<z<3.6$ (8 of them at $z>2.5$) in the WFSS data, we measure their black hole masses from broad near-infrared lines (Pa$\alpha$, Pa$\beta$, and He\,I $\lambda$10833\,Å) and constrain their stellar masses ($M_{*}$) through AGN-galaxy image decomposition or SED decomposition. Taking into account the observational biases, the intrinsic scatter of the $M_{\bullet}-M_{*}$ relation, and the errors in the mass measurements, we find no significant difference in the $M_{\bullet}/M_{*}$ ratio at $2.5<z<3.6$ compared with lower redshifts ($1<z<2.5$), suggesting no evolution of the $M_{\bullet}-M_{*}$ relation up to $z\sim4$.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 21 pages, 11 figures, submitted to AAS Journals
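The core comparison in this abstract asks whether the mean $\log(M_\bullet/M_*)$ at $2.5<z<3.6$ differs from that at $1<z<2.5$. The minimal sketch below compares inverse-variance weighted means, adding an assumed intrinsic scatter in quadrature to each measurement error. It does not model the selection biases the authors account for, and the function names and input values are hypothetical.

```python
import math

def weighted_mean(values, errors, intrinsic_scatter=0.0):
    """Inverse-variance weighted mean and its 1-sigma error.

    Each measurement error is combined in quadrature with an (assumed) intrinsic scatter.
    """
    weights = [1.0 / (e**2 + intrinsic_scatter**2) for e in errors]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)

def ratio_offset(high_z, high_err, low_z, low_err, intrinsic_scatter=0.0):
    """Offset in mean log10(M_bh/M_star) between two samples and its significance in sigma."""
    m_hi, s_hi = weighted_mean(high_z, high_err, intrinsic_scatter)
    m_lo, s_lo = weighted_mean(low_z, low_err, intrinsic_scatter)
    offset = m_hi - m_lo
    return offset, abs(offset) / math.hypot(s_hi, s_lo)

# Hypothetical log10(M_bh/M_star) values in dex, for illustration only (NOT from the paper).
high_z = [-1.9, -2.3, -1.7]
low_z = [-2.0, -2.2, -2.1, -2.3]
offset, nsigma = ratio_offset(high_z, [0.4] * 3, low_z, [0.4] * 4, intrinsic_scatter=0.3)
print(f"offset = {offset:+.2f} dex ({nsigma:.1f} sigma)")
```

A real analysis would instead forward-model the flux and line-width selection, since a naive mean of detected objects is biased toward overmassive black holes.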

arXiv:2409.05180 [pdf, other]

The Effect of Radiation and Supernovae Feedback on LyC Escape in Local Star-forming Galaxies

Authors: Cody A. Carr, Renyue Cen, Claudia Scarlata, Xinfeng Xu, Alaina Henry, Rui Marques-Chaves, Daniel Schaerer, Ricardo O. Amorín, M. S. Oey, Lena Komarova, Sophia Flury, Anne Jaskot, Alberto Saldana-Lopez, Zhiyuan Ji, Mason Huberty, Timothy Heckman, Göran Ostlin, Omkar Bait, Matthew James Hayes, Trinh Thuan, Danielle A. Berg, Mauro Giavalisco, Sanchayeeta Borthakur, John Chisholm, Harry C. Ferguson , et al. (3 additional authors not shown)

Abstract: Feedback is widely recognized as an essential condition for Lyman continuum (LyC) escape in star-forming galaxies. However, the mechanisms by which galactic outflows clear neutral gas and dust remain unclear. In this paper, we model the Mg II 2796 Å and 2804 Å absorption and emission lines in 29 galaxies from the Low-z LyC Survey (LzLCS) to investigate the impact of radiation and mechanical feedback on LyC escape. Using constraints on Mg$^+$ and photoionization models, we map the outflows' neutral hydrogen content and predict $f_{esc}^{LyC}$ with a multiphase wind model. We measure mass, momentum, and energy loading factors for the neutral winds, which carry up to 10% of the momentum and 1% of the energy of the SFR-based deposition rates. We use SED template fitting to determine the relative ages of the stellar populations, which allows us to identify systems dominated by radiation feedback. We then examine feedback-related properties (stellar age, loading factors, etc.) under conditions that optimize feedback efficiency, specifically high star formation rate surface densities and compact UV half-light radii. Our findings indicate that the strongest leakers are dominated by radiation feedback and lack Mg II outflows, but have extended broad components in higher-ionization lines such as [O III] 5007 Å, as observed by Amorín et al. (2024). In contrast, galaxies experiencing supernova feedback typically exhibit lower $f_{esc}^{LyC}$ and show evidence of outflows in both Mg II and higher-ionization lines.
We attribute these findings to rapid or "catastrophic" cooling in the radiation-dominated systems, which, given the low metallicities of our sample, are likely experiencing delayed supernovae.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 34 pages, 16 figures, 7 tables

arXiv:2409.03728 [pdf, other]

Multiplicity dependent $J/ψ$ and $ψ(2S)$ production at forward and backward rapidity in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, C. Aidala, Y. Akiba, M. Alfred, V. Andrieux, S. Antsupov, N. Apadula, H. Asano, B. Azmoun, V. Babintsev, N. S. Bandara, E. Bannikov, K. N. Barish, S. Bathe, A. Bazilevsky, M. Beaumier, R. Belmont, A. Berdnikov, Y. Berdnikov, L. Bichon, B. Blankenship, D. S. Blau, J. S. Bok , et al. (276 additional authors not shown)

Abstract: The $J/ψ$ and $ψ(2S)$ charmonium states, composed of $c\bar{c}$ quark pairs and known since the 1970s, are widely believed to serve as ideal probes to test quantum chromodynamics in high-energy hadronic interactions. However, there is not yet a complete understanding of the charmonium-production mechanism. Recent measurements of $J/ψ$ production as a function of event charged-particle multiplicity at the collision energies of both the Large Hadron Collider (LHC) and the Relativistic Heavy Ion Collider (RHIC) show enhanced $J/ψ$ production yields with increasing multiplicity. One potential explanation for this type of dependence is multiparton interactions (MPI). We carry out the first measurements of self-normalized $J/ψ$ yields and the $ψ(2S)$ to $J/ψ$ ratio at both forward and backward rapidities as a function of self-normalized charged-particle multiplicity in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV. In addition, detailed {\sc pythia} studies tuned to RHIC energies were performed to investigate the impact of MPI. We find that the PHENIX data at RHIC are consistent with recent LHC measurements and can only be described by {\sc pythia} calculations that include MPI effects. The forward and backward $ψ(2S)$ to $J/ψ$ ratio, which provides a unique and powerful way to study final-state effects on charmonium production, is found to depend less on the charged-particle multiplicity.

Submitted 5 September, 2024; originally announced September 2024.

Comments: 301 authors from 69 institutions, 8 pages, 3 figures. v1 is the version submitted to Physical Review D Letters. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html
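A self-normalized observable divides a per-event quantity by its average over all events. The minimal sketch below bins events in $N_{\rm ch}/\langle N_{\rm ch}\rangle$ and returns $\langle N_{J/\psi}\rangle_{\rm bin}/\langle N_{J/\psi}\rangle$ per bin. It assumes a hypothetical per-event input of (charged multiplicity, $J/\psi$ count) and ignores the efficiency, acceptance, and trigger-bias corrections a real measurement needs.

```python
def self_normalized_yields(events, bin_edges):
    """Self-normalized J/psi yield versus self-normalized multiplicity.

    events:    list of (n_ch, n_jpsi) pairs, one per event (hypothetical input format)
    bin_edges: increasing edges in x = n_ch / <n_ch>
    Returns one (<x>, y) pair per non-empty bin, where y = <n_jpsi>_bin / <n_jpsi>_all.
    """
    n_events = len(events)
    mean_nch = sum(n_ch for n_ch, _ in events) / n_events
    mean_jpsi = sum(n_jpsi for _, n_jpsi in events) / n_events
    # Per bin: [event count, sum of x, sum of n_jpsi].
    acc = [[0, 0.0, 0.0] for _ in range(len(bin_edges) - 1)]
    for n_ch, n_jpsi in events:
        x = n_ch / mean_nch
        for b in range(len(bin_edges) - 1):
            if bin_edges[b] <= x < bin_edges[b + 1]:
                acc[b][0] += 1
                acc[b][1] += x
                acc[b][2] += n_jpsi
                break
    return [(sx / n, (sj / n) / mean_jpsi) for n, sx, sj in acc if n > 0]

# Hypothetical toy events: (charged multiplicity, number of J/psi candidates).
events = [(1, 0), (1, 0), (3, 1), (3, 1)]
print(self_normalized_yields(events, [0.0, 1.0, 2.0]))
```

A $\psi(2S)$ to $J/\psi$ ratio per multiplicity bin would follow by forming two such per-bin yields and dividing them.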

arXiv:2409.03197 [pdf, other]

doi 10.3847/1538-4357/ad74f1

Active Galactic Nuclei in the Green Valley at z$\sim$0.7

Authors: Charity Woodrum, Christina C. Williams, Marcia Rieke, Kevin N. Hainline, Raphael E. Hviding, Zhiyuan Ji, Robert Kennicutt, Christopher N. A. Willmer

Abstract: We present NIR spectroscopy using MMT/MMIRS for a sample of twenty-nine massive galaxies ($\log(M_*/M_{\odot})\gtrsim10$) at $z\sim0.7$ with optical spectroscopy from the LEGA-C survey. Having both optical and NIR spectroscopy at this redshift allows us to measure the full suite of rest-optical strong emission lines, enabling the study of ionization sources and the rest-optical selection of active galactic nuclei (AGN), as well as the measurement of dust-corrected H$\alpha$-based SFRs. We find that eleven of the twenty-nine galaxies host AGN. We infer nonparametric star formation histories with the SED fitting code \texttt{Prospector} and classify galaxies as star-forming, green valley, or quiescent based on their most recent sSFRs. We explore the connection between AGN activity and suppressed star formation and find that $89\pm15\%$ of galaxies in or below the green valley host AGN, while only $15\pm8\%$ of galaxies above the green valley host AGN. We construct the star-forming main sequence (SFMS) and find that the AGN host galaxies lie 0.37 dex below the SFMS, while galaxies without detectable AGN are consistent with being on the SFMS. However, when compared with a bootstrapped mass-matched sample, the SFRs of our AGN host galaxies are consistent with the full LEGA-C sample. Based on this mass-matched analysis, we cannot rule out that this suppression of star formation is driven by other processes associated with the higher mass of the AGN sample.
We therefore cannot link the presence of AGN activity to the quenching of star formation.

Submitted 4 September, 2024; originally announced September 2024.

Comments: 23 pages, 6 figures, 3 tables. Accepted for publication in ApJ
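The fractions quoted in this abstract are ratios of small counts. As a minimal sketch, the normal-approximation binomial error below reproduces the arithmetic for the eleven-of-twenty-nine result. The abstract does not say how the quoted uncertainties ($89\pm15\%$, $15\pm8\%$) were derived, so this error model is an assumption; near 0% or 100% a Wilson or Bayesian interval is the safer choice.

```python
import math

def agn_fraction(n_agn, n_total):
    """AGN fraction and its binomial 1-sigma error (normal approximation), both in percent."""
    p = n_agn / n_total
    sigma = math.sqrt(p * (1.0 - p) / n_total)
    return 100.0 * p, 100.0 * sigma

frac, err = agn_fraction(11, 29)  # eleven of twenty-nine galaxies host AGN
print(f"AGN fraction = {frac:.0f} +/- {err:.0f} %")  # prints: AGN fraction = 38 +/- 9 %
```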

arXiv:2409.01286 [pdf, other]

Ionising properties of galaxies in JADES for a stellar mass complete sample: resolving the cosmic ionising photon budget crisis at the Epoch of Reionisation

Authors: C. Simmonds, S. Tacchella, K. Hainline, B. D. Johnson, D. Puskás, B. Robertson, W. M. Baker, R. Bhatawdekar, K. Boyett, A. J. Bunker, P. A. Cargile, S. Carniani, J. Chevallard, M. Curti, E. Curtis-Lake, Z. Ji, G. C. Jones, N. Kumari, I. Laseter, R. Maiolino, M. V. Maseda, P. Rinaldi, A. Stoffers, H. Übler, N. C. Villanueva , et al. (4 additional authors not shown)

Abstract: We use NIRCam imaging from the JWST Advanced Deep Extragalactic Survey (JADES) to study the ionising properties of a sample of 15721 galaxies at $3 \leq z_{\rm{phot}} \leq 9$, 90\% complete in stellar mass down to $\log(M_{\star}/M_{\odot})\approx 7.5$. Of the full sample, 1620 galaxies have spectroscopic redshift measurements from the literature. We use the spectral energy distribution fitting code \texttt{Prospector} to fit all available photometry and infer galaxy properties. We find a significantly milder evolution of the ionising photon production efficiency ($\xi_{\rm ion}$) with redshift and UV magnitude than previously reported. Interestingly, we observe two distinct populations in $\xi_{\rm ion}$, distinguished by their burstiness (given by SFR$_{10}$/SFR$_{100}$). Both populations show the same evolution with $z$ and $M_{\rm UV}$ but have different $\xi_{\rm ion}$ normalisations. We convolve the more representative $\log(\xi_{\rm ion}(z, M_{\rm UV}))$ relations (accounting for $\sim96$\% of the sample) with luminosity functions from the literature to place constraints on the cosmic ionising photon budget. Combining our results, we find that one of our models can match the observational constraints from the Ly$\alpha$ forest at $z\lesssim6$. We conclude that galaxies with $M_{\rm UV}$ between $-16$ and $-20$, assuming a reasonable escape fraction, can produce enough ionising photons to ionise the Universe without exceeding the required ionising photon budget.

Submitted 2 September, 2024; originally announced September 2024.

Comments: Submitted to MNRAS. 23 pages, 21 figures
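The budget calculation described above convolves $\xi_{\rm ion}$ with a UV luminosity function. A common form is $\dot n_{\rm ion}=f_{\rm esc}\int \xi_{\rm ion}(M_{\rm UV})\,L_{\rm UV}(M_{\rm UV})\,\phi(M_{\rm UV})\,{\rm d}M_{\rm UV}$. The minimal sketch below evaluates it over $-20\le M_{\rm UV}\le-16$ with the trapezoid rule, assuming a Schechter luminosity function; the Schechter parameters, $\log\xi_{\rm ion}$, and $f_{\rm esc}$ used here are placeholders, not the fits from the paper.

```python
import math

PARSEC_CM = 3.0857e18  # centimetres per parsec

def uv_luminosity(m_uv):
    """L_nu [erg s^-1 Hz^-1] for an AB absolute magnitude (flux at 10 pc)."""
    area = 4.0 * math.pi * (10.0 * PARSEC_CM) ** 2
    return area * 10 ** (-0.4 * (m_uv + 48.6))

def schechter_mag(m_uv, phi_star, m_star, alpha):
    """Schechter luminosity function per unit magnitude [Mpc^-3 mag^-1]."""
    x = 10 ** (-0.4 * (m_uv - m_star))
    return 0.4 * math.log(10) * phi_star * x ** (alpha + 1) * math.exp(-x)

def ionising_rate_density(log_xi_ion, f_esc, lf_params, m_bright=-20.0, m_faint=-16.0, n=400):
    """n_dot_ion [s^-1 Mpc^-3] = f_esc * integral of xi_ion * L_UV * phi over M_UV (trapezoid rule).

    log_xi_ion is a callable of M_UV returning log10(xi_ion / [Hz erg^-1]).
    """
    step = (m_faint - m_bright) / n
    total = 0.0
    for i in range(n + 1):
        m = m_bright + i * step
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * 10 ** log_xi_ion(m) * uv_luminosity(m) * schechter_mag(m, *lf_params)
    return f_esc * total * step

# Placeholder inputs (NOT the paper's values): phi_star [Mpc^-3], M_star, alpha.
lf = (1e-3, -21.0, -2.0)
n_dot = ionising_rate_density(lambda m: 25.5, f_esc=0.1, lf_params=lf)
print(f"log10 n_dot_ion = {math.log10(n_dot):.2f} s^-1 Mpc^-3")
```

Passing $\log\xi_{\rm ion}$ as a function of $M_{\rm UV}$ leaves room for a magnitude-dependent relation like the ones the abstract describes.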

arXiv:2408.16290 [pdf, other]

Gain/Loss-free Non-Hermitian Metamaterials

Authors: Wu Maopeng, Weng Mingze, Chi Zhonghai, Qi Yingyi, Zheng Siyong, Liu Fubei, Li Xinxin, Zhao Qian, Meng Yonggang, Zhou Ji

Abstract: Because optical gain and loss are easy to implement, photonics is widely believed to provide an ideal platform for exploring various non-Hermitian (NH) paradigms. Here, without any gain or loss, the non-Bloch wave transport that is unique to NH systems is demonstrated at the junction of a two-dimensional Chern insulator and a normal conductor. In the band gap of the nontrivial Chern insulator, the interface between the two materials can be effectively described by a one-dimensional NH Hamiltonian; this NH character of the interface is ascribed to the self-energy of the conductor, which acts as a reservoir. As a consequence of asymmetric hopping terms in the interface Hamiltonian, theoretical analysis shows that wave propagation along the interface exhibits dissipative non-reciprocity (dubbed non-Bloch transport). Moreover, this anomalous transport is verified in a junction formed by electromagnetic metamaterials constructed with a reverse-design strategy, which provides a general solution for metamaterial structures that emulate arbitrary tight-binding models. Using this strategy, we also investigate the gapless boundary modes in a Haldane-like hyperbolic metamaterial. Our work provides a conceptually rich avenue for constructing NH systems in both optics and electronics.

Submitted 29 August, 2024; originally announced August 2024.
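The mechanism this abstract relies on, asymmetric hopping in an effective one-dimensional NH Hamiltonian, is captured by the textbook Hatano-Nelson chain. The minimal NumPy sketch below is a generic toy model, not the authors' interface Hamiltonian or metamaterial design, and the hopping values are arbitrary. It shows a complex spectrum under periodic boundaries and a real spectrum with eigenstates piled up at one edge under open boundaries.

```python
import numpy as np

def hatano_nelson(n_sites, t_right, t_left, periodic=False):
    """Single-band chain with asymmetric nearest-neighbour hopping (Hatano-Nelson model).

    Convention: h[target, source], so t_right moves amplitude from site i to i + 1.
    """
    h = np.zeros((n_sites, n_sites), dtype=complex)
    for i in range(n_sites - 1):
        h[i + 1, i] = t_right
        h[i, i + 1] = t_left
    if periodic:
        h[0, n_sites - 1] = t_right  # wrap-around hop from the last site to the first
        h[n_sites - 1, 0] = t_left
    return h

def right_half_weight(vectors):
    """Fraction of each eigenvector's |psi|^2 that sits on the right half of the chain."""
    prob = np.abs(vectors) ** 2
    half = vectors.shape[0] // 2
    return prob[half:].sum(axis=0) / prob.sum(axis=0)

n, t_r, t_l = 20, 1.0, 0.5
e_pbc = np.linalg.eigvals(hatano_nelson(n, t_r, t_l, periodic=True))
e_obc, v_obc = np.linalg.eig(hatano_nelson(n, t_r, t_l, periodic=False))
print("max |Im E|, periodic:", np.abs(e_pbc.imag).max())  # complex loop, ~ t_r - t_l
print("max |Im E|, open:    ", np.abs(e_obc.imag).max())  # ~ 0, real spectrum
print("min right-half weight (open):", right_half_weight(v_obc).min())  # skin effect
```

Under open boundaries the chain is similar to a Hermitian one via the rescaling $\psi_j\to(t_r/t_l)^{j/2}\psi_j$, which is why the spectrum becomes real while every eigenstate grows toward the right edge.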

arXiv:2408.14008 [pdf, other]

LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models

Authors: Qihang Ge, Wei Sun, Yu Zhang, Yunhao Li, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Xiongkuo Min, Guangtao Zhai

Abstract: The explosive growth of videos on streaming media platforms has underscored the urgent need for effective video quality assessment (VQA) algorithms to monitor and perceptually optimize the quality of streaming videos. However, VQA remains an extremely challenging task due to the diverse video content and the complex spatial and temporal distortions, thus necessitating more advanced methods to address these issues. Large multimodal models (LMMs), such as GPT-4V, have exhibited strong capabilities for various visual understanding tasks, motivating us to leverage the powerful multimodal representation ability of LMMs to solve the VQA task. Therefore, we propose the first Large Multi-Modal Video Quality Assessment (LMM-VQA) model, which introduces a novel spatiotemporal visual modeling strategy for quality-aware feature extraction. Specifically, we first reformulate the quality regression problem as a question-answering (Q&A) task and construct Q&A prompts for VQA instruction tuning. Then, we design a spatiotemporal vision encoder to extract spatial and temporal features that represent the quality characteristics of videos; these features are subsequently mapped into the language space by a spatiotemporal projector for modality alignment. Finally, the aligned visual tokens and the quality-inquiry text tokens are aggregated as inputs to the large language model (LLM), which generates the quality score and level.
Extensive experiments demonstrate that LMM-VQA achieves state-of-the-art performance across five VQA benchmarks, with an average improvement of $5\%$ in generalization ability over existing methods. Furthermore, owing to the advanced design of the spatiotemporal encoder and projector, LMM-VQA also performs exceptionally well on general video understanding tasks, further validating its effectiveness. Our code will be released at https://github.com/Sueqk/LMM-VQA.

Submitted 26 August, 2024; originally announced August 2024.

arXiv:2408.13704 [pdf, other]

DHP Benchmark: Are LLMs Good NLG Evaluators?

Authors: Yicheng Wang, Jiayi Yuan, Yu-Neng Chuang, Zhuoer Wang, Yingchi Liu, Mark Cusick, Param Kulkarni, Zhengping Ji, Yasser Ibrahim, Xia Hu

Abstract: Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain inadequately explored. Current studies depend on human assessments and simple metrics that fail to capture the discernment of LLMs across diverse NLG tasks. To address this gap, we propose the Discernment of Hierarchical Perturbation (DHP) benchmarking framework, which provides quantitative discernment scores for LLMs utilizing hierarchically perturbed text data and statistical tests to measure the NLG evaluation capabilities of LLMs systematically. We have re-established six evaluation datasets for this benchmark, covering four NLG tasks: Summarization, Story Completion, Question Answering, and Translation. Our comprehensive benchmarking of five major LLM series provides critical insight into their strengths and limitations as NLG evaluators.

Submitted 24 August, 2024; originally announced August 2024.
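The abstract does not specify which statistical tests DHP uses. To illustrate the general idea, here is a minimal sketch assuming a one-sided exact sign test: if an evaluator truly discerns quality, it should score original texts above their perturbed versions more often than chance would allow. The function name and scores are hypothetical.

```python
from math import comb

def sign_test_pvalue(original_scores, perturbed_scores):
    """One-sided exact sign test: P(at least this many 'original > perturbed' pairs) under chance.

    Tied pairs are dropped, as in the standard sign test.
    """
    diffs = [o - p for o, p in zip(original_scores, perturbed_scores) if o != p]
    n = len(diffs)
    wins = sum(1 for d in diffs if d > 0)
    # Under the null hypothesis, wins ~ Binomial(n, 0.5).
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n

# Hypothetical evaluator scores for 8 original texts and their perturbed versions.
original = [4, 5, 3, 4, 5, 4, 3, 5]
perturbed = [3, 4, 3, 2, 4, 4, 2, 3]
print(sign_test_pvalue(original, perturbed))  # 6 non-tied pairs, all favour the original: 0.015625
```

A sign test uses only the direction of each difference; a Wilcoxon signed-rank test would also use the magnitudes.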

arXiv:2408.11144 [pdf, other]

Measurement of inclusive jet cross section and substructure in $p$$+$$p$ collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, J. Alexander, M. Alfred, V. Andrieux, S. Antsupov, K. Aoki, N. Apadula, H. Asano, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, X. Bai, N. S. Bandara, B. Bannier, E. Bannikov, K. N. Barish, S. Bathe , et al. (422 additional authors not shown)

Abstract: The jet cross section and jet-substructure observables in $p$$+$$p$ collisions at $\sqrt{s}=200$ GeV were measured by the PHENIX Collaboration at the Relativistic Heavy Ion Collider (RHIC). Jets are reconstructed from charged-particle tracks and electromagnetic-calorimeter clusters using the anti-$k_{t}$ algorithm with a jet radius $R=0.3$ for jets with transverse momentum within $8.0<p_T<40.0$ GeV/$c$ and pseudorapidity $|η|<0.15$. Measurements include the jet cross section, as well as distributions of the SoftDrop-groomed momentum fraction ($z_g$), the charged-particle transverse momentum with respect to the jet axis ($j_T$), and the radial distribution of charged particles within jets ($r$). Also measured was the distribution of $\xi=-\ln(z)$, where $z$ is the fraction of the jet momentum carried by the charged particle. The measurements are compared with next-to-leading-order and next-to-next-to-leading-order theoretical calculations, the PYTHIA event generator, and other existing experimental results. These measurements indicate a lower particle multiplicity in jets at RHIC energies than predicted by models. Implications are also noted for future jet measurements with sPHENIX at RHIC and at the future Electron-Ion Collider.

Submitted 20 August, 2024; originally announced August 2024.

Comments: 446 authors from 77 institutions, 11 pages, 8 figures. v1 is the version submitted to Physical Review D. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html
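The anti-$k_t$ algorithm named above merges the pair with the smallest $d_{ij}=\min(p_{T,i}^{-2},p_{T,j}^{-2})\,\Delta R_{ij}^2/R^2$ unless some beam distance $d_{iB}=p_{T,i}^{-2}$ is smaller, in which case object $i$ becomes a jet. The $O(N^3)$ Python sketch below uses $R=0.3$ as in the abstract but replaces full four-vector recombination with a simplified $p_T$-weighted average in $(\eta,\phi)$; production analyses use FastJet.

```python
import math

def delta_r2(a, b):
    """Squared distance in (eta, phi), with the phi difference wrapped into [-pi, pi)."""
    dphi = (a[2] - b[2] + math.pi) % (2 * math.pi) - math.pi
    return (a[1] - b[1]) ** 2 + dphi ** 2

def merge(a, b):
    """pT-weighted recombination (a simplification of the usual four-vector E-scheme)."""
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    dphi = (b[2] - a[2] + math.pi) % (2 * math.pi) - math.pi  # average phi around a's direction
    phi = a[2] + b[0] * dphi / pt
    return (pt, eta, (phi + math.pi) % (2 * math.pi) - math.pi)

def anti_kt(particles, radius=0.3):
    """Cluster (pt, eta, phi) tuples into jets with the anti-kt distance measure."""
    objs = list(particles)
    jets = []
    while objs:
        # Start from the smallest beam distance d_iB = 1/pt^2 ...
        best = min((obj[0] ** -2, i, None) for i, obj in enumerate(objs))
        # ... then check whether any pairwise distance d_ij is smaller.
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d = min(objs[i][0] ** -2, objs[j][0] ** -2) * delta_r2(objs[i], objs[j]) / radius**2
                if d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        if j is None:
            jets.append(objs.pop(i))  # d_iB is smallest: object i is a final jet
        else:
            merged = merge(objs[i], objs[j])
            objs = [o for k, o in enumerate(objs) if k not in (i, j)]
            objs.append(merged)
    return sorted(jets, key=lambda jet: -jet[0])

particles = [(10.0, 0.0, 0.0), (5.0, 0.1, 0.0), (8.0, 0.0, 2.0)]  # hypothetical (pT [GeV/c], eta, phi)
for pt, eta, phi in anti_kt(particles, radius=0.3):
    print(f"jet pT = {pt:.1f}, eta = {eta:.3f}, phi = {phi:.2f}")
```

Because $d_{ij}$ is weighted by the inverse square of the harder object's $p_T$, soft particles cluster around hard ones first, which gives anti-$k_t$ its regular, cone-like jet shapes.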

arXiv:2408.05402 [pdf, other]

doi 10.1016/j.gmod.2024.101225

Mesh deformation-based single-view 3D reconstruction of thin eyeglasses frames with differentiable rendering

Authors: Fan Zhang, Ziyue Ji, Weiguang Kang, Weiqing Li, Zhiyong Su

Abstract: With the support of Virtual Reality (VR) and Augmented Reality (AR) technologies, 3D virtual eyeglasses try-on applications are becoming a popular way to select the perfect pair of eyeglasses from the comfort of one's own home. Reconstructing eyeglasses frames from a single image with traditional depth- and image-based methods is extremely difficult because of their unique characteristics, such as a lack of sufficient texture features, thin elements, and severe self-occlusions. In this paper, we propose the first mesh deformation-based reconstruction framework for recovering high-precision 3D full-frame eyeglasses models from a single RGB image, leveraging prior and domain-specific knowledge. Specifically, based on a synthetic eyeglasses frame dataset that we construct, we first define a class-specific eyeglasses frame template with predefined keypoints. Then, given an input image of an eyeglasses frame with thin structures and few texture features, we design a keypoint detector and refiner that detect the predefined keypoints in a coarse-to-fine manner to estimate the camera pose accurately. After that, using differentiable rendering, we propose a novel optimization approach that produces correct geometry by progressively performing free-form deformation (FFD) on the template mesh. We define a series of loss functions that enforce consistency between the rendered result and the corresponding RGB input, utilizing constraints from inherent structure, silhouettes, keypoints, per-pixel shading information, and so on.
Experimental results on both the synthetic dataset and real images demonstrate the effectiveness of the proposed algorithm.

Submitted 9 August, 2024; originally announced August 2024.

Journal ref: Graphical Models, Volume 135, October 2024, 101225
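Free-form deformation, in the classic Sederberg-Parry form, embeds a mesh in a lattice of control points and moves each vertex by a trivariate Bernstein blend of the control points. The NumPy sketch below shows only this forward map under that assumption (lattice size, box, and helper names are hypothetical); in a pipeline like the one described above, the control-point positions would be the variables optimized through a differentiable renderer.

```python
from math import comb
import numpy as np

def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t), evaluated elementwise."""
    return comb(n, i) * t**i * (1.0 - t) ** (n - i)

def make_lattice(bbox_min, bbox_max, shape):
    """Undeformed (shape[0], shape[1], shape[2], 3) control lattice spanning the bounding box."""
    axes = [np.linspace(lo, hi, s) for lo, hi, s in zip(bbox_min, bbox_max, shape)]
    return np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def ffd(points, control, bbox_min, bbox_max):
    """Map (P, 3) vertices inside the box through a trivariate Bernstein FFD lattice."""
    l, m, n = (s - 1 for s in control.shape[:3])
    stu = (points - bbox_min) / (bbox_max - bbox_min)  # local coordinates in [0, 1]^3
    out = np.zeros(points.shape, dtype=float)
    for i in range(l + 1):
        bi = bernstein(l, i, stu[:, 0])
        for j in range(m + 1):
            bj = bernstein(m, j, stu[:, 1])
            for k in range(n + 1):
                bk = bernstein(n, k, stu[:, 2])
                out += (bi * bj * bk)[:, None] * control[i, j, k]
    return out

# Hypothetical box and vertices; lift the top control layer to bend the enclosed shape.
lo, hi = np.array([0.0, 0.0, 0.0]), np.array([2.0, 1.0, 0.5])
verts = lo + np.random.default_rng(0).random((100, 3)) * (hi - lo)
lattice = make_lattice(lo, hi, (4, 3, 2))
bent = lattice.copy()
bent[:, :, 1, 2] += 0.2 * np.sin(np.linspace(0.0, np.pi, 4))[:, None]
deformed = ffd(verts, bent, lo, hi)
print("max vertex displacement:", np.abs(deformed - verts).max())
```

With an undeformed lattice the map returns the input vertices unchanged (Bernstein polynomials reproduce linear functions), which is a useful sanity check before optimizing.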

arXiv:2408.02061 [pdf, other]

ParkingE2E: Camera-based End-to-end Parking Network, from Images to Planning

Authors: Changze Li, Ziheng Ji, Zhe Chen, Tong Qin, Ming Yang

Abstract: Autonomous parking is a crucial task in intelligent driving. Traditional parking algorithms are usually implemented with rule-based schemes. However, these methods are less effective in complex parking scenarios because of the intricate design of the algorithms. In contrast, neural-network-based methods tend to be more intuitive and versatile than rule-based methods. By collecting a large amount of expert parking trajectory data and emulating human strategy with learning-based methods, the parking task can be addressed effectively. In this paper, we employ imitation learning to perform end-to-end planning, from RGB images to a planned path, by imitating human driving trajectories. The proposed end-to-end approach uses a target query encoder to fuse image and target features, and a transformer-based decoder to autoregressively predict future waypoints. We conducted extensive experiments in real-world scenarios, and the results demonstrate that the proposed method achieves an average parking success rate of 87.8% across four different real-world garages. Real-vehicle experiments further validate the feasibility and effectiveness of the proposed method.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.00170 [pdf, other]

CREW: Facilitating Human-AI Teaming Research

Authors: Lingyu Zhang, Zhengran Ji, Boyuan Chen

Abstract: With the increasing deployment of artificial intelligence (AI) technologies, the potential of humans working with AI agents has been growing rapidly. Human-AI teaming is an important paradigm for studying the various aspects of humans and AI agents working together. A unique aspect of Human-AI teaming research is the need to study humans and AI agents jointly, which demands multidisciplinary research efforts spanning machine learning, human-computer interaction, robotics, cognitive science, neuroscience, psychology, social science, and complex systems. However, existing platforms for Human-AI teaming research are limited, often supporting only oversimplified scenarios and a single task, or focusing specifically on either human-teaming research or multi-agent AI algorithms. We introduce CREW, a platform that facilitates Human-AI teaming research and engages collaborations across multiple scientific disciplines, with a strong emphasis on human involvement. It includes pre-built tasks for cognitive studies and Human-AI teaming, and its modular design makes it expandable. Following conventions in cognitive neuroscience research, CREW also supports multimodal recording of human physiological signals for behavior analysis. Moreover, CREW benchmarks real-time human-guided reinforcement learning agents using state-of-the-art algorithms and well-tuned baselines. With CREW, we were able to conduct 50 human subject studies within a week to verify the effectiveness of our benchmark.

Submitted 31 July, 2024; originally announced August 2024.

Comments: Our project website is at: http://generalroboticslab.com/CREW

arXiv:2407.21408 [pdf, other]

Benchmarking AIGC Video Quality Assessment: A Dataset and Unified Model

Authors: Zhichao Zhang, Xinyue Li, Wei Sun, Jun Jia, Xiongkuo Min, Zicheng Zhang, Chunyi Li, Zijian Chen, Puyi Wang, Zhongpeng Ji, Fengyu Sun, Shangling Jui, Guangtao Zhai

Abstract: In recent years, artificial intelligence (AI) driven video generation has garnered significant attention owing to advances in stable diffusion and large language model techniques. Consequently, there is great demand for accurate video quality assessment (VQA) models that measure the perceptual quality of AI-generated content (AIGC) videos and help optimize video generation techniques. However, assessing the quality of AIGC videos is quite challenging because of the highly complex distortions they exhibit (e.g., unnatural actions and irrational objects). Therefore, in this paper, we systematically investigate the AIGC-VQA problem from both subjective and objective quality assessment perspectives. From the subjective perspective, we construct a Large-scale Generated Video Quality assessment (LGVQ) dataset, consisting of 2,808 AIGC videos generated by 6 video generation models using 468 carefully selected text prompts. Unlike previous subjective VQA experiments, we evaluate the perceptual quality of AIGC videos along three dimensions: spatial quality, temporal quality, and text-to-video alignment, which hold utmost importance for current video generation techniques. From the objective perspective, we establish a benchmark for evaluating existing quality assessment metrics on the LGVQ dataset, which reveals that current metrics perform poorly on it.
We therefore propose a Unify Generated Video Quality assessment (UGVQ) model to comprehensively and accurately evaluate the quality of AIGC videos across all three aspects with a single unified model, which uses visual, textual, and motion features of the video and its corresponding prompt and integrates key features to enhance feature expression. We hope that our benchmark can promote the development of quality evaluation metrics for AIGC videos. The LGVQ dataset and the UGVQ metric will be publicly released.

Submitted 31 July, 2024; originally announced July 2024.
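The abstract does not name the criteria used in the LGVQ benchmark; VQA benchmarks conventionally report Spearman rank-order (SRCC) and Pearson linear (PLCC) correlation between predicted scores and mean opinion scores (MOS). Assuming that convention, here is a minimal NumPy sketch (the nonlinear logistic mapping often applied before PLCC is omitted, and the scores are hypothetical):

```python
import numpy as np

def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def plcc(pred, mos):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(pred, mos)[0, 1])

def srcc(pred, mos):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(pred), average_ranks(mos))

# Hypothetical predicted scores from a quality metric and MOS for five videos.
pred = [0.62, 0.40, 0.85, 0.33, 0.71]
mos = [3.1, 2.2, 4.0, 2.9, 3.6]
print(f"SRCC = {srcc(pred, mos):.3f}, PLCC = {plcc(pred, mos):.3f}")
```

SRCC measures only monotonic agreement, so it is insensitive to the predicted scores' scale; PLCC additionally rewards a linear relationship with MOS.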

arXiv:2407.21272 [pdf]

doi 10.1109/JBHI.2019.2929842

Automated Quantification of Hyperreflective Foci in SD-OCT With Diabetic Retinopathy

Authors: Idowu Paul Okuwobi, Zexuan Ji, Wen Fan, Songtao Yuan, Loza Bekalo, Qiang Chen

Abstract: The presence of hyperreflective foci (HFs) is related to retinal disease progression, and their quantity has proven to be a prognostic factor for visual and anatomical outcomes in various retinal diseases. However, the lack of efficient quantitative tools for evaluating HFs has prevented ophthalmologists from assessing HF volume. For this reason, we propose an automated algorithm to segment and quantify HFs in spectral domain optical coherence tomography (SD-OCT). The proposed algorithm consists of two parallel processes: region of interest (ROI) generation and HF estimation. To generate the ROI, we use morphological reconstruction to obtain a reconstructed image, from which a histogram is constructed for data distribution analysis and clustering. In parallel, we estimate the HFs by extracting extremal regions from the connected regions obtained from a component tree. Finally, the outputs of the ROI and HF estimation processes are merged to obtain the segmented HFs. The proposed algorithm was tested on 40 3D SD-OCT volumes from 40 patients diagnosed with non-proliferative diabetic retinopathy (NPDR), proliferative diabetic retinopathy (PDR), and diabetic macular edema (DME). The average Dice similarity coefficient (DSC) and correlation coefficient (r) are 69.70% and 0.99 for NPDR, 70.31% and 0.99 for PDR, and 71.30% and 0.99 for DME, respectively. The proposed algorithm can provide ophthalmologists with good quantitative information on HFs, such as their volume, size, and location.

Submitted 30 July, 2024; originally announced July 2024.

Comments: IEEE Journal of Biomedical and Health Informatics, Volume: 24, Issue: 4, pp. 1125 - 1136, 2020

Journal ref: IEEE Journal of Biomedical and Health Informatics, Volume: 24, Issue: 4, pp. 1125 - 1136, 2020
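For readers unfamiliar with the metric, the dice similarity coefficient (DSC) reported above measures overlap between a predicted and a reference segmentation as DSC = 2|P ∩ T| / (|P| + |T|). A minimal sketch on flattened binary masks follows; the function name and the empty-mask convention are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of
    foreground (nonzero) voxels in the predicted and reference masks.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * intersection / total
```

For example, `dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0])` shares one foreground voxel out of two per mask, giving 2·1/4 = 0.5.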

arXiv:2407.17120 [pdf, other]

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

Authors: Jingren Liu, Zhong Ji, YunLong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li

Abstract: Parameter-efficient fine-tuning for continual learning (PEFT-CL) has shown promise in adapting pre-trained models to sequential tasks while mitigating the catastrophic forgetting problem. However, the mechanisms that govern continual performance in this paradigm remain poorly understood. To tackle this complexity, we undertake a rigorous analysis of PEFT-CL dynamics to derive relevant metrics for continual scenarios using Neural Tangent Kernel (NTK) theory. With NTK as a mathematical analysis tool, we recast the challenge of test-time forgetting as quantifiable generalization gaps during training, identifying three key factors that influence these gaps and the performance of PEFT-CL: training sample size, task-level feature orthogonality, and regularization. To address these challenges, we introduce NTK-CL, a novel framework that eliminates task-specific parameter storage while adaptively generating task-relevant features. Following the theoretical guidance, NTK-CL triples the feature representation of each sample, theoretically and empirically reducing the magnitude of both task-interplay and task-specific generalization gaps. Grounded in NTK analysis, our approach imposes an adaptive exponential moving average mechanism and constraints on task-level feature orthogonality, maintaining intra-task NTK forms while attenuating inter-task NTK forms. Ultimately, by fine-tuning optimizable parameters with appropriate regularization, NTK-CL achieves state-of-the-art performance on established PEFT-CL benchmarks.
This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, and contributing to the development of more efficient continual learning systems.

Submitted 24 July, 2024; originally announced July 2024.

arXiv:2407.15414 [pdf, other]

Weights Shuffling for Improving DPSGD in Transformer-based Models

Authors: Jungang Yang, Zhe Ji, Liyao Xiang

Abstract: Differential Privacy (DP) mechanisms, especially in high-dimensional settings, often face the challenge of maintaining privacy without compromising data utility. This work introduces an innovative shuffling mechanism in Differentially-Private Stochastic Gradient Descent (DPSGD) that enhances the utility of large models under the same privacy guarantee as the unshuffled case. Specifically, we reveal that random shuffling brings additional randomness to the trajectory of gradient descent without affecting model accuracy, owing to the permutation invariance property: the model can be computed equivalently in both forward and backward propagation under permutation. We show that permutation indeed improves the privacy guarantee of DPSGD in theory, but tracking the exact privacy loss of the shuffled model is particularly challenging. Hence, we exploit an approximation of the sum of lognormal distributions to derive the condition under which shuffled DPSGD meets the DP guarantee. Auditing results show that our condition offers a DP guarantee quite close to the audited privacy level, demonstrating that our approach provides an effective estimate in practice. Experimental results verify our theoretical derivation and show that our mechanism improves the accuracy of DPSGD over state-of-the-art baselines on a variety of models and tasks.

Submitted 22 July, 2024; originally announced July 2024.
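The permutation invariance property mentioned above can be illustrated on a toy two-layer MLP: reordering the hidden units (rows of W1 and entries of b1, together with the matching columns of W2) leaves the network's output unchanged. This is a minimal sketch of that general property only; it uses a plain MLP rather than a Transformer and does not implement the authors' shuffled DPSGD or its privacy accounting. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP: y = W2 . relu(W1 . x + b1) + b2."""
    h = relu([a + b for a, b in zip(matvec(w1, x), b1)])
    return [a + b for a, b in zip(matvec(w2, h), b2)]


def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden units: rows of W1, entries of b1, and columns of W2."""
    w1_p = [w1[i] for i in perm]
    b1_p = [b1[i] for i in perm]
    w2_p = [[row[i] for i in perm] for row in w2]
    return w1_p, b1_p, w2_p


def check_invariance(seed: int = 0, d_in: int = 3, d_hidden: int = 5, d_out: int = 2) -> bool:
    """True if a random hidden-unit permutation leaves the output unchanged
    (up to floating-point rounding from the reordered sums)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
    b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
    b2 = [rng.gauss(0, 1) for _ in range(d_out)]
    x = [rng.gauss(0, 1) for _ in range(d_in)]

    perm = list(range(d_hidden))
    rng.shuffle(perm)
    w1_p, b1_p, w2_p = permute_hidden(w1, b1, w2, perm)

    y = mlp(x, w1, b1, w2, b2)
    y_p = mlp(x, w1_p, b1_p, w2_p, b2)
    return all(math.isclose(a, b, rel_tol=1e-12, abs_tol=1e-12) for a, b in zip(y, y_p))
```

Because the permuted network computes the same function, gradients with respect to the permuted weights are simply the permuted gradients, which is why shuffling can add randomness to the weight trajectory without changing what the model computes.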

arXiv:2407.12023 [pdf, other]

CMMaTH: A Chinese Multi-modal Math Skill Evaluation Benchmark for Foundation Models

Authors: Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Zhi-Long Ji, Jin-Feng Bai, Zhen-Ru Pan, Fan-Hu Zeng, Jian Xu, Jia-Xin Zhang, Cheng-Lin Liu

Abstract: Due to the rapid advancements in multimodal large language models, evaluating their multimodal mathematical capabilities continues to receive wide attention. Although datasets such as MathVista provide benchmarks for assessing mathematical capabilities in multimodal scenarios, there is still a lack of corresponding evaluation tools and datasets for fine-grained assessment in the context of K12 education in the Chinese language. To systematically evaluate the capability of multimodal large models in solving Chinese multimodal mathematical problems, we propose a Chinese Multi-modal Math Skill Evaluation Benchmark, named CMMaTH, containing 23k multimodal K12 math-related questions, which forms the largest Chinese multimodal mathematical problem benchmark to date. CMMaTH questions span elementary to high school levels and provide increased diversity in problem types, solution objectives, visual elements, detailed knowledge points, and standard solution annotations. We have constructed an open-source tool, GradeGPT, integrated with the CMMaTH dataset, facilitating stable, rapid, and cost-free model evaluation. Our data and code are available.

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2407.11473 [pdf, other]

Quantum Maximum Entropy Inference and Hamiltonian Learning

Authors: Minbo Gao, Zhengfeng Ji, Fuchao Wei

Abstract: Maximum entropy inference and learning of graphical models are pivotal tasks in learning theory and optimization. This work extends algorithms for these problems, including generalized iterative scaling (GIS) and gradient descent (GD), to the quantum realm. While the generalization, known as quantum iterative scaling (QIS), is straightforward, the key challenge lies in the non-commutative nature of quantum problem instances, which makes the convergence rate analysis significantly more challenging than in the classical case. Our principal technical contribution is a rigorous analysis of the convergence rates, establishing both lower and upper bounds on the spectral radius of the Jacobian matrix for each iteration of these algorithms. Furthermore, we explore quasi-Newton methods to enhance the performance of QIS and GD. Specifically, we propose using Anderson mixing and the L-BFGS method for QIS and GD, respectively. These quasi-Newton techniques yield remarkable efficiency gains, improving performance by orders of magnitude. As an application, our algorithms provide a viable approach to designing Hamiltonian learning algorithms.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 7 figures
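The classical GIS algorithm that QIS generalizes can be sketched on a small finite domain. It fits the maximum entropy distribution p(x) ∝ exp(Σ_i λ_i f_i(x)) whose feature expectations match given targets, using the update λ_i ← λ_i + (1/C) log(E_target[f_i] / E_model[f_i]). This sketch assumes the standard GIS preconditions: features are nonnegative and sum to the same constant C at every point (usually arranged by adding a slack feature), and target expectations are strictly positive. It is the classical algorithm only, not the authors' quantum version, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def model_distribution(lam: Sequence[float], feats: Sequence[Sequence[float]]) -> List[float]:
    """p_lambda(x) proportional to exp(sum_i lambda_i * f_i(x)) over a finite domain."""
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in feats]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def expectations(p: Sequence[float], feats: Sequence[Sequence[float]]) -> List[float]:
    """E_p[f_i] for every feature i."""
    return [sum(px * fx[i] for px, fx in zip(p, feats)) for i in range(len(feats[0]))]


def gis(feats: Sequence[Sequence[float]], target: Sequence[float], iters: int = 500) -> List[float]:
    """Classical generalized iterative scaling: lambda_i += (1/C) * log(target_i / E_model[f_i])."""
    c = sum(feats[0])
    if any(abs(sum(fx) - c) > 1e-12 for fx in feats):
        raise ValueError("GIS requires every point's features to sum to the same constant C")
    if any(t <= 0 for t in target):
        raise ValueError("target expectations must be strictly positive")
    lam = [0.0] * len(target)
    for _ in range(iters):
        model_exp = expectations(model_distribution(lam, feats), feats)
        lam = [l + math.log(t / m) / c for l, t, m in zip(lam, target, model_exp)]
    return lam
```

As a usage example, take the domain {0, 1, 2, 3} with features (bit 0, bit 1, slack = 2 − bit 0 − bit 1), so C = 2, and targets (0.7, 0.4, 0.9). The maximum entropy solution treats the two bits as independent, so the fitted distribution approaches (0.18, 0.42, 0.12, 0.28).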

arXiv:2407.10411 [pdf]

doi 10.23977/tracam.2024.040107

A Study on Lampreys Population Based on Sex-Ratio-Related Growth-Balance Model

Authors: Zuhua Ji, Jiarui Chen, Zihang Wang

Abstract: Lampreys are one of the oldest species in the world, having existed longer than the dinosaurs, which is related to their ability to change the sex ratio during their lifespan. In this paper, to understand how sex ratio and food quantity affect the population growth rate of lampreys, the researchers draw inspiration from the logistic model and establish a model called EcoSexChange (ESC), which yields a population that first increases and then stabilizes, a reasonable outcome that may apply to other organisms with significant differences in consumption between sexes. Subsequently, this paper develops the Sex Ratio Adaptation Eco Impact (SRAEI) model based on the ESC model, using an agent-based modeling (ABM) algorithm to simulate how the population of lampreys, whose lives are divided into seven stages, grows and stabilizes. It then introduces a sudden disaster factor in the middle of the simulation, while also comparing against lampreys that cannot adjust their sex ratio. The results of this paper are of great reference value for analyzing the population changes of lampreys in different living environments, and they are also easy to apply to other species with large differences between males and females.

Submitted 14 July, 2024; originally announced July 2024.

Journal ref: Transactions on Computational and Applied Mathematics. 2024 May 6;4(1):48-55
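The classical logistic model that ESC draws on, dN/dt = rN(1 − N/K), already produces the "increase then stabilize" trajectory described above. The sketch below integrates only that baseline with forward Euler; it does not model sex ratio, food quantity, or life stages, and the parameter names (growth rate `r`, carrying capacity `k`, step `dt`) are illustrative assumptions rather than the paper's ESC formulation.

```python
from typing import List


def logistic_trajectory(n0: float, r: float, k: float, dt: float, steps: int) -> List[float]:
    """Forward-Euler integration of the logistic model dN/dt = r * N * (1 - N / K).

    Returns the population at every step, including the initial value.
    """
    ns = [n0]
    n = n0
    for _ in range(steps):
        n = n + dt * r * n * (1.0 - n / k)
        ns.append(n)
    return ns
```

With a small step (here r·dt = 0.05), `logistic_trajectory(10.0, 0.5, 1000.0, 0.1, 2000)` rises monotonically from 10 and levels off at the carrying capacity of 1000; larger steps can make the discrete map overshoot or oscillate, which is a property of the numerical scheme rather than of the model.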

arXiv:2407.09486 [pdf, other]

ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving

Authors: Tao Huang, Pengfei Chen, Kyoka Gong, Jocky Hawk, Zachary Bright, Wenxin Xie, Kecheng Huang, Zhi Ji

Abstract: With the increasing popularity of large language model (LLM) backend systems, it is common and necessary to deploy stable serverless serving of LLMs on multi-GPU clusters with autoscaling. However, this is challenging, because the diversity and co-location of applications in multi-GPU clusters lead to low service quality and low GPU utilization. To address these challenges, we build ENOVA, a deployment, monitoring, and autoscaling service for serverless LLM serving. ENOVA comprehensively deconstructs the execution process of LLM serving and, on that basis, designs a configuration recommendation module for automatic deployment on any GPU cluster and a performance detection module for autoscaling. On top of these, ENOVA implements a deployment execution engine for multi-GPU cluster scheduling. The experimental results show that ENOVA significantly outperforms other state-of-the-art methods and is suitable for wide deployment in large online systems.

Submitted 17 May, 2024; originally announced July 2024.

arXiv:2407.08586 [pdf, other]

Centrality dependence of Lévy-stable two-pion Bose-Einstein correlations in $\sqrt{s_{_{NN}}}=200$ GeV Au$+$Au collisions

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, C. Aidala, N. N. Ajitanand, Y. Akiba, R. Akimoto, H. Al-Ta'ani, J. Alexander, A. Angerami, K. Aoki, N. Apadula, Y. Aramaki, H. Asano, E. C. Aschenauer, E. T. Atomssa, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, B. Bannier, K. N. Barish, B. Bassalleck, S. Bathe, et al. (377 additional authors not shown)

Abstract: The PHENIX experiment measured the centrality dependence of two-pion Bose-Einstein correlation functions in $\sqrt{s_{_{NN}}}=200$~GeV Au$+$Au collisions at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory. The data are well represented by Lévy-stable source distributions. The extracted source parameters are the correlation-strength parameter $λ$, the Lévy index of stability $α$, and the Lévy-scale parameter $R$ as a function of transverse mass $m_T$ and centrality. The $λ(m_T)$ parameter is constant at larger values of $m_T$, but decreases as $m_T$ decreases. The Lévy scale parameter $R(m_T)$ decreases with $m_T$ and exhibits proportionality to the length scale of the nuclear overlap region. The Lévy exponent $α(m_T)$ is independent of $m_T$ within uncertainties in each investigated centrality bin, but shows a clear centrality dependence. At all centralities, the Lévy exponent $α$ is significantly different from that of Gaussian ($α=2$) or Cauchy ($α=1$) source distributions. Comparisons to the predictions of Monte-Carlo simulations of resonance-decay chains show that in all but the most peripheral centrality class (50%-60%), the obtained results are inconsistent with the measurements, unless a significant reduction of the in-medium mass of the $η'$ meson is included. In each centrality class, the best value of the in-medium $η'$ mass is compared to the mass of the $η$ meson, as well as to several theoretical predictions that consider restoration of $U_A(1)$ symmetry in hot hadronic matter.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 401 authors from 75 institutions, 20 pages, 15 figures, 2 tables. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html
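For orientation, Lévy-source analyses of two-pion correlations commonly fit a correlation function of the following form. It is stated here as background from the Lévy-HBT literature with final-state Coulomb effects omitted, not reproduced from this abstract; $q$ is the relative momentum of the pion pair, $λ$ the correlation strength, $R$ the Lévy scale, and $α$ the index of stability:

$$
C_2(q) = 1 + \lambda \, \exp\!\left[-(qR)^{\alpha}\right], \qquad 0 < \alpha \le 2 .
$$

Setting $α=2$ gives a Gaussian correlation function and $α=1$ an exponential one, corresponding to the Gaussian and Cauchy source limits quoted in the abstract; at $q=0$ the function reaches $1+λ$.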

arXiv:2407.07999 [pdf, ps, other]

Fusion of Short-term and Long-term Attention for Video Mirror Detection

Authors: Mingchen Xu, Jing Wu, Yukun Lai, Ze Ji

Abstract: Techniques for detecting mirrors from static images have witnessed rapid growth in recent years. However, these methods detect mirrors from single input images. Detecting mirrors from video requires further consideration of temporal consistency between frames. We observe that humans can recognize mirror candidates, from just one or two frames, based on their appearance (e.g. shape, color). However, to ensure that the candidate is indeed a mirror (not a picture or a window), we often need to observe more frames for a global view. This observation motivates us to detect mirrors by fusing appearance features extracted from a short-term attention module and context information extracted from a long-term attention module. To evaluate the performance, we build a challenging benchmark dataset of 19,255 frames from 281 videos. Experimental results demonstrate that our method achieves state-of-the-art performance on the benchmark dataset.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.05993 [pdf, other]

Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution

Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

Abstract: In this paper, we propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Existing methods are primarily based on convolutional neural networks (CNNs) or Transformers. CNN-based methods fail to capture long-range dependencies, while Transformer-based approaches face heavy computational costs due to their quadratic computational complexity. Recently, State Space Models (SSMs), especially Mamba, have emerged that can model long-range dependencies with linear computational complexity. Inspired by Mamba, our approach aims to learn self-prior multi-scale contextual features within Mamba-UNet networks, which may help super-resolve low-resolution medical images efficiently. Specifically, we obtain self-priors by perturbing the brightness inpainting of the input image during network training, which allows the network to learn detailed texture and brightness information that is beneficial for super-resolution. Furthermore, we combine Mamba with a UNet network to mine global features at different levels. We also design an improved 2D-Selective-Scan (ISS2D) module that divides image features into different directional sequences to learn long-range dependencies in multiple directions, and adaptively fuses the sequence information to enhance the super-resolved feature representation. Both qualitative and quantitative experimental results demonstrate that our approach outperforms current state-of-the-art methods on two public medical datasets: IXI and fastMRI.

Submitted 8 July, 2024; originally announced July 2024.
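The idea of dividing a 2D feature map into directional sequences can be sketched with the generic four-way cross-scan popularized by 2D selective-scan designs: row-major, reversed row-major, column-major, and reversed column-major orders. This is an assumption about the general technique, not the authors' ISS2D module; each sequence would normally pass through an SSM, which is omitted here, and the adaptive fusion is replaced by simple averaging. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Position = Tuple[int, int]


def scan_orders(h: int, w: int) -> List[List[Position]]:
    """Four scan orders over an h x w grid: row-major, its reverse,
    column-major, and its reverse."""
    row = [(i, j) for i in range(h) for j in range(w)]
    col = [(i, j) for j in range(w) for i in range(h)]
    return [row, row[::-1], col, col[::-1]]


def to_sequences(grid: Sequence[Sequence[float]]) -> List[List[float]]:
    """Flatten a 2D feature map into one 1D sequence per scan direction."""
    h, w = len(grid), len(grid[0])
    return [[grid[i][j] for i, j in order] for order in scan_orders(h, w)]


def merge_sequences(seqs: Sequence[Sequence[float]], h: int, w: int) -> List[List[float]]:
    """Scatter each (processed) sequence back to its grid positions and average
    across directions; plain averaging stands in for adaptive fusion."""
    out = [[0.0] * w for _ in range(h)]
    for seq, order in zip(seqs, scan_orders(h, w)):
        for value, (i, j) in zip(seq, order):
            out[i][j] += value / len(seqs)
    return out
```

For a 2 × 3 map `[[1, 2, 3], [4, 5, 6]]`, the column-major sequence is `[1, 4, 2, 5, 3, 6]`; with no per-sequence processing, merging the four sequences recovers the original map.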

arXiv:2407.05969 [pdf, other]

Deform-Mamba Network for MRI Super-Resolution

Authors: Zexin Ji, Beiji Zou, Xiaoyan Kui, Pierre Vera, Su Ruan

Abstract: In this paper, we propose a new architecture, called Deform-Mamba, for MR image super-resolution. Unlike conventional CNN- or Transformer-based super-resolution approaches, which face challenges related to the local receptive field or heavy computational cost, our approach aims to effectively explore both the local and global information of images. Specifically, we develop a Deform-Mamba encoder composed of two branches: a modulated deform block and a vision Mamba block. We also design a multi-view context module in the bottleneck layer to explore multi-view contextual content. Thanks to the features extracted by the encoder, which include content-adaptive local and efficient global information, the vision Mamba decoder finally generates high-quality MR images. Moreover, we introduce a contrastive edge loss to promote the reconstruction of edge- and contrast-related content. Quantitative and qualitative experimental results on the IXI and fastMRI datasets indicate that our approach achieves competitive performance.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.04693 [pdf, other]

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domain coverage and size, and they struggle to scale due to prohibitive labor costs and the insufficient reliability of existing hallucination annotators. To facilitate scalable oversight of LLM hallucinations, this paper introduces an iterative self-training framework that simultaneously and progressively scales up the hallucination annotation dataset and improves the accuracy of the hallucination annotator. Based on the Expectation Maximization (EM) algorithm, in each iteration the framework first applies a hallucination annotation pipeline to annotate a scaled dataset and then trains a more accurate hallucination annotator on that dataset. This new hallucination annotator is adopted in the hallucination annotation pipeline for the next iteration. Extensive experimental results demonstrate that the final hallucination annotator, with only 7B parameters, surpasses the performance of GPT-4 and obtains new state-of-the-art hallucination detection results on HaluEval and HalluQA through zero-shot inference. Such an annotator can not only evaluate the hallucination levels of various LLMs on the large-scale dataset but also help mitigate hallucination in LLM generations, with the Natural Language Inference (NLI) metric increasing from 25% to 37% on HaluEval.

Submitted 5 July, 2024; originally announced July 2024.

Comments: 9 pages

arXiv:2407.03282 [pdf, other]

LLM Internal States Reveal Hallucination Risk Faced With a Query

Authors: Ziwei Ji, Delong Chen, Etsuko Ishii, Samuel Cahyawijaya, Yejin Bang, Bryan Wilie, Pascale Fung

Abstract: The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We broadly analyze the internal mechanisms of LLMs, both in terms of training data sources and across 15 diverse Natural Language Generation (NLG) tasks spanning over 700 datasets. Our empirical analysis reveals two key insights: (1) LLM internal states indicate whether the model has seen the query in its training data; and (2) LLM internal states indicate whether the model is likely to hallucinate in response to the query. Our study explores particular neurons, activation layers, and tokens that play a crucial role in the LLM's perception of uncertainty and hallucination risk. Using a probing estimator, we leverage LLM self-assessment and achieve an average hallucination estimation accuracy of 84.32% at run time.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02575 [pdf, other]

JADES: The star-formation and chemical enrichment history of a luminous galaxy at z~9.43 probed by ultra-deep JWST/NIRSpec spectroscopy

Authors: Mirko Curti, Joris Witstok, Peter Jakobsen, Chiaki Kobayashi, Emma Curtis-Lake, Kevin Hainline, Xihan Ji, Francesco D'Eugenio, Jacopo Chevallard, Roberto Maiolino, Jan Scholtz, Stefano Carniani, Santiago Arribas, William M. Baker, Rachana Bhatawdekar, Kristan Boyett, Andrew J. Bunker, Alex Cameron, Phillip A. Cargile, Stephane Charlot, Daniel J. Eisenstein, Zhiyuan Ji, Benjamin D. Johnson, Nimisha Kumari, Michael V. Maseda , et al. (8 additional authors not shown)

Abstract: We analyse ultra-deep JWST observations of the galaxy JADES-GS-z9-0 at z = 9.4327, and derive detailed stellar and interstellar medium (ISM) properties of this luminous (M_UV = -20.43) high-redshift system. Complementary information from NIRCam imaging and NIRSpec (both low- and medium-resolution) spectroscopy reveals a compact system (R_e ~ 110 pc) characterised by a steeply rising star formation history, which is reflected in the inferred young stellar age (t ~ 3 Myr, light-weighted), high star-formation rate surface density (Σ_SFR ~ 72 M⊙ yr^-1 kpc^-2), high ionisation parameter (log(U) ~ -1.5), low metallicity (12+log(O/H) ~ 7.5), and low carbon-to-oxygen abundance ([C/O] = -0.64). Leveraging the detection of N III]λ1750, we derive a nitrogen-to-oxygen abundance ([N/O] ~ 0) higher than the plateau followed by low-redshift galaxies of similar metallicity, possibly revealing the imprint of (very) massive stars on the ISM enrichment and favouring a top-heavy Initial Mass Function (IMF) scenario. Massive stars powering a hard radiation field are also required to explain the rest-frame UV line ratios, though the presence of the high-excitation [Ne V]λ3426 emission line possibly hints at additional ionisation from an AGN. We also report the tentative detection of Lyα emission in the G140M spectrum, shifted by ~450 km/s redward of the systemic redshift.
Combined with modelling of the Lyα spectral break, we rule out the presence of very high column densities of neutral gas pertaining to local absorbers, as well as any extended surrounding ionised bubble, suggesting that JADES-GS-z9-0 has not yet significantly contributed to cosmic reionization.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Submitted to A&A. Comments are welcome

arXiv:2406.17968 [pdf, other]

Efficient Document Ranking with Learnable Late Interactions

Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been proposed to realize more favorable latency-quality tradeoffs, by using a DE structure followed by a lightweight scorer based on query and document token embeddings. However, these lightweight scorers are often hand-crafted, and there is no understanding of their approximation power; further, such scorers require access to individual document token embeddings, which imposes an increased latency and storage burden. In this paper, we propose novel learnable late-interaction models (LITE) that resolve these issues. Theoretically, we prove that LITE is a universal approximator of continuous scoring functions, even for relatively small embedding dimension. Empirically, LITE outperforms previous late-interaction models such as ColBERT on both in-domain and zero-shot re-ranking tasks. For instance, experiments on MS MARCO passage re-ranking show that LITE not only yields a model with better generalization, but also lowers latency and requires 0.25x storage compared to ColBERT.

Submitted 25 June, 2024; originally announced June 2024.
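As a point of reference for the "hand-crafted" lightweight scorers mentioned above, ColBERT's late-interaction score sums, over query tokens, the maximum dot product with any document token ("MaxSim"). The sketch below shows that baseline only, not LITE's learnable scorer; the function names are hypothetical and embeddings are plain lists rather than model outputs.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """ColBERT-style late interaction: for each query token embedding, take the
    best-matching document token embedding, then sum those maxima."""
    if not doc_tokens:
        raise ValueError("document must contain at least one token embedding")
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)
```

For example, with query tokens `[[1, 0], [0, 1]]` and document tokens `[[1, 0], [0.5, 0.5]]`, the first query token matches best with score 1 and the second with 0.5, for a total of 1.5. The scorer needs every document token embedding at query time, which is the storage and latency burden the abstract refers to.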

arXiv:2406.17608 [pdf, other]

Test-Time Generative Augmentation for Medical Image Segmentation

Authors: Xiao Ma, Yuhui Tao, Yuhan Zhang, Zexuan Ji, Yizhe Zhang, Qiang Chen

Abstract: In this paper, we propose a novel approach to enhance medical image segmentation at test time. Instead of applying hand-crafted transforms or functions to the input test image to create multiple views for test-time augmentation, we advocate using an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Given that the GM has been trained to comprehend and encapsulate comprehensive domain data knowledge, it is superior to segmentation models in representing the data characteristics and distribution. Hence, by integrating the GM into test-time augmentation, we can effectively generate multiple views of a given test sample that align with the content and appearance characteristics of the sample and the related local data distribution. This approach renders the augmentation process more adaptable and resilient than conventional hand-crafted transforms. Comprehensive experiments conducted across three medical image segmentation tasks (nine datasets) demonstrate the efficacy and versatility of the proposed test-time generative augmentation (TTGA) in enhancing segmentation outcomes. Moreover, TTGA significantly improves pixel-wise error estimation, thereby facilitating the deployment of a more reliable segmentation system. Code will be released at: https://github.com/maxiao0234/TTGA.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 12 pages, 2 figures
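The abstract does not state how predictions on the generated views are combined. A common choice, assumed here, is to average per-pixel foreground probabilities across views for the final mask and to use their spread (variance) as a pixel-wise error estimate. The sketch below shows only that aggregation step; generating the views with a diffusion model is out of scope, and the function name and threshold are hypothetical.

```python
from typing import List, Sequence, Tuple


def aggregate_views(
    prob_maps: Sequence[Sequence[float]], threshold: float = 0.5
) -> Tuple[List[int], List[float], List[float]]:
    """Combine K per-pixel foreground probability maps (one per augmented view).

    Returns (binary mask from the mean, per-pixel mean, per-pixel variance);
    the variance serves as a simple pixel-wise uncertainty estimate.
    """
    if not prob_maps:
        raise ValueError("need at least one view")
    k, n = len(prob_maps), len(prob_maps[0])
    if any(len(m) != n for m in prob_maps):
        raise ValueError("all maps must have the same number of pixels")
    mean = [sum(m[p] for m in prob_maps) / k for p in range(n)]
    var = [sum((m[p] - mean[p]) ** 2 for m in prob_maps) / k for p in range(n)]
    mask = [1 if v >= threshold else 0 for v in mean]
    return mask, mean, var
```

For two views `[0.9, 0.2]` and `[0.7, 0.6]`, the mean is about `[0.8, 0.4]`, giving the mask `[1, 0]`, and the second pixel, on which the views disagree more, has the larger variance (0.04 versus 0.01).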

arXiv:2406.11997 [pdf, other]

JADES: Physical properties of Ly$α$ and non-Ly$α$ emitters at z ~ 4.8-9.6

Authors: Nimisha Kumari, Renske Smit, Joris Witstok, Marco Sirianni, Roberto Maiolino, Andrew J. Bunker, Rachana Bhatawdekar, Kristan Boyett, Alex J. Cameron, Stefano Carniani, Stephane Charlot, Mirko Curti, Emma Curtis-Lake, Francesco D'Eugenio, Daniel J. Eisenstein, Kevin Hainline, Zhiyuan Ji, Gareth C. Jones, Brant Robertson, Aayush Saxena, Jan Scholtz, Charlotte Simmonds, Christina C. Williams, Christopher N. A. Willmer

Abstract: We investigate the physical properties of Lyman-alpha emitters (LAEs) and non-Lyman-alpha emitters (non-LAEs) at z$\sim$4.8--9.6 via a stacking analysis of 253 JWST/NIRSpec spectra of galaxies observed as part of the JWST Advanced Deep Extragalactic Survey (JADES). We identify a sample of 42 LAEs with Ly$α$ equivalent width $\gtrsim$20 Å and a sample of 211 non-LAEs, further divide each sample at the median redshift of the LAEs (z~6.3), and create composite spectra using the low- and medium-resolution spectra from NIRSpec. We estimate physical quantities such as dust extinction, UV continuum slope $β$, electron temperature, ionization parameter, escape fractions of Ly$α$ and Lyman continuum, and the photon production rate for each bin/stack. The existing dust-extinction laws do not appear to be valid at these epochs. The emission-line ratio analyses show that active galactic nuclei might dominate all sub-samples, irrespective of Ly$α$ emission. LAEs show much higher [OIII]/[OII] and lower [OII]/H$δ$ at z$\lesssim$6.3 compared to non-LAEs, but these line ratios are not sufficient to distinguish the two populations at z$>$6.3. However, the LAE samples show larger EW([OIII]4959, 5007) ($>$1000 Å) than the non-LAE sample at all redshifts. CIV/Ly$α$ and CIV/CIII] for the LAE population at z$\lesssim$6.3 are $\sim$5 times larger than those for the LAE population at z$>$6.3. The ionizing radiation for LAEs is hard, as revealed by several diagnostics, including CIV detection, high [OIII]/[OII] ($>$8), and large values of $ξ^{\star}_{ion}$.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Submitted to ApJ, 20 pages, 13 figures, 2 tables
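The abstract mentions building composite spectra from many NIRSpec spectra, split into LAE/non-LAE and redshift bins. Below is a minimal sketch of one common recipe, a median stack. It assumes each spectrum is shifted to the rest frame, linearly resampled onto a shared grid, and divided by its own median. The paper's actual resampling, normalization, and weighting choices are not stated here and may differ, and the function names are illustrative only. Because every spectrum is renormalized, the sketch ignores the overall (1+z) flux-density factor.

```python
import statistics


def interp(x, xs, ys):
    """Linear interpolation of (xs, ys) at x; None outside the sampled range."""
    if x < xs[0] or x > xs[-1]:
        return None
    for k in range(1, len(xs)):
        if x <= xs[k]:
            x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    return ys[-1]


def median_stack(spectra, rest_grid):
    """Median-combine spectra on a common rest-frame wavelength grid.

    spectra: list of (observed_wavelengths, fluxes, redshift), wavelengths ascending.
    Each spectrum is de-redshifted, resampled, and divided by its own median over
    the grid, so that bright objects do not dominate the composite (an assumption).
    """
    resampled = []
    for wave_obs, flux, z in spectra:
        rest = [w / (1.0 + z) for w in wave_obs]
        values = [interp(x, rest, flux) for x in rest_grid]
        covered = [v for v in values if v is not None]
        if not covered:  # spectrum does not overlap the grid at all
            continue
        norm = statistics.median(covered)
        resampled.append([None if v is None else v / norm for v in values])
    stack = []
    for k in range(len(rest_grid)):
        column = [s[k] for s in resampled if s[k] is not None]
        stack.append(statistics.median(column) if column else None)
    return stack
```

With two spectra that share a rest-frame shape but differ in redshift and brightness, the stack recovers the normalized shape. For example, `median_stack([([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0], 1.0), ([4.0, 8.0, 12.0, 16.0], [2.0, 4.0, 6.0, 8.0], 3.0)], [1.5, 2.5, 3.5])` gives approximately `[0.6, 1.0, 1.4]`.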

arXiv:2406.10179 [pdf, other]

Multivariate Predictors of LyC Escape II: Predicting LyC Escape Fractions for High-Redshift Galaxies

Authors: Anne E. Jaskot, Anneliese C. Silveyra, Anna Plantinga, Sophia R. Flury, Matthew Hayes, John Chisholm, Timothy Heckman, Laura Pentericci, Daniel Schaerer, Maxime Trebitsch, Anne Verhamme, Cody Carr, Henry C. Ferguson, Zhiyuan Ji, Mauro Giavalisco, Alaina Henry, Rui Marques-Chaves, Göran Östlin, Alberto Saldana-Lopez, Claudia Scarlata, Gábor Worseck, Xinfeng Xu

Abstract: JWST is uncovering the properties of ever increasing numbers of galaxies at z>6, during the epoch of reionization. Connecting these observed populations to the process of reionization requires understanding how efficiently they produce Lyman continuum (LyC) photons and what fraction (fesc) of these photons escape into the intergalactic medium. By applying the Cox proportional hazards model, a survival analysis technique, to the Low-redshift Lyman Continuum Survey (LzLCS), we develop new, empirical, multivariate predictions for fesc. The models developed from the LzLCS reproduce the observed fesc for z~3 samples, which suggests that LyC emitters may share similar properties at low and high redshift. Our best-performing models for the z~3 galaxies include information about dust attenuation, ionization, and/or morphology. We then apply these models to z$\gtrsim$6 galaxies. For large photometric samples, we find a median predicted fesc=0.047-0.14. For smaller spectroscopic samples, which may include stronger emission line galaxies, we find that $\geq$33% of the galaxies have fesc >0.2, and we identify several candidate extreme leakers with fesc $\geq$0.5. The current samples show no strong trend between predicted fesc and UV magnitude, but limited spectroscopic information makes this result uncertain.
Multivariate predictions can give significantly different results from single variable predictions, and the predicted fesc for high-redshift galaxies can differ significantly depending on whether star formation rate surface density or radius is used as a measure of galaxy morphology. We provide all parameters necessary to predict fesc for additional samples of high-redshift galaxies using these models.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted for publication in ApJ. 33 pages, 9 figures, 10 tables, plus appendix
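Both LzLCS papers rely on the Cox proportional hazards model, a survival-analysis method that can use detections and upper limits together. The sketch below fits a single-covariate Cox model by maximizing the log partial likelihood with Newton's method. It assumes one way of mapping the data onto survival analysis: "time" is taken to be -log10(fesc), so that an upper limit on fesc becomes a right-censored value. The papers' actual variable transforms and multivariate models are not reproduced, and all function names are hypothetical. In this convention, a positive coefficient means that larger covariate values go with larger fesc.

```python
import math


def cox_terms(beta, times, x, events):
    """Log partial likelihood, score, and information for a one-covariate Cox model.

    The risk set for an event at t_i is {j : t_j >= t_i}; censored rows (event 0)
    enter only through the risk sets. Tied times are handled Breslow-style.
    """
    loglik = score = info = 0.0
    for i, (t_i, event) in enumerate(zip(times, events)):
        if not event:
            continue
        risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk))
        loglik += beta * x[i] - math.log(s0)
        score += x[i] - s1 / s0
        info += s2 / s0 - (s1 / s0) ** 2
    return loglik, score, info


def fit_cox(times, x, events, iterations=50):
    """Newton-Raphson with step halving on the concave log partial likelihood."""
    beta = 0.0
    for _ in range(iterations):
        loglik, score, info = cox_terms(beta, times, x, events)
        if info <= 0.0 or abs(score) < 1e-12:
            break
        step = score / info
        # Halve the step until the partial likelihood does not decrease.
        while abs(step) > 1e-12 and cox_terms(beta + step, times, x, events)[0] < loglik:
            step /= 2.0
        beta += step
    return beta


def upper_limits_to_survival(fesc, detected):
    """Assumed transform: time = -log10(fesc), so an upper limit becomes right-censored."""
    times = [-math.log10(f) for f in fesc]
    events = [1 if d else 0 for d in detected]
    return times, events
```

At the fitted coefficient the score is zero and the partial likelihood is at least its value at beta = 0. Step halving is a simple safeguard against Newton overshoot. A multivariate fit would replace the scalar score and information with a gradient vector and a Hessian matrix.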

arXiv:2406.10171 [pdf, other]

Multivariate Predictors of LyC Escape I: A Survival Analysis of the Low-redshift Lyman Continuum Survey

Authors: Anne E. Jaskot, Anneliese C. Silveyra, Anna Plantinga, Sophia R. Flury, Matthew Hayes, John Chisholm, Timothy Heckman, Laura Pentericci, Daniel Schaerer, Maxime Trebitsch, Anne Verhamme, Cody Carr, Henry C. Ferguson, Zhiyuan Ji, Mauro Giavalisco, Alaina Henry, Rui Marques-Chaves, Göran Östlin, Alberto Saldana-Lopez, Claudia Scarlata, Gábor Worseck, Xinfeng Xu

Abstract: To understand how galaxies reionized the universe, we must determine how the escape fraction of Lyman Continuum (LyC) photons (fesc) depends on galaxy properties. Using the z~0.3 Low-redshift Lyman Continuum Survey (LzLCS), we develop and analyze new multivariate predictors of fesc. These predictions use the Cox proportional hazards model, a survival analysis technique that incorporates both detections and upper limits. Our best model predicts the LzLCS fesc detections with a root-mean-square (RMS) scatter of 0.31 dex, better than single-variable correlations. According to ranking techniques, the most important predictors of fesc are the equivalent width (EW) of Lyman-series absorption lines and the UV dust attenuation, which track line-of-sight absorption due to HI and dust. The HI absorption EW is uniquely crucial for predicting fesc for the strongest LyC emitters, which show properties similar to weaker LyC emitters and whose high fesc may therefore result from favorable orientation. In the absence of HI information, star formation rate surface density ($Σ_{\rm SFR}$) and [O III]/[O II] ratio are the most predictive variables and highlight the connection between feedback and fesc. We generate a model suitable for z>6, which uses only the UV slope, $Σ_{\rm SFR}$, and [O III]/[O II]. We find that $Σ_{\rm SFR}$ is more important in predicting fesc at higher stellar masses, whereas [O III]/[O II] plays a greater role at lower masses.
We also analyze predictions for other parameters, such as the ionizing-to-non-ionizing flux ratio and Ly$α$ escape fraction. These multivariate models represent a promising tool for predicting fesc at high redshift.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted for publication in ApJ. 34 pages + appendix, 12 figures
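The 0.31 dex figure quoted above is a root-mean-square scatter in logarithmic units. As a reminder of what that means, here is a short sketch that computes the RMS of log10(predicted/observed). The function name is illustrative. The sketch does not claim to reproduce how the paper selects which detections enter the statistic.

```python
import math


def rms_scatter_dex(predicted, observed):
    """Root-mean-square of log10(predicted / observed), in dex."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [math.log10(p / o) for p, o in zip(predicted, observed)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# A scatter of 0.31 dex corresponds to a typical multiplicative error of 10**0.31, about 2.
typical_factor = 10 ** 0.31
```

An RMS of 0.31 dex therefore means predictions that are typically off by about a factor of two in fesc, in either direction.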

arXiv:2406.09178 [pdf, other]

AutomaChef: A Physics-informed Demonstration-guided Learning Framework for Granular Material Manipulation

Authors: Minglun Wei, Xintong Yang, Yu-Kun Lai, Seyed Amir Tafrishi, Ze Ji

Abstract: Due to the complex physical properties of granular materials, research on robot learning for manipulating such materials predominantly either disregards the consideration of their physical characteristics or uses surrogate models to approximate their physical properties. Learning to manipulate granular materials based on physical information obtained through precise modelling remains an unsolved problem. In this paper, we propose to address this challenge by constructing a differentiable physics simulator for granular materials based on the Taichi programming language and developing a learning framework accelerated by imperfect demonstrations that are generated via gradient-based optimisation on non-granular materials through our simulator. Experimental results show that our method trains three policies that, when chained, are capable of executing the task of transporting granular materials in both simulated and real-world scenarios, which existing popular deep reinforcement learning models fail to accomplish.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2406.08455 [pdf, other]

AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind

Authors: Wei Ding, Fanhong Li, Ziteng Ji, Zhengrong Xue, Jia Liu

Abstract: We propose AToM-Bot, a novel task generation and execution framework for proactive robot-human interaction, which leverages the human mental and physical state inference capabilities of the Vision Language Model (VLM) prompted by the Affective Theory of Mind (AToM). Without requiring explicit commands by humans, AToM-Bot proactively generates and follows feasible tasks to improve general human well-being. When around humans, AToM-Bot first detects current human needs based on inferred human states and observations of the surrounding environment. It then generates tasks to fulfill these needs, taking into account its embodied constraints. We designed 16 daily life scenarios spanning 4 common scenes and presented the same visual stimuli to 59 human subjects and to our robot. We used the similarity between human open-ended answers and robot output, together with human satisfaction scores, as metrics of robot performance. AToM-Bot received high human evaluations in need detection (6.42/7, 91.7%), embodied solution (6.15/7, 87.8%) and task execution (6.17/7, 88.1%). We show that AToM-Bot excels in generating and executing feasible plans to fulfill unspoken human needs. Videos and code are available at https://affective-tom-bot.github.io.

Submitted 15 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri, et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html
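The abstract defines $I_{AA}$ and $Δ_{AA}$ as the ratio and the difference of the Au$+$Au and $p$$+$$p$ yields in each $Δφ$ bin. The sketch below computes both, bin by bin. It assumes the inputs are already background-subtracted, per-trigger yields on identical $Δφ$ binning, and that the Au$+$Au and $p$$+$$p$ uncertainties are uncorrelated. PHENIX's background subtraction and systematic treatment are not reproduced, and the function name is hypothetical.

```python
import math


def iaa_and_delta_aa(y_aa, y_pp, err_aa, err_pp):
    """Per-bin I_AA = Y_AA / Y_pp and Delta_AA = Y_AA - Y_pp with propagated errors.

    Returns a list of (I_AA, sigma_I_AA, Delta_AA, sigma_Delta_AA) tuples.
    Errors assume Y_AA and Y_pp are independent (an assumption of this sketch).
    """
    results = []
    for a, p, ea, ep in zip(y_aa, y_pp, err_aa, err_pp):
        i_aa = a / p
        # d(a/p)/da = 1/p and d(a/p)/dp = -a/p**2; this form avoids dividing by a.
        i_err = math.hypot(ea / p, a * ep / p ** 2)
        d_aa = a - p
        d_err = math.hypot(ea, ep)
        results.append((i_aa, i_err, d_aa, d_err))
    return results
```

$I_{AA} < 1$ on the away side at high $p_T$ signals suppression, while $Δ_{AA} > 0$ at low $p_T$ and wide angle signals the enhancement described above. The two observables are complementary because a ratio hides absolute yield differences when $Y_{pp}$ is small.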

arXiv:2406.05992 [pdf, other]

MHS-VM: Multi-Head Scanning in Parallel Subspaces for Vision Mamba

Authors: Zhongping Ji

Abstract: Recently, State Space Models (SSMs), with Mamba as a prime example, have shown great promise for long-range dependency modeling with linear complexity. Vision Mamba and subsequent architectures followed and perform well on visual tasks. The crucial step in applying Mamba to visual tasks is to arrange 2D visual features in a sequential manner. To effectively organize and construct visual features within the 2D image space through 1D selective scan, we propose a novel Multi-Head Scan (MHS) module. The embeddings extracted from the preceding layer are projected into multiple lower-dimensional subspaces. Subsequently, within each subspace, the selective scan is performed along distinct scan routes. The resulting sub-embeddings, obtained from the multi-head scan process, are then integrated and ultimately projected back into the high-dimensional space. Moreover, we incorporate a Scan Route Attention (SRA) mechanism to enhance the module's capability to discern complex structures. To validate the efficacy of our module, we replace only the 2D-Selective-Scan (SS2D) block in VM-UNet with our proposed module, and we train our models from scratch without using any pre-trained weights. The results indicate a significant improvement in performance while reducing the parameters of the original VM-UNet. The code for this study is publicly available at https://github.com/PixDeep/MHS-VM.

Submitted 9 June, 2024; originally announced June 2024.

Comments: 11 pages, 5 figures
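The Multi-Head Scan idea has four steps: split the features into subspaces, scan each subspace along its own 1D route through the 2D grid, and merge the results back in spatial order. The sketch below illustrates these steps with heavy simplifications. Plain channel splitting stands in for the learned projections, a fixed linear recurrence stands in for Mamba's selective scan, and the four routes (row-major, column-major, and their reverses) are an assumed choice. The Scan Route Attention and the final projection are omitted. All names are hypothetical; see the linked repository for the real module.

```python
def scan_routes(h, w):
    """Four 1D orderings of an h x w grid, as lists of row-major token indices."""
    row_major = [r * w + c for r in range(h) for c in range(w)]
    col_major = [r * w + c for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def toy_scan(sequence, decay=0.5):
    """Stand-in for a selective scan: h_t = decay * h_{t-1} + x_t, per channel."""
    state = [0.0] * len(sequence[0])
    outputs = []
    for x_t in sequence:
        state = [decay * s + v for s, v in zip(state, x_t)]
        outputs.append(state)
    return outputs


def multi_head_scan(tokens, h, w, num_heads=4):
    """Split channels into heads, scan each head along its own route, then re-merge.

    tokens: h*w feature vectors in row-major order; the channel count must be
    divisible by num_heads. Output vectors concatenate the heads in order.
    """
    channels = len(tokens[0])
    if channels % num_heads:
        raise ValueError("channel count must be divisible by num_heads")
    d = channels // num_heads
    routes = scan_routes(h, w)
    merged = [[] for _ in tokens]
    for k in range(num_heads):
        route = routes[k % len(routes)]
        sub = [tokens[i][k * d:(k + 1) * d] for i in route]  # head k, in route order
        scanned = toy_scan(sub)
        for pos, i in enumerate(route):  # undo the route: back to spatial order
            merged[i].extend(scanned[pos])
    return merged
```

Because each head traverses the grid differently, the same token accumulates context from different neighbours in each head. For example, the top-left token is the first step of the row-major route but the last step of the reversed row-major route.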

arXiv:2406.05498 [pdf, other]

SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner

Authors: Xunguang Wang, Daoyuan Wu, Zhenlan Ji, Zongjie Li, Pingchuan Ma, Shuai Wang, Yingjiu Li, Yang Liu, Ning Liu, Juergen Rahmel

Abstract: Jailbreaking is an emerging adversarial attack that bypasses the safety alignment deployed in off-the-shelf large language models (LLMs) and has evolved into multiple categories: human-based, optimization-based, generation-based, and the recent indirect and multilingual jailbreaks. However, delivering a practical jailbreak defense is challenging because it needs to not only handle all the above jailbreak attacks but also incur negligible delays to user prompts, as well as be compatible with both open-source and closed-source LLMs. Inspired by how the traditional security concept of shadow stacks defends against memory overflow attacks, this paper introduces a generic LLM jailbreak defense framework called SelfDefend, which establishes a shadow LLM as a defense instance to concurrently protect the target LLM instance in the normal stack and collaborate with it for checkpoint-based access control. The effectiveness of SelfDefend builds upon our observation that existing LLMs (both target and defense LLMs) have the capability to identify harmful prompts or intentions in user queries, which we empirically validate using the commonly used GPT-3.5/4 models across all major jailbreak attacks. To further improve the defense's robustness and minimize costs, we employ a data distillation approach to tune dedicated open-source defense models. These models outperform six state-of-the-art defenses and match the performance of GPT-4-based SelfDefend, with significantly lower extra delays.
We also empirically show that the tuned models are robust to adaptive jailbreaks and prompt injections.

Submitted 5 September, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

Comments: This paper completes its earlier vision paper, available at arXiv:2402.15727. Updated to the latest analysis and results
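The SelfDefend abstract describes a "shadow" defense instance that runs concurrently with the target LLM, with access control applied at a checkpoint. The sketch below shows only that control flow. `target_llm` and `shadow_check` are hypothetical callables standing in for the target model and the defense model, and the refusal text is a placeholder. The paper's prompts, models, and checkpoint logic are not reproduced.

```python
from concurrent.futures import ThreadPoolExecutor


def self_defend(prompt, target_llm, shadow_check, refusal="Request refused."):
    """Run the target and the shadow check concurrently; release the answer only if allowed.

    target_llm(prompt) -> str and shadow_check(prompt) -> bool (True means "harmful")
    are hypothetical stand-ins for the target and defense instances.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        answer = pool.submit(target_llm, prompt)
        flagged = pool.submit(shadow_check, prompt)
        if flagged.result():  # checkpoint: the defense instance vetoes the request
            answer.cancel()  # only prevents a call that has not started yet
            return refusal
        return answer.result()
```

Running both calls in parallel means a benign prompt pays only the extra latency of the slower of the two calls, not their sum. This reflects the "negligible delays" goal in the abstract. Note that leaving the `with` block waits for the target call to finish even when its answer is discarded.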

arXiv:2406.01059 [pdf, other]

VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model

Authors: Jinze Yang, Haoran Wang, Zining Zhu, Chenglong Liu, Meng Wymond Wu, Zeke Xie, Zhong Ji, Jungong Han, Mingming Sun

Abstract: In this paper, we focus on resolving the problem of image outpainting, which aims to extrapolate the surrounding parts given the center contents of an image. Although recent works have achieved promising performance, the lack of versatility and customization hinders their practical applications in broader scenarios. Therefore, this work presents a novel image outpainting framework that is capable of customizing the results according to users' requirements. First of all, we take advantage of a Multimodal Large Language Model (MLLM) that automatically extracts and organizes the corresponding textual descriptions of the masked and unmasked part of a given image. Accordingly, the obtained text prompts are introduced to endow our model with the capacity to customize the outpainting results. In addition, a special Cross-Attention module, namely Center-Total-Surrounding (CTS), is elaborately designed to further enhance the interaction between specific space regions of the image and corresponding parts of the text prompts. Note that unlike most existing methods, our approach is very resource-efficient since it is only lightly fine-tuned from the off-the-shelf stable diffusion (SD) model rather than being trained from scratch. Finally, the experimental results on three commonly used datasets, i.e. Scenery, Building, and WikiArt, demonstrate that our model significantly surpasses the SoTA methods. Moreover, versatile outpainting results are shown to demonstrate its customization ability.

Submitted 3 August, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Our source code is available at: https://github.com/ucasyjz/VIP, 15 pages

arXiv:2406.00072 [pdf, other]

Methodology for Analyzing Proton Multiplicity Fluctuations with Azimuthal Partitions in Heavy-Ion Collisions

Authors: Dylan Neff, Zhongling Ji, Roli Esha, Gang Wang, Huan Huang

Abstract: A primary objective in high-energy heavy-ion collisions is to investigate the phase transition between confined and deconfined color matter. Complementary to the cumulants of conserved charges integrated over the full azimuth, we introduce a novel experimental approach to explore particle fluctuations in azimuthal partitions, which are potentially sensitive to the first-order phase transition in heavy-ion collisions. We evaluate proton multiplicity ($N_w$) fluctuations in azimuthal partitions of width $w$ to quantitatively estimate the clustering tendency among these protons. The $Δσ^2$ observable is defined as the normalized difference between the variance of the $N_w$ distribution and the binomial baseline. We demonstrate the feasibility and characteristics of this observable through simulations using the AMPT and MUSIC+FIST models. We also use a Gaussian correlation model to illustrate that the dependence of $Δσ^2$ on $w$ can be parameterized to accurately extract the strength and the range of the input interaction among protons.

Submitted 30 May, 2024; originally announced June 2024.

Comments: 10 pages, 15 figures
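The abstract defines $Δσ^2$ as a normalized difference between the variance of $N_w$ and a binomial baseline. If the $N$ protons in an event are placed independently and uniformly in azimuth, the count in a partition of width $w$ is binomial with $p = w/2π$, so its variance is $Np(1-p)$. The sketch below compares the measured variance to that value. It assumes normalization by the binomial variance itself, a single partition starting at $φ = 0$, and events of equal total multiplicity. The paper's exact normalization and its averaging over partition positions and multiplicities may differ, and the function name is hypothetical.

```python
import math
import statistics


def delta_sigma2(events, width):
    """Relative excess of the N_w variance over the binomial baseline N p (1 - p).

    events: per-event lists of proton azimuthal angles in [0, 2*pi), all with the
    same total multiplicity N (an assumption, so that one baseline applies).
    width: partition width w in radians; the partition is taken to be [0, w).
    """
    n_total = len(events[0])
    p = width / (2.0 * math.pi)
    counts = [sum(1 for phi in ev if phi < width) for ev in events]
    observed = statistics.pvariance(counts)
    binomial = n_total * p * (1.0 - p)
    return (observed - binomial) / binomial
```

Uncorrelated protons give values near zero. Protons that cluster in azimuth inflate the variance and give positive values. In the extreme case where all $N$ protons share one angle, the result approaches $N - 1$.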

arXiv:2405.20315 [pdf, other]

ANAH: Analytical Annotation of Hallucinations in Large Language Models

Authors: Ziwei Ji, Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

Abstract: Reducing the `$\textit{hallucination}$' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present $\textbf{ANAH}$, a bilingual dataset that offers $\textbf{AN}$alytical $\textbf{A}$nnotation of $\textbf{H}$allucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively confirm that the hallucinations of LLMs progressively accumulate in the answer and use ANAH to train and evaluate hallucination annotators. We conduct extensive experiments on studying generative and discriminative annotators and show that, although current open-source LLMs have difficulties in fine-grained hallucination annotation, the generative annotator trained with ANAH can surpass all open-source LLMs and GPT-3.5, obtain performance competitive with GPT-4, and exhibit better generalization ability on unseen questions.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by ACL 2024

arXiv:2405.19732 [pdf, other]

Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning

Authors: Zixian Guo, Ming Liu, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo

Abstract: Learning a skill generally relies on both practical experience by a doer and insightful high-level guidance by an instructor. Will this strategy also work well for solving complex non-convex optimization problems? Here, a common gradient-based optimizer acts like a disciplined doer, making a locally optimal update at each step. Recent methods utilize large language models (LLMs) to optimize solutions for concrete problems by inferring from natural language instructions, akin to a high-level instructor. In this paper, we show that these two optimizers are complementary to each other, suggesting a collaborative optimization approach. The gradient-based optimizer and LLM-based optimizer are combined in an interleaved manner. We instruct LLMs using task descriptions and timely optimization trajectories recorded during gradient-based optimization. Inferred results from LLMs are used as restarting points for the next stage of gradient optimization. By leveraging both the locally rigorous gradient-based optimizer and the high-level deductive LLM-based optimizer, our combined optimization method consistently yields improvements over competitive baseline prompt tuning methods. Our results demonstrate the synergistic effect of conventional gradient-based optimization and the inference ability of LLMs. The code is released at https://github.com/guozix/LLM-catalyst.

Submitted 6 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.
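The abstract describes an interleaved loop. Gradient stages record an optimization trajectory, and an LLM reads that trajectory (plus a task description) and proposes the restarting point for the next stage. The sketch below shows only this control flow on a toy problem. `best_point_proposer` is a hypothetical stand-in that simply restarts from the lowest-loss point seen, so it does not model what an LLM would propose. The actual prompt-tuning setup is in the linked repository.

```python
def gradient_stage(loss_and_grad, x, lr, steps, trajectory):
    """Plain gradient descent; every visited point and its loss is logged."""
    for _ in range(steps):
        loss, grad = loss_and_grad(x)
        trajectory.append((list(x), loss))
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


def interleaved_optimize(loss_and_grad, x0, propose_restart, stages=3, steps=20, lr=0.1):
    """Alternate gradient stages with restarts proposed from the recorded trajectory."""
    trajectory = []
    x = list(x0)
    for stage in range(stages):
        x = gradient_stage(loss_and_grad, x, lr, steps, trajectory)
        if stage < stages - 1:
            # In the paper an LLM plays this role, prompted with the task
            # description and the trajectory; here it is any callable.
            x = propose_restart(trajectory)
    return x


def best_point_proposer(trajectory):
    """Hypothetical stand-in for the LLM: restart from the lowest-loss point seen."""
    point, _ = min(trajectory, key=lambda item: item[1])
    return list(point)


def quadratic(x):
    """Toy loss sum((x_i - 3)^2) and its gradient."""
    loss = sum((xi - 3.0) ** 2 for xi in x)
    grad = [2.0 * (xi - 3.0) for xi in x]
    return loss, grad


x_final = interleaved_optimize(quadratic, [0.0, 0.0], best_point_proposer)
```

With this stub the restarts add nothing over simply continuing the descent. Any benefit would have to come from a proposer that can jump out of poor local regions, which is the role the abstract assigns to the LLM.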

arXiv:2405.18485 [pdf, other]

A shining cosmic dawn: spectroscopic confirmation of two luminous galaxies at $z\sim14$

Authors: Stefano Carniani, Kevin Hainline, Francesco D'Eugenio, Daniel J. Eisenstein, Peter Jakobsen, Joris Witstok, Benjamin D. Johnson, Jacopo Chevallard, Roberto Maiolino, Jakob M. Helton, Chris Willott, Brant Robertson, Stacey Alberts, Santiago Arribas, William M. Baker, Rachana Bhatawdekar, Kristan Boyett, Andrew J. Bunker, Alex J. Cameron, Phillip A. Cargile, Stéphane Charlot, Mirko Curti, Emma Curtis-Lake, Eiichi Egami, Giovanna Giardino, et al. (18 additional authors not shown)

Abstract: The discovery by JWST of an abundance of luminous galaxies in the very early Universe suggests that galaxies developed rapidly, in apparent tension with many standard models. However, most of these galaxies lack spectroscopic confirmation, so their distances and properties are uncertain. We present JADES JWST/NIRSpec spectroscopic confirmation of two luminous galaxies at redshifts of $z=14.32^{+0.08}_{-0.20}$ and $z=13.90\pm0.17$. The spectra reveal ultraviolet continua with prominent Lyman-$α$ breaks but no detected emission lines. This discovery proves that luminous galaxies were already in place 300 million years after the Big Bang and are more common than what was expected before JWST. The most distant of the two galaxies is unexpectedly luminous (M$_{\rm uv}=-20.81\pm0.16$) and is spatially resolved with a radius of 260 parsecs. Considering also the steep ultraviolet slope of the second galaxy ($β=-2.71\pm0.19$), we conclude that both are dominated by stellar continuum emission, showing that the excess of luminous galaxies in the early Universe cannot be entirely explained by accretion onto black holes. Galaxy formation models will need to address the existence of such large and luminous galaxies so early in cosmic history.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 26 pages, 15 figures
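The "300 million years after the Big Bang" statement follows from the redshift through the age of the Universe at $z=14.32$. The sketch below integrates $t(a)=\int_0^a da'/(a' H(a'))$ numerically. It assumes a flat ΛCDM cosmology with $H_0=70$ km/s/Mpc, $Ω_m=0.3$, $Ω_Λ=0.7$ and neglects radiation; the paper's adopted cosmology may differ slightly. The substitution $a=u^2$ removes the $1/a$ singularity at the lower limit so that Simpson's rule can be applied directly.

```python
import math


def age_at_redshift_gyr(z, h0=70.0, omega_m=0.3, omega_l=0.7, n=2000):
    """Cosmic age at redshift z in Gyr for flat LambdaCDM without radiation (assumed).

    t = (1/H0) * integral_0^a da / sqrt(omega_m / a + omega_l * a**2);
    with a = u**2 the integrand becomes 2 u**2 / sqrt(omega_m + omega_l * u**6).
    """
    hubble_time_gyr = 977.792 / h0  # 1/H0 in Gyr for H0 in km/s/Mpc
    u_max = math.sqrt(1.0 / (1.0 + z))
    du = u_max / n  # n must be even for Simpson's rule

    def integrand(u):
        return 2.0 * u * u / math.sqrt(omega_m + omega_l * u ** 6)

    total = integrand(0.0) + integrand(u_max)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * du)
    return hubble_time_gyr * total * du / 3.0
```

Under these assumptions, `age_at_redshift_gyr(14.32)` comes out at roughly 0.28 Gyr (about 280 Myr), consistent with the quoted figure. `age_at_redshift_gyr(0.0)` gives about 13.5 Gyr for the present-day age.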

arXiv:2405.18462 [pdf, other]

JWST/MIRI photometric detection at $7.7\ μ\mathrm{m}$ in a galaxy at $z > 14$

Authors: Jakob M. Helton, George H. Rieke, Stacey Alberts, Zihao Wu, Daniel J. Eisenstein, Kevin N. Hainline, Stefano Carniani, Zhiyuan Ji, William M. Baker, Rachana Bhatawdekar, Andrew J. Bunker, Phillip A. Cargile, Stéphane Charlot, Jacopo Chevallard, Francesco D'Eugenio, Eiichi Egami, Benjamin D. Johnson, Gareth C. Jones, Jianwei Lyu, Roberto Maiolino, Pablo G. Pérez-González, Marcia J. Rieke, Brant Robertson, Aayush Saxena, Jan Scholtz, et al. (9 additional authors not shown)

Abstract: The James Webb Space Telescope (JWST) has spectroscopically confirmed numerous galaxies at $z > 10$. While weak rest-ultraviolet emission lines have only been seen in a handful of sources, the stronger rest-optical emission lines are highly diagnostic and accessible at mid-infrared wavelengths with the Mid-Infrared Instrument (MIRI) of JWST. We report the photometric detection of the most distant spectroscopically confirmed galaxy JADES-GS-z14-0 at $z = 14.32^{+0.08}_{-0.20}$ with MIRI at $7.7\ μ\mathrm{m}$. The most plausible solution for the stellar population properties is that this galaxy contains half a billion solar masses in stars with a strong burst of star formation in the most recent few million years. For this model, at least one-third of the flux at $7.7\ μ\mathrm{m}$ comes from the rest-optical emission lines $\mathrm{H}β$ and/or $\mathrm{[OIII]}λλ4959,5007$. The inferred properties of JADES-GS-z14-0 suggest rapid mass assembly and metal enrichment during the earliest phases of galaxy formation.

Submitted 21 August, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Submitted; main text has 9 pages, 3 figures and 1 table; extended text has 15 pages, 5 figures, and 1 table
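The claim that rest-optical lines contribute to the $7.7\ μ\mathrm{m}$ flux follows from redshifting: $λ_{\rm obs} = λ_{\rm rest}(1+z)$. The sketch below evaluates this for the lines named in the abstract at $z=14.32$. It uses the nominal rest wavelengths implied by the line labels (4861, 4959, and 5007 Å), ignoring the roughly 1 Å air/vacuum difference. It does not model the MIRI filter bandpass, so whether each line falls inside the band is left to the filter curve.

```python
def observed_wavelength_um(rest_angstrom, z):
    """Observed wavelength in micron for a rest-frame wavelength in Angstrom."""
    return rest_angstrom * (1.0 + z) / 1.0e4


def rest_wavelength_angstrom(observed_um, z):
    """Rest-frame wavelength in Angstrom probed by an observed wavelength in micron."""
    return observed_um * 1.0e4 / (1.0 + z)


Z = 14.32
LINES = {"Hbeta": 4861.0, "[OIII]4959": 4959.0, "[OIII]5007": 5007.0}
observed = {name: observed_wavelength_um(lam, Z) for name, lam in LINES.items()}
# roughly 7.45, 7.60 and 7.67 micron respectively
probed = rest_wavelength_angstrom(7.7, Z)  # about 5026 Angstrom in the rest frame
```

All three lines land within about 0.25 μm of 7.7 μm. Equivalently, 7.7 μm samples rest-frame wavelengths near 5026 Å, just redward of [OIII]λ5007. This is why a broadband 7.7 μm flux can be strongly boosted by these lines.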

arXiv:2405.15972 [pdf, other]

SMILES Initial Data Release: Unveiling the Obscured Universe with MIRI Multi-band Imaging

Authors: Stacey Alberts, Jianwei Lyu, Irene Shivaei, George H. Rieke, Pablo G. Perez-Gonzalez, Nina Bonventura, Yongda Zhu, Jakob M. Helton, Zhiyuan Ji, Jane Morrison, Brant E. Robertson, Meredith A. Stone, Yang Sun, Christina C. Williams, Christopher N. A. Willmer

Abstract: The James Webb Space Telescope (JWST) is revolutionizing our view of the Universe through unprecedented sensitivity and resolution in the infrared, with some of the largest gains realized at its longest wavelengths. We present the Systematic Mid-infrared Instrument (MIRI) Legacy Extragalactic Survey (SMILES), an eight-band MIRI survey with Near-Infrared Spectrograph (NIRSpec) spectroscopic follow-up in the GOODS-S/HUDF region. SMILES takes full advantage of MIRI's continuous coverage from $5.6-25.5\,μ$m over a $\sim34$ arcmin$^2$ area to greatly expand our understanding of the obscured Universe up to cosmic noon and beyond. This work, together with a companion paper by Rieke et al., covers the SMILES science drivers and technical design, early results with SMILES, data reduction, photometric catalog creation, and the first data release. As part of the discussion on early results, we additionally present a high-level science demonstration on how MIRI's wavelength coverage and resolution will advance our understanding of cosmic dust using the full range of polycyclic aromatic hydrocarbon (PAH) emission features from $3.3-18\,μ$m. Using custom background subtraction, we produce robust reductions of the MIRI imaging that maximize the depths reached with our modest exposure times ($\sim0.6 - 2.2$ ks per filter). Included in our initial data release are (1) eight MIRI imaging mosaics reaching depths of $0.2-18\,μ$Jy ($5σ$) and (2) a $5-25.5\,μ$m photometric catalog with over 3,000 sources.
Building upon the rich legacy of extensive photometric and spectroscopic coverage of GOODS-S/HUDF from the X-ray to the radio, SMILES greatly expands our investigative power in understanding the obscured Universe.

Submitted 24 May, 2024; originally announced May 2024.

Comments: 23 pages, 19 figures, submitted to ApJ. Comments welcome! Data release will go live at https://archive.stsci.edu/hlsp/smiles in the next few weeks

arXiv:2405.13166 [pdf, other]

FairLENS: Assessing Fairness in Law Enforcement Speech Recognition

Authors: Yicheng Wang, Mark Cusick, Mohamed Laila, Kate Puech, Zhengping Ji, Xia Hu, Michael Wilson, Noah Spitzer-Williams, Bryan Wheeler, Yasser Ibrahim

Abstract: Automatic speech recognition (ASR) techniques have become powerful tools for improving efficiency in law enforcement scenarios. To ensure fairness for demographic groups in different acoustic environments, ASR engines must be tested across a variety of speakers in realistic settings. However, confidently characterizing the fairness discrepancies between models remains a challenge. Meanwhile, most public ASR datasets are insufficient for a satisfactory fairness evaluation. To address these limitations, we built FairLENS, a systematic fairness evaluation framework. We propose a novel and adaptable evaluation method to examine the fairness disparity between different models. We also collected a fairness evaluation dataset covering multiple scenarios and demographic dimensions. Using this framework, we conducted fairness assessments on 1 open-source and 11 commercially available state-of-the-art ASR models. Our results reveal that certain models exhibit more bias than others, providing a fairness guideline that helps users make informed choices when selecting ASR models for a given real-world scenario. We further explored model biases toward specific demographic groups and observed that shifts in the acoustic domain can lead to the emergence of new biases.

Submitted 28 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.10908

UVCANDELS: The role of dust on the stellar mass-size relation of disk galaxies at 0.5 $\leq z \leq$ 3.0

Authors: Kalina V. Nedkova, Marc Rafelski, Harry I. Teplitz, Vihang Mehta, Laura DeGroot, Swara Ravindranath, Anahita Alavi, Alexander Beckett, Norman A. Grogin, Boris Häußler, Anton M. Koekemoer, Grecco A. Oyarzún, Laura Prichard, Mitchell Revalski, Gregory F. Snyder, Ben Sunnquist, Xin Wang, Rogier A. Windhorst, Nima Chartab, Christopher J. Conselice, Yicheng Guo, Nimish Hathi, Matthew J. Hayes, Zhiyuan Ji, Keunho J. Kim, et al. (8 additional authors not shown)

Abstract: We use the Ultraviolet Imaging of the Cosmic Assembly Near-infrared Deep Extragalactic Legacy Survey fields (UVCANDELS) to measure half-light radii in the rest-frame far-UV for $\sim$16,000 disk-like galaxies over $0.5\leq z \leq 3$. We compare these results to rest-frame optical sizes that we measure in a self-consistent way and find that the stellar mass-size relation of disk galaxies is steeper in the rest-frame UV than in the optical across our entire redshift range. We show that this is mainly driven by massive galaxies ($\gtrsim10^{10}$M$_\odot$), which we find to also be among the most dusty. Our results are consistent with the literature, where such trends have commonly been interpreted as evidence of inside-out growth, in which galaxies form their central structures first. However, they could also indicate that the centers of massive galaxies are more heavily attenuated than their outskirts. We distinguish between these scenarios by modeling and selecting galaxies at $z=2$ from the VELA simulation suite in a way that is consistent with UVCANDELS. We show that the effects of dust alone can account for the size differences we measure at $z=2$. This indicates that size differences across wavelengths, and the different slopes of the stellar mass-size relation at different wavelengths, do not constitute evidence for inside-out growth.

Submitted 28 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Accepted for publication in ApJ. 22 pages, 12 figures, and 4 tables

arXiv:2405.08555

Dual-Branch Network for Portrait Image Quality Assessment

Authors: Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai

Abstract: Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may suffer from degradation caused by unfavorable environmental conditions, subpar photography techniques, and inferior capturing devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA) that effectively models how the salient person and the background of a portrait image each influence its visual quality. Specifically, we use two backbone networks (i.e., Swin Transformer-B) to extract quality-aware features from the entire portrait image and from the facial image cropped from it. To enhance the quality-aware feature representations of the backbones, we pre-train them on the large-scale video quality assessment dataset LSVQ and the large-scale facial image quality assessment dataset GFIQA. Additionally, we leverage LIQE, an image scene classification and quality assessment model, to capture quality-aware and scene-specific features as auxiliary features. Finally, we concatenate these features and regress them into quality scores via a multi-layer perceptron (MLP). We train the model with the fidelity loss in a learning-to-rank manner to mitigate inconsistencies in the quality scores of the portrait image quality assessment dataset PIQ.
Experimental results demonstrate that the proposed model achieves superior performance on the PIQ dataset, validating its effectiveness. The code is available at https://github.com/sunwei925/DN-PIQA.git.
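The fidelity loss used for learning-to-rank compares a target probability that one image outranks another with the model's predicted probability. The sketch below is a generic illustration, not the authors' implementation from the linked repository. It assumes a Thurstone-style Gaussian model for pairwise preference probabilities, and the helper names (`pair_probability`, `fidelity_loss`, `batch_fidelity_loss`) and the unit variance assumed for predicted scores are hypothetical choices made for illustration.

```python
import math


def pair_probability(mean_a, mean_b, std_a=1.0, std_b=1.0):
    """Probability that image a is rated higher than image b, assuming
    Gaussian-distributed quality scores (a Thurstone-style model)."""
    z = (mean_a - mean_b) / math.sqrt(std_a ** 2 + std_b ** 2)
    # Standard normal CDF written in terms of the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fidelity_loss(p_true, p_pred):
    """Fidelity loss between target and predicted pairwise preferences:
    zero when the two probabilities agree, larger as they diverge."""
    return 1.0 - math.sqrt(p_true * p_pred) - math.sqrt((1.0 - p_true) * (1.0 - p_pred))


def batch_fidelity_loss(pred_scores, mos, mos_std):
    """Average fidelity loss over all pairs (i, j), i < j, in a batch.

    pred_scores: model-predicted quality scores (hypothetical inputs).
    mos, mos_std: ground-truth mean opinion scores and their standard
    deviations, used to build the target preference probabilities.
    """
    losses = []
    n = len(pred_scores)
    for i in range(n):
        for j in range(i + 1, n):
            p_true = pair_probability(mos[i], mos[j], mos_std[i], mos_std[j])
            # Assumption: predicted scores are treated as having unit variance.
            p_pred = pair_probability(pred_scores[i], pred_scores[j])
            losses.append(fidelity_loss(p_true, p_pred))
    # A batch with fewer than two images has no pairs to rank.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Correctly ordered predictions should give a much smaller loss
    # than reversed ones for the same ground-truth scores.
    print(batch_fidelity_loss([3.0, 1.0], [4.0, 2.0], [0.5, 0.5]))
    print(batch_fidelity_loss([1.0, 3.0], [4.0, 2.0], [0.5, 0.5]))
```

Because the loss depends only on score differences within each pair, it rewards getting the relative ordering right rather than matching absolute score values, which is the motivation for training in a learning-to-rank manner.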

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.07551

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

Abstract: Tool-use large language models (LLMs) that integrate with external Python interpreters have significantly enhanced the mathematical reasoning capabilities of open-source LLMs, while tool-free methods have taken a different track: augmenting math reasoning data. However, an effective way to integrate these two research paths and combine their advantages remains to be explored. In this work, we first generate new math questions via multi-perspective data augmentation and then synthesize code-nested solutions for them. Open LLMs (i.e., Llama-2) are fine-tuned on the augmented dataset to obtain the resulting models, MuMath-Code ($μ$-Math-Code). During inference, MuMath-Code generates code and interacts with an external Python interpreter to obtain execution results. MuMath-Code therefore leverages the advantages of both external tools and data augmentation. To make full use of our augmented data, we propose a two-stage training strategy: in Stage 1, we fine-tune Llama-2 on pure chain-of-thought (CoT) data to obtain an intermediate model, which is then trained on the code-nested data in Stage 2 to produce MuMath-Code. MuMath-Code-7B achieves 83.8% on GSM8K and 52.4% on MATH, while MuMath-Code-70B achieves new state-of-the-art performance among open methods, reaching 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.

Submitted 13 May, 2024; originally announced May 2024.

Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

arXiv:2405.05806

MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation

Authors: Yuxiang Wei, Zhilong Ji, Jinfeng Bai, Hongzhi Zhang, Lei Zhang, Wangmeng Zuo

Abstract: Text-to-image (T2I) diffusion models have shown significant success in personalized text-to-image generation, which aims to generate novel images with human identities indicated by reference images. Although several tuning-free methods achieve promising identity fidelity, they usually suffer from overfitting: the learned identity tends to become entangled with irrelevant information, resulting in unsatisfactory text controllability, especially for faces. In this work, we present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity and flexible editability. Specifically, MasterWeaver adopts an encoder to extract identity features and steers image generation through additionally introduced cross-attention. To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of MasterWeaver with those of the original T2I model. Additionally, a face-augmented dataset is constructed to facilitate disentangled identity learning, further improving editability. Extensive experiments demonstrate that MasterWeaver not only generates personalized images with faithful identity but also offers superior text controllability. Our code can be found at https://github.com/csyxwei/MasterWeaver.

Submitted 28 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: ECCV 2024. Our code can be found at https://github.com/csyxwei/MasterWeaver

arXiv:2405.05772

JADES -- The small blue bump in GN-z11: insights into the nuclear region of a galaxy at z=10.6

Authors: Xihan Ji, Roberto Maiolino, Gary Ferland, Francesco D'Eugenio, Rachana Bhatawdekar, Stéphane Charlot, Jacopo Chevallard, Mirko Curti, Emma Curtis-Lake, Kevin Hainline, Zhiyuan Ji, Brant Robertson, Bruno Rodríguez Del Pino, Jan Scholtz, Sandro Tacchella, Christina C. Williams, Joris Witstok

Abstract: We report the detection of a continuum excess in the rest-frame UV between 3000 Å and 3550 Å in the JWST/NIRSpec spectrum of GN-z11, a galaxy hosting an active galactic nucleus (AGN) at z = 10.603. The shape of the continuum excess resembles a Balmer continuum but has a break around 3546 Å in the rest frame, 100 Å blueward of the Balmer limit at 3646 Å. A Balmer continuum model alone cannot fit the spectrum, implying a different origin for the continuum excess. The absence of a Balmer jump indicates an electron temperature of $\sim 3\times 10^4$ K, significantly higher than the temperature $T_{e}({\rm O^{2+}}) \approx 1.3\times 10^{4}$ K inferred from [OIII]$λ4363$. The temperature difference must result from a mixing of different ionized regions: the Balmer emission mainly arises from dense, hot clouds in the Broad Line Region close to the accreting black hole, whereas the forbidden lines originate from less dense, colder gas in the host galaxy (although these ionized regions are kinematically similar in GN-z11 owing to its small BH mass). We propose that the observed continuum excess may come from a complex of FeII emission, which shows a characteristic jump blueward of the Balmer limit, as previously seen in the spectra of many lower-redshift quasars. Through comparisons with Cloudy models, we show that an Fe abundance or overall metallicity above $\sim 1/3$ solar is likely needed.
Besides the FeII emission, part of the small blue bump may also be associated with an OIII Bowen fluorescent line, a line often enhanced in dense AGN-ionized gas. Finally, the spectrum provides further evidence against Wolf-Rayet or massive stars dominating the nebular emission in GN-z11.

Submitted 20 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

Comments: 22 pages (including appendix), 18 figures, submitted to MNRAS

Showing 1–50 of 605 results for author: Ji, Z