-
The Rapid Formation of the Metal Poor Milky Way
Authors:
Turner Woody,
Charlie Conroy,
Phillip Cargile,
Ana Bonaca,
Vedant Chandra,
Jiwon Jesse Han,
Benjamin D. Johnson,
Rohan P. Naidu,
Yuan-Sen Ting
Abstract:
Our understanding of the assembly timeline of the Milky Way has been transforming along with the dramatic increase in astrometric and spectroscopic data available over the past several years. Many substructures in chemo-dynamical space have been discovered and identified as the remnants of various galactic mergers. To investigate the timeline of these mergers we select main sequence turn off & subgiant stars (MSTOs) from the H3 survey, finding members in seven metal poor components of the halo: GSE, the Helmi Streams, Thamnos, Sequoia, Wukong/LMS-1, Arjuna, and I'itoi. We also select out the metal poor in situ disk to facilitate comparison to the evolution of the Milky Way itself at these early epochs. We fit individual isochrone ages to the MSTOs in each of these substructures and use the resulting age distributions to infer simple star formation histories. For GSE we resolve an extended star formation history that truncates $\approx10$ Gyr ago, as well as a clear age -- metallicity relation. From this age distribution and measured star formation history we infer that GSE merged with the Milky Way at a time $9.5-10.2$ Gyr ago, in agreement with previous estimates. We infer that the other mergers occurred at various times ranging from $9-13$ Gyr ago, and that the metal poor component of the disk built up within only a few billion years. These results reinforce the emerging picture that both the disk and halo of the Milky Way experienced a rapid assembly.
Submitted 6 September, 2024;
originally announced September 2024.
-
High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching
Authors:
Gael Le Lan,
Bowen Shi,
Zhaoheng Ni,
Sidd Srinivasan,
Anurag Kumar,
Brian Ellis,
David Kant,
Varun Nagaraja,
Ernie Chang,
Wei-Ning Hsu,
Yangyang Shi,
Vikas Chandra
Abstract:
We introduce a simple and efficient text-controllable high-fidelity music generation and editing model. It operates on sequences of continuous latent representations from a low frame rate 48 kHz stereo variational auto encoder codec that eliminates the information loss drawback of discrete representations. Based on a diffusion transformer architecture trained on a flow-matching objective, the model can generate and edit diverse high quality stereo samples of variable duration, with simple text descriptions. We also explore a new regularized latent inversion method for zero-shot test-time text-guided editing and demonstrate its superior performance over naive denoising diffusion implicit model (DDIM) inversion for a variety of music editing prompts. Evaluations are conducted on both objective and subjective metrics and demonstrate that the proposed model is not only competitive with the evaluated baselines on a standard text-to-music benchmark - quality and efficiency-wise - but also outperforms previous state of the art for music editing when combined with our proposed latent inversion. Samples are available at https://melodyflow.github.io.
Submitted 4 July, 2024;
originally announced July 2024.
-
Our Halo of Ice and Fire: Strong Kinematic Asymmetries in the Galactic Halo
Authors:
Jiwon Jesse Han,
Charlie Conroy,
Dennis Zaritsky,
Ana Bonaca,
Nelson Caldwell,
Vedant Chandra,
Yuan-Sen Ting
Abstract:
The kinematics of the stellar halo hold important clues to the assembly history and mass distribution of the Galaxy. In this study, we map the kinematics of stars across the Galactic halo with the H3 Survey. We find a complex distribution that breaks both azimuthal symmetry about the $Z$-axis and mirror symmetry about the Galactic plane. This asymmetry manifests as large variations in the radial velocity dispersion $σ_r$ from as ``cold'' as 70 $\text{km}\text{ s}^{-1}$ to as ``hot'' as 160 $\text{km}\text{ s}^{-1}$. We use stellar chemistry to distinguish accreted stars from in-situ stars in the halo, and find that the accreted population has higher $σ_r$ and radially biased orbits, while the in-situ population has lower $σ_r$ and isotropic orbits. As a result, the Galactic halo kinematics are highly heterogeneous and poorly approximated as being spherical or axisymmetric. We measure radial profiles of $σ_r$ and the anisotropy parameter $β$ over Galactocentric radii $10-80\text{ kpc}$, and find that discrepancies in the literature are due to the nonspherical geometry and heterogeneous nature of the halo. Investigating the effect of strongly asymmetric $σ_r$ and $β$ on equilibrium models is a path forward to accurately constraining the Galactic gravitational field, including its total mass.
Submitted 18 June, 2024;
originally announced June 2024.
-
The Extremely Metal Rich Knot of Stars at the Heart of the Galaxy
Authors:
Hans-Walter Rix,
Vedant Chandra,
Gail Zasowski,
Annalisa Pillepich,
Sergey Khoperskov,
Sofia Feltzing,
Rosemary F. Wyse,
Neige Frankel,
Danny Horta,
Juna Kollmeier,
Keivan G. Stassun,
Melissa Ness,
Jonathan C. Bird,
David L. Nidever,
Jose G. Fernandez,
João A. Amarante,
Chervin F. Laporte,
Jianhui Lian
Abstract:
We show with Gaia XP spectroscopy that extremely metal-rich stars in the Milky Way (EMR; $[M/H]_{XP} > 0.5$) - but only those - are largely confined to a tight "knot" at the center of the Galaxy. This EMR knot is round in projection, has a fairly abrupt edge near $\sim 1.5$kpc, and is a dynamically hot system. This central knot also contains very metal-rich (VMR; $+0.2\le [M/H]_{XP} \le +0.4$) stars. However, in contrast to EMR stars, the bulk of VMR stars form an extended, highly flattened distribution in the inner Galaxy ($R_{\mathrm{GC}}\lesssim 5$ kpc). We draw on TNG50 simulations of Milky Way analogs for context and find that compact, metal-rich knots confined to $<1.5$kpc are a universal feature. In typical simulated analogs, the top 5-10% most metal-rich stars are confined to a central knot; however, in our Milky Way data this fraction is only 0.1%. Dust-penetrating wide-area near-infrared spectroscopy, such as SDSS-V, will be needed for a rigorous estimate of the fraction of stars in the Galactic EMR knot. Why in our Milky Way only EMR giants are confined to such a central knot remains to be explained. Remarkably, the central few kiloparsecs of the Milky Way harbor both the highest concentration of metal-poor stars (the `poor old heart') and almost all EMR stars. This highlights the stellar population diversity at the bottom of galactic potential wells.
Submitted 3 June, 2024;
originally announced June 2024.
-
All-Sky Kinematics of the Distant Halo: The Reflex Response to the LMC
Authors:
Vedant Chandra,
Rohan P. Naidu,
Charlie Conroy,
Nicolas Garavito-Camargo,
Chervin Laporte,
Ana Bonaca,
Phillip A. Cargile,
Emily Cunningham,
Jiwon Jesse Han,
Benjamin D. Johnson,
Hans-Walter Rix,
Yuan-Sen Ting,
Turner Woody,
Dennis Zaritsky
Abstract:
The infall of the Large Magellanic Cloud (LMC) is predicted to displace the inner Milky Way (MW), imprinting an apparent 'reflex motion' on the observed velocities of distant halo stars. We construct the largest all-sky spectroscopic dataset of luminous red giant stars from $50-160$ kpc, including a new survey of the southern celestial hemisphere. We fit the full 6D kinematics of our data to measure the amplitude and direction of the inner MW's motion towards the outer halo. The observed velocity grows with distance such that, relative to halo stars at $100$ kpc, the inner MW is lurching at $\approx 40$ km s$^{-1}$ towards a recent location along the LMC's past orbit. Our measurements align with N-body simulations of the halo's response to a $1.8 \times 10^{11} M_\odot$ LMC on first infall, suggesting that the LMC is at least 15% as massive as the MW. Our findings highlight the dramatic disequilibrium of the MW outskirts, and will enable more accurate measurements of the total mass of our Galaxy.
Submitted 3 June, 2024;
originally announced June 2024.
-
An Introduction to Vision-Language Modeling
Authors:
Florian Bordes,
Richard Yuanzhe Pang,
Anurag Ajay,
Alexander C. Li,
Adrien Bardes,
Suzanne Petryk,
Oscar Mañas,
Zhiqiu Lin,
Anas Mahmoud,
Bargav Jayaraman,
Mark Ibrahim,
Melissa Hall,
Yunyang Xiong,
Jonathan Lebensold,
Candace Ross,
Srihari Jayakumar,
Chuan Guo,
Diane Bouchacourt,
Haider Al-Tahan,
Karthik Padthe,
Vasu Sharma,
Hu Xu,
Xiaoqing Ellen Tan,
Megan Richards,
Samuel Lavoie
, et al. (16 additional authors not shown)
Abstract:
Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technology. However, there are many challenges that need to be addressed to improve the reliability of those models. While language is discrete, vision evolves in a much higher dimensional space in which concepts cannot always be easily discretized. To better understand the mechanics behind mapping vision to language, we present this introduction to VLMs which we hope will help anyone who would like to enter the field. First, we introduce what VLMs are, how they work, and how to train them. Then, we present and discuss approaches to evaluate VLMs. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.
Submitted 27 May, 2024;
originally announced May 2024.
-
SpinQuant: LLM quantization with learned rotations
Authors:
Zechun Liu,
Changsheng Zhao,
Igor Fedorov,
Bilge Soran,
Dhruv Choudhary,
Raghuraman Krishnamoorthi,
Vikas Chandra,
Yuandong Tian,
Tijmen Blankevoort
Abstract:
Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rotation parameterizations that lead to identical outputs in full-precision Transformer architectures, and find that some random rotations lead to much better quantization than others, with a difference of up to 13 points in downstream zero-shot reasoning performance. As a result, we propose SpinQuant, which optimizes (or learns) the rotation matrices with Cayley optimization on a small validation set. With 4-bit quantization of weight, activation, and KV-cache, SpinQuant narrows the accuracy gap on zero-shot reasoning tasks with full precision to merely 2.9 points on the LLaMA-2 7B model, surpassing LLM-QAT by 19.1 points and SmoothQuant by 25.0 points. SpinQuant also outperforms concurrent work QuaRot, which applies random rotations to remove outliers. In particular, for LLaMA-2 7B/LLaMA-3 8B models that are hard to quantize, SpinQuant reduces the gap to full precision by 30.2%/34.1% relative to QuaRot.
Submitted 28 May, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
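The rotation invariance that the SpinQuant abstract relies on can be illustrated with a toy example. The sketch below is not the paper's method: it uses a fixed 2D rotation rather than learned rotations or Cayley optimization, and the matrices `X` and `W` are hypothetical values chosen for illustration. It only shows that rotating activations by an orthogonal matrix `R` and counter-rotating the weights by its transpose leaves the full-precision product unchanged, while spreading an outlier across dimensions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def rotation(theta):
    """2x2 rotation matrix (orthogonal, so its inverse is its transpose)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Hypothetical toy values: one activation row with an outlier, one weight matrix.
X = [[10.0, 0.1]]
W = [[0.5, -1.0], [2.0, 0.25]]
R = rotation(math.pi / 4)

X_rot = matmul(X, R)             # rotated activations: X R
W_rot = matmul(transpose(R), W)  # counter-rotated weights: R^T W

original = matmul(X, W)          # X W
rotated = matmul(X_rot, W_rot)   # (X R)(R^T W) equals X W up to rounding

print(original, rotated)
print(max(abs(v) for v in X[0]), max(abs(v) for v in X_rot[0]))
```

In this toy case the largest activation magnitude shrinks after rotation, which is the intuition behind why a well-chosen rotation can make the tensors easier to quantize.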
-
Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications
Authors:
Yang Li,
Changsheng Zhao,
Hyungtak Lee,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Large language models (LLMs) significantly enhance the performance of various applications, but they are computationally intensive and energy-demanding. This makes it challenging to deploy them on devices with limited resources, such as personal computers and mobile/wearable devices, and results in substantial inference costs in resource-rich environments like cloud servers. To extend the use of LLMs, we introduce a low-rank decomposition approach to effectively compress these models, tailored to the requirements of specific applications. We observe that LLMs pretrained on general datasets contain many redundant components not needed for particular applications. Our method focuses on identifying and removing these redundant parts, retaining only the necessary elements for the target applications. Specifically, we represent the weight matrices of LLMs as a linear combination of base components. We then prune the irrelevant bases and enhance the model with new bases beneficial for specific applications. Deep compression results on the Llama 2-7b and -13B models, conducted on target applications including mathematical reasoning and code generation, show that our method significantly reduces model size while maintaining comparable accuracy to state-of-the-art low-rank compression techniques.
Submitted 24 May, 2024;
originally announced May 2024.
-
Assessing Engraftment Following Fecal Microbiota Transplant
Authors:
Chloe Herman,
Bridget M. Barker,
Thais F. Bartelli,
Vidhi Chandra,
Rosa Krajmalnik-Brown,
Mary Jewell,
Le Li,
Chen Liao,
Florencia McAllister,
Khemlal Nirmalkar,
Joao B. Xavier,
J. Gregory Caporaso
Abstract:
Fecal Microbiota Transplant (FMT) is an FDA approved treatment for recurrent Clostridium difficile infections, and is being explored for other clinical applications, from alleviating digestive and neurological disorders, to priming the microbiome for cancer treatment, and restoring microbiomes impacted by cancer treatment.
Quantifying the extent of engraftment following an FMT is important in determining if a recipient didn't respond because the engrafted microbiome didn't produce the desired outcomes (a successful FMT, but negative treatment outcome), or the microbiome didn't engraft (an unsuccessful FMT and negative treatment outcome). The lack of a consistent methodology for quantifying FMT engraftment extent hinders the assessment of FMT success and its relation to clinical outcomes, and presents challenges for comparing FMT results and protocols across studies.
Here we review 46 studies of FMT in humans and model organisms and group their approaches for assessing the extent to which an FMT engrafts into three criteria: 1) Chimeric Asymmetric Community Coalescence investigates microbiome shifts following FMT engraftment. 2) Donated Microbiome Indicator Features tracks donated microbiome features as a signal of engraftment with methods such as differential abundance testing based on the current sample collection, or tracking changes in feature abundances that have been previously identified. 3) Temporal Stability examines how resistant post-FMT recipients' microbiomes are to reverting to their baseline microbiome. Investigated together, these criteria provide a clear assessment of microbiome engraftment.
We discuss the pros and cons of each of these criteria, providing illustrative examples of their application. We also introduce key terminology and recommendations on how FMT studies can be analyzed for rigorous engraftment extent assessment.
Submitted 10 April, 2024;
originally announced April 2024.
-
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Authors:
Avinash Paliwal,
Wei Ye,
Jinhui Xiong,
Dmytro Kotovenko,
Rakesh Ranjan,
Vikas Chandra,
Nima Khademi Kalantari
Abstract:
The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
Submitted 28 March, 2024;
originally announced March 2024.
-
Revisiting shear stress tensor evolution: Non-resistive magnetohydrodynamics with momentum-dependent relaxation time
Authors:
Sunny Kumar Singh,
Manu Kurian,
Vinod Chandra
Abstract:
This study aims to develop second-order relativistic viscous magnetohydrodynamics (MHD) derived from kinetic theory within an extended relaxation time approximation (momentum/energy dependent) for the collision kernel. The investigation involves a detailed examination of shear stress tensor evolution equations and associated transport coefficients. The Boltzmann equation is solved using a Chapman-Enskog-like gradient expansion for a charge-conserved conformal system, incorporating a momentum-dependent relaxation time. The derived relativistic non-resistive, viscous second-order MHD equations for the shear stress tensor reveal significant modifications in the coupling with dissipative charge current and magnetic field due to the momentum dependence of the relaxation time. By utilizing a power law parametrization to quantify the momentum dependence of the relaxation time, the anisotropic magnetic field-dependent shear coefficients in the Navier-Stokes limit have been investigated. The resulting viscous coefficients are seen to be sensitive to the momentum dependence of the relaxation time.
Submitted 24 May, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
What Does the Large Magellanic Cloud Look Like? It Depends on [M/H] and Age
Authors:
Neige Frankel,
Rene Andrae,
Hans-Walter Rix,
Joshua Povick,
Vedant Chandra
Abstract:
We offer a new way to look at the Large Magellanic Cloud through stellar mono-abundance and mono-age-mono-abundance maps. These maps are based on $\gtrsim 500\,000$ member stars with photo-spectroscopic [M/H] and age estimates from Gaia DR3 data, and they are the first area-complete, metallicity- and age-differentiated stellar maps of any disk galaxy. Azimuthally averaged, these maps reveal a surprisingly simple picture of the Milky Way's largest satellite galaxy. For any [M/H] below -0.1 dex, the LMC's radial profile is well described by a simple exponential, but with a scale length that steadily shrinks towards higher metallicities, from nearly 2.3~kpc at [M/H]$=-1.8$ to only 0.75~kpc at [M/H]$=-0.25$. The prominence of the bar decreases dramatically with [M/H], making it barely discernible at [M/H]$\lesssim -1.5$. Yet, even for metal-rich populations, the bar has little impact on the azimuthally averaged profile of the mono-abundance components. Including ages, we find that the scale length is a greater function of age than of metallicity, with younger populations far more centrally concentrated. At old ages, the scale length decreases with increasing metallicity; at young ages, the scale-length is independent of metallicity. These findings provide quantitative support for a scenario where the LMC built its stellar structure effectively outside in.
Submitted 13 March, 2024;
originally announced March 2024.
-
B-mesons as essential probes of hot QCD matter
Authors:
Vinod Chandra,
Santosh K. Das
Abstract:
This article elucidates the pivotal role of B-mesons and bottomonium states in exploring the existence and properties of hot QCD matter (commonly known as quark-gluon plasma (QGP), produced in heavy-ion collision experiments). Owing to the complex and confounding nature of the strong interaction, direct detection of the hot QCD matter is not feasible. In light of this, investigating the dynamics of b-quarks and anti-quarks within the hot QCD medium emerges as an invaluable indirect probe. The impact of b-quarks and the mesons spans a spectrum of interesting domains regarding the physics of QCD at finite temperature, encompassing the QCD phase transition, color screening, quarkonia dissociation, heavy quark energy loss and collective flow, anisotropic aspects, and the strongly coupled nature of the hot QCD medium. These aspects underscore the indispensable nature of B-mesons in the quest to create and explore the complex nature of the strong interaction through the QGP/hot QCD matter. In this context, we mainly focus on works related to transport studies of B-mesons in the hot QCD medium, lattice QCD and effective field theory studies of bottomonium states, and finally, open quantum system frameworks for quarkonia to explore the properties of the hot QCD medium in relativistic heavy-ion collision experiments.
Submitted 29 February, 2024;
originally announced February 2024.
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Authors:
Zechun Liu,
Changsheng Zhao,
Forrest Iandola,
Chen Lai,
Yuandong Tian,
Igor Fedorov,
Yunyang Xiong,
Ernie Chang,
Yangyang Shi,
Raghuraman Krishnamoorthi,
Liangzhen Lai,
Vikas Chandra
Abstract:
This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements compared to previous sub-billion models on chat benchmarks, and demonstrates close correctness to LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.
Submitted 26 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
-
Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition
Authors:
Yang Li,
Yuan Shangguan,
Yuhao Wang,
Liangzhen Lai,
Ernie Chang,
Changsheng Zhao,
Yangyang Shi,
Vikas Chandra
Abstract:
Power consumption plays an important role in on-device streaming speech recognition, as it has a direct impact on the user experience. This study delves into how weight parameters in speech recognition models influence the overall power consumption of these models. We discovered that the impact of weight parameters on power consumption varies, influenced by factors including how often they are invoked and their placement in memory. Armed with this insight, we developed design guidelines aimed at optimizing on-device speech recognition models. These guidelines focus on minimizing power use without substantially affecting accuracy. Our method, which employs targeted compression based on the varying sensitivities of weight parameters, demonstrates superior performance compared to state-of-the-art compression methods. It achieves a reduction in energy usage of up to 47% while maintaining similar model accuracy and improving the real-time factor.
Submitted 20 February, 2024;
originally announced February 2024.
-
MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Authors:
Shitao Tang,
Jiacheng Chen,
Dilin Wang,
Chengzhou Tang,
Fuyang Zhang,
Yuchen Fan,
Vikas Chandra,
Yasutaka Furukawa,
Rakesh Ranjan
Abstract:
This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A ``pose-free architecture'' where standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) A ``view dropout strategy'' that discards a substantial number of output views during training, which reduces the training-time memory footprint and enables dense and high-resolution view synthesis at test time. We use the Objaverse for training and the Google Scanned Objects for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image generative model. The project page is at https://mvdiffusion-plusplus.github.io.
Submitted 30 April, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
A self-consistent data-driven model for determining stellar parameters from optical and near-IR spectra
Authors:
Logan Sizemore,
Diego Llanes,
Marina Kounkel,
Brian Hutchinson,
Keivan G. Stassun,
Vedant Chandra
Abstract:
Data-driven models, which apply machine learning to infer physical properties from large quantities of data, have become increasingly important for extracting stellar properties from spectra. In general, these methods have been applied to data in one wavelength regime or another. For example, APOGEE Net has been applied to near-IR spectra from the SDSS-V APOGEE survey to predict stellar parameters (Teff, log g, and [Fe/H]) for all stars with Teff from 3,000 to 50,000 K, including pre-main sequence stars, OB stars, main sequence dwarfs, and red giants. The increasing number of large surveys across multiple wavelength regimes provides the opportunity to improve data-driven models through learning from multiple datasets at once. In SDSS-V, a number of stars will be observed not just with APOGEE in the near-IR, but also with BOSS in the optical regime. Here we aim to develop a complementary model, BOSS Net, that will replicate the performance of APOGEE Net in these optical data through label transfer. We further improve the model by extending it to brown dwarfs, as well as white dwarfs, resulting in comprehensive coverage of 1700<Teff<100,000 K and 0<log g<10, to ensure BOSS Net can reliably measure parameters of most of the commonly observed objects within this parameter space. We also update APOGEE Net to achieve a comparable performance in the near-IR regime. The resulting models provide a robust tool for measuring stellar evolutionary states, and in turn, enable characterization of the star forming history of the Galaxy.
Submitted 7 February, 2024;
originally announced February 2024.
-
Spectacular nucleosynthesis from early massive stars
Authors:
Alexander P. Ji,
Sanjana Curtis,
Nicholas Storm,
Vedant Chandra,
Kevin C. Schlaufman,
Keivan G. Stassun,
Alexander Heger,
Marco Pignatari,
Adrian M. Price-Whelan,
Maria Bergemann,
Guy S. Stringfellow,
Carla Frohlich,
Henrique Reggiani,
Erika M. Holmbeck,
Jamie Tayar,
Shivani P. Shah,
Emily J. Griffith,
Chervin F. P. Laporte,
Andrew R. Casey,
Keith Hawkins,
Danny Horta,
William Cerny,
Pierre Thibodeaux,
Sam A. Usman,
Joao A. S. Amarante
, et al. (17 additional authors not shown)
Abstract:
Stars formed with initial mass over 50 Msun are very rare today, but they are thought to be more common in the early universe. The fates of those early, metal-poor, massive stars are highly uncertain. Most are expected to directly collapse to black holes, while some may explode as a result of rotationally powered engines or the pair-creation instability. We present the chemical abundances of J0931+0038, a nearby low-mass star identified in early followup of SDSS-V Milky Way Mapper, which preserves the signature of unusual nucleosynthesis from a massive star in the early universe. J0931+0038 has relatively high metallicity ([Fe/H] = -1.76 +/- 0.13) but an extreme odd-even abundance pattern, with some of the lowest known abundance ratios of [N/Fe], [Na/Fe], [K/Fe], [Sc/Fe], and [Ba/Fe]. The implication is that a majority of its metals originated in a single extremely metal-poor nucleosynthetic source. An extensive search through nucleosynthesis predictions finds a clear preference for progenitors with initial mass > 50 Msun, making J0931+0038 one of the first observational constraints on nucleosynthesis in this mass range. However the full abundance pattern is not matched by any models in the literature. J0931+0038 thus presents a challenge for the next generation of nucleosynthesis models and motivates study of high-mass progenitor stars impacted by convection, rotation, jets, and/or binary companions. Though rare, more examples of unusual early nucleosynthesis in metal-poor stars should be found in upcoming large spectroscopic surveys.
Submitted 4 January, 2024;
originally announced January 2024.
-
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Authors:
Peihao Wang,
Dejia Xu,
Zhiwen Fan,
Dilin Wang,
Sreyas Mohan,
Forrest Iandola,
Rakesh Ranjan,
Yilei Li,
Qiang Liu,
Zhangyang Wang,
Vikas Chandra
Abstract:
Despite the remarkable performance of score distillation in text-to-3D generation, such techniques notoriously suffer from view inconsistency issues, also known as the "Janus" artifact, where the generated objects fake each view with multiple front faces. Although empirically effective methods have approached this problem via score debiasing or prompt engineering, a more rigorous perspective to explain and tackle this problem remains elusive. In this paper, we reveal that the existing score distillation-based text-to-3D generation frameworks degenerate to maximum-likelihood seeking on each view independently and thus suffer from the mode collapse problem, manifesting as the Janus artifact in practice. To tame mode collapse, we improve score distillation by re-establishing the entropy term in the corresponding variational objective, which is applied to the distribution of rendered images. Maximizing the entropy encourages diversity among different views in generated 3D assets, thereby mitigating the Janus problem. Based on this new objective, we derive a new update rule for 3D score distillation, dubbed Entropic Score Distillation (ESD). We theoretically reveal that ESD can be simplified and implemented by just adopting the classifier-free guidance trick upon variational score distillation. Although embarrassingly straightforward, our extensive experiments successfully demonstrate that ESD can be an effective treatment for Janus artifacts in score distillation.
Submitted 29 March, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
Authors:
Peihao Wang,
Zhiwen Fan,
Dejia Xu,
Dilin Wang,
Sreyas Mohan,
Forrest Iandola,
Rakesh Ranjan,
Yilei Li,
Qiang Liu,
Zhangyang Wang,
Vikas Chandra
Abstract:
Score distillation has emerged as one of the most prevalent approaches for text-to-3D asset synthesis. Essentially, score distillation updates 3D parameters by lifting and back-propagating scores averaged over different views. In this paper, we reveal that the gradient estimation in score distillation inherently suffers from high variance. Through the lens of variance reduction, the effectiveness of SDS and VSD can be interpreted as applications of various control variates to the Monte Carlo estimator of the distilled score. Motivated by this rethinking and based on Stein's identity, we propose a more general solution to reduce variance for score distillation, termed Stein Score Distillation (SSD). SSD incorporates control variates constructed by Stein's identity, allowing for arbitrary baseline functions. This enables us to include flexible guidance priors and network architectures to explicitly optimize for variance reduction. In our experiments, the overall pipeline, dubbed SteinDreamer, is implemented by instantiating the control variate with a monocular depth estimator. The results suggest that SSD can effectively reduce the distillation variance and consistently improve visual quality for both object- and scene-level generation. Moreover, we demonstrate that SteinDreamer achieves faster convergence than existing methods due to more stable gradient updates.
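The control-variate idea mentioned in this abstract can be illustrated with a minimal textbook sketch. This is not the SSD method or anything taken from the paper: the integrand `exp(U)`, the control variate `U` with known mean 0.5, and the function name `run_trials` are all assumptions chosen only to show how subtracting a zero-mean correlated term lowers the variance of a Monte Carlo estimate.

```python
import math
import random
import statistics


def run_trials(n_samples=200, n_trials=200, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate.

    The control variate is U itself, whose mean (0.5) is known exactly.
    Returns two lists of per-trial estimates: (plain, controlled).
    """
    rng = random.Random(seed)
    plain, controlled = [], []
    for _ in range(n_trials):
        u = [rng.random() for _ in range(n_samples)]
        f = [math.exp(x) for x in u]
        mean_f = statistics.fmean(f)
        mean_u = statistics.fmean(u)
        # Coefficient beta = Cov(f, u) / Var(u), estimated from the same sample.
        cov = sum((a - mean_f) * (b - mean_u) for a, b in zip(f, u)) / (n_samples - 1)
        beta = cov / statistics.variance(u)
        plain.append(mean_f)
        # Subtract the (zero-mean) correlated term to cancel part of the noise.
        controlled.append(mean_f - beta * (mean_u - 0.5))
    return plain, controlled


if __name__ == "__main__":
    plain, controlled = run_trials()
    print("variance, plain estimator:     ", statistics.variance(plain))
    print("variance, controlled estimator:", statistics.variance(controlled))
```

Because `exp(U)` and `U` are strongly correlated, the controlled estimator's variance across trials should come out much smaller than the plain one while both target the same value, e - 1.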
Submitted 29 March, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
SqueezeSAM: User friendly mobile interactive segmentation
Authors:
Balakrishnan Varadarajan,
Bilge Soran,
Forrest Iandola,
Xiaoyu Xiang,
Yunyang Xiong,
Lemeng Wu,
Chenchen Zhu,
Raghuraman Krishnamoorthi,
Vikas Chandra
Abstract:
The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware due to its high computational demands and large model size. Our research aims to adapt SAM for use in mobile photography applications. To this end, we have developed a fully convolutional SqueezeSAM model architecture, which is 62.5 times faster and 31.6 times smaller than the original SAM, making it a viable solution for mobile applications. Furthermore, our tiny model achieves an mIOU within 1% of the original ViT-H architecture.
Automated segmentation holds significant value in the creation flow for photography applications, as evidenced by its adoption by leading industry players like Apple and CapCut. To facilitate this automation, we employ salient object detection and simulate potential user clicks for foreground object selection, generating an initial segmentation mask that users can subsequently edit interactively. A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object. For example, a click on a person's t-shirt in a photo should ideally segment the entire person, not just the t-shirt. However, SAM typically only segments the clicked area. We address this limitation through a novel data augmentation scheme. Consequently, if a user clicks on a person holding a basketball, both the person and the basketball are segmented together, aligning with user expectations and enhancing the overall user experience.
Submitted 20 May, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Authors:
Yunyang Xiong,
Bala Varadarajan,
Lemeng Wu,
Xiaoyu Xiang,
Fanyi Xiao,
Chenchen Zhu,
Xiaoliang Dai,
Dilin Wang,
Fei Sun,
Forrest Iandola,
Raghuraman Krishnamoorthi,
Vikas Chandra
Abstract:
Segment Anything Model (SAM) has emerged as a powerful tool for numerous vision applications. A key component that drives the impressive performance for zero-shot transfer and high versatility is a super large Transformer model trained on the extensive high-quality SA-1B dataset. While beneficial, the huge computation cost of the SAM model has limited its use in wider real-world applications. To address this limitation, we propose EfficientSAMs, light-weight SAM models that exhibit decent performance with largely reduced complexity. Our idea is based on leveraging masked image pretraining, SAMI, which learns to reconstruct features from the SAM image encoder for effective visual representation learning. Further, we take SAMI-pretrained light-weight image encoders and mask decoder to build EfficientSAMs, and finetune the models on SA-1B for the segment anything task. We perform evaluations on multiple vision tasks including image classification, object detection, instance segmentation, and semantic object detection, and find that our proposed pretraining method, SAMI, consistently outperforms other masked image pretraining methods. On segment anything tasks such as zero-shot instance segmentation, our EfficientSAMs with SAMI-pretrained lightweight image encoders perform favorably with a significant gain (e.g., ~4 AP on COCO/LVIS) over other fast SAM models.
Submitted 1 December, 2023;
originally announced December 2023.
-
On The Open Prompt Challenge In Conditional Audio Generation
Authors:
Ernie Chang,
Sidd Srinivasan,
Mahi Luthra,
Pin-Jie Lin,
Varun Nagaraja,
Forrest Iandola,
Zechun Liu,
Zhaoheng Ni,
Changsheng Zhao,
Yangyang Shi,
Vikas Chandra
Abstract:
Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a ``blackbox'' and address the user prompt challenge with two key insights: (1) User prompts are generally under-specified, leading to a large alignment gap between user prompts and training prompts. (2) There is a distribution of audio descriptions for which TTA models are better at generating higher quality audio, which we refer to as ``audionese''. To this end, we rewrite prompts with instruction-tuned models and propose utilizing text-audio alignment as feedback signals via margin ranking learning for audio improvements. On both objective and subjective human evaluations, we observed marked improvements in both text-audio alignment and music audio quality.
Submitted 1 November, 2023;
originally announced November 2023.
-
In-Context Prompt Editing For Conditional Audio Generation
Authors:
Ernie Chang,
Pin-Jie Lin,
Yang Li,
Sidd Srinivasan,
Gael Le Lan,
David Kant,
Yangyang Shi,
Forrest Iandola,
Vikas Chandra
Abstract:
Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional audio generation in the wild as user prompts are under-specified. In particular, we observe a consistent audio quality degradation in generated audio samples with user prompts, as opposed to training set prompts. To this end, we present a retrieval-based in-context prompt editing framework that leverages the training captions as demonstrative exemplars to revisit the user prompts. We show that the framework enhanced the audio quality across the set of collected user prompts, which were edited with reference to the training captions as exemplars.
Submitted 1 November, 2023;
originally announced November 2023.
-
Measuring The Mass-Radius Relation of White Dwarfs Using Wide Binaries
Authors:
Stefan Arseneau,
Vedant Chandra,
Hsiang-Chih Hwang,
Nadia L. Zakamska,
Gautham Adamane Pallathadka,
Nicole R. Crumpler,
J. J. Hermes,
Kareem El-Badry,
Hans-Walter Rix,
Keivan G. Stassun,
Boris T. Gaensicke,
Joel R. Brownstein,
Sean Morrison
Abstract:
Measuring the mass-radius relation of individual white dwarfs is an empirically challenging task that has been performed for only a few dozen stars. We measure the white dwarf mass-radius relation using gravitational redshifts and radii of 137 white dwarfs in wide binaries with main sequence companions. We obtain the space velocities to these systems using the main sequence companion, and subtract these Doppler redshifts from the white dwarfs' apparent motions, isolating their gravitational redshifts. We use Gaia data to calculate the surface temperatures and radii of these white dwarfs, thereby deriving an empirical gravitational redshift-radius relation. This work demonstrates the utility of low-resolution Galactic surveys to measure the white dwarf equation of state. Our results are consistent with theoretical models, and represent the largest sample of individual white dwarf gravitational redshift measurements to date.
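The link between a white dwarf's gravitational redshift and its mass and radius can be sketched with the standard weak-field relation v_g = GM/(Rc). The snippet below is a minimal illustration under that approximation, not the authors' pipeline. The function names and the example values (0.6 solar masses, 0.0125 solar radii) are assumptions chosen to be typical of white dwarfs, not figures from the paper.

```python
# Nominal solar constants (SI units).
GM_SUN = 1.32712440018e20   # G * M_sun, m^3 s^-2
R_SUN = 6.957e8             # nominal solar radius, m
C_LIGHT = 299_792_458.0     # speed of light, m s^-1


def gravitational_redshift_kms(mass_msun, radius_rsun):
    """Weak-field gravitational redshift v_g = G M / (R c), in km/s."""
    return GM_SUN * mass_msun / (R_SUN * radius_rsun * C_LIGHT) / 1e3


def mass_from_redshift_msun(v_g_kms, radius_rsun):
    """Invert v_g = G M / (R c) for the mass, in solar masses."""
    return v_g_kms * 1e3 * C_LIGHT * R_SUN * radius_rsun / GM_SUN


if __name__ == "__main__":
    # Illustrative (hypothetical) white dwarf: 0.6 M_sun, 0.0125 R_sun.
    v_g = gravitational_redshift_kms(0.6, 0.0125)
    print(f"gravitational redshift: {v_g:.1f} km/s")
    print(f"recovered mass: {mass_from_redshift_msun(v_g, 0.0125):.2f} M_sun")
```

In a wide binary, the companion's radial velocity supplies the systemic Doppler term, so subtracting it from the white dwarf's apparent velocity leaves an estimate of `v_g`; combined with a radius, the second function then gives a mass.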
Submitted 30 October, 2023;
originally announced October 2023.
-
Discovery of a proto-white dwarf with a massive unseen companion
Authors:
Gautham Adamane Pallathadka,
Vedant Chandra,
Nadia L. Zakamska,
Hsiang-Chih Hwang,
Yossef Zenati,
J. J. Hermes,
Kareem El-Badry,
Boris T. Gaensicke,
Sean Morrison,
Nicole R. Crumpler,
Stefan Arseneau
Abstract:
We report the discovery of SDSS~J022932.28+713002.7, a nascent extremely low-mass (ELM) white dwarf (WD) orbiting a massive ($> 1\,M_\odot$ at 2$σ$ confidence) companion with a period of 36 hours. We use a combination of spectroscopy, including data from the ongoing SDSS-V survey, and photometry to measure the stellar parameters for the primary pre-ELM white dwarf. The lightcurve of the primary WD exhibits ellipsoidal variation, which we combine with radial velocity data and $\tt{PHOEBE}$ binary simulations to estimate the mass of the invisible companion. We find that the primary WD has mass $M_1$ = $0.18^{+0.02}_{-0.02}$ M$_\odot$ and the unseen secondary has mass $M_2$ = $1.19^{+0.21}_{-0.14}$ M$_\odot$. The mass of the companion suggests that it is most likely a near-Chandrasekhar mass white dwarf or a neutron star. It is likely that the system recently went through a Roche lobe overflow from the visible primary onto the invisible secondary. The dynamical configuration of the binary is consistent with the theoretical evolutionary tracks for such objects, and the primary is currently in its contraction phase. The measured orbital period puts this system on a stable evolutionary path which, within a few Gyrs, will lead to a contracted ELM white dwarf orbiting a massive compact companion.
Submitted 24 October, 2023;
originally announced October 2023.
-
The Three-Phase Evolution of the Milky Way
Authors:
Vedant Chandra,
Vadim A. Semenov,
Hans-Walter Rix,
Charlie Conroy,
Ana Bonaca,
Rohan P. Naidu,
Rene Andrae,
Jiadong Li,
Lars Hernquist
Abstract:
We illustrate the formation and evolution of the Milky Way over cosmic time, utilizing a sample of 10 million red giant stars with full chemodynamical information, including metallicities and $α$-abundances from low-resolution Gaia XP spectra. The evolution of angular momentum as a function of metallicity - a rough proxy for stellar age, particularly for high-[$α$/Fe] stars - displays three distinct phases: the disordered and chaotic protogalaxy, the kinematically-hot old disk, and the kinematically-cold young disk. The old high-$α$ disk starts at [Fe/H] $\approx -1.0$, 'spinning up' from the nascent protogalaxy, and then exhibits a smooth 'cooldown' toward more ordered and circular orbits at higher metallicities. The young low-$α$ disk is kinematically cold throughout its metallicity range, with its observed properties modulated by a strong radial gradient. We interpret these trends using Milky Way analogs from the TNG50 cosmological simulation, identifying one that closely matches the kinematic evolution of our Galaxy. This halo's protogalaxy spins up into a relatively thin and misaligned high-$α$ disk at early times, which is subsequently heated and torqued by a major gas-rich merger. The merger contributes a large amount of low-metallicity gas and angular momentum, from which the kinematically cold low-$α$ stellar disk is subsequently born. This simulated history parallels several observed features of the Milky Way, particularly the decisive 'GSE' merger that likely occurred at $z \approx 2$. Our results provide an all-sky perspective on the emerging picture of our Galaxy's three-phase formation, impelled by the three physical mechanisms of spinup, merger, and cooldown.
Submitted 19 October, 2023;
originally announced October 2023.
-
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Authors:
Jun Chen,
Deyao Zhu,
Xiaoqian Shen,
Xiang Li,
Zechun Liu,
Pengchuan Zhang,
Raghuraman Krishnamoorthi,
Vikas Chandra,
Yunyang Xiong,
Mohamed Elhoseiny
Abstract:
Large language models have shown their remarkable capabilities as a general interface for various language-related applications. Motivated by this, we aim to build a unified interface for completing many vision-language tasks including image description, visual question answering, and visual grounding, among others. The challenge is to use a single model for performing diverse vision-language tasks effectively with simple multi-modal instructions. Towards this objective, we introduce MiniGPT-v2, a model that can be treated as a unified interface for better handling various vision-language tasks. We propose using unique identifiers for different tasks when training the model. These identifiers enable our model to better distinguish each task instruction effortlessly and also improve the model learning efficiency for each task. After the three-stage training, the experimental results show that MiniGPT-v2 achieves strong performance on many visual question-answering and visual grounding benchmarks compared to other vision-language generalist models. Our model and code are available at https://minigpt-v2.github.io/
Submitted 7 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
AspGap: Augmented Stellar Parameters and Abundances for 23 million RGB stars from Gaia XP low-resolution spectra
Authors:
Jiadong Li,
Kaze W. K. Wong,
David W. Hogg,
Hans-Walter Rix,
Vedant Chandra
Abstract:
We present AspGap, a new approach to infer stellar labels from low-resolution Gaia XP spectra, including precise [$α$/M] estimates for the first time. AspGap is a neural-network based regression model trained on APOGEE spectra. In the training step, AspGap learns to use XP spectra not only to predict stellar labels but also the high-resolution APOGEE spectra that lead to the same stellar labels. The inclusion of this last model component -- dubbed the hallucinator -- creates a more physically motivated mapping and significantly improves the prediction of stellar labels in the validation, particularly of [$α$/M]. For giant stars, we find cross-validated rms accuracies for Teff, log g, [M/H], [$α$/M] of ~1%, 0.12 dex, 0.07 dex, 0.03 dex, respectively. We also validate our labels through comparison with external datasets and through a range of astrophysical tests that demonstrate that we are indeed determining [$α$/M] from the XP spectra, rather than just inferring it indirectly from correlations with other labels. We publicly release the AspGap codebase, along with our stellar parameter catalog for all giants observed by Gaia XP. AspGap enables new insights into the formation and chemo-dynamics of our Galaxy by providing precise [$α$/M] estimates for 23 million giant stars, including 12 million with radial velocities from Gaia.
Submitted 25 September, 2023;
originally announced September 2023.
-
Exploring Speech Enhancement for Low-resource Speech Synthesis
Authors:
Zhaoheng Ni,
Sravya Popuri,
Ning Dong,
Kohei Saijo,
Xiaohui Zhang,
Gael Le Lan,
Yangyang Shi,
Vikas Chandra,
Changhan Wang
Abstract:
High-quality and intelligible speech is essential to text-to-speech (TTS) model training; however, obtaining high-quality data for low-resource languages is challenging and expensive. Applying speech enhancement to an Automatic Speech Recognition (ASR) corpus mitigates the issue by augmenting the training data, but how the nonlinear speech distortion introduced by speech enhancement models affects TTS training still needs to be investigated. In this paper, we train a TF-GridNet speech enhancement model and apply it to low-resource datasets that were collected for the ASR task, then train a discrete unit based TTS model on the enhanced speech. We use Arabic datasets as an example and show that the proposed pipeline significantly improves the low-resource TTS system compared with other baseline methods in terms of the ASR WER metric. We also run an empirical analysis of the correlation between speech enhancement and TTS performance.
Submitted 19 September, 2023;
originally announced September 2023.
-
FoleyGen: Visually-Guided Audio Generation
Authors:
Xinhao Mei,
Varun Nagaraja,
Gael Le Lan,
Zhaoheng Ni,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Recent advancements in audio generation have been spurred by the evolution of large-scale deep learning models and expansive datasets. However, the task of video-to-audio (V2A) generation continues to be a challenge, principally because of the intricate relationship between the high-dimensional visual and auditory data, and the challenges associated with temporal synchronization. In this study, we introduce FoleyGen, an open-domain V2A generation system built on a language modeling paradigm. FoleyGen leverages an off-the-shelf neural audio codec for bidirectional conversion between waveforms and discrete tokens. The generation of audio tokens is facilitated by a single Transformer model, which is conditioned on visual features extracted from a visual encoder. A prevalent problem in V2A generation is the misalignment of generated audio with the visible actions in the video. To address this, we explore three novel visual attention mechanisms. We further undertake an exhaustive evaluation of multiple visual encoders, each pretrained on either single-modal or multi-modal tasks. The experimental results on VGGSound dataset show that our proposed FoleyGen outperforms previous systems across all objective metrics and human evaluations.
Submitted 19 September, 2023;
originally announced September 2023.
-
Stack-and-Delay: a new codebook pattern for music generation
Authors:
Gael Le Lan,
Varun Nagaraja,
Ernie Chang,
David Kant,
Zhaoheng Ni,
Yangyang Shi,
Forrest Iandola,
Vikas Chandra
Abstract:
In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns. In particular, flattening the codebooks represents the highest quality decoding strategy, while being notoriously slow. To this end, we propose a novel stack-and-delay style of decoding strategy to improve upon flat pattern decoding, with generation four times faster than vanilla flat decoding. This brings the inference time close to that of the delay decoding strategy, and allows for faster inference on GPU for small batch sizes. For the same inference efficiency budget as the delay pattern, we show that the proposed approach performs better in objective evaluations, almost closing the gap with the flat pattern in terms of quality. The results are corroborated by subjective evaluations which show that samples generated by the new model are slightly more often preferred to samples generated by the competing model given the same text prompts.
Submitted 15 September, 2023;
originally announced September 2023.
-
Enhance audio generation controllability through representation similarity regularization
Authors:
Yangyang Shi,
Gael Le Lan,
Varun Nagaraja,
Zhaoheng Ni,
Xinhao Mei,
Ernie Chang,
Forrest Iandola,
Yang Liu,
Vikas Chandra
Abstract:
This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regularization to ensure the alignment between the chosen text representation and the language model's predictions. Our proposal involves the incorporation of audio and text representation regularization, particularly during the classifier-free guidance (CFG) phase, where the text condition is excluded from cross attention during language model training. The aim of this proposed representation regularization is to minimize discrepancies in audio and text similarity compared to other samples within the same training batch. Experimental results on both music and audio generation tasks demonstrate that our proposed methods lead to improvements in objective metrics for both audio and music generation, as well as an enhancement in the human perception for audio generation.
Submitted 15 September, 2023;
originally announced September 2023.
-
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Authors:
Yang Li,
Liangzhen Lai,
Yuan Shangguan,
Forrest N. Iandola,
Zhaoheng Ni,
Ernie Chang,
Yangyang Shi,
Vikas Chandra
Abstract:
Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or computation overhead.
Submitted 18 January, 2024; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Collective excitations of a hot QCD medium in a time dependent background magnetic field
Authors:
Gowthama K K,
Vinod Chandra
Abstract:
Collective modes within a hot Quantum Chromodynamics (QCD) medium are obtained from the polarization tensor, considering both constant and time-varying electromagnetic fields. In both scenarios, five complex modes emerge, reliant on the wave vector ($k$), with electrical conductivity exerting significant influence. The impact of the modes on the energy loss of heavy quarks in the hot QCD medium with a background electromagnetic field has been studied by obtaining the induced electric field in terms of the polarization tensor while invoking Wong's equations. The findings are seen to be consistent with analogous approaches, reinforcing the significance of the results.
Submitted 13 September, 2023;
originally announced September 2023.
-
TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models
Authors:
Yuan Shangguan,
Haichuan Yang,
Danni Li,
Chunyang Wu,
Yassir Fathullah,
Dilin Wang,
Ayushi Dalmia,
Raghuraman Krishnamoorthi,
Ozlem Kalinli,
Junteng Jia,
Jay Mahadeokar,
Xin Lei,
Mike Seltzer,
Vikas Chandra
Abstract:
Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-validating models after making these changes can be a resource-intensive task. This paper presents TODM (Train Once Deploy Many), a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with comparable GPU-hours to that of a single training job. TODM leverages insights from prior work on Supernet, where Recurrent Neural Network Transducer (RNN-T) models share weights within a Supernet. It reduces layer sizes and widths of the Supernet to obtain subnetworks, making them smaller models suitable for all hardware types. We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet: adaptive dropouts, an in-place Alpha-divergence knowledge distillation, and the use of the ScaledAdam optimizer. We validate our approach by comparing Supernet-trained versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using LibriSpeech. Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models, with up to a 3% relative improvement in word error rate (WER), while efficiently keeping the cost of training many models at a small constant.
Submitted 27 November, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
Extending the Chemical Reach of the H3 Survey: Detailed Abundances of the Dwarf-galaxy Stellar Stream Wukong/LMS-1
Authors:
Guilherme Limberg,
Alexander P. Ji,
Rohan P. Naidu,
Anirudh Chiti,
Silvia Rossi,
Sam A. Usman,
Yuan-Sen Ting,
Dennis Zaritsky,
Ana Bonaca,
Lais Borbolato,
Joshua S. Speagle,
Vedant Chandra,
Charlie Conroy
Abstract:
We present the first detailed chemical-abundance analysis of stars from the dwarf-galaxy stellar stream Wukong/LMS-1 covering a wide metallicity range ($-3.5 < \rm[Fe/H] \lesssim -1.3$). We find abundance patterns that are effectively indistinguishable from the bulk of Indus and Jhelum, a pair of smaller stellar streams proposed to be dynamically associated with Wukong/LMS-1. We confirmed a carbon-enhanced metal-poor star ($\rm[C/Fe] > +0.7$ and $\rm[Fe/H] \sim -2.9$) in Wukong/LMS-1 with strong enhancements in Sr, Y, and Zr, which is peculiar given its solar-level [Ba/Fe]. Wukong/LMS-1 stars have high abundances of $α$ elements up to $\rm[Fe/H] \gtrsim -2$, which is expected for relatively massive dwarfs. Towards the high-metallicity end, Wukong/LMS-1 becomes $α$-poor, revealing that it probably experienced fairly standard chemical evolution. We identified a pair of N- and Na-rich stars in Wukong/LMS-1, reminiscent of multiple populations in globular clusters. This indicates that this dwarf galaxy contained at least one globular cluster that was completely disrupted in addition to two intact ones previously known to be associated with Wukong/LMS-1, which is possibly connected to similar evidence found in Indus. From these $\geq$3 globular clusters, we estimate the total mass of Wukong/LMS-1 to be ${\approx}10^{10} M_\odot$, representing ${\sim}1$% of the present-day Milky Way. Finally, the [Eu/Mg] ratio in Wukong/LMS-1 continuously increases with metallicity, making this the first example of a dwarf galaxy where the production of $r$-process elements is clearly dominated by delayed sources, presumably neutron-star mergers.
Submitted 5 April, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
Revisiting Sample Size Determination in Natural Language Understanding
Authors:
Ernie Chang,
Muhammad Hassan Rashid,
Pin-Jie Lin,
Changsheng Zhao,
Vera Demberg,
Yangyang Shi,
Vikas Chandra
Abstract:
Knowing exactly how many data points need to be labeled to achieve a certain model performance is a hugely beneficial step towards reducing the overall budgets for annotation. It pertains to both active learning and traditional data annotation, and is particularly beneficial for low resource scenarios. Nevertheless, it remains a largely under-explored area of research in NLP. We therefore explored various techniques for estimating the training sample size necessary to achieve a targeted performance value. We derived a simple yet effective approach to predict the maximum achievable model performance based on a small amount of training samples, which serves as an early indicator during data annotation for data quality and sample size determination. We performed ablation studies on four language understanding tasks, and showed that the proposed approach allows us to forecast model performance within a small margin of mean absolute error (~0.9%) with only 10% of the data.
Submitted 1 July, 2023;
originally announced July 2023.
-
Discovery of the Magellanic Stellar Stream Out to 100 Kiloparsecs
Authors:
Vedant Chandra,
Rohan P. Naidu,
Charlie Conroy,
Ana Bonaca,
Dennis Zaritsky,
Phillip A. Cargile,
Nelson Caldwell,
Benjamin D. Johnson,
Jiwon Jesse Han,
Yuan-Sen Ting
Abstract:
The Magellanic Stream (MS) - an enormous ribbon of gas spanning $140^\circ$ of the southern sky trailing the Magellanic Clouds - has been exquisitely mapped in the five decades since its discovery. However, despite concerted efforts, no stellar counterpart to the MS has been conclusively identified. This stellar stream would reveal the distance and 6D kinematics of the MS, constraining its formation and the past orbital history of the Clouds. We have been conducting a spectroscopic survey of the most distant and luminous red giant stars in the Galactic outskirts. From this dataset, we have discovered a prominent population of 13 stars matching the extreme angular momentum of the Clouds, spanning up to $100^\circ$ along the MS at distances of $60-120$ kpc. Furthermore, these kinematically selected stars lie along a [$α$/Fe]-deficient track in chemical space from $-2.5 < \mathrm{[Fe/H]} < -0.5$, consistent with their formation in the Clouds themselves. We identify these stars as high-confidence members of the Magellanic Stellar Stream. Half of these stars are metal-rich and closely follow the gaseous MS, whereas the other half are more scattered and metal-poor. We argue that the metal-rich stream is the recently-formed tidal counterpart to the MS, and speculate that the metal-poor population was thrown out of the SMC outskirts during an earlier interaction between the Clouds. The Magellanic Stellar Stream provides a strong set of constraints - distances, 6D kinematics, and birth locations - that will guide future simulations towards unveiling the detailed history of the Clouds.
Submitted 27 June, 2023;
originally announced June 2023.
-
Formation of Galactic Disks II: the Physical Drivers of Disk Spin-up
Authors:
Vadim A. Semenov,
Charlie Conroy,
Vedant Chandra,
Lars Hernquist,
Dylan Nelson
Abstract:
Using a representative sample of Milky Way (MW)-like galaxies from the TNG50 cosmological simulation, we investigate physical processes driving the formation of galactic disks. A disk forms as a result of the interplay between inflow and outflow carrying angular momentum in and out of the galaxy. Interestingly, the inflow and outflow have remarkably similar distributions of angular momentum, suggesting an exchange of angular momentum and/or outflow recycling, leading to continuous feeding of prealigned material from the corotating circumgalactic medium. We show that the disk formation in TNG50 is correlated with stellar bulge formation, in qualitative agreement with a recent theoretical model of disk formation facilitated by steep gravitational potentials. Disk formation is also correlated with the formation of a hot circumgalactic halo with around half of the inflow occurring at subsonic and transonic velocities corresponding to Mach numbers of $\lesssim2$. In the context of recent theoretical works connecting disk settling and hot halo formation, our results imply that the subsonic part of the inflow may settle into a disk while the remaining supersonic inflow will perturb this disk via the chaotic cold accretion. We find that disks tend to form when the host halos become more massive than $\sim (1-2) \times 10^{11} M_\odot$, consistent with previous theoretical findings and observational estimates of the predisk protogalaxy remnant in the MW. Our results do not prove that either corotating outflow recycling, gravitational potential steepening, or hot halo formation cause disk formation, but they show that all these processes occur concurrently and may play an important role in disk growth.
Submitted 26 July, 2024; v1 submitted 22 June, 2023;
originally announced June 2023.
-
RomAndromeda: The Roman Survey of the Andromeda Halo
Authors:
Arjun Dey,
Joan Najita,
Carrie Filion,
Jiwon Jesse Han,
Sarah Pearson,
Rosemary Wyse,
Adrien C. R. Thob,
Borja Anguiano,
Miranda Apfel,
Magda Arnaboldi,
Eric F. Bell,
Leandro Beraldo e Silva,
Gurtina Besla,
Aparajito Bhattacharya,
Souradeep Bhattacharya,
Vedant Chandra,
Yumi Choi,
Michelle L. M. Collins,
Emily C. Cunningham,
Julianne J. Dalcanton,
Ivanna Escala,
Hayden R. Foote,
Annette M. N. Ferguson,
Benjamin J. Gibson,
Oleg Y. Gnedin
, et al. (28 additional authors not shown)
Abstract:
As our nearest large neighbor, the Andromeda Galaxy provides a unique laboratory for investigating galaxy formation and the distribution and substructure properties of dark matter in a Milky Way-like galaxy. Here, we propose an initial 2-epoch ($Δt\approx 5$yr), 2-band Roman survey of the entire halo of Andromeda, covering 500 square degrees, which will detect nearly every red giant star in the halo (10$σ$ detection in F146, F062 of 26.5, 26.1AB mag respectively) and yield proper motions to $\sim$25 microarcsec/year (i.e., $\sim$90 km/s) for all stars brighter than F146 $\approx 23.6$ AB mag (i.e., reaching the red clump stars in the Andromeda halo). This survey will yield (through averaging) high-fidelity proper motions for all satellites and compact substructures in the Andromeda halo and will enable statistical searches for clusters in chemo-dynamical space. Adding a third epoch during the extended mission will improve these proper motions by $\sim t^{-1.5}$, to $\approx 11$ km/s, but this requires obtaining the first epoch in Year 1 of Roman operations. In combination with ongoing and imminent spectroscopic campaigns with ground-based telescopes, this Roman survey has the potential to yield full 3-d space motions of $>$100,000 stars in the Andromeda halo, including (by combining individual measurements) robust space motions of its entire globular cluster and most of its dwarf galaxy satellite populations. It will also identify high-velocity stars in Andromeda, providing unique information on the processes that create this population. These data offer a unique opportunity to study the immigration history, halo formation, and underlying dark matter scaffolding of a galaxy other than our own.
Submitted 21 June, 2023;
originally announced June 2023.
-
NANCY: Next-generation All-sky Near-infrared Community surveY
Authors:
Jiwon Jesse Han,
Arjun Dey,
Adrian M. Price-Whelan,
Joan Najita,
Edward F. Schlafly,
Andrew Saydjari,
Risa H. Wechsler,
Ana Bonaca,
David J Schlegel,
Charlie Conroy,
Anand Raichoor,
Alex Drlica-Wagner,
Juna A. Kollmeier,
Sergey E. Koposov,
Gurtina Besla,
Hans-Walter Rix,
Alyssa Goodman,
Douglas Finkbeiner,
Abhijeet Anand,
Matthew Ashby,
Benedict Bahr-Kalus,
Rachel Beaton,
Jayashree Behera,
Eric F. Bell,
Eric C Bellm
, et al. (184 additional authors not shown)
Abstract:
The Nancy Grace Roman Space Telescope is capable of delivering an unprecedented all-sky, high-spatial resolution, multi-epoch infrared map to the astronomical community. This opportunity arises in the midst of numerous ground- and space-based surveys that will provide extensive spectroscopy and imaging together covering the entire sky (such as Rubin/LSST, Euclid, UNIONS, SPHEREx, DESI, SDSS-V, GALAH, 4MOST, WEAVE, MOONS, PFS, UVEX, NEO Surveyor, etc.). Roman can uniquely provide uniform high-spatial-resolution (~0.1 arcsec) imaging over the entire sky, vastly expanding the science reach and precision of all of these near-term and future surveys. This imaging will not only enhance other surveys, but also facilitate completely new science. By imaging the full sky over two epochs, Roman can measure the proper motions for stars across the entire Milky Way, probing 100 times fainter than Gaia out to the very edge of the Galaxy. Here, we propose NANCY: a completely public, all-sky survey that will create a high-value legacy dataset benefiting innumerable ongoing and forthcoming studies of the universe. NANCY is a pure expression of Roman's potential: it images the entire sky, at high spatial resolution, in a broad infrared bandpass that collects as many photons as possible. The majority of all ongoing astronomical surveys would benefit from incorporating observations of NANCY into their analyses, whether these surveys focus on nearby stars, the Milky Way, near-field cosmology, or the broader universe.
Submitted 20 June, 2023;
originally announced June 2023.
-
Formation of Galactic Disks I: Why Did the Milky Way's Disk Form Unusually Early?
Authors:
Vadim A. Semenov,
Charlie Conroy,
Vedant Chandra,
Lars Hernquist,
Dylan Nelson
Abstract:
Recent results from spectroscopic and astrometric surveys of nearby stars suggest that the stellar disk of our Milky Way (MW) was formed quite early, within the first few billion years of its evolution. Chemokinematic signatures of disk formation in cosmological zoom-in simulations appear to be in tension with these data, implying that MW-like disk formation is delayed in simulations. We investigate the formation of galactic disks using a representative sample of MW-like galaxies from the cosmological-volume simulation TNG50. We find that on average MW-mass disks indeed form later than the local data suggest. However, their formation time and metallicity exhibit a substantial scatter, such that $\sim$10% of MW-mass galaxies form disks early, similar to the MW. Thus, although the MW is unusual, it is consistent with the overall population of MW-mass disk galaxies. The direct MW analogs assemble most of their mass early, $\gtrsim 10$ Gyr ago, and are not affected by destructive mergers after that. In addition, these galaxies form their disks during the early enrichment stage when the interstellar medium metallicity increases rapidly, with only $\sim$25% of early-forming disks being as metal-poor as the MW was at the onset of disk formation, [Fe/H] $\approx -1.0$. In contrast, most MW-mass galaxies either form disks from already enriched material or experience late destructive mergers that reset the signatures of galactic disk formation to later times and higher metallicities. Finally, we also show that earlier disk formation leads to more dominant rotationally supported stellar disks at redshift zero.
Submitted 22 January, 2024; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Heavy quark radiation in an anisotropic hot QCD medium
Authors:
Jai Prakash,
Vinod Chandra,
Santosh K. Das
Abstract:
The impact of momentum anisotropy on heavy quark (HQ) dynamics has been investigated in a hot QCD medium while considering both collisional and radiative processes within the ambit of the Fokker-Planck approach. The relative orientation of the HQ motion (momentum vector) with respect to the direction of anisotropy is responsible for the character of transport coefficients. Therefore, the drag and diffusion coefficients of the HQs are decomposed, respectively, into two and four components by considering a general tensor basis. Each component of the drag and diffusion coefficients of the HQs has been analyzed in detail. It is observed that the anisotropy has a significant impact on the transport coefficients of the HQs for both the collisional and the radiative processes. The nuclear suppression factor, $R_{AA}$, has been computed considering the anisotropic medium. It is observed that the momentum anisotropy affects the $R_{AA}$ of the HQs significantly in both elastic and inelastic cases.
Submitted 20 November, 2023; v1 submitted 13 June, 2023;
originally announced June 2023.
-
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts
Authors:
Ganesh Jawahar,
Haichuan Yang,
Yunyang Xiong,
Zechun Liu,
Dilin Wang,
Fei Sun,
Meng Li,
Aasish Pappu,
Barlas Oguz,
Muhammad Abdul-Mageed,
Laks V. S. Lakshmanan,
Raghuraman Krishnamoorthi,
Vikas Chandra
Abstract:
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks. Despite their ability to generate diverse subnetworks without retraining, the quality of these subnetworks is not guaranteed due to weight sharing. In NLP tasks like machine translation and pre-trained language modeling, there is a significant performance gap between supernet and training from scratch for the same model architecture, necessitating retraining post optimal architecture identification.
This study introduces a solution called mixture-of-supernets, a generalized supernet formulation leveraging mixture-of-experts (MoE) to enhance supernet model expressiveness with minimal training overhead. Unlike conventional supernets, this method employs an architecture-based routing mechanism, enabling indirect sharing of model weights among subnetworks. This customization of weights for specific architectures, learned through gradient descent, minimizes retraining time, significantly enhancing training efficiency in NLP. The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models, exhibiting a superior latency-BLEU tradeoff compared to HAT, the SoTA NAS framework for machine translation. Furthermore, it excels in NAS for building memory-efficient task-agnostic BERT models, surpassing NAS-BERT and AutoDistil across various model sizes. The code can be found at: https://github.com/UBC-NLP/MoS.
Submitted 7 August, 2024; v1 submitted 7 June, 2023;
originally announced June 2023.
-
The fastest stars in the Galaxy
Authors:
Kareem El-Badry,
Ken J. Shen,
Vedant Chandra,
Evan Bauer,
Jim Fuller,
Jay Strader,
Laura Chomiuk,
Rohan Naidu,
Ilaria Caiazzo,
Antonio C. Rodriguez,
Pranav Nagarajan,
Natsuko Yamaguchi,
Zachary P. Vanderbosch,
Benjamin R. Roulston,
Jan van Roestel,
Boris Gänsicke,
Jiwon Jesse Han,
Kevin B. Burdge,
Alexei V. Filippenko,
Thomas G. Brink,
WeiKang Zheng
Abstract:
We report a spectroscopic search for hypervelocity white dwarfs (WDs) that are runaways from Type Ia supernovae (SNe Ia) and related thermonuclear explosions. Candidates are selected from Gaia data with high tangential velocities and blue colors. We find six new runaways, including four stars with radial velocities (RVs) $>1000\,\rm km\,s^{-1}$ and total space velocities $\gtrsim 1300\,\rm km\,s^{-1}$. These are most likely the surviving donors from double-degenerate binaries in which the other WD exploded. The other two objects have lower minimum velocities, $\gtrsim 600\,\rm km\,s^{-1}$, and may have formed through a different mechanism, such as pure deflagration of a WD in a Type Iax supernova. The four fastest stars are hotter and smaller than the previously known "D$^6$ stars," with effective temperatures ranging from $\sim$20,000 to $\sim$130,000 K and radii of $\sim 0.02-0.10\,R_{\odot}$. Three of these have carbon-dominated atmospheres, and one has a helium-dominated atmosphere. Two stars have RVs of $-1694$ and $-2285\rm \,km\,s^{-1}$ -- the fastest systemic stellar RVs ever measured. Their inferred birth velocities, $\sim 2200-2500\,\rm km\,s^{-1}$, imply that both WDs in the progenitor binary had masses $>1.0\,M_{\odot}$. The high observed velocities suggest that a dominant fraction of the observed hypervelocity WD population comes from double-degenerate binaries whose total mass significantly exceeds the Chandrasekhar limit. However, the two nearest and faintest D$^6$ stars have the lowest velocities and masses, suggesting that observational selection effects favor rarer, higher-mass stars. A significant population of fainter low-mass runaways may still await discovery. We infer a birth rate of D$^6$ stars that is consistent with the SN Ia rate. The birth rate is poorly constrained, however, because the luminosities and lifetimes of $\rm D^6$ stars are uncertain.
Submitted 25 July, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
-
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Authors:
Zechun Liu,
Barlas Oguz,
Changsheng Zhao,
Ernie Chang,
Pierre Stock,
Yashar Mehdad,
Yangyang Shi,
Raghuraman Krishnamoorthi,
Vikas Chandra
Abstract:
Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8 bits. We find that these methods break down at lower bit precision, and investigate quantization aware training for LLMs (LLM-QAT) to push quantization levels even further. We propose a data-free distillation method that leverages generations produced by the pre-trained model, which better preserves the original output distribution and allows quantizing any generative model independent of its training data, similar to post-training quantization methods. In addition to quantizing weights and activations, we also quantize the KV cache, which is critical for increasing throughput and supporting long sequence dependencies at current model sizes. We experiment with LLaMA models of sizes 7B, 13B, and 30B, at quantization levels down to 4 bits. We observe large improvements over training-free methods, especially in the low-bit settings.
Submitted 29 May, 2023;
originally announced May 2023.
-
Cataclysmic Variables from Sloan Digital Sky Survey V -- the search for period bouncers continues
Authors:
K. Inight,
Boris T. Gänsicke,
A. Schwope,
S. F. Anderson,
C. Badenes,
E. Breedt,
V. Chandra,
B. D. R. Davies,
N. P. Gentile Fusillo,
M. J. Green,
J. J. Hermes,
I. Achaica Huamani,
H. Hwang,
K. Knauff,
J. Kurpas,
K. S. Long,
V. Malanushenko,
S. Morrison,
I. J. Quiroz C.,
G. N. Aichele Ramos,
A. Roman-Lopes,
M. R. Schreiber,
A. Standke,
L. Stütz,
J. R. Thorstensen
, et al. (3 additional authors not shown)
Abstract:
SDSS-V is carrying out a dedicated survey for white dwarfs, single and in binaries, and we report the analysis of the spectroscopy of cataclysmic variables (CVs) and CV candidates obtained during the final plug plate observations of SDSS. We identify eight new CVs, spectroscopically confirm 53 and refute eleven published CV candidates, and we report 21 new or improved orbital periods. Combined with previously published data, the orbital period distribution of the SDSS-V CVs does not clearly exhibit a period gap. This is consistent with previous findings that spectroscopically identified CVs have a larger proportion of short-period systems compared to samples identified from photometric variability. Remarkably, despite a systematic search, we find very few period bouncers. We estimate the space density of period bouncers to be $\simeq0.2\times10^{-6}\,\mathrm{pc}^{-3}$, i.e. they represent only a few per cent of the total CV population. This suggests that during their final phase of evolution, CVs either destroy the donor, e.g. via a merger, or become detached and cease mass transfer.
Submitted 11 September, 2023; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Robust Data-driven Metallicities for 175 Million Stars from Gaia XP Spectra
Authors:
Rene Andrae,
Hans-Walter Rix,
Vedant Chandra
Abstract:
We derive and publish data-driven estimates of stellar metallicities [M/H] for 175 million stars with low-resolution XP spectra published in Gaia DR3. The [M/H] values, along with Teff and logg, are derived using the XGBoost algorithm, trained on stellar parameters from APOGEE, augmented by a set of very metal-poor stars. XGBoost draws on a number of data features: the full set of XP spectral coefficients, narrowband fluxes derived from XP spectra, and broadband magnitudes. In particular, we include CatWISE magnitudes, as they reduce the degeneracy of Teff and dust reddening. We also include the parallax as a data feature, which helps constrain logg and [M/H]. The resulting mean stellar parameter precision is 0.1 dex in [M/H], 50 K in Teff, and 0.08 dex in logg. This all-sky [M/H] sample is substantially larger than published samples of comparable fidelity across -3<[M/H]<+0.5. Additionally, we provide a catalog of over 17 million bright (G<16) red giants whose [M/H] are vetted to be precise and pure. We present all-sky maps of the Milky Way in different [M/H] regimes that illustrate the purity of the dataset, and demonstrate the power of this unprecedented sample to reveal the Milky Way's structure from its heart to its disk.
Submitted 22 May, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
The Eighteenth Data Release of the Sloan Digital Sky Surveys: Targeting and First Spectra from SDSS-V
Authors:
Andrés Almeida,
Scott F. Anderson,
Maria Argudo-Fernández,
Carles Badenes,
Kat Barger,
Jorge K. Barrera-Ballesteros,
Chad F. Bender,
Erika Benitez,
Felipe Besser,
Dmitry Bizyaev,
Michael R. Blanton,
John Bochanski,
Jo Bovy,
William Nielsen Brandt,
Joel R. Brownstein,
Johannes Buchner,
Esra Bulbul,
Joseph N. Burchett,
Mariana Cano Díaz,
Joleen K. Carlberg,
Andrew R. Casey,
Vedant Chandra,
Brian Cherinka,
Cristina Chiappini,
Abigail A. Coker
, et al. (129 additional authors not shown)
Abstract:
The eighteenth data release of the Sloan Digital Sky Surveys (SDSS) is the first one for SDSS-V, the fifth generation of the survey. SDSS-V comprises three primary scientific programs, or "Mappers": Milky Way Mapper (MWM), Black Hole Mapper (BHM), and Local Volume Mapper (LVM). This data release contains extensive targeting information for the two multi-object spectroscopy programs (MWM and BHM), including input catalogs and selection functions for their numerous scientific objectives. We describe the production of the targeting databases and their calibration- and scientifically-focused components. DR18 also includes ~25,000 new SDSS spectra and supplemental information for X-ray sources identified by eROSITA in its eFEDS field. We present updates to some of the SDSS software pipelines and preview changes anticipated for DR19. We also describe three value-added catalogs (VACs) based on SDSS-IV data that have been published since DR17, and one VAC based on the SDSS-V data in the eFEDS field.
Submitted 6 July, 2023; v1 submitted 18 January, 2023;
originally announced January 2023.