Search | arXiv e-print repository

Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Authors: Simran Kaur, Simon Park, Anirudh Goyal, Sanjeev Arora

Abstract: We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data. The Instruct-SkillMix pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets, or by directly prompting the model; (2) Data generation: uses the powerful LLM to generat… ▽ More We introduce Instruct-SkillMix, an automated approach for creating diverse, high quality SFT data. The Instruct-SkillMix pipeline involves two stages, each leveraging an existing powerful LLM: (1) Skill extraction: uses the LLM to extract core "skills" for instruction-following, either from existing datasets, or by directly prompting the model; (2) Data generation: uses the powerful LLM to generate (instruction, response) data that exhibit a randomly chosen pair of these skills. Here, the use of random skill combinations promotes diversity and difficulty. Vanilla SFT (i.e., no PPO, DPO, or RL methods) on data generated from Instruct-SkillMix leads to strong gains on instruction following benchmarks such as AlpacaEval 2.0, MT-Bench, and WildBench. With just $4$K examples, LLaMA-3-8B-Base achieves 42.76% length-controlled win rate on AlpacaEval 2.0. To our knowledge, this achieves state-of-the-art performance among all models that have only undergone SFT (no RL methods) and competes with proprietary models such as Claude 3 Opus and LLaMA-3.1-405B-Instruct. Ablation studies also suggest plausible reasons for why creating open instruction-tuning datasets via naive crowd-sourcing has proved difficult. Introducing low quality answers ("shirkers") in $20\%$ of Instruct-SkillMix examples causes performance to plummet, sometimes catastrophically. The Instruct-SkillMix pipeline is flexible and is adaptable to other settings. △ Less

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.13347 [pdf, other]

ORCHID: Streaming Threat Detection over Versioned Provenance Graphs

Authors: Akul Goyal, Jason Liu, Adam Bates, Gang Wang

Abstract: While Endpoint Detection and Response (EDR) are able to efficiently monitor threats by comparing static rules to the event stream, their inability to incorporate past system context leads to high rates of false alarms. Recent work has demonstrated Provenance-based Intrusion Detection Systems (Prov-IDS) that can examine the causal relationships between abnormal behaviors to improve threat classific… ▽ More While Endpoint Detection and Response (EDR) are able to efficiently monitor threats by comparing static rules to the event stream, their inability to incorporate past system context leads to high rates of false alarms. Recent work has demonstrated Provenance-based Intrusion Detection Systems (Prov-IDS) that can examine the causal relationships between abnormal behaviors to improve threat classification. However, employing these Prov-IDS in practical settings remains difficult -- state-of-the-art neural network based systems are only fast in a fully offline deployment model that increases attacker dwell time, while simultaneously using simplified and less accurate provenance graphs to reduce memory consumption. Thus, today's Prov-IDS cannot operate effectively in the real-time streaming setting required for commercial EDR viability. This work presents the design and implementation of ORCHID, a novel Prov-IDS that performs fine-grained detection of process-level threats over a real time event stream. ORCHID takes advantage of the unique immutable properties of a versioned provenance graphs to iteratively embed the entire graph in a sequential RNN model while only consuming a fraction of the computation and memory costs. We evaluate ORCHID on four public datasets, including DARPA TC, to show that ORCHID can provide competitive classification performance while eliminating detection lag and reducing memory consumption by two orders of magnitude. △ Less

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.10629 [pdf, other]

The GAPS Programme at TNG. LIX. A characterisation study of the $\sim$300 Myr old multi-planetary system orbiting the star BD+40 2790 (TOI-2076)

Authors: M. Damasso, D. Locci, S. Benatti, A. Maggio, M. Baratella, S. Desidera, K. Biazzo, E. Palle, S. Wang, D. Nardiello, L. Borsato, A. S. Bonomo, S. Messina, G. Nowak, A. Goyal, V. J. S. Bejar, A. Bignamini, L. Cabona, I. Carleo, R. Claudi, R. Cosentino, S. Filomeno, C. Knapic, N. Lodieu, V. Lorenzi , et al. (13 additional authors not shown)

Abstract: We collected more than 300 high-resolution spectra of the 300 Myr old star BD+40 2790 (TOI-2076) over ~3 years. This star hosts three transiting planets discovered by TESS, with orbital periods ~10, 21, and 35 days. BD+40 2790 shows an activity-induced scatter larger than 30 m/s in the radial velocities. We employed different methods to measure the stellar radial velocities and several models to f… ▽ More We collected more than 300 high-resolution spectra of the 300 Myr old star BD+40 2790 (TOI-2076) over ~3 years. This star hosts three transiting planets discovered by TESS, with orbital periods ~10, 21, and 35 days. BD+40 2790 shows an activity-induced scatter larger than 30 m/s in the radial velocities. We employed different methods to measure the stellar radial velocities and several models to filter out the dominant stellar activity signal, in order to bring to light the planet-induced signals which are expected to have semi-amplitudes one order of magnitude lower. We evaluated the mass loss rate of the planetary atmospheres using photoionization hydrodynamic modeling. The dynamical analysis confirms that the three sub-Neptune-sized companions (our radius measurements are $R_b$=2.54$\pm$0.04, $R_c$=3.35$\pm$0.05, and $R_d$=3.29$\pm$0.06 $R_{\rm Earth}$) have masses in the planetary regime. We derive 3$σ$ upper limits below or close to the mass of Neptune for all the planets: 11--12, 12--13.5, and 14--19 $M_{\rm Earth}$ for planet $b$, $c$, and $d$ respectively. In the case of planet $d$, we found promising clues that the mass could be between ~7 and 8 $M_{\rm Earth}$, with a significance level between 2.3--2.5$σ$ (at best). This result must be further investigated using other analysis methods or using high-precision near-IR spectrographs to collect new radial velocities, which could be less affected by stellar activity. Atmospheric photo-evaporation simulations predict that BD+40~2790 b is currently losing its H-He gaseous envelope, which will be completely lost at an age within 0.5--3 Gyr if its current mass is lower than 12 $M_{\rm Earth}$. BD+40 2790 c could have a lower bulk density than $b$, and it could retain its atmosphere up to an age of 5 Gyr. For the outermost planet $d$, we predict almost negligible evolution of its mass and radius induced by photo-evaporation. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: Accepted for publication on A&A. Abstract abridged

arXiv:2408.09310 [pdf, other]

Narrowing the Focus: Learned Optimizers for Pretrained Models

Authors: Gus Kristiansen, Mark Sandler, Andrey Zhmoginov, Nolan Miller, Anirudh Goyal, Jihwan Lee, Max Vladymyrov

Abstract: In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of the training process. Learned optimizers have shown some initial promise, but are generally unsuccessful as a general optimization mechanism applicable to every… ▽ More In modern deep learning, the models are learned by applying gradient updates using an optimizer, which transforms the updates based on various statistics. Optimizers are often hand-designed and tuning their hyperparameters is a big part of the training process. Learned optimizers have shown some initial promise, but are generally unsuccessful as a general optimization mechanism applicable to every problem. In this work we explore a different direction: instead of learning general optimizers, we instead specialize them to a specific training environment. We propose a novel optimizer technique that learns a layer-specific linear combination of update directions provided by a set of base optimizers, effectively adapting its strategy to the specific model and dataset. When evaluated on image classification tasks, this specialized optimizer significantly outperforms both traditional off-the-shelf methods such as Adam, as well as existing general learned optimizers. Moreover, it demonstrates robust generalization with respect to model initialization, evaluating on unseen datasets, and training durations beyond its meta-training horizon. △ Less

Submitted 21 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

arXiv:2408.09162 [pdf, other]

Zero-Shot Object-Centric Representation Learning

Authors: Aniket Didolkar, Andrii Zadaianchuk, Anirudh Goyal, Mike Mozer, Yoshua Bengio, Georg Martius, Maximilian Seitzer

Abstract: The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models traine… ▽ More The goal of object-centric representation learning is to decompose visual scenes into a structured representation that isolates the entities. Recent successes have shown that object-centric representation learning can be scaled to real-world scenes by utilizing pre-trained self-supervised features. However, so far, object-centric methods have mostly been applied in-distribution, with models trained and evaluated on the same dataset. This is in contrast to the wider trend in machine learning towards general-purpose models directly applicable to unseen data and tasks. Thus, in this work, we study current object-centric methods through the lens of zero-shot generalization by introducing a benchmark comprising eight different synthetic and real-world datasets. We analyze the factors influencing zero-shot performance and find that training on diverse real-world images improves transferability to unseen scenarios. Furthermore, inspired by the success of task-specific fine-tuning in foundation models, we introduce a novel fine-tuning strategy to adapt pre-trained vision encoders for the task of object discovery. We find that the proposed approach results in state-of-the-art performance for unsupervised object discovery, exhibiting strong zero-shot transfer to unseen datasets. △ Less

Submitted 17 August, 2024; originally announced August 2024.

arXiv:2407.21783 [pdf, other]

The Llama 3 Herd of Models

Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang , et al. (510 additional authors not shown)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical… ▽ More Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development. △ Less

Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

arXiv:2407.21009 [pdf, other]

AI-Assisted Generation of Difficult Math Questions

Authors: Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal

Abstract: Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLM… ▽ More Current LLM training positions mathematical reasoning as a core capability. With publicly available sources fully tapped, there is unmet demand for diverse and challenging math questions. Relying solely on human experts is both time-consuming and costly, while LLM-generated questions often lack the requisite diversity and difficulty. We present a design framework that combines the strengths of LLMs with a human-in-the-loop approach to generate a diverse array of challenging math questions. We leverage LLM metacognition skills [Didolkar et al., 2024] of a strong LLM to extract core "skills" from existing math datasets. These skills serve as the basis for generating novel and difficult questions by prompting the LLM with random pairs of core skills. The use of two different skills within each question makes finding such questions an "out of distribution" task for both LLMs and humans. Our pipeline employs LLMs to iteratively generate and refine questions and solutions through multiturn prompting. Human annotators then verify and further refine the questions, with their efficiency enhanced via further LLM interactions. Applying this pipeline on skills extracted from the MATH dataset [Hendrycks et al., 2021] resulted in MATH$^2$ - a dataset of higher-quality math questions, as evidenced by: (a) Lower performance of all models on MATH$^2$ than on MATH (b) Higher performance on MATH when using MATH$^2$ questions as in-context examples. Although focused on mathematics, our methodology seems applicable to other domains requiring structured reasoning, and potentially as a component of scalable oversight. Also of interest is a striking relationship observed between models' performance on the new dataset: the success rate on MATH$^2$ is the square on MATH, suggesting that successfully solving the question in MATH$^2$ requires a nontrivial combination of two distinct math skills. △ Less

Submitted 2 September, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2407.00181 [pdf, other]

$W-$mass and Muon $g-2$ in Inert 2HDM Extended by Singlet Complex Scalar

Authors: Hrishabh Bharadwaj, Mamta Dahiya, Sukanta Dutta, Ashok Goyal

Abstract: The deviations of the recent measurements of the muon magnetic moment and the $W-$boson mass from their SM predictions hint to new physics beyond the SM. In this article, we address the observed discrepancies in the $W$-boson mass and muon anomalous magnetic moment in the Inert Two Higgs Doublet Model (I2HDM) extended by a complex scalar field singlet under the SM gauge group. The model is constra… ▽ More The deviations of the recent measurements of the muon magnetic moment and the $W-$boson mass from their SM predictions hint to new physics beyond the SM. In this article, we address the observed discrepancies in the $W$-boson mass and muon anomalous magnetic moment in the Inert Two Higgs Doublet Model (I2HDM) extended by a complex scalar field singlet under the SM gauge group. The model is constrained from the existing LEP data and the measurements of partial decay widths to gauge bosons at LHC. It is shown that a large subset of this constrained parameter space of the model can simultaneously accommodate the $W$-boson mass and also explain the muon $g-2$ anomaly. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: 15 pages, 5 figures

arXiv:2406.18158 [pdf, other]

3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

Authors: Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

Abstract: Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage R… ▽ More Recent works have shown that visual pretraining on egocentric datasets using masked autoencoders (MAE) can improve generalization for downstream robotics tasks. However, these approaches pretrain only on 2D images, while many robotics applications require 3D scene understanding. In this work, we propose 3D-MVP, a novel approach for 3D multi-view pretraining using masked autoencoders. We leverage Robotic View Transformer (RVT), which uses a multi-view transformer to understand the 3D scene and predict gripper pose actions. We split RVT's multi-view transformer into visual encoder and action decoder, and pretrain its visual encoder using masked autoencoding on large-scale 3D datasets such as Objaverse. We evaluate 3D-MVP on a suite of virtual robot manipulation tasks and demonstrate improved performance over baselines. We also show promising results on a real robot platform with minimal finetuning. Our results suggest that 3D-aware pretraining is a promising approach to improve sample efficiency and generalization of vision-based robotic manipulation policies. We will release code and pretrained models for 3D-MVP to facilitate future research. Project site: https://jasonqsy.github.io/3DMVP △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17232 [pdf, other]

Beyond Demographics: Aligning Role-playing LLM-based Agents Using Human Belief Networks

Authors: Yun-Shiuan Chuang, Zach Studdiford, Krirk Nirunwiroj, Agam Goyal, Vincent V. Frigo, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estima… ▽ More Creating human-like large language model (LLM) agents is crucial for faithful social simulation. Having LLMs role-play based on demographic information sometimes improves human likeness but often does not. This study assessed whether LLM alignment with human behavior can be improved by integrating information from empirically-derived human belief networks. Using data from a human survey, we estimated a belief network encompassing 18 topics loading on two non-overlapping latent factors. We then seeded LLM-based agents with an opinion on one topic, and assessed the alignment of its expressed opinions on remaining test topics with corresponding human data. Role-playing based on demographic information alone did not align LLM and human opinions, but seeding the agent with a single belief greatly improved alignment for topics related in the belief network, and not for topics outside the network. These results suggest a novel path for human-LLM belief alignment in work seeking to simulate and understand patterns of belief distributions in society. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.13046 [pdf, other]

Bayesian-LoRA: LoRA based Parameter Efficient Fine-Tuning using Optimal Quantization levels and Rank Values trough Differentiable Bayesian Gates

Authors: Cristian Meo, Ksenia Sycheva, Anirudh Goyal, Justin Dauwels

Abstract: It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed… ▽ More It is a common practice in natural language processing to pre-train a single model on a general domain and then fine-tune it for downstream tasks. However, when it comes to Large Language Models, fine-tuning the entire model can be computationally expensive, resulting in very intensive energy consumption. As a result, several Parameter Efficient Fine-Tuning (PEFT) approaches were recently proposed. One of the most popular approaches is low-rank adaptation (LoRA), where the key insight is decomposing the update weights of the pre-trained model into two low-rank matrices. However, the proposed approaches either use the same rank value across all different weight matrices, which has been shown to be a sub-optimal choice, or do not use any quantization technique, one of the most important factors when it comes to a model's energy consumption. In this work, we propose Bayesian-LoRA which approaches low-rank adaptation and quantization from a Bayesian perspective by employing a prior distribution on both quantization levels and rank values. As a result, B-LoRA is able to fine-tune a pre-trained model on a specific downstream task, finding the optimal rank values and quantization levels for every low-rank matrix. We validate the proposed model by fine-tuning a pre-trained DeBERTaV3 on the GLUE benchmark. Moreover, we compare it to relevant baselines and present both qualitative and quantitative results, showing how the proposed approach is able to learn optimal-rank quantized matrices. B-LoRA performs on par with or better than the baselines while reducing the total number of bit operations by roughly 70% compared to the baseline methods. △ Less

Submitted 9 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.08545 [pdf, other]

RVT-2: Learning Precise Manipulation from Few Demonstrations

Authors: Ankit Goyal, Valts Blukis, Jie Xu, Yijie Guo, Yu-Wei Chao, Dieter Fox

Abstract: In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high… ▽ More In this work, we study how to build a robotic system that can solve multiple 3D manipulation tasks given language instructions. To be useful in industrial and household domains, such a system should be capable of learning new tasks with few demonstrations and solving them precisely. Prior works, like PerAct and RVT, have studied this problem, however, they often struggle with tasks requiring high precision. We study how to make them more effective, precise, and fast. Using a combination of architectural and system-level improvements, we propose RVT-2, a multitask 3D manipulation model that is 6X faster in training and 2X faster in inference than its predecessor RVT. RVT-2 achieves a new state-of-the-art on RLBench, improving the success rate from 65% to 82%. RVT-2 is also effective in the real world, where it can learn tasks requiring high precision, like picking up and inserting plugs, with just 10 demonstrations. Visual results, code, and trained model are provided at: https://robotic-view-transformer-2.github.io/. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted to RSS 2024

arXiv:2405.15485 [pdf, other]

Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

Authors: Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf

Abstract: We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from informat… ▽ More We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from information during in-context learning or instruction-tuning through exploiting the complex knowledge structure within mathematics. Motivated by the Neural Tangent Kernel (NTK), we propose \textit{NTKEval} to assess changes in LLM's probability distribution via training on different kinds of math data. Our systematic analysis finds evidence of domain understanding during in-context learning. By contrast, certain instruction-tuning leads to similar performance changes irrespective of training on different data, suggesting a lack of domain understanding across different skills. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14091 [pdf, other]

Peas-in-a-Pod Across the Radius Valley: Rocky Systems are Less Uniform in Mass but More Uniform in Size and Spacing

Authors: Armaan V. Goyal, Songhu Wang

Abstract: The ubiquity of "peas-in-a-pod" architectural patterns and the existence of the radius valley each present a striking population-level trend for planets with $R_{p} \leq 4 R_{\oplus}$ that serves to place powerful constraints on the formation and evolution of these subgiant worlds. As it has yet to be determined whether the strength of this peas-in-a-pod uniformity differs on either side of the ra… ▽ More The ubiquity of "peas-in-a-pod" architectural patterns and the existence of the radius valley each present a striking population-level trend for planets with $R_{p} \leq 4 R_{\oplus}$ that serves to place powerful constraints on the formation and evolution of these subgiant worlds. As it has yet to be determined whether the strength of this peas-in-a-pod uniformity differs on either side of the radius valley, we separately assess the architectures of systems containing only small ($R_{p} \leq 1.6 R_{\oplus}$), rocky planets from those harboring only intermediate-size ($1.6 R_{\oplus} < R_{p} \leq 4 R_{\oplus}$), volatile-rich worlds to perform a novel statistical comparison of intra-system planetary uniformity across compositionally distinct regimes. We find that, compared to their volatile-rich counterparts, rocky systems are less uniform in mass ($2.6σ$), but more uniform in size ($4.0σ$) and spacing ($3.0σ$). We provide further statistical validation for these results, demonstrating that they are not substantially influenced by the presence of mean motion resonances, low-mass host stars, alternative bulk compositional assumptions, sample size effects, or detection biases. We also obtain tentative evidence ($>2 σ$ significance) that the enhanced size uniformity of rocky systems is dominated by the presence of super-Earths ($1 R_{\oplus} \leq R_{p} \leq 1.6 R_{\oplus}$), while their enhanced mass diversity is driven by the presence of sub-Earth ($R_{p} < 1 R_{\oplus}$) worlds. △ Less

Submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted to ApJ Letters (May 2024). 17 pages (including 3 for Appendix), 4 figures, 3 tables

arXiv:2405.12205 [pdf, other]

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Authors: Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy Lillicrap, Danilo Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora

Abstract: Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interac… ▽ More Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. The paper gives evidence that they also have metacognitive knowledge, including ability to name skills and procedures to apply given a task. We explore this primarily in context of math reasoning, developing a prompt-guided interaction procedure to get a powerful LLM to assign sensible skill labels to math questions, followed by having it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels look interpretable to humans. To validate that these skill labels are meaningful and relevant to the LLM's reasoning processes we perform the following experiments. (a) We ask GPT-4 to assign skill labels to training questions in math datasets GSM8K and MATH. (b) When using an LLM to solve the test questions, we present it with the full list of skill labels and ask it to identify the skill needed. Then it is presented with randomly selected exemplar solved questions associated with that skill label. This improves accuracy on GSM8k and MATH for several strong LLMs, including code-assisted models. The methodology presented is domain-agnostic, even though this article applies it to math problems. △ Less

Submitted 20 May, 2024; originally announced May 2024.

Comments: Preprint. Under review

arXiv:2405.04324 [pdf, other]

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal , et al. (21 additional authors not shown)

Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili… ▽ More Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabilities, including code generation, fixing bugs, explaining and documenting code, maintaining repositories, and more. In this work, we introduce the Granite series of decoder-only code models for code generative tasks, trained with code written in 116 programming languages. The Granite Code models family consists of models ranging in size from 3 to 34 billion parameters, suitable for applications ranging from complex application modernization tasks to on-device memory-constrained use cases. Evaluation on a comprehensive set of tasks demonstrates that Granite Code models consistently reaches state-of-the-art performance among available open-source code LLMs. The Granite Code model family was optimized for enterprise software development workflows and performs well across a range of coding tasks (e.g. code generation, fixing and explanation), making it a versatile all around code model. We release all our Granite Code models under an Apache 2.0 license for both research and commercial use. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

arXiv:2405.00451 [pdf, other]

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

Authors: Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

Abstract: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level… ▽ More We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. To enhance consistency in intermediate steps, we combine outcome validation and stepwise self-evaluation, continually updating the quality assessment of newly generated data. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data. Theoretical analysis reveals the importance of using on-policy sampled data for successful self-improving. Extensive evaluations on various arithmetic and commonsense reasoning tasks demonstrate remarkable performance improvements over existing models. For instance, our approach outperforms the Mistral-7B Supervised Fine-Tuning (SFT) baseline on GSM8K, MATH, and ARC-C, with substantial increases in accuracy to $81.8\%$ (+$5.9\%$), $34.7\%$ (+$5.8\%$), and $76.4\%$ (+$15.8\%$), respectively. Additionally, our research delves into the training and inference compute tradeoff, providing insights into how our method effectively maximizes performance gains. Our code is publicly available at https://github.com/YuxiXie/MCTS-DPO. △ Less

Submitted 17 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, 4 tables (24 pages, 9 figures, 9 tables including references and appendices)

arXiv:2404.18963 [pdf, other]

RE-GrievanceAssist: Enhancing Customer Experience through ML-Powered Complaint Management

Authors: Venkatesh C, Harshit Oberoi, Anurag Kumar Pandey, Anil Goyal, Nikhil Sikka

Abstract: In recent years, digital platform companies have faced increasing challenges in managing customer complaints, driven by widespread consumer adoption. This paper introduces an end-to-end pipeline, named RE-GrievanceAssist, designed specifically for real estate customer complaint management. The pipeline consists of three key components: i) response/no-response ML model using TF-IDF vectorization an… ▽ More In recent years, digital platform companies have faced increasing challenges in managing customer complaints, driven by widespread consumer adoption. This paper introduces an end-to-end pipeline, named RE-GrievanceAssist, designed specifically for real estate customer complaint management. The pipeline consists of three key components: i) response/no-response ML model using TF-IDF vectorization and XGBoost classifier ; ii) user type classifier using fasttext classifier; iii) issue/sub-issue classifier using TF-IDF vectorization and XGBoost classifier. Finally, it has been deployed as a batch job in Databricks, resulting in a remarkable 40% reduction in overall manual effort with monthly cost reduction of Rs 1,50,000 since August 2023. △ Less

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.17177 [pdf, other]

RE-RFME: Real-Estate RFME Model for customer segmentation

Authors: Anurag Kumar Pandey, Anil Goyal, Nikhil Sikka

Abstract: Marketing is one of the high-cost activities for any online platform. With the increase in the number of customers, it is crucial to understand customers based on their dynamic behaviors to design effective marketing strategies. Customer segmentation is a widely used approach to group customers into different categories and design the marketing strategy targeting each group individually. Therefore… ▽ More Marketing is one of the high-cost activities for any online platform. With the increase in the number of customers, it is crucial to understand customers based on their dynamic behaviors to design effective marketing strategies. Customer segmentation is a widely used approach to group customers into different categories and design the marketing strategy targeting each group individually. Therefore, in this paper, we propose an end-to-end pipeline RE-RFME for segmenting customers into 4 groups: high value, promising, need attention, and need activation. Concretely, we propose a novel RFME (Recency, Frequency, Monetary and Engagement) model to track behavioral features of customers and segment them into different categories. Finally, we train the K-means clustering algorithm to cluster the user into one of the 4 categories. We show the effectiveness of the proposed approach on real-world Housing.com datasets for both website and mobile application users. △ Less

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.16553 [pdf, other]

doi 10.1145/3632410.3632487

RE-RecSys: An End-to-End system for recommending properties in Real-Estate domain

Authors: Venkatesh C, Harshit Oberoi, Anil Goyal, Nikhil Sikka

Abstract: We propose an end-to-end real-estate recommendation system, RE-RecSys, which has been productionized in real-world industry setting. We categorize any user into 4 categories based on available historical data: i) cold-start users; ii) short-term users; iii) long-term users; and iv) short-long term users. For cold-start users, we propose a novel rule-based engine that is based on the popularity of… ▽ More We propose an end-to-end real-estate recommendation system, RE-RecSys, which has been productionized in real-world industry setting. We categorize any user into 4 categories based on available historical data: i) cold-start users; ii) short-term users; iii) long-term users; and iv) short-long term users. For cold-start users, we propose a novel rule-based engine that is based on the popularity of locality and user preferences. For short-term users, we propose to use content-filtering model which recommends properties based on recent interactions of users. For long-term and short-long term users, we propose a novel combination of content and collaborative filtering based approach which can be easily productionized in the real-world scenario. Moreover, based on the conversion rate, we have designed a novel weighing scheme for different impressions done by users on the platform for the training of content and collaborative models. Finally, we show the efficiency of the proposed pipeline, RE-RecSys, on a real-world property and clickstream dataset collected from leading real-estate platform in India. We show that the proposed pipeline is deployable in real-world scenario with an average latency of <40 ms serving 1000 rpm. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.14146 [pdf]

Physics-based reward driven image analysis in microscopy

Authors: Kamyar Barakati, Hui Yuan, Amit Goyal, Sergei V. Kalinin

Abstract: The rise of electron microscopy has expanded our ability to acquire nanometer and atomically resolved images of complex materials. The resulting vast datasets are typically analyzed by human operators, an intrinsically challenging process due to the multiple possible analysis steps and the corresponding need to build and optimize complex analysis workflows. We present a methodology based on the co… ▽ More The rise of electron microscopy has expanded our ability to acquire nanometer and atomically resolved images of complex materials. The resulting vast datasets are typically analyzed by human operators, an intrinsically challenging process due to the multiple possible analysis steps and the corresponding need to build and optimize complex analysis workflows. We present a methodology based on the concept of a Reward Function coupled with Bayesian Optimization, to optimize image analysis workflows dynamically. The Reward Function is engineered to closely align with the experimental objectives and broader context and is quantifiable upon completion of the analysis. Here, cross-section, high-angle annular dark field (HAADF) images of ion-irradiated $(Y, Dy)Ba_2Cu_3O_{7-δ}$ thin-films were used as a model system. The reward functions were formed based on the expected materials density and atomic spacings and used to drive multi-objective optimization of the classical Laplacian-of-Gaussian (LoG) method. These results can be benchmarked against the DCNN segmentation. This optimized LoG* compares favorably against DCNN in the presence of the additional noise. We further extend the reward function approach towards the identification of partially-disordered regions, creating a physics-driven reward function and action space of high-dimensional clustering. We pose that with correct definition, the reward function approach allows real-time optimization of complex analysis workflows at much higher speeds and lower computational costs than classical DCNN-based inference, ensuring the attainment of results that are both precise and aligned with the human-defined objectives. △ Less

Submitted 5 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 12 pages, 4 figures

arXiv:2404.07428 [pdf, other]

AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra… ▽ More Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion), a general framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset. AdaDemo strategically collects new demonstrations to address the identified weakness in the existing policy, ensuring data efficiency is maximized. Through a comprehensive evaluation on a total of 22 tasks across two robotic manipulation benchmarks (RLBench and Adroit), we demonstrate AdaDemo's capability to progressively improve policy performance by guiding the generation of high-quality demonstration datasets in a data-efficient manner. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.03183 [pdf, other]

BodyMAP -- Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed

Authors: Abhishek Tandon, Anujraaj Goyal, Henry M. Clever, Zackory Erickson

Abstract: Accurately predicting the 3D human posture and the pressure exerted on the body for people resting in bed, visualized as a body mesh (3D pose & shape) with a 3D pressure map, holds significant promise for healthcare applications, particularly, in the prevention of pressure ulcers. Current methods focus on singular facets of the problem -- predicting only 2D/3D poses, generating 2D pressure images,… ▽ More Accurately predicting the 3D human posture and the pressure exerted on the body for people resting in bed, visualized as a body mesh (3D pose & shape) with a 3D pressure map, holds significant promise for healthcare applications, particularly, in the prevention of pressure ulcers. Current methods focus on singular facets of the problem -- predicting only 2D/3D poses, generating 2D pressure images, predicting pressure only for certain body regions instead of the full body, or forming indirect approximations to the 3D pressure map. In contrast, we introduce BodyMAP, which jointly predicts the human body mesh and 3D applied pressure map across the entire human body. Our network leverages multiple visual modalities, incorporating both a depth image of a person in bed and its corresponding 2D pressure image acquired from a pressure-sensing mattress. The 3D pressure map is represented as a pressure value at each mesh vertex and thus allows for precise localization of high-pressure regions on the body. Additionally, we present BodyMAP-WS, a new formulation of pressure prediction in which we implicitly learn pressure in 3D by aligning sensed 2D pressure images with a differentiable 2D projection of the predicted 3D pressure maps. In evaluations with real-world human data, our method outperforms the current state-of-the-art technique by 25% on both body mesh and 3D applied pressure map prediction tasks for people in bed. △ Less

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted at CVPR 2024 Project Website: https://bodymap3d.github.io/ Code: https://github.com/RCHI-Lab/BodyMAP

arXiv:2403.18614 [pdf, other]

Energy-ordered resource stratification as an agnostic signature of life

Authors: Akshit Goyal, Mikhail Tikhonov

Abstract: The search for extraterrestrial life hinges on identifying biosignatures, often focusing on gaseous metabolic byproducts as indicators. However, most such biosignatures require assuming specific metabolic processes. It is widely recognized that life on other planets may not resemble that of Earth, but identifying biosignatures ``agnostic'' to such assumptions has remained a challenge. Here, we pro… ▽ More The search for extraterrestrial life hinges on identifying biosignatures, often focusing on gaseous metabolic byproducts as indicators. However, most such biosignatures require assuming specific metabolic processes. It is widely recognized that life on other planets may not resemble that of Earth, but identifying biosignatures ``agnostic'' to such assumptions has remained a challenge. Here, we propose a novel approach by considering the generic outcome of life: the formation of competing ecosystems. We use a minimal model to argue that the presence of ecosystem-level dynamics, characterized by ecological interactions and resource competition, may yield biosignatures independent of specific metabolic activities. Specifically, we propose the emergent stratification of chemical resources in order of decreasing energy content as a candidate new biosignature. While likely inaccessible to remote sensing, this signature could be relevant for sample return missions, or for detection of ancient signatures of life on Earth itself. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: 5 pages, 3 figures

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content. △ Less

Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01276 [pdf, other]

A universal niche geometry governs the response of ecosystems to environmental perturbations

Authors: Akshit Goyal, Jason W. Rocks, Pankaj Mehta

Abstract: How ecosystems respond to environmental perturbations is a fundamental question in ecology, made especially challenging due to the strong coupling between species and their environment. Here, we introduce a theoretical framework for calculating the linear response of ecosystems to environmental perturbations in generalized consumer-resource models. Our construction is applicable to a wide class of… ▽ More How ecosystems respond to environmental perturbations is a fundamental question in ecology, made especially challenging due to the strong coupling between species and their environment. Here, we introduce a theoretical framework for calculating the linear response of ecosystems to environmental perturbations in generalized consumer-resource models. Our construction is applicable to a wide class of systems, including models with non-reciprocal interactions, cross-feeding, and non-linear growth/consumption rates. Within our framework, all ecological variables are embedded into four distinct vector spaces and ecological interactions are represented by geometric transformations between these spaces. We show that near a steady state, such geometric transformations directly map environmental perturbations -- in resource availability and mortality rates -- to shifts in niche structure. We illustrate these ideas in a variety of settings including a minimal model for pH-induced toxicity in bacterial denitrification. △ Less

Submitted 2 March, 2024; originally announced March 2024.

Comments: 13 pages, 5 figures

arXiv:2403.01251 [pdf, other]

Accelerating Greedy Coordinate Gradient via Probe Sampling

Authors: Yiran Zhao, Wenyue Zheng, Tianle Cai, Xuan Long Do, Kenji Kawaguchi, Anirudh Goyal, Michael Shieh

Abstract: Safety of Large Language Models (LLMs) has become a critical issue given their rapid progresses. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called… ▽ More Safety of Large Language Models (LLMs) has become a critical issue given their rapid progresses. Greedy Coordinate Gradient (GCG) is shown to be effective in constructing adversarial prompts to break the aligned LLMs, but optimization of GCG is time-consuming. To reduce the time cost of GCG and enable more comprehensive studies of LLM safety, in this work, we study a new algorithm called $\texttt{Probe sampling}$. At the core of the algorithm is a mechanism that dynamically determines how similar a smaller draft model's predictions are to the target model's predictions for prompt candidates. When the target model is similar to the draft model, we rely heavily on the draft model to filter out a large number of potential prompt candidates. Probe sampling achieves up to $5.6$ times speedup using Llama2-7b-chat and leads to equal or improved attack success rate (ASR) on the AdvBench. Furthermore, probe sampling is also able to accelerate other prompt optimization techniques and adversarial methods, leading to acceleration of $1.8\times$ for AutoPrompt, $2.4\times$ for APE and $2.4\times$ for AutoDAN. △ Less

Submitted 27 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.18540 [pdf, other]

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Authors: Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora

Abstract: Public LLMs such as the Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extens… ▽ More Public LLMs such as the Llama 2-Chat have driven huge activity in LLM research. These models underwent alignment training and were considered safe. Recently Qi et al. (2023) reported that even benign fine-tuning (e.g., on seemingly safe datasets) can give rise to unsafe behaviors in the models. The current paper is about methods and best practices to mitigate such loss of alignment. Through extensive experiments on several chat models (Meta's Llama 2-Chat, Mistral AI's Mistral 7B Instruct v0.2, and OpenAI's GPT-3.5 Turbo), this paper uncovers that the prompt templates used during fine-tuning and inference play a crucial role in preserving safety alignment, and proposes the "Pure Tuning, Safe Testing" (PTST) principle -- fine-tune models without a safety prompt, but include it at test time. Fine-tuning experiments on GSM8K, ChatDoctor, and OpenOrca show that PTST significantly reduces the rise of unsafe behaviors, and even almost eliminates them in some cases. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: 20 pages

arXiv:2402.10786 [pdf]

doi 10.3847/1538-4357/ad2c93

Radio-only and Radio-to-far-ultraviolet Spectral Energy Distribution Modeling of 14 ULIRGs: Insights into the Global Properties of Infrared Bright Galaxies

Authors: Subhrata Dey, Arti Goyal, Katarzyna Małek, Tanio Díaz-Santos

Abstract: We present detailed spectral energy distribution (SED) modeling of 14 local ultraluminous infrared galaxies (ULIRGs) with outstanding photometric data from the literature covering the ultraviolet--infrared (FIR) and radio bands ($\sim$50 MHz to $\sim$30 GHz). We employ the CIGALE SED fitting code to model the ultraviolet--FIR--radio SED. For the radio-only SED modeling, we use the UltraNest packag… ▽ More We present detailed spectral energy distribution (SED) modeling of 14 local ultraluminous infrared galaxies (ULIRGs) with outstanding photometric data from the literature covering the ultraviolet--infrared (FIR) and radio bands ($\sim$50 MHz to $\sim$30 GHz). We employ the CIGALE SED fitting code to model the ultraviolet--FIR--radio SED. For the radio-only SED modeling, we use the UltraNest package, leveraging its nested sampling algorithm. Combining the results from our previous study on 11 luminous infrared galaxies (LIRGs), we discuss the global astrophysical properties of a sample of 25 starburst galaxies (z $<$ 0.5). Their radio spectra are frequently characterized by bends and turnovers, with no indication of ULIRGs exhibiting more complicated SEDs than LIRGs despite showing more signs of interactions. Including radio measurements in the CIGALE modeling constrained the dust luminosity and star formation rate (SFR) estimates by more than 1 order of magnitude better than previously reported for starburst galaxies. We show that total and nonthermal radio luminosity at 1.4 and 4.8 GHz frequencies can be good estimators of recent SFRs for all LIRGs and those ULIRGS with an insignificant influence of active galactic nuclei. A weaker but still significant correlation is observed between radio SFRs at 1.4 GHz and old (averaged over 100 Myr) SFRs based on SED modeling, indicative of multiple episodes of starburst activity during their lifetime. The thermal radio luminosity at 4.8 GHz is a better tracer of recent star formation than the thermal luminosity at 1.4 GHz. Statistically, our modeled nonthermal radio spectral indices do not significantly correlate with redshift, stellar mass, SFR, specific SFR, and dust mass. △ Less

Submitted 7 May, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

Comments: Published in the ApJ

Journal ref: ApJ 966 61 (2024)

arXiv:2401.01623 [pdf, other]

Can AI Be as Creative as Humans?

Authors: Haonan Wang, James Zou, Michael Mozer, Anirudh Goyal, Alex Lamb, Linjun Zhang, Weijie J Su, Zhun Deng, Michael Qizhe Xie, Hannah Brown, Kenji Kawaguchi

Abstract: Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the da… ▽ More Creativity serves as a cornerstone for societal progress and innovation. With the rise of advanced generative AI models capable of tasks once reserved for human creativity, the study of AI's creative potential becomes imperative for its responsible development and application. In this paper, we prove in theory that AI can be as creative as humans under the condition that it can properly fit the data generated by human creators. Therefore, the debate on AI's creativity is reduced into the question of its ability to fit a sufficient amount of data. To arrive at this conclusion, this paper first addresses the complexities in defining creativity by introducing a new concept called Relative Creativity. Rather than attempting to define creativity universally, we shift the focus to whether AI can match the creative abilities of a hypothetical human. The methodological shift leads to a statistically quantifiable assessment of AI's creativity, term Statistical Creativity. This concept, statistically comparing the creative abilities of AI with those of specific human groups, facilitates theoretical exploration of AI's creative potential. Our analysis reveals that by fitting extensive conditional data without marginalizing out the generative conditions, AI can emerge as a hypothetical new creator. The creator possesses the same creative abilities on par with the human creators it was trained on. Building on theoretical findings, we discuss the application in prompt-conditioned autoregressive models, providing a practical means for evaluating creative abilities of generative AI models, such as Large Language Models (LLMs). Additionally, this study provides an actionable training guideline, bridging the theoretical quantification of creativity with practical model training. △ Less

Submitted 25 January, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

Comments: The paper examines AI's creativity, introducing Relative and Statistical Creativity for theoretical and practical analysis, along with practical training guidelines. Project Page: ai-relative-creativity.github.io

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.09191 [pdf, other]

doi 10.1007/s11207-023-02244-0

Solar flare catalog from 3 years of Chandrayaan-2 XSM observations

Authors: Aravind Bharathi Valluvan, Ashwin Goyal, Devansh Jain, Abhinna Sundar Samantaray, Abhilash Sarwade, Kasiviswanathan Sankarasubramanian

Abstract: We present a catalog of 6266 solar flares detected by the X-Ray Solar Monitor onboard the Chandrayaan-2 lunar orbiter between 1.55 and 12.4 keV (1 and 8 Å) from 2019 September 12 to 2022 November 4, including 1469 type A flares. The catalog represents the first large sample, including both type A, hot thermal flares, and type B, impulsive flares, with a sub-A class sensitive instrument. We also de… ▽ More We present a catalog of 6266 solar flares detected by the X-Ray Solar Monitor onboard the Chandrayaan-2 lunar orbiter between 1.55 and 12.4 keV (1 and 8 Å) from 2019 September 12 to 2022 November 4, including 1469 type A flares. The catalog represents the first large sample, including both type A, hot thermal flares, and type B, impulsive flares, with a sub-A class sensitive instrument. We also detect 213 sub-A and 1330 A class flares. Individual flares are fit with an exponentially-modified Gaussian function and multi-flare groups are decomposed into individual flares. We validate our findings with flare catalogs made using visual inspection as well as automatic pipelines on Geostationary Operational Environmental Satellite and Solar Dynamics Observatory data. We find a clear bimodality in the ratio of the width to decay time between type A and B flares. We infer a power-law index of $α_F = 1.92 \pm 0.09$ for the background-subtracted peak flux distribution of XSM flares, which is consistent with the value $\sim 2$ reported in the literature. We also infer $α_F = 1.90 \pm 0.09$ for type B, and $α_F = 1.94 \pm 0.08$ for type A flares, which has previously not been reported in the literature. These comparable values hint at a similarity in their generative processes. △ Less

Submitted 8 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 29 pages, 15 figures, 5 tables

arXiv:2312.08807 [pdf, other]

Constraining 2HDM+S model through W-boson mass measurements

Authors: Anza-Tshilidzi Mulaudzi, Mukesh Kumar, Ashok Goyal, Bruce Mellado

Abstract: Following a discussion on $W$-boson mass observed at the CDF and ATLAS, we explore the parameter space allowed in the 2HDM+$S$ model. Further, the model parameter space is constrained through vector-like leptons via muon $g-2$ measurements. We show our results for additional scalar mass fixed to $m_S \approx 95$ and $150$~GeV keeping the standard Higgs-boson mass at 125~GeV in all four types of 2H… ▽ More Following a discussion on $W$-boson mass observed at the CDF and ATLAS, we explore the parameter space allowed in the 2HDM+$S$ model. Further, the model parameter space is constrained through vector-like leptons via muon $g-2$ measurements. We show our results for additional scalar mass fixed to $m_S \approx 95$ and $150$~GeV keeping the standard Higgs-boson mass at 125~GeV in all four types of 2HDM+$S$ model. The chosen mass of the singlet scalar is motivated by the excesses seen at the CMS and ATLAS data in proton-proton collisions at the Large Hadron Collider. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: 6 pages, 2 figures, Talk presented at SAIP2023

arXiv:2311.15268 [pdf, other]

Unlearning via Sparse Representations

Authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal

Abstract: Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's p… ▽ More Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of \textit{class unlearning} using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than SCRUB while incurring almost no computational cost. △ Less

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.14910 [pdf, other]

A latent linear model for nonlinear coupled oscillators on graphs

Authors: Agam Goyal, Zhaoxing Wu, Richard P. Yim, Binhao Chen, Zihong Xu, Hanbaek Lyu

Abstract: A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprising… ▽ More A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprisingly, such nonlinear behavior of coupled oscillators can be effectively linearized in certain latent dynamic spaces. The key insight is that there is a small number of `latent dynamics filters', each with a specific association with synchronizing and non-synchronizing dynamics on subgraphs so that any observed dynamics on subgraphs can be approximated by a suitable linear combination of such elementary dynamic patterns. Taking an ensemble of subgraph-level predictions provides an interpretable predictor for whether the system on the whole graph reaches global synchronization. We propose algorithms based on supervised matrix factorization to learn such latent dynamics filters. We demonstrate that our method performs competitively in synchronization prediction tasks against baselines and black-box classification algorithms, despite its simple and interpretable architecture. △ Less

Submitted 24 November, 2023; originally announced November 2023.

Comments: 23 pages, 14 figures

arXiv:2311.13577 [pdf, other]

Physical Reasoning and Object Planning for Household Embodied Agents

Authors: Ayush Agrawal, Raghav Prabhakar, Anirudh Goyal, Dianbo Liu

Abstract: In this study, we explore the sophisticated domain of task planning for robust household embodied agents, with a particular emphasis on the intricate task of selecting substitute objects. We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach is centered on understanding how these agents can e… ▽ More In this study, we explore the sophisticated domain of task planning for robust household embodied agents, with a particular emphasis on the intricate task of selecting substitute objects. We introduce the CommonSense Object Affordance Task (COAT), a novel framework designed to analyze reasoning capabilities in commonsense scenarios. This approach is centered on understanding how these agents can effectively identify and utilize alternative objects when executing household tasks, thereby offering insights into the complexities of practical decision-making in real-world environments.Drawing inspiration from human decision-making, we explore how large language models tackle this challenge through three meticulously crafted commonsense question-and-answer datasets, featuring refined rules and human annotations. Our evaluation of state-of-the-art language models on these datasets sheds light on three pivotal considerations: 1) aligning an object's inherent utility with the task at hand, 2) navigating contextual dependencies (societal norms, safety, appropriateness, and efficiency), and 3) accounting for the current physical state of the object. To maintain accessibility, we introduce five abstract variables reflecting an object's physical condition, modulated by human insights to simulate diverse household scenarios. Our contributions include insightful Object-Utility mappings addressing the first consideration and two extensive QA datasets (15k and 130k questions) probing the intricacies of contextual dependencies and object states. The datasets, along with our findings, are accessible at: \url{https://github.com/com-phy-affordance/COAT}. This research not only advances our understanding of physical commonsense reasoning in language models but also paves the way for future improvements in household agent intelligence. △ Less

Submitted 22 November, 2023; originally announced November 2023.

Comments: Total: 32 pages ( 16 pages main content, 11 Figures)

arXiv:2311.09665 [pdf, other]

The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

Authors: Yun-Shiuan Chuang, Siddharth Suresh, Nikunj Harlalka, Agam Goyal, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human gro… ▽ More Human groups are able to converge on more accurate beliefs through deliberation, even in the presence of polarization and partisan bias -- a phenomenon known as the "wisdom of partisan crowds." Generated agents powered by Large Language Models (LLMs) are increasingly used to simulate human collective behavior, yet few benchmarks exist for evaluating their dynamics against the behavior of human groups. In this paper, we examine the extent to which the wisdom of partisan crowds emerges in groups of LLM-based agents that are prompted to role-play as partisan personas (e.g., Democrat or Republican). We find that they not only display human-like partisan biases, but also converge to more accurate beliefs through deliberation as humans do. We then identify several factors that interfere with convergence, including the use of chain-of-thought prompt and lack of details in personas. Conversely, fine-tuning on human data appears to enhance convergence. These findings show the potential and limitations of LLM-based agents as a model of human collective intelligence. △ Less

Submitted 16 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09618 [pdf, other]

Simulating Opinion Dynamics with Networks of LLM-based Agents

Authors: Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert Hawkins, Sijia Yang, Dhavan Shah, Junjie Hu, Timothy T. Rogers

Abstract: Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings re… ▽ More Accurately simulating human opinion dynamics is crucial for understanding a variety of societal phenomena, including polarization and the spread of misinformation. However, the agent-based models (ABMs) commonly used for such simulations often over-simplify human behavior. We propose a new approach to simulating opinion dynamics based on populations of Large Language Models (LLMs). Our findings reveal a strong inherent bias in LLM agents towards producing accurate information, leading simulated agents to consensus in line with scientific reality. This bias limits their utility for understanding resistance to consensus views on issues like climate change. After inducing confirmation bias through prompt engineering, however, we observed opinion fragmentation in line with existing agent-based modeling and opinion dynamics research. These insights highlight the promise and limitations of LLM agents in this domain and suggest a path forward: refining LLMs with real-world discourse to better simulate the evolution of human beliefs. △ Less

Submitted 31 March, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.17567 [pdf, other]

Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models

Authors: Dingli Yu, Simran Kaur, Arushi Gupta, Jonah Brown-Cohen, Anirudh Goyal, Sanjeev Arora

Abstract: With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023). This… ▽ More With LLMs shifting their role from statistical modeling of language to serving as general-purpose AI agents, how should LLM evaluations change? Arguably, a key ability of an AI agent is to flexibly combine, as needed, the basic skills it has learned. The capability to combine skills plays an important role in (human) pedagogy and also in a paper on emergence phenomena (Arora & Goyal, 2023). This work introduces Skill-Mix, a new evaluation to measure ability to combine skills. Using a list of $N$ skills the evaluator repeatedly picks random subsets of $k$ skills and asks the LLM to produce text combining that subset of skills. Since the number of subsets grows like $N^k$, for even modest $k$ this evaluation will, with high probability, require the LLM to produce text significantly different from any text in the training set. The paper develops a methodology for (a) designing and administering such an evaluation, and (b) automatic grading (plus spot-checking by humans) of the results using GPT-4 as well as the open LLaMA-2 70B model. Administering a version of to popular chatbots gave results that, while generally in line with prior expectations, contained surprises. Sizeable differences exist among model capabilities that are not captured by their ranking on popular LLM leaderboards ("cramming for the leaderboard"). Furthermore, simple probability calculations indicate that GPT-4's reasonable performance on $k=5$ is suggestive of going beyond "stochastic parrot" behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training. We sketch how the methodology can lead to a Skill-Mix based eco-system of open evaluations for AI capabilities of future models. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.10928 [pdf, ps, other]

Using Audio Data to Facilitate Depression Risk Assessment in Primary Health Care

Authors: Adam Valen Levinson, Abhay Goyal, Roger Ho Chun Man, Roy Ka-Wei Lee, Koustuv Saha, Nimay Parekh, Frederick L. Altice, Lam Yin Cheung, Munmun De Choudhury, Navin Kumar

Abstract: Telehealth is a valuable tool for primary health care (PHC), where depression is a common condition. PHC is the first point of contact for most people with depression, but about 25% of diagnoses made by PHC physicians are inaccurate. Many other barriers also hinder depression detection and treatment in PHC. Artificial intelligence (AI) may help reduce depression misdiagnosis in PHC and improve ove… ▽ More Telehealth is a valuable tool for primary health care (PHC), where depression is a common condition. PHC is the first point of contact for most people with depression, but about 25% of diagnoses made by PHC physicians are inaccurate. Many other barriers also hinder depression detection and treatment in PHC. Artificial intelligence (AI) may help reduce depression misdiagnosis in PHC and improve overall diagnosis and treatment outcomes. Telehealth consultations often have video issues, such as poor connectivity or dropped calls. Audio-only telehealth is often more practical for lower-income patients who may lack stable internet connections. Thus, our study focused on using audio data to predict depression risk. The objectives were to: 1) Collect audio data from 24 people (12 with depression and 12 without mental health or major health condition diagnoses); 2) Build a machine learning model to predict depression risk. TPOT, an autoML tool, was used to select the best machine learning algorithm, which was the K-nearest neighbors classifier. The selected model had high performance in classifying depression risk (Precision: 0.98, Recall: 0.93, F1-Score: 0.96). These findings may lead to a range of tools to help screen for and treat depression. By developing tools to detect depression risk, patients can be routed to AI-driven chatbots for initial screenings. Partnerships with a range of stakeholders are crucial to implementing these solutions. Moreover, ethical considerations, especially around data privacy and potential biases in AI models, need to be at the forefront of any AI-driven intervention in mental health care. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.03739 [pdf, other]

Aligning Text-to-Image Diffusion Models with Reward Backpropagation

Authors: Mihir Prabhudesai, Anirudh Goyal, Deepak Pathak, Katerina Fragkiadaki

Abstract: Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works fi… ▽ More Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their unsupervised training, controlling their behavior in downstream tasks, such as maximizing human-perceived image quality, image-text alignment, or ethical image generation, is difficult. Recent works finetune diffusion models to downstream reward functions using vanilla reinforcement learning, notorious for the high variance of the gradient estimators. In this paper, we propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient through the denoising process. While naive implementation of such backpropagation would require prohibitive memory resources for storing the partial derivatives of modern text-to-image models, AlignProp finetunes low-rank adapter weight modules and uses gradient checkpointing, to render its memory usage viable. We test AlignProp in finetuning diffusion models to various objectives, such as image-text semantic alignment, aesthetics, compressibility and controllability of the number of objects present, as well as their combinations. We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler, making it a straightforward choice for optimizing diffusion models for differentiable reward functions of interest. Code and Visualization results are available at https://align-prop.github.io/. △ Less

Submitted 22 June, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: Code available at https://align-prop.github.io/

arXiv:2309.13502 [pdf, ps, other]

Globally Solving a Class of Bilevel Programs with Spatial Price Equilibrium Constraints

Authors: Akshit Goyal, Jean-Philippe P. Richard

Abstract: Bilevel programs with spatial price equilibrium constraints are strategic models that consider a price competition at the lower level. These models find application in facility location-price models, optimal bidding in power networks, and integration of renewable energy sources in distribution networks. In this paper, for the case where the equilibrium at the lower level can be formulated as an op… ▽ More Bilevel programs with spatial price equilibrium constraints are strategic models that consider a price competition at the lower level. These models find application in facility location-price models, optimal bidding in power networks, and integration of renewable energy sources in distribution networks. In this paper, for the case where the equilibrium at the lower level can be formulated as an optimization problem, we introduce an enhanced single-level formulation based on duality and show that its relaxation is stronger than the single-level formulation obtained using KKT conditions. Compared to the literature [1, 2], this new formulation (i) is computationally friendly to global solution strategies using branch-and-bound, and (ii) can tackle instances of larger size. Further, we develop a heuristic procedure to find feasible solutions inside of the branch-and-bound tree that is effective on instances of large size and produces solutions whose objective values are close to the relaxation bound. We demonstrate the benefits of this formulation and heuristic through an extensive numerical study on synthetic instances of Equilibrium Facility Location [3] and on standard IEEE bus networks for planning renewable generation capacity under uncertainty. △ Less

Submitted 23 June, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.00102 [pdf, other]

doi 10.1051/0004-6361/202347333

The LOFAR Two-Metre Sky Survey (LoTSS): VI. Optical identifications for the second data release

Authors: M. J. Hardcastle, M. A. Horton, W. L. Williams, K. J. Duncan, L. Alegre, B. Barkus, J. H. Croston, H. Dickinson, E. Osinga, H. J. A. Röttgering, J. Sabater, T. W. Shimwell, D. J. B. Smith, P. N. Best, A. Botteon, M. Brüggen, A. Drabent, F. de Gasperin, G. Gürkan, M. Hajduk, C. L. Hale, M. Hoeft, M. Jamrozy, M. Kunert-Bajraszewska, R. Kondapally , et al. (27 additional authors not shown)

Abstract: The second data release of the LOFAR Two-Metre Sky Survey (LoTSS) covers 27% of the northern sky, with a total area of $\sim 5,700$ deg$^2$. The high angular resolution of LOFAR with Dutch baselines (6 arcsec) allows us to carry out optical identifications of a large fraction of the detected radio sources without further radio followup; however, the process is made more challenging by the many ext… ▽ More The second data release of the LOFAR Two-Metre Sky Survey (LoTSS) covers 27% of the northern sky, with a total area of $\sim 5,700$ deg$^2$. The high angular resolution of LOFAR with Dutch baselines (6 arcsec) allows us to carry out optical identifications of a large fraction of the detected radio sources without further radio followup; however, the process is made more challenging by the many extended radio sources found in LOFAR images as a result of its excellent sensitivity to extended structure. In this paper we present source associations and identifications for sources in the second data release based on optical and near-infrared data, using a combination of a likelihood-ratio cross-match method developed for our first data release, our citizen science project Radio Galaxy Zoo: LOFAR, and new approaches to algorithmic optical identification, together with extensive visual inspection by astronomers. We also present spectroscopic or photometric redshifts for a large fraction of the optical identifications. In total 4,116,934 radio sources lie in the area with good optical data, of which 85% have an optical or infrared identification and 58% have a good redshift estimate. We demonstrate the quality of the dataset by comparing it with earlier optically identified radio surveys. This is by far the largest ever optically identified radio catalogue, and will permit robust statistical studies of star-forming and radio-loud active galaxies. △ Less

Submitted 31 August, 2023; originally announced September 2023.

Comments: 29 pages. Accepted by A&A; data products available at https://lofar-surveys.org/dr2_release.html

Journal ref: A&A 678, A151 (2023)

arXiv:2308.14969 [pdf, other]

Uncovering the Hidden Cost of Model Compression

Authors: Diganta Misra, Muawiz Chaudhary, Agam Goyal, Bharat Runwal, Pin Yu Chen

Abstract: In an age dominated by resource-intensive foundation models, the ability to efficiently adapt to downstream tasks is crucial. Visual Prompting (VP), drawing inspiration from the prompting techniques employed in Large Language Models (LLMs), has emerged as a pivotal method for transfer learning in the realm of computer vision. As the importance of efficiency continues to rise, research into model c… ▽ More In an age dominated by resource-intensive foundation models, the ability to efficiently adapt to downstream tasks is crucial. Visual Prompting (VP), drawing inspiration from the prompting techniques employed in Large Language Models (LLMs), has emerged as a pivotal method for transfer learning in the realm of computer vision. As the importance of efficiency continues to rise, research into model compression has become indispensable in alleviating the computational burdens associated with training and deploying over-parameterized neural networks. A primary objective in model compression is to develop sparse and/or quantized models capable of matching or even surpassing the performance of their over-parameterized, full-precision counterparts. Although previous studies have explored the effects of model compression on transfer learning, its impact on visual prompting-based transfer remains unclear. This study aims to bridge this gap, shedding light on the fact that model compression detrimentally impacts the performance of visual prompting-based transfer, particularly evident in scenarios with low data volume. Furthermore, our findings underscore the adverse influence of sparsity on the calibration of downstream visual-prompted models. However, intriguingly, we also illustrate that such negative effects on calibration are not present when models are compressed via quantization. This empirical investigation underscores the need for a nuanced understanding beyond mere accuracy in sparse and quantized settings, thereby paving the way for further exploration in Visual Prompting techniques tailored for sparse and quantized models. △ Less

Submitted 15 March, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Preprint

arXiv:2307.15936 [pdf, other]

A Theory for Emergence of Complex Skills in Language Models

Authors: Sanjeev Arora, Anirudh Goyal

Abstract: A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws… ▽ More A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this {\em slingshot generalization} since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves. △ Less

Submitted 5 November, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

arXiv:2307.15875 [pdf, other]

doi 10.3847/1538-4357/acebe2

Enhanced Size Uniformity for Near-resonant Planets

Authors: Armaan V. Goyal, Fei Dai, Songhu Wang

Abstract: Super-Earths within the same close-in, compact planetary system tend to exhibit a striking degree of uniformity in their radius, mass, and orbital spacing, and this 'peas-in-a-pod' phenomenon itself serves to provide one of the strongest constrains on planet formation at large. While it has been recently demonstrated from independent samples that such planetary uniformity occurs for both configura… ▽ More Super-Earths within the same close-in, compact planetary system tend to exhibit a striking degree of uniformity in their radius, mass, and orbital spacing, and this 'peas-in-a-pod' phenomenon itself serves to provide one of the strongest constrains on planet formation at large. While it has been recently demonstrated from independent samples that such planetary uniformity occurs for both configurations near and distant from mean motion resonance, the question thus remains if the strength of this uniformity itself differs between near-resonant and nonresonant configurations such that the two modes may be astrophysically distinct in their evolution. We thus provide in this work a novel comparative size uniformity analysis for 48 near-resonant and 251 nonresonant multi-planet systems from the California Kepler Survey catalog, evaluating uniformity both across systems and between planetary pairs within the same system. We find that while multiplanet configurations exhibit strong peas-in-a-pod size uniformity regardless of their proximity to resonance, near-resonant configurations display enhanced intra-system size uniformity as compared to their analogous nonresonant counterparts at the level of both entire systems and subsystem planetary pairs and chains. These results are broadly consistent with a variety of formation paradigms for multiplanet systems, such as convergent migration within a turbulent protoplanetary disk or planet-planet interactions incited by postnebular dynamical instabilities. Nevertheless, further investigation is necessary to ascertain whether the nonresonant and near-resonant planetary configurations respectively evolve via a singular process or mechanisms that are dynamically distinct. △ Less

Submitted 15 August, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 15 pages, 6 figures. Accepted to ApJ July 2023

arXiv:2307.12402 [pdf, ps, other]

ChatGPT and Bard Responses to Polarizing Questions

Authors: Abhay Goyal, Muhammad Siddique, Nimay Parekh, Zach Schwitzky, Clara Broekaert, Connor Michelotti, Allie Wong, Lam Yin Cheung, Robin O Hanlon, Lam Yin Cheung, Munmun De Choudhury, Roy Ka-Wei Lee, Navin Kumar

Abstract: Recent developments in natural language processing have demonstrated the potential of large language models (LLMs) to improve a range of educational and learning outcomes. Of recent chatbots based on LLMs, ChatGPT and Bard have made it clear that artificial intelligence (AI) technology will have significant implications on the way we obtain and search for information. However, these tools sometime… ▽ More Recent developments in natural language processing have demonstrated the potential of large language models (LLMs) to improve a range of educational and learning outcomes. Of recent chatbots based on LLMs, ChatGPT and Bard have made it clear that artificial intelligence (AI) technology will have significant implications on the way we obtain and search for information. However, these tools sometimes produce text that is convincing, but often incorrect, known as hallucinations. As such, their use can distort scientific facts and spread misinformation. To counter polarizing responses on these tools, it is critical to provide an overview of such responses so stakeholders can determine which topics tend to produce more contentious responses -- key to developing targeted regulatory policy and interventions. In addition, there currently exists no annotated dataset of ChatGPT and Bard responses around possibly polarizing topics, central to the above aims. We address the indicated issues through the following contribution: Focusing on highly polarizing topics in the US, we created and described a dataset of ChatGPT and Bard responses. Broadly, our results indicated a left-leaning bias for both ChatGPT and Bard, with Bard more likely to provide responses around polarizing topics. Bard seemed to have fewer guardrails around controversial topics, and appeared more willing to provide comprehensive, and somewhat human-like responses. Bard may thus be more likely abused by malicious actors. Stakeholders may utilize our findings to mitigate misinformative and/or polarizing responses from LLMs △ Less

Submitted 13 July, 2023; originally announced July 2023.

arXiv:2307.04751 [pdf, other]

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

Authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox

Abstract: We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of… ▽ More We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/ △ Less

Submitted 10 July, 2023; originally announced July 2023.

Comments: Project page: https://anthonysimeonov.github.io/rpdiff-multi-modal/

arXiv:2307.04053 [pdf, other]

How is Fatherhood Framed Online in Singapore?

Authors: Tran Hien Van, Abhay Goyal, Muhammad Siddique, Lam Yin Cheung, Nimay Parekh, Jonathan Y Huang, Keri McCrickerd, Edson C Tandoc Jr., Gerard Chung, Navin Kumar

Abstract: The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nations flagging birth rate. We analyzed 15,705… ▽ More The proliferation of discussion about fatherhood in Singapore attests to its significance, indicating the need for an exploration of how fatherhood is framed, aiding policy-making around fatherhood in Singapore. Sound and holistic policy around fatherhood in Singapore may reduce stigma and apprehension around being a parent, critical to improving the nations flagging birth rate. We analyzed 15,705 articles and 56,221 posts to study how fatherhood is framed in Singapore across a range of online platforms (news outlets, parenting forums, Twitter). We used NLP techniques to understand these differences. While fatherhood was framed in a range of ways on the Singaporean online environment, it did not seem that fathers were framed as central to the Singaporean family unit. A strength of our work is how the different techniques we have applied validate each other. △ Less

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2307.03083 [pdf, other]

Predicting Opioid Use Outcomes in Minoritized Communities

Authors: Abhay Goyal, Nimay Parekh, Lam Yin Cheung, Koustuv Saha, Frederick L Altice, Robin O'hanlon, Roger Ho Chun Man, Christian Poellabauer, Honoria Guarino, Pedro Mateu Gelabert, Navin Kumar

Abstract: Machine learning algorithms can sometimes exacerbate health disparities based on ethnicity, gender, and other factors. There has been limited work at exploring potential biases within algorithms deployed on a small scale, and/or within minoritized communities. Understanding the nature of potential biases may improve the prediction of various health outcomes. As a case study, we used data from a sa… ▽ More Machine learning algorithms can sometimes exacerbate health disparities based on ethnicity, gender, and other factors. There has been limited work at exploring potential biases within algorithms deployed on a small scale, and/or within minoritized communities. Understanding the nature of potential biases may improve the prediction of various health outcomes. As a case study, we used data from a sample of 539 young adults from minoritized communities who engaged in nonmedical use of prescription opioids and/or heroin. We addressed the indicated issues through the following contributions: 1) Using machine learning techniques, we predicted a range of opioid use outcomes for participants in our dataset; 2) We assessed if algorithms trained only on a majority sub-sample (e.g., Non-Hispanic/Latino, male), could accurately predict opioid use outcomes for a minoritized sub-sample (e.g., Latino, female). Results indicated that models trained on a random sample of our data could predict a range of opioid use outcomes with high precision. However, we noted a decrease in precision when we trained our models on data from a majority sub-sample, and tested these models on a minoritized sub-sample. We posit that a range of cultural factors and systemic forms of discrimination are not captured by data from majority sub-samples. Broadly, for predictions to be valid, models should be trained on data that includes adequate representation of the groups of people about whom predictions will be made. Stakeholders may utilize our findings to mitigate biases in models for predicting opioid use outcomes within minoritized communities. △ Less

Submitted 6 July, 2023; originally announced July 2023.

Showing 1–50 of 309 results for author: Goyal, A