Search | arXiv e-print repository

MoRe Fine-Tuning with 10x Fewer Parameters

Authors: Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala

Abstract: Parameter-efficient fine-tuning (PEFT) techniques have unlocked the potential to cheaply and easily specialize large pretrained models. However, the most prominent approaches, like low-rank adapters (LoRA), depend on heuristics or rules-of-thumb for their architectural choices -- potentially limiting their performance for new models and architectures. This limitation suggests that techniques from neural architecture search could be used to obtain optimal adapter architectures, but these are often expensive and difficult to implement. We address this challenge with Monarch Rectangular Fine-tuning (MoRe), a simple framework to search over adapter architectures that relies on the Monarch matrix class. Theoretically, we show that MoRe is more expressive than LoRA. Empirically, our approach is more parameter-efficient and performant than state-of-the-art PEFTs on a range of tasks and models, with as few as 5\% of LoRA's parameters.

Submitted 30 August, 2024; originally announced August 2024.

arXiv:2407.12450 [pdf, other]

Interim report for the International Muon Collider Collaboration (IMCC)

Authors: C. Accettura, S. Adrian, R. Agarwal, C. Ahdida, C. Aimé, A. Aksoy, G. L. Alberghi, S. Alden, N. Amapane, D. Amorim, P. Andreetto, F. Anulli, R. Appleby, A. Apresyan, P. Asadi, M. Attia Mahmoud, B. Auchmann, J. Back, A. Badea, K. J. Bae, E. J. Bahng, L. Balconi, F. Balli, L. Bandiera, C. Barbagallo, et al. (362 additional authors not shown)

Abstract: The International Muon Collider Collaboration (IMCC) [1] was established in 2020 following the recommendations of the European Strategy for Particle Physics (ESPP) and the implementation of the European Strategy for Particle Physics-Accelerator R&D Roadmap by the Laboratory Directors Group [2], hereinafter referred to as the European LDG roadmap. The Muon Collider Study (MuC) covers the accelerator complex, detectors and physics for a future muon collider. In 2023, European Commission support was obtained for a design study of a muon collider (MuCol) [3]. This project started on 1st March 2023, with work-packages aligned with the overall muon collider studies. In preparation of and during the 2021-22 U.S. Snowmass process, the muon collider project parameters, technical studies and physics performance studies were performed and presented in great detail. Recently, the P5 panel [4] in the U.S. recommended a muon collider R&D, proposed to join the IMCC and envisages that the U.S. should prepare to host a muon collider, calling this their "muon shot". In the past, the U.S. Muon Accelerator Programme (MAP) [5] has been instrumental in studies of concepts and technologies for a muon collider.

Submitted 17 July, 2024; originally announced July 2024.

Comments: This document summarises the International Muon Collider Collaboration (IMCC) progress and status of the Muon Collider R&D programme

arXiv:2407.11004 [pdf, other]

The ALCHEmist: Automated Labeling 500x CHEaper Than LLM Data Annotators

Authors: Tzu-Heng Huang, Catherine Cao, Vaishnavi Bhargava, Frederic Sala

Abstract: Large pretrained models can be used as annotators, helping replace or augment crowdworkers and enabling distilling generalist models into smaller specialist models. Unfortunately, this comes at a cost: employing top-of-the-line models often requires paying thousands of dollars for API calls, while the resulting datasets are static and challenging to audit. To address these challenges, we propose a simple alternative: rather than directly querying labels from pretrained models, we task models to generate programs that can produce labels. These programs can be stored and applied locally, re-used and extended, and cost orders of magnitude less. Our system, Alchemist, obtains comparable to or better performance than large language model-based annotation in a range of tasks for a fraction of the cost: on average, improvements amount to a 12.9% enhancement while the total labeling costs across all datasets are reduced by a factor of approximately 500x.

Submitted 25 June, 2024; originally announced July 2024.

arXiv:2407.03651 [pdf, other]

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Authors: Amanda Dsouza, Christopher Glaze, Changho Shin, Frederic Sala

Abstract: Large language models are prominently used in real-world applications, often tasked with reasoning over large volumes of documents. An exciting development in this space is models boasting extended context capabilities, with some accommodating over 2 million tokens. Such long context model capabilities remain uncertain in production systems, motivating the need to benchmark their performance on real world use cases. We address this challenge by proposing SWiM, an evaluation framework that addresses the limitations of standard tests. Testing the framework on eight long context models, we find that even strong models such as GPT-4 and Claude 3 Opus degrade in performance when information is present in the middle of the context window (lost-in-the-middle effect). Next, in addition to our benchmark, we propose medoid voting, a simple, but effective training-free approach that helps alleviate this effect, by generating responses a few times, each time randomly permuting documents in the context, and selecting the medoid answer. We evaluate medoid voting on single document QA tasks, achieving up to a 24% lift in accuracy. Our code is available at https://github.com/snorkel-ai/long-context-eval.

Submitted 14 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.
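
The medoid voting procedure summarized in the abstract above (sample several responses, permute the documents each time, keep the medoid answer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `generate` callable, the function names, and the string-similarity distance from `difflib` are all assumptions made for the example, and the paper may measure answer distance differently.

```python
import random
from difflib import SequenceMatcher


def medoid(answers):
    """Return the answer with the smallest total distance to all answers."""
    def dist(a, b):
        # Assumed distance: 1 - character-level similarity ratio.
        return 1.0 - SequenceMatcher(None, a, b).ratio()

    return min(answers, key=lambda a: sum(dist(a, b) for b in answers))


def medoid_vote(generate, question, documents, n_samples=5, seed=0):
    """Query a model n_samples times with shuffled documents, return the medoid.

    `generate(question, documents)` is a hypothetical callable that wraps
    whatever long-context model is being evaluated and returns a string.
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(n_samples):
        shuffled = list(documents)  # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        answers.append(generate(question, shuffled))
    return medoid(answers)
```

Because the medoid is always one of the sampled answers, the procedure never fabricates a new response; it only selects the sample that agrees most with the others.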

arXiv:2407.01667 [pdf, other]

ALP leptogenesis

Authors: Martina Cataldi, Alberto Mariotti, Filippo Sala, Miguel Vanvlasselaer

Abstract: We propose a novel realisation of leptogenesis that relies on the out-of-equilibrium decay of an axion-like particle (ALP) into right-handed Majorana neutrinos (RHNs) in the early Universe. With respect to standard thermal leptogenesis, our mechanism lowers by two orders of magnitude the RHN mass, or the tuning in the RHN mass splittings, needed to reproduce the baryon asymmetry of the Universe and neutrino masses. We find that ALP leptogenesis requires $m_a > 10^{4}$ GeV and $f_a > 10^{11}$ GeV for the ALP mass and decay constant, and predicts an early period of matter domination induced by the ALP in parts of its parameter space. We finally provide a viable supersymmetric realisation of ALP leptogenesis where the ALP is the $R$-axion, which accommodates GeV gravitino dark matter and predicts RHN below 10 TeV.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 28 pages, 10 figures

Report number: DESY-24-095

arXiv:2406.03642 [pdf, other]

Is Free Self-Alignment Possible?

Authors: Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala

Abstract: Aligning pretrained language models (LMs) is a complex and resource-intensive process, often requiring access to large amounts of ground-truth preference data and substantial compute. Are these costs necessary? That is, is it possible to align using only inherent model knowledge and without additional training? We tackle this challenge with AlignEZ, a novel approach that uses (1) self-generated preference data and (2) representation editing to provide nearly cost-free alignment. During inference, AlignEZ modifies LM representations to reduce undesirable and boost desirable components using subspaces identified via self-generated preference pairs. Our experiments reveal that this nearly cost-free procedure significantly narrows the gap between base pretrained and tuned models by an average of 31.6%, observed across six datasets and three model architectures. Additionally, we explore the potential of using AlignEZ as a means of expediting more expensive alignment procedures. Our experiments show that AlignEZ improves DPO models tuned only using a small subset of ground-truth preference data. Lastly, we study the conditions under which improvement using AlignEZ is feasible, providing valuable insights into its effectiveness.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.00894 [pdf, other]

Pretrained Hybrids with MAD Skills

Authors: Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala

Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed $\textit{hybrid architectures}$ seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose $\textbf{Manticore}$, a framework that addresses these challenges. Manticore $\textit{automates the design of hybrid architectures}$ while reusing pretrained models to create $\textit{pretrained}$ hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to $\textit{program}$ pretrained hybrids to have certain capabilities. Manticore hybrids outperform existing manually-designed hybrids, achieve strong performance on Long Range Arena (LRA) tasks, and can improve on pretrained transformers and state space models.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2405.04568 [pdf, other]

Relic Neutrino Background from Cosmic-Ray Reservoirs

Authors: Andrea Giovanni De Marchi, Alessandro Granelli, Jacopo Nava, Filippo Sala

Abstract: We compute the flux of relic neutrino background (R$ν$B) up-scattered by ultra-high-energy (UHE) cosmic rays (CRs) in clusters that act as CR-reservoirs. The long trapping times of UHECRs make this flux larger than that of R$ν$B up-scattered by UHECRs on their way to Earth, which we also compute. We find that IceCube excludes R$ν$B weighted overdensities larger than $10^{10}$ in clusters, and that PUEO, RNO-G, GRAND and IceCube-Gen2 will test values down to $10^{8}$. Our treatment incorporates the momentum transfer dependence of the neutrino-nucleus cross section, deep inelastic scattering, a mixed UHECR composition, and flavour information on the up-scattered R$ν$B fluxes for both cases of neutrino mass spectrum with normal and inverted ordering, providing new handles to possibly disentangle the up-scattered R$ν$B from cosmogenic neutrinos.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 5 pages + appendices, 5 figures, 1 table

arXiv:2404.16188 [pdf, other]

Pearls from Pebbles: Improved Confidence Functions for Auto-labeling

Authors: Harit Vishwakarma, Reid, Chen, Sui Jiet Tay, Satya Sai Srinath Namburi, Frederic Sala, Ramya Korlakai Vinayak

Abstract: Auto-labeling is an important family of techniques that produce labeled training sets with minimum manual labeling. A prominent variant, threshold-based auto-labeling (TBAL), works by finding a threshold on a model's confidence scores above which it can accurately label unlabeled data points. However, many models are known to produce overconfident scores, leading to poor TBAL performance. While a natural idea is to apply off-the-shelf calibration methods to alleviate the overconfidence issue, such methods still fall short. Rather than experimenting with ad-hoc choices of confidence functions, we propose a framework for studying the \emph{optimal} TBAL confidence function. We develop a tractable version of the framework to obtain \texttt{Colander} (Confidence functions for Efficient and Reliable Auto-labeling), a new post-hoc method specifically designed to maximize performance in TBAL systems. We perform an extensive empirical evaluation of our method \texttt{Colander} and compare it against methods designed for calibration. \texttt{Colander} achieves up to 60\% improvements on coverage over the baselines while maintaining auto-labeling error below $5\%$ and using the same amount of labeled data as the baselines.

Submitted 24 April, 2024; originally announced April 2024.
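
The threshold-based auto-labeling (TBAL) idea mentioned in the abstract above, finding a confidence threshold above which a model's labels are accurate enough, can be illustrated with a simplified sketch. The function names, the use of a held-out validation set with known correctness, and the plain empirical error rate are assumptions made for this example; it is not the Colander method and omits the statistical safeguards a real TBAL system would need.

```python
def find_threshold(confidences, correct, max_error=0.05):
    """Lowest threshold t whose validation points with confidence >= t
    have an empirical error rate <= max_error; None if no threshold works.

    confidences: model confidence scores on validation points
    correct:     booleans, True where the model's prediction was right
    """
    pairs = sorted(zip(confidences, correct), reverse=True)
    best = None
    errors = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        errors += 0 if ok else 1
        # Evaluate only after the last point in a run of tied confidences,
        # so the candidate set is exactly {points with confidence >= conf}.
        if i < len(pairs) and pairs[i][0] == conf:
            continue
        if errors / i <= max_error:
            best = conf  # lower thresholds label more points (more coverage)
    return best


def auto_label(predictions, confidences, threshold):
    """Keep predictions at or above the threshold; leave the rest as None."""
    if threshold is None:
        return [None] * len(predictions)
    return [p if c >= threshold else None
            for p, c in zip(predictions, confidences)]
```

Overconfident scores hurt this scheme because wrong predictions end up with high confidence, which pushes the usable threshold up and shrinks the set of points that can be labeled automatically.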

arXiv:2404.08461 [pdf, other]

OTTER: Improving Zero-Shot Classification via Optimal Transport

Authors: Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

Abstract: Popular zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label distribution. Existing approaches that seek to repair the label distribution are not suitable in zero-shot settings, as they have incompatible requirements such as access to labeled downstream task data or knowledge of the true label balance in the pretraining distribution. We sidestep these challenges and introduce a simple and lightweight approach to adjust pretrained model predictions via optimal transport. Our technique requires only an estimate of the label distribution of a downstream task. Theoretically, we characterize the improvement produced by our procedure under certain mild conditions and provide bounds on the error caused by misspecification. Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like Prior Matching -- often by significant margins -- in 17 out of 21 datasets.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 29 pages

arXiv:2403.07391 [pdf, ps, other]

Towards adiabatic-connection interpolation model with broader applicability

Authors: Lucian A. Constantin, Szymon Śmiga, Fabio Della Sala

Abstract: The Adiabatic Connection Integrand Interpolation (ACII) method represents a general path for calculating correlation energies in electronic systems within the Density Functional Theory. ACII functionals include both exact-exchange and the second-order correlation energy, as well as an interpolating function toward the strictly-correlated electron (SCE) regime. Several interpolating functions have been proposed in the last years targeting different properties, yet an accurate ACII approach with broad applicability is still missing. Recently, we have proposed an ACII functional that was made accurate for the three-dimensional (3D) uniform electron gas as well as for model metal clusters. In this work we present an ACII functional (named genISI2) which is very accurate for both three-dimensional (3D) and two-dimensional (2D) uniform electron gases and for the quasi-2D infinite barrier model, where most of the exchange-correlation functionals fail badly, as well as for strongly correlated two-electron systems. Using the exact-exchange Kohn-Sham orbitals, we have also assessed the genISI2 for various molecular systems, showing a superior performance with respect to the other ACII methods for total energies, atomization energies, and ionization potentials. The genISI2 functional can thus find application in a broad range of systems and properties.

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05615 [pdf, other]

Particle shells from relativistic bubble walls

Authors: Iason Baldes, Maximilian Dichtl, Yann Gouttenoire, Filippo Sala

Abstract: Relativistic bubble walls from cosmological phase transitions (PT) necessarily accumulate expanding shells of particles. We systematically characterize shell properties, and identify and calculate the processes that prevent them from free streaming: phase-space saturation effects, out-of-equilibrium $2\to2$ and $3\to2$ shell-shell and shell-bath interactions, and shell interactions with bubble walls. We find that shells do not free stream in scenarios widely studied in the literature, where standard predictions will need to be reevaluated, including those of bubble wall velocities, gravitational waves (GW) and particle production. Our results support the use of bulk-flow GW predictions in all regions where shells free stream, irrespectively of whether or not the latent heat is mostly converted in the scalar field gradient.

Submitted 21 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 73 pages, 18 figures and 8 tables including appendices and references, v2: shell-bath processes added

arXiv:2401.15478 [pdf, other]

Product Manifold Representations for Learning on Biological Pathways

Authors: Daniel McNeela, Frederic Sala, Anthony Gitter

Abstract: Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces and compare against traditional Euclidean graph representation learning models. We then train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs. We find large reductions in distortion and boosts on in-distribution edge prediction performance as a result of using mixed-curvature embeddings and their corresponding graph neural network models. However, we find that mixed-curvature representations underperform existing baselines on out-of-distribution edge prediction performance suggesting that these representations may overfit to the training graph topology. We provide our mixed-curvature product GCN code at https://github.com/mcneela/Mixed-Curvature-GCN and our pathway analysis code at https://github.com/mcneela/Mixed-Curvature-Pathways.

Submitted 27 January, 2024; originally announced January 2024.

Comments: 28 pages, 19 figures

arXiv:2401.12278 [pdf, other]

Early Matter Domination at Colliders: Long Live the Glueball!

Authors: Fady Bishara, Filippo Sala, Kai Schmidt-Hoberg

Abstract: We prove that collider searches for long-lived particles (LLPs) can test the dynamics responsible for matter domination in the early universe. In this letter we concentrate on the specific example of glueballs from a GeV-scale confining dark sector and compute the dilution of cosmological relics induced by their decay. We then show that searches for long-lived glueballs from Higgs decays test increasing values of dilution at ATLAS and CMS, CODEX-b, ANUBIS and MATHUSLA. We identify the general features that make models of early matter domination discoverable via LLPs at colliders. Our study provides a quantitative physics motivation to test longer lifetimes.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 5 pages + refs, 1 figure

arXiv:2401.12225 [pdf, other]

Multimodal Data Curation via Object Detection and Filter Ensembles

Authors: Tzu-Heng Huang, Changho Shin, Sui Jiet Tay, Dyah Adila, Frederic Sala

Abstract: We propose an approach for curating multimodal data that we used for our entry in the 2023 DataComp competition filtering track. Our technique combines object detection and weak supervision-based ensembling. In the first of two steps in our approach, we employ an out-of-the-box zero-shot object detection model to extract granular information and produce a variety of filter designs. In the second step, we employ weak supervision to ensemble filtering rules. This approach results in a 4% performance improvement when compared to the best-performing baseline, producing the top-ranking position in the small scale track at the time of writing. Furthermore, in the medium scale track, we achieve a noteworthy 4.2% improvement over the baseline by simply ensembling existing baselines with weak supervision.

Submitted 5 January, 2024; originally announced January 2024.

Comments: Appeared in the Workshop of Towards the Next Generation of Computer Vision Datasets (TNGCV) on ICCV 2023

arXiv:2312.09282 [pdf, other]

Baryogenesis and Leptogenesis from Supercooled Confinement

Authors: Maximilian Dichtl, Jacopo Nava, Silvia Pascoli, Filippo Sala

Abstract: We propose a framework of baryogenesis and leptogenesis that relies on a supercooled confining phase transition (PT) in the early universe. The baryon or lepton asymmetry is sourced by decays of hadrons of the strong dynamics after the PT, and it is enhanced compared to the non-confining case, which was the only one explored so far. This widens the energy range of the PT, where the observed baryon asymmetry can be reproduced, down to the electroweak scale. The framework then becomes testable with gravity waves (GW) at LISA and the Einstein Telescope. We then study two explicit realisations: one of leptogenesis from composite sterile neutrinos that realises inverse see-saw; one of baryogenesis from composite scalars that is partly testable by existing colliders and flavour factories.

Submitted 23 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 34 pages, 9 figures, minor revision, accepted for publication in JHEP

arXiv:2312.04740 [pdf, other]

Train 'n Trade: Foundations of Parameter Markets

Authors: Tzu-Heng Huang, Harit Vishwakarma, Frederic Sala

Abstract: Organizations typically train large models individually. This is costly and time-consuming, particularly for large-scale foundation models. Such vertical production is known to be suboptimal. Inspired by this economic insight, we ask whether it is possible to leverage others' expertise by trading the constituent parts in models, i.e., sets of weights, as if they were market commodities. While recent advances in aligning and interpolating models suggest that doing so may be possible, a number of fundamental questions must be answered to create viable parameter markets. In this work, we address these basic questions, propose a framework containing the infrastructure necessary for market operations to take place, study strategies for exchanging parameters, and offer means for agents to monetize parameters. Excitingly, compared to agents who train siloed models from scratch, we show that it is possible to mutually gain by using the market, even in competitive settings. This suggests that the notion of parameter markets may be a useful paradigm for improving large-scale model training in the future.

Submitted 7 December, 2023; originally announced December 2023.

Comments: accepted at NeurIPS 2023

arXiv:2312.00960 [pdf]

The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models

Authors: Satya Sai Srinath Namburi, Makesh Sreedhar, Srinath Srinivasan, Frederic Sala

Abstract: Compressing large language models (LLMs), often consisting of billions of parameters, provides faster inference, smaller memory footprints, and enables local deployment. Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits. The key tradeoff is between the degree of compression and the impact on the quality of the compressed model. Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy. More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored. To help bridge this gap, we present a comprehensive analysis across multiple model families (ENCODER, ENCODER-DECODER, and DECODER) using the LAMA and LM-HARNESS benchmarks in order to systematically quantify the effect of commonly employed compression techniques on model performance. A particular focus is on tradeoffs involving parametric knowledge, with the goal of providing practitioners with practical insights to help make informed decisions on compression. We release our codebase to enable further research.

Submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted to EMNLP 2023 Findings

arXiv:2309.16430 [pdf, other]

Adiabatic connection interaction strength interpolation method made accurate for the uniform electron gas

Authors: Lucian A. Constantin, Subrata Jana, Szymon Śmiga, Fabio Della Sala

Abstract: The adiabatic connection interaction strength interpolation (ISI)-like method provides a high-level expression for the correlation energy, being in principle exact in the weak-interaction limit, where it recovers the second-order Görling-Levy perturbation term, but also in the strong-interaction limit that is described by the strictly correlated electron approach. In this work, we construct the genISI functional made accurate for the uniform electron gas, a solid-state physics paradigm that is a very difficult test for ISI-like correlation functionals. We assess the genISI functional for various jellium spheres with the number of electrons $Z \leq 912$ and for the non-relativistic noble atoms with $Z \leq 290$. For the jellium clusters, the genISI is remarkably accurate, while for the noble atoms, it shows a good performance, similar to other ISI-like methods. Thus, the genISI functional can open the path to using ISI-like methods in solid-state calculations.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.04344 [pdf, other]

Zero-Shot Robustification of Zero-Shot Models

Authors: Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala

Abstract: Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings -- without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% on worst group accuracy, with trivial decrease in overall accuracy over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models and propose a way to further boost performance with a zero-shot adaptation variant.

Submitted 12 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: International Conference on Learning Representations (ICLR), 2024

arXiv:2307.14430 [pdf, other]

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

Authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré

Abstract: The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.

Submitted 26 July, 2023; originally announced July 2023.

arXiv:2307.12226 [pdf, other]

Geometry-Aware Adaptation for Pretrained Models

Authors: Nicholas Roberts, Xintong Li, Dyah Adila, Sonia Cromp, Tzu-Heng Huang, Jitian Zhao, Frederic Sala

Abstract: Machine learning models -- including prominent zero-shot models -- are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes -- or, in the case of zero-shot prediction, to improve its performance -- without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping argmax with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active learning-like next class selection procedure to obtain optimal training classes for when it is not possible to predict the entire range of unobserved classes. Empirically, using easily-available external metrics, our proposed approach, Loki, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, Loki can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.

Submitted 27 November, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: NeurIPS 2023
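
Illustrative note on the entry above: the abstract describes replacing the argmax prediction rule with a Fréchet mean over a label space equipped with a metric. The sketch below is one minimal reading of that idea, not the authors' implementation. The function name `frechet_mean_predict`, the toy label space (integers on a line), and the absolute-difference metric are all assumptions made for illustration.

```python
def frechet_mean_predict(probs, observed, label_space, dist):
    """Return the label minimizing the probability-weighted squared distance.

    probs       -- model probabilities over the observed (training) classes
    observed    -- the observed classes, aligned with probs
    label_space -- every candidate label, including classes never seen in training
    dist        -- metric on the label space (hypothetical; supplied by the caller)
    """
    def cost(y):
        # Weighted Frechet variance of candidate y under the model's distribution.
        return sum(p * dist(y, k) ** 2 for p, k in zip(probs, observed))

    return min(label_space, key=cost)


# Toy setup (assumed): labels 0..4 on a line, only classes 0 and 4 were observed.
label_space = [0, 1, 2, 3, 4]
observed = [0, 4]
line_metric = lambda a, b: abs(a - b)

# A model torn between the two observed classes predicts the unobserved midpoint.
split = frechet_mean_predict([0.5, 0.5], observed, label_space, line_metric)

# A fully confident model recovers the ordinary argmax prediction.
confident = frechet_mean_predict([1.0, 0.0], observed, label_space, line_metric)
```

In this toy case `split` is the unobserved class 2, while `confident` is class 0, matching what argmax would return. A real label space would need a task-appropriate metric in place of the line metric assumed here.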

arXiv:2307.11031 [pdf, ps, other]

Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

Authors: Neel Guha, Mayee F. Chen, Kush Bhatia, Azalia Mirhoseini, Frederic Sala, Christopher Ré

Abstract: Recent work has shown that language models' (LMs) prompt-based learning capabilities make them well suited for automating data labeling in domains where manual annotation is expensive. The challenge is that while writing an initial prompt is cheap, improving a prompt is costly -- practitioners often require significant labeled data in order to evaluate the impact of prompt modifications. Our work asks whether it is possible to improve prompt-based learning without additional labeled data. We approach this problem by attempting to modify the predictions of a prompt, rather than the prompt itself. Our intuition is that accurate predictions should also be consistent: samples which are similar under some feature representation should receive the same prompt prediction. We propose Embroid, a method which computes multiple representations of a dataset under different embedding functions, and uses the consistency between the LM predictions for neighboring samples to identify mispredictions. Embroid then uses these neighborhoods to create additional predictions for each sample, and combines these predictions with a simple latent variable graphical model in order to generate a final corrected prediction. In addition to providing a theoretical analysis of Embroid, we conduct a rigorous empirical evaluation across six different LMs and up to 95 different tasks. We find that (1) Embroid substantially improves performance over original prompts (e.g., by an average of 7.3 points on GPT-JT), (2) also realizes improvements for more sophisticated prompting strategies (e.g., chain-of-thought), and (3) can be specialized to domains like law through the embedding functions.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 38 pages, 22 figures, 8 tables

arXiv:2307.02715 [pdf, other]

Regularized and Opposite spin-scaled functionals from Møller-Plesset adiabatic connection -- higher accuracy at lower cost

Authors: Kimberly J. Daas, Derk P. Kooi, Nina C. Peters, Eduardo Fabiano, Fabio Della Sala, Paola Gori-Giorgi, Stefan Vuckovic

Abstract: Non-covalent interactions (NCIs) play a crucial role in biology, chemistry, material science, and everything in between. To improve pure quantum-chemical simulations of NCIs, we propose a methodology for constructing approximate correlation energies by combining an interpolation along the Møller-Plesset adiabatic connection (MP AC) with a regularization and spin-scaling strategy applied to MP2 correlation energies. This combination yields $c_{\rm os}\kappa_{\rm os}$-SPL2, which exhibits superior accuracy for NCIs compared to any of the individual strategies. With its $N^4$ formal scaling, $c_{\rm os}\kappa_{\rm os}$-SPL2 is competitive with, and often outperforms, more expensive dispersion-corrected double hybrids for NCIs. The accuracy of $c_{\rm os}\kappa_{\rm os}$-SPL2 particularly shines for anionic halogen bonded complexes, where it surpasses standard dispersion-corrected DFT by a factor of 3 to 5.

Submitted 7 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

Comments: 12 pages + 5 SI, 8 figures + 6 SI

arXiv:2306.15555 [pdf, other]

Bubbletrons

Authors: Iason Baldes, Maximilian Dichtl, Yann Gouttenoire, Filippo Sala

Abstract: In cosmological first-order phase transitions (PT) with relativistic bubble walls, high-energy shells of particles generically form on the inner and outer sides of the walls. Shells from different bubbles can then collide with energies much larger than the PT or inflation scales, and with sizeable rates, realising a `bubbletron'. As an application, we calculate the maximal dark matter mass $M_{DM}$ that can be produced from shell collisions in a U(1) gauge PT, for scales of the PT $v_\varphi$ from MeV to $10^{16}$ GeV. We find for example $M_{DM} \sim 10^6/10^{11}/10^{15}$ GeV for $v_\varphi \sim 10^{-2}/10^3/10^8$ GeV. The gravity wave signal sourced at the PT then links Pulsar Timing Arrays with the PeV scale, LISA with the ZeV one, and the Einstein Telescope with grand unification.

Submitted 27 June, 2023; originally announced June 2023.

Comments: 5 pages plus references, 5 figures

arXiv:2304.00754 [pdf, ps, other]

Gaussian expansion of Yukawa non-local kinetic energy functionals: application to metal clusters

Authors: F. Sarcinella, S. Śmiga, F. Della Sala, E. Fabiano

Abstract: The development of kinetic energy (KE) functionals is one of the current challenges in density functional theory (DFT). The Yukawa non-local KE functionals [Phys. Rev. B 103, 155127 (2021)] have been shown to describe accurately the Lindhard response of the homogeneous electron gas (HEG) directly in the real space, without any step in the reciprocal space. However, the Yukawa kernel employs an exponential function which cannot be efficiently represented in conventional Gaussian-based quantum chemistry codes. Here, we present an expansion of the Yukawa kernel in Gaussian functions. We show that for the HEG this expansion is independent of the electronic density, and that for general finite systems the accuracy can be easily tuned. Finally, we present results for atomistic sodium clusters of different sizes, showing that simple Yukawa functionals can give superior accuracy as compared to semilocal functionals.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 9 pages, 6 figures

arXiv:2303.17713 [pdf, other]

Mitigating Source Bias for Fairer Weak Supervision

Authors: Changho Shin, Sonia Cromp, Dyah Adila, Frederic Sala

Abstract: Weak supervision enables efficient development of training sets by reducing the need for ground truth labels. However, the techniques that make weak supervision attractive -- such as integrating any source of signal to estimate unknown labels -- also entail the danger that the produced pseudolabels are highly biased. Surprisingly, given everyday use and the potential for increased bias, weak supervision has not been studied from the point of view of fairness. We begin such a study, starting with the observation that even when a fair model can be built from a dataset with access to ground-truth labels, the corresponding dataset labeled via weak supervision can be arbitrarily unfair. To address this, we propose and empirically validate a model for source unfairness in weak supervision, then introduce a simple counterfactual fairness-based technique that can mitigate these biases. Theoretically, we show that it is possible for our approach to simultaneously improve both accuracy and fairness -- in contrast to standard fairness approaches that suffer from tradeoffs. Empirically, we show that our technique improves accuracy on weak supervision baselines by as much as 32% while reducing demographic parity gap by 82.5%. A simple extension of our method aimed at maximizing performance produces state-of-the-art performance in five out of ten datasets in the WRENCH benchmark.

Submitted 29 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: NeurIPS 2023

arXiv:2303.17154 [pdf, ps, other]

Flops and Hilbert schemes of space curve singularities

Authors: Duiliu-Emanuel Diaconescu, Mauro Porta, Francesco Sala, Arian Vosoughinia

Abstract: Using pagoda flop transitions between smooth projective threefolds, a relation is derived between the Euler numbers of moduli spaces of stable pairs which are scheme-theoretically supported on a fixed singular space curve and Euler numbers of Flag Hilbert schemes associated to a plane curve singularity. When the space curve singularity is locally complete intersection, one obtains a relation between the latter and Euler numbers of Hilbert schemes of the space curve singularity. It is also shown that this relation yields explicit results for a class of torus-invariant locally complete intersection singularities.

Submitted 23 June, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: v2: 67 pages, some technical assumptions removed, typos fixed, main results unchanged. v1: 52 pages

MSC Class: 14N35 (Primary); 14D23; 14E99; 14F99 (Secondary)

arXiv:2303.12107 [pdf, other]

Dark Matter spikes around Sgr A* in $\gamma$-rays

Authors: Shyam Balaji, Divya Sachdeva, Filippo Sala, Joseph Silk

Abstract: We use H.E.S.S. $\gamma$-ray observations of Sgr A* to derive novel limits on the Dark Matter (DM) annihilation cross-section. We quantify their dependence on uncertainties i) in the DM halo profile, which we vary from peaked to cored, and ii) in the shape of the DM spike around Sgr A*, dynamically heated by the nuclear star cluster. For peaked halo profiles and depending on the heating of the spike, our limits are the strongest existing ones for DM masses above a few TeV. Our study contributes to assessing the influence of the advancements in our knowledge of the Milky Way on determining the properties of DM particles.

Submitted 13 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Published in JCAP, added discussion on energy resolution impact for DM annihilation into photons channel

arXiv:2303.08533 [pdf, other]

Towards a Muon Collider

Authors: Carlotta Accettura, Dean Adams, Rohit Agarwal, Claudia Ahdida, Chiara Aimè, Nicola Amapane, David Amorim, Paolo Andreetto, Fabio Anulli, Robert Appleby, Artur Apresyan, Aram Apyan, Sergey Arsenyev, Pouya Asadi, Mohammed Attia Mahmoud, Aleksandr Azatov, John Back, Lorenzo Balconi, Laura Bandiera, Roger Barlow, Nazar Bartosik, Emanuela Barzi, Fabian Batsch, Matteo Bauce, J. Scott Berg , et al. (272 additional authors not shown)

Abstract: A muon collider would enable the big jump ahead in energy reach that is needed for a fruitful exploration of fundamental interactions. The challenges of producing muon collisions at high luminosity and 10 TeV centre of mass energy are being investigated by the recently-formed International Muon Collider Collaboration. This Review summarises the status and the recent advances on muon collider design, physics and detector studies. The aim is to provide a global perspective of the field and to outline directions for future work.

Submitted 27 November, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 118 pages, 103 figures

arXiv:2303.07527 [pdf, other]

Domain Generalization via Nuclear Norm Regularization

Authors: Zhenmei Shi, Yifei Ming, Ying Fan, Frederic Sala, Yingyu Liang

Abstract: The ability to generalize to unseen domains is crucial for machine learning systems deployed in the real world, especially when we only have data from limited training domains. In this paper, we propose a simple and effective regularization method based on the nuclear norm of the learned features for domain generalization. Intuitively, the proposed regularizer mitigates the impacts of environmental features and encourages learning domain-invariant features. Theoretically, we provide insights into why nuclear norm regularization is more effective compared to ERM and alternative regularization methods. Empirically, we conduct extensive experiments on both synthetic and real datasets. We show nuclear norm regularization achieves strong performance compared to baselines in a wide range of domain generalization tasks. Moreover, our regularizer is broadly applicable with various methods such as ERM and SWAD with consistently improved performance, e.g., 1.7% and 0.9% test accuracy improvements respectively on the DomainBed benchmark.

Submitted 4 December, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

Comments: 23 pages

arXiv:2212.10579 [pdf, other]

doi 10.1007/JHEP07(2023)188

Resonant Anomaly Detection with Multiple Reference Datasets

Authors: Mayee F. Chen, Benjamin Nachman, Frederic Sala

Abstract: An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of commonly-available multiple datasets and thus cannot fully exploit available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2211.13375 [pdf, other]

Lifting Weak Supervision To Structured Prediction

Authors: Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Abstract: Weak supervision (WS) is a rich set of techniques that produce pseudolabels by aggregating easily obtained but potentially noisy label estimates from a variety of sources. WS is theoretically well understood for binary classification, where simple approaches enable consistent estimation of pseudolabel noise rates. Using this result, it has been shown that downstream models trained on the pseudolabels have generalization guarantees nearly identical to those trained on clean labels. While this is exciting, users often wish to use WS for structured prediction, where the output space consists of more than a binary or multi-class label set: e.g. rankings, graphs, manifolds, and more. Do the favorable theoretical properties of WS for binary classification lift to this setting? We answer this question in the affirmative for a wide range of scenarios. For labels taking values in a finite metric space, we introduce techniques new to weak supervision based on pseudo-Euclidean embeddings and tensor decompositions, providing a nearly-consistent noise rate estimator. For labels in constant-curvature Riemannian manifolds, we introduce new invariants that also yield consistent noise rate estimation. In both cases, when using the resulting pseudolabels in concert with a flexible downstream model, we obtain generalization guarantees nearly identical to those for models trained on clean data. Several of our results, which can be viewed as robustness guarantees in structured prediction with noisy labels, may be of independent interest. Empirical evaluation validates our claims and shows the merits of the proposed method.

Submitted 23 November, 2022; originally announced November 2022.

arXiv:2211.12620 [pdf, other]

Promises and Pitfalls of Threshold-based Auto-labeling

Authors: Harit Vishwakarma, Heguang Lin, Frederic Sala, Ramya Korlakai Vinayak

Abstract: Creating large-scale high-quality labeled datasets is a major bottleneck in supervised machine learning workflows. Threshold-based auto-labeling (TBAL), where validation data obtained from humans is used to find a confidence threshold above which the data is machine-labeled, reduces reliance on manual annotation. TBAL is emerging as a widely-used solution in practice. Given the long shelf-life and diverse usage of the resulting datasets, understanding when the data obtained by such auto-labeling systems can be relied on is crucial. This is the first work to analyze TBAL systems and derive sample complexity bounds on the amount of human-labeled validation data required for guaranteeing the quality of machine-labeled data. Our results provide two crucial insights. First, reasonable chunks of unlabeled data can be automatically and accurately labeled by seemingly bad models. Second, a hidden downside of TBAL systems is potentially prohibitive validation data usage. Together, these insights describe the promise and pitfalls of using such systems. We validate our theoretical guarantees with extensive experiments on synthetic and real datasets.

Submitted 21 February, 2024; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: NeurIPS 2023 (Spotlight)

Journal ref: Thirty Seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
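
Illustrative note on the entry above: the abstract describes using human-labeled validation data to find a confidence threshold above which data is machine-labeled. The sketch below shows only that single thresholding step in a simplified form; it is not the procedure analyzed in the paper. The names `find_threshold` and `auto_label`, the error budget `max_error`, and the toy numbers are all hypothetical.

```python
def find_threshold(val_conf, val_correct, max_error):
    """Smallest confidence threshold whose validation error stays within budget.

    val_conf    -- model confidence on each human-labeled validation point
    val_correct -- whether the model's prediction matched the human label
    max_error   -- tolerated error rate among points at or above the threshold
    Returns None if no threshold meets the budget.
    """
    for t in sorted(set(val_conf)):
        kept = [ok for conf, ok in zip(val_conf, val_correct) if conf >= t]
        errors = sum(1 for ok in kept if not ok)
        # kept is never empty because t is itself one of the confidences.
        if errors / len(kept) <= max_error:
            return t
    return None


def auto_label(unlabeled, threshold):
    """Split (confidence, predicted_label) pairs into machine-labeled and deferred."""
    machine = [(c, y) for c, y in unlabeled if c >= threshold]
    deferred = [(c, y) for c, y in unlabeled if c < threshold]
    return machine, deferred


# Toy validation set (assumed): the two lowest-confidence mistakes get excluded.
val_conf = [0.55, 0.6, 0.7, 0.8, 0.9, 0.95]
val_correct = [False, True, False, True, True, True]
threshold = find_threshold(val_conf, val_correct, max_error=0.0)

machine, deferred = auto_label([(0.99, "cat"), (0.65, "dog"), (0.85, "cat")], threshold)
```

With these toy numbers the threshold lands at 0.8, so two of the three unlabeled points are machine-labeled and one is deferred to a human. The abstract's caveat applies directly: the threshold is only as trustworthy as the amount of validation data behind it.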

arXiv:2210.03324 [pdf, other]

AutoML for Climate Change: A Call to Action

Authors: Renbo Tu, Nicholas Roberts, Vishak Prasad, Sibasis Nayak, Paarth Jain, Frederic Sala, Ganesh Ramakrishnan, Ameet Talwalkar, Willie Neiswanger, Colin White

Abstract: The challenge that climate change poses to humanity has spurred a rapidly developing field of artificial intelligence research focused on climate change applications. The climate change AI (CCAI) community works on a diverse, challenging set of problems which often involve physics-constrained ML or heterogeneous spatiotemporal data. It would be desirable to use automated machine learning (AutoML) techniques to automatically find high-performing architectures and hyperparameters for a given dataset. In this work, we benchmark popular AutoML libraries on three high-leverage CCAI applications: climate modeling, wind power forecasting, and catalyst discovery. We find that out-of-the-box AutoML libraries currently fail to meaningfully surpass the performance of human-designed CCAI models. However, we also identify a few key weaknesses, which stem from the fact that most AutoML techniques are tailored to computer vision and NLP applications. For example, while dozens of search spaces have been designed for image and language data, none have been designed for spatiotemporal data. Addressing these key weaknesses can lead to the discovery of novel architectures that yield substantial performance gains across numerous CCAI applications. Therefore, we present a call to action to the AutoML community, since there are a number of concrete, promising directions for future work in the space of AutoML for CCAI. We release our code and a list of resources at https://github.com/climate-change-automl/climate-change-automl.

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2210.02441 [pdf, other]

Ask Me Anything: A simple strategy for prompting language models

Authors: Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré

Abstract: Large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt that demonstrates how to perform the task and no additional training. Prompting is a brittle process wherein small modifications to the prompt can cause large variations in the model predictions, and therefore significant effort is dedicated towards designing a painstakingly "perfect prompt" for a task. To mitigate the high degree of effort involved in prompt-design, we instead ask whether producing multiple effective, yet imperfect, prompts and aggregating them can lead to a high quality prompting strategy. Our observations motivate our proposed prompting method, ASK ME ANYTHING (AMA). We first develop an understanding of the effective prompt formats, finding that question-answering (QA) prompts, which encourage open-ended generation ("Who went to the park?") tend to outperform those that restrict the model outputs ("John went to the park. Output True or False."). Our approach recursively uses the LLM itself to transform task inputs to the effective QA format. We apply the collected prompts to obtain several noisy votes for the input's true label. We find that the prompts can have very different accuracies and complex dependencies and thus propose to use weak supervision, a procedure for combining the noisy predictions, to produce the final predictions for the inputs.
We evaluate AMA across open-source model families (e.g., EleutherAI, BLOOM, OPT, and T0) and model sizes (125M-175B parameters), demonstrating an average performance lift of 10.2% over the few-shot baseline. This simple strategy enables the open-source GPT-J-6B model to match and exceed the performance of few-shot GPT3-175B on 15 of 20 popular benchmarks. Averaged across these tasks, the GPT-J-6B model outperforms few-shot GPT3-175B. We release our code here: https://github.com/HazyResearch/ama_prompting

Submitted 19 November, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

arXiv:2208.14362 [pdf, other]

AutoWS-Bench-101: Benchmarking Automated Weak Supervision with 100 Labels

Authors: Nicholas Roberts, Xintong Li, Tzu-Heng Huang, Dyah Adila, Spencer Schoenberg, Cheng-Yu Liu, Lauren Pick, Haotian Ma, Aws Albarghouthi, Frederic Sala

Abstract: Weak supervision (WS) is a powerful method to build labeled datasets for training supervised models in the face of little-to-no labeled data. It replaces hand-labeling data with aggregating multiple noisy-but-cheap label estimates expressed by labeling functions (LFs). While it has been used successfully in many domains, weak supervision's application scope is limited by the difficulty of constructing labeling functions for domains with complex or high-dimensional features. To address this, a handful of methods have proposed automating the LF design process using a small set of ground truth labels. In this work, we introduce AutoWS-Bench-101: a framework for evaluating automated WS (AutoWS) techniques in challenging WS settings -- a set of diverse application domains on which it has been previously difficult or impossible to apply traditional WS techniques. While AutoWS is a promising direction toward expanding the application-scope of WS, the emergence of powerful methods such as zero-shot foundation models reveals the need to understand how AutoWS techniques compare or cooperate with modern zero-shot or few-shot learners. This informs the central question of AutoWS-Bench-101: given an initial set of 100 labels for each task, we ask whether a practitioner should use an AutoWS method to generate additional labels or use some simpler baseline, such as zero-shot predictions from a foundation model or supervised learning.
We observe that in many settings, it is necessary for AutoWS methods to incorporate signal from foundation models if they are to outperform simple few-shot baselines, and AutoWS-Bench-101 promotes future research in this direction. We conclude with a thorough ablation study of AutoWS methods.

Submitted 24 November, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: NeurIPS 2022 Datasets and Benchmarks Track

arXiv:2207.08926 [pdf, ps, other]

Cohomological Hall algebras and their representations via torsion pairs

Authors: Duiliu-Emanuel Diaconescu, Mauro Porta, Francesco Sala

Abstract: In this paper, we provide a way of attaching to a torsion pair $(T,F)$ on the heart of a stable $\infty$-category $C$ a cohomological (K-theoretical, categorified) Hall algebra and corresponding left and right representations. More precisely, the algebra is associated to the torsion part, while the representation is associated to the torsion-free part. The left and right actions enable us to construct canonical subalgebras of the endomorphism ring of the Borel-Moore homology and K-theory of the moduli stack of torsion-free objects, whose "positive parts" recover the cohomological Hall algebra and the K-theoretical Hall algebra associated to the torsion part $T$, respectively. This provides a new direction that might lead to overcome the long-standing limitation of the theory of cohomological Hall algebras to just produce "positive parts" of whole algebras. We also provide a geometric sufficient criterion ensuring the vanishing of the commutator between two different operators. In the quiver case, we obtain the action of the two-dimensional cohomological Hall algebra of a quiver on the cohomology of Nakajima quiver varieties within our framework. Besides the quiver case, we also apply our framework to two torsion pairs on a smooth projective complex surface, and we investigate the corresponding Hall algebras and their representations associated to them.
Finally, we slightly modify our method to construct representations of the cohomological Hall algebra of zero-dimensional sheaves on $S$ on the Borel-Moore homology of the moduli spaces of Pandharipande-Thomas stable pairs on surfaces and on relative Hilbert schemes of points (and we obtain similar results at the level of K-theory and bounded derived category).

Submitted 18 July, 2022; originally announced July 2022.

Comments: 68 pages

MSC Class: 14A20 (Primary); 14A30; 14F08; 17B37 (Secondary)

arXiv:2207.05096 [pdf, other]

doi 10.21468/SciPostPhys.14.3.033

Hot and heavy dark matter from a weak scale phase transition

Authors: Iason Baldes, Yann Gouttenoire, Filippo Sala

Abstract: We point out that dark matter which is produced non-adiabatically in a phase transition (PT) with fast bubble walls receives a boost in velocity which leads to long free-streaming lengths. We find that this could be observed via the suppressed matter power spectrum for dark matter masses around $10^8 - 10^9$ GeV and energy scales of the PT around $10^{2} - 10^3$ GeV. The PT should take place at the border of the supercooled regime, i.e. approximately when the Universe becomes vacuum dominated. This work offers novel physics goals for galaxy surveys, Lyman-$α$, stellar stream, lensing, and 21-cm observations, and connects these to the gravitational waves from such phase transitions, and more speculatively to possible telescope signals of heavy dark matter decay.

Submitted 5 December, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Comments: 9 pages plus appendices. V2: Clarifications and references added, accepted for publication in SciPost Physics

Report number: ULB-TH/22-12

Journal ref: SciPost Phys. 14, 033 (2023)

arXiv:2203.13270 [pdf, other]

Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision

Authors: Mayee F. Chen, Daniel Y. Fu, Dyah Adila, Michael Zhang, Frederic Sala, Kayvon Fatahalian, Christopher Ré

Abstract: Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudolabels. The challenge is building a combination that best exploits the signal available in both foundation models and weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning per-part source accuracies. Second, we improve source coverage by extending source votes in embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and adapters by 11.8 points, and kNN and adapters supervised by traditional hand labels by 7.2 points.

Submitted 1 August, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

Comments: UAI 2022 Camera Ready

arXiv:2203.12023 [pdf, other]

Generative Modeling Helps Weak Supervision (and Vice Versa)

Authors: Benedikt Boecking, Nicholas Roberts, Willie Neiswanger, Stefano Ermon, Frederic Sala, Artur Dubrawski

Abstract: Many promising applications of supervised machine learning face hurdles in the acquisition of labeled data in sufficient quantity and quality, creating an expensive bottleneck. To overcome such limitations, techniques that do not depend on ground truth labels have been studied, including weak supervision and generative modeling. While these techniques would seem to be usable in concert, improving one another, how to build an interface between them is not well-understood. In this work, we propose a model fusing programmatic weak supervision and generative adversarial networks and provide theoretical justification motivating this fusion. The proposed approach captures discrete latent variables in the data alongside the weak supervision derived label estimate. Alignment of the two allows for better modeling of sample-dependent accuracies of the weak supervision sources, improving the estimate of unobserved labels. It is the first approach to enable data augmentation through weakly supervised synthetic images and pseudolabels. Additionally, its learned latent variables can be inspected qualitatively. The model outperforms baseline weak supervision label models on a number of multiclass image classification datasets, improves the quality of generated images, and further improves end-model performance through data augmentation with synthetic samples.

Submitted 11 March, 2023; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: Published as a conference paper at ICLR 2023

ACM Class: I.2.0; I.4.m

arXiv:2203.07261 [pdf, other]

The physics case of a 3 TeV muon collider stage

Authors: Jorge De Blas, Dario Buttazzo, Rodolfo Capdevilla, David Curtin, Roberto Franceschini, Fabio Maltoni, Patrick Meade, Federico Meloni, Shufang Su, Eleni Vryonidou, Andrea Wulzer, Chiara Aimè, Aram Apyan, Pouya Asadi, Mohammed Attia Mahmoud, Aleksandr Azatov, Nazar Bartosik, Alessandro Bertolin, Salvatore Bottaro, Laura Buonincontri, Massimo Casarsa, Luca Castelli, Maria Gabriella Catanesi, Francesco Giovanni Celiberto, Alessandro Cerri , et al. (109 additional authors not shown)

Abstract: In the path towards a muon collider with center of mass energy of 10 TeV or more, a stage at 3 TeV emerges as an appealing option. Reviewing the physics potential of such a muon collider is the main purpose of this document. In order to outline the progression of the physics performances across the stages, a few sensitivity projections for higher energy are also presented. There are many opportunities for probing new physics at a 3 TeV muon collider. Some of them are in common with the extensively documented physics case of the CLIC 3 TeV energy stage, and include measuring the Higgs trilinear coupling and testing the possible composite nature of the Higgs boson and of the top quark at the 20 TeV scale. Other opportunities are unique to a 3 TeV muon collider, and stem from the fact that muons are collided rather than electrons. This is exemplified by studying the potential to explore the microscopic origin of the current $g$-2 and $B$-physics anomalies, which are both related to muons.

Submitted 27 May, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 73 pages, 28 figures; Contribution to Snowmass 2021

arXiv:2203.07256 [pdf, other]

Muon Collider Physics Summary

Authors: Chiara Aimè, Aram Apyan, Mohammed Attia Mahmoud, Nazar Bartosik, Alessandro Bertolin, Maurizio Bonesini, Salvatore Bottaro, Dario Buttazzo, Rodolfo Capdevilla, Massimo Casarsa, Luca Castelli, Maria Gabriella Catanesi, Francesco Giovanni Celiberto, Alessandro Cerri, Cari Cesarotti, Grigorios Chachamis, Siyu Chen, Yang-Ting Chien, Mauro Chiesa, Gianmaria Collazuol, Marco Costa, Nathaniel Craig, David Curtin, Sridhara Dasu, Jorge De Blas , et al. (100 additional authors not shown)

Abstract: The perspective of designing muon colliders with high energy and luminosity, which is being investigated by the International Muon Collider Collaboration, has triggered a growing interest in their physics reach. We present a concise summary of the muon colliders' potential to explore new physics, leveraging the unique possibility of combining high available energy with very precise measurements.

Submitted 27 May, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 21 pages, 7 figures; Contribution to Snowmass 2021

arXiv:2203.06029 [pdf, other]

doi 10.1088/1475-7516/2022/06/028

Search for secluded dark matter towards the Galactic Centre with the ANTARES neutrino telescope

Authors: A. Albert, S. Alves, M. Andre, M. Anghinolfi, G. Anton, M. Ardid, S. Ardid, J. -J. Aubert, J. Aublin, B. Baret, S. Basa, B. Belhorma, M. Bendahman, F. Benfenati, V. Bertin, S. Biagi, M. Bissinger, J. Boumaaza, M. Bouta, M. C. Bouwhuis, H. Branzas, R. Bruijn, J. Brunner, J. Busto, B. Caiffi , et al. (124 additional authors not shown)

Abstract: Searches for dark matter (DM) have not provided any solid evidence for the existence of weakly interacting massive particles in the GeV-TeV mass range. Coincidentally, the scale of new physics is being pushed by collider searches well beyond the TeV domain. This situation strongly motivates the exploration of DM masses much larger than a TeV. Secluded scenarios contain a natural way around the unitarity bound on the DM mass, via the early matter domination induced by the mediator of its interactions with the Standard Model. High-energy neutrinos constitute one of the very few direct accesses to energy scales above a few TeV. An indirect search for secluded DM signals has been performed with the ANTARES neutrino telescope using data from 2007 to 2015. Upper limits on the DM annihilation cross section for DM masses up to 6 PeV are presented and discussed.

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2202.11531 [pdf, ps, other]

Self-Consistent Implementation of Kohn-Sham Adiabatic Connection Models with Improved Treatment of the Strong-Interaction Limit

Authors: S. Śmiga, F. Della Sala, P. Gori-Giorgi, E. Fabiano

Abstract: Adiabatic connection models (ACMs), which interpolate between the limits of weak and strong interaction, are powerful tools to build accurate exchange-correlation functionals. If the exact weak-interaction expansion from second-order perturbation theory is included, a self-consistent implementation of these functionals is challenging and still absent in the literature. In this work we fill this gap by presenting a fully self-consistent-field (SCF) implementation of some popular ACM functionals. While using second-order perturbation theory at weak interactions, we have also introduced new generalised gradient approximations (GGA's), beyond the usual point-charge-plus-continuum model, for the first two leading terms at strong interactions, which are crucial to ensure robustness and reliability. We then assess the SCF-ACM functionals for molecular systems and for prototypical strong-correlation problems. We find that they perform well for both the total energy and the electronic density and that the impact of SCF orbitals is directly connected to the accuracy of the ACM functional form. For the H$_2$ dissociation the SCF-ACM functionals yield significant improvements with respect to standard functionals, also thanks to the use of the new GGA's for the strong-coupling functionals.

Submitted 7 September, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: 40 pages, 6 figures

arXiv:2112.07686 [pdf, other]

doi 10.1007/JHEP05(2022)004

Friction pressure on relativistic bubble walls

Authors: Yann Gouttenoire, Ryusuke Jinno, Filippo Sala

Abstract: During a cosmological first-order phase transition, particles of the plasma crossing the bubble walls can radiate a gauge boson. The resulting pressure cannot be computed perturbatively for large coupling constant and/or large supercooling. We resum the real and virtual emissions at all leading-log orders, both analytically and numerically using a Monte-Carlo simulation. We find that radiated bosons are dominantly soft and that the resulting retarding pressure on relativistic bubble walls is linear both in the Lorentz boost and in the order parameter, up to a log. We further quantitatively discuss IR cut-offs, wall thickness effects, the impact of various approximations entering the calculation, and comment on the fate of radiated bosons that are reflected.

Submitted 14 July, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: 26 pages, 8 figures, plus appendices and references. v2: additional references added, matches JHEP publication

arXiv:2112.03865 [pdf, other]

Universalizing Weak Supervision

Authors: Changho Shin, Winfred Li, Harit Vishwakarma, Nicholas Roberts, Frederic Sala

Abstract: Weak supervision (WS) frameworks are a popular way to bypass hand-labeling large datasets for training data-hungry models. These approaches synthesize multiple noisy but cheaply-acquired estimates of labels into a set of high-quality pseudolabels for downstream training. However, the synthesis technique is specific to a particular kind of label, such as binary labels or sequences, and each new label type requires manually designing a new synthesis algorithm. Instead, we propose a universal technique that enables weak supervision over any label type while still offering desirable properties, including practical flexibility, computational efficiency, and theoretical guarantees. We apply this technique to important problems previously not tackled by WS frameworks including learning to rank, regression, and learning in hyperbolic space. Theoretically, our synthesis approach produces consistent estimators for learning some challenging but important generalizations of the exponential family model. Experimentally, we validate our framework and show improvement over baselines in diverse settings including real-world learning-to-rank and regression problems along with learning on hyperbolic manifolds.

Submitted 29 November, 2023; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: ICLR 2022

arXiv:2111.08036 [pdf, ps, other]

On the Chow ring of the classifying stack of algebraic tori

Authors: Francesco Sala

Abstract: We investigate the structure of the Chow ring of the classifying stacks $BT$ of algebraic tori, as it has been defined by B. Totaro. Some previous work of N. Karpenko, A. Merkurjev, S. Blinstein and F. Scavia has shed some light on the structure of such rings. In particular Karpenko showed the absence of torsion classes in the case of permutation tori, while Merkurjev and Blinstein described in a very effective way the second Chow group $A^2(BT)$ in the general case. Building on this work, Scavia exhibited an example where $A^2(BT)_\text{tors}\neq 0$. Here, by making use of a very elementary approach, we extend the result of Karpenko to special tori and we completely determine the Chow ring $A^*(BT)$ when $T$ is an algebraic torus admitting a resolution with special tori $0\rightarrow T\rightarrow Q\rightarrow P$. In particular we show that there can be torsion in the Chow ring of such tori.

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2111.00249 [pdf, ps, other]

Shuffle algebras for quivers as quantum groups

Authors: Andrei Neguţ, Francesco Sala, Olivier Schiffmann

Abstract: We define a quantum loop group $\mathbf{U}^+_Q$ associated to an arbitrary quiver $Q=(I,E)$ and maximal set of deformation parameters, with generators indexed by $I \times \mathbb{Z}$ and some explicit quadratic and cubic relations. We prove that $\mathbf{U}^+_Q$ is isomorphic to the (generic, small) shuffle algebra associated to the quiver $Q$ and hence, by [Neg21a], to the localized K-theoretic Hall algebra of $Q$. For the quiver with one vertex and $g$ loops, this yields a presentation of the spherical Hall algebra of a (generic) smooth projective curve of genus $g$ (invoking the results of [SV12]). We extend the above results to the case of non-generic parameters satisfying a certain natural metric condition. As an application, we obtain a description by generators and relations of the subalgebra generated by absolutely cuspidal eigenforms of the Hall algebra of an arbitrary smooth projective curve (invoking the results of [KSV17]).

Submitted 8 May, 2023; v1 submitted 30 October, 2021; originally announced November 2021.

Comments: Added Section 5 (concerning special values of the parameters) and Section 7 (on the Hall algebra of an arbitrary curve)

arXiv:2110.13926 [pdf, other]

doi 10.1007/JHEP07(2022)084

Supercool Composite Dark Matter Beyond 100 TeV

Authors: Iason Baldes, Yann Gouttenoire, Filippo Sala, Géraldine Servant

Abstract: Dark Matter could be a composite state of a confining sector with an approximate scale symmetry. We consider the case where the associated pseudo-Goldstone boson, the dilaton, mediates its interactions with the Standard Model. When the confining phase transition in the early universe is supercooled, its dynamics allows for Dark Matter masses up to $10^6$ TeV. We derive the precise parameter space compatible with all experimental constraints, finding that this scenario can be tested partly by telescopes and entirely by gravitational waves.

Submitted 13 July, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: 35 pages plus appendices and references, accepted for publication in JHEP

Report number: ULB-TH/21-17; DESY 21-172

Journal ref: JHEP 07 (2022) 084

Showing 1–50 of 173 results for author: Sala, F