Showing 1–22 of 22 results for author: Kristiadi, A

Searching in archive cs.
  1. arXiv:2407.03951  [pdf, other]

    cs.LG

    Uncertainty-Guided Optimization on Large Language Model Search Trees

    Authors: Julia Grosse, Ruotian Wu, Ahmad Rashid, Philipp Hennig, Pascal Poupart, Agustinus Kristiadi

    Abstract: Beam search is a standard tree search algorithm when it comes to finding sequences of maximum likelihood, for example, in the decoding processes of large language models. However, it is myopic since it does not take the whole path from the root to a leaf into account. Moreover, it is agnostic to prior knowledge available about the process: For example, it does not consider that the objective being…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 10 pages

  2. arXiv:2406.07780  [pdf, other]

    cs.LG cs.CL

    A Critical Look At Tokenwise Reward-Guided Text Generation

    Authors: Ahmad Rashid, Ruotian Wu, Julia Grosse, Agustinus Kristiadi, Pascal Poupart

    Abstract: Large language models (LLMs) can significantly be improved by aligning to human preferences -- the so-called reinforcement learning from human feedback (RLHF). However, the cost of fine-tuning an LLM is prohibitive for many users. Due to their ability to bypass LLM finetuning, tokenwise reward-guided text generation (RGTG) methods have recently been proposed. They use a reward model trained on ful…

    Submitted 11 June, 2024; originally announced June 2024.

  3. arXiv:2406.06459  [pdf, other]

    cs.LG

    How Useful is Intermittent, Asynchronous Expert Feedback for Bayesian Optimization?

    Authors: Agustinus Kristiadi, Felix Strieth-Kalthoff, Sriram Ganapathi Subramanian, Vincent Fortuin, Pascal Poupart, Geoff Pleiss

    Abstract: Bayesian optimization (BO) is an integral part of automated scientific discovery -- the so-called self-driving lab -- where human inputs are ideally minimal or at least non-blocking. However, scientists often have strong intuition, and thus human feedback is still useful. Nevertheless, prior works in enhancing BO with expert feedback, such as by incorporating it in an offline or online but blockin…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: AABI 2024. Code: https://github.com/wiseodd/bo-async-feedback

  4. arXiv:2402.05015  [pdf, other]

    cs.LG

    A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules?

    Authors: Agustinus Kristiadi, Felix Strieth-Kalthoff, Marta Skreta, Pascal Poupart, Alán Aspuru-Guzik, Geoff Pleiss

    Abstract: Automation is one of the cornerstones of contemporary material discovery. Bayesian optimization (BO) is an essential part of such workflows, enabling scientists to leverage prior domain knowledge into efficient exploration of a large molecular space. While such prior knowledge can take many forms, there has been significant fanfare around the ancillary scientific knowledge encapsulated in large la…

    Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: ICML 2024. Code: https://github.com/wiseodd/lapeft-bayesopt

  5. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 6 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  6. arXiv:2312.05705  [pdf, other]

    cs.LG stat.ML

    Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC

    Authors: Wu Lin, Felix Dangel, Runa Eschenhagen, Kirill Neklyudov, Agustinus Kristiadi, Richard E. Turner, Alireza Makhzani

    Abstract: Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-fre…

    Submitted 23 July, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: A long version of the ICML 2024 paper, updated the text about a related work

  7. arXiv:2311.03683  [pdf, other]

    cs.LG

    Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks

    Authors: Ahmad Rashid, Serena Hacker, Guojun Zhang, Agustinus Kristiadi, Pascal Poupart

    Abstract: Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks - a popular class of neural network architectures - have been shown to almost always yield high confidence predicti…

    Submitted 27 March, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted at AISTATS 2024

  8. arXiv:2310.00137  [pdf, other]

    cs.LG stat.ML

    On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective

    Authors: Jonathan Wenger, Felix Dangel, Agustinus Kristiadi

    Abstract: The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold in wide synthetic neural network architectures. These advantages include faster optimization, reliable uncertainty quan…

    Submitted 28 May, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

  9. arXiv:2304.08309  [pdf, other]

    cs.LG stat.ML

    Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization

    Authors: Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Vincent Fortuin

    Abstract: The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks. It is theoretically compelling since it can be seen as a Gaussian process posterior with the mean function given by the neural network's maximum-a-posteriori predictive function and the covariance function induced by the empirical neural tangent kernel. However, while i…

    Submitted 10 July, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: AABI 2023

  10. arXiv:2302.07384  [pdf, other]

    cs.LG stat.ML

    The Geometry of Neural Nets' Parameter Spaces Under Reparametrization

    Authors: Agustinus Kristiadi, Felix Dangel, Philipp Hennig

    Abstract: Model reparametrization, which follows the change-of-variable rule of calculus, is a popular way to improve the training of neural nets. But it can also be problematic since it can induce inconsistencies in, e.g., Hessian-based flatness measures, optimization trajectories, and modes of probability densities. This complicates downstream analyses: e.g. one cannot definitively relate flatness with ge…

    Submitted 23 October, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: NeurIPS 2023

  11. arXiv:2205.10041  [pdf, other]

    cs.LG stat.ML

    Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks

    Authors: Agustinus Kristiadi, Runa Eschenhagen, Philipp Hennig

    Abstract: Monte Carlo (MC) integration is the de facto method for approximating the predictive distribution of Bayesian neural networks (BNNs). But, even with many MC samples, Gaussian-based BNNs could still yield bad predictive performance due to the posterior approximation's error. Meanwhile, alternatives to MC integration tend to be more expensive or biased. In this work, we experimentally show that the…

    Submitted 15 October, 2022; v1 submitted 20 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  12. arXiv:2203.03353  [pdf, other]

    stat.ML cs.LG

    Discovering Inductive Bias with Gibbs Priors: A Diagnostic Tool for Approximate Bayesian Inference

    Authors: Luca Rendsburg, Agustinus Kristiadi, Philipp Hennig, Ulrike von Luxburg

    Abstract: Full Bayesian posteriors are rarely analytically tractable, which is why real-world Bayesian inference heavily relies on approximate techniques. Approximations generally differ from the true posterior and require diagnostic tools to assess whether the inference can still be trusted. We investigate a new approach to diagnosing approximate inference: the approximation mismatch is attributed to a cha…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: 24 pages, 9 figures, to be published in AISTATS22

  13. arXiv:2111.03577  [pdf, other]

    cs.LG stat.ML

    Mixtures of Laplace Approximations for Improved Post-Hoc Uncertainty in Deep Learning

    Authors: Runa Eschenhagen, Erik Daxberger, Philipp Hennig, Agustinus Kristiadi

    Abstract: Deep neural networks are prone to overconfident predictions on outliers. Bayesian neural networks and deep ensembles have both been shown to mitigate this problem to some extent. In this work, we aim to combine the benefits of the two approaches by proposing to predict with a Gaussian mixture model posterior that consists of a weighted sum of Laplace approximations of independently trained deep ne…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: Bayesian Deep Learning Workshop, NeurIPS 2021

  14. arXiv:2106.14806  [pdf, other]

    cs.LG stat.ML

    Laplace Redux -- Effortless Bayesian Deep Learning

    Authors: Erik Daxberger, Agustinus Kristiadi, Alexander Immer, Runa Eschenhagen, Matthias Bauer, Philipp Hennig

    Abstract: Bayesian formulations of deep learning have been shown to have compelling theoretical properties and offer practical functional benefits, such as improved predictive uncertainty quantification and model selection. The Laplace approximation (LA) is a classic, and arguably the simplest family of approximations for the intractable posteriors of deep neural networks. Yet, despite its simplicity, the L…

    Submitted 14 March, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version; source code: https://github.com/AlexImmer/Laplace

  15. arXiv:2106.10065  [pdf, other]

    cs.LG stat.ML

    Being a Bit Frequentist Improves Bayesian Neural Networks

    Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

    Abstract: Despite their compelling theoretical properties, Bayesian neural networks (BNNs) tend to perform worse than frequentist methods in classification-based uncertainty quantification (UQ) tasks such as out-of-distribution (OOD) detection. In this paper, based on empirical findings in prior works, we hypothesize that this issue is because even recent Bayesian methods have never considered OOD data in t…

    Submitted 2 February, 2022; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: AISTATS 2022

  16. arXiv:2010.02720  [pdf, other]

    cs.LG

    Learnable Uncertainty under Laplace Approximations

    Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

    Abstract: Laplace approximations are classic, computationally lightweight means for constructing Bayesian neural networks (BNNs). As in other approximate BNNs, one cannot necessarily expect the induced predictive uncertainty to be calibrated. Here we develop a formalism to explicitly "train" the uncertainty in a decoupled way to the prediction itself. To this end, we introduce uncertainty units for Laplace-…

    Submitted 7 June, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: UAI 2021

  17. arXiv:2010.02709  [pdf, other]

    cs.LG stat.ML

    An Infinite-Feature Extension for Bayesian ReLU Nets That Fixes Their Asymptotic Overconfidence

    Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

    Abstract: A Bayesian treatment can mitigate overconfidence in ReLU nets around the training data. But far away from them, ReLU Bayesian neural networks (BNNs) can still underestimate uncertainty and thus be asymptotically overconfident. This issue arises since the output variance of a BNN with finitely many features is quadratic in the distance from the data region. Meanwhile, Bayesian linear models with Re…

    Submitted 24 January, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2021

  18. arXiv:2003.01227  [pdf, other]

    cs.LG stat.ML

    Fast Predictive Uncertainty for Classification with Bayesian Deep Networks

    Authors: Marius Hobbhahn, Agustinus Kristiadi, Philipp Hennig

    Abstract: In Bayesian Deep Learning, distributions over the output of classification neural networks are often approximated by first constructing a Gaussian distribution over the weights, then sampling from it to receive a distribution over the softmax outputs. This is costly. We reconsider old work (Laplace Bridge) to construct a Dirichlet approximation of this softmax output distribution, which yields an…

    Submitted 31 May, 2022; v1 submitted 2 March, 2020; originally announced March 2020.

    Comments: Updated version. Accepted for publication at UAI 2022

  19. arXiv:2002.10118  [pdf, other]

    stat.ML cs.LG

    Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks

    Authors: Agustinus Kristiadi, Matthias Hein, Philipp Hennig

    Abstract: The point estimates of ReLU classification networks---arguably the most widely used neural network architecture---have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus not calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predicti…

    Submitted 17 July, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  20. arXiv:1902.01080  [pdf, other]

    stat.ML cs.AI cs.LG

    Predictive Uncertainty Quantification with Compound Density Networks

    Authors: Agustinus Kristiadi, Sina Däubener, Asja Fischer

    Abstract: Despite the huge success of deep neural networks (NNs), finding good mechanisms for quantifying their prediction uncertainty is still an open problem. Bayesian neural networks are one of the most popular approaches to uncertainty quantification. On the other hand, it was recently shown that ensembles of NNs, which belong to the class of mixture models, can be used to quantify prediction uncertaint…

    Submitted 29 December, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: Bayesian deep learning workshop, NeurIPS 2019

  21. arXiv:1809.03194  [pdf, ps, other]

    cs.AI cs.CL cs.LG

    Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge

    Authors: Debanjan Chaudhuri, Agustinus Kristiadi, Jens Lehmann, Asja Fischer

    Abstract: Building systems that can communicate with humans is a core problem in Artificial Intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context level attention and incorporates additional external knowledge provided by descriptions of domain-specific words. It uses a bi-direc…

    Submitted 5 November, 2018; v1 submitted 10 September, 2018; originally announced September 2018.

    Comments: Published as conference paper at CoNLL 2018

  22. arXiv:1802.00934  [pdf, other]

    cs.AI stat.ML

    Incorporating Literals into Knowledge Graph Embeddings

    Authors: Agustinus Kristiadi, Mohammad Asif Khan, Denis Lukovnikov, Jens Lehmann, Asja Fischer

    Abstract: Knowledge graphs, on top of entities and their relationships, contain other important elements: literals. Literals encode interesting properties (e.g. the height) of entities that are not captured by links between entities alone. Most of the existing work on embedding (or latent feature) based knowledge graph analysis focuses mainly on the relations between entities. In this work, we study the eff…

    Submitted 18 July, 2019; v1 submitted 3 February, 2018; originally announced February 2018.

    Comments: 9 pages, 2 figures, 6 tables