
Showing 1–27 of 27 results for author: Huszar, F

Searching in archive cs.
  1. arXiv:2406.14302  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning

    Authors: Patrik Reizinger, Siyuan Guo, Ferenc Huszár, Bernhard Schölkopf, Wieland Brendel

    Abstract: Identifying latent representations or causal structures is important for good generalization and downstream task performance. However, both fields have been developed rather independently. We observe that several methods in both representation and causal structure learning rely on the same data-generating process (DGP), namely, exchangeable but not i.i.d. (independent and identically distributed)…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2405.18836  [pdf, other]

    stat.ME cs.LG

    Do Finetti: On Causal Effects for Exchangeable Data

    Authors: Siyuan Guo, Chi Zhang, Karthika Mohan, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable…

    Submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.15485  [pdf, other]

    cs.AI cs.CL cs.LG

    Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs

    Authors: Siyuan Guo, Aniket Didolkar, Nan Rosemary Ke, Anirudh Goyal, Ferenc Huszár, Bernhard Schölkopf

    Abstract: We are beginning to see progress in language model assisted scientific discovery. Motivated by the use of LLMs as a general scientific assistant, this paper assesses the domain knowledge of LLMs through its understanding of different mathematical skills required to solve problems. In particular, we look at not just what the pre-trained model already knows, but how it learned to learn from informat…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2405.14791  [pdf, other]

    cs.LG cs.CV cs.DC

    Recurrent Early Exits for Federated Learning with Heterogeneous Clients

    Authors: Royson Lee, Javier Fernandez-Marques, Shell Xu Hu, Da Li, Stefanos Laskaridis, Łukasz Dudziak, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane

    Abstract: Federated learning (FL) has enabled distributed learning of a model across multiple clients in a privacy-preserving manner. One of the main challenges of FL is to accommodate clients with varying hardware capacities; clients have differing compute and memory requirements. To tackle this challenge, recent state-of-the-art approaches leverage the use of early exits. Nonetheless, these approaches fal…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at the 41st International Conference on Machine Learning (ICML 2024)

  5. arXiv:2405.01964  [pdf, other]

    stat.ML cs.LG

    Position: Understanding LLMs Requires More Than Statistical Generalization

    Authors: Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

    Abstract: The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statist…

    Submitted 17 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted as a position paper at ICML 2024. Code: https://github.com/rpatrik96/llm-non-identifiability

  6. arXiv:2310.02420  [pdf, other]

    cs.LG cs.CV cs.DC

    FedL2P: Federated Learning to Personalize

    Authors: Royson Lee, Minyoung Kim, Da Li, Xinchi Qiu, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane

    Abstract: Federated learning (FL) research has made progress in developing algorithms for distributed learning of global models, as well as algorithms for local personalization of those common models to the specifics of each client's local data distribution. However, different FL problems may require different personalization strategies, and it may not even be possible to define an effective one-size-fits-a…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  7. arXiv:2212.07886  [pdf, other]

    cs.CV

    Meta-Learned Kernel For Blind Super-Resolution Kernel Estimation

    Authors: Royson Lee, Rui Li, Stylianos I. Venieris, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane

    Abstract: Recent image degradation estimation methods have enabled single-image super-resolution (SR) approaches to better upsample real-world images. Among these methods, explicit kernel estimation approaches have demonstrated unprecedented performance at handling unknown degradations. Nonetheless, a number of limitations constrain their efficacy when used by downstream SR models. Specifically, this family…

    Submitted 30 October, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

    Comments: Preprint: Accepted at the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

  8. arXiv:2210.10452  [pdf, other]

    stat.ML cs.LG

    Rethinking Sharpness-Aware Minimization as Variational Inference

    Authors: Szilvia Ujváry, Zsigmond Telek, Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima. In this work, we establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network parameters. We show that both these methods have interpretations as optimizing notions of flatness, and when using the reparametrisation trick, they both boil dow…

    Submitted 19 October, 2022; originally announced October 2022.

  9. arXiv:2203.15756  [pdf, other]

    stat.ML cs.LG math.ST stat.ME

    Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data

    Authors: Siyuan Guo, Viktor Tóth, Bernhard Schölkopf, Ferenc Huszár

    Abstract: Constraint-based causal discovery methods leverage conditional independence tests to infer causal relationships in a wide variety of applications. Just as the majority of machine learning methods, existing work focuses on studying independent and identically distributed data. However, it is known that even with infinite i.i.d. data, constraint-based methods can only identify causal…

    Submitted 24 May, 2024; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: camera-ready NeurIPS 2023

  10. Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics

    Authors: Tomo Lazovich, Luca Belli, Aaron Gonzales, Amanda Bower, Uthaipon Tantipongpipat, Kristian Lum, Ferenc Huszar, Rumman Chowdhury

    Abstract: The harmful impacts of algorithmic decision systems have recently come into focus, with many examples of systems such as machine learning (ML) models amplifying existing societal biases. Most metrics attempting to quantify disparities resulting from ML algorithms focus on differences between groups, dividing users based on demographic identities and comparing model performance or overall outcomes…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: 11 pages, 7 figures

  11. arXiv:2111.11542  [pdf, other]

    stat.ML cs.LG

    Depth Without the Magic: Inductive Bias of Natural Gradient Descent

    Authors: Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep lear…

    Submitted 22 November, 2021; originally announced November 2021.

  12. Algorithmic Amplification of Politics on Twitter

    Authors: Ferenc Huszár, Sofia Ira Ktena, Conor O'Brien, Luca Belli, Andrew Schlaikjer, Moritz Hardt

    Abstract: Content on Twitter's home timeline is selected and ordered by personalization algorithms. By consistently ranking certain content higher, these algorithms may amplify some messages while reducing the visibility of others. There's been intense public and scholarly debate about the possibility that some political groups benefit more from algorithmic amplification than others. We provide quantitative…

    Submitted 21 October, 2021; originally announced October 2021.

  13. arXiv:2010.05380  [pdf, other]

    cs.LG

    Efficient Wasserstein Natural Gradients for Reinforcement Learning

    Authors: Ted Moskovitz, Michael Arbel, Ferenc Huszar, Arthur Gretton

    Abstract: A novel optimization approach is proposed for application to policy gradient methods and evolution strategies for reinforcement learning (RL). The procedure uses a computationally efficient Wasserstein natural gradient (WNG) descent that takes advantage of the geometry induced by a Wasserstein penalty to speed optimization. This method follows the recent theme in RL of including a divergence penal…

    Submitted 18 March, 2021; v1 submitted 11 October, 2020; originally announced October 2020.

  14. arXiv:2008.00727  [pdf]

    cs.LG cs.AI cs.NE stat.ML

    Deep Bayesian Bandits: Exploring in Online Personalized Recommendations

    Authors: Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani

    Abstract: Recommender systems trained in a continuous learning fashion are plagued by the feedback loop problem, also known as algorithmic bias. This causes a newly trained model to act greedily and favor items that have already been engaged by users. This behavior is particularly harmful in personalised ads recommendations, as it can also cause new campaigns to remain unexplored. Exploration aims to addres…

    Submitted 3 August, 2020; originally announced August 2020.

  15. arXiv:2007.14523  [pdf, other]

    cs.SI cs.LG stat.ML

    Model Size Reduction Using Frequency Based Double Hashing for Recommender Systems

    Authors: Caojin Zhang, Yicun Liu, Yuanpu Xie, Sofia Ira Ktena, Alykhan Tejani, Akshay Gupta, Pranay Kumar Myana, Deepak Dilipkumar, Suvadip Paul, Ikuhiro Ihara, Prasang Upadhyaya, Ferenc Huszar, Wenzhe Shi

    Abstract: Deep Neural Networks (DNNs) with sparse input features have been widely used in recommender systems in industry. These models have large memory requirements and need a huge amount of training data. The large model size usually entails a cost, in the range of millions of dollars, for storage and communication with the inference services. In this paper, we propose a hybrid hashing method to combine…

    Submitted 28 July, 2020; originally announced July 2020.

    Comments: Paper is accepted to RecSys 2020

  16. arXiv:1907.06558  [pdf, other]

    stat.ML cs.LG

    Addressing Delayed Feedback for Continuous Training with Neural Networks in CTR prediction

    Authors: Sofia Ira Ktena, Alykhan Tejani, Lucas Theis, Pranay Kumar Myana, Deepak Dilipkumar, Ferenc Huszar, Steven Yoo, Wenzhe Shi

    Abstract: One of the challenges in display advertising is that the distribution of features and click through rate (CTR) can exhibit large shifts over time due to seasonality, changes to ad campaigns and other factors. The predominant strategy to keep up with these shifts is to train predictive models continuously, on fresh data, in order to prevent them from becoming stale. However, in many ad systems posi…

    Submitted 23 April, 2021; v1 submitted 15 July, 2019; originally announced July 2019.

    Comments: Accepted at RecSys '19

  17. arXiv:1801.05787  [pdf, other]

    cs.CV stat.ML

    Faster gaze prediction with dense networks and Fisher pruning

    Authors: Lucas Theis, Iryna Korshunova, Alykhan Tejani, Ferenc Huszár

    Abstract: Predicting human fixations from images has recently seen large improvements by leveraging deep representations which were pretrained for object recognition. However, as we show in this paper, these networks are highly overparameterized for the task of fixation prediction. We first present a simple yet principled greedy pruning method which we call Fisher pruning. Through a combination of knowledge…

    Submitted 9 July, 2018; v1 submitted 17 January, 2018; originally announced January 2018.

  18. On Quadratic Penalties in Elastic Weight Consolidation

    Authors: Ferenc Huszár

    Abstract: Elastic weight consolidation (EWC, Kirkpatrick et al., 2017) is a novel algorithm designed to safeguard against catastrophic forgetting in neural networks. EWC can be seen as an approximation to Laplace propagation (Eskin et al., 2004), and this view is consistent with the motivation given by Kirkpatrick et al. (2017). In this note, I present an extended derivation that covers the case when there are…

    Submitted 11 December, 2017; originally announced December 2017.

  19. arXiv:1703.00395  [pdf, other]

    stat.ML cs.CV

    Lossy Image Compression with Compressive Autoencoders

    Authors: Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár

    Abstract: We propose a new approach to the problem of optimizing autoencoders for lossy image compression. New media formats, changing hardware technology, as well as diverse requirements and content types create a need for compression algorithms which are more flexible than existing codecs. Autoencoders have the potential to address this need, but are difficult to optimize directly due to the inherent non-…

    Submitted 1 March, 2017; originally announced March 2017.

  20. arXiv:1702.08235  [pdf, other]

    stat.ML cs.LG

    Variational Inference using Implicit Distributions

    Authors: Ferenc Huszár

    Abstract: Generative adversarial networks (GANs) have given us a great tool to fit implicit generative models to data. Implicit distributions are ones we can sample from easily, and take derivatives of samples with respect to model parameters. These models are highly expressive and we argue they can prove just as useful for variational inference (VI) as they are for generative modelling. Several papers have…

    Submitted 27 February, 2017; originally announced February 2017.

  21. arXiv:1610.04490  [pdf, other]

    cs.CV cs.LG stat.ML

    Amortised MAP Inference for Image Super-resolution

    Authors: Casper Kaae Sønderby, Jose Caballero, Lucas Theis, Wenzhe Shi, Ferenc Huszár

    Abstract: Image super-resolution (SR) is an underdetermined inverse problem, where a large number of plausible high-resolution images can explain the same downsampled image. Most current single image SR methods use empirical risk minimisation, often with a pixel-wise mean squared error (MSE) loss. However, the outputs from such methods tend to be blurry, over-smoothed and generally appear implausible. A mor…

    Submitted 21 February, 2017; v1 submitted 14 October, 2016; originally announced October 2016.

  22. arXiv:1609.07009  [pdf]

    cs.CV

    Is the deconvolution layer the same as a convolutional layer?

    Authors: Wenzhe Shi, Jose Caballero, Lucas Theis, Ferenc Huszar, Andrew Aitken, Christian Ledig, Zehan Wang

    Abstract: In this note, we want to focus on aspects related to two questions most people asked us at CVPR about the network we presented. Firstly, what is the relationship between our proposed layer and the deconvolution layer? And secondly, why are convolutions in low-resolution (LR) space a better choice? These are key questions we tried to answer in the paper, but we were not able to go into as much dept…

    Submitted 22 September, 2016; originally announced September 2016.

    Comments: This is a note to share some additional insights for our CVPR paper

  23. arXiv:1609.05158  [pdf, other]

    cs.CV stat.ML

    Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network

    Authors: Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, Zehan Wang

    Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolut…

    Submitted 23 September, 2016; v1 submitted 16 September, 2016; originally announced September 2016.

    Comments: CVPR 2016 paper with updated affiliations and supplemental material, fixed typo in equation 4

  24. arXiv:1609.04802  [pdf, other]

    cs.CV stat.ML

    Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

    Authors: Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, Wenzhe Shi

    Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. R…

    Submitted 25 May, 2017; v1 submitted 15 September, 2016; originally announced September 2016.

    Comments: 19 pages, 15 figures, 2 tables, accepted for oral presentation at CVPR, main paper + some supplementary material

  25. arXiv:1511.05101  [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    How (not) to Train your Generative Model: Scheduled Sampling, Likelihood, Adversary?

    Authors: Ferenc Huszár

    Abstract: Modern applications and progress in deep learning research have created renewed interest in generative models of text and of images. However, even today it is unclear what objective functions one should use to train and evaluate these models. In this paper we present two contributions. Firstly, we present a critique of scheduled sampling, a state-of-the-art training method that contributed to t…

    Submitted 16 November, 2015; originally announced November 2015.

  26. arXiv:1408.2049   

    cs.LG stat.ML

    Optimally-Weighted Herding is Bayesian Quadrature

    Authors: Ferenc Huszar, David Duvenaud

    Abstract: Herding and kernel herding are deterministic methods of choosing samples which summarise a probability distribution. A related task is choosing samples for estimating integrals using Bayesian quadrature. We show that the criterion minimised when selecting samples in kernel herding is equivalent to the posterior variance in Bayesian quadrature. We then show that sequential Bayesian quadrature can b…

    Submitted 13 July, 2016; v1 submitted 9 August, 2014; originally announced August 2014.

    Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012). This copy was withdrawn since it's a duplicate of arXiv:1204.1664

    Report number: UAI-P-2012-PG-377-386

  27. arXiv:1112.5745  [pdf, other]

    stat.ML cs.LG

    Bayesian Active Learning for Classification and Preference Learning

    Authors: Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, Máté Lengyel

    Abstract: Information theoretic active learning has been widely studied for probabilistic models. For simple regression an optimal myopic policy is easily tractable. However, for other tasks and with more complex models, such as classification with nonparametric models, the optimal solution is harder to compute. Current approaches make approximations to achieve tractability. We propose an approach that expr…

    Submitted 24 December, 2011; originally announced December 2011.