
Showing 1–23 of 23 results for author: McWilliams, B

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.04229  [pdf, other]

    cs.LG cs.SD eess.AS

    MusicRL: Aligning Music Generation to Human Preferences

    Authors: Geoffrey Cideron, Sertan Girgin, Mauro Verzetti, Damien Vincent, Matej Kastelic, Zalán Borsos, Brian McWilliams, Victor Ungureanu, Olivier Bachem, Olivier Pietquin, Matthieu Geist, Léonard Hussenot, Neil Zeghidour, Andrea Agostinelli

    Abstract: We propose MusicRL, the first music generation system finetuned from human feedback. Appreciation of text-to-music models is particularly subjective since the concept of musicality as well as the specific intention behind a caption are user-dependent (e.g. a caption such as "upbeat work-out music" can map to a retro guitar solo or a techno pop beat). Not only this makes supervised training of such…

    Submitted 6 February, 2024; originally announced February 2024.

  3. arXiv:2209.07562  [pdf, other]

    cs.CL

    TwHIN-BERT: A Socially-Enriched Pre-trained Language Model for Multilingual Tweet Representations at Twitter

    Authors: Xinyang Zhang, Yury Malkov, Omar Florez, Serim Park, Brian McWilliams, Jiawei Han, Ahmed El-Kishky

    Abstract: Pre-trained language models (PLMs) are fundamental for natural language processing applications. Most existing PLMs are not tailored to the noisy user-generated text on social media, and the pre-training does not factor in the valuable social engagement logs available in a social network. We present TwHIN-BERT, a multilingual language model productionized at Twitter, trained on in-domain data from…

    Submitted 26 August, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

  4. arXiv:2206.04993  [pdf, other]

    cs.LG cs.AI cs.GT stat.ML

    The Symmetric Generalized Eigenvalue Problem as a Nash Equilibrium

    Authors: Ian Gemp, Charlie Chen, Brian McWilliams

    Abstract: The symmetric generalized eigenvalue problem (SGEP) is a fundamental concept in numerical linear algebra. It captures the solution of many classical machine learning problems such as canonical correlation analysis, independent components analysis, partial least squares, linear discriminant analysis, principal components and others. Despite this, most general solvers are prohibitively expensive whe…

    Submitted 25 April, 2023; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Published in ICLR 2023 (JAX code available as part of github.com/deepmind/eigengame)

  5. arXiv:2201.05119  [pdf, other]

    cs.CV cs.LG stat.ML

    Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

    Authors: Nenad Tomasev, Ioana Bica, Brian McWilliams, Lars Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

    Abstract: Despite recent progress made by self-supervised methods in representation learning with residual networks, they still underperform supervised learning on the ImageNet classification benchmark, limiting their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we include additional inductive biases into self-supervised learning.…

    Submitted 3 November, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

  6. arXiv:2102.04152  [pdf, other]

    stat.ML cs.AI cs.LG

    EigenGame Unloaded: When playing games is better than optimizing

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We build on the recently proposed EigenGame that views eigendecomposition as a competitive game. EigenGame's updates are biased if computed using minibatches of data, which hinders convergence and more sophisticated parallelism in the stochastic setting. In this work, we propose an unbiased stochastic update that is asymptotically equivalent to EigenGame, enjoys greater parallelism allowing comput…

    Submitted 22 March, 2022; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: Published in ICLR '22

  7. arXiv:2010.07922  [pdf, other]

    cs.LG cs.CV stat.ML

    Representation Learning via Invariant Causal Mechanisms

    Authors: Jovana Mitrovic, Brian McWilliams, Jacob Walker, Lars Buesing, Charles Blundell

    Abstract: Self-supervised learning has emerged as a strategy to reduce the reliance on costly supervised signal by pretraining representations only using unlabeled data. These methods combine heuristic proxy classification tasks with data augmentations and have achieved significant success, but our theoretical understanding of this success remains limited. In this paper we analyze self-supervised representa…

    Submitted 15 October, 2020; originally announced October 2020.

  8. arXiv:2010.00554  [pdf, other]

    cs.LG stat.ML

    EigenGame: PCA as a Nash Equilibrium

    Authors: Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel

    Abstract: We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function. We analyze the properties of this PCA game and the behavior of its gradient based updates. The resulting algorithm -- which combines elements from Oja's rule with a generalized Gram-Schmidt orthogon…

    Submitted 16 March, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

    Comments: Published as a conference paper at International Conference on Learning Representations (ICLR) 2021

  9. arXiv:2002.02325  [pdf, other]

    cs.MA cs.AI

    Social diversity and social preferences in mixed-motive reinforcement learning

    Authors: Kevin R. McKee, Ian Gemp, Brian McWilliams, Edgar A. Duéñez-Guzmán, Edward Hughes, Joel Z. Leibo

    Abstract: Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity. In contrast, studies of reinforcement learning in mixed-motive games have primarily leveraged homogeneous approaches. Given the defining characteristic of mixed-motive games--the imperfect correlation of incentives between group members--we study the…

    Submitted 12 February, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

    Comments: Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020)

  10. arXiv:1901.05061  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Spectrogram Feature Losses for Music Source Separation

    Authors: Abhimanyu Sahai, Romann Weber, Brian McWilliams

    Abstract: In this paper we study deep learning-based music source separation, and explore using an alternative loss to the standard spectrogram pixel-level L2 loss for model training. Our main contribution is in demonstrating that adding a high-level feature loss term, extracted from the spectrograms using a VGG net, can improve separation quality vis-a-vis a pure pixel-level loss. We show this improvement…

    Submitted 26 June, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

    Comments: Accepted for presentation at the 27th European Signal Processing Conference (EUSIPCO 2019)

    MSC Class: 62; 68

    ACM Class: I.2.6; H.5.5

  11. arXiv:1808.03856  [pdf, other]

    cs.LG cs.GR stat.ML

    Neural Importance Sampling

    Authors: Thomas Müller, Brian McWilliams, Fabrice Rousselle, Markus Gross, Jan Novák

    Abstract: We propose to use deep neural networks for generating samples in Monte Carlo integration. Our work is based on non-linear independent components estimation (NICE), which we extend in numerous ways to improve performance and enable its application to integration problems. First, we introduce piecewise-polynomial coupling transforms that greatly increase the modeling power of individual coupling lay…

    Submitted 3 September, 2019; v1 submitted 11 August, 2018; originally announced August 2018.

    Comments: 19 pages, 15 figures. Accepted for publication in ACM Transactions on Graphics; presented at SIGGRAPH 2019

  12. arXiv:1804.02900  [pdf, other]

    cs.CV

    A Fully Progressive Approach to Single-Image Super-Resolution

    Authors: Yifan Wang, Federico Perazzi, Brian McWilliams, Alexander Sorkine-Hornung, Olga Sorkine-Hornung, Christopher Schroers

    Abstract: Recent deep learning approaches to single image super-resolution have achieved impressive results in terms of traditional error measures and perceptual quality. However, in each case it remains challenging to achieve high quality results for large upsampling factors. To this end, we propose a method (ProSR) that is progressive both in architecture and training: the network upsamples an image in in…

    Submitted 10 April, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

  13. arXiv:1804.00884  [pdf, other]

    cs.CV

    PhaseNet for Video Frame Interpolation

    Authors: Simone Meyer, Abdelaziz Djelouah, Brian McWilliams, Alexander Sorkine-Hornung, Markus Gross, Christopher Schroers

    Abstract: Most approaches for video frame interpolation require accurate dense correspondences to synthesize an in-between frame. Therefore, they do not perform well in challenging scenarios with e.g. lighting changes or motion blur. Recent deep learning approaches that rely on kernels to represent motion can only alleviate these problems to some extent. In those cases, methods that use a per-pixel phase-ba…

    Submitted 3 April, 2018; originally announced April 2018.

    Comments: CVPR 2018

  14. arXiv:1709.05418  [pdf, other]

    cs.LG cs.GR stat.ML

    Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks

    Authors: Simon Kallweit, Thomas Müller, Brian McWilliams, Markus Gross, Jan Novák

    Abstract: We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds---e.g. the characteristic silverlining and the "whiteness" of the inner body---challenging for methods based solely on Monte Carlo integrat…

    Submitted 15 September, 2017; originally announced September 2017.

    Comments: ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2017)

  15. arXiv:1703.00403  [pdf, other]

    stat.ML cs.CR cs.DC cs.LG

    Preserving Differential Privacy Between Features in Distributed Estimation

    Authors: Christina Heinze-Deml, Brian McWilliams, Nicolai Meinshausen

    Abstract: Privacy is crucial in many applications of machine learning. Legal, ethical and societal issues restrict the sharing of sensitive data making it difficult to learn from datasets that are partitioned between many parties. One important instance of such a distributed setting arises when information about each record in the dataset is held by different data owners (the design matrix is "vertically-pa…

    Submitted 27 June, 2017; v1 submitted 1 March, 2017; originally announced March 2017.

    Journal ref: Stat 7 (1), 2018

  16. arXiv:1702.08591  [pdf, other]

    cs.NE cs.LG stat.ML

    The Shattered Gradients Problem: If resnets are the answer, then what is the question?

    Authors: David Balduzzi, Marcus Frean, Lennox Leary, JP Lewis, Kurt Wan-Duo Ma, Brian McWilliams

    Abstract: A long-standing obstacle to progress in deep learning is the problem of vanishing and exploding gradients. Although, the problem has largely been overcome via carefully constructed initializations and batch normalization, architectures incorporating skip-connections such as highway and resnets perform much better than standard feedforward architectures despite well-chosen initialization and batch…

    Submitted 6 June, 2018; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: ICML 2017, final version

    Journal ref: PMLR volume 70 (2017)

  17. arXiv:1611.06652  [pdf, other]

    stat.ML cs.LG

    Scalable Adaptive Stochastic Optimization Using Random Projections

    Authors: Gabriel Krummenacher, Brian McWilliams, Yannic Kilcher, Joachim M. Buhmann, Nicolai Meinshausen

    Abstract: Adaptive stochastic gradient methods such as AdaGrad have gained popularity in particular for training deep neural networks. The most commonly used and studied variant maintains a diagonal matrix approximation to second order information by accumulating past gradients which are used to tune the step size adaptively. In certain situations the full-matrix variant of AdaGrad is expected to attain bet…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: To appear in Advances in Neural Information Processing Systems 29 (NIPS 2016)

  18. arXiv:1611.02345  [pdf, other]

    cs.LG cs.NE stat.ML

    Neural Taylor Approximations: Convergence and Exploration in Rectifier Networks

    Authors: David Balduzzi, Brian McWilliams, Tony Butler-Yeoman

    Abstract: Modern convolutional networks, incorporating rectifiers and max-pooling, are neither smooth nor convex; standard guarantees therefore do not apply. Nevertheless, methods from convex optimization such as gradient descent and Adam are widely used as building blocks for deep learning algorithms. This paper provides the first convergence guarantee applicable to modern convnets, which furthermore match…

    Submitted 6 June, 2018; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: ICML 2017, final version

    Journal ref: PMLR volume 70, 2017

  19. arXiv:1507.08104  [pdf, other]

    cs.LG

    Learning Representations for Outlier Detection on a Budget

    Authors: Barbora Micenková, Brian McWilliams, Ira Assent

    Abstract: The problem of detecting a small number of outliers in a large dataset is an important task in many fields from fraud detection to high-energy physics. Two approaches have emerged to tackle this problem: unsupervised and supervised. Supervised approaches require a sufficient amount of labeled data and are challenged by novel types of outliers and inherent class imbalance, whereas unsupervised meth…

    Submitted 29 July, 2015; originally announced July 2015.

  20. arXiv:1506.03662  [pdf, other]

    cs.LG math.OC stat.ML

    Variance Reduced Stochastic Gradient Descent with Neighbors

    Authors: Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

    Abstract: Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its slow convergence can be a computational bottleneck. Variance reduction techniques such as SAG, SVRG and SAGA have been proposed to overcome this weakness, achieving linear convergence. However, these methods are either based on computations of full gradients at pivot points, or on keeping per data point corrections in me…

    Submitted 26 February, 2016; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 13 pages

    MSC Class: 90C06; 90C25; 68T05

    ACM Class: G.1.6; I.2.6

  21. arXiv:1506.02554  [pdf, other]

    stat.ML cs.DC cs.LG

    DUAL-LOCO: Distributing Statistical Estimation Using Random Projections

    Authors: Christina Heinze, Brian McWilliams, Nicolai Meinshausen

    Abstract: We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO ha…

    Submitted 8 January, 2016; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: 13 pages

    Journal ref: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 51, 2016, 12 pages

  22. arXiv:1503.08316  [pdf, ps, other]

    cs.LG

    A Variance Reduced Stochastic Newton Method

    Authors: Aurelien Lucchi, Brian McWilliams, Thomas Hofmann

    Abstract: Quasi-Newton methods are widely used in practise for convex loss minimization problems. These methods exhibit good empirical performance on a wide variety of tasks and enjoy super-linear convergence to the optimal solution. For large-scale learning problems, stochastic Quasi-Newton methods have been recently proposed. However, these typically only achieve sub-linear convergence rates and have not…

    Submitted 9 June, 2015; v1 submitted 28 March, 2015; originally announced March 2015.

  23. arXiv:1306.5554  [pdf, ps, other]

    stat.ML cs.LG

    Correlated random features for fast semi-supervised learning

    Authors: Brian McWilliams, David Balduzzi, Joachim M. Buhmann

    Abstract: This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It…

    Submitted 5 November, 2013; v1 submitted 24 June, 2013; originally announced June 2013.

    Comments: 15 pages, 3 figures, 6 tables