Showing 1–24 of 24 results for author: Fitzgibbon, A

  1. arXiv:2407.17353  [pdf, other]

    cs.LG

    Scalify: scale propagation for efficient low-precision LLM training

    Authors: Paul Balança, Sam Hosegood, Carlo Luschi, Andrew Fitzgibbon

    Abstract: Low-precision formats such as float8 have been introduced in machine learning accelerated hardware to improve computational efficiency for large language model training and inference. Nevertheless, adoption by the ML community has been slowed down by the complex, and sometimes brittle, techniques required to match higher precision training accuracy. In this work, we present Scalify, an end-to-end…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures, ICML 2024 WANT workshop

    MSC Class: 68T07 ACM Class: I.2.7

  2. arXiv:2406.03121  [pdf, ps, other]

    cs.LG cond-mat.mtrl-sci physics.comp-ph

    MESS: Modern Electronic Structure Simulations

    Authors: Hatem Helal, Andrew Fitzgibbon

    Abstract: Electronic structure simulation (ESS) has been used for decades to provide quantitative scientific insights on an atomistic scale, enabling advances in chemistry, biology, and materials science, among other disciplines. Following standard practice in scientific computing, the software packages driving these studies have been implemented in compiled languages such as FORTRAN and C. However, the rec…

    Submitted 5 June, 2024; originally announced June 2024.

  3. arXiv:2404.14986  [pdf, other]

    cs.LG cs.AI

    $\texttt{MiniMol}$: A Parameter-Efficient Foundation Model for Molecular Learning

    Authors: Kerstin Kläser, Błażej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew Fitzgibbon

    Abstract: In biological tasks, data is rarely plentiful as it is generated from hard-to-gather measurements. Therefore, pre-training foundation models on large quantities of available data and then transferring to low-data downstream tasks is a promising direction. However, how to design effective foundation models for molecular learning remains an open question, with existing approaches typically focusing on m…

    Submitted 23 April, 2024; originally announced April 2024.

  4. arXiv:2402.04030  [pdf, other]

    cs.LG

    Reducing the Cost of Quantum Chemical Data By Backpropagating Through Density Functional Theory

    Authors: Alexander Mathiasen, Hatem Helal, Paul Balanca, Adam Krzywaniak, Ali Parviz, Frederik Hvilshøj, Blazej Banaszewski, Carlo Luschi, Andrew William Fitzgibbon

    Abstract: Density Functional Theory (DFT) accurately predicts the quantum chemical properties of molecules, but scales as $O(N_{\text{electrons}}^3)$. Schütt et al. (2019) successfully approximate DFT 1000x faster with Neural Networks (NN). Arguably, the biggest problem one faces when scaling to larger molecules is the cost of DFT labels. For example, it took years to create the PCQ dataset (Nakata & Shimaz…

    Submitted 6 February, 2024; originally announced February 2024.

  5. arXiv:2311.01135  [pdf, other]

    cs.LG physics.chem-ph

    Generating QM1B with PySCF$_{\text{IPU}}$

    Authors: Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters

    Abstract: The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. Th…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: 15 pages, 7 figures. NeurIPS 2023 Track Datasets and Benchmarks

    ACM Class: I.2.6; J.2

  6. arXiv:2310.04292  [pdf, other]

    cs.LG

    Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

    Authors: Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Zhiyi Li, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, Ali Parviz, Michael Craig, Michał Koziarski, Jiarui Lu, Zhaocheng Zhu, Cristian Gabellini, Kerstin Klaser, Josef Dean, Cas Wognum, Maciej Sypetkowski, Guillaume Rabusseau, Reihaneh Rabbany, Jian Tang, Christopher Morris, et al. (10 additional authors not shown)

    Abstract: Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by…

    Submitted 18 October, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

  7. arXiv:2309.17224  [pdf, other]

    cs.LG cs.AR cs.CL cs.ET cs.PF

    Training and inference of large language models using 8-bit floating point

    Authors: Sergio P. Perez, Yan Zhang, James Briggs, Charlie Blake, Josh Levy-Kramer, Paul Balanca, Carlo Luschi, Stephen Barlow, Andrew William Fitzgibbon

    Abstract: FP8 formats are gaining popularity to boost the computational efficiency for training and inference of large deep learning models. Their main challenge is that a careful choice of scaling is needed to prevent degradation due to the reduced dynamic range compared to higher-precision formats. Although there exists ample literature about selecting such scalings for INT formats, this critical aspect h…

    Submitted 29 September, 2023; originally announced September 2023.

    ACM Class: I.2.7; B.2.4

  8. arXiv:2302.02947  [pdf, other]

    cs.LG

    GPS++: Reviving the Art of Message Passing for Molecular Property Prediction

    Authors: Dominic Masters, Josef Dean, Kerstin Klaser, Zhiyi Li, Sam Maddrell-Mander, Adam Sanders, Hatem Helal, Deniz Beker, Andrew Fitzgibbon, Shenyang Huang, Ladislav Rampášek, Dominique Beaini

    Abstract: We present GPS++, a hybrid Message Passing Neural Network / Graph Transformer model for molecular property prediction. Our model integrates a well-tuned local message passing component and biased global attention with other key ideas from prior literature to achieve state-of-the-art results on the large-scale molecular dataset PCQM4Mv2. Through a thorough ablation study we highlight the impact of indi…

    Submitted 12 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2212.02229

  9. arXiv:2212.10307  [pdf, other]

    cs.PL cs.LG cs.MS

    Efficient and Sound Differentiable Programming in a Functional Array-Processing Language

    Authors: Amir Shaikhha, Mathieu Huot, Shabnam Ghasemirad, Andrew Fitzgibbon, Simon Peyton Jones, Dimitrios Vytiniotis

    Abstract: Automatic differentiation (AD) is a technique for computing the derivative of a function represented by a program. This technique is considered the de facto standard for computing derivatives in many machine learning and optimisation software tools. Despite the practicality of this technique, the performance of the differentiated programs, especially for functional languages and in the…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:1806.02136

  10. arXiv:2211.12281  [pdf, other]

    cs.LG cs.AI

    BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion

    Authors: Alberto Cattaneo, Daniel Justus, Harry Mellor, Douglas Orr, Jerome Maloberti, Zhenying Liu, Thorin Farnsworth, Andrew Fitzgibbon, Blazej Banaszewski, Carlo Luschi

    Abstract: We present the award-winning submission to the WikiKG90Mv2 track of OGB-LSC@NeurIPS 2022. The task is link-prediction on the large-scale knowledge graph WikiKG90Mv2, consisting of 90M+ nodes and 600M+ edges. Our solution uses a diverse ensemble of $85$ Knowledge Graph Embedding models combining five different scoring functions (TransE, TransH, RotatE, DistMult, ComplEx) and two different loss func…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: First place in the WikiKG90Mv2 track of the Open Graph Benchmark Large-Scale Challenge @NeurIPS2022

  11. arXiv:2209.06354  [pdf, other]

    cs.LG

    Tuple Packing: Efficient Batching of Small Graphs in Graph Neural Networks

    Authors: Mario Michael Krell, Manuel Lopez, Sreenidhi Anand, Hatem Helal, Andrew William Fitzgibbon

    Abstract: When processing a batch of graphs in machine learning models such as Graph Neural Networks (GNN), it is common to combine several small graphs into one overall graph to accelerate processing and remove or reduce the overhead of padding. This is for example supported in the PyG library. However, the sizes of small graphs can vary substantially with respect to the number of nodes and edges, and henc…

    Submitted 18 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

  12. arXiv:2203.05789  [pdf, other]

    cs.CV cs.LG

    FLAG: Flow-based 3D Avatar Generation from Sparse Observations

    Authors: Sadegh Aliakbarian, Pashmina Cameron, Federica Bogo, Andrew Fitzgibbon, Thomas J. Cashman

    Abstract: To represent people in mixed reality applications for collaboration and communication, we need to generate realistic and faithful avatar poses. However, the signal streams that can be applied for this task from head-mounted devices (HMDs) are typically limited to head pose and hand pose estimates. While these signals are valuable, they are an incomplete representation of the human body, making it…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: Accepted at CVPR 2022

  13. arXiv:2107.02027  [pdf, other]

    cs.CL cs.CC cs.IT cs.LG

    Efficient Sequence Packing without Cross-contamination: Accelerating Large Language Models without Impacting Performance

    Authors: Mario Michael Krell, Matej Kosec, Sergio P. Perez, Andrew Fitzgibbon

    Abstract: Effective training of today's large language models (LLMs) depends on large batches and long sequences for throughput and accuracy. To handle variable-length sequences on hardware accelerators, it is common practice to introduce padding tokens, so that all sequences in a batch have the same length. We show in this paper that the variation in sequence lengths in common NLP datasets is such that up…

    Submitted 5 October, 2022; v1 submitted 29 June, 2021; originally announced July 2021.

    Comments: Significantly new version with different authors and much more content. Much larger variety in experiments and exhaustive SOTA analysis

    MSC Class: 05-08 ACM Class: I.2.7; G.2.1

  14. arXiv:2105.02856  [pdf, other]

    cs.PL cs.DS

    Hashing Modulo Alpha-Equivalence

    Authors: Krzysztof Maziarz, Tom Ellis, Alan Lawrence, Andrew Fitzgibbon, Simon Peyton Jones

    Abstract: In many applications one wants to identify identical subtrees of a program syntax tree. This identification should ideally be robust to alpha-renaming of the program, but no existing technique has been shown to achieve this with good efficiency (better than $\mathcal{O}(n^2)$ in expression size). We present a new, asymptotically efficient way to hash modulo alpha-equivalence. A key insight of our…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: Accepted for publication at the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI 2021)

  15. Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop

    Authors: Benjamin Biggs, Oliver Boyne, James Charles, Andrew Fitzgibbon, Roberto Cipolla

    Abstract: We introduce an automatic, end-to-end method for recovering the 3D pose and shape of dogs from monocular internet images. The large variation in shape between dog breeds, significant occlusion and low quality of internet images make this a challenging problem. We learn a richer prior over shapes than previous work, which helps regularize parameter estimation. We demonstrate results on the Stanfor…

    Submitted 11 February, 2021; v1 submitted 21 July, 2020; originally announced July 2020.

    Comments: Accepted at ECCV 2020

    Journal ref: 16th European Conference Glasgow UK August 23 to 28 2020 Proceedings Part XI

  16. arXiv:2007.04940  [pdf, other]

    cs.CV

    The Phong Surface: Efficient 3D Model Fitting using Lifted Optimization

    Authors: Jingjing Shen, Thomas J. Cashman, Qi Ye, Tim Hutton, Toby Sharp, Federica Bogo, Andrew William Fitzgibbon, Jamie Shotton

    Abstract: Realtime perceptual and interaction capabilities in mixed reality require a range of 3D tracking problems to be solved at low latency on resource-constrained hardware such as head-mounted devices. Indeed, for devices such as HoloLens 2 where the CPU and GPU are left available for applications, multiple tracking subsystems are required to run on a continuous, real-time basis while sharing a single…

    Submitted 9 July, 2020; originally announced July 2020.

    Journal ref: ECCV2020

  17. arXiv:2002.12674  [pdf, other]

    cs.CV cs.LG

    Inverse Graphics GAN: Learning to Generate 3D Shapes from Unstructured 2D Data

    Authors: Sebastian Lunz, Yingzhen Li, Andrew Fitzgibbon, Nate Kushman

    Abstract: Recent work has shown the ability to learn generative models for 3D shapes from only unstructured 2D images. However, training such models requires differentiating through the rasterization step of the rendering process, therefore past work has focused on developing bespoke rendering models which smooth over this non-differentiable process in various ways. Such models are thus unable to take advan…

    Submitted 28 February, 2020; originally announced February 2020.

    Comments: 8 pages paper, 3 pages references, 18 pages appendix

  18. arXiv:1811.05804  [pdf, other]

    cs.CV

    Creatures great and SMAL: Recovering the shape and motion of animals from video

    Authors: Benjamin Biggs, Thomas Roddick, Andrew Fitzgibbon, Roberto Cipolla

    Abstract: We present a system to recover the 3D shape and motion of a wide variety of quadrupeds from video. The system comprises a machine learning front-end which predicts candidate 2D joint positions, a discrete optimization which finds kinematically plausible joint correspondences, and an energy minimization stage which fits a detailed 3D model to the image. In order to overcome the limited availability…

    Submitted 14 November, 2018; originally announced November 2018.

    Comments: 17 pages, ACCV 2018 oral paper

  19. arXiv:1807.10129  [pdf, other]

    cs.MS cs.CV cs.LG

    A Benchmark of Selected Algorithmic Differentiation Tools on Some Problems in Computer Vision and Machine Learning

    Authors: Filip Šrajer, Zuzana Kukelova, Andrew Fitzgibbon

    Abstract: Algorithmic differentiation (AD) allows exact computation of derivatives given only an implementation of an objective function. Although many AD tools are available, a proper and efficient implementation of AD methods is not straightforward. The existing tools are often too different to allow for a general test suite. In this paper, we compare fifteen ways of computing derivatives including eleven…

    Submitted 26 July, 2018; originally announced July 2018.

    Comments: Previous versions of this article appeared at AD2016---7th International Conference on Algorithmic Differentiation, and in Optimization Methods and Software, Taylor and Francis, Feb 2018 (online)

  20. arXiv:1806.02136  [pdf, other]

    cs.MS cs.LG cs.PL cs.SC stat.ML

    Efficient Differentiable Programming in a Functional Array-Processing Language

    Authors: Amir Shaikhha, Andrew Fitzgibbon, Dimitrios Vytiniotis, Simon Peyton Jones, Christoph Koch

    Abstract: We present a system for the automatic differentiation of a higher-order functional array-processing language. The core functional language underlying this system simultaneously supports both source-to-source automatic differentiation and global optimizations such as loop transformations. Thanks to this feature, we demonstrate how for some real-world machine learning and computer vision benchmarks,…

    Submitted 6 June, 2018; originally announced June 2018.

  21. arXiv:1802.03773  [pdf, other]

    math.NA cs.MS

    QRkit: Sparse, Composable QR Decompositions for Efficient and Stable Solutions to Problems in Computer Vision

    Authors: Jan Svoboda, Thomas Cashman, Andrew Fitzgibbon

    Abstract: Embedded computer vision applications increasingly require the speed and power benefits of single-precision (32 bit) floating point. However, applications which make use of Levenberg-like optimization can lose significant accuracy when reducing to single precision, sometimes unrecoverably so. This accuracy can be regained using solvers based on QR rather than Cholesky decomposition, but the absenc…

    Submitted 11 February, 2018; originally announced February 2018.

  22. arXiv:1711.11566  [pdf, other]

    cs.LG cs.CV

    Hybrid VAE: Improving Deep Generative Models using Partial Observations

    Authors: Sergey Tulyakov, Andrew Fitzgibbon, Sebastian Nowozin

    Abstract: Deep neural network models trained on large labeled datasets are the state-of-the-art in a large variety of computer vision tasks. In many applications, however, labeled data is expensive to obtain or requires a time consuming manual annotation process. In contrast, unlabeled data is often abundant and available in large quantities. We present a principled framework to capitalize on unlabeled data…

    Submitted 30 November, 2017; originally announced November 2017.

  23. arXiv:1708.01654  [pdf, other]

    cs.CV

    Better Together: Joint Reasoning for Non-rigid 3D Reconstruction with Specularities and Shading

    Authors: Qi Liu-Yin, Rui Yu, Lourdes Agapito, Andrew Fitzgibbon, Chris Russell

    Abstract: We demonstrate the use of shape-from-shading (SfS) to improve both the quality and the robustness of 3D reconstruction of dynamic objects captured by a single camera. Unlike previous approaches that made use of SfS as a post-processing step, we offer a principled integrated approach that solves dynamic object tracking and reconstruction and SfS as a single unified cost function. Moving beyond Lamb…

    Submitted 4 August, 2017; originally announced August 2017.

    Comments: Submitted to IJCV

  24. arXiv:1704.06843  [pdf, other]

    cs.CV

    On the Two-View Geometry of Unsynchronized Cameras

    Authors: Cenek Albl, Zuzana Kukelova, Andrew Fitzgibbon, Jan Heller, Matej Smid, Tomas Pajdla

    Abstract: We present new methods for simultaneously estimating camera geometry and time shift from video sequences from multiple unsynchronized cameras. Algorithms for simultaneous computation of a fundamental matrix or a homography with unknown time shift between images are developed. Our methods use minimal correspondence sets (eight for fundamental matrix and four and a half for homography) and therefore…

    Submitted 22 April, 2017; originally announced April 2017.

    Comments: 12 pages, 9 figures, Computer Vision and Pattern Recognition (CVPR) 2017