Showing 1–16 of 16 results for author: Savinov, N

Searching in archive cs.

  1. arXiv:2403.05530 [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805 [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2211.15089 [pdf, other]

    cs.CL cs.LG

    Continuous diffusion for categorical data

    Authors: Sander Dieleman, Laurent Sartran, Arman Roshannai, Nikolay Savinov, Yaroslav Ganin, Pierre H. Richemond, Arnaud Doucet, Robin Strudel, Chris Dyer, Conor Durkan, Curtis Hawthorne, Rémi Leblond, Will Grathwohl, Jonas Adler

    Abstract: Diffusion models have quickly become the go-to paradigm for generative modelling of perceptual signals (such as images and sound) through iterative refinement. Their success hinges on the fact that the underlying physical phenomena are continuous. For inherently discrete and categorical data such as language, various diffusion-inspired alternatives have been proposed. However, the continuous natur…

    Submitted 15 December, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 26 pages, 8 figures; corrections and additional information about hyperparameters

  4. arXiv:2211.04236 [pdf, other]

    cs.CL cs.LG

    Self-conditioned Embedding Diffusion for Text Generation

    Authors: Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

    Abstract: Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as is standard in language modeling. We propose Self-conditioned Embedding Diffusion, a continuous diffusion mechanism that operates on token embeddings and allows…

    Submitted 8 November, 2022; originally announced November 2022.

    Comments: 15 pages

  5. arXiv:2112.06749 [pdf, other]

    cs.CL cs.LG

    Step-unrolled Denoising Autoencoders for Text Generation

    Authors: Nikolay Savinov, Junyoung Chung, Mikolaj Binkowski, Erich Elsen, Aaron van den Oord

    Abstract: In this paper we propose a new generative model of text, Step-unrolled Denoising Autoencoder (SUNDAE), that does not rely on autoregressive models. Similarly to denoising diffusion techniques, SUNDAE is repeatedly applied on a sequence of tokens, starting from random inputs and improving them each time until convergence. We present a simple new improvement operator that converges in fewer iteratio…

    Submitted 19 April, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted to ICLR 2022

  6. arXiv:2012.05672 [pdf, other]

    cs.LG cs.AI cs.MA

    Imitating Interactive Intelligence

    Authors: Josh Abramson, Arun Ahuja, Iain Barr, Arthur Brussee, Federico Carnevale, Mary Cassin, Rachita Chhaparia, Stephen Clark, Bogdan Damoc, Andrew Dudzik, Petko Georgiev, Aurelia Guy, Tim Harley, Felix Hill, Alden Hung, Zachary Kenton, Jessica Landon, Timothy Lillicrap, Kory Mathewson, Soňa Mokrá, Alistair Muldal, Adam Santoro, Nikolay Savinov, Vikrant Varma, Greg Wayne, et al. (4 additional authors not shown)

    Abstract: A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central cha…

    Submitted 20 January, 2021; v1 submitted 10 December, 2020; originally announced December 2020.

  7. arXiv:1906.10491 [pdf, other]

    cs.CV

    Discrete Optimization of Ray Potentials for Semantic 3D Reconstruction

    Authors: Nikolay Savinov, Lubor Ladicky, Christian Haene, Marc Pollefeys

    Abstract: Dense semantic 3D reconstruction is typically formulated as a discrete or continuous problem over label assignments in a voxel grid, combining semantic and depth likelihoods in a Markov Random Field framework. The depth and semantic information is incorporated as a unary potential, smoothed by a pairwise regularizer. However, modelling likelihoods as a unary potential does not model the problem co…

    Submitted 25 June, 2019; originally announced June 2019.

    Comments: Published at CVPR 2015

  8. arXiv:1901.03991 [pdf, other]

    cs.CV

    RNN-based Generative Model for Fine-Grained Sketching

    Authors: Andrin Jenal, Nikolay Savinov, Torsten Sattler, Gaurav Chaurasia

    Abstract: Deep generative models have shown great promise when it comes to synthesising novel images. While they can generate images that look convincing on a higher-level, generating fine-grained details is still a challenge. In order to foster research on more powerful generative approaches, this paper proposes a novel task: generative modelling of 2D tree skeletons. Trees are an interesting shape class b…

    Submitted 13 January, 2019; originally announced January 2019.

    Comments: Includes supplemental material. Link to datasets to be added shortly

  9. arXiv:1810.02274 [pdf, other]

    cs.LG cs.AI cs.CV cs.RO stat.ML

    Episodic Curiosity through Reachability

    Authors: Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly

    Abstract: Rewards are sparse in the real world and most of today's reinforcement learning algorithms struggle with such sparsity. One solution to this problem is to allow the agent to create rewards for itself - thus making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. Such bonus is summed up w…

    Submitted 6 August, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

    Comments: Accepted to ICLR 2019. Code at https://github.com/google-research/episodic-curiosity/. Videos at https://sites.google.com/view/episodic-curiosity/

  10. arXiv:1806.06498 [pdf, other]

    cs.RO cs.LG eess.SY

    Conditional Affordance Learning for Driving in Urban Environments

    Authors: Axel Sauer, Nikolay Savinov, Andreas Geiger

    Abstract: Most existing approaches to autonomous driving fall into one of two categories: modular pipelines, that build an extensive model of the environment, and imitation learning approaches, that map images directly to control outputs. A recently proposed third paradigm, direct perception, aims to combine the advantages of both by using a neural network to learn appropriate low-dimensional intermediate r…

    Submitted 3 November, 2018; v1 submitted 18 June, 2018; originally announced June 2018.

    Comments: Accepted for Conference on Robot Learning (CoRL) 2018

  11. arXiv:1803.00653 [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    Semi-parametric Topological Memory for Navigation

    Authors: Nikolay Savinov, Alexey Dosovitskiy, Vladlen Koltun

    Abstract: We introduce a new memory architecture for navigation in previously unseen environments, inspired by landmark-based navigation in animals. The proposed semi-parametric topological memory (SPTM) consists of a (non-parametric) graph with nodes corresponding to locations in the environment and a (parametric) deep network capable of retrieving nodes from the graph based on observations. The graph stor…

    Submitted 1 March, 2018; originally announced March 2018.

    Comments: Published at International Conference on Learning Representations (ICLR) 2018. Project website at https://sites.google.com/view/SPTM

  12. arXiv:1705.08272 [pdf, other]

    cs.CV cs.LG cs.NE

    Matching neural paths: transfer from recognition to correspondence search

    Authors: Nikolay Savinov, Lubor Ladicky, Marc Pollefeys

    Abstract: Many machine learning tasks require finding per-part correspondences between objects. In this work we focus on low-level correspondences - a highly ambiguous matching problem. We propose to use a hierarchical semantic representation of the objects, coming from a convolutional neural network, to solve this ambiguity. Training it for low-level correspondence prediction directly might not be an optio…

    Submitted 5 November, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: Accepted at NIPS 2017

  13. arXiv:1704.03847 [pdf, other]

    cs.CV cs.LG cs.NE cs.RO

    Semantic3D.net: A new Large-scale Point Cloud Classification Benchmark

    Authors: Timo Hackel, Nikolay Savinov, Lubor Ladicky, Jan D. Wegner, Konrad Schindler, Marc Pollefeys

    Abstract: This paper presents a new 3D point cloud classification benchmark data set with over four billion manually labelled points, meant as input for data-hungry (deep) learning methods. We also discuss first submissions to the benchmark that use deep convolutional neural networks (CNNs) as a work horse, which already show remarkable performance improvements over state-of-the-art. CNNs have become the de…

    Submitted 12 April, 2017; originally announced April 2017.

    Comments: Accepted to ISPRS Annals. The benchmark website is available at http://www.semantic3d.net/ . The baseline code is available at https://github.com/nsavinov/semantic3dnet

  14. arXiv:1611.07571 [pdf, other]

    cs.CV cs.LG cs.NE

    Quad-networks: unsupervised learning to rank for interest point detection

    Authors: Nikolay Savinov, Akihito Seki, Lubor Ladicky, Torsten Sattler, Marc Pollefeys

    Abstract: Several machine learning tasks require to represent the data using only a sparse set of interest points. An ideal detector is able to find the corresponding interest points even if the data undergo a transformation typical for a given domain. Since the task is of high practical interest in computer vision, many hand-crafted solutions were proposed. In this paper, we ask a fundamental question: can…

    Submitted 10 April, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: Accepted at CVPR 2017

  15. arXiv:1604.06318 [pdf, other]

    cs.CV

    TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks

    Authors: Dmitry Laptev, Nikolay Savinov, Joachim M. Buhmann, Marc Pollefeys

    Abstract: In this paper we present a deep neural network topology that incorporates a simple to implement transformation invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods usually make use of dataset augmentation to address this issue, but this requires larger number…

    Submitted 22 September, 2016; v1 submitted 21 April, 2016; originally announced April 2016.

    Comments: Accepted at CVPR 2016. The first two authors assert equal contribution and joint first authorship

  16. arXiv:1604.02885 [pdf, other]

    cs.CV

    Semantic 3D Reconstruction with Continuous Regularization and Ray Potentials Using a Visibility Consistency Constraint

    Authors: Nikolay Savinov, Christian Haene, Lubor Ladicky, Marc Pollefeys

    Abstract: We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize ty…

    Submitted 26 August, 2019; v1 submitted 11 April, 2016; originally announced April 2016.

    Comments: Accepted as a spotlight oral paper by CVPR 2016. Code at https://github.com/nsavinov/ray_potentials/
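Entry 4's abstract sidesteps the discreteness of text by projecting tokens into a continuous space of embeddings. The Python sketch below illustrates only that projection and one possible way back to tokens. The 2-D embedding table is hypothetical, and mapping a continuous vector to its nearest token embedding is an assumption made for illustration; the truncated abstract does not say how the model decodes.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical 2-D embedding table for a 3-token vocabulary (illustration only).
EMBEDDINGS: Dict[str, List[float]] = {
    "the": [0.0, 1.0],
    "cat": [1.0, 0.0],
    "sat": [-1.0, 0.0],
}

def embed(tokens: Sequence[str]) -> List[List[float]]:
    """Project discrete tokens to continuous vectors, where a diffusion model could operate."""
    return [list(EMBEDDINGS[t]) for t in tokens]

def round_to_tokens(vectors: Sequence[Sequence[float]]) -> List[str]:
    """Assumed decoding: map each vector to the token whose embedding is nearest."""
    return [min(EMBEDDINGS, key=lambda t: math.dist(EMBEDDINGS[t], v)) for v in vectors]

# Perturb the embeddings slightly and round back to tokens.
noisy = [[x + 0.1 for x in v] for v in embed(["the", "cat", "sat"])]
print(round_to_tokens(noisy))
```

With small perturbations the nearest-neighbour rounding recovers the original tokens; larger noise would make it pick a different token, which is the regime an actual denoising model has to handle.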
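Entry 5's abstract describes SUNDAE at the level of its control flow: an operator is applied repeatedly to a token sequence, starting from random tokens and improving them each time until convergence. The Python loop below sketches that control flow only. `improve` is a hypothetical stand-in for the trained network (here a toy that corrects one token toward a fixed target per call); the paper's actual improvement operator and training procedure are not reproduced.

```python
import random
from typing import Callable, List

Tokens = List[int]

def refine(
    improve: Callable[[Tokens], Tokens],
    length: int,
    vocab_size: int,
    max_steps: int = 100,
    seed: int = 0,
) -> Tokens:
    """Start from uniformly random tokens and apply `improve` until a fixed point."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab_size) for _ in range(length)]
    for _ in range(max_steps):
        new_tokens = improve(tokens)
        if new_tokens == tokens:  # converged: the operator no longer changes the sequence
            break
        tokens = new_tokens
    return tokens

# Hypothetical toy operator: fixes the first token that differs from a target.
TARGET = [3, 1, 4, 1, 5]

def toy_improve(tokens: Tokens) -> Tokens:
    out = list(tokens)
    for i, (t, g) in enumerate(zip(out, TARGET)):
        if t != g:
            out[i] = g
            break
    return out

print(refine(toy_improve, length=len(TARGET), vocab_size=10))
```

The `max_steps` cap is a design choice for the sketch: a learned operator is not guaranteed to reach an exact fixed point, so the loop stops after a bounded number of refinements either way.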
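Entry 7's abstract summarizes the standard voxel-grid formulation it argues against: semantic and depth likelihoods enter a Markov Random Field as a unary potential, smoothed by a pairwise regularizer. In generic MRF notation (ours, not necessarily the paper's), with $x_i$ the label of voxel $i \in \mathcal{V}$ and $\mathcal{N}$ the set of neighbouring voxel pairs, that energy reads:

```latex
E(\mathbf{x}) \;=\; \sum_{i \in \mathcal{V}} \psi_i(x_i) \;+\; \sum_{(i,j) \in \mathcal{N}} \psi_{ij}(x_i, x_j)
```

Here $\psi_i$ is the unary term carrying the semantic and depth likelihoods and $\psi_{ij}$ is the pairwise smoothness regularizer. The abstract argues that modelling the likelihoods purely through $\psi_i$ is insufficient; the ray-potential replacement it proposes is not reconstructed here.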
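Entry 9's abstract describes letting the agent reward itself with a bonus for observing something novel, with that bonus summed with the task reward. The Python class below is a minimal sketch of that idea under loud assumptions: observations are plain feature vectors, and "novel" is decided by a hypothetical Euclidean-distance threshold standing in for the learned reachability comparator named in the title. `novelty_threshold` and `bonus_scale` are illustrative parameters, not values from the paper.

```python
import math
from typing import List, Sequence

class EpisodicNoveltyBonus:
    """Toy episodic-memory bonus; a stand-in, not the paper's reachability network."""

    def __init__(self, novelty_threshold: float, bonus_scale: float) -> None:
        self.novelty_threshold = novelty_threshold  # hypothetical distance cut-off
        self.bonus_scale = bonus_scale              # hypothetical bonus size
        self.memory: List[Sequence[float]] = []     # novel observations seen this episode

    def reset(self) -> None:
        """Clear the episodic memory at the start of each episode."""
        self.memory.clear()

    def _is_novel(self, obs: Sequence[float]) -> bool:
        # Assumption: an observation is novel if it is far from everything in memory.
        return all(math.dist(obs, m) > self.novelty_threshold for m in self.memory)

    def shaped_reward(self, task_reward: float, obs: Sequence[float]) -> float:
        """Return task reward plus a bonus for novel observations; novel ones are stored."""
        bonus = 0.0
        if self._is_novel(obs):
            bonus = self.bonus_scale
            self.memory.append(obs)
        return task_reward + bonus

# Usage: the first observation is always novel; a nearby one earns no bonus.
curiosity = EpisodicNoveltyBonus(novelty_threshold=1.0, bonus_scale=0.5)
first = curiosity.shaped_reward(0.0, (0.0, 0.0))
repeat = curiosity.shaped_reward(0.0, (0.1, 0.0))
far = curiosity.shaped_reward(1.0, (5.0, 0.0))
```

Keeping the memory per episode (cleared by `reset`) matches the "episodic" framing of the title; whether and how the paper bounds or prunes the memory is not covered by the abstract.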
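Entry 11's abstract splits SPTM into a non-parametric graph whose nodes correspond to locations and a parametric network that retrieves graph nodes from observations. The Python sketch below mirrors that split under stated assumptions: `retrieve` is a hypothetical nearest-embedding lookup standing in for the deep retrieval network, and the breadth-first path search is an illustrative choice for using the graph, since the abstract is truncated before it explains what the graph is used for.

```python
import math
from collections import deque
from typing import Dict, List, Optional, Sequence

class TopologicalMemory:
    def __init__(self) -> None:
        self.embeddings: Dict[int, Sequence[float]] = {}  # node id -> stored embedding
        self.edges: Dict[int, List[int]] = {}              # undirected adjacency lists

    def add_node(self, node: int, embedding: Sequence[float]) -> None:
        self.embeddings[node] = embedding
        self.edges.setdefault(node, [])

    def add_edge(self, a: int, b: int) -> None:
        self.edges[a].append(b)
        self.edges[b].append(a)

    def retrieve(self, obs_embedding: Sequence[float]) -> int:
        # Stand-in for the parametric network: the node with the nearest stored embedding.
        return min(self.embeddings, key=lambda n: math.dist(self.embeddings[n], obs_embedding))

    def plan(self, obs: Sequence[float], goal: Sequence[float]) -> Optional[List[int]]:
        """Localize start and goal, then breadth-first search for a fewest-hops path."""
        start, target = self.retrieve(obs), self.retrieve(goal)
        parent: Dict[int, Optional[int]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == target:
                path: List[int] = []
                cur: Optional[int] = node
                while cur is not None:  # walk parent pointers back to the start
                    path.append(cur)
                    cur = parent[cur]
                return path[::-1]
            for nxt in self.edges[node]:
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # target not reachable in the graph

# Usage: a corridor of four locations 0-1-2-3.
memory = TopologicalMemory()
for i in range(4):
    memory.add_node(i, (float(i), 0.0))
for i in range(3):
    memory.add_edge(i, i + 1)
route = memory.plan((0.2, 0.0), (2.9, 0.0))
```

Separating `retrieve` from `plan` keeps the sketch's "parametric" and "non-parametric" parts independent, so the toy lookup could be swapped for a learned model without touching the graph search.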
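Entry 15's abstract describes a pooling operator that builds prior knowledge about nuisance variations such as rotation into the network, instead of relying only on dataset augmentation. One way to realize such an operator, assumed here because the abstract is truncated before the details, is to apply the same feature extractor to several transformed copies of the input and take an element-wise maximum. The pure-Python sketch uses the four 90° rotations of a small 2-D grid and a hypothetical `toy_features` function in place of a CNN.

```python
from typing import Callable, List

Grid = List[List[float]]

def rot90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def all_rotations(grid: Grid) -> List[Grid]:
    """The input plus its 90, 180 and 270 degree rotations."""
    out = [grid]
    for _ in range(3):
        out.append(rot90(out[-1]))
    return out

def ti_pool(grid: Grid, features: Callable[[Grid], List[float]]) -> List[float]:
    """Element-wise max of the features over all four rotations of the input."""
    per_rotation = [features(g) for g in all_rotations(grid)]
    return [max(vals) for vals in zip(*per_rotation)]

def toy_features(grid: Grid) -> List[float]:
    # Hypothetical orientation-sensitive features: top-row sum and left-column sum.
    return [sum(grid[0]), sum(row[0] for row in grid)]

img = [[1, 0, 0], [2, 0, 0], [3, 0, 5]]
print(toy_features(img), toy_features(rot90(img)))  # differ: features see orientation
print(ti_pool(img, toy_features), ti_pool(rot90(img), toy_features))  # identical
```

The pooled output is invariant because rotating the input only permutes the set of four rotated copies, and a maximum does not depend on order. With arbitrary angles or scale changes the invariance would hold only approximately.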