
Showing 1–14 of 14 results for author: Kabra, R

Searching in archive cs.
  1. arXiv:2407.07726

    cs.CV cs.AI cs.CL cs.LG

    PaliGemma: A versatile 3B VLM for transfer

    Authors: Lucas Beyer, Andreas Steiner, André Susano Pinto, Alexander Kolesnikov, Xiao Wang, Daniel Salz, Maxim Neumann, Ibrahim Alabdulmohsin, Michael Tschannen, Emanuele Bugliarello, Thomas Unterthiner, Daniel Keysers, Skanda Koppula, Fangyu Liu, Adam Grycner, Alexey Gritsenko, Neil Houlsby, Manoj Kumar, Keran Rong, Julian Eisenschlos, Rishabh Kabra, Matthias Bauer, Matko Bošnjak, Xi Chen, Matthias Minderer, et al. (10 additional authors not shown)

    Abstract: PaliGemma is an open Vision-Language Model (VLM) that is based on the SigLIP-So400m vision encoder and the Gemma-2B language model. It is trained to be a versatile and broadly knowledgeable base model that is effective to transfer. It achieves strong performance on a wide variety of open-world tasks. We evaluate PaliGemma on almost 40 diverse tasks including standard VLM benchmarks, but also more…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2406.09292

    cs.CV cs.AI cs.LG

    Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

    Authors: Ziyi Wu, Yulia Rubanova, Rishabh Kabra, Drew A. Hudson, Igor Gilitschenski, Yusuf Aytar, Sjoerd van Steenkiste, Kelsey R. Allen, Thomas Kipf

    Abstract: We address the problem of multi-object 3D pose control in image diffusion models. Instead of conditioning on a sequence of text tokens, we propose to use a set of per-object representations, Neural Assets, to control the 3D pose of individual objects in a scene. Neural Assets are obtained by pooling visual representations of objects from a reference image, such as a frame in a video, and are train…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Additional details and video results are available at https://neural-assets-paper.github.io/

  3. arXiv:2311.17851

    cs.CV

    Leveraging VLM-Based Pipelines to Annotate 3D Objects

    Authors: Rishabh Kabra, Loic Matthey, Alexander Lerchner, Niloy J. Mitra

    Abstract: Pretrained vision language models (VLMs) present an opportunity to caption unlabeled 3D objects at scale. The leading approach to summarize VLM descriptions from different views of an object (Luo et al., 2023) relies on a language model (GPT4) to produce the final output. This text-based aggregation is susceptible to hallucinations as it merges potentially contradictory descriptions. We propose an…

    Submitted 17 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  4. Serving Hybrid-Cloud SQL Interactive Queries at Twitter

    Authors: Chunxu Tang, Beinan Wang, Huijun Wu, Zhenzhao Wang, Yao Li, Vrushali Channapattan, Zhenxiao Luo, Ruchin Kabra, Mainak Ghosh, Nikhil Kantibhai Navadiya, Prachi Mishra, Prateek Mukhedkar, Anneliese Lu

    Abstract: The demand for data analytics has been consistently increasing in the past years at Twitter. In order to fulfill the requirements and provide a highly scalable and available query experience, a large-scale in-house SQL system is heavily relied on. Recently, we evolved the SQL system into a hybrid-cloud SQL federation system, compliant with Twitter's Partly Cloudy strategy. The hybrid-cloud SQL fed…

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: Submitted to ECSA 2021 post-proceedings

  5. arXiv:2204.11338

    cs.DB cs.DC

    Taming Hybrid-Cloud Fast and Scalable Graph Analytics at Twitter

    Authors: Chunxu Tang, Yao Li, Zhenxiao Luo, Mainak Ghosh, Huijun Wu, Lu Zhang, Anneliese Lu, Ruchin Kabra, Nikhil Kantibhai Navadiya, Prachi Mishra, Prateek Mukhedkar, Vrushali Channapattan

    Abstract: We have witnessed a boosted demand for graph analytics at Twitter in recent years, and graph analytics has become one of the key parts of Twitter's large-scale data analytics and machine learning for driving engagement, serving the most relevant content, and promoting healthier conversations. However, infrastructure for graph analytics has historically not been an area of investment at Twitter, re…

    Submitted 25 August, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

    Comments: 6 pages, 7 figures, accepted at IEEE GLOBECOM 2022

  6. Forecasting SQL Query Cost at Twitter

    Authors: Chunxu Tang, Beinan Wang, Zhenxiao Luo, Huijun Wu, Shajan Dasan, Maosong Fu, Yao Li, Mainak Ghosh, Ruchin Kabra, Nikhil Kantibhai Navadiya, Da Cheng, Fred Dai, Vrushali Channapattan, Prachi Mishra

    Abstract: With the advent of the Big Data era, it is usually computationally expensive to calculate the resource usages of a SQL query with traditional DBMS approaches. Can we estimate the cost of each query more efficiently without any computation in a SQL engine kernel? Can machine learning techniques help to estimate SQL query resource utilization? The answers are yes. We propose a SQL query cost predict…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: 2021 IEEE International Conference on Cloud Engineering (IC2E). IEEE, 2021

  7. arXiv:2110.07549

    cs.LG

    Time Series Clustering for Human Behavior Pattern Mining

    Authors: Rohan Kabra, Divya Saxena, Dhaval Patel, Jiannong Cao

    Abstract: Human behavior modeling deals with learning and understanding behavior patterns inherent in humans' daily routines. Existing pattern mining techniques either assume human dynamics is strictly periodic, or require the number of modes as input, or do not consider uncertainty in the sensor data. To handle these issues, in this paper, we propose a novel clustering approach for modeling human behavior…

    Submitted 24 October, 2021; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 16 pages

  8. arXiv:2107.11153

    cs.LG cs.AI stat.ML

    Constellation: Learning relational abstractions over objects for compositional imagination

    Authors: James C. R. Whittington, Rishabh Kabra, Loic Matthey, Christopher P. Burgess, Alexander Lerchner

    Abstract: Learning structured representations of visual scenes is currently a major bottleneck to bridging perception with reasoning. While there has been exciting progress with slot-based models, which learn to segment scenes into sets of objects, learning configurational properties of entire groups of objects is still under-explored. To address this problem, we introduce Constellation, a network that lear…

    Submitted 23 July, 2021; originally announced July 2021.

  9. arXiv:2106.03849

    cs.CV cs.LG

    SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

    Authors: Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess

    Abstract: To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is especially difficult when scene structure needs to be inferred while also estimating the agent's location/viewpoint, as the two variables jointly give rise to t…

    Submitted 6 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Animated figures are available at https://sites.google.com/view/simone-scene-understanding/

  10. arXiv:2103.04693

    cs.CV cs.AI

    Unsupervised Object-Based Transition Models for 3D Partially Observable Environments

    Authors: Antonia Creswell, Rishabh Kabra, Chris Burgess, Murray Shanahan

    Abstract: We present a slot-wise, object-based transition model that decomposes a scene into objects, aligns them (with respect to a slot-wise object memory) to maintain a consistent order across time, and predicts how those objects evolve over successive frames. The model is trained end-to-end without supervision using losses at the level of the object-structured representation rather than pixels. Thanks t…

    Submitted 8 March, 2021; originally announced March 2021.

  11. arXiv:2007.08973

    cs.CV cs.AI cs.LG

    AlignNet: Unsupervised Entity Alignment

    Authors: Antonia Creswell, Kyriacos Nikiforou, Oriol Vinyals, Andre Saraiva, Rishabh Kabra, Loic Matthey, Chris Burgess, Malcolm Reynolds, Richard Tanburn, Marta Garnelo, Murray Shanahan

    Abstract: Recently developed deep learning models are able to learn to segment scenes into component objects without supervision. This opens many new and exciting avenues of research, allowing agents to take objects (or entities) as inputs, rather than pixels. Unfortunately, while these models provide excellent segmentation of a single frame, they do not keep track of how objects segmented at one time-step…

    Submitted 21 July, 2020; v1 submitted 17 July, 2020; originally announced July 2020.

  12. arXiv:1903.00450

    cs.LG cs.CV stat.ML

    Multi-Object Representation Learning with Iterative Variational Inference

    Authors: Klaus Greff, Raphaël Lopez Kaufman, Rishabh Kabra, Nick Watters, Chris Burgess, Daniel Zoran, Loic Matthey, Matthew Botvinick, Alexander Lerchner

    Abstract: Human perception is structured around objects which form the basis for our higher-level cognition and impressive systematic generalization abilities. Yet most work on representation learning focuses on feature learning without even considering multiple objects, or treats segmentation as an (often supervised) preprocessing step. Instead, we argue for the importance of learning to segment and repres…

    Submitted 27 July, 2020; v1 submitted 1 March, 2019; originally announced March 2019.

    Journal ref: ICML 2019 (PMLR 97:2424-2433)

  13. arXiv:1901.11390

    cs.CV cs.LG stat.ML

    MONet: Unsupervised Scene Decomposition and Representation

    Authors: Christopher P. Burgess, Loic Matthey, Nicholas Watters, Rishabh Kabra, Irina Higgins, Matt Botvinick, Alexander Lerchner

    Abstract: The ability to decompose scenes in terms of abstract building blocks is crucial for general intelligence. Where those basic building blocks share meaningful properties, interactions and other regularities across scenes, such decompositions can simplify reasoning and facilitate imagination of novel scenarios. In particular, representing perceptual observations in terms of entities should improve da…

    Submitted 22 January, 2019; originally announced January 2019.

  14. arXiv:1901.03559

    cs.LG cs.AI stat.ML

    An investigation of model-free planning

    Authors: Arthur Guez, Mehdi Mirza, Karol Gregor, Rishabh Kabra, Sébastien Racanière, Théophane Weber, David Raposo, Adam Santoro, Laurent Orseau, Tom Eccles, Greg Wayne, David Silver, Timothy Lillicrap

    Abstract: The field of reinforcement learning (RL) is facing increasingly challenging domains with combinatorial complexity. For an RL agent to address these challenges, it is essential that it can plan effectively. Prior work has typically utilized an explicit model of the environment, combined with a specific planning algorithm (such as tree search). More recently, a new family of methods have been propos…

    Submitted 20 May, 2019; v1 submitted 11 January, 2019; originally announced January 2019.