Zum Hauptinhalt springen

Showing 1–7 of 7 results for author: Shivanna, R

Searching in archive cs. Search in all archives.
.
  1. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2209.05310  [pdf, other

    cs.IR cs.LG

    On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models

    Authors: Rohan Anil, Sandra Gadanho, Da Huang, Nijith Jacob, Zhuoshu Li, Dong Lin, Todd Phillips, Cristina Pop, Kevin Regan, Gil I. Shamir, Rakesh Shivanna, Qiqi Yan

    Abstract: For industrial-scale advertising systems, prediction of ad click-through rate (CTR) is a central problem. Ad clicks constitute a significant class of user engagements and are often used as the primary signal for the usefulness of ads to users. Additionally, in cost-per-click advertising systems where advertisers are charged per click, click rate expectations feed directly into value estimation. Ac… ▽ More

    Submitted 12 September, 2022; originally announced September 2022.

    Comments: ORSUM - ACM RecSys, September 23, 2022, Seattle, WA

  4. arXiv:2202.00834  [pdf, other

    cs.LG cs.AI cs.DS stat.ML

    Nonlinear Initialization Methods for Low-Rank Neural Networks

    Authors: Kiran Vodrahalli, Rakesh Shivanna, Maheswaran Sathiamoorthy, Sagar Jain, Ed H. Chi

    Abstract: We propose a novel low-rank initialization framework for training low-rank deep neural networks -- networks where the weight parameters are re-parameterized by products of two low-rank matrices. The most successful prior existing approach, spectral initialization, draws a sample from the initialization distribution for the full-rank setting and then optimally approximates the full-rank initializat… ▽ More

    Submitted 19 May, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

    Comments: 32 pages, 4 figures, in submission. fixed some errors in previous versions and re-structured/re-focused the paper

  5. arXiv:2008.13535  [pdf, other

    cs.IR cs.LG stat.ML

    DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems

    Authors: Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi

    Abstract: Learning effective feature crosses is the key behind building recommender systems. However, the sparse and large feature space requires exhaustive search to identify effective crosses. Deep & Cross Network (DCN) was proposed to automatically and efficiently learn bounded-degree predictive feature interactions. Unfortunately, in models that serve web-scale traffic with billions of training examples… ▽ More

    Submitted 20 October, 2020; v1 submitted 19 August, 2020; originally announced August 2020.

    Journal ref: In Proceedings of the Web Conference 2021 (WWW '21)

  6. arXiv:2002.03532  [pdf, other

    cs.LG cs.AI stat.ML

    Understanding and Improving Knowledge Distillation

    Authors: Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain

    Abstract: Knowledge Distillation (KD) is a model-agnostic technique to improve model quality while having a fixed capacity budget. It is a commonly used technique for model compression, where a larger capacity teacher model with better quality is used to train a more compact student model with better inference efficiency. Through distillation, one hopes to benefit from student's compactness, without sacrifi… ▽ More

    Submitted 28 February, 2021; v1 submitted 9 February, 2020; originally announced February 2020.

  7. arXiv:1811.02161  [pdf, other

    cs.LG stat.ML

    How Many Pairwise Preferences Do We Need to Rank A Graph Consistently?

    Authors: Aadirupa Saha, Rakesh Shivanna, Chiranjib Bhattacharyya

    Abstract: We consider the problem of optimal recovery of true ranking of $n$ items from a randomly chosen subset of their pairwise preferences. It is well known that without any further assumption, one requires a sample size of $Ω(n^2)$ for the purpose. We analyze the problem with an additional structure of relational graph $G([n],E)$ over the $n$ items added with an assumption of \emph{locality}: Neighbori… ▽ More

    Submitted 11 February, 2019; v1 submitted 5 November, 2018; originally announced November 2018.

    Comments: In Thirty-Third AAAI Conference on Artificial Intelligence, 2019