Showing 1–4 of 4 results for author: Gnaneshwar, D

  1. arXiv:2408.08274

    cs.LG

    BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

    Authors: Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Ustun, Acyr Locatelli

    Abstract: The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch in a large-scale regime is prohibitively expensive. Existing methods mitigate this by pre-training multiple dense expert models independently and using them to initialize an MoE. This is done by using experts' feed…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  2. arXiv:2203.04698

    cs.LG q-bio.QM

    Score-Based Generative Models for Molecule Generation

    Authors: Dwaraknath Gnaneshwar, Bharath Ramsundar, Dhairya Gandhi, Rachel Kurchin, Venkatasubramanian Viswanathan

    Abstract: Recent advances in generative models have made exploring design spaces easier for de novo molecule generation. However, popular generative models like GANs and normalizing flows face challenges such as training instabilities due to adversarial training and architectural constraints, respectively. Score-based generative models sidestep these challenges by modelling the gradient of the log probabili…

    Submitted 7 March, 2022; originally announced March 2022.

  3. arXiv:2010.14019

    cs.LG stat.ML

    Know Where To Drop Your Weights: Towards Faster Uncertainty Estimation

    Authors: Akshatha Kamath, Dwaraknath Gnaneshwar, Matias Valdenegro-Toro

    Abstract: Estimating epistemic uncertainty of models used in low-latency applications and Out-Of-Distribution samples detection is a challenge due to the computationally demanding nature of uncertainty estimation techniques. Estimating model uncertainty using approximation techniques like Monte Carlo Dropout (MCD), DropConnect (MCDC) requires a large number of forward passes through the network, rendering t…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: 8 pages, 6 figures, 1 table, with appendix, submitted to a NeurIPS workshop

  4. arXiv:2009.07728

    cs.CL

    NABU - Multilingual Graph-based Neural RDF Verbalizer

    Authors: Diego Moussallem, Dwaraknath Gnaneshwar, Thiago Castro Ferreira, Axel-Cyrille Ngonga Ngomo

    Abstract: The RDF-to-text task has recently gained substantial attention due to continuous growth of Linked Data. In contrast to traditional pipeline models, recent studies have focused on neural models, which are now able to convert a set of RDF triples into text in an end-to-end style with promising results. However, English is the only language widely targeted. We address this research gap by presenting…

    Submitted 21 September, 2020; v1 submitted 16 September, 2020; originally announced September 2020.

    Comments: International Semantic Web Conference (ISWC) 2020