Showing 1–13 of 13 results for author: Melis, G

  1. arXiv:2211.01848  [pdf, other]

    cs.CL

    Circling Back to Recurrent Models of Language

    Authors: Gábor Melis

    Abstract: Just because some purely recurrent models suffer from being hard to optimize and inefficient on today's hardware, they are not necessarily bad models of language. We demonstrate this by the extent to which these models can still be improved by a combination of a slightly better recurrent cell, architecture, objective, as well as optimization. In the process, we establish a new state of the art for…

    Submitted 18 April, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

  2. arXiv:2209.12581  [pdf, other]

    stat.ML cs.CL cs.LG

    Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization

    Authors: Gábor Melis

    Abstract: Tail Averaging improves on Polyak averaging's non-asymptotic behaviour by excluding a number of leading iterates of stochastic optimization from its calculations. In practice, with a finite number of optimization steps and a learning rate that cannot be annealed to zero, Tail Averaging can get much closer to a local minimum point of the training loss than either the individual iterates or the Poly…

    Submitted 17 April, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

  3. arXiv:2012.00708  [pdf, other]

    stat.ML cs.CL cs.LG

    Mutual Information Constraints for Monte-Carlo Objectives

    Authors: Gábor Melis, András György, Phil Blunsom

    Abstract: A common failure mode of density models trained as variational autoencoders is to model the data without relying on their latent variables, rendering these variables useless. Two contributing factors, the underspecification of the model and the looseness of the variational lower bound, have been studied separately in the literature. We weave these two strands of research together, specifically the…

    Submitted 9 May, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: 32 pages, 29 figures

  4. arXiv:2003.05259  [pdf, other]

    cs.CL

    Capturing document context inside sentence-level neural machine translation models with self-training

    Authors: Elman Mansimov, Gábor Melis, Lei Yu

    Abstract: Neural machine translation (NMT) has arguably achieved human level parity when trained and evaluated at the sentence-level. Document-level neural machine translation has received less attention and lags behind its sentence-level counterpart. The majority of the proposed document-level approaches investigate ways of conditioning the model on several source or target sentences to capture document co…

    Submitted 11 March, 2020; originally announced March 2020.

  5. arXiv:1909.09428  [pdf, other]

    cs.CL cs.LG

    A Critical Analysis of Biased Parsers in Unsupervised Parsing

    Authors: Chris Dyer, Gábor Melis, Phil Blunsom

    Abstract: A series of recent papers has used a parsing algorithm due to Shen et al. (2018) to recover phrase-structure trees based on proxies for "syntactic depth." These proxy depths are obtained from the representations learned by recurrent language models augmented with mechanisms that encourage the (unsupervised) discovery of hierarchical structure latent in natural language sentences. Using the same pa…

    Submitted 20 September, 2019; originally announced September 2019.

  6. arXiv:1909.01792  [pdf, other]

    cs.CL

    Mogrifier LSTM

    Authors: Gábor Melis, Tomáš Kočiský, Phil Blunsom

    Abstract: Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mut…

    Submitted 29 January, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

  7. arXiv:1904.03746  [pdf, other]

    cs.CL stat.ML

    Unsupervised Recurrent Neural Network Grammars

    Authors: Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis

    Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNG…

    Submitted 4 August, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

    Comments: NAACL 2019

  8. arXiv:1901.09296  [pdf, other]

    cs.CL

    Variational Smoothing in Recurrent Neural Network Language Models

    Authors: Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama

    Abstract: We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). We show that each variant of data noising is an instance of Bayesian recurrent neural networks with a particular variational distribution (i.e., a mixture of Gaussians whose weights depend on statistics derived from the corpus such as the unigram distribution). We use this insig…

    Submitted 26 January, 2019; originally announced January 2019.

    Comments: Accepted as a conference paper at ICLR 2019

  9. arXiv:1807.01670  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Encoding Spatial Relations from Natural Language

    Authors: Tiago Ramalho, Tomáš Kočiský, Frederic Besse, S. M. Ali Eslami, Gábor Melis, Fabio Viola, Phil Blunsom, Karl Moritz Hermann

    Abstract: Natural language processing has made significant inroads into learning the semantics of words through distributional approaches, however representations learnt via these methods fail to capture certain kinds of information implicit in the real world. In particular, spatial relations are encoded in a way that is inconsistent with human spatial reasoning and lacking invariance to viewpoint changes.…

    Submitted 5 July, 2018; v1 submitted 4 July, 2018; originally announced July 2018.

  10. arXiv:1805.09208  [pdf, other]

    stat.ML cs.CL cs.LG

    Pushing the bounds of dropout

    Authors: Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

    Abstract: We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that…

    Submitted 27 September, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

  11. arXiv:1712.07040  [pdf, other]

    cs.CL cs.AI cs.NE

    The NarrativeQA Reading Comprehension Challenge

    Authors: Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

    Abstract: Reading comprehension (RC)---in contrast to information retrieval---requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecti…

    Submitted 19 December, 2017; originally announced December 2017.

  12. arXiv:1707.05589  [pdf, other]

    cs.CL

    On the State of the Art of Evaluation in Neural Language Models

    Authors: Gábor Melis, Chris Dyer, Phil Blunsom

    Abstract: Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods w…

    Submitted 20 November, 2017; v1 submitted 18 July, 2017; originally announced July 2017.

  13. arXiv:1609.09315  [pdf, other]

    cs.CL cs.AI cs.NE

    Semantic Parsing with Semi-Supervised Sequential Autoencoders

    Authors: Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

    Abstract: We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically gener…

    Submitted 29 September, 2016; originally announced September 2016.