Showing 1–35 of 35 results for author: Strub, F

  1. arXiv:2406.19188  [pdf, other]

    cs.LG

    Averaging log-likelihoods in direct alignment

    Authors: Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist

    Abstract: To better align Large Language Models (LLMs) with human judgment, Reinforcement Learning from Human Feedback (RLHF) learns a reward model and then optimizes it using regularized RL. Recently, direct alignment methods were introduced to learn such a fine-tuned model directly from a preference dataset without computing a proxy reward function. These methods are built upon contrastive losses involvin…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.19185  [pdf, other]

    cs.LG

    Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion

    Authors: Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist

    Abstract: Reinforcement Learning (RL) has been used to finetune Large Language Models (LLMs) using a reward model trained from preference data, to better align with human judgment. The recently introduced direct alignment methods, which are often simpler, more stable, and computationally lighter, can more directly achieve this. However, these approaches cannot optimize arbitrary rewards, and the preference-…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2404.19409  [pdf, other]

    cs.CL

    Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

    Authors: Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin

    Abstract: While Reinforcement Learning (RL) has been proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, requiring computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization:…

    Submitted 30 April, 2024; originally announced April 2024.

  4. arXiv:2403.11958  [pdf, other]

    cs.CL cs.MA

    Language Evolution with Deep Learning

    Authors: Mathieu Rita, Paul Michel, Rahma Chaabouni, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

    Abstract: Computational modeling plays an essential role in the study of language emergence. It aims to simulate the conditions and learning processes that could trigger the emergence of a structured language within a simulated controlled environment. Several methods have been used to investigate the origin of our language, including agent-based systems, Bayesian agents, genetic algorithms, and rule-based s…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: to appear in the Oxford Handbook of Approaches to Language Evolution

  5. arXiv:2312.07551  [pdf, other]

    cs.CL

    Language Model Alignment with Elastic Reset

    Authors: Michael Noukhovitch, Samuel Lavoie, Florian Strub, Aaron Courville

    Abstract: Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how differ…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Published at NeurIPS 2023

  6. arXiv:2302.04817  [pdf, other]

    cs.LG

    The Edge of Orthogonality: A Simple View of What Makes BYOL Tick

    Authors: Pierre H. Richemond, Allison Tam, Yunhao Tang, Florian Strub, Bilal Piot, Felix Hill

    Abstract: Self-predictive unsupervised learning methods such as BYOL or SimSiam have shown impressive results, and counter-intuitively, do not collapse to trivial representations. In this work, we aim at exploring the simplest possible mathematical arguments towards explaining the underlying mechanisms behind self-predictive unsupervised learning. We start with the observation that those methods crucially r…

    Submitted 9 February, 2023; originally announced February 2023.

  7. arXiv:2301.05158  [pdf, other]

    cs.CV cs.AI cs.LG

    SemPPL: Predicting pseudo-labels for better contrastive representations

    Authors: Matko Bošnjak, Pierre H. Richemond, Nenad Tomasev, Florian Strub, Jacob C. Walker, Felix Hill, Lars Holger Buesing, Razvan Pascanu, Charles Blundell, Jovana Mitrovic

    Abstract: Learning from large amounts of unsupervised data and a small amount of supervision is an important open problem in computer vision. We propose a new semi-supervised learning method, Semantic Positives via Pseudo-Labels (SemPPL), that combines labelled and unlabelled data to learn informative representations. Our method extends self-supervised contrastive learning -- where representations are shape…

    Submitted 10 January, 2024; v1 submitted 12 January, 2023; originally announced January 2023.

    Comments: Published as a conference paper at ICLR 2023. For checkpoints and source code see https://github.com/google-deepmind/semppl

  8. arXiv:2211.01480  [pdf, other]

    cs.MA cs.CL cs.HC

    Over-communicate no more: Situated RL agents learn concise communication protocols

    Authors: Aleksandra Kalinowska, Elnaz Davoodi, Florian Strub, Kory W Mathewson, Ivana Kajic, Michael Bowling, Todd D Murphey, Patrick M Pilarski

    Abstract: While it is known that communication facilitates cooperation in multi-agent settings, it is unclear how to design artificial agents that can learn to effectively and efficiently communicate with each other. Much research on communication emergence uses reinforcement learning (RL) and explores unsituated communication in one-step referential tasks -- the tasks are not temporally interactive and lac…

    Submitted 2 November, 2022; originally announced November 2022.

  9. arXiv:2209.15342  [pdf, other]

    cs.MA cs.CL cs.IT

    Emergent Communication: Generalization and Overfitting in Lewis Games

    Authors: Mathieu Rita, Corentin Tallec, Paul Michel, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux, Florian Strub

    Abstract: Lewis signaling games are a class of simple communication games for simulating the emergence of language. In these games, two agents must agree on a communication protocol in order to solve a cooperative task. Previous work has shown that agents trained to play this game with reinforcement learning tend to develop languages that display undesirable properties from a linguistic point of view (lack…

    Submitted 15 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  10. arXiv:2209.10958  [pdf, ps, other]

    cs.MA cs.AI

    Developing, Evaluating and Scaling Learning Agents in Multi-Agent Environments

    Authors: Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, Siqi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, et al. (2 additional authors not shown)

    Abstract: The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning ranging from computing approximations to fundamental concepts in game theory to simulating social dilemmas in rich spatial environments and training 3-d humanoids in difficult team coordination tasks. A signature aim of our group is to use the resources and expertise made available to us at DeepMind in d…

    Submitted 22 September, 2022; originally announced September 2022.

    Comments: Published in AI Communications 2022

  11. arXiv:2206.15378  [pdf, other]

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona…

    Submitted 30 June, 2022; originally announced June 2022.

  12. arXiv:2204.12982  [pdf, other]

    cs.MA

    On the role of population heterogeneity in emergent communication

    Authors: Mathieu Rita, Florian Strub, Jean-Bastien Grill, Olivier Pietquin, Emmanuel Dupoux

    Abstract: Populations have often been perceived as a structuring component for language to emerge and evolve: the larger the population, the more structured the language. While this observation is widespread in the sociolinguistic literature, it has not been consistently reproduced in computer simulations with neural agents. In this paper, we thus aim to clarify this apparent contradiction. We explore emerg…

    Submitted 27 April, 2022; originally announced April 2022.

    Comments: International Conference on Learning Representations (ICLR) 2022

  13. arXiv:2109.09371  [pdf, other]

    cs.AI cs.CL cs.NE stat.ML

    Learning Natural Language Generation from Scratch

    Authors: Alice Martin Donati, Guillaume Quispe, Charles Ollion, Sylvain Le Corff, Florian Strub, Olivier Pietquin

    Abstract: This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach to train conditional language models from scratch by only using reinforcement learning (RL). As RL methods unsuccessfully scale to large action spaces, we dynamically truncate the vocabulary space using a generic language model. TrufLL thus enables to train a language agent by solely interacting withi…

    Submitted 20 September, 2021; originally announced September 2021.

  14. arXiv:2105.09992  [pdf, other]

    cs.LG

    Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

    Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

    Abstract: Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the…

    Submitted 31 May, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at International Joint Conference on Artificial Intelligence (IJCAI'21) and Self-Supervision for Reinforcement Learning Workshop (SSL-RL @ICLR'21)

  15. arXiv:2103.16559  [pdf, other]

    cs.CV

    Broaden Your Views for Self-Supervised Video Learning

    Authors: Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Ross Hemsley, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman

    Abstract: Most successful self-supervised learning methods are trained to align the representations of two independent views from the data. State-of-the-art methods in video are inspired by image techniques, where these two views are similarly extracted by cropping and augmenting the resulting crop. However, these methods miss a crucial element in the video domain: time. We introduce BraVe, a self-supervise…

    Submitted 19 October, 2021; v1 submitted 30 March, 2021; originally announced March 2021.

    Comments: This paper is an extended version of our ICCV-21 paper. It includes more results as well as a minor architectural variation which improves results

  16. arXiv:2010.10241  [pdf, ps, other]

    stat.ML cs.CV cs.LG

    BYOL works even without batch statistics

    Authors: Pierre H. Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, Andrew Brock, Samuel Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko

    Abstract: Bootstrap Your Own Latent (BYOL) is a self-supervised learning approach for image representation. From an augmented view of an image, BYOL trains an online network to predict a target network representation of a different augmented view of the same image. Unlike contrastive methods, BYOL does not explicitly use a repulsion term built from negative pairs in its training objective. Yet, it avoids co…

    Submitted 20 October, 2020; originally announced October 2020.

  17. arXiv:2010.02975  [pdf, other]

    cs.CL

    Supervised Seeded Iterated Learning for Interactive Language Learning

    Authors: Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

    Abstract: Language drift has been one of the major obstacles to train language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P j…

    Submitted 6 October, 2020; originally announced October 2020.

  18. arXiv:2008.03127  [pdf, other]

    eess.AS cs.LG cs.SD

    A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

    Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

    Abstract: Speaker recognition is a well known and studied task in the speech processing domain. It has many applications, either for security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speak…

    Submitted 7 August, 2020; originally announced August 2020.

  19. arXiv:2007.08620  [pdf, other]

    cs.LG cs.AI stat.ML

    The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

    Authors: Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin

    Abstract: This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture. The keys, queries, values and attention vectors of the network are considered as the unobserved stochastic states of its hidden structure. This generative model is such that at each time step the received observation is a random fun…

    Submitted 15 December, 2020; v1 submitted 15 July, 2020; originally announced July 2020.

  20. arXiv:2006.07733  [pdf, other]

    cs.LG cs.CV stat.ML

    Bootstrap your own latent: A new approach to self-supervised Learning

    Authors: Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko

    Abstract: We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the…

    Submitted 10 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

  21. arXiv:2003.12694  [pdf, other]

    cs.AI cs.CL

    Countering Language Drift with Seeded Iterated Learning

    Authors: Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

    Abstract: Pretraining on human corpus and then finetuning in a simulator has become a standard pipeline for training a goal-oriented dialogue agent. Nevertheless, as soon as the agents are finetuned to maximize task completion, they suffer from the so-called language drift phenomenon: they slowly lose syntactic and semantic properties of language as they only focus on solving the task. In this paper, we pro…

    Submitted 24 August, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

  22. arXiv:1910.09451  [pdf, other]

    cs.LG cs.CL stat.ML

    HIGhER : Improving instruction following with Hindsight Generation for Experience Replay

    Authors: Geoffrey Cideron, Mathieu Seurin, Florian Strub, Olivier Pietquin

    Abstract: Language creates a compact representation of the world and allows the description of unlimited situations and objectives through compositionality. While these characterizations may foster instructing, conditioning or structuring interactive agent behavior, it remains an open-problem to correctly relate language understanding and reinforcement learning in even simple instruction following scenarios…

    Submitted 10 December, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted at ADPRL'20

  23. arXiv:1903.02988  [pdf, other]

    physics.data-an cs.NE physics.ins-det

    Accurate reconstruction of EBSD datasets by a multimodal data approach using an evolutionary algorithm

    Authors: Marie-Agathe Charpagne, Florian Strub, Tresa M. Pollock

    Abstract: A new method has been developed for the correction of the distortions and/or enhanced phase differentiation in Electron Backscatter Diffraction (EBSD) data. Using a multi-modal data approach, the method uses segmented images of the phase of interest (laths, precipitates, voids, inclusions) on images gathered by backscattered or secondary electrons of the same area as the EBSD map. The proposed app…

    Submitted 8 March, 2019; v1 submitted 7 March, 2019; originally announced March 2019.

    Comments: A short version of this paper, aimed at readers working in Machine Learning, is available as arXiv:1903.02982

    Journal ref: Materials Characterization, 2019

  24. arXiv:1903.02982  [pdf, other]

    cs.CV physics.ins-det

    Correction of Electron Back-scattered Diffraction datasets using an evolutionary algorithm

    Authors: Florian Strub, Marie-Agathe Charpagne, Tresa M. Pollock

    Abstract: In materials science and particularly electron microscopy, Electron Back-scatter Diffraction (EBSD) is a common and powerful mapping technique for collecting local crystallographic data at the sub-micron scale. The quality of the reconstruction of the maps is critical to study the spatial distribution of phases and crystallographic orientation relationships between phases, a key interest in materi…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: This short paper targets an audience working in Machine Learning. A long version of this paper, aimed at readers working in Materials (more experiments, more experimental details and analysis), is available as arXiv:1903.02988

  25. arXiv:1812.02648  [pdf, other]

    cs.AI cs.LG

    Deep Reinforcement Learning and the Deadly Triad

    Authors: Hado van Hasselt, Yotam Doron, Florian Strub, Matteo Hessel, Nicolas Sonnerat, Joseph Modayil

    Abstract: We know from reinforcement learning theory that temporal difference learning can fail in certain cases. Sutton and Barto (2018) identify a deadly triad of function approximation, bootstrapping, and off-policy learning. When these three properties are combined, learning can diverge with the value estimates becoming unbounded. However, several algorithms successfully combine these three properties,…

    Submitted 6 December, 2018; originally announced December 2018.

  26. arXiv:1808.04446  [pdf, other]

    cs.CV cs.CL cs.LG stat.ML

    Visual Reasoning with Multi-hop Feature Modulation

    Authors: Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

    Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to…

    Submitted 12 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: In Proc of ECCV 2018

  27. arXiv:1711.11017  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO cs.SD eess.AS

    HoME: a Household Multimodal Environment

    Authors: Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

    Abstract: We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: Presented at NIPS 2017's Visually-Grounded Interaction and Language Workshop

  28. arXiv:1709.07871  [pdf, other]

    cs.CV cs.AI cs.CL stat.ML

    FiLM: Visual Reasoning with a General Conditioning Layer

    Authors: Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

    Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process -…

    Submitted 18 December, 2017; v1 submitted 22 September, 2017; originally announced September 2017.

    Comments: AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.03017

  29. arXiv:1707.03017  [pdf, other]

    cs.CV cs.AI cs.CL stat.ML

    Learning Visual Reasoning Without Strong Priors

    Authors: Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

    Abstract: Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than lear…

    Submitted 18 December, 2017; v1 submitted 10 July, 2017; originally announced July 2017.

    Comments: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/film

  30. arXiv:1707.00683  [pdf, other]

    cs.CV cs.CL cs.LG

    Modulating early visual processing by language

    Authors: Harm de Vries, Florian Strub, Jérémie Mary, Hugo Larochelle, Olivier Pietquin, Aaron Courville

    Abstract: It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view dominates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and pro…

    Submitted 18 December, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

    Comments: Advances in Neural Information Processing Systems 30 (NIPS 2017)

  31. arXiv:1703.05423  [pdf, other]

    cs.CL

    End-to-end optimization of goal-driven and visually grounded dialogue systems

    Authors: Florian Strub, Harm de Vries, Jeremie Mary, Bilal Piot, Aaron Courville, Olivier Pietquin

    Abstract: End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too s…

    Submitted 15 March, 2017; originally announced March 2017.

  32. arXiv:1611.08481  [pdf, other]

    cs.AI cs.CV

    GuessWhat?! Visual object discovery through multi-modal dialogue

    Authors: Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, Hugo Larochelle, Aaron Courville

    Abstract: We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the colle…

    Submitted 6 February, 2017; v1 submitted 23 November, 2016; originally announced November 2016.

    Comments: 23 pages; CVPR 2017 submission; see https://guesswhat.ai

  33. arXiv:1606.08718  [pdf, ps, other]

    cs.GT

    Learning Nash Equilibrium for General-Sum Markov Games from Batch Data

    Authors: Julien Pérolat, Florian Strub, Bilal Piot, Olivier Pietquin

    Abstract: This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MG). A key component of this model is the possibility for the players to either collaborate or team apart to increase their rewards. Building an artificial player for general-sum MGs implies to learn more complex strategies which are impossible to obtain by using techniques devel…

    Submitted 6 March, 2017; v1 submitted 28 June, 2016; originally announced June 2016.

    Comments: 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA. JMLR: W&CP volume 54

    Report number: CRIStAL, UMR 9189

  34. Hybrid Recommender System based on Autoencoders

    Authors: Florian Strub, Romaric Gaudel, Jérémie Mary

    Abstract: A standard model for Recommender Systems is the Matrix Completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings. In the last decades, few attempts were made to handle that objective with Neural Networks, but recently an architecture based on Autoencoders proved to be a promising approach. In the current paper, we enhanced that arc…

    Submitted 29 December, 2017; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: arXiv admin note: substantial text overlap with arXiv:1603.00806

    Journal ref: the 1st Workshop on Deep Learning for Recommender Systems, Sep 2016, Boston, United States. pp.11 - 16, 2016

  35. arXiv:1603.00806  [pdf, other]

    cs.IR cs.AI cs.NE

    Hybrid Collaborative Filtering with Autoencoders

    Authors: Florian Strub, Jeremie Mary, Romaric Gaudel

    Abstract: Collaborative Filtering aims at exploiting the feedback of users to provide personalised recommendations. Such algorithms look for latent variables in a large sparse matrix of ratings. They can be enhanced by adding side information to tackle the well-known cold start problem. While Neural Networks have tremendous success in image and speech recognition, they have received less attention in Colla…

    Submitted 19 July, 2016; v1 submitted 2 March, 2016; originally announced March 2016.