
Showing 1–8 of 8 results for author: Perrin, N

Searching in archive cs.
  1. arXiv:2007.13363  [pdf, other]

    cs.AI

    Learning Compositional Neural Programs for Continuous Control

    Authors: Thomas Pierrot, Nicolas Perrin, Feryal Behbahani, Alexandre Laterre, Olivier Sigaud, Karim Beguir, Nando de Freitas

    Abstract: We propose a novel solution to challenging sparse-reward, continuous control problems that require hierarchical planning at multiple levels of abstraction. Our solution, dubbed AlphaNPI-X, involves three separate stages of learning. First, we use off-policy reinforcement learning algorithms with experience replay to learn a set of atomic goal-conditioned policies, which can be easily repurposed fo…

    Submitted 13 April, 2021; v1 submitted 27 July, 2020; originally announced July 2020.

  2. arXiv:2006.07042  [pdf, other]

    cs.LG cs.AI cs.GT

    Recurrent Neural Networks for Stochastic Control in Real-Time Bidding

    Authors: Nicolas Grislain, Nicolas Perrin, Antoine Thabault

    Abstract: Bidding in real-time auctions can be a difficult stochastic control task; especially if underdelivery incurs strong penalties and the market is very uncertain. Most current works and implementations focus on optimally delivering a campaign given a reasonable forecast of the market. Practical implementations have a feedback loop to adjust and be robust to forecasting errors, but no implementation,…

    Submitted 12 June, 2020; originally announced June 2020.

    Journal ref: 2019. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Association for Computing Machinery, New York, NY, USA

  3. arXiv:2004.11667  [pdf, other]

    cs.RO cs.AI cs.LG stat.ML

    PBCS : Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

    Authors: Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

    Abstract: The exploration-exploitation trade-off is at the heart of reinforcement learning (RL). However, most continuous control benchmarks used in recent RL research only require local exploration. This led to the development of algorithms that have basic exploration capabilities, and behave poorly in benchmarks that require more versatile exploration. For instance, as demonstrated in our empirical study,…

    Submitted 24 April, 2020; originally announced April 2020.

  4. The problem with DDPG: understanding failures in deterministic environments with sparse rewards

    Authors: Guillaume Matheron, Nicolas Perrin, Olivier Sigaud

    Abstract: In environments with continuous state and action spaces, state-of-the-art actor-critic reinforcement learning algorithms can solve very complex problems, yet can also fail in environments that seem trivial, but the reason for such failures is still poorly understood. In this paper, we contribute a formal explanation of these failures in the particular case of sparse reward and deterministic enviro…

    Submitted 26 November, 2019; originally announced November 2019.

    Comments: 19 pages, submitted to ICLR 2020

  5. arXiv:1905.12941  [pdf, other]

    cs.AI

    Learning Compositional Neural Programs with Recursive Tree Search and Planning

    Authors: Thomas Pierrot, Guillaume Ligner, Scott Reed, Olivier Sigaud, Nicolas Perrin, Alexandre Laterre, David Kas, Karim Beguir, Nando de Freitas

    Abstract: We propose a novel reinforcement learning algorithm, AlphaNPI, that incorporates the strengths of Neural Programmer-Interpreters (NPI) and AlphaZero. NPI contributes structural biases in the form of modularity, hierarchy and recursion, which are helpful to reduce sample complexity, improve generalization and increase interpretability. AlphaZero contributes powerful neural network guided search alg…

    Submitted 13 April, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

  6. arXiv:1810.08102  [pdf, other]

    cs.LG stat.ML

    First-order and second-order variants of the gradient descent in a unified framework

    Authors: Thomas Pierrot, Nicolas Perrin, Olivier Sigaud

    Abstract: In this paper, we provide an overview of first-order and second-order variants of the gradient descent method that are commonly used in machine learning. We propose a general framework in which 6 of these variants can be interpreted as different instances of the same approach. They are the vanilla gradient descent, the classical and generalized Gauss-Newton methods, the natural gradient descent me…

    Submitted 14 August, 2021; v1 submitted 18 October, 2018; originally announced October 2018.

    Comments: 13 pages

  7. arXiv:1808.05832  [pdf, other]

    cs.LG stat.ML

    Importance mixing: Improving sample reuse in evolutionary policy search methods

    Authors: Aloïs Pourchot, Nicolas Perrin, Olivier Sigaud

    Abstract: Deep neuroevolution, that is evolutionary policy search methods based on deep neural networks, have recently emerged as a competitor to deep reinforcement learning algorithms due to their better parallelization capabilities. However, these methods still suffer from a far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve…

    Submitted 17 August, 2018; originally announced August 2018.

  8. Visibly Tree Automata with Memory and Constraints

    Authors: Hubert Comon-Lundh, Florent Jacquemard, Nicolas Perrin

    Abstract: Tree automata with one memory have been introduced in 2001. They generalize both pushdown (word) automata and the tree automata with constraints of equality between brothers of Bogaert and Tison. Though it has a decidable emptiness problem, the main weakness of this model is its lack of good closure properties. We propose a generalization of the visibly pushdown automata of Alur and Madhusudan…

    Submitted 17 June, 2008; v1 submitted 18 April, 2008; originally announced April 2008.

    Comments: 36 pages including an appendix

    ACM Class: F.1.1; F.1.2; I.2.2; I.2.3

    Journal ref: Logical Methods in Computer Science, Volume 4, Issue 2 (June 18, 2008) lmcs:827
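Entry 6 (arXiv:1810.08102) lists vanilla gradient descent and the classical Gauss-Newton method among the six variants its framework covers. The paper's unified framework is not reproduced here; what follows is only a minimal sketch contrasting those two update rules on a hypothetical one-parameter least-squares problem. The model y = exp(a·x), the synthetic data, the step size lr=0.01 and all function names are illustrative assumptions. Both methods use the same gradient Jᵀr; Gauss-Newton additionally rescales it by (JᵀJ)⁻¹, which for a single parameter is just a scalar.

```python
import math

# Hypothetical toy problem: fit the single parameter a of y = exp(a * x)
# to noise-free data generated with a_true = 0.5.
XS = [0.0, 0.5, 1.0, 1.5, 2.0]
A_TRUE = 0.5
YS = [math.exp(A_TRUE * x) for x in XS]


def residuals_and_jacobian(a):
    """Return residuals r_i = exp(a x_i) - y_i and derivatives J_i = dr_i/da."""
    rs = [math.exp(a * x) - y for x, y in zip(XS, YS)]
    js = [x * math.exp(a * x) for x in XS]
    return rs, js


def gradient_descent_step(a, lr=0.01):
    # Vanilla gradient descent on L(a) = 0.5 * sum(r_i^2), with dL/da = sum(J_i * r_i).
    rs, js = residuals_and_jacobian(a)
    grad = sum(j * r for j, r in zip(js, rs))
    return a - lr * grad


def gauss_newton_step(a):
    # Classical Gauss-Newton: precondition the gradient by (J^T J)^{-1},
    # which for one parameter is the scalar 1 / sum(J_i^2).
    rs, js = residuals_and_jacobian(a)
    grad = sum(j * r for j, r in zip(js, rs))
    jtj = sum(j * j for j in js)
    return a - grad / jtj


def iterations_to_converge(step, a0=0.0, tol=1e-8, max_iter=10_000):
    """Count steps until |a - A_TRUE| < tol (A_TRUE is known only because the data is synthetic)."""
    a = a0
    for k in range(1, max_iter + 1):
        a = step(a)
        if abs(a - A_TRUE) < tol:
            return k, a
    return max_iter, a


if __name__ == "__main__":
    gd_iters, gd_a = iterations_to_converge(gradient_descent_step)
    gn_iters, gn_a = iterations_to_converge(gauss_newton_step)
    print(f"gradient descent: {gd_iters} iterations, a = {gd_a:.10f}")
    print(f"Gauss-Newton:     {gn_iters} iterations, a = {gn_a:.10f}")
```

On this small zero-residual problem Gauss-Newton reaches the tolerance in far fewer iterations than fixed-step gradient descent; a toy comparison like this says nothing about how either method behaves on the large models the paper targets.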
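Entry 7 (arXiv:1808.05832) investigates importance mixing, but its abstract does not spell the mechanism out. As an assumption, the sketch below follows the two-step rejection-sampling formulation commonly described in the evolution-strategies literature, with a minimal refresh rate alpha: old samples are kept with probability min(1, (1 - alpha) p_new(z) / p_old(z)), then new samples are accepted with probability max(alpha, 1 - p_old(z) / p_new(z)) until the population is full. The exact variant evaluated in the paper may differ. The 1-D Gaussian search distributions and all function names are hypothetical.

```python
import math
import random


def gaussian_pdf(mu, sigma):
    """Density of a 1-D normal distribution N(mu, sigma^2)."""
    coef = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda z: coef * math.exp(-0.5 * ((z - mu) / sigma) ** 2)


def importance_mixing(old_samples, old_pdf, new_pdf, sample_new, pop_size, alpha, rng):
    """Build a new population, reusing old samples where the densities allow.

    Returns (reused, fresh): reused samples keep their previous fitness
    evaluations, so only the fresh ones need to be evaluated.
    alpha is the minimal refresh rate and must lie in (0, 1].
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    # Step 1: keep each old sample with probability min(1, (1 - alpha) * p_new / p_old).
    reused = [z for z in old_samples
              if rng.random() < min(1.0, (1.0 - alpha) * new_pdf(z) / old_pdf(z))]
    reused = reused[:pop_size]
    # Step 2: draw candidates from the new distribution and accept each with
    # probability max(alpha, 1 - p_old / p_new) until the population is full.
    fresh = []
    while len(reused) + len(fresh) < pop_size:
        z = sample_new(rng)
        if rng.random() < max(alpha, 1.0 - old_pdf(z) / new_pdf(z)):
            fresh.append(z)
    return reused, fresh


if __name__ == "__main__":
    rng = random.Random(0)
    pop_size = 100
    old = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]  # previous generation
    reused, fresh = importance_mixing(
        old, gaussian_pdf(0.0, 1.0), gaussian_pdf(0.2, 1.0),
        lambda r: r.gauss(0.2, 1.0), pop_size, alpha=0.1, rng=rng)
    print(f"reused {len(reused)} old samples, evaluated {len(fresh)} new ones")
```

Requiring alpha > 0 keeps step 2 from looping forever when the old and new distributions coincide; the smaller alpha is, the more old evaluations can be reused when the search distribution moves only slightly.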