Showing 1–50 of 70 results for author: Plaat, A

  1. arXiv:2408.09807

    cs.AI

    World Models Increase Autonomy in Reinforcement Learning

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

    Abstract: Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB)…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2407.18597

    cs.LG cs.AI cs.CY eess.SY stat.ML

    Reinforcement Learning for Sustainable Energy: A Survey

    Authors: Koen Ponse, Felix Kleuker, Márton Fejér, Álvaro Serra-Gómez, Aske Plaat, Thomas Moerland

    Abstract: The transition to sustainable energy is a key challenge of our time, requiring modifications in the entire pipeline of energy production, storage, transmission, and consumption. At every stage, new sequential decision-making challenges emerge, ranging from the operation of wind farms to the management of electrical grids or the scheduling of electric vehicle charging stations. All such problems ar…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 22 pages excluding references, 40 pages including references, 7 images

  3. arXiv:2407.11511

    cs.AI cs.CL cs.LG

    Reasoning with Large Language Models, a Survey

    Authors: Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki van Stein, Thomas Bäck

    Abstract: Scaling up language models to billions of parameters has opened up possibilities for in-context learning, allowing instruction tuning and few-shot learning on tasks that the model was not specifically trained for. This has achieved breakthrough performance on language tasks such as translation, summarization, and question-answering. Furthermore, in addition to these associative "System 1" tasks, r…

    Submitted 16 July, 2024; originally announced July 2024.

  4. arXiv:2405.13092

    cs.AI cs.LG

    CausalPlayground: Addressing Data-Generation Requirements in Cutting-Edge Causality Research

    Authors: Andreas W M Sauter, Erman Acar, Aske Plaat

    Abstract: Research on causal effects often relies on synthetic data due to the scarcity of real-world datasets with ground-truth effects. Since current data-generating tools do not always meet all requirements for state-of-the-art research, ad-hoc methods are often employed. This leads to heterogeneity among datasets and delays research progress. We address the shortcomings of current data-generating librar…

    Submitted 21 May, 2024; originally announced May 2024.

  5. arXiv:2403.13705

    cs.AI

    Research Re: search & Re-search

    Authors: Aske Plaat

    Abstract: Search algorithms are often categorized by their node expansion strategy. One option is the depth-first strategy, a simple backtracking strategy that traverses the search space in the order in which successor nodes are generated. An alternative is the best-first strategy, which was designed to make it possible to use domain-specific heuristic information. By exploring promising parts of the search…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: PhD thesis Aske Plaat 20 June 1996. AlphaBeta, SSS*, MTD(f)

  6. arXiv:2403.09713

    cs.AI cs.CL cs.HC

    A Hybrid Intelligence Method for Argument Mining

    Authors: Michiel van der Meer, Enrico Liscio, Catholijn M. Jonker, Aske Plaat, Piek Vossen, Pradeep K. Murukannaiah

    Abstract: Large-scale survey tools enable the collection of citizen feedback in opinion corpora. Extracting the key arguments from a large and noisy set of opinions helps in understanding the opinions quickly and accurately. Fully automated methods can extract arguments but (1) require large labeled datasets that induce large annotation costs and (2) work well for known viewpoints, but not for novel points…

    Submitted 1 August, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Published in JAIR

    Journal ref: Journal of Artificial Intelligence Research (JAIR), 80:1187-1222, 2024

  7. arXiv:2402.06912

    cs.LG cs.AI

    Solving Deep Reinforcement Learning Tasks with Evolution Strategies and Linear Policy Networks

    Authors: Annie Wong, Jacob de Nobel, Thomas Bäck, Aske Plaat, Anna V. Kononova

    Abstract: Although deep reinforcement learning methods can learn effective policies for challenging problems such as Atari games and robotics tasks, algorithms are complex, and training times are often long. This study investigates how Evolution Strategies perform compared to gradient-based deep reinforcement learning methods. We use Evolution Strategies to optimize the weights of a neural network via neuro…

    Submitted 24 July, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

  8. arXiv:2402.03326

    cs.CV cs.LG

    Slot Structured World Models

    Authors: Jonathan Collu, Riccardo Majellaro, Aske Plaat, Thomas M. Moerland

    Abstract: The ability to perceive and reason about individual objects and their interactions is a goal to be achieved for building intelligent artificial systems. State-of-the-art approaches use a feedforward encoder to extract object embeddings and a latent graph neural network to model the interaction between these object embeddings. However, the feedforward encoder cannot extract object-centric re…

    Submitted 8 January, 2024; originally announced February 2024.

  9. arXiv:2401.16974

    cs.LG cs.AI

    CORE: Towards Scalable and Efficient Causal Discovery with Reinforcement Learning

    Authors: Andreas W. M. Sauter, Nicolò Botteghi, Erman Acar, Aske Plaat

    Abstract: Causal discovery is the challenging task of inferring causal structure from data. Motivated by Pearl's Causal Hierarchy (PCH), which tells us that passive observations alone are not enough to distinguish correlation from causation, there has been a recent push to incorporate interventions into machine learning research. Reinforcement learning provides a convenient framework for such an active appr…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: To be published in Proc. of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024), Auckland, New Zealand, May 6-10, 2024, IFAAMAS

    ACM Class: I.2.6; I.2.8

  10. arXiv:2401.10148

    cs.CV cs.AI cs.LG

    Explicitly Disentangled Representations in Object-Centric Learning

    Authors: Riccardo Majellaro, Jonathan Collu, Aske Plaat, Thomas M. Moerland

    Abstract: Extracting structured representations from raw visual data is an important and long-standing challenge in machine learning. Recently, techniques for unsupervised learning of object-centric representations have raised growing interest. In this context, enhancing the robustness of the latent features can improve the efficiency and effectiveness of the training of downstream tasks. A promising step i…

    Submitted 18 January, 2024; originally announced January 2024.

  11. arXiv:2311.10590

    cs.LG cs.AI cs.CY stat.ML

    EduGym: An Environment and Notebook Suite for Reinforcement Learning Education

    Authors: Thomas M. Moerland, Matthias Müller-Brockhausen, Zhao Yang, Andrius Bernatavicius, Koen Ponse, Tom Kouwenhoven, Andreas Sauter, Michiel van der Meer, Bram Renting, Aske Plaat

    Abstract: Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, from our practical teaching experience, we see students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand…

    Submitted 22 February, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

  12. arXiv:2310.14139

    cs.LG cs.AI stat.ML

    Are LSTMs Good Few-Shot Learners?

    Authors: Mike Huisman, Thomas M. Moerland, Aske Plaat, Jan N. van Rijn

    Abstract: Deep learning requires large amounts of data to learn new tasks well, limiting its applicability to domains where such data is available. Meta-learning overcomes this limitation by learning how to learn. In 2001, Hochreiter et al. showed that an LSTM trained with backpropagation across different tasks is capable of meta-learning. Despite promising results of this approach on small problems, and mo…

    Submitted 21 October, 2023; originally announced October 2023.

    Comments: Accepted at Machine Learning Journal, Special Issue of the ECML PKDD 2023 Journal Track

  13. arXiv:2310.09028

    cs.LG cs.AI stat.ML

    Subspace Adaptation Prior for Few-Shot Learning

    Authors: Mike Huisman, Aske Plaat, Jan N. van Rijn

    Abstract: Gradient-based meta-learning techniques aim to distill useful prior knowledge from a set of training tasks such that new tasks can be learned more efficiently with gradient descent. While these methods have achieved successes in various scenarios, they commonly adapt all parameters of trainable layers when learning new tasks. This neglects potentially more efficient learning strategies for a given…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted at Machine Learning Journal, Special Issue of the ECML PKDD 2023 Journal Track

  14. arXiv:2310.06148

    cs.LG cs.AI stat.ML

    Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques

    Authors: Mike Huisman, Aske Plaat, Jan N. van Rijn

    Abstract: Deep neural networks can yield good performance on various tasks but often require large amounts of data to train them. Meta-learning has received considerable attention as one approach to improve the generalization of these networks from a limited amount of data. Whilst meta-learning techniques have been observed to be successful at this in various scenarios, recent results suggest that when evaluate…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted at Machine Learning Journal, Special Issue on Discovery Science 2021

  15. arXiv:2309.07033

    cs.HC

    Human-Robot Co-Creativity: A Scoping Review -- Informing a Research Agenda for Human-Robot Co-Creativity with Older Adults

    Authors: Marianne Bossema, Somaya Ben Allouch, Aske Plaat, Rob Saunders

    Abstract: This review is the first step in a long-term research project exploring how social robotics and AI-generated content can contribute to the creative experiences of older adults, with a focus on collaborative drawing and painting. We systematically searched and selected literature on human-robot co-creativity, and analyzed articles to identify methods and strategies for researching co-creative robot…

    Submitted 15 September, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

  16. arXiv:2309.01664

    cs.CL cs.AI cs.HC

    Fine-grained Affective Processing Capabilities Emerging from Large Language Models

    Authors: Joost Broekens, Bernhard Hilpert, Suzan Verberne, Kim Baraka, Patrick Gebhard, Aske Plaat

    Abstract: Large language models, in particular generative pre-trained transformers (GPTs), show impressive results on a wide variety of language-related tasks. In this paper, we explore ChatGPT's zero-shot ability to perform affective computing tasks using prompting alone. We show that ChatGPT a) performs meaningful sentiment analysis in the Valence, Arousal and Dominance dimensions, b) has meaningful emoti…

    Submitted 4 September, 2023; originally announced September 2023.

  17. arXiv:2304.10098

    cs.LG cs.AI

    Two-Memory Reinforcement Learning

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

    Abstract: While deep reinforcement learning has shown important empirical success, it tends to learn relatively slowly due to slow propagation of reward information and slow updates of parametric neural networks. Non-parametric episodic memory, on the other hand, provides a faster learning alternative that does not require representation learning and uses maximum episodic return as state-action values for act…

    Submitted 23 April, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  18. arXiv:2212.03251

    cs.LG cs.AI cs.RO

    First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

    Abstract: Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper, we present…

    Submitted 6 January, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2203.16311

  19. arXiv:2211.15183

    cs.LG cs.AI cs.RO

    Continuous Episodic Control

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

    Abstract: Non-parametric episodic memory can be used to quickly latch onto high-rewarded experience in reinforcement learning tasks. In contrast to parametric deep reinforcement learning approaches in which reward signals need to be back-propagated slowly, these methods only need to discover the solution once, and may then repeatedly solve the task. However, episodic control solutions are stored in discrete…

    Submitted 23 April, 2023; v1 submitted 28 November, 2022; originally announced November 2022.

  20. arXiv:2203.16311

    cs.LG cs.AI

    When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat

    Abstract: Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present…

    Submitted 13 April, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  21. arXiv:2203.03292

    cs.LG cs.AI

    On Credit Assignment in Hierarchical Reinforcement Learning

    Authors: Joery A. de Vries, Thomas M. Moerland, Aske Plaat

    Abstract: Hierarchical Reinforcement Learning (HRL) has held longstanding promise to advance reinforcement learning. Yet, it has remained a considerable challenge to develop practical algorithms that exhibit some of these promises. To improve our fundamental understanding of HRL, we investigate hierarchical credit assignment from the perspective of conventional multistep reinforcement learning. We show how…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Code and Data available at: https://github.com/joeryjoery/HierQ

  22. arXiv:2203.01075

    cs.LG

    Reliable validation of Reinforcement Learning Benchmarks

    Authors: Matthias Müller-Brockhausen, Aske Plaat, Mike Preuss

    Abstract: Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole, and a wide variety of games are used as its prominent test problems. However, it is subject to the replicability crisis that currently affects most algorithmic AI research. Benchmarking in Reinforcement Learning could be improved through verifiable results. There are numerous benchmark environments…

    Submitted 2 March, 2022; originally announced March 2022.

  23. Deep Reinforcement Learning, a textbook

    Authors: Aske Plaat

    Abstract: Deep reinforcement learning has gathered much attention recently. Impressive results were achieved in activities as diverse as autonomous driving, game playing, molecular recombination, and robotics. In all these fields, computer programs have taught themselves to solve difficult problems. They have learned to fly model helicopters and perform aerobatic manoeuvers such as loops and rolls. In some…

    Submitted 23 April, 2023; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: Revised version 2023, added description of Monte Carlo sampling and N-step algorithm, improved explanation of on-policy and off-policy learning. Preprint available by permission of Publisher

  24. arXiv:2109.05022

    cs.LG cs.AI

    Potential-based Reward Shaping in Sokoban

    Authors: Zhao Yang, Mike Preuss, Aske Plaat

    Abstract: Learning to solve sparse-reward reinforcement learning problems is difficult, due to the lack of guidance towards the goal. But in some problems, prior knowledge can be used to augment the learning process. Reward shaping is a way to incorporate prior knowledge into the original reward function in order to speed up the learning. While previous work has investigated the use of expert knowledge to g…

    Submitted 10 September, 2021; originally announced September 2021.

  25. arXiv:2107.08241

    cs.LG cs.AI

    High-Accuracy Model-Based Reinforcement Learning, a Survey

    Authors: Aske Plaat, Walter Kosters, Mike Preuss

    Abstract: Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision making problems from game playing and robotics have been solved with deep model-free methods. Unfortunately, the sample complexity of model-free methods is often high. To reduce the number of environment samples, model-based reinforcement learning creates an explicit model of the envi…

    Submitted 17 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2008.05598

  26. arXiv:2106.15691

    cs.LG cs.AI cs.MA cs.NE

    Deep Multiagent Reinforcement Learning: Challenges and Directions

    Authors: Annie Wong, Thomas Bäck, Anna V. Kononova, Aske Plaat

    Abstract: This paper surveys the field of deep multiagent reinforcement learning. The combination of deep neural networks with reinforcement learning has gained increased traction in recent years and is slowly shifting the focus from single-agent to multiagent environments. Dealing with multiple agents is inherently more complex as (a) the future rewards depend on multiple players' joint actions and (b) the…

    Submitted 12 October, 2022; v1 submitted 29 June, 2021; originally announced June 2021.

    Comments: 41 pages, 6 figures

    ACM Class: A.1; I.2.6; I.2.8; J.4

  27. Procedural Content Generation: Better Benchmarks for Transfer Reinforcement Learning

    Authors: Matthias Müller-Brockhausen, Mike Preuss, Aske Plaat

    Abstract: The idea of transfer in reinforcement learning (TRL) is intriguing: being able to transfer knowledge from one problem to another problem without learning everything from scratch. This promises quicker learning and learning more complex methods. To gain an insight into the field and to detect emerging trends, we performed a database search. We note a surprisingly late adoption of deep learning that…

    Submitted 31 May, 2021; originally announced May 2021.

  28. arXiv:2105.11702

    cs.AI cs.LG

    Transfer Learning and Curriculum Learning in Sokoban

    Authors: Zhao Yang, Mike Preuss, Aske Plaat

    Abstract: Transfer learning can speed up training in machine learning and is regularly used in classification tasks. It reuses prior knowledge from other tasks to pre-train networks for new tasks. In reinforcement learning, learning actions for a behavior policy that can be applied to new environments is still a challenge, especially for tasks that involve much planning. Sokoban is a challenging puzzle game…

    Submitted 10 September, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

  29. arXiv:2105.06136

    cs.AI cs.LG

    Adaptive Warm-Start MCTS in AlphaZero-like Deep Reinforcement Learning

    Authors: Hui Wang, Mike Preuss, Aske Plaat

    Abstract: AlphaZero has achieved impressive performance in deep reinforcement learning by utilizing an architecture that combines search and training of a neural network in self-play. Many researchers are looking for ways to reproduce and improve results for other games/tasks. However, the architecture is designed to learn from scratch, tabula rasa, accepting a cold-start problem in self-play. Recently, a w…

    Submitted 13 May, 2021; originally announced May 2021.

  30. arXiv:2104.10527

    cs.LG cs.AI stat.ML

    Stateless Neural Meta-Learning using Second-Order Gradients

    Authors: Mike Huisman, Aske Plaat, Jan N. van Rijn

    Abstract: Deep learning typically requires large data sets and much compute power for each new problem that is learned. Meta-learning can be used to learn a good prior that facilitates quick learning, thereby relaxing these requirements so that new tasks can be learned more quickly; two popular approaches are MAML and the meta-learner LSTM. In this work, we compare the two and formally show that the meta-learner…

    Submitted 21 April, 2021; originally announced April 2021.

    Journal ref: Machine Learning, 2022

  31. arXiv:2102.12924

    cs.LG cs.AI stat.ML

    Visualizing MuZero Models

    Authors: Joery A. de Vries, Ken S. Voskuil, Thomas M. Moerland, Aske Plaat

    Abstract: MuZero, a model-based reinforcement learning algorithm that uses a value equivalent dynamics model, achieved state-of-the-art performance in Chess, Shogi and the game of Go. In contrast to standard forward dynamics models that predict a full next state, value equivalent models are trained to predict a future value, thereby emphasizing value relevant information in the representations. While value…

    Submitted 3 March, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

  32. arXiv:2010.06488

    eess.SY cs.AI

    Multiple Node Immunisation for Preventing Epidemics on Networks by Exact Multiobjective Optimisation of Cost and Shield-Value

    Authors: Michael Emmerich, Joost Nibbeling, Marios Kefalas, Aske Plaat

    Abstract: The general problem in this paper is vertex (node) subset selection with the goal of containing an infection that spreads in a network. Instead of selecting the single most important node, this paper deals with the problem of selecting multiple nodes for removal. As compared to previous work on multiple-node selection, the trade-off between cost and benefit is considered. The benefit is measured in t…

    Submitted 13 October, 2020; originally announced October 2020.

    Comments: Based on the Master Thesis of Joost Nibbeling, LIACS, Leiden University, The Netherlands

    MSC Class: 90C27; 05C69 ACM Class: J.3; G.2.2; G.4; I.2.8

  33. arXiv:2010.03522

    cs.LG cs.AI stat.ML

    A Survey of Deep Meta-Learning

    Authors: Mike Huisman, Jan N. van Rijn, Aske Plaat

    Abstract: Deep neural networks can achieve great successes when presented with large data sets and sufficient computational resources. However, their ability to learn new concepts quickly is limited. Meta-learning is one approach to address this issue, by enabling the network to learn how to learn. The field of Deep Meta-Learning advances at great speed, but lacks a unified, in-depth overview of current tec…

    Submitted 21 April, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Published in the AI Review (AIRE) Journal (2021)

  34. arXiv:2008.05598

    cs.LG cs.AI

    Deep Model-Based Reinforcement Learning for High-Dimensional Problems, a Survey

    Authors: Aske Plaat, Walter Kosters, Mike Preuss

    Abstract: Deep reinforcement learning has shown remarkable success in the past few years. Highly complex sequential decision making problems have been solved in tasks such as game playing and robotics. Unfortunately, the sample complexity of most deep reinforcement learning methods is high, precluding their use in some important applications. Model-based reinforcement learning creates an explicit model of t…

    Submitted 1 December, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  35. arXiv:2006.16712

    cs.LG cs.AI stat.ML

    Model-based Reinforcement Learning: A Survey

    Authors: Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

    Abstract: Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematic…

    Submitted 31 March, 2022; v1 submitted 30 June, 2020; originally announced June 2020.

  36. arXiv:2006.15009

    cs.LG cs.AI cs.RO stat.ML

    A Unifying Framework for Reinforcement Learning and Planning

    Authors: Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

    Abstract: Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in…

    Submitted 31 March, 2022; v1 submitted 26 June, 2020; originally announced June 2020.

  37. arXiv:2006.07970

    cs.AI

    Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

    Authors: Hui Wang, Mike Preuss, Michael Emmerich, Aske Plaat

    Abstract: Morpion Solitaire is a popular single-player game, played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. After achieving this record…

    Submitted 14 June, 2020; originally announced June 2020.

    Comments: 4 pages, 2 figures. The first/ongoing attempt to tackle Morpion Solitaire using ranked reward reinforcement learning. Submitted to SYNASC2020

  38. arXiv:2005.09645

    cs.AI

    The Second Type of Uncertainty in Monte Carlo Tree Search

    Authors: Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker

    Abstract: Monte Carlo Tree Search (MCTS) efficiently balances exploration and exploitation in tree search based on count-derived uncertainty. However, these local visit counts ignore a second type of uncertainty induced by the size of the subtree below an action. We first show how, due to the lack of this second uncertainty type, MCTS may completely fail in well-known sparse exploration problems, known from…

    Submitted 19 May, 2020; originally announced May 2020.

    Comments: arXiv admin note: text overlap with arXiv:1805.09218

  39. Warm-Start AlphaZero Self-Play Search Enhancements

    Authors: Hui Wang, Mike Preuss, Aske Plaat

    Abstract: Recently, AlphaZero has achieved landmark results in deep reinforcement learning, by providing a single self-play architecture that learned three different games at superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways…

    Submitted 26 April, 2020; originally announced April 2020.

    Journal ref: PPSN 2020

  40. arXiv:2004.00377

    cs.AI

    A New Challenge: Approaching Tetris Link with AI

    Authors: Matthias Müller-Brockhausen, Mike Preuss, Aske Plaat

    Abstract: Decades of research have been invested in making computer programs for playing games such as Chess and Go. This paper focuses on a new game, Tetris Link, a board game that is still lacking any scientific analysis. Tetris Link has a large branching factor, hampering a traditional heuristic planning approach. We explore heuristic planning and two other approaches: Reinforcement Learning, Monte Carlo…

    Submitted 1 April, 2020; originally announced April 2020.

  41. arXiv:2003.05988

    cs.LG cs.AI cs.NE

    Analysis of Hyper-Parameters for Small Games: Iterations or Epochs in Self-Play?

    Authors: Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat

    Abstract: The landmark achievements of AlphaGo Zero have created great research interest in self-play in reinforcement learning. In self-play, Monte Carlo Tree Search is used to train a deep neural network, which is then used in tree searches. Training itself is governed by many hyper-parameters. There has been surprisingly little research on design choices for hyper-parameter values and loss-functions, pres…

    Submitted 12 March, 2020; originally announced March 2020.

  42. arXiv:1903.08781

    cs.AR cs.DC cs.RO eess.SY

    Fault-Tolerant Nanosatellite Computing on a Budget

    Authors: Christian M. Fuchs, Nadia Murillo, Aske Plaat, Erik Van der Kouwe, Daniel Harsono, Todor Stefanov

    Abstract: Micro- and nanosatellites have become popular platforms for a variety of commercial and scientific applications, but today are considered suitable mainly for short and low-priority space missions due to their low reliability. In part, this can be attributed to their reliance upon cheap, low-feature size, COTS components originally designed for embedded and mobile-market applications, for which tra…

    Submitted 20 March, 2019; originally announced March 2019.

    Journal ref: Conference on Radiation Effects on Components and Systems 2018 (RADECS)

  43. arXiv:1903.08129

    cs.LG cs.AI

    Hyper-Parameter Sweep on AlphaZero General

    Authors: Hui Wang, Michael Emmerich, Mike Preuss, Aske Plaat

    Abstract: Since AlphaGo and AlphaGo Zero achieved groundbreaking successes in the game of Go, the programs have been generalized to solve other tasks. Subsequently, AlphaZero was developed to play Go, Chess and Shogi. In the literature, the algorithms are explained well. However, AlphaZero contains many parameters, and for none of AlphaGo, AlphaGo Zero, or AlphaZero is there sufficient discussion about h…

    Submitted 19 March, 2019; originally announced March 2019.

    Comments: 19 pages, 13 figures

  44. arXiv:1902.09493

    cs.DC cs.OS eess.SY

    Dynamic Fault Tolerance Through Resource Pooling

    Authors: Christian M. Fuchs, Nadia M. Murillo, Aske Plaat, Erik van der Kouwe, Todor Stefanov

    Abstract: Miniaturized satellites are currently not considered suitable for critical, high-priority, and complex multi-phased missions, due to their low reliability. As hardware-side fault tolerance (FT) solutions designed for larger spacecraft cannot be adopted aboard very small satellites due to budget, energy, and size constraints, we developed a hybrid FT approach based upon only COTS components, commo…

    Submitted 21 February, 2019; originally announced February 2019.

    Journal ref: 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)

  45. arXiv:1810.06078  [pdf, other]

    cs.AI

    Assessing the Potential of Classical Q-learning in General Game Playing

    Authors: Hui Wang, Michael Emmerich, Aske Plaat

    Abstract: After the recent groundbreaking results of AlphaGo and AlphaZero, we have seen strong interest in deep reinforcement learning and artificial general intelligence (AGI) in game playing. However, deep learning is resource-intensive and the theory is not yet well developed. For small games, simple classical table-based Q-learning might still be the algorithm of choice. General Game Playing (GGP) pro…

    Submitted 14 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: substantial text overlap with arXiv:1802.05944

  46. arXiv:1805.09613  [pdf, other]

    stat.ML cs.AI cs.LG cs.RO eess.SY

    A0C: Alpha Zero in Continuous Action Space

    Authors: Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

    Abstract: A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical ex…

    Submitted 24 May, 2018; originally announced May 2018.

  47. arXiv:1805.09218  [pdf, other]

    stat.ML cs.AI cs.LG

    Monte Carlo Tree Search for Asymmetric Trees

    Authors: Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

    Abstract: We present an extension of Monte Carlo Tree Search (MCTS) that strongly increases its efficiency for trees with asymmetry and/or loops. Asymmetric termination of search trees introduces a type of uncertainty for which the standard upper confidence bound (UCB) formula does not account. Our first algorithm (MCTS-T), which assumes a non-stochastic environment, backs up tree structure uncertainty and…

    Submitted 23 May, 2018; originally announced May 2018.

  48. arXiv:1802.05944  [pdf, other]

    cs.AI

    Monte Carlo Q-learning for General Game Playing

    Authors: Hui Wang, Michael Emmerich, Aske Plaat

    Abstract: After the recent groundbreaking results of AlphaGo, we have seen a strong interest in reinforcement learning in game playing. General Game Playing (GGP) provides a good testbed for reinforcement learning. In GGP, a specification of game rules is given. GGP problems can be solved by reinforcement learning. Q-learning is one of the canonical reinforcement learning methods, and has been used by (Ban…

    Submitted 21 May, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

  49. arXiv:1708.06931  [pdf, other]

    cs.DC cs.OS

    Bringing Fault-Tolerant GigaHertz-Computing to Space: A Multi-Stage Software-Side Fault-Tolerance Approach for Miniaturized Spacecraft

    Authors: Christian M. Fuchs, Todor Stefanov, Nadia Murillo, Aske Plaat

    Abstract: Modern embedded technology is a driving factor in satellite miniaturization, contributing to a massive boom in satellite launches and a rapidly evolving new space industry. Miniaturized satellites, however, suffer from low reliability, as traditional hardware-based fault-tolerance (FT) concepts are ineffective for on-board computers (OBCs) utilizing modern systems-on-a-chip (SoC). Therefore, large…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: 26th IEEE Asian Test Symposium 2017, 27-30 Nov 2017, Taipei City, Taiwan

  50. arXiv:1706.02086  [pdf, other]

    cs.DC

    Preliminary Performance Estimations and Benchmark Results for a Software-based Fault-Tolerance Approach aboard Miniaturized Satellite Computers

    Authors: Christian M. Fuchs, Todor Stefanov, Nadia Murillo, Aske Plaat

    Abstract: Modern embedded technology is a driving factor in satellite miniaturization, contributing to a massive boom in satellite launches and a rapidly evolving new space industry. Miniaturized satellites, however, suffer from low reliability, as traditional hardware-based fault-tolerance (FT) concepts are ineffective for on-board computers (OBCs) utilizing modern systems-on-a-chip (SoC). Larger satellites…

    Submitted 22 July, 2017; v1 submitted 7 June, 2017; originally announced June 2017.