Showing 1–18 of 18 results for author: Tangkaratt, V

  1. arXiv:2103.07084  [pdf, other]

    stat.ML cs.AI cs.LG

    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information

    Authors: Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an app…

    Submitted 12 April, 2022; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: 35 pages

  2. arXiv:2010.10181  [pdf, other]

    stat.ML cs.AI cs.LG

    Robust Imitation Learning from Noisy Demonstrations

    Authors: Voot Tangkaratt, Nontawat Charoenphakdee, Masashi Sugiyama

    Abstract: Robust learning from noisy demonstrations is a practical but highly challenging problem in imitation learning. In this paper, we first theoretically show that robust imitation learning can be achieved by optimizing a classification risk with a symmetric loss. Based on this theoretical finding, we then propose a new imitation learning method that optimizes the classification risk by effectively com…

    Submitted 19 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

    Comments: 16 pages, 9 figures. Accepted to AISTATS 2021

  3. arXiv:2006.02608  [pdf, ps, other]

    cs.LG stat.ML

    Meta-Model-Based Meta-Policy Optimization

    Authors: Takuya Hiraoka, Takahisa Imagawa, Voot Tangkaratt, Takayuki Osa, Takashi Onishi, Yoshimasa Tsuruoka

    Abstract: Model-based meta-reinforcement learning (RL) methods have recently been shown to be a promising approach to improving the sample efficiency of RL in multi-task settings. However, the theoretical understanding of those methods is yet to be established, and there is currently no theoretical guarantee of their performance in a real-world environment. In this paper, we analyze the performance guarante…

    Submitted 11 October, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: ACML 2021. Video demo: https://drive.google.com/file/d/1DRA-pmIWnHGNv5G_gFrml8YzKCtMcGnu/view?usp=sharing Source code: https://github.com/TakuyaHiraoka/Meta-Model-Based-Meta-Policy-Optimization

  4. arXiv:1909.06769  [pdf, other]

    cs.LG stat.ML

    VILD: Variational Imitation Learning with Diverse-quality Demonstrations

    Authors: Voot Tangkaratt, Bo Han, Mohammad Emtiyaz Khan, Masashi Sugiyama

    Abstract: The goal of imitation learning (IL) is to learn a good policy from high-quality demonstrations. However, the quality of demonstrations in reality can be diverse, since it is easier and cheaper to collect demonstrations from a mix of experts and amateurs. IL in such situations can be challenging, especially when the level of demonstrators' expertise is unknown. We propose a new IL method called \un…

    Submitted 15 September, 2019; originally announced September 2019.

  5. arXiv:1901.09387  [pdf, other]

    cs.LG cs.AI stat.ML

    Imitation Learning from Imperfect Demonstration

    Authors: Yueh-Hua Wu, Nontawat Charoenphakdee, Han Bao, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Imitation learning (IL) aims to learn an optimal policy from demonstrations. However, such demonstrations are often imperfect since collecting optimal ones is costly. To effectively learn from imperfect demonstrations, we propose a novel approach that utilizes confidence scores, which describe the quality of demonstrations. More specifically, we propose two confidence-based IL methods, namely two-…

    Submitted 29 January, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

  6. arXiv:1901.01365  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Reinforcement Learning via Advantage-Weighted Information Maximization

    Authors: Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama

    Abstract: Real-world tasks are often highly structured. Hierarchical reinforcement learning (HRL) has attracted research interest as an approach for leveraging the hierarchical structure of a given task in reinforcement learning (RL). However, identifying the hierarchical policy structure that enhances the performance of RL is not a trivial task. In this paper, we propose an HRL method that learns a latent…

    Submitted 7 March, 2019; v1 submitted 4 January, 2019; originally announced January 2019.

    Comments: 16 pages, ICLR 2019

  7. TD-Regularized Actor-Critic Methods

    Authors: Simone Parisi, Voot Tangkaratt, Jan Peters, Mohammad Emtiyaz Khan

    Abstract: Actor-critic methods can achieve incredible performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and critic during learning, e.g., an inaccurate step taken by one of them might adversely affect the other and destabilize the learning. To avoid such issues, we propose to regularize the learning objec…

    Submitted 25 February, 2019; v1 submitted 19 December, 2018; originally announced December 2018.

  8. arXiv:1812.02632  [pdf, other]

    cs.LG cs.AI stat.ML

    Active Deep Q-learning with Demonstration

    Authors: Si-An Chen, Voot Tangkaratt, Hsuan-Tien Lin, Masashi Sugiyama

    Abstract: Recent research has shown that although Reinforcement Learning (RL) can benefit from expert demonstration, it usually takes considerable efforts to obtain enough demonstration. The efforts prevent training decent RL agents with expert demonstration in practice. In this work, we propose Active Reinforcement Learning with Demonstration (ARLD), a new framework to streamline RL in terms of demonstrati…

    Submitted 6 December, 2018; originally announced December 2018.

    Journal ref: Mach Learn 109, 1699-1725 (2020)

  9. arXiv:1806.04854  [pdf, other]

    stat.ML cs.AI cs.LG stat.CO

    Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam

    Authors: Mohammad Emtiyaz Khan, Didrik Nielsen, Voot Tangkaratt, Wu Lin, Yarin Gal, Akash Srivastava

    Abstract: Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented w…

    Submitted 2 August, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

    Comments: Camera ready version

    Journal ref: Thirty-fifth International Conference on Machine Learning, 2018

  10. arXiv:1712.01038  [pdf, other]

    stat.ML cs.LG

    Vprop: Variational Inference using RMSprop

    Authors: Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal

    Abstract: Many computationally-efficient methods for Bayesian deep learning rely on continuous optimization algorithms, but the implementation of these methods requires significant changes to existing code-bases. In this paper, we propose Vprop, a method for Gaussian variational inference that can be implemented with two minor changes to the off-the-shelf RMSprop optimizer. Vprop also reduces the memory req…

    Submitted 4 December, 2017; originally announced December 2017.

  11. arXiv:1711.05560  [pdf, other]

    stat.ML cs.LG

    Variational Adaptive-Newton Method for Explorative Learning

    Authors: Mohammad Emtiyaz Khan, Wu Lin, Voot Tangkaratt, Zuozhu Liu, Didrik Nielsen

    Abstract: We present the Variational Adaptive Newton (VAN) method which is a black-box optimization method especially suitable for explorative-learning tasks such as active learning and reinforcement learning. Similar to Bayesian methods, VAN estimates a distribution that can be used for exploration, but requires computations that are similar to continuous optimization methods. Our theoretical contribution…

    Submitted 15 November, 2017; originally announced November 2017.

  12. arXiv:1705.07606  [pdf, other]

    stat.ML

    Guide Actor-Critic for Continuous Control

    Authors: Voot Tangkaratt, Abbas Abdolmaleki, Masashi Sugiyama

    Abstract: Actor-critic methods solve reinforcement learning problems by updating a parameterized policy known as an actor in a direction that increases an estimate of the expected return known as a critic. However, existing actor-critic methods only use values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GAC).…

    Submitted 21 February, 2018; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: ICLR 2018

  13. arXiv:1611.03231  [pdf, ps, other]

    stat.ML cs.LG

    Policy Search with High-Dimensional Context Variables

    Authors: Voot Tangkaratt, Herke van Hoof, Simone Parisi, Gerhard Neumann, Jan Peters, Masashi Sugiyama

    Abstract: Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such a…

    Submitted 10 November, 2016; originally announced November 2016.

  14. arXiv:1508.01019  [pdf, ps, other]

    stat.ML

    Direct Estimation of the Derivative of Quadratic Mutual Information with Application in Supervised Dimension Reduction

    Authors: Voot Tangkaratt, Hiroaki Sasaki, Masashi Sugiyama

    Abstract: A typical goal of supervised dimension reduction is to find a low-dimensional subspace of the input space such that the projected input variables preserve maximal information about the output variables. The dependence maximization approach solves the supervised dimension reduction problem through maximizing a statistical dependence between projected input variables and output variables. A well-kno…

    Submitted 5 August, 2015; originally announced August 2015.

  15. arXiv:1405.2406  [pdf, ps, other]

    cs.RO

    Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

    Authors: Norikazu Sugimoto, Voot Tangkaratt, Thijs Wensveen, Tingting Zhao, Masashi Sugiyama, Jun Morimoto

    Abstract: In this study, we show that a movement policy can be improved efficiently using the previous experiences of a real robot. Reinforcement Learning (RL) is becoming a popular approach to acquire a nonlinear optimal policy through trial and error. However, it is considered very difficult to apply RL to real robot control since it usually requires many learning trials. Such trials cannot be executed in…

    Submitted 10 May, 2014; originally announced May 2014.

    Comments: 14 pages, 9 figures, http://www.cns.atr.jp/bri/en/. arXiv admin note: substantial text overlap with arXiv:1301.3966

  16. arXiv:1404.6876  [pdf, ps, other]

    cs.LG stat.ML

    Conditional Density Estimation with Dimensionality Reduction via Squared-Loss Conditional Entropy Minimization

    Authors: Voot Tangkaratt, Ning Xie, Masashi Sugiyama

    Abstract: Regression aims at estimating the conditional mean of output given input. However, regression is not informative enough if the conditional density is multimodal, heteroscedastic, and asymmetric. In such a case, estimating the conditional density itself is preferable, but conditional density estimation (CDE) is challenging in high-dimensional space. A naive approach to coping with high-dimensionali…

    Submitted 28 April, 2014; originally announced April 2014.

  17. arXiv:1307.5118  [pdf, ps, other]

    stat.ML cs.LG

    Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

    Authors: Syogo Mori, Voot Tangkaratt, Tingting Zhao, Jun Morimoto, Masashi Sugiyama

    Abstract: The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other…

    Submitted 18 July, 2013; originally announced July 2013.

  18. arXiv:1301.3966  [pdf, ps, other]

    cs.LG stat.ML

    Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

    Authors: Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama

    Abstract: The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy gradient estimates for reliable policy updates. In this paper, we combine the following three ideas and give a highly effective policy gradient method: (a) the polic…

    Submitted 16 January, 2013; originally announced January 2013.