Search | arXiv e-print repository

When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning

Authors: Claas Voelcker, Tyler Kastner, Igor Gilitschenski, Amir-massoud Farahmand

Abstract: We investigate the impact of auxiliary learning tasks such as observation reconstruction and latent self-prediction on the representation learning problem in reinforcement learning. We also study how they interact with distractions and observation functions in the MDP. We provide a theoretical analysis of the learning dynamics of observation reconstruction, latent self-prediction, and TD learning in the presence of distractions and observation functions under linear model assumptions. With this formalization, we are able to explain why latent self-prediction is a helpful auxiliary task, while observation reconstruction can provide more useful features when used in isolation. Our empirical analysis shows that the insights obtained from our learning dynamics framework predict the behavior of these loss functions beyond the linear model assumption in non-linear neural networks. This reinforces the usefulness of the linear model framework not only for theoretical analysis, but also practical benefit for applied problems.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2310.19804 [pdf, other]

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which has so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.

Submitted 5 October, 2023; originally announced October 2023.

Comments: Published in TMLR

arXiv:2307.01708 [pdf, other]

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

Authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand

Abstract: We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.

Submitted 3 December, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2110.11438 [pdf]

doi 10.1109/TASLP.2021.3069302

Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence

Authors: Matteo Torcoli, Thorsten Kastner, Jürgen Herre

Abstract: Over the past few decades, computational methods have been developed to estimate perceptual audio quality. These methods, also referred to as objective quality measures, are usually developed and intended for a specific application domain. Because of their convenience, they are often used outside their original intended domain, even if it is unclear whether they provide reliable quality estimates in this case. This work studies the correlation of well-known state-of-the-art objective measures with human perceptual scores in two different domains: audio coding and source separation. The following objective measures are considered: fwSNRseg, dLLR, PESQ, PEAQ, POLQA, PEMO-Q, ViSQOLAudio, (SI-)BSSEval, PEASS, LKR-PI, 2f-model, and HAAQI. Additionally, a novel measure (SI-SA2f) is presented, based on the 2f-model and a BSSEval-based signal decomposition. We use perceptual scores from 7 listening tests about audio coding and 7 listening tests about source separation as ground-truth data for the correlation analysis. The results show that one method (2f-model) performs significantly better than the others on both domains and indicate that the dataset for training the method and a robust underlying auditory model are crucial factors towards a universal, domain-independent objective measure.

Submitted 21 October, 2021; originally announced October 2021.

Journal ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021

arXiv:2107.10151 [pdf, other]

doi 10.1109/WASPAA52581.2021.9632756

Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

Authors: Matteo Torcoli, Jouni Paulus, Thorsten Kastner, Christian Uhle

Abstract: Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source separation. An alternative operation mode of the measure is proposed, more appropriate when considering material with long inactive periods of the target source. The 2f-model requires the reference target source as an input, but this is not available in many applications. Deep neural networks (DNNs) are trained to estimate the 2f-model intrusively using the reference target (iDNN2f), non-intrusively using the input mix as reference (nDNN2f), and reference-free using only the separated output signal (rDNN2f). It is shown that iDNN2f achieves very strong correlation with the original measure on the test data (Pearson r=0.99), while performance decreases for nDNN2f (r>=0.91) and rDNN2f (r>=0.82). The non-intrusive estimate nDNN2f is mapped to select item-dependent remixing gains with the aim of maximizing the interferer attenuation under a constraint on the minimum quality of the remixed output (e.g., audible but not annoying deteriorations). A listening test shows that this is successfully achieved even with very different selected gains (up to 23 dB difference).

Submitted 21 July, 2021; originally announced July 2021.

Comments: Manuscript accepted for the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

arXiv:2106.08229 [pdf, other]

MICo: Improved representations via sampling-based state similarity for Markov decision processes

Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed distance addresses both of these issues. In addition to providing detailed theoretical analysis, we provide empirical evidence that learning this distance alongside the value function yields structured and informative representations, including strong results on the Arcade Learning Environment benchmark.

Submitted 21 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: Published at NeurIPS 2021

arXiv:1708.09706 [pdf]

Seminar Innovation Management - Winter Term 2017

Authors: Gerd Häusler, Aleksandra Milczarek, Markus Schreiter, Thomas Kästner, Florian Willomitzer, Andreas Maier, Florian Schiffers, Stefan Steidl, Temitope Paul Onanuga, Mathias Unberath, Florian Dötzer, Maike Stöve, Jonas Hajek, Christian Heidorn, Felix Häußler, Tobias Geimer, Johannes Wendel

Abstract: This document contains the results obtained by the Innovation Management Seminar in winter term 2017. In total 11 ideas have been developed by the team. In the document all 11 ideas show improvements for future applications in ophthalmology. The 11 ideas are AR/VR Glasses with Medical Applications, Augmented Reality Eye Surgery, Game Diagnosis, Intelligent Adapting Glasses, MD Facebook, Medical Crowd Segmentation, Personalized 3D Model of the Human Eye, Photoacoustic Contact Lens, Power Supply Smart Contact Lens, VR-Cornea and Head Mount for Fundus Imaging

Submitted 22 August, 2017; originally announced August 2017.

ACM Class: K.6.0

Showing 1–7 of 7 results for author: Kastner, T