-
Task Phasing: Automated Curriculum Learning from Demonstrations
Authors:
Vaibhav Bajaj,
Guni Sharon,
Peter Stone
Abstract:
Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations we define a simple initial task. Our task phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on 3 sparse reward domains demonstrate that our task phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.
Submitted 27 March, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Efficient Training of Volterra Series-Based Pre-distortion Filter Using Neural Networks
Authors:
Vinod Bajaj,
Mathieu Chagnon,
Sander Wahls,
Vahid Aref
Abstract:
We present a simple, efficient "direct learning" approach to train Volterra series-based digital pre-distortion filters using neural networks. We show its superior performance over conventional training methods using a 64-QAM 64-GBaud simulated transmitter with varying transmitter nonlinearity and noisy conditions.
Submitted 13 December, 2021;
originally announced December 2021.
-
End-to-End Deep Learning of Long-Haul Coherent Optical Fiber Communications via Regular Perturbation Model
Authors:
Vladislav Neskorniuk,
Andrea Carnio,
Vinod Bajaj,
Domenico Marsella,
Sergei K. Turitsyn,
Jaroslaw E. Prilepsky,
Vahid Aref
Abstract:
We present a novel end-to-end autoencoder-based learning for coherent optical communications using a "parallelizable" perturbative channel model. We jointly optimized constellation shaping and nonlinear pre-emphasis achieving mutual information gain of 0.18 bits/sym./pol. simulating 64 GBd dual-polarization single-channel transmission over 30x80 km G.652 SMF link with EDFAs.
Submitted 26 July, 2021;
originally announced July 2021.
-
Collaborative Learning to Generate Audio-Video Jointly
Authors:
Vinod K Kurmi,
Vipul Bajaj,
Badri N Patro,
K S Venkatesh,
Vinay P Namboodiri,
Preethi Jyothi
Abstract:
There have been a number of techniques that have demonstrated the generation of multimedia data for one modality at a time using GANs, such as the ability to generate images, videos, and audio. However, so far, the task of multi-modal generation of data, specifically for audio and videos both, has not been sufficiently well-explored. Towards this, we propose a method that demonstrates that we are able to generate naturalistic samples of video and audio data by the joint correlated generation of audio and video modalities. The proposed method uses multiple discriminators to ensure that the audio, video, and the joint output are also indistinguishable from real-world samples. We present a dataset for this task and show that we are able to generate realistic samples. This method is validated using various standard metrics such as Inception Score, Frechet Inception Distance (FID) and through human evaluation.
Submitted 31 March, 2021;
originally announced April 2021.
-
Curriculum based Dropout Discriminator for Domain Adaptation
Authors:
Vinod Kumar Kurmi,
Vipul Bajaj,
Venkatesh K Subramanian,
Vinay P Namboodiri
Abstract:
Domain adaptation is essential to enable wide usage of deep learning based networks trained using large labeled datasets. Adversarial learning based techniques have shown their utility towards solving this problem using a discriminator that ensures source and target distributions are close. However, here we suggest that rather than using a point estimate, it would be useful if a distribution based discriminator could be used to bridge this gap. This could be achieved using multiple classifiers or using traditional ensemble methods. In contrast, we suggest that a Monte Carlo dropout based ensemble discriminator could suffice to obtain the distribution based discriminator. Specifically, we propose a curriculum based dropout discriminator that gradually increases the variance of the sample based distribution and the corresponding reverse gradients are used to align the source and target feature representations. The detailed results and thorough ablation analysis show that our model outperforms state-of-the-art results.
Submitted 19 October, 2019; v1 submitted 24 July, 2019;
originally announced July 2019.