Task Phasing: Automated Curriculum Learning from Demonstrations

Authors: Vaibhav Bajaj, Guni Sharon, Peter Stone

Abstract: Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations we define a simple initial task. Our task phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on 3 sparse reward domains demonstrate that our task phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.

Submitted 27 March, 2023; v1 submitted 19 October, 2022; originally announced October 2022.

Comments: 7 pages main paper, 7 figures, 4 pages appendix. Submitted to AAAI 2023 Conference

arXiv:2112.06637

Efficient Training of Volterra Series-Based Pre-distortion Filter Using Neural Networks

Authors: Vinod Bajaj, Mathieu Chagnon, Sander Wahls, Vahid Aref

Abstract: We present a simple, efficient "direct learning" approach to train Volterra series-based digital pre-distortion filters using neural networks. We show its superior performance over conventional training methods using a 64-QAM 64-GBaud simulated transmitter with varying transmitter nonlinearity and noisy conditions.

Submitted 13 December, 2021; originally announced December 2021.

Comments: Accepted for presentation in OFC 2022

arXiv:2107.12320

End-to-End Deep Learning of Long-Haul Coherent Optical Fiber Communications via Regular Perturbation Model

Authors: Vladislav Neskorniuk, Andrea Carnio, Vinod Bajaj, Domenico Marsella, Sergei K. Turitsyn, Jaroslaw E. Prilepsky, Vahid Aref

Abstract: We present a novel end-to-end autoencoder-based learning for coherent optical communications using a "parallelizable" perturbative channel model. We jointly optimized constellation shaping and nonlinear pre-emphasis achieving mutual information gain of 0.18 bits/sym./pol. simulating 64 GBd dual-polarization single-channel transmission over 30x80 km G.652 SMF link with EDFAs.

Submitted 26 July, 2021; originally announced July 2021.

Comments: 4 pages; accepted for presentation at ECOC 2021 in September 2021

arXiv:2104.02656

Collaborative Learning to Generate Audio-Video Jointly

Authors: Vinod K Kurmi, Vipul Bajaj, Badri N Patro, K S Venkatesh, Vinay P Namboodiri, Preethi Jyothi

Abstract: There have been a number of techniques that have demonstrated the generation of multimedia data for one modality at a time using GANs, such as the ability to generate images, videos, and audio. However, so far, the task of multi-modal generation of data, specifically for audio and videos both, has not been sufficiently well-explored. Towards this, we propose a method that demonstrates that we are able to generate naturalistic samples of video and audio data by the joint correlated generation of audio and video modalities. The proposed method uses multiple discriminators to ensure that the audio, video, and the joint output are also indistinguishable from real-world samples. We present a dataset for this task and show that we are able to generate realistic samples. This method is validated using various standard metrics such as Inception Score, Frechet Inception Distance (FID) and through human evaluation.

Submitted 31 March, 2021; originally announced April 2021.

Comments: ICASSP 2021 (Accepted)

arXiv:1907.10628

Curriculum based Dropout Discriminator for Domain Adaptation

Authors: Vinod Kumar Kurmi, Vipul Bajaj, Venkatesh K Subramanian, Vinay P Namboodiri

Abstract: Domain adaptation is essential to enable wide usage of deep learning based networks trained using large labeled datasets. Adversarial learning based techniques have shown their utility towards solving this problem using a discriminator that ensures source and target distributions are close. However, here we suggest that rather than using a point estimate, it would be useful if a distribution based discriminator could be used to bridge this gap. This could be achieved using multiple classifiers or using traditional ensemble methods. In contrast, we suggest that a Monte Carlo dropout based ensemble discriminator could suffice to obtain the distribution based discriminator. Specifically, we propose a curriculum based dropout discriminator that gradually increases the variance of the sample based distribution and the corresponding reverse gradients are used to align the source and target feature representations. The detailed results and thorough ablation analysis show that our model outperforms state-of-the-art results.

Submitted 19 October, 2019; v1 submitted 24 July, 2019; originally announced July 2019.

Comments: BMVC 2019 Accepted, Project Page: https://delta-lab-iitk.github.io/CD3A/

Showing 1–5 of 5 results for author: Bajaj, V