Search | arXiv e-print repository

Multi-User Reinforcement Learning with Low Rank Rewards

Authors: Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli

Abstract: In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering se… ▽ More In this work, we consider the problem of collaborative multi-user reinforcement learning. In this setting there are multiple users with the same state-action space and transition probabilities but with different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering setting -- the question is can we design algorithms with significantly lower sample complexity compared to the ones that learn the MDP individually for each user. Our main contribution is an algorithm which explores rewards collaboratively with $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically over the size of the state-space, which represents an exponential reduction (in the state-space size) when compared to the standard ``non-collaborative'' algorithms. △ Less

Submitted 22 May, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2201.00866 [pdf, ps, other]

Improved bounds for the many-user MAC

Authors: Suhas S Kowshik

Abstract: Many-user MAC is an important model for understanding energy efficiency of massive random access in 5G and beyond. Introduced in Polyanskiy'2017 for the AWGN channel, subsequent works have provided improved bounds on the asymptotic minimum energy-per-bit required to achieve a target per-user error at a given user density and payload, going beyond the AWGN setting. The best known rigorous bounds us… ▽ More Many-user MAC is an important model for understanding energy efficiency of massive random access in 5G and beyond. Introduced in Polyanskiy'2017 for the AWGN channel, subsequent works have provided improved bounds on the asymptotic minimum energy-per-bit required to achieve a target per-user error at a given user density and payload, going beyond the AWGN setting. The best known rigorous bounds use spatially coupled codes along with the optimal AMP algorithm. But these bounds are infeasible to compute beyond a few (around 10) bits of payload. In this paper, we provide new achievability bounds for the many-user AWGN and quasi-static Rayleigh fading MACs using the spatially coupled codebook design along with a scalar AMP algorithm. The obtained bounds are computable even up to 100 bits and outperform the previous ones at this payload. △ Less

Submitted 3 January, 2022; originally announced January 2022.

Comments: 6 pages

arXiv:2107.07791 [pdf, other]

doi 10.1016/j.patcog.2021.108174

Graph Representation Learning for Road Type Classification

Authors: Zahra Gharaee, Shreyas Kowshik, Oliver Stromann, Michael Felsberg

Abstract: We present a novel learning-based approach to graph representations of road networks employing state-of-the-art graph convolutional neural networks. Our approach is applied to realistic road networks of 17 cities from Open Street Map. While edge features are crucial to generate descriptive graph representations of road networks, graph convolutional networks usually rely on node features only. We s… ▽ More We present a novel learning-based approach to graph representations of road networks employing state-of-the-art graph convolutional neural networks. Our approach is applied to realistic road networks of 17 cities from Open Street Map. While edge features are crucial to generate descriptive graph representations of road networks, graph convolutional networks usually rely on node features only. We show that the highly representative edge features can still be integrated into such networks by applying a line graph transformation. We also propose a method for neighborhood sampling based on a topological neighborhood composed of both local and global neighbors. We compare the performance of learning representations using different types of neighborhood aggregation functions in transductive and inductive tasks and in supervised and unsupervised learning. Furthermore, we propose a novel aggregation approach, Graph Attention Isomorphism Network, GAIN. Our results show that GAIN outperforms state-of-the-art methods on the road type classification problem. △ Less

Submitted 3 June, 2022; v1 submitted 16 July, 2021; originally announced July 2021.

arXiv:2105.11558 [pdf, other]

Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems

Authors: Prateek Jain, Suhas S Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli

Abstract: We consider the setting of vector valued non-linear dynamical systems $X_{t+1} = φ(A^* X_t) + η_t$, where $η_t$ is unbiased noise and $φ: \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies certain {\em expansivity property}. The goal is to learn $A^*$ from a single trajectory $X_1,\cdots,X_T$ of {\em dependent or correlated} samples. While the problem is well-studied in the linear… ▽ More We consider the setting of vector valued non-linear dynamical systems $X_{t+1} = φ(A^* X_t) + η_t$, where $η_t$ is unbiased noise and $φ: \mathbb{R} \to \mathbb{R}$ is a known link function that satisfies certain {\em expansivity property}. The goal is to learn $A^*$ from a single trajectory $X_1,\cdots,X_T$ of {\em dependent or correlated} samples. While the problem is well-studied in the linear case, where $φ$ is identity, with optimal error rates even for non-mixing systems, existing results in the non-linear case hold only for mixing systems. In this work, we improve existing results for learning nonlinear systems in a number of ways: a) we provide the first offline algorithm that can learn non-linear dynamical systems without the mixing assumption, b) we significantly improve upon the sample complexity of existing results for mixing systems, c) in the much harder one-pass, streaming setting we study a SGD with Reverse Experience Replay ($\mathsf{SGD-RER}$) method, and demonstrate that for mixing systems, it achieves the same sample complexity as our offline algorithm, d) we justify the expansivity assumption by showing that for the popular ReLU link function -- a non-expansive but easy to learn link function with i.i.d. samples -- any method would require exponentially many samples (with respect to dimension of $X_t$) from the dynamical system. We validate our results via. simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that for correlated data, specialized methods designed for the dependency structure in data can significantly outperform standard SGD based methods. △ Less

Submitted 1 December, 2021; v1 submitted 24 May, 2021; originally announced May 2021.

arXiv:2103.05896 [pdf, other]

Streaming Linear System Identification with Reverse Experience Replay

Authors: Prateek Jain, Suhas S Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli

Abstract: We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms, which is encountered in several applications including reinforcement learning (RL) and time-series analysis. While the LTI system estimation problem is well-studied in the {\em offline} setting, the practically important streaming/online setting has received littl… ▽ More We consider the problem of estimating a linear time-invariant (LTI) dynamical system from a single trajectory via streaming algorithms, which is encountered in several applications including reinforcement learning (RL) and time-series analysis. While the LTI system estimation problem is well-studied in the {\em offline} setting, the practically important streaming/online setting has received little attention. Standard streaming methods like stochastic gradient descent (SGD) are unlikely to work since streaming points can be highly correlated. In this work, we propose a novel streaming algorithm, SGD with Reverse Experience Replay ($\mathsf{SGD}-\mathsf{RER}$), that is inspired by the experience replay (ER) technique popular in the RL literature. $\mathsf{SGD}-\mathsf{RER}$ divides data into small buffers and runs SGD backwards on the data stored in the individual buffers. We show that this algorithm exactly deconstructs the dependency structure and obtains information theoretically optimal guarantees for both parameter error and prediction error. Thus, we provide the first -- to the best of our knowledge -- optimal SGD-style algorithm for the classical problem of linear system identification with a first order oracle. Furthermore, $\mathsf{SGD}-\mathsf{RER}$ can be applied to more general settings like sparse LTI identification with known sparsity pattern, and non-linear dynamical systems. Our work demonstrates that the knowledge of data dependency structure can aid us in designing statistically and computationally efficient algorithms which can "decorrelate" streaming samples. △ Less

Submitted 1 December, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

arXiv:1907.09448 [pdf, ps, other]

Energy efficient coded random access for the wireless uplink

Authors: Suhas S Kowshik, Kirill Andreev, Alexey Frolov, Yury Polyanskiy

Abstract: We discuss the problem of designing channel access architectures for enabling fast, low-latency, grant-free and uncoordinated uplink for densely packed wireless nodes. Specifically, we study random-access codes, previously introduced for the AWGN multiple-access channel (MAC) by Polyanskiy'2017, in the practically more relevant case of users subject to Rayleigh fading, when channel gains are unkno… ▽ More We discuss the problem of designing channel access architectures for enabling fast, low-latency, grant-free and uncoordinated uplink for densely packed wireless nodes. Specifically, we study random-access codes, previously introduced for the AWGN multiple-access channel (MAC) by Polyanskiy'2017, in the practically more relevant case of users subject to Rayleigh fading, when channel gains are unknown to the decoder. We propose a random coding achievability bound, which we analyze both non-asymptotically (at finite blocklength) and asymptotically. As a candidate practical solution, we propose an explicit sparse-graph based coding scheme together with an alternating belief-propagation decoder. The latter's performance is found to be surprisingly close to the finite-blocklength bounds. Our main findings are twofold. First, just like in the AWGN MAC we see that jointly decoding large number of users leads to a surprising phase transition effect, where at spectral efficiencies below a critical threshold (5-15 bps/Hz depending on reliability) a perfect multi-user interference cancellation is possible. Second, while the presence of Rayleigh fading significantly increases the minimal required energy-per-bit $E_b/N_0$ (from about 0-2 dB to about 8-11 dB), the inherent randomization introduced by the channel makes it much easier to attain the optimal performance via iterative schemes. In all, it is hoped that a principled definition of the random-access model together with our information-theoretic analysis will open the road towards unified benchmarking and comparison performance of various random-access solutions, such as the currently discussed candidates (MUSA, SCMA, RSMA) for the 5G/6G. △ Less

Submitted 22 July, 2019; originally announced July 2019.

Comments: 26 pages

arXiv:1901.06732 [pdf, other]

doi 10.1109/TIT.2021.3091423

Fundamental limits of many-user MAC with finite payloads and fading

Authors: Suhas S Kowshik, Yury Polyanskiy

Abstract: Consider a (multiple-access) wireless communication system where users are connected to a unique base station over a shared-spectrum radio links. Each user has a fixed number $k$ of bits to send to the base station, and his signal gets attenuated by a random channel gain (quasi-static fading). In this paper we consider the many-user asymptotics of Chen-Chen-Guo'2017, where the number of users grow… ▽ More Consider a (multiple-access) wireless communication system where users are connected to a unique base station over a shared-spectrum radio links. Each user has a fixed number $k$ of bits to send to the base station, and his signal gets attenuated by a random channel gain (quasi-static fading). In this paper we consider the many-user asymptotics of Chen-Chen-Guo'2017, where the number of users grows linearly with the blocklength. Differently, though, we adopt a per-user probability of error (PUPE) criterion (as opposed to classical joint-error probability criterion). Under PUPE the finite energy-per-bit communication is possible, and we are able to derive bounds on the tradeoff between energy and spectral efficiencies. We reconfirm the curious behaviour (previously observed for non-fading MAC) of the possibility of almost perfect multi-user interference (MUI) cancellation for user densities below a critical threshold. Further, we demonstrate the suboptimality of standard solutions such as orthogonalization (i.e., TDMA/FDMA) and treating interference as noise (i.e. pseudo-random CDMA without multi-user detection). Notably, the problem treated here can be seen as a variant of support recovery in compressed sensing for the unusual definition of sparsity with one non-zero entry per each contiguous section of $2^k$ coordinates. This identifies our problem with that of the sparse regression codes (SPARCs) and hence our results can be equivalently understood in the context of SPARCs with sections of length $2^{100}$. Finally, we discuss the relation of the almost perfect MUI cancellation property and the replica-method predictions. △ Less

Submitted 14 June, 2021; v1 submitted 20 January, 2019; originally announced January 2019.

Comments: 34 pages, accepted for publication in IEEE Transactions on Information Theory

arXiv:1604.03829 [pdf, other]

Animation and Chirplet-Based Development of a PIR Sensor Array for Intruder Classification in an Outdoor Environment

Authors: Raviteja Upadrashta, Tarun Choubisa, A. Praneeth, Tony G., Aswath V. S., P. Vijay Kumar, Sripad Kowshik, Hari Prasad Gokul R, T. V. Prabhakar

Abstract: This paper presents the development of a passive infra-red sensor tower platform along with a classification algorithm to distinguish between human intrusion, animal intrusion and clutter arising from wind-blown vegetative movement in an outdoor environment. The research was aimed at exploring the potential use of wireless sensor networks as an early-warning system to help mitigate human-wildlife… ▽ More This paper presents the development of a passive infra-red sensor tower platform along with a classification algorithm to distinguish between human intrusion, animal intrusion and clutter arising from wind-blown vegetative movement in an outdoor environment. The research was aimed at exploring the potential use of wireless sensor networks as an early-warning system to help mitigate human-wildlife conflicts occurring at the edge of a forest. There are three important features to the development. Firstly, the sensor platform employs multiple sensors arranged in the form of a two-dimensional array to give it a key spatial-resolution capability that aids in classification. Secondly, given the challenges of collecting data involving animal intrusion, an Animation-based Simulation tool for Passive Infra-Red sEnsor (ASPIRE) was developed that simulates signals corresponding to human and animal intrusion and some limited models of vegetative clutter. This speeded up the process of algorithm development by allowing us to test different hypotheses in a time-efficient manner. Finally, a chirplet-based model for intruder signal was developed that significantly helped boost classification accuracy despite drawing data from a smaller number of sensors. An SVM-based classifier was used which made use of chirplet, energy and signal cross-correlation-based features. The average accuracy obtained for intruder detection and classification on real-world and simulated data sets was in excess of 97%. △ Less

Submitted 13 April, 2016; originally announced April 2016.

Showing 1–8 of 8 results for author: Kowshik, S