Showing 1–40 of 40 results for author: Zahavy, T

Searching in archive cs.

  1. arXiv:2308.12649  [pdf, other]

    cs.LG cs.AI

    APART: Diverse Skill Discovery using All Pairs with Ascending Reward and DropouT

    Authors: Hadar Schreiber Galler, Tom Zahavy, Guillaume Desjardins, Alon Cohen

    Abstract: We study diverse skill discovery in reward-free environments, aiming to discover all possible skills in simple grid-world environments where prior methods have struggled to succeed. This problem is formulated as mutual training of skills using an intrinsic reward and a discriminator trained to predict a skill given its trajectory. Our initial solution replaces the standard one-vs-all (softmax) dis…

    Submitted 24 August, 2023; originally announced August 2023.

  2. arXiv:2308.09175  [pdf, other]

    cs.AI cs.LG

    Diversifying AI: Towards Creative Chess with AlphaZero

    Authors: Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh

    Abstract: In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In par…

    Submitted 31 July, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

  3. arXiv:2306.10587  [pdf, other]

    cs.LG cs.AI stat.ML

    Acceleration in Policy Optimization

    Authors: Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

    Abstract: We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates. Leveraging the connection between policy iteration and policy gradient methods, we view policy optimization algorithms as iteratively solving a sequence of surrogate objectives, local lower bound…

    Submitted 5 September, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  4. arXiv:2304.03995  [pdf, other]

    cs.NE cs.LG

    Discovering Attention-Based Genetic Algorithms via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Chris Lu, Tom Zahavy, Valentin Dalibard, Sebastian Flennerhag

    Abstract: Genetic algorithms constitute a family of black-box optimization algorithms, which take inspiration from the principles of biological evolution. While they provide a general-purpose tool for optimization, their particular instantiations can be heuristic and motivated by loose biological intuition. In this work we explore a fundamentally different approach: Given a sufficiently flexible parametriza…

    Submitted 8 April, 2023; originally announced April 2023.

    Comments: 14 pages, 31 figures

  5. arXiv:2302.01275  [pdf, other]

    cs.LG

    ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs

    Authors: Ted Moskovitz, Brendan O'Donoghue, Vivek Veeriah, Sebastian Flennerhag, Satinder Singh, Tom Zahavy

    Abstract: In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require putting constraints on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate…

    Submitted 5 March, 2023; v1 submitted 2 February, 2023; originally announced February 2023.
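    The last-iterate issue described above already shows up on the textbook bilinear game min_x max_y f(x, y) = xy, whose only saddle point is (0, 0). The sketch below is a toy illustration, not the ReLOAD algorithm: it compares plain simultaneous gradient descent-ascent (GDA) with optimistic GDA (OGDA), which steps along twice the current gradient minus the previous one. The starting point, step size and iteration count are arbitrary choices.

```python
import math


def grads(x: float, y: float) -> tuple[float, float]:
    """Gradients of f(x, y) = x * y: df/dx = y, df/dy = x."""
    return y, x


def gda(x: float, y: float, lr: float, steps: int) -> tuple[float, float]:
    """Simultaneous gradient descent in x and ascent in y."""
    for _ in range(steps):
        gx, gy = grads(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y


def ogda(x: float, y: float, lr: float, steps: int) -> tuple[float, float]:
    """Optimistic GDA: use 2 * current gradient - previous gradient."""
    prev_gx, prev_gy = grads(x, y)  # so the first step is plain GDA
    for _ in range(steps):
        gx, gy = grads(x, y)
        x = x - lr * (2 * gx - prev_gx)
        y = y + lr * (2 * gy - prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y


x0, y0 = 1.0, 1.0
gda_norm = math.hypot(*gda(x0, y0, lr=0.1, steps=2000))
ogda_norm = math.hypot(*ogda(x0, y0, lr=0.1, steps=2000))
print(f"GDA distance from saddle:  {gda_norm:.3e}")   # grows
print(f"OGDA distance from saddle: {ogda_norm:.3e}")  # shrinks toward 0
```

    With these settings plain GDA spirals away from the saddle point (each step scales the distance to it by sqrt(1 + lr²)), while the optimistic correction makes the last iterate itself converge.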

  6. arXiv:2301.03236  [pdf, other]

    cs.LG cs.AI math.OC

    Optimistic Meta-Gradients

    Authors: Sebastian Flennerhag, Tom Zahavy, Brendan O'Donoghue, Hado van Hasselt, András György, Satinder Singh

    Abstract: We study the connection between gradient-based meta-learning and convex optimisation. We observe that gradient descent with momentum is a special case of meta-gradients, and building on recent results in optimisation, we prove convergence rates for meta-learning in the single task setting. While a meta-learned update rule can yield faster convergence up to a constant factor, it is not sufficient fo…

    Submitted 9 January, 2023; originally announced January 2023.

  7. arXiv:2212.14530  [pdf, other]

    cs.AI cs.LG

    POMRL: No-Regret Learning-to-Plan with Increasing Horizons

    Authors: Khimya Khetarpal, Claire Vernade, Brendan O'Donoghue, Satinder Singh, Tom Zahavy

    Abstract: We study the problem of planning under model uncertainty in an online meta-reinforcement learning (RL) setting where an agent is presented with a sequence of related tasks with limited interactions per task. The agent can use its experience in each task and across tasks to estimate both the transition model and the distribution over tasks. We propose an algorithm to meta-learn the underlying struc…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 24 pages, 6 figures

  8. arXiv:2211.11260  [pdf, other]

    cs.NE cs.AI

    Discovering Evolution Strategies via Meta-Black-Box Optimization

    Authors: Robert Tjarko Lange, Tom Schaul, Yutian Chen, Tom Zahavy, Valentin Dalibard, Chris Lu, Satinder Singh, Sebastian Flennerhag

    Abstract: Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often heuristic and inflexible, exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search str…

    Submitted 2 March, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 25 pages, 21 figures

    Journal ref: 11th International Conference on Learning Representations, ICLR 2023
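    For context, a hand-designed evolution strategy of the kind whose update rule the abstract proposes to meta-learn looks like the sketch below: sample Gaussian perturbations of a search mean, evaluate their fitness, and move the mean along the fitness-weighted perturbations. This is a plain baseline ES (z-scored fitness, fixed σ and learning rate) on a toy sphere function, not the learned update rule from the paper; the population size, σ, learning rate and seed are arbitrary.

```python
import numpy as np


def sphere(x):
    """Toy objective to minimize: f(x) = ||x||^2, batched over the last axis."""
    return np.sum(x ** 2, axis=-1)


rng = np.random.default_rng(0)
dim, pop_size, sigma, lr = 5, 50, 0.1, 0.01
mean = np.ones(dim)
initial_loss = sphere(mean)

for _ in range(300):
    eps = rng.normal(size=(pop_size, dim))            # Gaussian perturbations
    fitness = sphere(mean + sigma * eps)              # lower is better
    z = (fitness - fitness.mean()) / (fitness.std() + 1e-8)  # z-score fitness
    grad_est = eps.T @ z / (pop_size * sigma)         # search-gradient estimate
    mean = mean - lr * grad_est                       # descend, since we minimize

print(f"loss: {initial_loss:.3f} -> {sphere(mean):.4f}")
```

    Every choice inside the loop (fitness normalization, how perturbations are weighted, the fixed σ) is the kind of hand-designed rule that a meta-learned update would replace.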

  9. arXiv:2210.10913  [pdf, other]

    cs.LG cs.AI

    Palm up: Playing in the Latent Manifold for Unsupervised Pretraining

    Authors: Hao Liu, Tom Zahavy, Volodymyr Mnih, Satinder Singh

    Abstract: Large and diverse datasets have been the cornerstones of many impressive advancements in artificial intelligence. Intelligent creatures, however, learn by interacting with the environment, which changes the input sensory signals and the state of the environment. In this work, we aim to bring the best of both worlds and propose an algorithm that exhibits an exploratory behavior whilst it utilizes l…

    Submitted 21 October, 2022; v1 submitted 19 October, 2022; originally announced October 2022.

    Comments: Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  10. arXiv:2209.06159  [pdf, other]

    cs.LG

    Meta-Gradients in Non-Stationary Environments

    Authors: Jelena Luketina, Sebastian Flennerhag, Yannick Schroecker, David Abel, Tom Zahavy, Satinder Singh

    Abstract: Meta-gradient methods (Xu et al., 2018; Zahavy et al., 2020) offer a promising solution to the problem of hyperparameter selection and adaptation in non-stationary reinforcement learning problems. However, the properties of meta-gradients in such environments have not been systematically studied. In this work, we bring new clarity to meta-gradients in non-stationary environments. Concretely, we as…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: 16 pages, 9 figures, CoLLAs 2022

  11. arXiv:2205.13521  [pdf, other]

    cs.AI cs.LG

    Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality

    Authors: Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose DOMiNO, a method for Diversity Optimization Maintaining Near Optimality. We formalize the problem as a Constrained Markov Dec…

    Submitted 3 February, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

  12. arXiv:2109.04504  [pdf, other]

    cs.LG cs.AI stat.ML

    Bootstrapped Meta-Learning

    Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Meta-learning empowers artificial intelligence to increase its efficiency by learning how to learn. Unlocking this potential involves overcoming a challenging meta-optimisation problem. We propose an algorithm that tackles this problem by letting the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance…

    Submitted 16 March, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Published at ICLR 2022. 37 pages, 19 figures, 9 tables

  13. arXiv:2106.11779  [pdf, other]

    cs.LG stat.ML

    Emphatic Algorithms for Deep Reinforcement Learning

    Authors: Ray Jiang, Tom Zahavy, Zhongwen Xu, Adam White, Matteo Hessel, Charles Blundell, Hado van Hasselt

    Abstract: Off-policy learning allows us to learn about possible policies of behavior from experience generated by a different behavior policy. Temporal difference (TD) learning algorithms can become unstable when combined with function approximation and off-policy sampling; this is known as the "deadly triad". The emphatic temporal difference (ETD(λ)) algorithm ensures convergence in the linear case by app…

    Submitted 21 June, 2021; originally announced June 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021
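    The abstract refers to the linear ETD(λ) algorithm. For reference, its update (Sutton, Mahmood and White, 2016) keeps a follow-on trace F and an emphasis M alongside the usual eligibility trace e. The sketch below performs one such update with constant γ, λ and interest i(s) = 1; the function name, its argument layout and the constant interest are assumptions for illustration, and none of the deep-RL adaptations studied in the paper are shown.

```python
import numpy as np


def etd_step(theta, e, F, rho_prev, rho, phi, phi_next, reward,
             alpha=0.1, gamma=0.9, lam=0.5, interest=1.0):
    """One linear ETD(lambda) update with constant gamma, lambda and interest.

    rho_prev and rho are importance ratios pi(a|s) / mu(a|s) for the previous
    and current action; phi and phi_next are state feature vectors.
    Returns the updated (theta, e, F).
    """
    F = rho_prev * gamma * F + interest                   # follow-on trace
    M = lam * interest + (1.0 - lam) * F                  # emphasis
    e = rho * (gamma * lam * e + M * phi)                 # emphatic eligibility trace
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    theta = theta + alpha * delta * e
    return theta, e, F


# First step from zero weights and traces: F = 1, M = 1, e = phi, delta = reward.
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta, e, F = etd_step(np.zeros(2), np.zeros(2), F=0.0, rho_prev=1.0, rho=1.0,
                       phi=phi, phi_next=phi_next, reward=1.0)
print(theta)  # alpha * delta * e = 0.1 * 1.0 * phi
```

    Compared with ordinary off-policy TD(λ), the only change is that the trace is scaled by the emphasis M, which reweights updates toward the distribution under which linear TD is stable.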

  14. arXiv:2106.00669  [pdf, other]

    cs.AI cs.LG stat.ML

    Discovering Diverse Nearly Optimal Policies with Successor Features

    Authors: Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih, Sebastian Flennerhag, Satinder Singh

    Abstract: Finding different solutions to the same problem is a key aspect of intelligence associated with creativity and adaptation to novel situations. In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness. We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features, while ass…

    Submitted 4 January, 2022; v1 submitted 1 June, 2021; originally announced June 2021.

  15. arXiv:2106.00661  [pdf, other]

    cs.AI cs.LG stat.ML

    Reward is enough for convex MDPs

    Authors: Tom Zahavy, Brendan O'Donoghue, Guillaume Desjardins, Satinder Singh

    Abstract: Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that t…

    Submitted 2 June, 2023; v1 submitted 1 June, 2021; originally announced June 2021.
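    To make "a convex function of the stationary distribution" concrete, the sketch below computes the normalized discounted state occupancy d = (1 − γ) μ₀ᵀ (I − γ P_π)⁻¹ of a fixed policy in a small, made-up 3-state chain, then evaluates two convex objectives on it: a linear one, ⟨r, d⟩, which corresponds to an ordinary expected reward, and the negative entropy Σ d log d, which is convex but not linear in d. Using a discounted state occupancy rather than an average-reward stationary distribution, as well as the transition matrix and both objectives, are illustrative assumptions.

```python
import numpy as np

# Hypothetical 3-state Markov chain induced by some fixed policy pi:
# P_pi[s, s_next] = probability of moving from s to s_next under pi.
P_pi = np.array([[0.5, 0.5, 0.0],
                 [0.0, 0.5, 0.5],
                 [0.5, 0.0, 0.5]])
mu0 = np.array([1.0, 0.0, 0.0])  # always start in state 0
gamma = 0.9

# Row vector d^T = (1 - gamma) mu0^T (I - gamma P)^-1, i.e. solve the transpose system.
d = (1 - gamma) * np.linalg.solve((np.eye(3) - gamma * P_pi).T, mu0)

r = np.array([1.0, 0.0, 0.0])
linear_objective = r @ d             # ordinary (normalized) expected reward
neg_entropy = np.sum(d * np.log(d))  # convex in d, but not linear

print("occupancy:", d, "sum =", d.sum())
print("linear:", linear_objective, "negative entropy:", neg_entropy)
```

    Because P_π is row-stochastic, d always sums to 1, so it can be read as a distribution over states and plugged into any convex function of that distribution.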

  16. arXiv:2102.06924  [pdf, other]

    cs.LG stat.ML

    Online Apprenticeship Learning

    Authors: Lior Shani, Tom Zahavy, Shie Mannor

    Abstract: In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the…

    Submitted 29 December, 2021; v1 submitted 13 February, 2021; originally announced February 2021.

    Comments: AAAI 2022

  17. arXiv:2102.06741  [pdf, other]

    cs.LG cs.AI

    Discovery of Options via Meta-Learned Subgoals

    Authors: Vivek Veeriah, Tom Zahavy, Matteo Hessel, Zhongwen Xu, Junhyuk Oh, Iurii Kemaev, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster. However, despite prior work on this topic, the problem of discovering options through interaction with an environment remains a challenge. In this paper, we introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments. Our approach is based…

    Submitted 12 February, 2021; originally announced February 2021.

  18. arXiv:2102.04323  [pdf, other]

    cs.AI cs.LG

    Discovering a set of policies for the worst case reward

    Authors: Tom Zahavy, Andre Barreto, Daniel J Mankowitz, Shaobo Hou, Brendan O'Donoghue, Iurii Kemaev, Satinder Singh

    Abstract: We study the problem of how to construct a set of policies that can be composed together to solve a collection of reinforcement learning tasks. Each task is a different reward function defined as a linear combination of known features. We consider a specific class of policy compositions which we call set improving policies (SIPs): given a set of policies and a set of tasks, a SIP is any compositio…

    Submitted 10 December, 2021; v1 submitted 8 February, 2021; originally announced February 2021.
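    Because each task's reward is linear in known features, r_w = φ·w, the value of a policy on any task follows from its successor features ψ (expected discounted sums of φ) as ψ·w. The sketch below uses made-up successor features of three policies at one start state to tabulate every policy's value on a small finite set of tasks, the best value in the set for each task (which generalized policy improvement, a standard composition from the successor-features literature, is guaranteed to match or exceed), and the worst case of that quantity over the tasks. It is not the paper's SIP construction, and all arrays are hypothetical.

```python
import numpy as np

# Hypothetical successor features psi_i = E[sum_t gamma^t phi_t] of three
# policies from a fixed start state, with 2-dimensional features phi.
psi = np.array([[5.0, 0.0],   # policy 0 mostly collects feature 0
                [0.0, 5.0],   # policy 1 mostly collects feature 1
                [2.0, 2.0]])  # policy 2 collects a bit of both

# A small, finite set of tasks; each row w defines the reward r_w = phi . w.
tasks = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, -1.0],
                  [-1.0, 1.0],
                  [0.5, 0.5]])

values = psi @ tasks.T             # values[i, k] = value of policy i on task k
best_in_set = values.max(axis=0)   # best single policy in the set, per task
worst_case = best_in_set.min()     # the task on which the set does worst
print(values)
print("best per task:", best_in_set, "worst case:", worst_case)
```

    Here the set covers the first four tasks well, and the worst case (2.5) comes from the task that rewards both features equally, which is the kind of gap a worst-case-driven choice of new policy would target.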

  19. arXiv:2102.03799  [pdf, other]

    cs.LG cs.AI

    Online Limited Memory Neural-Linear Bandits with Likelihood Matching

    Authors: Ofir Nabati, Tom Zahavy, Shie Mannor

    Abstract: We study neural-linear bandits for solving problems where both exploration and representation learning play an important role. Neural-linear bandits harness the representation power of Deep Neural Networks (DNNs) and combine it with efficient exploration mechanisms by leveraging uncertainty estimation of the model, designed for linear contextual bandits on top of the last hidden layer. In…

    Submitted 8 June, 2021; v1 submitted 7 February, 2021; originally announced February 2021.

    Comments: ICML 2021. arXiv admin note: text overlap with arXiv:1901.08612
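    In the neural-linear model described above, exploration comes from a linear contextual bandit over the network's last-hidden-layer features. The sketch below shows only that linear part: per-arm Bayesian linear regression with Thompson sampling, where a fixed random ReLU map stands in for the trained network (an assumption; the likelihood-matching and limited-memory components of the paper are not shown). The prior and noise variances, dimensions and simulated rewards are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, ctx_dim, feat_dim = 3, 5, 8
noise_var, prior_var = 0.01, 1.0

# Stand-in for a trained network's last hidden layer: a fixed random ReLU map.
W_hidden = rng.normal(size=(feat_dim, ctx_dim))


def features(x):
    return np.maximum(W_hidden @ x, 0.0)


# Hypothetical environment: arm a pays features(x) . true_w[a] + Gaussian noise.
true_w = rng.normal(size=(n_arms, feat_dim))

# Per-arm Gaussian posterior over last-layer weights, stored as a precision
# matrix A[a] and a vector b[a]; the posterior mean is A[a]^-1 b[a].
A = np.stack([np.eye(feat_dim) / prior_var for _ in range(n_arms)])
b = np.zeros((n_arms, feat_dim))
pulls = np.zeros(n_arms, dtype=int)

for _ in range(500):
    phi = features(rng.normal(size=ctx_dim))
    # Thompson sampling: draw one weight vector per arm from its posterior.
    sampled = []
    for a in range(n_arms):
        cov = np.linalg.inv(A[a])
        cov = (cov + cov.T) / 2  # symmetrize against round-off
        sampled.append(rng.multivariate_normal(np.linalg.solve(A[a], b[a]), cov))
    arm = int(np.argmax([phi @ w for w in sampled]))
    reward = phi @ true_w[arm] + rng.normal(scale=np.sqrt(noise_var))
    # Conjugate Bayesian linear-regression update for the chosen arm only.
    A[arm] += np.outer(phi, phi) / noise_var
    b[arm] += phi * reward / noise_var
    pulls[arm] += 1

posterior_means = np.stack([np.linalg.solve(A[a], b[a]) for a in range(n_arms)])
print("pulls per arm:", pulls)
```

    In a full neural-linear agent the feature map is retrained periodically, which changes φ and invalidates the stored statistics; handling that drift is the problem the abstract's likelihood-matching approach addresses.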

  20. arXiv:2010.06324  [pdf, other]

    cs.LG cs.AI stat.ML

    Balancing Constraints and Rewards with Meta-Gradient D4PG

    Authors: Dan A. Calian, Daniel J. Mankowitz, Tom Zahavy, Zhongwen Xu, Junhyuk Oh, Nir Levine, Timothy Mann

    Abstract: Deploying Reinforcement Learning (RL) agents to solve real-world applications often requires satisfying complex system constraints. Often the constraint thresholds are incorrectly set due to the complex nature of a system or the inability to verify the thresholds offline (e.g., no simulator or reasonable offline evaluation procedure exists). This results in solutions where a task cannot be solved w…

    Submitted 27 November, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

  21. arXiv:2004.00994  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning to Ask Medical Questions using Reinforcement Learning

    Authors: Uri Shaham, Tom Zahavy, Cesar Caraballo, Shiwani Mahajan, Daisy Massey, Harlan Krumholz

    Abstract: We propose a novel reinforcement learning-based approach for adaptive and iterative feature selection. Given a masked vector of input features, a reinforcement learning agent iteratively selects certain features to be unmasked, and uses them to predict an outcome when it is sufficiently confident. The algorithm makes use of a novel environment setting, corresponding to a non-stationary Markov Deci…

    Submitted 25 May, 2020; v1 submitted 31 March, 2020; originally announced April 2020.

  22. arXiv:2002.12928  [pdf, other]

    stat.ML cs.LG

    A Self-Tuning Actor-Critic Algorithm

    Authors: Tom Zahavy, Zhongwen Xu, Vivek Veeriah, Matteo Hessel, Junhyuk Oh, Hado van Hasselt, David Silver, Satinder Singh

    Abstract: Reinforcement learning algorithms are highly sensitive to the choice of hyperparameters, typically requiring significant manual effort to identify hyperparameters that perform well on a new domain. In this paper, we take a step towards addressing this issue by using meta-gradients to automatically adapt hyperparameters online by meta-gradient descent (Xu et al., 2018). We apply our algorithm, Self-…

    Submitted 14 April, 2021; v1 submitted 28 February, 2020; originally announced February 2020.
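    The core idea of adapting a hyperparameter online by meta-gradient descent fits in a few lines on a one-dimensional toy problem. The sketch below is not STAC: it adapts only the learning rate η of gradient descent on L(θ) = ½θ², using the one-step meta-gradient ∂L(θ′)/∂η = −∇L(θ′)·∇L(θ) with θ′ = θ − η∇L(θ) (a rule also known as hypergradient descent). The toy loss, the initial η and the meta step size β are arbitrary choices.

```python
def grad(theta: float) -> float:
    """Gradient of the toy loss L(theta) = 0.5 * theta**2."""
    return theta


theta, eta, beta = 5.0, 0.01, 0.01  # parameter, learning rate, meta step size
for _ in range(100):
    g = grad(theta)
    theta_new = theta - eta * g       # inner update with the current eta
    meta_grad = -grad(theta_new) * g  # d L(theta_new) / d eta
    eta = eta - beta * meta_grad      # meta-gradient step on eta
    theta = theta_new

print(f"theta = {theta:.2e}, adapted eta = {eta:.3f}")
```

    Starting from a deliberately small η, the meta-gradient is negative while consecutive gradients point the same way, so η grows until progress on θ is fast; once θ is near the optimum the meta-gradient vanishes and η stops changing.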

  23. arXiv:1911.10326  [pdf]

    physics.optics cs.LG

    Deep learning reconstruction of ultrashort pulses from 2D spatial intensity patterns recorded by an all-in-line system in a single-shot

    Authors: Ron Ziv, Alex Dikopoltsev, Tom Zahavy, Ittai Rubinstein, Pavel Sidorenko, Oren Cohen, Mordechai Segev

    Abstract: We propose a simple all-in-line single-shot scheme for diagnostics of ultrashort laser pulses, consisting of a multi-mode fiber, a nonlinear crystal and a CCD camera. The system records a 2D spatial intensity pattern, from which the pulse shape (amplitude and phase) is recovered by a fast Deep Learning algorithm. We explore this scheme in simulations and demonstrate the recovery of ultrasho…

    Submitted 23 November, 2019; originally announced November 2019.

  24. arXiv:1911.01679  [pdf, other]

    cs.LG stat.ML

    Apprenticeship Learning via Frank-Wolfe

    Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

    Abstract: We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In this setting, we are given a Markov Decision Process (MDP) without an explicit reward function. Instead, we observe an expert that acts according to some policy, and the goal is to find a policy whose feature expectations are closest to those of the expert policy. We formulate this problem as findin…

    Submitted 20 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.
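    Finding the achievable feature expectations closest to the expert's is a projection onto a convex set, which Frank-Wolfe solves using only a linear minimization oracle. In the sketch below, a hypothetical finite list of feature-expectation vectors stands in for the vertices (deterministic policies) of that set, so the oracle is a simple argmin; in apprenticeship learning the oracle would instead be a planner that solves the MDP with reward w = μ_E − μ. The 2/(k + 2) step size is the classic choice, and all data are made up.

```python
import numpy as np

# Hypothetical feature expectations of four deterministic policies
# (the vertices of the feasible set) and of the expert.
vertices = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0],
                     [1.0, 1.0]])
mu_expert = np.array([0.3, 0.6])  # lies inside the convex hull


def frank_wolfe(vertices, target, iters=1000):
    """Minimize 0.5 * ||mu - target||^2 over the convex hull of `vertices`."""
    mu = vertices[0].copy()           # start from an arbitrary vertex
    for k in range(iters):
        grad = mu - target            # gradient of the objective at mu
        # Linear minimization oracle: the vertex minimizing <grad, v>
        # (in AL, a best response to the reward w = target - mu).
        v = vertices[np.argmin(vertices @ grad)]
        step = 2.0 / (k + 2.0)        # classic Frank-Wolfe step size
        mu = (1 - step) * mu + step * v
    return mu


mu = frank_wolfe(vertices, mu_expert)
print(mu, np.linalg.norm(mu - mu_expert))
```

    Each iterate is a convex combination of vertices, so the result can be read as a mixture of the corresponding policies, which is how a Frank-Wolfe solution is turned back into a (stochastic) policy.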

  25. arXiv:1905.09710  [pdf, other]

    cs.LG stat.ML

    Inverse Reinforcement Learning in Contextual MDPs

    Authors: Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

    Abstract: We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the re…

    Submitted 30 December, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

  26. arXiv:1905.09704  [pdf, other]

    cs.LG stat.ML

    Unknown mixing times in apprenticeship and reinforcement learning

    Authors: Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

    Abstract: We derive and analyze learning algorithms for apprenticeship learning, policy evaluation, and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on thei…

    Submitted 20 June, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

  27. arXiv:1905.09700  [pdf, other]

    cs.LG

    Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces

    Authors: Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor

    Abstract: We propose a computationally efficient algorithm that combines compressed sensing with imitation learning to solve text-based games with combinatorial action spaces. Specifically, we introduce a new compressed sensing algorithm, named IK-OMP, which can be seen as an extension to the Orthogonal Matching Pursuit (OMP). We incorporate IK-OMP into a supervised imitation learning setting and show that…

    Submitted 9 February, 2020; v1 submitted 23 May, 2019; originally announced May 2019.

    Comments: Under review at IJCAI 2020

  28. arXiv:1902.10140  [pdf, other]

    cs.LG cs.AI cs.DS

    Planning in Hierarchical Reinforcement Learning: Guarantees for Using Local Policies

    Authors: Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour

    Abstract: We consider a setting of hierarchical reinforcement learning, in which the reward is a sum of components. For each component we are given a policy that maximizes it and our goal is to assemble a policy from the individual policies that maximizes the sum of the components. We provide theoretical guarantees for assembling such policies in deterministic MDPs with collectible rewards. Our approach bu…

    Submitted 3 January, 2020; v1 submitted 26 February, 2019; originally announced February 2019.

    Comments: Extends previous paper (arXiv:1803.04674) by the same authors

  29. arXiv:1901.08612  [pdf, other]

    cs.LG cs.AI stat.ML

    Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

    Authors: Tom Zahavy, Shie Mannor

    Abstract: We study the neural-linear bandit model for solving sequential decision-making problems with high dimensional side information. Neural-linear bandits leverage the representation power of deep neural networks and combine it with efficient exploration mechanisms, designed for linear contextual bandits, on top of the last hidden layer. Since the representation is being optimized during learning, info…

    Submitted 11 August, 2019; v1 submitted 24 January, 2019; originally announced January 2019.

  30. arXiv:1809.02121  [pdf, other]

    cs.LG stat.ML

    Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning

    Authors: Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor

    Abstract: Learning how to act when there are many available actions in each state is a challenging task for Reinforcement Learning (RL) agents, especially when many of the actions are redundant or irrelevant. In such cases, it is sometimes easier to learn which actions not to take. In this work, we propose the Action-Elimination Deep Q-Network (AE-DQN) architecture that combines a Deep RL algorithm with an…

    Submitted 24 February, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Journal ref: Advances in Neural Information Processing Systems (pp. 3566-3577). 2018
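    The idea of learning which actions not to take can be illustrated by masking eliminated actions before the greedy step. In the minimal sketch below, a hypothetical per-action elimination probability (for example, a predicted probability that the action is invalid) and a threshold decide which actions are removed before taking the argmax over Q-values; how AE-DQN actually learns and applies its elimination signal is not reproduced here, and all numbers are made up.

```python
import numpy as np


def greedy_with_elimination(q_values, elim_prob, threshold=0.5):
    """Argmax of Q over actions whose elimination probability is below threshold.

    Falls back to a plain argmax if every action would be eliminated.
    """
    allowed = elim_prob < threshold
    if not allowed.any():
        return int(np.argmax(q_values))
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))


q = np.array([2.0, 5.0, 1.0, 3.0])       # hypothetical Q-values
p_elim = np.array([0.1, 0.9, 0.2, 0.3])  # action 1 looks invalid
print(greedy_with_elimination(q, p_elim))  # action 3 (Q = 3.0) is chosen
```

    Masking also shrinks the set of actions the agent has to explore, which is where most of the benefit lies when many actions are redundant or irrelevant.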

  31. arXiv:1803.06024  [pdf, other]

    physics.optics cs.AI cs.LG stat.ML

    Deep Learning Reconstruction of Ultra-Short Pulses

    Authors: Tom Zahavy, Alex Dikopoltsev, Oren Cohen, Shie Mannor, Mordechai Segev

    Abstract: Ultra-short laser pulses with femtosecond to attosecond pulse duration are the shortest systematic events humans can create. Characterization (amplitude and phase) of these pulses is a key ingredient in ultrafast science, e.g., exploring chemical reactions and electronic phase transitions. Here, we propose and demonstrate, numerically and experimentally, the first deep neural network technique to…

    Submitted 15 March, 2018; originally announced March 2018.

  32. arXiv:1803.04674  [pdf, other]

    cs.LG cs.AI stat.ML

    Hierarchical Reinforcement Learning: Approximating Optimal Discounted TSP Using Local Policies

    Authors: Tom Zahavy, Avinatan Hasidim, Haim Kaplan, Yishay Mansour

    Abstract: In this work, we provide theoretical guarantees for reward decomposition in deterministic MDPs. Reward decomposition is a special case of Hierarchical Reinforcement Learning, which allows one to learn many policies in parallel and combine them into a composite solution. Our approach builds on mapping this problem into a Reward Discounted Traveling Salesman Problem, and then deriving approximate sol…

    Submitted 13 March, 2018; originally announced March 2018.
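    One natural reading of the reward-discounted TSP objective mentioned above is the value Σ_k γ^{t_k} r_k of visiting collectible rewards in a given order, where t_k is the time at which the k-th reward is reached. The sketch below evaluates that objective on a made-up grid with Manhattan travel times and compares a simple greedy rule (repeatedly go to the remaining reward with the largest γ^distance · r) against brute force over all orders. The greedy rule is only a baseline for illustration, not the approximation algorithm derived in the paper.

```python
import itertools


def travel_time(a, b):
    """Manhattan distance on a grid, used as the travel time between points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tour_value(start, order, rewards, gamma):
    """Discounted value sum_k gamma**t_k * r_k of visiting rewards in `order`."""
    t, pos, value = 0, start, 0.0
    for idx in order:
        loc, r = rewards[idx]
        t += travel_time(pos, loc)
        value += gamma ** t * r
        pos = loc
    return value


def greedy_order(start, rewards, gamma):
    """Repeatedly move to the remaining reward with the largest gamma**d * r."""
    remaining, pos, order = set(range(len(rewards))), start, []
    while remaining:
        nxt = max(remaining,
                  key=lambda i: gamma ** travel_time(pos, rewards[i][0]) * rewards[i][1])
        order.append(nxt)
        remaining.remove(nxt)
        pos = rewards[nxt][0]
    return order


# Hypothetical collectible rewards: (grid location, reward value).
rewards = [((0, 3), 1.0), ((4, 0), 2.0), ((5, 1), 1.0), ((1, 1), 0.5)]
start, gamma = (0, 0), 0.9

greedy = greedy_order(start, rewards, gamma)
best = max(itertools.permutations(range(len(rewards))),
           key=lambda order: tour_value(start, order, rewards, gamma))
print("greedy:", greedy, tour_value(start, greedy, rewards, gamma))
print("optimal:", list(best), tour_value(start, best, rewards, gamma))
```

    Brute force is only feasible for a handful of rewards, since the number of orders grows factorially; that is why approximate solutions of the kind the abstract describes matter.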

  33. arXiv:1802.05846  [pdf, other]

    stat.ML cs.LG

    Train on Validation: Squeezing the Data Lemon

    Authors: Guy Tennenholtz, Tom Zahavy, Shie Mannor

    Abstract: Model selection on validation data is an essential step in machine learning. While the mixing of data between training and validation is considered taboo, practitioners often violate it to increase performance. Here, we offer a simple, practical method for using the validation set for training, which allows for a continuous, controlled trade-off between performance and overfitting of model selecti…

    Submitted 16 February, 2018; originally announced February 2018.

  34. arXiv:1705.07461  [pdf, other]

    cs.AI cs.LG stat.ML

    Shallow Updates for Deep Reinforcement Learning

    Authors: Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

    Abstract: Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the ot…

    Submitted 2 November, 2017; v1 submitted 21 May, 2017; originally announced May 2017.

  35. arXiv:1611.09534  [pdf, other]

    cs.CV cs.CL

    Is a picture worth a thousand words? A Deep Multi-Modal Fusion Architecture for Product Classification in e-commerce

    Authors: Tom Zahavy, Alessandro Magnani, Abhinandan Krishnan, Shie Mannor

    Abstract: Classifying products into categories precisely and efficiently is a major challenge in modern e-commerce. The high traffic of new products uploaded daily and the dynamic nature of the categories raise the need for machine learning models that can reduce the cost and time of human editors. In this paper, we propose a decision level fusion approach for multi-modal product classification using text a…

    Submitted 29 November, 2016; originally announced November 2016.

  36. arXiv:1606.07112  [pdf, other]

    stat.ML cs.LG

    Visualizing Dynamics: from t-SNE to SEMI-MDPs

    Authors: Nir Ben Zrihem, Tom Zahavy, Shie Mannor

    Abstract: Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in many challenging problems such as playing Atari, solving Go and controlling robots. While DRL agents perform well in practice we are still missing the tools to analyze their performance and visualize the temporal abstractions that they learn. In this paper, we present a novel method that automatically disc…

    Submitted 22 June, 2016; originally announced June 2016.

    Comments: Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

  37. arXiv:1606.05174  [pdf, other]

    cs.AI

    Deep Reinforcement Learning Discovers Internal Models

    Authors: Nir Baram, Tom Zahavy, Shie Mannor

    Abstract: Deep Reinforcement Learning (DRL) is a trending field of research, showing great promise in challenging problems such as playing Atari, solving Go and controlling robots. While DRL agents perform well in practice we are still lacking the tools to analyze their performance. In this work we present the Semi-Aggregated MDP (SAMDP) model, a model best suited to describe policies exhibiting both spati…

    Submitted 16 June, 2016; originally announced June 2016.

  38. arXiv:1604.07255  [pdf, other]

    cs.AI cs.LG

    A Deep Hierarchical Approach to Lifelong Learning in Minecraft

    Authors: Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

    Abstract: We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as D…

    Submitted 30 November, 2016; v1 submitted 25 April, 2016; originally announced April 2016.

  39. arXiv:1602.02658  [pdf, other]

    cs.LG cs.AI cs.NE

    Graying the black box: Understanding DQNs

    Authors: Tom Zahavy, Nir Ben Zrihem, Shie Mannor

    Abstract: In recent years there has been growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind manner. Moreover, we propose a new model, the Semi Aggregated Markov Decision Process (SAMDP), and an algorithm that learns it automatically. The SAMDP model allows us to identify spatio-temporal abs…

    Submitted 24 April, 2017; v1 submitted 8 February, 2016; originally announced February 2016.

  40. arXiv:1602.02389  [pdf, other]

    cs.LG cs.CV stat.ML

    Ensemble Robustness and Generalization of Stochastic Deep Learning Algorithms

    Authors: Tom Zahavy, Bingyi Kang, Alex Sivak, Jiashi Feng, Huan Xu, Shie Mannor

    Abstract: The question of why deep learning algorithms generalize so well has attracted increasing research interest. However, most of the well-established approaches, such as hypothesis capacity, stability or sparseness, have not provided complete explanations (Zhang et al., 2016; Kawaguchi et al., 2017). In this work, we focus on the robustness approach (Xu & Mannor, 2012), i.e., if the error of a hypothesis…

    Submitted 5 November, 2017; v1 submitted 7 February, 2016; originally announced February 2016.

    Comments: 16 pages, 2 figures