Search | arXiv e-print repository

Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning

Authors: Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux

Abstract: Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tR… ▽ More Deep reinforcement learning agents are prone to goal misalignments. The black-box nature of their policies hinders the detection and correction of such misalignments, and the trust necessary for real-world deployment. So far, solutions learning interpretable policies are inefficient or require many human priors. We propose INTERPRETER, a fast distillation method producing INTerpretable Editable tRee Programs for ReinforcEmenT lEaRning. We empirically demonstrate that INTERPRETER compact tree programs match oracles across a diverse set of sequential decision tasks and evaluate the impact of our design choices on interpretability and performances. We show that our policies can be interpreted and edited to correct misalignments on Atari games and to explain real farming strategies. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.10906 [pdf, other]

Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop

Authors: Hector Kohler, Quentin Delfosse, Paul Festor, Philippe Preux

Abstract: Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: what distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in po… ▽ More Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: what distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies, without user studies? What reinforcement learning paradigms,are the most suited to develop interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivate an Interpretable RL community centered around the aforementioned questions, we propose the first venue dedicated to Interpretable RL: the InterpPol Workshop. △ Less

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2402.16608 [pdf, other]

PAQA: Toward ProActive Open-Retrieval Question Answering

Authors: Pierre Erbacher, Jian-Yun Nie, Philippe Preux, Laure Soulier

Abstract: Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying ques… ▽ More Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying questions. This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents. To achieve this, we propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions. We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions. By addressing this gap in conversational search systems, we aim to provide additional supervision to enhance their active participation in the information-seeking process and provide users with more accurate results. △ Less

Submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.03337 [pdf, other]

Reinforcement-learning robotic sailboats: simulator and preliminary results

Authors: Eduardo Charles Vasconcellos, Ronald M Sampaio, André P D Araújo, Esteban Walter Gonzales Clua, Philippe Preux, Raphael Guerra, Luiz M G Gonçalves, Luis Martí, Hernan Lira, Nayat Sanchez-Pi

Abstract: This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the si… ▽ More This work focuses on the main challenges and problems in developing a virtual oceanic environment reproducing real experiments using Unmanned Surface Vehicles (USV) digital twins. We introduce the key features for building virtual worlds, considering using Reinforcement Learning (RL) agents for autonomous navigation and control. With this in mind, the main problems concern the definition of the simulation equations (physics and mathematics), their effective implementation, and how to include strategies for simulated control and perception (sensors) to be used with RL. We present the modeling, implementation steps, and challenges required to create a functional digital twin based on a real robotic sailing vessel. The application is immediate for developing navigation algorithms based on RL to be applied on real boats. △ Less

Submitted 16 January, 2024; originally announced February 2024.

Journal ref: NeurIPS 2023 Workshop on Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models, Dec 2023, New Orelans, United States

arXiv:2311.06119 [pdf, other]

Augmenting Ad-Hoc IR Dataset for Interactive Conversational Search

Authors: Pierre Erbacher, Jian-Yun Nie, Philippe Preux, Laure Soulier

Abstract: A peculiarity of conversational search systems is that they involve mixed-initiatives such as system-generated query clarifying questions. Evaluating those systems at a large scale on the end task of IR is very challenging, requiring adequate datasets containing such interactions. However, current datasets only focus on either traditional ad-hoc IR tasks or query clarification tasks, the latter be… ▽ More A peculiarity of conversational search systems is that they involve mixed-initiatives such as system-generated query clarifying questions. Evaluating those systems at a large scale on the end task of IR is very challenging, requiring adequate datasets containing such interactions. However, current datasets only focus on either traditional ad-hoc IR tasks or query clarification tasks, the latter being usually seen as a reformulation task from the initial query. The only two datasets known to us that contain both document relevance judgments and the associated clarification interactions are Qulac and ClariQ. Both are based on the TREC Web Track 2009-12 collection, but cover a very limited number of topics (237 topics), far from being enough for training and testing conversational IR models. To fill the gap, we propose a methodology to automatically build large-scale conversational IR datasets from ad-hoc IR datasets in order to facilitate explorations on conversational IR. Our methodology is based on two processes: 1) generating query clarification interactions through query clarification and answer generators, and 2) augmenting ad-hoc IR datasets with simulated interactions. In this paper, we focus on MsMarco and augment it with query clarification and answer simulations. We perform a thorough evaluation showing the quality and the relevance of the generated interactions for each initial query. This paper shows the feasibility and utility of augmenting ad-hoc IR datasets for conversational IR. △ Less

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2309.13365 [pdf, other]

Limits of Actor-Critic Algorithms for Decision Tree Policies Learning in IBMDPs

Authors: Hector Kohler, Riad Akrour, Philippe Preux

Abstract: Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been pr… ▽ More Interpretability of AI models allows for user safety checks to build trust in such AIs. In particular, Decision Trees (DTs) provide a global look at the learned model and transparently reveal which features of the input are critical for making a decision. However, interpretability is hindered if the DT is too large. To learn compact trees, a recent Reinforcement Learning (RL) framework has been proposed to explore the space of DTs using deep RL. This framework augments a decision problem (e.g. a supervised classification task) with additional actions that gather information about the features of an otherwise hidden input. By appropriately penalizing these actions, the agent learns to optimally trade-off size and performance of DTs. In practice, a reactive policy for a partially observable Markov decision process (MDP) needs to be learned, which is still an open problem. We show in this paper that deep RL can fail even on simple toy tasks of this class. However, when the underlying decision problem is a supervised classification task, we show that finding the optimal tree can be cast as a fully observable Markov decision problem and be solved efficiently, giving rise to a new family of algorithms for learning DTs that go beyond the classical greedy maximization ones. △ Less

Submitted 21 January, 2024; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: To be included in an other submission. arXiv admin note: text overlap with arXiv:2304.05839

arXiv:2309.12701 [pdf, other]

Interpretable Decision Tree Search as a Markov Decision Process

Authors: Hector Kohler, Riad Akrour, Philippe Preux

Abstract: Finding an optimal decision tree for a supervised learning task is a challenging combinatorial problem to solve at scale. It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling. Unfortunately, these methods are not competitive with the current branch-and-bound state-of-the-art. We propose instead to scale the resolutio… ▽ More Finding an optimal decision tree for a supervised learning task is a challenging combinatorial problem to solve at scale. It was recently proposed to frame the problem as a Markov Decision Problem (MDP) and use deep reinforcement learning to tackle scaling. Unfortunately, these methods are not competitive with the current branch-and-bound state-of-the-art. We propose instead to scale the resolution of such MDPs using an information-theoretic tests generating function that heuristically, and dynamically for every state, limits the set of admissible test actions to a few good candidates. As a solver, we show empirically that our algorithm is at the very least competitive with branch-and-bound alternatives. As a machine learning tool, a key advantage of our approach is to solve for multiple complexity-performance trade-offs at virtually no additional cost. With such a set of solutions, a user can then select the tree that generalizes best and which has the interpretability level that best suits their needs, which no current branch-and-bound method allows. △ Less

Submitted 13 June, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2308.16585 [pdf]

doi 10.1016/S2589-7500(23)00135-8

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

Authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen , et al. (9 additional authors not shown)

Abstract: Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participa… ▽ More Background Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5 year followup after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75$\bullet$3%) were female, 2530 (24$\bullet$7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2$\bullet$8 kg/m${}^2$ (95% CI 2$\bullet$6-3$\bullet$0) and mean RMSE BMI was 4$\bullet$7 kg/m${}^2$ (4$\bullet$4-5$\bullet$0), and the mean difference between predicted and observed BMI was-0$\bullet$3 kg/m${}^2$ (SD 4$\bullet$7). This model is incorporated in an easy to use and interpretable web-based prediction tool to help inform clinical decision before surgery. InterpretationWe developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: The Lancet Digital Health, 2023

arXiv:2306.10882 [pdf, other]

AdaStop: adaptive statistical testing for sound comparisons of Deep RL agents

Authors: Timothée Mathieu, Riccardo Della Vecchia, Alena Shilova, Matheus Medeiros Centa, Hector Kohler, Odalric-Ambrym Maillard, Philippe Preux

Abstract: Recently, the scientific community has questioned the statistical reproducibility of many empirical results, especially in the field of machine learning. To solve this reproducibility crisis, we propose a theoretically sound methodology to compare the overall performance of multiple algorithms with stochastic returns. We exemplify our methodology in Deep RL. Indeed, the performance of one executio… ▽ More Recently, the scientific community has questioned the statistical reproducibility of many empirical results, especially in the field of machine learning. To solve this reproducibility crisis, we propose a theoretically sound methodology to compare the overall performance of multiple algorithms with stochastic returns. We exemplify our methodology in Deep RL. Indeed, the performance of one execution of a Deep RL algorithm is random. Therefore, several independent executions are needed to accurately evaluate the overall performance. When comparing several RL algorithms, a major question is how many executions must be made and how can we ensure that the results of such a comparison are theoretically sound. When comparing several algorithms at once, the error of each comparison may accumulate and must be taken into account with a multiple tests procedure to preserve low error guarantees. We introduce AdaStop, a new statistical test based on multiple group sequential tests. When comparing algorithms, AdaStop adapts the number of executions to stop as early as possible while ensuring that we have enough information to distinguish algorithms that perform better than the others in a statistical significant way. We prove theoretically and empirically that AdaStop has a low probability of making a (family-wise) error. Finally, we illustrate the effectiveness of AdaStop in multiple Deep RL use-cases, including toy examples and challenging Mujoco environments. AdaStop is the first statistical test fitted to this sort of comparisons: AdaStop is both a significant contribution to statistics, and a major contribution to computational studies performed in reinforcement learning and in other domains. To summarize our contribution, we introduce AdaStop, a formally grounded statistical tool to let anyone answer the practical question: ``Is my algorithm the new state-of-the-art?''. △ Less

Submitted 27 January, 2024; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2304.05839 [pdf, other]

Optimal Interpretability-Performance Trade-off of Classification Trees with Black-Box Reinforcement Learning

Authors: Hector Kohler, Riad Akrour, Philippe Preux

Abstract: Interpretability of AI models allows for user safety checks to build trust in these models. In particular, decision trees (DTs) provide a global view on the learned model and clearly outlines the role of the features that are critical to classify a given data. However, interpretability is hindered if the DT is too large. To learn compact trees, a Reinforcement Learning (RL) framework has been rece… ▽ More Interpretability of AI models allows for user safety checks to build trust in these models. In particular, decision trees (DTs) provide a global view on the learned model and clearly outlines the role of the features that are critical to classify a given data. However, interpretability is hindered if the DT is too large. To learn compact trees, a Reinforcement Learning (RL) framework has been recently proposed to explore the space of DTs. A given supervised classification task is modeled as a Markov decision problem (MDP) and then augmented with additional actions that gather information about the features, equivalent to building a DT. By appropriately penalizing these actions, the RL agent learns to optimally trade-off size and performance of a DT. However, to do so, this RL agent has to solve a partially observable MDP. The main contribution of this paper is to prove that it is sufficient to solve a fully observable problem to learn a DT optimizing the interpretability-performance trade-off. As such any planning or RL algorithm can be used. We demonstrate the effectiveness of this approach on a set of classical supervised classification datasets and compare our approach with other interpretability-performance optimizing methods. △ Less

Submitted 11 April, 2023; originally announced April 2023.

arXiv:2210.08503 [pdf, other]

Entropy Regularized Reinforcement Learning with Cascading Networks

Authors: Riccardo Della Vecchia, Alena Shilova, Philippe Preux, Riad Akrour

Abstract: Deep Reinforcement Learning (Deep RL) has had incredible achievements on high dimensional problems, yet its learning process remains unstable even on the simplest tasks. Deep RL uses neural networks as function approximators. These neural models are largely inspired by developments in the (un)supervised machine learning community. Compared to these learning frameworks, one of the major difficultie… ▽ More Deep Reinforcement Learning (Deep RL) has had incredible achievements on high dimensional problems, yet its learning process remains unstable even on the simplest tasks. Deep RL uses neural networks as function approximators. These neural models are largely inspired by developments in the (un)supervised machine learning community. Compared to these learning frameworks, one of the major difficulties of RL is the absence of i.i.d. data. One way to cope with this difficulty is to control the rate of change of the policy at every iteration. In this work, we challenge the common practices of the (un)supervised learning community of using a fixed neural architecture, by having a neural model that grows in size at each policy update. This allows a closed form entropy regularized policy update, which leads to a better control of the rate of change of the policy at each iteration and help cope with the non i.i.d. nature of RL. Initial experiments on classical RL benchmarks show promising results with remarkable convergence on some RL tasks when compared to other deep RL baselines, while exhibiting limitations on others. △ Less

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2209.09882 [pdf, other]

Soft Action Priors: Towards Robust Policy Transfer

Authors: Matheus Centa, Philippe Preux

Abstract: Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework -… ▽ More Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework - that is, a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior - to recover state-of-the-art policy distillation techniques. Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses. In contrast to prior work, we develop algorithms for leveraging suboptimal action priors that may nevertheless impart valuable knowledge - which we call soft action priors. The proposed algorithms adapt by adjusting the strength of teacher feedback according to an estimate of the teacher's usefulness in each state. We perform tabular experiments, which show that the proposed methods achieve state-of-the-art performance, surpassing it when learning from suboptimal priors. Finally, we demonstrate the robustness of the adaptive algorithms in continuous action deep RL problems, in which adaptive algorithms considerably improved stability when compared to existing policy distillation methods. △ Less

Submitted 20 September, 2022; originally announced September 2022.

Comments: Preprint

arXiv:2207.03270 [pdf, other]

gym-DSSAT: a crop model turned into a Reinforcement Learning environment

Authors: Romain Gautron, Emilio J. Padrón, Philippe Preux, Julien Bigot, Odalric-Ambrym Maillard, David Emukpere

Abstract: Addressing a real world sequential decision problem with Reinforcement Learning (RL) usually starts with the use of a simulated environment that mimics real conditions. We present a novel open source RL environment for realistic crop management tasks. gym-DSSAT is a gym interface to the Decision Support System for Agrotechnology Transfer (DSSAT), a high fidelity crop simulator. DSSAT has been deve… ▽ More Addressing a real world sequential decision problem with Reinforcement Learning (RL) usually starts with the use of a simulated environment that mimics real conditions. We present a novel open source RL environment for realistic crop management tasks. gym-DSSAT is a gym interface to the Decision Support System for Agrotechnology Transfer (DSSAT), a high fidelity crop simulator. DSSAT has been developped over the last 30 years and is widely recognized by agronomists. gym-DSSAT comes with predefined simulations based on real world maize experiments. The environment is as easy to use as any gym environment. We provide performance baselines using basic RL algorithms. We also briefly outline how the monolithic DSSAT simulator written in Fortran has been turned into a Python RL environment. Our methodology is generic and may be applied to similar simulators. We report on very preliminary experimental results which suggest that RL can help researchers to improve sustainability of fertilization and irrigation practices. △ Less

Submitted 27 September, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

Report number: Report-no: Inria RR-9460

arXiv:2110.10632 [pdf, other]

More Efficient Exploration with Symbolic Priors on Action Sequence Equivalences

Authors: Toby Johnstone, Nathan Grinsztajn, Johan Ferret, Philippe Preux

Abstract: Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different… ▽ More Incorporating prior knowledge in reinforcement learning algorithms is mainly an open question. Even when insights about the environment dynamics are available, reinforcement learning is traditionally used in a tabula rasa setting and must explore and learn everything from scratch. In this paper, we consider the problem of exploiting priors about action sequence equivalence: that is, when different sequences of actions produce the same effect. We propose a new local exploration strategy calibrated to minimize collisions and maximize new state visitations. We show that this strategy can be computed at little cost, by solving a convex optimization problem. By replacing the usual epsilon-greedy strategy in a DQN, we demonstrate its potential in several environments with various dynamic structures. △ Less

Submitted 7 November, 2021; v1 submitted 20 October, 2021; originally announced October 2021.

arXiv:2106.07360 [pdf, other]

Low-Rank Projections of GCNs Laplacian

Authors: Nathan Grinsztajn, Philippe Preux, Edouard Oyallon

Abstract: In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the performance of a GCN: we empirically show that most of the necessary and used information for nodes classification is contained in the low-frequency domain, and thus contrary to images, high frequencie… ▽ More In this work, we study the behavior of standard models for community detection under spectral manipulations. Through various ablation experiments, we evaluate the impact of bandpass filtering on the performance of a GCN: we empirically show that most of the necessary and used information for nodes classification is contained in the low-frequency domain, and thus contrary to images, high frequencies are less crucial to community detection. In particular, it is sometimes possible to obtain accuracies at a state-of-the-art level with simple classifiers that rely only on a few low frequencies. △ Less

Submitted 4 June, 2021; originally announced June 2021.

Journal ref: ICLR 2021 Workshop GTRL, 2021, Online, France

arXiv:2106.05875 [pdf, other]

Interferometric Graph Transform for Community Labeling

Authors: Nathan Grinsztajn, Louis Leconte, Philippe Preux, Edouard Oyallon

Abstract: We present a new approach for learning unsupervised node representations in community graphs. We significantly extend the Interferometric Graph Transform (IGT) to community labeling: this non-linear operator iteratively extracts features that take advantage of the graph topology through demodulation operations. An unsupervised feature extraction step cascades modulus non-linearity with linear oper… ▽ More We present a new approach for learning unsupervised node representations in community graphs. We significantly extend the Interferometric Graph Transform (IGT) to community labeling: this non-linear operator iteratively extracts features that take advantage of the graph topology through demodulation operations. An unsupervised feature extraction step cascades modulus non-linearity with linear operators that aim at building relevant invariants for community labeling. Via a simplified model, we show that the IGT concentrates around the E-IGT: those two representations are related through some ergodicity properties. Experiments on community labeling tasks show that this unsupervised representation achieves performances at the level of the state of the art on the standard and challenging datasets Cora, Citeseer, Pubmed and WikiCS. △ Less

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2106.04480 [pdf, other]

There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning

Authors: Nathan Grinsztajn, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Abstract: We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order a… ▽ More We propose to learn to distinguish reversible from irreversible actions for better informed decision-making in Reinforcement Learning (RL). From theoretical considerations, we show that approximate reversibility can be learned through a simple surrogate task: ranking randomly sampled trajectory events in chronological order. Intuitively, pairs of events that are always observed in the same order are likely to be separated by an irreversible sequence of actions. Conveniently, learning the temporal order of events can be done in a fully self-supervised way, which we use to estimate the reversibility of actions from experience, without any priors. We propose two different strategies that incorporate reversibility in RL agents, one strategy for exploration (RAE) and one strategy for control (RAC). We demonstrate the potential of reversibility-aware agents in several environments, including the challenging Sokoban game. In synthetic tasks, we show that we can learn control policies that never fail and reduce to zero the side-effects of interactions, even without access to the reward function. △ Less

Submitted 29 October, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

arXiv:2105.09992 [pdf, other]

Don't Do What Doesn't Matter: Intrinsic Motivation with Action Usefulness

Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Abstract: Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the… ▽ More Sparse rewards are double-edged training signals in reinforcement learning: easy to design but hard to optimize. Intrinsic motivation guidances have thus been developed toward alleviating the resulting exploration problem. They usually incentivize agents to look for new states through novelty signals. Yet, such methods encourage exhaustive exploration of the state space rather than focusing on the environment's salient interaction opportunities. We propose a new exploration method, called Don't Do What Doesn't Matter (DoWhaM), shifting the emphasis from state novelty to state with relevant actions. While most actions consistently change the state when used, \textit{e.g.} moving the agent, some actions are only effective in specific states, \textit{e.g.}, \emph{opening} a door, \emph{grabbing} an object. DoWhaM detects and rewards actions that seldom affect the environment. We evaluate DoWhaM on the procedurally-generated environment MiniGrid, against state-of-the-art methods and show that DoWhaM greatly reduces sample complexity. △ Less

Submitted 31 May, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

Comments: Accepted at Internationnal Joint Conference on Artificial Intelligence (IJCAI'21) and Self-Supervision for Reinforcement Learning Workshop (SSL-RL @ICLR'21)

arXiv:2102.04376 [pdf, other]

Adversarially Guided Actor-Critic

Authors: Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Abstract: Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper intro… ▽ More Despite definite success in deep reinforcement learning problems, actor-critic algorithms are still confronted with sample inefficiency in complex environments, particularly in tasks where efficient exploration is a bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built using different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary predictions. This novel objective stimulates the actor to follow strategies that could not have been correctly predicted from previous trajectories, making its behavior innovative in tasks where the reward is extremely rare. Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a set of various hard-exploration and procedurally-generated tasks. △ Less

Submitted 8 February, 2021; originally announced February 2021.

Comments: Accepted at ICLR 2021

arXiv:2011.04333 [pdf, other]

Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling

Authors: Nathan Grinsztajn, Olivier Beaumont, Emmanuel Jeannot, Philippe Preux

Abstract: In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) is dealing with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learni… ▽ More In practice, it is quite common to face combinatorial optimization problems which contain uncertainty along with non-determinism and dynamicity. These three properties call for appropriate algorithms; reinforcement learning (RL) is dealing with them in a very natural way. Today, despite some efforts, most real-life combinatorial optimization problems remain out of the reach of reinforcement learning algorithms. In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem, and apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization. On the contrary to static scheduling, where tasks are assigned to processors in a predetermined ordering before the beginning of the parallel execution, our method is dynamic: task allocations and their execution ordering are decided at runtime, based on the system state and unexpected events, which allows much more flexibility. To do so, our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly. We show that this approach is competitive with state-of-the-art heuristics used in high-performance computing runtime systems. Moreover, our algorithm does not require an explicit model of the environment, but we demonstrate that extra knowledge can easily be incorporated and improves performance. We also exhibit key properties provided by this RL approach, and study its transfer abilities to other instances. △ Less

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2010.04440 [pdf, other]

Learning Value Functions in Deep Policy Gradients using Residual Variance

Authors: Yannis Flet-Berliac, Reda Ouhamma, Odalric-Ambrym Maillard, Philippe Preux

Abstract: Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on recent studies indicating that traditional actor-critic algorithm… ▽ More Policy gradient algorithms have proven to be successful in diverse decision making and control tasks. However, these methods suffer from high sample complexity and instability issues. In this paper, we address these challenges by providing a different approach for training the critic in the actor-critic framework. Our work builds on recent studies indicating that traditional actor-critic algorithms do not succeed in fitting the true value function, calling for the need to identify a better objective for the critic. In our method, the critic uses a new state-value (resp. state-action-value) function approximation that learns the value of the states (resp. state-action pairs) relative to their mean value rather than the absolute value as in conventional actor-critic. We prove the theoretical consistency of the new gradient estimator and observe dramatic empirical improvement across a variety of continuous control tasks and algorithms. Furthermore, we validate our method in tasks with sparse rewards, where we provide experimental evidence and theoretical insights. △ Less

Submitted 15 March, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: Accepted at ICLR 2021

arXiv:2008.03127 [pdf, other]

A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning

Authors: Mathieu Seurin, Florian Strub, Philippe Preux, Olivier Pietquin

Abstract: Speaker recognition is a well known and studied task in the speech processing domain. It has many applications, either for security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speak… ▽ More Speaker recognition is a well known and studied task in the speech processing domain. It has many applications, either for security or speaker adaptation of personal devices. In this paper, we present a new paradigm for automatic speaker recognition that we call Interactive Speaker Recognition (ISR). In this paradigm, the recognition system aims to incrementally build a representation of the speakers by requesting personalized utterances to be spoken in contrast to the standard text-dependent or text-independent schemes. To do so, we cast the speaker recognition task into a sequential decision-making problem that we solve with Reinforcement Learning. Using a standard dataset, we show that our method achieves excellent performance while using little speech signal amounts. This method could also be applied as an utterance selection mechanism for building speech synthesis systems. △ Less

Submitted 7 August, 2020; originally announced August 2020.

arXiv:1910.02078 [pdf, other]

I'm sorry Dave, I'm afraid I can't do that, Deep Q-learning from forbidden action

Authors: Mathieu Seurin, Philippe Preux, Olivier Pietquin

Abstract: The use of Reinforcement Learning (RL) is still restricted to simulation or to enhance human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind implemented in the shape of valid actions masks or contingency controllers. For example, the range of motion and the angles of the motors of a rob… ▽ More The use of Reinforcement Learning (RL) is still restricted to simulation or to enhance human-operated systems through recommendations. Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind implemented in the shape of valid actions masks or contingency controllers. For example, the range of motion and the angles of the motors of a robot can be limited to physical boundaries. Violating constraints thus results in rejected actions or entering in a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes. In this paper, we propose a simple modification of a state-of-the-art deep RL algorithm (DQN), enabling learning from forbidden actions. To do so, the standard Q-learning update is enhanced with an extra safety loss inspired by structured classification. We empirically show that it reduces the number of hit constraints during the learning phase and accelerates convergence to near-optimal policies compared to using standard DQN. Experiments are done on a Visual Grid World Environment and Text-World domain. △ Less

Submitted 13 August, 2020; v1 submitted 4 October, 2019; originally announced October 2019.

Comments: Accepted at Internationnal Joint Conference on Neural Networks (IJCNN'2020)

arXiv:1909.11939 [pdf, other]

MERL: Multi-Head Reinforcement Learning

Authors: Yannis Flet-Berliac, Philippe Preux

Abstract: A common challenge in reinforcement learning is how to convert the agent's interactions with an environment into fast and robust learning. For instance, earlier work makes use of domain knowledge to improve existing reinforcement learning algorithms in complex tasks. While promising, previously acquired knowledge is often costly and challenging to scale up. Instead, we decide to consider problem k… ▽ More A common challenge in reinforcement learning is how to convert the agent's interactions with an environment into fast and robust learning. For instance, earlier work makes use of domain knowledge to improve existing reinforcement learning algorithms in complex tasks. While promising, previously acquired knowledge is often costly and challenging to scale up. Instead, we decide to consider problem knowledge with signals from quantities relevant to solve any task, e.g., self-performance assessment and accurate expectations. $\mathcal{V}^{ex}$ is such a quantity. It is the fraction of variance explained by the value function $V$ and measures the discrepancy between $V$ and the returns. Taking advantage of $\mathcal{V}^{ex}$, we propose MERL, a general framework for structuring reinforcement learning by injecting problem knowledge into policy gradient updates. As a result, the agent is not only optimized for a reward but learns using problem-focused quantities provided by MERL, applicable out-of-the-box to any task. In this paper: (a) We introduce and define MERL, the multi-head reinforcement learning framework we use throughout this work. (b) We conduct experiments across a variety of standard benchmark environments, including 9 continuous control tasks, where results show improved performance. (c) We demonstrate that MERL also improves transfer learning on a set of challenging pixel-based tasks. (d) We ponder how MERL tackles the problem of reward sparsity and better conditions the feature space of reinforcement learning agents. △ Less

Submitted 31 March, 2020; v1 submitted 26 September, 2019; originally announced September 2019.

Comments: Deep Reinforcement Learning Workshop, NeurIPS 2019

arXiv:1904.04025 [pdf, other]

doi 10.24963/ijcai.2020/376

Only Relevant Information Matters: Filtering Out Noisy Samples to Boost RL

Authors: Yannis Flet-Berliac, Philippe Preux

Abstract: In reinforcement learning, policy gradient algorithms optimize the policy directly and rely on sampling efficiently an environment. Nevertheless, while most sampling procedures are based on direct policy sampling, self-performance measures could be used to improve such sampling prior to each policy update. Following this line of thought, we introduce SAUNA, a method where non-informative transitio… ▽ More In reinforcement learning, policy gradient algorithms optimize the policy directly and rely on sampling efficiently an environment. Nevertheless, while most sampling procedures are based on direct policy sampling, self-performance measures could be used to improve such sampling prior to each policy update. Following this line of thought, we introduce SAUNA, a method where non-informative transitions are rejected from the gradient update. The level of information is estimated according to the fraction of variance explained by the value function: a measure of the discrepancy between V and the empirical returns. In this work, we use this metric to select samples that are useful to learn from, and we demonstrate that this selection can significantly improve the performance of policy gradient methods. In this paper: (a) We define SAUNA's metric and introduce its method to filter transitions. (b) We conduct experiments on a set of benchmark continuous control problems. SAUNA significantly improves performance. (c) We investigate how SAUNA reliably selects samples with the most positive impact on learning and study its improvement on both performance and sample efficiency. △ Less

Submitted 20 November, 2020; v1 submitted 8 April, 2019; originally announced April 2019.

Comments: Accepted at IJCAI 2020

arXiv:1812.06286 [pdf, other]

doi 10.1007/s11219-016-9332-8

A Large-Scale Study of Call Graph-based Impact Prediction using Mutation Testing

Authors: Vincenzo Musco, Martin Monperrus, Philippe Preux

Abstract: In software engineering, impact analysis involves predicting the software elements (e.g., modules, classes, methods) potentially impacted by a change in the source code. Impact analysis is required to optimize the testing effort. In this paper, we propose an evaluation technique to predict impact propagation. Based on 10 open-source Java projects and 5 classical mutation operators, we create 17,00… ▽ More In software engineering, impact analysis involves predicting the software elements (e.g., modules, classes, methods) potentially impacted by a change in the source code. Impact analysis is required to optimize the testing effort. In this paper, we propose an evaluation technique to predict impact propagation. Based on 10 open-source Java projects and 5 classical mutation operators, we create 17,000 mutants and study how the error they introduce propagates. This evaluation technique enables us to analyze impact prediction based on four types of call graph. Our results show that graph sophistication increases the completeness of impact prediction. However, and surprisingly to us, the most basic call graph gives the best trade-off between precision and recall for impact prediction. △ Less

Submitted 15 December, 2018; originally announced December 2018.

Journal ref: Software Quality Journal, Volume 25, Issue 3, pp 921-950, 2017

arXiv:1808.04446 [pdf, other]

Visual Reasoning with Multi-hop Feature Modulation

Authors: Florian Strub, Mathieu Seurin, Ethan Perez, Harm de Vries, Jérémie Mary, Philippe Preux, Aaron Courville, Olivier Pietquin

Abstract: Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to… ▽ More Recent breakthroughs in computer vision and natural language processing have spurred interest in challenging multi-modal tasks such as visual question-answering and visual dialogue. For such tasks, one successful approach is to condition image-based convolutional network computation on language via Feature-wise Linear Modulation (FiLM) layers, i.e., per-channel scaling and shifting. We propose to generate the parameters of FiLM layers going up the hierarchy of a convolutional network in a multi-hop fashion rather than all at once, as in prior work. By alternating between attending to the language input and generating FiLM layer parameters, this approach is better able to scale to settings with longer input sequences such as dialogue. We demonstrate that multi-hop FiLM generation achieves state-of-the-art for the short input sequence task ReferIt --- on-par with single-hop FiLM generation --- while also significantly outperforming prior state-of-the-art and single-hop FiLM generation on the GuessWhat?! visual dialogue task. △ Less

Submitted 12 October, 2018; v1 submitted 3 August, 2018; originally announced August 2018.

Comments: In Proc of ECCV 2018

arXiv:1807.09142 [pdf, other]

Recurrent Neural Networks for Long and Short-Term Sequential Recommendation

Authors: Kiewan Villatel, Elena Smirnova, Jérémie Mary, Philippe Preux

Abstract: Recommender systems objectives can be broadly characterized as modeling user preferences over short-or long-term time horizon. A large body of previous research studied long-term recommendation through dimensionality reduction techniques applied to the historical user-item interactions. A recently introduced session-based recommendation setting highlighted the importance of modeling short-term use… ▽ More Recommender systems objectives can be broadly characterized as modeling user preferences over short-or long-term time horizon. A large body of previous research studied long-term recommendation through dimensionality reduction techniques applied to the historical user-item interactions. A recently introduced session-based recommendation setting highlighted the importance of modeling short-term user preferences. In this task, Recurrent Neural Networks (RNN) have shown to be successful at capturing the nuances of user's interactions within a short time window. In this paper, we evaluate RNN-based models on both short-term and long-term recommendation tasks. Our experimental results suggest that RNNs are capable of predicting immediate as well as distant user interactions. We also find the best performing configuration to be a stacked RNN with layer normalization and tied item embeddings. △ Less

Submitted 23 July, 2018; originally announced July 2018.

arXiv:1710.06298 [pdf, ps, other]

doi 10.1007/978-3-319-72150-7_43

A generative model for sparse, evolving digraphs

Authors: Georgios Papoudakis, Philippe Preux, Martin Monperrus

Abstract: Generating graphs that are similar to real ones is an open problem, while the similarity notion is quite elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs similar to real ones. Since real graphs are evolving and this evolution is important to study in order to understand the underlying dynamical system, we tackle… ▽ More Generating graphs that are similar to real ones is an open problem, while the similarity notion is quite elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs similar to real ones. Since real graphs are evolving and this evolution is important to study in order to understand the underlying dynamical system, we tackle the problem of generating series of graphs. We propose SEDGE, an algorithm meant to generate series of graphs similar to a real series. SEDGE is an extension of SDG. We consider graphs that are representations of software programs and show experimentally that our approach outperforms other existing approaches. Experiments show the performance of both algorithms. △ Less

Submitted 17 October, 2017; originally announced October 2017.

Journal ref: 6th International Conference on Complex Networks and their applications, Nov 2017, Lyon, France

arXiv:1611.09187 [pdf]

doi 10.1007/s10664-017-9571-8

Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation

Authors: Benjamin Danglot, Philippe Preux, Benoit Baudry, Martin Monperrus

Abstract: Can the execution of a software be perturbed without breaking the correctness of the output? In this paper, we devise a novel protocol to answer this rarely investigated question. In an experimental study, we observe that many perturbations do not break the correctness in ten subject programs. We call this phenomenon ``correctness attraction''. The uniqueness of this protocol is that it considers… ▽ More Can the execution of a software be perturbed without breaking the correctness of the output? In this paper, we devise a novel protocol to answer this rarely investigated question. In an experimental study, we observe that many perturbations do not break the correctness in ten subject programs. We call this phenomenon ``correctness attraction''. The uniqueness of this protocol is that it considers a systematic exploration of the perturbation space as well as perfect oracles to determine the correctness of the output. To this extent, our findings on the stability of software under execution perturbations have a level of validity that has never been reported before in the scarce related work. A qualitative manual analysis enables us to set up the first taxonomy ever of the reasons behind correctness attraction. △ Less

Submitted 30 May, 2017; v1 submitted 28 November, 2016; originally announced November 2016.

Journal ref: Empirical Software Engineering, Springer Verlag, 2017

arXiv:1512.07435 [pdf, other]

doi 10.1145/2896995.2896996

A Learning Algorithm for Change Impact Prediction

Authors: Vincenzo Musco, Antonin Carette, Martin Monperrus, Philippe Preux

Abstract: Change impact analysis consists in predicting the impact of a code change in a software application. In this paper, we take a learning perspective on change impact analysis and consider the problem formulated as follows. The artifacts that are considered are methods of object-oriented software, the change under study is a change in the code of the method, the impact is the test methods that fail b… ▽ More Change impact analysis consists in predicting the impact of a code change in a software application. In this paper, we take a learning perspective on change impact analysis and consider the problem formulated as follows. The artifacts that are considered are methods of object-oriented software, the change under study is a change in the code of the method, the impact is the test methods that fail because of the change that has been performed. We propose an algorithm, called LCIP that learns from past impacts to predict future impacts. To evaluate our system, we consider 7 Java software applications totaling 214,000+ lines of code. We simulate 17574 changes and their actual impact through code mutations, as done in mutation testing. We find that LCIP can predict the impact with a precision of 69%, a recall of 79%, corresponding to a F-Score of 55%. △ Less

Submitted 6 May, 2018; v1 submitted 23 December, 2015; originally announced December 2015.

Comments: 5th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, 2016

Journal ref: Proceedings of the 5th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, 2016

arXiv:1510.08231 [pdf, other]

Operator-valued Kernels for Learning from Functional Response Data

Authors: Hachem Kadri, Emmanuel Duflos, Philippe Preux, Stéphane Canu, Alain Rakotomamonjy, Julien Audiffren

Abstract: In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: a data is represented by a set of functions, and the label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the e… ▽ More In this paper we consider the problems of supervised classification and regression in the case where attributes and labels are functions: a data is represented by a set of functions, and the label is also a function. We focus on the use of reproducing kernel Hilbert space theory to learn from such functional data. Basic concepts and properties of kernel-based learning are extended to include the estimation of function-valued functions. In this setting, the representer theorem is restated, a set of rigorously defined infinite-dimensional operator-valued kernels that can be valuably applied when the data are functions is described, and a learning algorithm for nonlinear functional data analysis is introduced. The methodology is illustrated through speech and audio signal processing experiments. △ Less

Submitted 2 November, 2016; v1 submitted 28 October, 2015; originally announced October 2015.

Comments: in Journal of Machine Learning Research (JMLR), 2016

Journal ref: Journal of Machine Learning Research 17 (2016) 1-54

arXiv:1410.7921 [pdf, other]

A Generative Model of Software Dependency Graphs to Better Understand Software Evolution

Authors: Vincenzo Musco, Martin Monperrus, Philippe Preux

Abstract: Software systems are composed of many interacting elements. A natural way to abstract over software systems is to model them as graphs. In this paper we consider software dependency graphs of object-oriented software and we study one topological property: the degree distribution. Based on the analysis of ten software systems written in Java, we show that there exists completely different systems t… ▽ More Software systems are composed of many interacting elements. A natural way to abstract over software systems is to model them as graphs. In this paper we consider software dependency graphs of object-oriented software and we study one topological property: the degree distribution. Based on the analysis of ten software systems written in Java, we show that there exists completely different systems that have the same degree distribution. Then, we propose a generative model of software dependency graphs which synthesizes graphs whose degree distribution is close to the empirical ones observed in real software systems. This model gives us novel insights on the potential fundamental rules of software evolution. △ Less

Submitted 10 April, 2017; v1 submitted 29 October, 2014; originally announced October 2014.

arXiv:1405.7544 [pdf, other]

Cold-start Problems in Recommendation Systems via Contextual-bandit Algorithms

Authors: Hai Thanh Nguyen, Jérémie Mary, Philippe Preux

Abstract: In this paper, we study a cold-start problem in recommendation systems where we have completely new users entered the systems. There is not any interaction or feedback of the new users with the systems previoustly, thus no ratings are available. Trivial approaches are to select ramdom items or the most popular ones to recommend to the new users. However, these methods perform poorly in many case.… ▽ More In this paper, we study a cold-start problem in recommendation systems where we have completely new users entered the systems. There is not any interaction or feedback of the new users with the systems previoustly, thus no ratings are available. Trivial approaches are to select ramdom items or the most popular ones to recommend to the new users. However, these methods perform poorly in many case. In this research, we provide a new look of this cold-start problem in recommendation systems. In fact, we cast this cold-start problem as a contextual-bandit problem. No additional information on new users and new items is needed. We consider all the past ratings of previous users as contextual information to be integrated into the recommendation framework. To solve this type of the cold-start problems, we propose a new efficient method which is based on the LinUCB algorithm for contextual-bandit problems. The experiments were conducted on three different publicly-available data sets, namely Movielens, Netflix and Yahoo!Music. The new proposed methods were also compared with other state-of-the-art techniques. Experiments showed that our new method significantly improves upon all these methods. △ Less

Submitted 29 May, 2014; originally announced May 2014.

arXiv:1405.3536 [pdf, other]

Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques

Authors: Olivier Nicol, Jérémie Mary, Philippe Preux

Abstract: In many recommendation applications such as news recommendation, the items that can be rec- ommended come and go at a very fast pace. This is a challenge for recommender systems (RS) to face this setting. Online learning algorithms seem to be the most straight forward solution. The contextual bandit framework was introduced for that very purpose. In general the evaluation of a RS is a critical iss… ▽ More In many recommendation applications such as news recommendation, the items that can be rec- ommended come and go at a very fast pace. This is a challenge for recommender systems (RS) to face this setting. Online learning algorithms seem to be the most straight forward solution. The contextual bandit framework was introduced for that very purpose. In general the evaluation of a RS is a critical issue. Live evaluation is of- ten avoided due to the potential loss of revenue, hence the need for offline evaluation methods. Two options are available. Model based meth- ods are biased by nature and are thus difficult to trust when used alone. Data driven methods are therefore what we consider here. Evaluat- ing online learning algorithms with past data is not simple but some methods exist in the litera- ture. Nonetheless their accuracy is not satisfac- tory mainly due to their mechanism of data re- jection that only allow the exploitation of a small fraction of the data. We precisely address this issue in this paper. After highlighting the limita- tions of the previous methods, we present a new method, based on bootstrapping techniques. This new method comes with two important improve- ments: it is much more accurate and it provides a measure of quality of its estimation. The latter is a highly desirable property in order to minimize the risks entailed by putting online a RS for the first time. We provide both theoretical and ex- perimental proofs of its superiority compared to state-of-the-art methods, as well as an analysis of the convergence of the measure of quality. △ Less

Submitted 14 May, 2014; originally announced May 2014.

Journal ref: International Conference on Machine Learning 32 (2014)

arXiv:1301.2656 [pdf, ps, other]

Multiple functional regression with both discrete and continuous covariates

Authors: Hachem Kadri, Philippe Preux, Emmanuel Duflos, Stéphane Canu

Abstract: In this paper we present a nonparametric method for extending functional regression methodology to the situation where more than one functional covariate is used to predict a functional response. Borrowing the idea from Kadri et al. (2010a), the method, which support mixed discrete and continuous explanatory variables, is based on estimating a function-valued function in reproducing kernel Hilbert… ▽ More In this paper we present a nonparametric method for extending functional regression methodology to the situation where more than one functional covariate is used to predict a functional response. Borrowing the idea from Kadri et al. (2010a), the method, which support mixed discrete and continuous explanatory variables, is based on estimating a function-valued function in reproducing kernel Hilbert spaces by virtue of positive operator-valued kernels. △ Less

Submitted 12 January, 2013; originally announced January 2013.

Journal ref: 2nd International Workshop on Functional and Operatorial Statistics (IWFOS), Santander : Spain (2011)

arXiv:1301.2655 [pdf, other]

Functional Regularized Least Squares Classi cation with Operator-valued Kernels

Authors: Hachem Kadri, Asma Rabaoui, Philippe Preux, Emmanuel Duflos, Alain Rakotomamonjy

Abstract: Although operator-valued kernels have recently received increasing interest in various machine learning and functional data analysis problems such as multi-task learning or functional regression, little attention has been paid to the understanding of their associated feature spaces. In this paper, we explore the potential of adopting an operator-valued kernel feature space perspective for the anal… ▽ More Although operator-valued kernels have recently received increasing interest in various machine learning and functional data analysis problems such as multi-task learning or functional regression, little attention has been paid to the understanding of their associated feature spaces. In this paper, we explore the potential of adopting an operator-valued kernel feature space perspective for the analysis of functional data. We then extend the Regularized Least Squares Classification (RLSC) algorithm to cover situations where there are multiple functions per observation. Experiments on a sound recognition problem show that the proposed method outperforms the classical RLSC algorithm. △ Less

Submitted 12 January, 2013; originally announced January 2013.

Journal ref: 28th International Conference on Machine Learning (ICML), Seattle : United States (2011)

arXiv:1205.2171 [pdf, other]

A Generalized Kernel Approach to Structured Output Learning

Authors: Hachem Kadri, Mohammad Ghavamzadeh, Philippe Preux

Abstract: We study the problem of structured output learning from a regression perspective. We first provide a general formulation of the kernel dependency estimation (KDE) problem using operator-valued kernels. We show that some of the existing formulations of this problem are special cases of our framework. We then propose a covariance-based operator-valued kernel that allows us to take into account the s… ▽ More We study the problem of structured output learning from a regression perspective. We first provide a general formulation of the kernel dependency estimation (KDE) problem using operator-valued kernels. We show that some of the existing formulations of this problem are special cases of our framework. We then propose a covariance-based operator-valued kernel that allows us to take into account the structure of the kernel feature space. This kernel operates on the output space and encodes the interactions between the outputs without any reference to the input space. To address this issue, we introduce a variant of our KDE method based on the conditional covariance operator that in addition to the correlation between the outputs takes into account the effects of the input variables. Finally, we evaluate the performance of our KDE approach using both covariance and conditional covariance kernels on two structured output problems, and compare it to the state-of-the-art kernel-based structured output regression methods. △ Less

Submitted 15 July, 2015; v1 submitted 10 May, 2012; originally announced May 2012.

Comments: in International Conference on Machine Learning (ICML), Jun 2013, Atlanta, United States. 2013

Report number: RR-7956

arXiv:1203.1596 [pdf, ps, other]

Multiple Operator-valued Kernel Learning

Authors: Hachem Kadri, Alain Rakotomamonjy, Francis Bach, Philippe Preux

Abstract: Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the… ▽ More Positive definite operator-valued kernels generalize the well-known notion of reproducing kernels, and are naturally adapted to multi-output learning situations. This paper addresses the problem of learning a finite linear combination of infinite-dimensional operator-valued kernels which are suitable for extending functional data analysis methods to nonlinear contexts. We study this problem in the case of kernel ridge regression for functional responses with an lr-norm constraint on the combination coefficients. The resulting optimization problem is more involved than those of multiple scalar-valued kernel learning since operator-valued kernels pose more technical and theoretical issues. We propose a multiple operator-valued kernel learning algorithm based on solving a system of linear operator equations by using a block coordinatedescent procedure. We experimentally validate our approach on a functional regression task in the context of finger movement prediction in brain-computer interfaces. △ Less

Submitted 14 June, 2012; v1 submitted 7 March, 2012; originally announced March 2012.

Comments: No. RR-7900 (2012)

arXiv:1203.0203 [pdf, other]

Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari

Abstract: The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained throug… ▽ More The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from O(A2) to O(Alog(A)). We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into O(log(A)) seperate two-action MDPs. This second method reduces learning complexity even further, from O(A2) to O(log(A)), thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance. △ Less

Submitted 29 February, 2012; originally announced March 2012.

MSC Class: 68T05

arXiv:1108.5668 [pdf, other]

doi 10.1007/978-3-642-23780-5_34

Datum-Wise Classification: A Sequential Approach to Sparsity

Authors: Gabriel Dulac-Arnold, Ludovic Denoyer, Philippe Preux, Patrick Gallinari

Abstract: We propose a novel classification technique whose aim is to select an appropriate representation for each datapoint, in contrast to the usual approach of selecting a representation encompassing the whole dataset. This datum-wise representation is found by using a sparsity inducing empirical risk, which is a relaxation of the standard L 0 regularized risk. The classification problem is modeled as a… ▽ More We propose a novel classification technique whose aim is to select an appropriate representation for each datapoint, in contrast to the usual approach of selecting a representation encompassing the whole dataset. This datum-wise representation is found by using a sparsity inducing empirical risk, which is a relaxation of the standard L 0 regularized risk. The classification problem is modeled as a sequential decision process that sequentially chooses, for each datapoint, which features to use before classifying. Datum-Wise Classification extends naturally to multi-class tasks, and we describe a specific case where our inference has equivalent complexity to a traditional linear classifier, while still using a variable number of features. We compare our classifier to classical L 1 regularized linear models (L 1-SVM and LARS) on a set of common binary and multi-class datasets and show that for an equal average number of features used we can get improved performance using our method. △ Less

Submitted 29 August, 2011; originally announced August 2011.

Comments: ECML2011

Journal ref: Lecture Notes in Computer Science, 2011, Volume 6911/2011, 375-390

arXiv:cs/0611145 [pdf, ps, other]

A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

Authors: Manuel Loth, Philippe Preux

Abstract: This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced… ▽ More This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD, residual-gradient TD. It is asserted that they all consist in minimizing a gradient function and differ by the form of this function and their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration. △ Less

Submitted 28 November, 2006; originally announced November 2006.

Journal ref: Dans European Symposium on Artificial Neural Networks (2006)

Showing 1–42 of 42 results for author: Preux, P