Showing 1–14 of 14 results for author: Dedieu, A

Searching in archive cs.
  1. arXiv:2401.05946

    cs.LG cs.AI

    Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

    Authors: Antoine Dedieu, Wolfgang Lehrach, Guangyao Zhou, Dileep George, Miguel Lázaro-Gredilla

    Abstract: Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token predictions (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), wh…

    Submitted 11 January, 2024; originally announced January 2024.

  2. arXiv:2307.01201

    cs.CL cs.AI

    Schema-learning and rebinding as mechanisms of in-context learning and emergence

    Authors: Sivaramakrishnan Swaminathan, Antoine Dedieu, Rajkumar Vasudeva Raju, Murray Shanahan, Miguel Lazaro-Gredilla, Dileep George

    Abstract: In-context learning (ICL) is one of the most powerful and most unexpected capabilities to emerge in recent transformer-based large language models (LLMs). Yet the mechanisms that underlie it are poorly understood. In this paper, we demonstrate that comparable ICL capabilities can be acquired by an alternative sequence prediction learning method using clone-structured causal graphs (CSCGs). Moreove…

    Submitted 15 June, 2023; originally announced July 2023.

  3. arXiv:2302.00099

    cs.LG

    Learning noisy-OR Bayesian Networks with Max-Product Belief Propagation

    Authors: Antoine Dedieu, Guangyao Zhou, Dileep George, Miguel Lazaro-Gredilla

    Abstract: Noisy-OR Bayesian Networks (BNs) are a family of probabilistic graphical models which express rich statistical dependencies in binary data. Variational inference (VI) has been the main method proposed to learn noisy-OR BNs with complex latent structures (Jaakkola & Jordan, 1999; Ji et al., 2020; Buhai et al., 2020). However, the proposed VI approaches either (a) use a recognition network with stan…

    Submitted 31 January, 2023; originally announced February 2023.

  4. arXiv:2202.04110

    cs.LG cs.AI stat.ML

    PGMax: Factor Graphs for Discrete Probabilistic Graphical Models and Loopy Belief Propagation in JAX

    Authors: Guangyao Zhou, Antoine Dedieu, Nishanth Kumar, Wolfgang Lehrach, Miguel Lázaro-Gredilla, Shrinu Kushagra, Dileep George

    Abstract: PGMax is an open-source Python package for (a) easily specifying discrete Probabilistic Graphical Models (PGMs) as factor graphs; and (b) automatically running efficient and scalable loopy belief propagation (LBP) in JAX. PGMax supports general factor graphs with tractable factors, and leverages modern accelerators like GPUs for inference. Compared with existing alternatives, PGMax obtains higher-…

    Submitted 24 March, 2023; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: Update authors list

  5. arXiv:2112.03371

    cs.LG cs.CV

    Graphical Models with Attention for Context-Specific Independence and an Application to Perceptual Grouping

    Authors: Guangyao Zhou, Wolfgang Lehrach, Antoine Dedieu, Miguel Lázaro-Gredilla, Dileep George

    Abstract: Discrete undirected graphical models, also known as Markov Random Fields (MRFs), can flexibly encode probabilistic interactions of multiple variables, and have enjoyed successful applications to a wide range of problems. However, a well-known yet little studied limitation of discrete MRFs is that they cannot capture context-specific independence (CSI). Existing methods require carefully developed…

    Submitted 6 December, 2021; originally announced December 2021.

  6. arXiv:2111.02458

    stat.ML cs.LG

    Perturb-and-max-product: Sampling and learning in discrete energy-based models

    Authors: Miguel Lazaro-Gredilla, Antoine Dedieu, Dileep George

    Abstract: Perturb-and-MAP offers an elegant approach to approximately sample from an energy-based model (EBM) by computing the maximum-a-posteriori (MAP) configuration of a perturbed version of the model. Sampling in turn enables learning. However, this line of research has been hindered by the general intractability of the MAP computation. Very few works venture outside tractable models, and when they do, t…

    Submitted 5 November, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

  7. arXiv:2012.01744

    stat.ML cs.LG

    Sample-Efficient L0-L2 Constrained Structure Learning of Sparse Ising Models

    Authors: Antoine Dedieu, Miguel Lázaro-Gredilla, Dileep George

    Abstract: We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples. The most recent and best performing approaches combine an empirical loss (the logistic regression loss or the interaction screening loss) with a regularizer (an L1 penalty or an L1 constraint). This results in a convex problem that can be solved separately for each node of the g…

    Submitted 15 September, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

  8. arXiv:2006.06803

    stat.ML cs.LG

    Query Training: Learning a Worse Model to Infer Better Marginals in Undirected Graphical Models with Hidden Variables

    Authors: Miguel Lázaro-Gredilla, Wolfgang Lehrach, Nishad Gothoskar, Guangyao Zhou, Antoine Dedieu, Dileep George

    Abstract: Probabilistic graphical models (PGMs) provide a compact representation of knowledge that can be queried in a flexible way: after learning the parameters of a graphical model once, new probabilistic queries can be answered at test time without retraining. However, when using undirected PGMs with hidden variables, two sources of error typically compound in all but the simplest models: (a) learning er…

    Submitted 25 February, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

  9. arXiv:2001.06471

    stat.ML cs.LG math.OC stat.CO

    Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

    Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

    Abstract: We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, M…

    Submitted 6 June, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: To appear in JMLR

  10. arXiv:1912.11398

    stat.ML cs.LG math.ST

    An error bound for Lasso and Group Lasso in high dimensions

    Authors: Antoine Dedieu

    Abstract: We leverage recent advances in high-dimensional statistics to derive new L2 estimation upper bounds for Lasso and Group Lasso in high dimensions. For Lasso, our bounds scale as $(k^*/n) \log(p/k^*)$---$n\times p$ is the size of the design matrix and $k^*$ the dimension of the ground truth $\boldsymbol{\beta}^*$---and match the optimal minimax rate. For Group Lasso, our bounds scale as…

    Submitted 26 February, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

    Comments: arXiv admin note: text overlap with arXiv:1910.08880

  11. arXiv:1910.08880

    stat.ML cs.LG stat.OT

    Improved error rates for sparse (group) learning with Lipschitz loss functions

    Authors: Antoine Dedieu

    Abstract: We study a family of sparse estimators defined as minimizers of some empirical Lipschitz loss function -- which includes the hinge loss, the logistic loss and the quantile regression loss -- with a convex, sparse or group-sparse regularization. In particular, we consider the L1 norm on the coefficients, its sorted Slope version, and the Group L1-L2 extension. We propose a new theoretical framework…

    Submitted 21 September, 2021; v1 submitted 19 October, 2019; originally announced October 2019.

    Comments: arXiv admin note: text overlap with arXiv:1810.03081

  12. arXiv:1905.00507

    stat.ML cs.LG

    Learning higher-order sequential structure with cloned HMMs

    Authors: Antoine Dedieu, Nishad Gothoskar, Scott Swingle, Wolfgang Lehrach, Miguel Lázaro-Gredilla, Dileep George

    Abstract: Variable order sequence modeling is an important problem in artificial and natural intelligence. While overcomplete Hidden Markov Models (HMMs), in theory, have the capacity to represent long-term temporal structure, they often fail to learn and converge to local minima. We show that by constraining HMMs with a simple sparsity structure inspired by biology, we can make them learn variable order sequ…

    Submitted 15 May, 2019; v1 submitted 1 May, 2019; originally announced May 2019.

  13. arXiv:1901.01585

    stat.ML cs.LG

    Solving L1-regularized SVMs and related linear programs: Revisiting the effectiveness of Column and Constraint Generation

    Authors: Antoine Dedieu, Rahul Mazumder, Haoyue Wang

    Abstract: The linear Support Vector Machine (SVM) is a classic classification technique in machine learning. Motivated by applications in modern high dimensional statistics, we consider penalized SVM problems involving the minimization of a hinge-loss function with a convex sparsity-inducing regularizer such as: the L1-norm on the coefficients, its grouped generalization and the sorted L1-penalty (aka Slope…

    Submitted 27 August, 2021; v1 submitted 6 January, 2019; originally announced January 2019.

  14. arXiv:1803.01440

    stat.ML cs.LG

    Hierarchical Modeling and Shrinkage for User Session Length Prediction in Media Streaming

    Authors: Antoine Dedieu, Rahul Mazumder, Zhen Zhu, Hossein Vahabi

    Abstract: An important metric of users' satisfaction and engagement within on-line streaming services is the user session length, i.e. the amount of time they spend on a service continuously without interruption. Being able to predict this value directly benefits the recommendation and ad pacing contexts in music and video streaming services. Recent research has shown that predicting the exact amount of tim…

    Submitted 22 June, 2018; v1 submitted 4 March, 2018; originally announced March 2018.

    Comments: 20 pages