
Showing 1–9 of 9 results for author: Sieber, J

Searching in archive cs.
  1. arXiv:2405.15731  [pdf, other]

    cs.LG cs.AI eess.SY

    Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

    Authors: Jerome Sieber, Carmen Amo Alonso, Alexandre Didier, Melanie N. Zeilinger, Antonio Orvieto

    Abstract: Softmax attention is the principal backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more effi…

    Submitted 3 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2403.16899  [pdf, other]

    eess.SY cs.CL cs.LG

    State Space Models as Foundation Models: A Control Theoretic Overview

    Authors: Carmen Amo Alonso, Jerome Sieber, Melanie N. Zeilinger

    Abstract: In recent years, there has been a growing interest in integrating linear state-space models (SSMs) in deep neural network architectures of foundation models. This is exemplified by the recent success of Mamba, showing better performance than the state-of-the-art Transformer architectures in language tasks. Foundation models, such as GPT-4, aim to encode sequential data into a latent space in orde…

    Submitted 25 March, 2024; originally announced March 2024.

  3. arXiv:2401.16291  [pdf, other]

    cs.LG cs.CY

    MachineLearnAthon: An Action-Oriented Machine Learning Didactic Concept

    Authors: Michal Tkáč, Jakub Sieber, Lara Kuhlmann, Matthias Brueggenolte, Alexandru Rinciog, Michael Henke, Artur M. Schweidtmann, Qinghe Gao, Maximilian F. Theisen, Radwa El Shawi

    Abstract: Machine Learning (ML) techniques are encountered nowadays across disciplines, from social sciences, through natural sciences to engineering. The broad application of ML and the accelerated pace of its evolution lead to an increasing need for dedicated teaching concepts aimed at making the application of this technology more reliable and responsible. However, teaching ML is a daunting task. Aside f…

    Submitted 29 January, 2024; originally announced January 2024.

  4. arXiv:2312.15282  [pdf, other]

    stat.ML cs.LG

    Causal Forecasting for Pricing

    Authors: Douglas Schultz, Johannes Stephan, Julian Sieber, Trudie Yeh, Manuel Kunz, Patrick Doupe, Tim Januschowski

    Abstract: This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price as an input variable to demand is crucial because retailers aim to set prices in a (profit) optimal manner in a downstream decision-making problem. Our methods bring together the Double Machine Learning methodology for causal inference and state-of-the-art transforme…

    Submitted 30 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  5. arXiv:2305.14406  [pdf, other]

    cs.LG cs.AI

    Deep Learning based Forecasting: a case study from the online fashion industry

    Authors: Manuel Kunz, Stefan Birr, Mones Raslan, Lei Ma, Zhen Li, Adele Gouttes, Mateusz Koren, Tofigh Naghibi, Johannes Stephan, Mariia Bulycheva, Matthias Grzeschik, Armin Kekić, Michael Narodovitch, Kashif Rasul, Julian Sieber, Tim Januschowski

    Abstract: Demand forecasting in the online fashion industry is particularly amenable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turnover in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed invento…

    Submitted 23 May, 2023; originally announced May 2023.

  6. Chronos and CRS: Design of a miniature car-like robot and a software framework for single and multi-agent robotics and control

    Authors: Andrea Carron, Sabrina Bodmer, Lukas Vogel, René Zurbrügg, David Helm, Rahel Rickenbach, Simon Muntwiler, Jerome Sieber, Melanie N. Zeilinger

    Abstract: From both an educational and research point of view, experiments on hardware are a key aspect of robotics and control. In the last decade, many open-source hardware and software frameworks for wheeled robots have been presented, mainly in the form of unicycles and car-like robots, with the goal of making robotics accessible to a wider audience and to support control systems development. Unicycles…

    Submitted 17 November, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

  7. arXiv:2208.09033  [pdf, ps, other]

    stat.ML cs.LG

    Quantitative Universal Approximation Bounds for Deep Belief Networks

    Authors: Julian Sieber, Johann Gehringer

    Abstract: We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q\in[1,\infty]$ ($q=\infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative b…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: 17 pages

  8. arXiv:2103.06398  [pdf, other]

    cs.LG

    Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters

    Authors: Trevor A. McInroe, Michael Spurrier, Jennifer Sieber, Stephen Conneely

    Abstract: We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it's possible to know a priori if a state representation will lend itself to fast learning. RL agents in high-dimensional states have two main learning burdens: (1) to learn an action-selection policy and (2) to learn to discern between useful and non-useful informati…

    Submitted 10 March, 2021; originally announced March 2021.

    Comments: 11 pages, 29 figures

  9. arXiv:2103.04709  [pdf, other]

    cs.RO

    Design, Optimal Guidance and Control of a Low-cost Re-usable Electric Model Rocket

    Authors: Lukas Spannagl, Elias Hampp, Andrea Carron, Jerome Sieber, Carlo Alberto Pascucci, Aldo U. Zgraggen, Alexander Domahidi, Melanie N. Zeilinger

    Abstract: In the last decade, autonomous vertical take-off and landing (VTOL) vehicles have become increasingly important as they lower mission costs thanks to their re-usability. However, their development is complex, rendering even the basic experimental validation of the required advanced guidance and control (G&C) algorithms prohibitively time-consuming and costly. In this paper, we present the design…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 8 pages