Search | arXiv e-print repository

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

Authors: Jerome Sieber, Carmen Amo Alonso, Alexandre Didier, Melanie N. Zeilinger, Antonio Orvieto

Abstract: Softmax attention is the principle backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more effi… ▽ More Softmax attention is the principle backbone of foundation models for various artificial intelligence applications, yet its quadratic complexity in sequence length can limit its inference throughput in long-context settings. To address this challenge, alternative architectures such as linear attention, State Space Models (SSMs), and Recurrent Neural Networks (RNNs) have been considered as more efficient alternatives. While connections between these approaches exist, such models are commonly developed in isolation and there is a lack of theoretical understanding of the shared principles underpinning these architectures and their subtle differences, greatly influencing performance and scalability. In this paper, we introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. Our framework facilitates rigorous comparisons, providing new insights on the distinctive characteristics of each model class. For instance, we compare linear attention and selective SSMs, detailing their differences and conditions under which both are equivalent. We also provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. Additionally, we substantiate these new insights with empirical validations and mathematical arguments. This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models. △ Less

Submitted 3 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:2403.16899 [pdf, other]

State Space Models as Foundation Models: A Control Theoretic Overview

Authors: Carmen Amo Alonso, Jerome Sieber, Melanie N. Zeilinger

Abstract: In recent years, there has been a growing interest in integrating linear state-space models (SSM) in deep neural network architectures of foundation models. This is exemplified by the recent success of Mamba, showing better performance than the state-of-the-art Transformer architectures in language tasks. Foundation models, like e.g. GPT-4, aim to encode sequential data into a latent space in orde… ▽ More In recent years, there has been a growing interest in integrating linear state-space models (SSM) in deep neural network architectures of foundation models. This is exemplified by the recent success of Mamba, showing better performance than the state-of-the-art Transformer architectures in language tasks. Foundation models, like e.g. GPT-4, aim to encode sequential data into a latent space in order to learn a compressed representation of the data. The same goal has been pursued by control theorists using SSMs to efficiently model dynamical systems. Therefore, SSMs can be naturally connected to deep sequence modeling, offering the opportunity to create synergies between the corresponding research areas. This paper is intended as a gentle introduction to SSM-based architectures for control theorists and summarizes the latest research developments. It provides a systematic review of the most successful SSM proposals and highlights their main features from a control theoretic perspective. Additionally, we present a comparative analysis of these models, evaluating their performance on a standardized benchmark designed for assessing a model's efficiency at learning long sequences. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2401.16291 [pdf, other]

MachineLearnAthon: An Action-Oriented Machine Learning Didactic Concept

Authors: Michal Tkáč, Jakub Sieber, Lara Kuhlmann, Matthias Brueggenolte, Alexandru Rinciog, Michael Henke, Artur M. Schweidtmann, Qinghe Gao, Maximilian F. Theisen, Radwa El Shawi

Abstract: Machine Learning (ML) techniques are encountered nowadays across disciplines, from social sciences, through natural sciences to engineering. The broad application of ML and the accelerated pace of its evolution lead to an increasing need for dedicated teaching concepts aimed at making the application of this technology more reliable and responsible. However, teaching ML is a daunting task. Aside f… ▽ More Machine Learning (ML) techniques are encountered nowadays across disciplines, from social sciences, through natural sciences to engineering. The broad application of ML and the accelerated pace of its evolution lead to an increasing need for dedicated teaching concepts aimed at making the application of this technology more reliable and responsible. However, teaching ML is a daunting task. Aside from the methodological complexity of ML algorithms, both with respect to theory and implementation, the interdisciplinary and empirical nature of the field need to be taken into consideration. This paper introduces the MachineLearnAthon format, an innovative didactic concept designed to be inclusive for students of different disciplines with heterogeneous levels of mathematics, programming and domain expertise. At the heart of the concept lie ML challenges, which make use of industrial data sets to solve real-world problems. These cover the entire ML pipeline, promoting data literacy and practical skills, from data preparation, through deployment, to evaluation. △ Less

Submitted 29 January, 2024; originally announced January 2024.

arXiv:2312.15282 [pdf, other]

Causal Forecasting for Pricing

Authors: Douglas Schultz, Johannes Stephan, Julian Sieber, Trudie Yeh, Manuel Kunz, Patrick Doupe, Tim Januschowski

Abstract: This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price as an input variable to demand is crucial because retailers aim to set prices in a (profit) optimal manner in a downstream decision making problem. Our methods bring together the Double Machine Learning methodology for causal inference and state-of-the-art transforme… ▽ More This paper proposes a novel method for demand forecasting in a pricing context. Here, modeling the causal relationship between price as an input variable to demand is crucial because retailers aim to set prices in a (profit) optimal manner in a downstream decision making problem. Our methods bring together the Double Machine Learning methodology for causal inference and state-of-the-art transformer-based forecasting models. In extensive empirical experiments, we show on the one hand that our method estimates the causal effect better in a fully controlled setting via synthetic, yet realistic data. On the other hand, we demonstrate on real-world data that our method outperforms forecasting methods in off-policy settings (i.e., when there's a change in the pricing policy) while only slightly trailing in the on-policy setting. △ Less

Submitted 30 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

arXiv:2305.14406 [pdf, other]

Deep Learning based Forecasting: a case study from the online fashion industry

Authors: Manuel Kunz, Stefan Birr, Mones Raslan, Lei Ma, Zhen Li, Adele Gouttes, Mateusz Koren, Tofigh Naghibi, Johannes Stephan, Mariia Bulycheva, Matthias Grzeschik, Armin Kekić, Michael Narodovitch, Kashif Rasul, Julian Sieber, Tim Januschowski

Abstract: Demand forecasting in the online fashion industry is particularly amendable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed invento… ▽ More Demand forecasting in the online fashion industry is particularly amendable to global, data-driven forecasting models because of the industry's set of particular challenges. These include the volume of data, the irregularity, the high amount of turn-over in the catalog and the fixed inventory assumption. While standard deep learning forecasting approaches cater for many of these, the fixed inventory assumption requires a special treatment via controlling the relationship between price and demand closely. In this case study, we describe the data and our modelling approach for this forecasting problem in detail and present empirical results that highlight the effectiveness of our approach. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2209.12048 [pdf, other]

doi 10.1109/ICRA48891.2023.10161434

Chronos and CRS: Design of a miniature car-like robot and a software framework for single and multi-agent robotics and control

Authors: Andrea Carron, Sabrina Bodmer, Lukas Vogel, René Zurbrügg, David Helm, Rahel Rickenbach, Simon Muntwiler, Jerome Sieber, Melanie N. Zeilinger

Abstract: From both an educational and research point of view, experiments on hardware are a key aspect of robotics and control. In the last decade, many open-source hardware and software frameworks for wheeled robots have been presented, mainly in the form of unicycles and car-like robots, with the goal of making robotics accessible to a wider audience and to support control systems development. Unicycles… ▽ More From both an educational and research point of view, experiments on hardware are a key aspect of robotics and control. In the last decade, many open-source hardware and software frameworks for wheeled robots have been presented, mainly in the form of unicycles and car-like robots, with the goal of making robotics accessible to a wider audience and to support control systems development. Unicycles are usually small and inexpensive, and therefore facilitate experiments in a larger fleet, but they are not suited for high-speed motion. Car-like robots are more agile, but they are usually larger and more expensive, thus requiring more resources in terms of space and money. In order to bridge this gap, we present Chronos, a new car-like 1/28th scale robot with customized open-source electronics, and CRS, an open-source software framework for control and robotics. The CRS software framework includes the implementation of various state-of-the-art algorithms for control, estimation, and multi-agent coordination. With this work, we aim to provide easier access to hardware and reduce the engineering time needed to start new educational and research projects. △ Less

Submitted 17 November, 2023; v1 submitted 24 September, 2022; originally announced September 2022.

arXiv:2208.09033 [pdf, ps, other]

Quantitative Universal Approximation Bounds for Deep Belief Networks

Authors: Julian Sieber, Johann Gehringer

Abstract: We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q\in[1,\infty]$ ($q=\infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative b… ▽ More We show that deep belief networks with binary hidden units can approximate any multivariate probability density under very mild integrability requirements on the parental density of the visible nodes. The approximation is measured in the $L^q$-norm for $q\in[1,\infty]$ ($q=\infty$ corresponding to the supremum norm) and in Kullback-Leibler divergence. Furthermore, we establish sharp quantitative bounds on the approximation error in terms of the number of hidden units. △ Less

Submitted 18 August, 2022; originally announced August 2022.

Comments: 17 pages

arXiv:2103.06398 [pdf, other]

Analyzing the Hidden Activations of Deep Policy Networks: Why Representation Matters

Authors: Trevor A. McInroe, Michael Spurrier, Jennifer Sieber, Stephen Conneely

Abstract: We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it's possible to know a priori if a state representation will lend itself to fast learning. RL agents in high-dimensional states have two main learning burdens: (1) to learn an action-selection policy and (2) to learn to discern between useful and non-useful informati… ▽ More We analyze the hidden activations of neural network policies of deep reinforcement learning (RL) agents and show, empirically, that it's possible to know a priori if a state representation will lend itself to fast learning. RL agents in high-dimensional states have two main learning burdens: (1) to learn an action-selection policy and (2) to learn to discern between useful and non-useful information in a given state. By learning a latent representation of these high-dimensional states with an auxiliary model, the latter burden is effectively removed, thereby leading to accelerated training progress. We examine this phenomenon across tasks in the PyBullet Kuka environment, where an agent must learn to control a robotic gripper to pick up an object. Our analysis reveals how neural network policies learn to organize their internal representation of the state space throughout training. The results from this analysis provide three main insights into how deep RL agents learn. First, a well-organized internal representation within the policy network is a prerequisite to learning good action-selection. Second, a poor initial representation can cause an unrecoverable collapse within a policy network. Third, a good initial representation allows an agent's policy network to organize its internal representation even before any training begins. △ Less

Submitted 10 March, 2021; originally announced March 2021.

Comments: 11 pages, 29 figures

arXiv:2103.04709 [pdf, other]

Design, Optimal Guidance and Control of a Low-cost Re-usable Electric Model Rocket

Authors: Lukas Spannagl, Elias Hampp, Andrea Carron, Jerome Sieber, Carlo Alberto Pascucci, Aldo U. Zgraggen, Alexander Domahidi, Melanie N. Zeilinger

Abstract: In the last decade, autonomous vertical take-off and landing (VTOL) vehicles have become increasingly important as they lower mission costs thanks to their re-usability. However, their development is complex, rendering even the basic experimental validation of the required advanced guidance and control (G & C) algorithms prohibitively time-consuming and costly. In this paper, we present the design… ▽ More In the last decade, autonomous vertical take-off and landing (VTOL) vehicles have become increasingly important as they lower mission costs thanks to their re-usability. However, their development is complex, rendering even the basic experimental validation of the required advanced guidance and control (G & C) algorithms prohibitively time-consuming and costly. In this paper, we present the design of an inexpensive small-scale VTOL platform that can be built from off-the-shelf components for less than 1000 USD. The vehicle design mimics the first stage of a reusable launcher, making it a perfect test-bed for G & C algorithms. To control the vehicle during ascent and descent, we propose a real-time optimization-based G & C algorithm. The key features are a real-time minimum fuel and free-final-time optimal guidance combined with an offset-free tracking model predictive position controller. The vehicle hardware design and the G & C algorithm are experimentally validated both indoors and outdoor, showing reliable operation in a fully autonomous fashion with all computations done on-board and in real-time. △ Less

Submitted 8 March, 2021; originally announced March 2021.

Comments: 8 pages

Showing 1–9 of 9 results for author: Sieber, J