Search | arXiv e-print repository

Missed Causes and Ambiguous Effects: Counterfactuals Pose Challenges for Interpreting Neural Networks

Abstract: Interpretability research takes counterfactual theories of causality for granted. Most causal methods rely on counterfactual interventions to inputs or the activations of particular model components, followed by observations of the change in models' output logits or behaviors. While this yields more faithful evidence than correlational methods, counterfactuals nonetheless have key problems that bi… ▽ More Interpretability research takes counterfactual theories of causality for granted. Most causal methods rely on counterfactual interventions to inputs or the activations of particular model components, followed by observations of the change in models' output logits or behaviors. While this yields more faithful evidence than correlational methods, counterfactuals nonetheless have key problems that bias our findings in specific and predictable ways. Specifically, (i) counterfactual theories do not effectively capture multiple independently sufficient causes of the same effect, which leads us to miss certain causes entirely; and (ii) counterfactual dependencies in neural networks are generally not transitive, which complicates methods for extracting and interpreting causal graphs from neural networks. We discuss the implications of these challenges for interpretability researchers and propose concrete suggestions for future work. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.03353 [pdf, ps, other]

doi 10.1115/DETC2013-12151

Is there an optimal choice of configuration space for Lie group integration schemes applied to constrained MBS?

Authors: Andreas Mueller, Zdravko Terze

Abstract: Recently various numerical integration schemes have been proposed for numerically simulating the dynamics of constrained multibody systems (MBS) operating. These integration schemes operate directly on the MBS configuration space considered as a Lie group. For discrete spatial mechanical systems there are two Lie group that can be used as configuration space: $SE\left( 3\right) $ and… ▽ More Recently various numerical integration schemes have been proposed for numerically simulating the dynamics of constrained multibody systems (MBS) operating. These integration schemes operate directly on the MBS configuration space considered as a Lie group. For discrete spatial mechanical systems there are two Lie group that can be used as configuration space: $SE\left( 3\right) $ and $SO\left( 3\right) \times \mathbb{R}^{3}$. Since the performance of the numerical integration scheme clearly depends on the underlying configuration space it is important to analyze the effect of using either variant. For constrained MBS a crucial aspect is the constraint satisfaction. In this paper the constraint violation observed for the two variants are investigated. It is concluded that the $SE\left( 3\right) $ formulation outperforms the $SO\left( 3\right) \times \mathbb{R}^{3}$ formulation if the absolute motions of the rigid bodies, as part of a constrained MBS, belong to a motion subgroup. In all other cases both formulations are equivalent. In the latter cases the $SO\left( 3\right) \times \mathbb{R}^{3}$ formulation should be used since the $SE\left( 3\right) $ formulation is numerically more complex, however. △ Less

Submitted 18 June, 2024; originally announced July 2024.

Journal ref: Proceedings of the ASME 2013 International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, IDETC/CIE 2013, August 12-15, 2013, Portland, OR, USA

arXiv:2407.02928 [pdf, other]

A new heuristic approach for contextuality degree estimates and its four- to six-qubit portrayals

Authors: Axel Muller, Metod Saniga, Alain Giorgetti, Frédéric Holweck, Colm Kelleher

Abstract: We introduce and describe a new heuristic method for finding an upper bound on the degree of contextuality and the corresponding unsatisfied part of a quantum contextual configuration with three-element contexts (i.e., lines) located in a multi-qubit symplectic polar space of order two. While the previously used method based on a SAT solver was limited to three qubits, this new method is much fast… ▽ More We introduce and describe a new heuristic method for finding an upper bound on the degree of contextuality and the corresponding unsatisfied part of a quantum contextual configuration with three-element contexts (i.e., lines) located in a multi-qubit symplectic polar space of order two. While the previously used method based on a SAT solver was limited to three qubits, this new method is much faster and more versatile, enabling us to also handle four- to six-qubit cases. The four-qubit unsatisfied configurations we found are quite remarkable. That of an elliptic quadric features 315 lines and has in its core three copies of the split Cayley hexagon of order two having a Heawood-graph-underpinned geometry in common. That of a hyperbolic quadric also has 315 lines but, as a point-line incidence structure, is isomorphic to the dual $\mathcal{DW}(5,2)$ of $\mathcal{W}(5,2)$. Finally, an unsatisfied configuration with 1575 lines associated with all the lines/contexts of the four-qubit space contains a distinguished $\mathcal{DW}(5,2)$ centered on a point-plane incidence graph of PG$(3,2)$. The corresponding configurations found in the five-qubit space exhibit a considerably higher degree of complexity, except for a hyperbolic quadric, whose 6975 unsatisfied contexts are compactified around the point-hyperplane incidence graph of PG$(4,2)$. The most remarkable unsatisfied patterns discovered in the six-qubit space are a couple of disjoint split Cayley hexagons (for the full space) and a subgeometry underpinned by the complete bipartite graph $K_{7,7}$ (for a hyperbolic quadric). △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 35 pages, 14 figures

MSC Class: 81P13 ACM Class: J.2

arXiv:2406.12571 [pdf, ps, other]

doi 10.1016/j.mechmachtheory.2014.06.014

The significance of the configuration space Lie group for the constraint satisfaction in numerical time integration of multibody systems

Authors: Andreas Mueller, Zdravko Terze

Abstract: The dynamics simulation of multibody systems (MBS) using spatial velocities (non-holonomic velocities) requires time integration of the dynamics equations together with the kinematic reconstruction equations (relating time derivatives of configuration variables to rigid body velocities). The latter are specific to the geometry of the rigid body motion underlying a particular formulation, and thus… ▽ More The dynamics simulation of multibody systems (MBS) using spatial velocities (non-holonomic velocities) requires time integration of the dynamics equations together with the kinematic reconstruction equations (relating time derivatives of configuration variables to rigid body velocities). The latter are specific to the geometry of the rigid body motion underlying a particular formulation, and thus to the used configuration space (c-space). The proper c-space of a rigid body is the Lie group SE(3), and the geometry is that of the screw motions. The rigid bodies within a MBS are further subjected to geometric constraints, often due to lower kinematic pairs that define SE(3) subgroups. Traditionally, however, in MBS dynamics the translations and rotations are parameterized independently, which implies the use of the direct product group $SO\left( 3\right) \times {\Bbb R}^{3}$ as rigid body c-space, although this does not account for rigid body motions. Hence, its appropriateness was recently put into perspective. In this paper the significance of the c-space for the constraint satisfaction in numerical time stepping schemes is analyzed for holonomicaly constrained MBS modeled with the 'absolute coordinate' approach, i.e. using the Newton-Euler equations for the individual bodies subjected to geometric constraints. It is shown that the geometric constraints a body is subjected to are exactly satisfied if they constrain the motion to a subgroup of its c-space. Since only the $SE\left( 3\right) $ subgroups have a practical significance it is regarded as the appropriate c-space for the constrained rigid body. Consequently the constraints imposed by lower pair joints are exactly satisfied if the joint connects a body to the ground. For a general MBS, where the motions are not constrained to a subgroup, the SE(3) and $SO\left( 3\right) \times {\Bbb R}^{3}$ yield the same order of accuracy. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Journal ref: The significance of the configuration space Lie group for the constraint satisfaction in numerical time integration of multibody systems, Mechanism and Machine Theory, Vol. 82, 2014, pp. 173-202

arXiv:2406.03348 [pdf, other]

Position: A Call to Action for a Human-Centered AutoML Paradigm

Authors: Marius Lindauer, Florian Karl, Anne Klier, Julia Moosbauer, Alexander Tornede, Andreas Mueller, Frank Hutter, Matthias Feurer, Bernd Bischl

Abstract: Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive p… ▽ More Automated machine learning (AutoML) was formed around the fundamental objectives of automatically and efficiently configuring machine learning (ML) workflows, aiding the research of new ML algorithms, and contributing to the democratization of ML by making it accessible to a broader audience. Over the past decade, commendable achievements in AutoML have primarily focused on optimizing predictive performance. This focused progress, while substantial, raises questions about how well AutoML has met its broader, original goals. In this position paper, we argue that a key to unlocking AutoML's full potential lies in addressing the currently underexplored aspect of user interaction with AutoML systems, including their diverse roles, expectations, and expertise. We envision a more human-centered approach in future AutoML research, promoting the collaborative design of ML systems that tightly integrates the complementary strengths of human expertise and AutoML methodologies. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02294 [pdf, other]

Smaller Batches, Bigger Gains? Investigating the Impact of Batch Sizes on Reinforcement Learning Based Real-World Production Scheduling

Authors: Arthur Müller, Felix Grumbach, Matthia Sabatelli

Abstract: Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages, linked by a central buffer. The RL agent was trained to sequence equallysized product batches to minimize setup efforts an… ▽ More Production scheduling is an essential task in manufacturing, with Reinforcement Learning (RL) emerging as a key solution. In a previous work, RL was utilized to solve an extended permutation flow shop scheduling problem (PFSSP) for a real-world production line with two stages, linked by a central buffer. The RL agent was trained to sequence equallysized product batches to minimize setup efforts and idle times. However, the substantial impact caused by varying the size of these product batches has not yet been explored. In this follow-up study, we investigate the effects of varying batch sizes, exploring both the quality of solutions and the training dynamics of the RL agent. The results demonstrate that it is possible to methodically identify reasonable boundaries for the batch size. These boundaries are determined on one side by the increasing sample complexity associated with smaller batch sizes, and on the other side by the decreasing flexibility of the agent when dealing with larger batch sizes. This provides the practitioner the ability to make an informed decision regarding the selection of an appropriate batch size. Moreover, we introduce and investigate two new curriculum learning strategies to enable the training with small batch sizes. The findings of this work offer the potential for application in several industrial use cases with comparable scheduling problems. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: This paper was accepted at the ETFA 2024 conference

arXiv:2405.01813 [pdf, other]

doi 10.1145/3555041.3589674

Towards Building Autonomous Data Services on Azure

Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli , et al. (13 additional authors not shown)

Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to… ▽ More Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to gain the most value from them. For cloud providers, managing every aspect of an ever-increasing set of data services, while meeting customer SLAs and minimizing operational cost is becoming more challenging. Cloud technology enables the collection of significant amounts of workload traces and system telemetry. With the progress in data science (DS) and machine learning (ML), it is feasible and desirable to utilize a data-driven, ML-based approach to automate various aspects of data services, resulting in the creation of autonomous data services. This paper presents our perspectives and insights on creating autonomous data services on Azure. It also covers the future endeavors we plan to undertake and unresolved issues that still need attention. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

arXiv:2404.06214 [pdf, other]

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors: Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

Abstract: After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-… ▽ More After last year's successful BabyLM Challenge, the competition will be hosted again in 2024/2025. The overarching goals of the challenge remain the same; however, some of the competition rules will be different. The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques. Second, we are relaxing the rules around pretraining data, and will now allow participants to construct their own datasets provided they stay within the 100M-word or 10M-word budget. Third, we introduce a multimodal vision-and-language track, and will release a corpus of 50% text-only and 50% image-text multimodal data as a starting point for LM model training. The purpose of this CfP is to provide rules for this year's challenge, explain these rule changes and their rationale in greater detail, give a timeline of this year's competition, and provide answers to frequently asked questions from last year's challenge. △ Less

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2403.19647 [pdf, other]

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

Authors: Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller

Abstract: We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse featur… ▽ More We introduce methods for discovering and applying sparse feature circuits. These are causally implicated subnetworks of human-interpretable features for explaining language model behaviors. Circuits identified in prior work consist of polysemantic and difficult-to-interpret units like attention heads or neurons, rendering them unsuitable for many downstream applications. In contrast, sparse feature circuits enable detailed understanding of unanticipated mechanisms. Because they are based on fine-grained units, sparse feature circuits are useful for downstream tasks: We introduce SHIFT, where we improve the generalization of a classifier by ablating features that a human judges to be task-irrelevant. Finally, we demonstrate an entirely unsupervised and scalable interpretability pipeline by discovering thousands of sparse feature circuits for automatically discovered model behaviors. △ Less

Submitted 31 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: Code and data at https://github.com/saprmarks/feature-circuits. Demonstration at https://feature-circuits.xyz

arXiv:2403.18587 [pdf, other]

The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision

Authors: Andreas Müller, Erwin Quiring

Abstract: Resource efficiency plays an important role for machine learning nowadays. The energy and decision latency are two critical aspects to ensure a sustainable and practical application. Unfortunately, the energy consumption and decision latency are not robust against adversaries. Researchers have recently demonstrated that attackers can compute and submit so-called sponge examples at inference time t… ▽ More Resource efficiency plays an important role for machine learning nowadays. The energy and decision latency are two critical aspects to ensure a sustainable and practical application. Unfortunately, the energy consumption and decision latency are not robust against adversaries. Researchers have recently demonstrated that attackers can compute and submit so-called sponge examples at inference time to increase the energy consumption and decision latency of neural networks. In computer vision, the proposed strategy crafts inputs with less activation sparsity which could otherwise be used to accelerate the computation. In this paper, we analyze the mechanism how these energy-latency attacks reduce activation sparsity. In particular, we find that input uniformity is a key enabler. A uniform image, that is, an image with mostly flat, uniformly colored surfaces, triggers more activations due to a specific interplay of convolution, batch normalization, and ReLU activation. Based on these insights, we propose two new simple, yet effective strategies for crafting sponge examples: sampling images from a probability distribution and identifying dense, yet inconspicuous inputs in natural datasets. We empirically examine our findings in a comprehensive evaluation with multiple image classification models and show that our attack achieves the same sparsity effect as prior sponge-example methods, but at a fraction of computation effort. We also show that our sponge examples transfer between different neural networks. Finally, we discuss applications of our findings for the good by improving efficiency by increasing sparsity. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at the DLSP 2024

arXiv:2403.09988 [pdf, other]

Interactive Distance Field Mapping and Planning to Enable Human-Robot Collaboration

Authors: Usama Ali, Lan Wu, Adrian Mueller, Fouad Sukkar, Tobias Kaupp, Teresa Vidal-Calleja

Abstract: Human-robot collaborative applications require scene representations that are kept up-to-date and facilitate safe motions in dynamic scenes. In this letter, we present an interactive distance field mapping and planning (IDMP) framework that handles dynamic objects and collision avoidance through an efficient representation. We define \textit{interactive} mapping and planning as the process of crea… ▽ More Human-robot collaborative applications require scene representations that are kept up-to-date and facilitate safe motions in dynamic scenes. In this letter, we present an interactive distance field mapping and planning (IDMP) framework that handles dynamic objects and collision avoidance through an efficient representation. We define \textit{interactive} mapping and planning as the process of creating and updating the representation of the scene online while simultaneously planning and adapting the robot's actions based on that representation. Given depth sensor data, our framework builds a continuous field that allows to query the distance and gradient to the closest obstacle at any required position in 3D space. The key aspect of this work is an efficient Gaussian Process field that performs incremental updates and implicitly handles dynamic objects with a simple and elegant formulation based on a temporary latent model. In terms of mapping, IDMP is able to fuse point cloud data from single and multiple sensors, query the free space at any spatial resolution, and deal with moving objects without semantics. In terms of planning, IDMP allows seamless integration with gradient-based motion planners facilitating fast re-planning for collision-free navigation. The framework is evaluated on both real and synthetic datasets. A comparison with similar state-of-the-art frameworks shows superior performance when handling dynamic objects and comparable or better performance in the accuracy of the computed distance and gradient field. Finally, we show how the framework can be used for fast motion planning in the presence of moving objects. An accompanying video, code, and datasets are made publicly available https://uts-ri.github.io/IDMP. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2402.15776 [pdf, other]

Truly No-Regret Learning in Constrained MDPs

Authors: Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

Abstract: Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in a… ▽ More Constrained Markov decision processes (CMDPs) are a common way to model safety constraints in reinforcement learning. State-of-the-art methods for efficiently solving CMDPs are based on primal-dual algorithms. For these algorithms, all currently known regret bounds allow for error cancellations -- one can compensate for a constraint violation in one round with a strict constraint satisfaction in another. This makes the online learning process unsafe since it only guarantees safety for the final (mixture) policy but not during learning. As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations. In this paper, we give the first affirmative answer. We first generalize a result on last-iterate convergence of regularized primal-dual schemes to CMDPs with multiple constraints. Building upon this insight, we propose a model-based primal-dual algorithm to learn in an unknown CMDP. We prove that our algorithm achieves sublinear regret without error cancellations. △ Less

Submitted 18 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2401.01127 [pdf, other]

Wireless 6G Connectivity for Massive Number of Devices and Critical Services

Authors: Anders E. Kalør, Giuseppe Durisi, Sinem Coleri, Stefan Parkvall, Wei Yu, Andreas Mueller, Petar Popovski

Abstract: Compared to the generations up to 4G, whose main focus was on broadband and coverage aspects, 5G has expanded the scope of wireless cellular systems towards embracing two new types of connectivity: massive machine-type communication (mMTC) and ultra-reliable low-latency communications (URLLC). This paper will discuss the possible evolution of these two types of connectivity within the umbrella of… ▽ More Compared to the generations up to 4G, whose main focus was on broadband and coverage aspects, 5G has expanded the scope of wireless cellular systems towards embracing two new types of connectivity: massive machine-type communication (mMTC) and ultra-reliable low-latency communications (URLLC). This paper will discuss the possible evolution of these two types of connectivity within the umbrella of 6G wireless systems. The paper consists of three parts. The first part deals with the connectivity for a massive number of devices. While mMTC research in 5G was predominantly focused on the problem of uncoordinated access in the uplink for a large number of devices, the traffic patterns in 6G may become more symmetric, leading to closed-loop massive connectivity. One of the drivers for this is distributed learning/inference. The second part of the paper will discuss the evolution of wireless connectivity for critical services. While latency and reliability are tightly coupled in 5G, 6G will support a variety of safety critical control applications with different types of timing requirements, as evidenced by the emergence of metrics related to information freshness and information value. Additionally, ensuring ultra-high reliability for safety critical control applications requires modeling and estimation of the tail statistics of the wireless channel, queue length, and delay. The fulfillment of these stringent requirements calls for the development of novel AI-based techniques, incorporating optimization theory, explainable AI, generative AI and digital twins. The third part will analyze the coexistence of massive connectivity and critical services. We will consider scenarios in which a massive number of devices need to support traffic patterns of mixed criticality. This will be followed by a discussion about the management of wireless resources shared by services with different criticality. △ Less

Submitted 1 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

Comments: 19 pages, 8 figures

arXiv:2312.08598 [pdf, other]

MotherNet: A Foundational Hypernetwork for Tabular Classification

Authors: Andreas Müller, Carlo Curino, Raghu Ramakrishnan

Abstract: The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos) with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification for numerical data. In this paper, we go one step further and propose a hypernetwork architecture… ▽ More The advent of Foundation Models is transforming machine learning across many modalities (e.g., language, images, videos) with prompt engineering replacing training in many settings. Recent work on tabular data (e.g., TabPFN) hints at a similar opportunity to build Foundation Models for classification for numerical data. In this paper, we go one step further and propose a hypernetwork architecture that we call MotherNet, trained on millions of classification tasks, that, once prompted with a never-seen-before training set generates the weights of a trained ``child'' neural-network. Like other Foundation Models, MotherNet replaces training on specific datasets with in-context learning through a single forward pass. In contrast to existing hypernetworks that were either task-specific or trained for relatively constraint multi-task settings, MotherNet is trained to generate networks to perform multiclass classification on arbitrary tabular datasets without any dataset specific gradient descent. The child network generated by MotherNet using in-context learning outperforms neural networks trained using gradient descent on small datasets, and is competitive with predictions by TabPFN and standard ML methods like Gradient Boosting. Unlike a direct application of transformer models like TabPFN, MotherNet generated networks are highly efficient at inference time. This methodology opens up a new approach to building predictive models on tabular data that is both efficient and robust, without any dataset-specific training. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Comments: 17 pages, 13 figures

ACM Class: I.2.6

arXiv:2311.11046 [pdf]

DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features

Authors: Vladimir Belov, Tracy Erwin-Grabner, Ling-Li Zeng, Christopher R. K. Ching, Andre Aleman, Alyssa R. Amod, Zeynep Basgoze, Francesco Benedetti, Bianca Besteher, Katharina Brosch, Robin Bülow, Romain Colle, Colm G. Connolly, Emmanuelle Corruble, Baptiste Couvy-Duchesne, Kathryn Cullen, Udo Dannlowski, Christopher G. Davey, Annemiek Dols, Jan Ernsting, Jennifer W. Evans, Lukas Fisch, Paola Fuentes-Claramonte, Ali Saffet Gonul, Ian H. Gotlib , et al. (63 additional authors not shown)

Abstract: Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, h… ▽ More Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing an extensive sample of people with MDD (N=2,772) and HC (N=4,240), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of features and classifiers is unfeasible. △ Less

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.07811 [pdf, other]

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Authors: Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

Abstract: In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the underlying structure of the task defined by the context, or do they rely on superficial heuristics that only generalize to identically distributed examples? We… ▽ More In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the underlying structure of the task defined by the context, or do they rely on superficial heuristics that only generalize to identically distributed examples? We address this question using transformations tasks and an NLI task that assess sensitivity to syntax - a requirement for robust language understanding. We further investigate whether out-of-distribution generalization can be improved via chain-of-thought prompting, where the model is provided with a sequence of intermediate computation steps that illustrate how the task ought to be performed. In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs. The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size; in particular, models pre-trained on code generalize better, and benefit more from chain-of-thought prompting. △ Less

Submitted 10 April, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024

arXiv:2311.00889 [pdf, other]

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

Authors: Mohammed Latif Siddiq, Joanna C. S. Santos, Sajith Devareddy, Anna Muller

Abstract: With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to… ▽ More With the growing popularity of Large Language Models (LLMs) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to the insecure code generation. First, existing datasets used to evaluate LLMs do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Therefore, in this paper, we described SALLM, a framework to benchmark LLMs' abilities to generate secure code systematically. This framework has three major components: a novel dataset of security-centric Python prompts, configurable assessment techniques to evaluate the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation. △ Less

Submitted 3 June, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: Under review; 12 Pages

arXiv:2310.15213 [pdf, other]

Function Vectors in Large Language Models

Authors: Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

Abstract: We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are… ▽ More We report the presence of a simple neural mechanism that represents an input-output function as a vector within autoregressive transformer language models (LMs). Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV). FVs are robust to changes in context, i.e., they trigger execution of the task on inputs such as zero-shot and natural text settings that do not resemble the ICL contexts from which they are collected. We test FVs across a range of tasks, models, and layers and find strong causal effects across settings in middle layers. We investigate the internal structure of FVs and find while that they often contain information that encodes the output space of the function, this information alone is not sufficient to reconstruct an FV. Finally, we test semantic vector composition in FVs, and find that to some extent they can be summed to create vectors that trigger new complex tasks. Our findings show that compact, causal internal vector representations of function abstractions can be explicitly extracted from LLMs. Our code and data are available at https://functions.baulab.info. △ Less

Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: ICLR 2024. 52 pages, 30 figures, 23 tables. Code and data at https://functions.baulab.info

arXiv:2310.15085 [pdf, other]

doi 10.1145/3627106.3627134

On the Detection of Image-Scaling Attacks in Machine Learning

Authors: Erwin Quiring, Andreas Müller, Konrad Rieck

Abstract: Image scaling is an integral part of machine learning and computer vision systems. Unfortunately, this preprocessing step is vulnerable to so-called image-scaling attacks where an attacker makes unnoticeable changes to an image so that it becomes a new image after scaling. This opens up new ways for attackers to control the prediction or to improve poisoning and backdoor attacks. While effective t… ▽ More Image scaling is an integral part of machine learning and computer vision systems. Unfortunately, this preprocessing step is vulnerable to so-called image-scaling attacks where an attacker makes unnoticeable changes to an image so that it becomes a new image after scaling. This opens up new ways for attackers to control the prediction or to improve poisoning and backdoor attacks. While effective techniques exist to prevent scaling attacks, their detection has not been rigorously studied yet. Consequently, it is currently not possible to reliably spot these attacks in practice. This paper presents the first in-depth systematization and analysis of detection methods for image-scaling attacks. We identify two general detection paradigms and derive novel methods from them that are simple in design yet significantly outperform previous work. We demonstrate the efficacy of these methods in a comprehensive evaluation with all major learning platforms and scaling algorithms. First, we show that image-scaling attacks modifying the entire scaled image can be reliably detected even under an adaptive adversary. Second, we find that our methods provide strong detection performance even if only minor parts of the image are manipulated. As a result, we can introduce a novel protection layer against image-scaling attacks. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted at ACSAC'23

arXiv:2309.05055 [pdf, other]

doi 10.1016/j.mechmachtheory.2019.103594

An Overview of Formulae for the Higher-Order Kinematics of Lower-Pair Chains with Applications in Robotics and Mechanism Theory

Authors: Andreas Mueller

Abstract: The motions of mechanisms can be described in terms of screw coordinates by means of an exponential mapping. The product of exponentials (POE) describes the configuration of a chain of bodies connected by lower pair joints. The kinematics is thus given in terms of joint screws. The POE serves to express loop constraints for mechanisms as well as the forward kinematics of serial manipulators. Besid… ▽ More The motions of mechanisms can be described in terms of screw coordinates by means of an exponential mapping. The product of exponentials (POE) describes the configuration of a chain of bodies connected by lower pair joints. The kinematics is thus given in terms of joint screws. The POE serves to express loop constraints for mechanisms as well as the forward kinematics of serial manipulators. Besides the compact formulations, the POE gives rise to purely algebraic relations for derivatives wrt. joint variables. It is known that the partial derivatives of the instantaneous joint screws (columns of the geometric Jacobian) are determined by Lie brackets the joint screws. Lesser-known is that derivative of arbitrary order can be compactly expressed by Lie brackets. This has significance for higher-order forward/inverse kinematics and dynamics of robots and multibody systems. Various relations were reported but are scattered in the literature and insufficiently recognized. This paper aims to provide a comprehensive overview of the relevant relations. Its original contributions are closed form and recursive relations for higher-order derivatives and Taylor expansions of various kinematic relations. Their application to kinematic control and dynamics of robotic manipulators and multibody systems is discussed. △ Less

Submitted 10 September, 2023; originally announced September 2023.

Journal ref: Mechanism and Machine Theory, Vol. 142, 2019, 103594, 35 pages

arXiv:2308.03660 [pdf, other]

Detecting Spells in Fantasy Literature with a Transformer Based Artificial Intelligence

Authors: Marcel Moravek, Alexander Zender, Andreas Müller

Abstract: Transformer architectures and models have made significant progress in language-based tasks. In this area, is BERT one of the most widely used and freely available transformer architecture. In our work, we use BERT for context-based phrase recognition of magic spells in the Harry Potter novel series. Spells are a common part of active magic in fantasy novels. Typically, spells are used in a specif… ▽ More Transformer architectures and models have made significant progress in language-based tasks. In this area, is BERT one of the most widely used and freely available transformer architecture. In our work, we use BERT for context-based phrase recognition of magic spells in the Harry Potter novel series. Spells are a common part of active magic in fantasy novels. Typically, spells are used in a specific context to achieve a supernatural effect. A series of investigations were conducted to see if a Transformer architecture could recognize such phrases based on their context in the Harry Potter saga. For our studies a pre-trained BERT model was used and fine-tuned utilising different datasets and training methods to identify the searched context. By considering different approaches for sequence classification as well as token classification, it is shown that the context of spells can be recognised. According to our investigations, the examined sequence length for fine-tuning and validation of the model plays a significant role in context recognition. Based on this, we have investigated whether spells have overarching properties that allow a transfer of the neural network models to other fantasy universes as well. The application of our model showed promising results and is worth to be deepened in subsequent studies. △ Less

Submitted 7 August, 2023; originally announced August 2023.

Comments: 18 pages, 11 figures, 13 tables

arXiv:2308.02575 [pdf, other]

doi 10.3389/feduc.2023.1272229

Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings

Authors: Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer

Abstract: This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to le… ▽ More This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to learn more about the interrater reliability, consistency of the ratings across iterations and the correlation between ratings in terms of content and style. The results revealed a high interrater reliability with ICC scores ranging between 0.94 and 0.99 for different timespans, suggesting that GPT-4 is capable of generating consistent ratings across repetitions with a clear prompt. Style and content ratings show a high correlation of 0.87. When applying a non-adequate style the average content ratings remained constant, while style ratings decreased, which indicates that the large language model (LLM) effectively distinguishes between these two criteria during evaluation. The prompt used in this study is furthermore presented and explained. Further research is necessary to assess the robustness and reliability of AI models in various use cases. △ Less

Submitted 3 August, 2023; originally announced August 2023.

Comments: 14 pages, 7 tables, 1 figure

arXiv:2307.14164 [pdf, ps, other]

Towards Continuous Time Finite Horizon LQR Control in SE(3)

Authors: Shivesh Kumar, Andreas Mueller, Patrick Wensing, Frank Kirchner

Abstract: The control of free-floating robots requires dealing with several challenges. The motion of such robots evolves on a continuous manifold described by the Special Euclidean Group of dimension 3, known as SE(3). Methods from finite horizon Linear Quadratic Regulators (LQR) control have gained recent traction in the robotics community. However, such approaches are inherently solving an unconstrained… ▽ More The control of free-floating robots requires dealing with several challenges. The motion of such robots evolves on a continuous manifold described by the Special Euclidean Group of dimension 3, known as SE(3). Methods from finite horizon Linear Quadratic Regulators (LQR) control have gained recent traction in the robotics community. However, such approaches are inherently solving an unconstrained optimization problem and hence are unable to respect the manifold constraints imposed by the group structure of SE(3). This may lead to small errors, singularity problems and double cover issues depending on the choice of coordinates to model the floating base motion. In this paper, we propose the use of canonical exponential coordinates of SE(3) and the associated Exponential map along with its differentials to embed this structure in the theory of finite horizon LQR controllers. △ Less

Submitted 26 July, 2023; originally announced July 2023.

Comments: IEEE International Conference on Robotics and Automation (ICRA) 2023 Workshop on Geometric Representations The Roles of Modern Screw Theory, Lie algebra, and Geometric Algebra in Robotics

arXiv:2307.11357 [pdf, other]

Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning

Authors: Arthur Müller, Matthia Sabatelli

Abstract: Reinforcement Learning (RL) has been widely explored in Traffic Signal Control (TSC) applications, however, still no such system has been deployed in practice. A key barrier to progress in this area is the reality gap, the discrepancy that results from differences between simulation models and their real-world equivalents. In this paper, we address this challenge by first presenting a comprehensiv… ▽ More Reinforcement Learning (RL) has been widely explored in Traffic Signal Control (TSC) applications, however, still no such system has been deployed in practice. A key barrier to progress in this area is the reality gap, the discrepancy that results from differences between simulation models and their real-world equivalents. In this paper, we address this challenge by first presenting a comprehensive analysis of potential simulation parameters that contribute to this reality gap. We then also examine two promising strategies that can bridge this gap: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML). Both strategies were trained with a traffic simulation model of an intersection. In addition, the model was embedded in LemgoRL, a framework that integrates realistic, safety-critical requirements into the control system. Subsequently, we evaluated the performance of the two methods on a separate model of the same intersection that was developed with a different traffic simulator. In this way, we mimic the reality gap. Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm, therefore highlighting their potential to mitigate the reality gap in RLbased TSC systems. △ Less

Submitted 21 July, 2023; originally announced July 2023.

Comments: Paper was accepted by the ITSC 2023 (26th IEEE International Conference on Intelligent Transportation Systems)

arXiv:2307.08486 [pdf, other]

Fairness in KI-Systemen

Authors: Janine Strotherm, Alissa Müller, Barbara Hammer, Benjamin Paaßen

Abstract: The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisci… ▽ More The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulation but emphasizes visualizations and examples. -- Je mehr KI-gestützte Entscheidungen das Leben von Menschen betreffen, desto wichtiger ist die Fairness solcher Entscheidungen. In diesem Kapitel geben wir eine Einführung in die Forschung zu Fairness im maschinellen Lernen. Wir erklären die wesentlichen Fairness-Definitionen und Strategien zur Erreichung von Fairness anhand konkreter Beispiele und ordnen die Fairness-Forschung in den europäischen Kontext ein. Unser Beitrag richtet sich dabei an ein interdisziplinäres Publikum und verzichtet daher auf die mathematische Formulierung sondern betont Visualisierungen und Beispiele. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: in German language

arXiv:2307.00119 [pdf, other]

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

Authors: Aaron Mueller, Kanika Narang, Lambert Mathias, Qifan Wang, Hamed Firooz

Abstract: Large language models show impressive results on few-shot NLP tasks. However, these models are memory and computation-intensive. Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone results in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of… ▽ More Large language models show impressive results on few-shot NLP tasks. However, these models are memory and computation-intensive. Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone results in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of tasks. To overcome this issue, we propose meta-training with demonstration retrieval, where we use a dense passage retriever to retrieve semantically similar labeled demonstrations to each example for more varied supervision. By separating external knowledge from model parameters, we can use meta-training to train parameter-efficient models that generalize well on a larger variety of tasks. We construct a meta-training set from UnifiedQA and CrossFit, and propose a demonstration bank based on UnifiedQA tasks. To our knowledge, our work is the first to combine retrieval with meta-training, to use DPR models to retrieve demonstrations, and to leverage demonstrations from many tasks simultaneously, rather than randomly sampling demonstrations from the training set of the target task. Our approach outperforms a variety of targeted parameter-efficient and retrieval-augmented few-shot methods on QA, NLI, and text classification tasks (including SQuAD, QNLI, and TREC). Our approach can be meta-trained and fine-tuned quickly on a single GPU. △ Less

Submitted 30 June, 2023; originally announced July 2023.

Comments: Accepted to Findings of ACL 2023

arXiv:2306.17793 [pdf, ps, other]

doi 10.1007/s11044-017-9583-6

Screw and Lie Group Theory in Multibody Dynamics -- Recursive Algorithms and Equations of Motion of Tree-Topology Systems

Authors: Andreas Mueller

Abstract: Screw and Lie group theory allows for user-friendly modeling of multibody systems (MBS) while at the same they give rise to computationally efficient recursive algorithms. The inherent frame invariance of such formulations allows for use of arbitrary reference frames within the kinematics modeling (rather than obeying modeling conventions such as the Denavit-Hartenberg convention) and to avoid int… ▽ More Screw and Lie group theory allows for user-friendly modeling of multibody systems (MBS) while at the same they give rise to computationally efficient recursive algorithms. The inherent frame invariance of such formulations allows for use of arbitrary reference frames within the kinematics modeling (rather than obeying modeling conventions such as the Denavit-Hartenberg convention) and to avoid introduction of joint frames. The computational efficiency is owed to a representation of twists, accelerations, and wrenches that minimizes the computational effort. This can be directly carried over to dynamics formulations. In this paper recursive $O\left( n\right) $ Newton-Euler algorithms are derived for the four most frequently used representations of twists, and their specific features are discussed. These formulations are related to the corresponding algorithms that were presented in the literature. The MBS motion equations are derived in closed form using the Lie group formulation. One are the so-called 'Euler-Jourdain' or 'projection' equations, of which Kane's equations are a special case, and the other are the Lagrange equations. The recursive kinematics formulations are readily extended to higher orders in order to compute derivatives of the motions equations. To this end, recursive formulations for the acceleration and jerk are derived. It is briefly discussed how this can be employed for derivation of the linearized motion equations and their time derivatives. The geometric modeling allows for direct application of Lie group integration methods, which is briefly discussed. △ Less

Submitted 30 June, 2023; originally announced June 2023.

Journal ref: Multibody System Dynamics, Vol. 42, 2018, pp. 219-248

arXiv:2306.17415 [pdf, other]

doi 10.1007/s11044-017-9582-7

Screw and Lie Group Theory in Multibody Kinematics -- Motion Representation and Recursive Kinematics of Tree-Topology Systems

Authors: Andreas Mueller

Abstract: After three decades of computational multibody system (MBS) dynamics, current research is centered at the development of compact and user friendly yet computationally efficient formulations for the analysis of complex MBS. The key to this is a holistic geometric approach to the kinematics modeling observing that the general motion of rigid bodies as well as the relative motion due to technical joi… ▽ More After three decades of computational multibody system (MBS) dynamics, current research is centered at the development of compact and user friendly yet computationally efficient formulations for the analysis of complex MBS. The key to this is a holistic geometric approach to the kinematics modeling observing that the general motion of rigid bodies as well as the relative motion due to technical joints are screw motions. Moreover, screw theory provides the geometric setting and Lie group theory the analytic foundation for an intuitive and compact MBS modeling. The inherent frame invariance of this modeling approach gives rise to very efficient recursive $O\left( n\right) $ algorithms, for which the so-called 'spatial operator algebra' is one example, and allows for use of readily available geometric data. In this paper three variants for describing the configuration of tree-topology MBS in terms of relative coordinates, i.e. joint variables, are presented: the standard formulation using body-fixed joint frames, a formulation without joint frames, and a formulation without either joint or body-fixed reference frames. This allows for describing the MBS kinematics without introducing joint reference frames and therewith rendering the use of restrictive modeling convention, such as Denavit-Hartenberg parameters, redundant. Four different definitions of twists are recalled and the corresponding recursive expressions are derived. The corresponding Jacobians and their factorization are derived. The aim of this paper is to motivate the use of Lie group modeling and to provide a review of the different formulations for the kinematics of tree-topology MBS in terms of relative (joint) coordinates from the unifying perspective of screw and Lie group theory. △ Less

Submitted 30 June, 2023; originally announced June 2023.

Journal ref: Multibody System Dynamics, Vol. 43, 2018, pp. 37-70

arXiv:2306.09479 [pdf, other]

Inverse Scaling: When Bigger Isn't Better

Authors: Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim , et al. (2 additional authors not shown)

Abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling… ▽ More Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling on 11 datasets collected by running a public contest, the Inverse Scaling Prize, with a substantial prize pool. Through analysis of the datasets, along with other examples found in the literature, we identify four potential causes of inverse scaling: (i) preference to repeat memorized sequences over following in-context instructions, (ii) imitation of undesirable patterns in the training data, (iii) tasks containing an easy distractor task which LMs could focus on, rather than the harder real task, and (iv) correct but misleading few-shot demonstrations of the task. We release the winning datasets at https://inversescaling.com/data to allow for further investigation of inverse scaling. Our tasks have helped drive the discovery of U-shaped and inverted-U scaling trends, where an initial trend reverses, suggesting that scaling trends are less reliable at predicting the behavior of larger-scale models than previously understood. Overall, our results suggest that there are tasks for which increased model scale alone may not lead to progress, and that more careful thought needs to go into the data and objectives for training language models. △ Less

Submitted 12 May, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: Published in TMLR (2023), 39 pages

Journal ref: Transactions on Machine Learning Research (TMLR), 10/2023, https://openreview.net/forum?id=DwgRm72GQF

arXiv:2306.07001 [pdf, ps, other]

Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

Authors: Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He

Abstract: Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one… ▽ More Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors"; one can compensate for a constraint violation in one episode with a strict constraint satisfaction in another. However, we do not consider such a behavior safe in practical applications. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm OptAug-CMDP for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be performed efficiently. We show that during $K$ episodes of exploring the CMDP, our algorithm obtains a regret of $\tilde{O}(\sqrt{K})$ for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors. △ Less

Submitted 30 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.19905 [pdf, other]

How to Plant Trees in Language Models: Data and Architectural Effects on the Emergence of Syntactic Inductive Biases

Authors: Aaron Mueller, Tal Linzen

Abstract: Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear features - when performing tasks after fine-tuning. We test what aspects of pre-training are important for endowing encoder-decoder Transformers with an inductive… ▽ More Accurate syntactic representations are essential for robust generalization in natural language. Recent work has found that pre-training can teach language models to rely on hierarchical syntactic features - as opposed to incorrect linear features - when performing tasks after fine-tuning. We test what aspects of pre-training are important for endowing encoder-decoder Transformers with an inductive bias that favors hierarchical syntactic generalizations. We focus on architectural features (depth, width, and number of parameters), as well as the genre and size of the pre-training corpus, diagnosing inductive biases using two syntactic transformation tasks: question formation and passivization, both in English. We find that the number of parameters alone does not explain hierarchical generalization: model depth plays greater role than model width. We also find that pre-training on simpler language, such as child-directed speech, induces a hierarchical bias using an order-of-magnitude less data than pre-training on more typical datasets based on web text or Wikipedia; this suggests that in cognitively plausible language acquisition settings, neural language models may be more data-efficient than previously thought. △ Less

Submitted 31 May, 2023; originally announced May 2023.

Comments: Accepted to ACL 2023

arXiv:2305.10225 [pdf, other]

doi 10.1017/S0960129524000057

New and improved bounds on the contextuality degree of multi-qubit configurations

Authors: Axel Muller, Metod Saniga, Alain Giorgetti, Henri de Boutray, Frédéric Holweck

Abstract: We present algorithms and a C code to reveal quantum contextuality and evaluate the contextuality degree (a way to quantify contextuality) for a variety of point-line geometries located in binary symplectic polar spaces of small rank. With this code we were not only able to recover, in a more efficient way, all the results of a recent paper by de Boutray et al [(2022). Journal of Physics A: Mathem… ▽ More We present algorithms and a C code to reveal quantum contextuality and evaluate the contextuality degree (a way to quantify contextuality) for a variety of point-line geometries located in binary symplectic polar spaces of small rank. With this code we were not only able to recover, in a more efficient way, all the results of a recent paper by de Boutray et al [(2022). Journal of Physics A: Mathematical and Theoretical 55 475301], but also arrived at a bunch of new noteworthy results. The paper first describes the algorithms and the C code. Then it illustrates its power on a number of subspaces of symplectic polar spaces whose rank ranges from 2 to 7. The most interesting new results include: (i) non-contextuality of configurations whose contexts are subspaces of dimension 2 and higher, (ii) non-existence of negative subspaces of dimension 3 and higher, (iii) considerably improved bounds for the contextuality degree of both elliptic and hyperbolic quadrics for rank 4, as well as for a particular subgeometry of the three-qubit space whose contexts are the lines of this space, (iv) proof for the non-contextuality of perpsets and, last but not least, (v) contextual nature of a distinguished subgeometry of a multi-qubit doily, called a two-spread, and computation of its contextuality degree. Finally, in the three-qubit polar space we correct and improve the contextuality degree of the full configuration and also describe finite geometric configurations formed by unsatisfiable/invalid constraints for both types of quadrics as well as for the geometry whose contexts are all 315 lines of the space. △ Less

Submitted 31 May, 2024; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: 22 pages, 5 figures, 2 tables, published by Cambridge University Press in Mathematical Structures in Computer Science

Journal ref: Mathematical Structures in Computer Science. Published online 2024:1-22

arXiv:2305.05412 [pdf, other]

doi 10.1098/rspa.2022.0732

Hamel's Equations and Geometric Mechanics of Constrained and Floating Multibody and Space Systems

Authors: Andreas Mueller

Abstract: Modern geometric approaches to analytical mechanics rest on a bundle structure of the configuration space. The connection on this bundle allows for an intrinsic splitting of the reduced Euler-Lagrange equations. Hamel's equations, on the other hand, provide a universal approach to non-holonomic mechanics in local coordinates. The link between Hamel's formulation and geometric approaches in local c… ▽ More Modern geometric approaches to analytical mechanics rest on a bundle structure of the configuration space. The connection on this bundle allows for an intrinsic splitting of the reduced Euler-Lagrange equations. Hamel's equations, on the other hand, provide a universal approach to non-holonomic mechanics in local coordinates. The link between Hamel's formulation and geometric approaches in local coordinates has not been discussed sufficiently. The reduced Euler-Lagrange equations as well as the curvature of the connection, are derived with Hamel's original formalism. Intrinsic splitting into Euler-Lagrange and Euler-Poincare equations, and inertial decoupling is achieved by means of the locked velocity. Various aspects of this method are discussed. △ Less

Submitted 9 May, 2023; originally announced May 2023.

arXiv:2304.08106 [pdf, other]

doi 10.1007/978-3-031-27420-6_18

Towards Tumour Graph Learning for Survival Prediction in Head & Neck Cancer Patients

Authors: Angel Victor Juanco Muller, Joao F. C. Mota, Keith A. Goatman, Corne Hoogendoorn

Abstract: With nearly one million new cases diagnosed worldwide in 2020, head \& neck cancer is a deadly and common malignity. There are challenges to decision making and treatment of such cancer, due to lesions in multiple locations and outcome variability between patients. Therefore, automated segmentation and prognosis estimation approaches can help ensure each patient gets the most effective treatment.… ▽ More With nearly one million new cases diagnosed worldwide in 2020, head \& neck cancer is a deadly and common malignity. There are challenges to decision making and treatment of such cancer, due to lesions in multiple locations and outcome variability between patients. Therefore, automated segmentation and prognosis estimation approaches can help ensure each patient gets the most effective treatment. This paper presents a framework to perform these functions on arbitrary field of view (FoV) PET and CT registered scans, thus approaching tasks 1 and 2 of the HECKTOR 2022 challenge as team \texttt{VokCow}. The method consists of three stages: localization, segmentation and survival prediction. First, the scans with arbitrary FoV are cropped to the head and neck region and a u-shaped convolutional neural network (CNN) is trained to segment the region of interest. Then, using the obtained regions, another CNN is combined with a support vector machine classifier to obtain the semantic segmentation of the tumours, which results in an aggregated Dice score of 0.57 in task 1. Finally, survival prediction is approached with an ensemble of Weibull accelerated failure times model and deep learning methods. In addition to patient health record data, we explore whether processing graphs of image patches centred at the tumours via graph convolutions can improve the prognostic predictions. A concordance index of 0.64 was achieved in the test set, ranking 6th in the challenge leaderboard for this task. △ Less

Submitted 16 May, 2023; v1 submitted 17 April, 2023; originally announced April 2023.

Comments: Published by Springer as part of the HECKTOR 2022 challenge proccedings https://link.springer.com/chapter/10.1007/978-3-031-27420-6_18

arXiv:2303.18022 [pdf, other]

The Topology-Overlap Trade-Off in Retinal Arteriole-Venule Segmentation

Authors: Angel Victor Juanco Muller, Joao F. C. Mota, Keith A. Goatman, Corne Hoogendoorn

Abstract: Retinal fundus images can be an invaluable diagnosis tool for screening epidemic diseases like hypertension or diabetes. And they become especially useful when the arterioles and venules they depict are clearly identified and annotated. However, manual annotation of these vessels is extremely time demanding and taxing, which calls for automatic segmentation. Although convolutional neural networks… ▽ More Retinal fundus images can be an invaluable diagnosis tool for screening epidemic diseases like hypertension or diabetes. And they become especially useful when the arterioles and venules they depict are clearly identified and annotated. However, manual annotation of these vessels is extremely time demanding and taxing, which calls for automatic segmentation. Although convolutional neural networks can achieve high overlap between predictions and expert annotations, they often fail to produce topologically correct predictions of tubular structures. This situation is exacerbated by the bifurcation versus crossing ambiguity which causes classification mistakes. This paper shows that including a topology preserving term in the loss function improves the continuity of the segmented vessels, although at the expense of artery-vein misclassification and overall lower overlap metrics. However, we show that by including an orientation score guided convolutional module, based on the anisotropic single sided cake wavelet, we reduce such misclassification and further increase the topology correctness of the results. We evaluate our model on public datasets with conveniently chosen metrics to assess both overlap and topology correctness, showing that our model is able to produce results on par with state-of-the-art from the point of view of overlap, while increasing topological accuracy. △ Less

Submitted 31 March, 2023; originally announced March 2023.

Comments: To be published in proceedings of SPIE Medical Imaging 2023 Image Processing

arXiv:2303.17819 [pdf, ps, other]

An Efficient Off-Policy Reinforcement Learning Algorithm for the Continuous-Time LQR Problem

Authors: Victor G. Lopez, Matthias A. Müller

Abstract: In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the… ▽ More In this paper, an off-policy reinforcement learning algorithm is designed to solve the continuous-time LQR problem using only input-state data measured from the system. Different from other algorithms in the literature, we propose the use of a specific persistently exciting input as the exploration signal during the data collection step. We then show that, using this persistently excited data, the solution of the matrix equation in our algorithm is guaranteed to exist and to be unique at every iteration. Convergence of the algorithm to the optimal control input is also proven. Moreover, we formulate the policy evaluation step as the solution of a Sylvester-transpose equation, which increases the efficiency of its solution. Finally, a method to determine a stabilizing policy to initialize the algorithm using only measured data is proposed. △ Less

Submitted 31 March, 2023; originally announced March 2023.

Comments: 7 pages

arXiv:2303.07928 [pdf, other]

doi 10.1098/rspa.2021.0303

Review of the Exponential and Cayley Map on SE(3) as relevant for Lie Group Integration of the Generalized Poisson Equation and Flexible Multibody Systems

Authors: Andreas Mueller

Abstract: The exponential and Cayley map on SE(3) are the prevailing coordinate maps used in Lie group integration schemes for rigid body and flexible body systems. Such geometric integrators are the Munthe-Kaas and generalized-alpha schemes, which involve the differential and its directional derivative of the respective coordinate map. Relevant closed form expressions, which were reported over the last two… ▽ More The exponential and Cayley map on SE(3) are the prevailing coordinate maps used in Lie group integration schemes for rigid body and flexible body systems. Such geometric integrators are the Munthe-Kaas and generalized-alpha schemes, which involve the differential and its directional derivative of the respective coordinate map. Relevant closed form expressions, which were reported over the last two decades, are scattered in the literature, and some are reported without proof. This paper provides a reference summarizing all relevant closed form relations along with the relevant proofs. including the right-trivialized differential of the exponential and Cayley map and their directional derivatives (resembling the Hessian). The latter gives rise to an implicit generalized-alpha scheme for rigid/flexible multibody systems in terms of the Cayley map with improved computational efficiency. △ Less

Submitted 10 September, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

Journal ref: Proc. Royal Soc. A, Vol. 477, 2021

arXiv:2303.03876 [pdf, other]

Organelle-specific segmentation, spatial analysis, and visualization of volume electron microscopy datasets

Authors: Andreas Müller, Deborah Schmidt, Lucas Rieckert, Michele Solimena, Martin Weigert

Abstract: Volume electron microscopy is the method of choice for the in-situ interrogation of cellular ultrastructure at the nanometer scale. Recent technical advances have led to a rapid increase in large raw image datasets that require computational strategies for segmentation and spatial analysis. In this protocol, we describe a practical and annotation-efficient pipeline for organelle-specific segmentat… ▽ More Volume electron microscopy is the method of choice for the in-situ interrogation of cellular ultrastructure at the nanometer scale. Recent technical advances have led to a rapid increase in large raw image datasets that require computational strategies for segmentation and spatial analysis. In this protocol, we describe a practical and annotation-efficient pipeline for organelle-specific segmentation, spatial analysis, and visualization of large volume electron microscopy datasets using freely available, user-friendly software tools that can be run on a single standard workstation. We specifically target researchers in the life sciences with limited computational expertise, who face the following tasks within their volume electron microscopy projects: i) How to generate 3D segmentation labels for different types of cell organelles while minimizing manual annotation efforts, ii) how to analyze the spatial interactions between organelle instances, and iii) how to best visualize the 3D segmentation results. To meet these demands we give detailed guidelines for choosing the most efficient segmentation tools for the specific cell organelle. We furthermore provide easily executable components for spatial analysis and 3D rendering and bridge compatibility issues between freely available open-source tools, such that others can replicate our full pipeline starting from a raw dataset up to the final plots and rendered images. We believe that our detailed description can serve as a valuable reference for similar projects requiring special strategies for single- or multiple organelle analysis which can be achieved with computational resources commonly available to single-user setups. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2301.11796 [pdf, other]

Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

Authors: Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang

Abstract: We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size… ▽ More We present the call for papers for the BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus. This shared task is intended for participants with an interest in small scale language modeling, human language acquisition, low-resource NLP, and cognitive modeling. In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children. The task has three tracks, two of which restrict the training data to pre-released datasets of 10M and 100M words and are dedicated to explorations of approaches such as architectural variations, self-supervised objectives, or curriculum learning. The final track only restricts the amount of text used, allowing innovation in the choice of the data, its domain, and even its modality (i.e., data from sources other than text is welcome). We will release a shared evaluation pipeline which scores models on a variety of benchmarks and tasks, including targeted syntactic evaluations and natural language understanding. △ Less

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.08979 [pdf, other]

Language model acceptability judgements are not always robust to context

Authors: Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams

Abstract: Targeted syntactic evaluations of language models ask whether models show stable preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most targeted syntactic evaluation datasets ask models to make these judgements with just a single context-free sentence as input. This does not match language models' training regime, in which input sentences are always highly con… ▽ More Targeted syntactic evaluations of language models ask whether models show stable preferences for syntactically acceptable content over minimal-pair unacceptable inputs. Most targeted syntactic evaluation datasets ask models to make these judgements with just a single context-free sentence as input. This does not match language models' training regime, in which input sentences are always highly contextualized by the surrounding corpus. This mismatch raises an important question: how robust are models' syntactic judgements in different contexts? In this paper, we investigate the stability of language models' performance on targeted syntactic evaluations as we vary properties of the input context: the length of the context, the types of syntactic phenomena it contains, and whether or not there are violations of grammaticality. We find that model judgements are generally robust when placed in randomly sampled linguistic contexts. However, they are substantially unstable for contexts containing syntactic structures matching those in the critical test content. Among all tested models (GPT-2 and five variants of OPT), we significantly improve models' judgements by providing contexts with matching syntactic structures, and conversely significantly worsen them using unacceptable contexts with matching but violated syntactic structures. This effect is amplified by the length of the context, except for unrelated inputs. We show that these changes in model performance are not explainable by simple features matching the context and the test inputs, such as lexical overlap and dependency overlap. This sensitivity to highly specific syntactic features of the context can only be explained by the models' implicit in-context learning abilities. △ Less

Submitted 17 December, 2022; originally announced December 2022.

arXiv:2212.05815 [pdf, other]

Informed Circular Fields for Global Reactive Obstacle Avoidance of Robotic Manipulators

Authors: Marvin Becker, Philipp Caspers, Tom Hattendorf, Torsten Lilge, Sami Haddadin, Matthias A. Müller

Abstract: In this paper a global reactive motion planning framework for robotic manipulators in complex dynamic environments is presented. In particular, the circular field predictions (CFP) planner from Becker et al. (2021) is extended to ensure obstacle avoidance of the whole structure of a robotic manipulator. Towards this end, a motion planning framework is developed that leverages global information ab… ▽ More In this paper a global reactive motion planning framework for robotic manipulators in complex dynamic environments is presented. In particular, the circular field predictions (CFP) planner from Becker et al. (2021) is extended to ensure obstacle avoidance of the whole structure of a robotic manipulator. Towards this end, a motion planning framework is developed that leverages global information about promising avoidance directions from arbitrary configuration space motion planners, resulting in improved global trajectories while reactively avoiding dynamic obstacles and decreasing the required computational power. The resulting motion planning framework is tested in multiple simulations with complex and dynamic obstacles and demonstrates great potential compared to existing motion planning approaches. △ Less

Submitted 4 August, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

Comments: Accepted to IFAC World Congress 2023

arXiv:2210.16106 [pdf, other]

doi 10.1109/TAC.2023.3303168

Motion Planning using Reactive Circular Fields: A 2D Analysis of Collision Avoidance and Goal Convergence

Authors: Marvin Becker, Johannes Köhler, Sami Haddadin, Matthias A. Müller

Abstract: Recently, many reactive trajectory planning approaches were suggested in the literature because of their inherent immediate adaption in the ever more demanding cluttered and unpredictable environments of robotic systems. However, typically those approaches are only locally reactive without considering global path planning and no guarantees for simultaneous collision avoidance and goal convergence… ▽ More Recently, many reactive trajectory planning approaches were suggested in the literature because of their inherent immediate adaption in the ever more demanding cluttered and unpredictable environments of robotic systems. However, typically those approaches are only locally reactive without considering global path planning and no guarantees for simultaneous collision avoidance and goal convergence can be given. In this paper, we study a recently developed circular field (CF)-based motion planner that combines local reactive control with global trajectory generation by adapting an artificial magnetic field such that multiple trajectories around obstacles can be evaluated. In particular, we provide a mathematically rigorous analysis of this planner in a planar environment to ensure safe motion of the controlled robot. Contrary to existing results, the derived collision avoidance analysis covers the entire CF motion planning algorithm including attractive forces for goal convergence and is not limited to a specific choice of the rotation field, i.e., our guarantees are not limited to a specific potentially suboptimal trajectory. Our Lyapunov-type collision avoidance analysis is based on the definition of an (equivalent) two-dimensional auxiliary system, which enables us to provide tight, if and only if conditions for the case of a collision with point obstacles. Furthermore, we show how this analysis naturally extends to multiple obstacles and we specify sufficient conditions for goal convergence. Finally, we provide a challenging simulation scenario with multiple non-convex point cloud obstacles and demonstrate collision avoidance and goal convergence. △ Less

Submitted 3 November, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: Published in IEEE Transactions on Automatic Control (Early Access)

arXiv:2210.14328 [pdf, other]

Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models

Authors: Aaron Mueller, Yu Xia, Tal Linzen

Abstract: Structural probing work has found evidence for latent syntactic information in pre-trained language models. However, much of this analysis has focused on monolingual models, and analyses of multilingual models have employed correlational methods that are confounded by the choice of probing tasks. In this study, we causally probe multilingual language models (XGLM and multilingual BERT) as well as… ▽ More Structural probing work has found evidence for latent syntactic information in pre-trained language models. However, much of this analysis has focused on monolingual models, and analyses of multilingual models have employed correlational methods that are confounded by the choice of probing tasks. In this study, we causally probe multilingual language models (XGLM and multilingual BERT) as well as monolingual BERT-based models across various languages; we do this by performing counterfactual perturbations on neuron activations and observing the effect on models' subject-verb agreement probabilities. We observe where in the model and to what extent syntactic agreement is encoded in each language. We find significant neuron overlap across languages in autoregressive multilingual language models, but not masked language models. We also find two distinct layer-wise effect patterns and two distinct sets of neurons used for syntactic agreement, depending on whether the subject and verb are separated by other tokens. Finally, we find that behavioral analyses of language models are likely underestimating how sensitive masked language models are to syntactic information. △ Less

Submitted 25 October, 2022; originally announced October 2022.

Comments: Accepted to CoNLL 2022

arXiv:2208.12852 [pdf, other]

What Do NLP Researchers Believe? Results of the NLP Community Metasurvey

Authors: Julian Michael, Ari Holtzman, Alicia Parrish, Aaron Mueller, Alex Wang, Angelica Chen, Divyam Madaan, Nikita Nangia, Richard Yuanzhe Pang, Jason Phang, Samuel R. Bowman

Abstract: We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, w… ▽ More We present the results of the NLP Community Metasurvey. Run from May to June 2022, the survey elicited opinions on controversial issues, including industry influence in the field, concerns about AGI, and ethics. Our results put concrete numbers to several controversies: For example, respondents are split almost exactly in half on questions about the importance of artificial general intelligence, whether language models understand language, and the necessity of linguistic structure and inductive bias for solving NLP problems. In addition, the survey posed meta-questions, asking respondents to predict the distribution of survey responses. This allows us not only to gain insight on the spectrum of beliefs held by NLP researchers, but also to uncover false sociological beliefs where the community's predictions don't match reality. We find such mismatches on a wide range of issues. Among other results, the community greatly overestimates its own belief in the usefulness of benchmarks and the potential for scaling to solve real-world problems, while underestimating its own belief in the importance of linguistic structure, inductive bias, and interdisciplinary science. △ Less

Submitted 26 August, 2022; originally announced August 2022.

Comments: 31 pages, 19 figures, 3 tables; more information at https://nlpsurvey.net

ACM Class: I.2.7

arXiv:2208.06080 [pdf, other]

Smartwatch-based ecological momentary assessments for occupant wellness and privacy in buildings

Authors: Clayton Miller, Renee Christensen, Jin Kai Leong, Mahmoud Abdelrahman, Federico Tartarini, Matias Quintana, Andre Matthias Müller, Mario Frei

Abstract: This paper describes the adaptation of an open-source ecological momentary assessment smart-watch platform with three sets of micro-survey wellness-related questions focused on i) infectious disease (COVID-19) risk perception, ii) privacy and distraction in an office context, and iii) triggers of various movement-related behaviors in buildings. This platform was previously used to collect data for… ▽ More This paper describes the adaptation of an open-source ecological momentary assessment smart-watch platform with three sets of micro-survey wellness-related questions focused on i) infectious disease (COVID-19) risk perception, ii) privacy and distraction in an office context, and iii) triggers of various movement-related behaviors in buildings. This platform was previously used to collect data for thermal comfort, and this work extends its use to other domains. Several research participants took part in a proof-of-concept experiment by wearing a smartwatch to collect their micro-survey question preferences and perception responses for two of the question sets. Participants were also asked to install an indoor localization app on their phone to detect where precisely in the building they completed the survey. The experiment identified occupant information such as the tendencies for the research participants to prefer privacy in certain spaces and the difference between infectious disease risk perception in naturally versus mechanically ventilated spaces. △ Less

Submitted 11 August, 2022; originally announced August 2022.

Journal ref: 17th International Conference on Indoor Air Quality and Climate, INDOOR AIR 2022

arXiv:2206.10122 [pdf, other]

Safe and Psychologically Pleasant Traffic Signal Control with Reinforcement Learning using Action Masking

Authors: Arthur Müller, Matthia Sabatelli

Abstract: Reinforcement learning (RL) for traffic signal control (TSC) has shown better performance in simulation for controlling the traffic flow of intersections than conventional approaches. However, due to several challenges, no RL-based TSC has been deployed in the field yet. One major challenge for real-world deployment is to ensure that all safety requirements are met at all times during operation. W… ▽ More Reinforcement learning (RL) for traffic signal control (TSC) has shown better performance in simulation for controlling the traffic flow of intersections than conventional approaches. However, due to several challenges, no RL-based TSC has been deployed in the field yet. One major challenge for real-world deployment is to ensure that all safety requirements are met at all times during operation. We present an approach to ensure safety in a real-world intersection by using an action space that is safe by design. The action space encompasses traffic phases, which represent the combination of non-conflicting signal colors of the intersection. Additionally, an action masking mechanism makes sure that only appropriate phase transitions are carried out. Another challenge for real-world deployment is to ensure a control behavior that avoids stress for road users. We demonstrate how to achieve this by incorporating domain knowledge through extending the action masking mechanism. We test and verify our approach in a realistic simulation scenario. By ensuring safety and psychologically pleasant control behavior, our approach drives development towards real-world deployment of RL for TSC. △ Less

Submitted 21 June, 2022; originally announced June 2022.

Comments: Paper was accepted by the ITSC 2022 (25th IEEE International Conference on Intelligent Transportation Systems)

arXiv:2206.03599 [pdf, other]

doi 10.1016/j.jocs.2022.101853

Multi-qubit doilies: enumeration for all ranks and classification for ranks four and five

Authors: Axel Muller, Metod Saniga, Alain Giorgetti, Henri De Boutray, Frédéric Holweck

Abstract: For $N \geq 2$, an $N$-qubit doily is a doily living in the $N$-qubit symplectic polar space. These doilies are related to operator-based proofs of quantum contextuality. Following and extending the strategy of Saniga et al. (Mathematics 9 (2021) 2272) that focused exclusively on three-qubit doilies, we first bring forth several formulas giving the number of both linear and quadratic doilies for a… ▽ More For $N \geq 2$, an $N$-qubit doily is a doily living in the $N$-qubit symplectic polar space. These doilies are related to operator-based proofs of quantum contextuality. Following and extending the strategy of Saniga et al. (Mathematics 9 (2021) 2272) that focused exclusively on three-qubit doilies, we first bring forth several formulas giving the number of both linear and quadratic doilies for any $N > 2$. Then we present an effective algorithm for the generation of all $N$-qubit doilies. Using this algorithm for $N=4$ and $N=5$, we provide a classification of $N$-qubit doilies in terms of types of observables they feature and number of negative lines they are endowed with. We also list several distinguished findings about $N$-qubit doilies that are absent in the three-qubit case, point out a couple of specific features exhibited by linear doilies and outline some prospective extensions of our approach. △ Less

Submitted 25 November, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

Comments: Minor revisions and corrections. Published in Journal of Computational Science, Volume 64, 2022, 101853, ISSN 1877-7503, https://doi.org/10.1016/j.jocs.2022.101853

Journal ref: Journal of Computational Science 64 (2022) 101853

arXiv:2205.07610 [pdf, other]

doi 10.1145/3524059.3532376

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

Authors: André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack

Abstract: In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU-a se… ▽ More In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current state-of-the-art approaches are not able to leverage the massively parallel processing capabilities of modern GPUs with close-to-peak performance. We present AnySeq/GPU-a sequence alignment library that augments the AnySeq1 library with a novel approach for accelerating dynamic programming (DP) alignment on GPUs by minimizing memory accesses using warp shuffles and half-precision arithmetic. Our implementation is based on the AnyDSL compiler framework which allows for convenient zero-cost abstractions through guaranteed partial evaluation. We show that our approach achieves over 80% of the peak performance on both NVIDIA and AMD GPUs thereby outperforming the GPU-based alignment libraries AnySeq1, GASAL2, ADEPT, and NVBIO by a factor of at least 3.6 while achieving a median speedup of 19.2x over these tools across different alignment scenarios and sequence lengths when running on the same hardware. This leads to throughputs of up to 1.7 TCUPS (tera cell updates per second) on an NVIDIA GV100, up to 3.3 TCUPS with half-precision arithmetic on a single NVIDIA A100, and up to 3.8 TCUPS on an AMD MI100. △ Less

Submitted 16 May, 2022; originally announced May 2022.

Comments: published on ICS '22 (2022 International Conference on Supercomputing)

arXiv:2204.09710 [pdf, other]

doi 10.1190/geo2021-0586.1

Complete identification of complex salt geometries from inaccurate migrated subsurface offset gathers using deep learning

Authors: Ana Paula O. Muller, Jesse C. Costa, Clecio R. Bom, Elisangela L. Faria, Matheus Klatt, Gabriel Teixeira, Marcelo P. de Albuquerque, Marcio P. de Albuquerque

Abstract: Delimiting salt inclusions from migrated images is a time-consuming activity that relies on highly human-curated analysis and is subject to interpretation errors or limitations of the methods available. We propose to use migrated images produced from an inaccurate velocity model (with a reasonable approximation of sediment velocity, but without salt inclusions) to predict the correct salt inclusio… ▽ More Delimiting salt inclusions from migrated images is a time-consuming activity that relies on highly human-curated analysis and is subject to interpretation errors or limitations of the methods available. We propose to use migrated images produced from an inaccurate velocity model (with a reasonable approximation of sediment velocity, but without salt inclusions) to predict the correct salt inclusions shape using a Convolutional Neural Network (CNN). Our approach relies on subsurface Common Image Gathers to focus the sediments' reflections around the zero offset and to spread the energy of salt reflections over large offsets. Using synthetic data, we trained a U-Net to use common-offset subsurface images as input channels for the CNN and the correct salt-masks as network output. The network learned to predict the salt inclusions masks with high accuracy; moreover, it also performed well when applied to synthetic benchmark data sets that were not previously introduced. Our training process tuned the U-Net to successfully learn the shape of complex salt bodies from partially focused subsurface offset images. △ Less

Submitted 5 December, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

Comments: Manuscript published at Geophysics

Journal ref: Geophysics 87 2022

arXiv:2204.07128 [pdf, other]

Label Semantic Aware Pre-training for Few-shot Text Classification

Authors: Aaron Mueller, Jason Krone, Salvatore Romeo, Saab Mansour, Elman Mansimov, Yi Zhang, Dan Roth

Abstract: In text classification tasks, useful information is encoded in the label names. Label semantic aware systems have leveraged this information for improved text classification performance during fine-tuning and prediction. However, use of label-semantics during pre-training has not been extensively explored. We therefore propose Label Semantic Aware Pre-training (LSAP) to improve the generalization… ▽ More In text classification tasks, useful information is encoded in the label names. Label semantic aware systems have leveraged this information for improved text classification performance during fine-tuning and prediction. However, use of label-semantics during pre-training has not been extensively explored. We therefore propose Label Semantic Aware Pre-training (LSAP) to improve the generalization and data efficiency of text classification systems. LSAP incorporates label semantics into pre-trained generative models (T5 in our case) by performing secondary pre-training on labeled sentences from a variety of domains. As domain-general pre-training requires large amounts of data, we develop a filtering and labeling pipeline to automatically create sentence-label pairs from unlabeled text. We perform experiments on intent (ATIS, Snips, TOPv2) and topic classification (AG News, Yahoo! Answers). LSAP obtains significant accuracy improvements over state-of-the-art models for few-shot text classification while maintaining performance comparable to state of the art in high-resource settings. △ Less

Submitted 29 May, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted at ACL 2022

Showing 1–50 of 110 results for author: Mueller, A