Search | arXiv e-print repository

arXiv:2407.20650

No learning rates needed: Introducing SALSA -- Stable Armijo Line Search Adaptation

Authors: Philip Kenneweg, Tristan Kenneweg, Fabian Fumagalli, Barbara Hammer

Abstract: In recent studies, line search methods have been demonstrated to significantly enhance the performance of conventional stochastic gradient descent techniques across various datasets and architectures, while making an otherwise critical choice of learning rate schedule superfluous. In this paper, we identify problems in current state-of-the-art line search methods, propose enhancements, and rigorously assess their effectiveness. Furthermore, we evaluate these methods on datasets that are orders of magnitude larger, and on more complex data domains, than previously done. More specifically, we enhance the Armijo line search method by speeding up its computation and incorporating a momentum term into the Armijo criterion, making it better suited for stochastic mini-batching. Our optimization approach outperforms both the previous Armijo implementation and a tuned learning rate schedule for the Adam and SGD optimizers. Our evaluation covers a diverse range of architectures, such as Transformers, CNNs, and MLPs, as well as data domains, including NLP and image data. Our work is publicly available as a Python package, which provides a simple PyTorch optimizer.

Submitted 30 July, 2024; originally announced July 2024.

Comments: published in IJCNN 2024. arXiv admin note: text overlap with arXiv:2403.18519
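The Armijo line search at the core of SALSA and its predecessors accepts a step size η only if it yields sufficient decrease: f(x - η·g) ≤ f(x) - c·η·‖g‖². The sketch below is a minimal, plain backtracking version for one gradient step.

- The function names and the constants `eta0`, `c` and `beta` are illustrative assumptions.
- It does not reproduce SALSA's momentum term in the Armijo criterion or its speed-ups.
- In the stochastic setting, f and its gradient would be evaluated on the same mini-batch.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def armijo_backtracking(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x: Vector,
    eta0: float = 1.0,   # initial (largest) step size to try
    c: float = 0.1,      # sufficient-decrease constant, 0 < c < 1
    beta: float = 0.5,   # shrink factor applied after each failed trial
    max_backtracks: int = 30,
) -> Tuple[Vector, float]:
    """One gradient step whose step size satisfies the Armijo condition.

    Returns the new point and the accepted step size (0.0 if none was found).
    """
    fx = f(x)
    g = grad(x)
    g_norm_sq = sum(gi * gi for gi in g)
    eta = eta0
    for _ in range(max_backtracks):
        candidate = [xi - eta * gi for xi, gi in zip(x, g)]
        # Armijo condition: f(x - eta*g) <= f(x) - c * eta * ||g||^2
        if f(candidate) <= fx - c * eta * g_norm_sq:
            return candidate, eta
        eta *= beta  # step too large: shrink and try again
    return list(x), 0.0


# Toy ill-conditioned quadratic used only for illustration.
def quad(x: Vector) -> float:
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


def quad_grad(x: Vector) -> Vector:
    return [x[0], 10.0 * x[1]]


x_new, eta = armijo_backtracking(quad, quad_grad, [1.0, 1.0])
print(eta, quad(x_new))  # step 0.125 is accepted after three halvings
```

Stochastic line-search implementations typically also let η grow again between iterations, so the step size does not only shrink over training. That reset rule is omitted here.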

arXiv:2407.14288

Augmentation of Universal Potentials for Broad Applications

Authors: Joe Pitfield, Florian Brix, Zeyuan Tang, Andreas Møller Slavensky, Nikolaj Rønne, Mads-Peter Verner Christiansen, Bjørk Hammer

Abstract: Universal potentials open the door to DFT-level calculations at a fraction of their cost. We find that for application to systems outside the scope of its training data, CHGNet (Deng et al., 2023) has the potential to succeed out of the box, but can also fail significantly in predicting the ground state configuration. We demonstrate that via fine-tuning or a Δ-learning approach it is possible to augment the overall performance of universal potentials for specific cluster and surface systems. We utilize this to investigate and explain experimentally observed defects in the Ag(111)-O surface reconstruction and explain the mechanics behind its formation.

Submitted 19 July, 2024; originally announced July 2024.
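Δ-learning trains a model on the difference between a cheap baseline and a more accurate reference, instead of on the reference itself. The toy sketch below illustrates only this idea, under these assumptions:

- A hypothetical one-dimensional `baseline` function stands in for a universal potential such as CHGNet.
- The reference energies are synthetic.
- The correction is a simple least-squares line, not the models used in the paper.

```python
from typing import Callable, List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ≈ a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def delta_learn(
    baseline: Callable[[float], float],
    xs: List[float],
    reference: List[float],
) -> Callable[[float], float]:
    """Fit a correction to (reference - baseline) and return baseline + correction."""
    residuals = [e_ref - baseline(x) for x, e_ref in zip(xs, reference)]
    a, b = fit_line(xs, residuals)
    return lambda x: baseline(x) + a + b * x


# Hypothetical baseline "universal potential" and synthetic reference data.
def baseline(x: float) -> float:
    return x ** 2


train_x = [0.0, 1.0, 2.0, 3.0]
train_ref = [x ** 2 + 0.5 - 0.2 * x for x in train_x]

corrected = delta_learn(baseline, train_x, train_ref)
print(corrected(4.0))  # close to the reference value 16 + 0.5 - 0.8 = 15.7
```

The baseline still carries most of the physics. Only the (ideally smoother) residual has to be learned from the small reference set.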

arXiv:2407.13471

doi 10.1063/5.0207801

Accelerating structure search using atomistic graph-based classifiers

Authors: Andreas Møller Slavensky, Bjørk Hammer

Abstract: We introduce an atomistic classifier based on a combination of spectral graph theory and a Voronoi tessellation method. This classifier allows for the discrimination between structures from different minima of a potential energy surface, making it a useful tool for sorting through large datasets of atomic systems. We incorporate the classifier as a filtering method in the Global Optimization with First-principles Energy Expressions (GOFEE) algorithm. Here it is used to filter out structures from exploited regions of the potential energy landscape, whereby the risk of stagnation during the searches is lowered. We demonstrate the usefulness of the classifier by solving the global optimization problem of 2-dimensional pyroxene, 3-dimensional olivine, Au12, and Lennard-Jones LJ55 and LJ75 nanoparticles.

Submitted 18 July, 2024; originally announced July 2024.

Comments: 12 pages, 10 figures

Journal ref: J. Chem. Phys. 161, 014713 (2024)

arXiv:2407.12525

Efficient ensemble uncertainty estimation in Gaussian Processes Regression

Authors: Mads-Peter Verner Christiansen, Nikolaj Rønne, Bjørk Hammer

Abstract: Reliable uncertainty measures are required when using data-based machine learning interatomic potentials (MLIPs) for atomistic simulations. In this work, we propose, for sparse Gaussian Process Regression (GPR) type MLIPs, a stochastic uncertainty measure akin to the query-by-committee approach often used in conjunction with neural-network-based MLIPs. The uncertainty measure is coined "label noise" ensemble uncertainty, as it emerges from adding noise to the energy labels in the training data. We find that this method of calculating an ensemble uncertainty is as well calibrated as the one obtained from the closed-form expression for the posterior variance when the sparse GPR is treated as a projected process. Comparing the two methods, our proposed ensemble uncertainty is, however, faster to evaluate than the closed-form expression. Finally, we demonstrate that the proposed uncertainty measure better supports a Bayesian search for the optimal structure of Au20 clusters.

Submitted 17 July, 2024; originally announced July 2024.

arXiv:2406.03012

Analyzing the Influence of Training Samples on Explanations

Authors: André Artelt, Barbara Hammer

Abstract: Explainable AI (XAI) constitutes a popular method to analyze the reasoning of AI systems by explaining their decision-making, e.g. providing a counterfactual explanation of how to achieve recourse. However, in cases such as unexpected explanations, the user might be interested in learning about the cause of this explanation, e.g. properties of the utilized training data that are responsible for the observed explanation. Under the umbrella of data valuation, first approaches have been proposed that estimate the influence of data samples on a given model. In this work, we take a slightly different stance, as we are interested in the influence of single samples on a model explanation rather than on the model itself. Hence, we propose the novel problem of identifying training data samples that have a high influence on a given explanation (or related quantity) and investigate the particular case of differences in the cost of recourse between protected groups. For this, we propose an algorithm that identifies such influential training samples.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted at the Workshop on Explainable Artificial Intelligence (XAI) at IJCAI 2024. arXiv admin note: text overlap with arXiv:2402.08290

arXiv:2406.02078

A Toolbox for Supporting Research on AI in Water Distribution Networks

Authors: André Artelt, Marios S. Kyriakou, Stelios G. Vrachimis, Demetrios G. Eliades, Barbara Hammer, Marios M. Polycarpou

Abstract: Drinking water is a vital resource for humanity, and thus, Water Distribution Networks (WDNs) are considered critical infrastructures in modern societies. The operation of WDNs is subject to diverse challenges such as water leakages and contamination, cyber/physical attacks, high energy consumption during pump operation, etc. With model-based methods reaching their limits due to various uncertainty sources, AI methods offer promising solutions to those challenges. In this work, we introduce a Python toolbox for complex scenario modeling & generation such that AI researchers can easily access challenging problems from the drinking water domain. Besides providing a high-level interface for the easy generation of hydraulic and water quality scenario data, it also provides easy access to popular event detection benchmarks and an environment for developing control algorithms.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted at the Workshop on Artificial Intelligence for Critical Infrastructure (AI4CI 2024) @ IJCAI'24, Jeju Island, South Korea

arXiv:2405.10852

KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions

Authors: Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

Abstract: The Shapley value (SV) is a prevalent approach to allocating credit to machine learning (ML) entities in order to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well known that the SV yields an optimal approximation of any game via a weighted least squares (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and k-Shapley values (k-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

Submitted 16 July, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

Comments: Published Paper at ICML 2024: https://openreview.net/forum?id=d5jXW2H4gg
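The sketch below makes the abstract's quantities concrete for a small cooperative game. It computes the exact Shapley value and the pairwise Shapley Interaction Index by enumerating all coalitions, using the standard weights |S|!(n-|S|-1)!/n! and |S|!(n-|S|-2)!/(n-1)!.

This brute-force enumeration is exponential in the number of players, which is the cost that estimators such as KernelSHAP and KernelSHAP-IQ are designed to avoid. It does not implement the paper's weighted least-squares estimator, and the game `v` is a hypothetical example.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Iterator, List

Game = Callable[[FrozenSet[int]], float]


def subsets(items: List[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of `items`, including the empty set."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def shapley_values(n: int, v: Game) -> List[float]:
    """Exact Shapley value of each of the n players."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for s in subsets(others):
            weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            total += weight * (v(s | {i}) - v(s))  # marginal contribution of i
        phi.append(total)
    return phi


def shapley_interaction(n: int, v: Game, i: int, j: int) -> float:
    """Exact pairwise Shapley Interaction Index of players i and j."""
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for s in subsets(others):
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        # Discrete second-order difference: joint effect minus individual effects.
        delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
        total += weight * delta
    return total


# Hypothetical 3-player game: value 1 only when players 0 and 1 cooperate.
def v(coalition: FrozenSet[int]) -> float:
    return 1.0 if {0, 1} <= coalition else 0.0


print(shapley_values(3, v))             # players 0 and 1 share the credit equally
print(shapley_interaction(3, v, 0, 1))  # the payoff is a purely joint effect
```

The SV alone splits the payoff between players 0 and 1 and cannot show that neither contributes anything on its own. The interaction index exposes this explicitly.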

arXiv:2405.10271

Automated Federated Learning via Informed Pruning

Authors: Christian Internò, Elena Raponi, Niki van Stein, Thomas Bäck, Markus Olhofer, Yaochu Jin, Barbara Hammer

Abstract: Federated learning (FL) represents a pivotal shift in machine learning (ML) as it enables collaborative training of local ML models coordinated by a central aggregator, all without the need to exchange local data. However, its application to edge devices is hindered by limited computational capabilities and data communication challenges, compounded by the inherent complexity of Deep Learning (DL) models. Model pruning is identified as a key technique for compressing DL models on devices with limited resources. Nonetheless, conventional pruning techniques typically rely on manually crafted heuristics and demand human expertise to achieve a balance between model size, speed, and accuracy, often resulting in sub-optimal solutions. In this study, we introduce an automated federated learning approach utilizing informed pruning, called AutoFLIP, which dynamically prunes and compresses DL models within both the local clients and the global server. It leverages a federated loss exploration phase to investigate model gradient behavior across diverse datasets and losses, providing insights into parameter significance. Our experiments showcase notable enhancements in scenarios with strong non-IID data, underscoring AutoFLIP's capacity to tackle computational constraints and achieve superior global convergence.

Submitted 16 May, 2024; originally announced May 2024.
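The sketch below shows the two building blocks this abstract combines: size-weighted federated averaging (FedAvg-style aggregation) of client parameter vectors, followed by global magnitude pruning.

AutoFLIP instead guides pruning with information gathered in its federated loss exploration phase, so plain magnitude pruning here is only a stand-in. The flat parameter lists and the `sparsity` argument are simplifying assumptions.

```python
from typing import List

Params = List[float]


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its number of samples."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[k] * size for p, size in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


def magnitude_prune(params: Params, sparsity: float) -> Params:
    """Zero out the fraction `sparsity` of entries with the smallest magnitude."""
    n_prune = int(len(params) * sparsity)
    order = sorted(range(len(params)), key=lambda k: abs(params[k]))
    pruned = set(order[:n_prune])
    return [0.0 if k in pruned else w for k, w in enumerate(params)]


# Two hypothetical clients; the first holds three times as much data.
global_params = fed_avg([[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]], [3, 1])
print(global_params)                        # [1.5, 2.0, 2.5, 3.0]
print(magnitude_prune(global_params, 0.5))  # [0.0, 0.0, 2.5, 3.0]
```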

arXiv:2405.06425

Koopman-Based Surrogate Modelling of Turbulent Rayleigh-Bénard Convection

Authors: Thorben Markmann, Michiel Straat, Barbara Hammer

Abstract: Several related works have introduced Koopman-based Machine Learning architectures as surrogate models for dynamical systems. These architectures aim to learn non-linear measurements (also known as observables) of the system's state that evolve by a linear operator and are, therefore, amenable to model-based linear control techniques. So far, mainly simple systems have been targeted, and Koopman architectures as reduced-order models for more complex dynamics have not been fully explored. Hence, we use a Koopman-inspired architecture called the Linear Recurrent Autoencoder Network (LRAN) for learning reduced-order dynamics in convection flows of a Rayleigh-Bénard Convection (RBC) system at different amounts of turbulence. The data is obtained from direct numerical simulations of the RBC system. A traditional fluid dynamics method, the Kernel Dynamic Mode Decomposition (KDMD), is used as a baseline for comparison with the LRAN. For both methods, we performed hyperparameter sweeps to identify optimal settings. We used a Normalized Sum of Square Error measure for the quantitative evaluation of the models, and we also studied the model predictions qualitatively. We obtained more accurate predictions with the LRAN than with KDMD in the most turbulent setting. We conjecture that this is due to the LRAN's flexibility in learning complicated observables from data, thereby serving as a viable surrogate model for the main structure of fluid dynamics in turbulent convection settings. In contrast, KDMD was more effective in lower turbulence settings due to the repetitiveness of the convection flow. The feasibility of Koopman-based surrogate models for turbulent fluid flows opens possibilities for efficient model-based control techniques useful in a variety of industrial settings.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted at the International Joint Conference on Neural Networks (IJCNN) 2024
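Both compared methods rest on the Koopman idea of advancing (observables of) the state with a linear operator. The sketch below shows the simplest data-driven instance of that idea: a least-squares fit of a linear operator A from snapshot pairs, A = Y Xᵀ (X Xᵀ)⁻¹, as in exact Dynamic Mode Decomposition without rank truncation, followed by the eigenvalues of A.

It is a toy two-dimensional illustration under these assumptions:

- It uses no kernel, unlike KDMD.
- It uses no learned observables, unlike the LRAN.
- The system matrix `true_a` is hypothetical.

```python
import cmath
from typing import List, Tuple

Matrix = List[List[float]]
State = List[float]


def simulate(a: Matrix, x0: State, steps: int) -> List[State]:
    """Generate snapshots x_{k+1} = A x_k for a 2-D linear system."""
    traj = [list(x0)]
    for _ in range(steps):
        x = traj[-1]
        traj.append([a[r][0] * x[0] + a[r][1] * x[1] for r in range(2)])
    return traj


def fit_linear_operator(snapshots: List[State]) -> Matrix:
    """Least-squares A minimizing sum_k ||x_{k+1} - A x_k||^2 (2-D case)."""
    xs, ys = snapshots[:-1], snapshots[1:]
    # Accumulate Y X^T and X X^T over all snapshot pairs.
    yx = [[sum(y[r] * x[c] for x, y in zip(xs, ys)) for c in range(2)] for r in range(2)]
    xx = [[sum(x[r] * x[c] for x in xs) for c in range(2)] for r in range(2)]
    det = xx[0][0] * xx[1][1] - xx[0][1] * xx[1][0]
    xx_inv = [[xx[1][1] / det, -xx[0][1] / det], [-xx[1][0] / det, xx[0][0] / det]]
    return [[sum(yx[r][k] * xx_inv[k][c] for k in range(2)) for c in range(2)] for r in range(2)]


def eigenvalues_2x2(a: Matrix) -> Tuple[complex, complex]:
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


true_a = [[0.9, -0.2], [0.1, 0.8]]  # hypothetical damped, rotating dynamics
data = simulate(true_a, [1.0, 0.0], steps=20)
a_hat = fit_linear_operator(data)
# |lambda| < 1 indicates decaying modes; a complex pair indicates oscillation.
print(a_hat, [abs(lam) for lam in eigenvalues_2x2(a_hat)])
```

On noise-free linear data the fit recovers A exactly up to rounding. Turbulent convection is far from linear in the raw state, which is why KDMD lifts the state with a kernel and the LRAN learns its observables.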

arXiv:2404.08582

doi 10.1109/IJCNN60899.2024.10651287

FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation

Authors: Riza Velioglu, Robin Chan, Barbara Hammer

Abstract: In the realm of fashion object detection and segmentation for online shopping images, existing state-of-the-art fashion parsing models encounter limitations, particularly when exposed to non-model-worn apparel and close-up shots. To address these failures, we introduce FashionFail, a new fashion dataset with e-commerce images for object detection and segmentation. The dataset is efficiently curated using our novel annotation tool that leverages recent foundation models. The primary objective of FashionFail is to serve as a test bed for evaluating the robustness of models. Our analysis reveals the shortcomings of leading models, such as Attribute-Mask R-CNN and Fashionformer. Additionally, we propose a baseline approach using naive data augmentation to mitigate common failure cases and improve model robustness. Through this work, we aim to inspire and support further research in fashion item detection and segmentation for industrial applications. The dataset, annotation tool, code, and models are available at https://rizavelioglu.github.io/fashionfail/.

Submitted 12 April, 2024; originally announced April 2024.

Comments: to be published in 2024 International Joint Conference on Neural Networks (IJCNN)

arXiv:2404.01317

doi 10.1007/978-3-031-21753-1_25

Intelligent Learning Rate Distribution to reduce Catastrophic Forgetting in Transformers

Authors: Philip Kenneweg, Alexander Schulz, Sarah Schröder, Barbara Hammer

Abstract: Pretraining language models on large text corpora is a common practice in natural language processing. Fine-tuning of these models is then performed to achieve the best results on a variety of tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a flat learning rate for the entire network in this context. We perform a hyperparameter optimization process to find learning rate distributions that are better than a flat learning rate. We combine the learning rate distributions thus found and show that they generalize to better performance with respect to the problem of catastrophic forgetting. We validate these learning rate distributions with a variety of NLP benchmarks from the GLUE dataset.

Submitted 27 March, 2024; originally announced April 2024.
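A learning rate distribution differs from a flat learning rate in that each layer gets its own rate. The sketch below uses geometric layer-wise decay, where lower layers (closer to the input) get smaller rates.

- This is a common heuristic chosen purely for illustration, not the distribution found by the paper's hyperparameter optimization.
- `top_lr` and `decay` are hypothetical values.
- The returned list of `{"params": ..., "lr": ...}` dictionaries has the shape PyTorch optimizers accept as parameter groups.
- The snippet uses placeholder name lists instead of tensors so that it runs without PyTorch.

```python
from typing import Any, Dict, List, Sequence


def layerwise_lrs(num_layers: int, top_lr: float, decay: float) -> List[float]:
    """Learning rate per layer; layer 0 is closest to the input, the last gets top_lr."""
    return [top_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]


def param_groups(layer_params: Sequence[Any], top_lr: float, decay: float) -> List[Dict[str, Any]]:
    """Build optimizer parameter groups, one per layer, each with its own learning rate."""
    lrs = layerwise_lrs(len(layer_params), top_lr, decay)
    return [{"params": params, "lr": lr} for params, lr in zip(layer_params, lrs)]


# Placeholder parameter lists for a hypothetical 4-layer encoder.
layers = [["layer0.weight"], ["layer1.weight"], ["layer2.weight"], ["layer3.weight"]]
groups = param_groups(layers, top_lr=1e-4, decay=0.5)
for g in groups:
    print(g["params"], g["lr"])
# A flat learning rate corresponds to decay=1.0 (every layer gets top_lr).
```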

arXiv:2403.18872

Targeted Visualization of the Backbone of Encoder LLMs

Authors: Isaac Roberts, Alexander Schulz, Luca Hermes, Barbara Hammer

Abstract: Attention-based Large Language Models (LLMs) are the state of the art in natural language processing (NLP). The two most common architectures are encoders such as BERT, and decoders like the GPT models. Despite the success of encoder models, on which we focus in this work, they also bear several risks, including issues with bias or their susceptibility to adversarial attacks, signifying the necessity for explainable AI to detect such issues. While various local explainability methods focusing on the prediction of single inputs do exist, global methods based on dimensionality reduction for classification inspection, which have emerged in other domains and go further than just using t-SNE in the embedding space, are not widespread in NLP. To reduce this gap, we investigate the application of DeepView, a method for visualizing a part of the decision function together with a data set in two dimensions, to the NLP domain. While DeepView has previously been used to inspect deep image classification models, we demonstrate how to apply it to BERT-based NLP classifiers and investigate its usability in this domain, including settings with adversarially perturbed input samples and pre-trained, fine-tuned, and multi-task models.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.18570

Physics-Informed Graph Neural Networks for Water Distribution Systems

Authors: Inaam Ashraf, Janine Strotherm, Luca Hermes, Barbara Hammer

Abstract: Water distribution systems (WDS) are an integral part of critical infrastructure, which is pivotal to urban development. As 70% of the world's population will likely live in urban environments by 2050, efficient simulation and planning tools for WDS play a crucial role in reaching the UN's Sustainable Development Goal (SDG) 6, "Clean water and sanitation for all". In this realm, we propose a novel and efficient machine learning emulator, more precisely, a physics-informed deep learning (DL) model, for hydraulic state estimation in WDS. Using a recursive approach, our model only needs a few graph convolutional neural network (GCN) layers and employs an innovative algorithm based on message passing. Unlike conventional machine learning tasks, the model uses hydraulic principles to infer two additional hydraulic state features in the process of reconstructing the available ground truth feature in an unsupervised manner. To the best of our knowledge, this is the first DL approach to emulate the popular hydraulic simulator EPANET, utilizing no additional information. Like most DL models and unlike the hydraulic simulator, our model demonstrates vastly faster emulation times that do not increase drastically with the size of the WDS. Moreover, we achieve high accuracy on the ground truth and very similar results compared to the hydraulic simulator, as demonstrated through experiments on five real-world WDS datasets.

Submitted 27 March, 2024; originally announced March 2024.

Comments: Extended version of the paper with the same title published at Proceedings of the AAAI Conference on Artificial Intelligence 2024
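One hydraulic principle a physics-informed model of a WDS can exploit is mass conservation at junctions: water flowing into a node minus water flowing out equals the demand drawn at that node. The sketch below computes nodal net inflow from pipe flows by sending one signed message per pipe to each endpoint, which mirrors the structure of a message-passing step.

It illustrates the kind of constraint involved, as an assumption; it is not the paper's algorithm. The network and flow values are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (tail, head); positive flow runs tail -> head


def nodal_net_inflow(num_nodes: int, edges: List[Edge], flows: List[float]) -> List[float]:
    """Net inflow per node (inflow - outflow) from signed pipe flows.

    At a junction, mass conservation says this equals the demand drawn there;
    a negative value marks a node that supplies water (e.g. a reservoir).
    """
    net = [0.0] * num_nodes
    for (tail, head), q in zip(edges, flows):
        net[head] += q  # message to the downstream node
        net[tail] -= q  # message to the upstream node
    return net


# Hypothetical network: reservoir 0 feeds junction 1, which feeds junctions 2 and 3.
edges = [(0, 1), (1, 2), (1, 3)]
flows = [10.0, 4.0, 3.5]  # e.g. in litres per second
print(nodal_net_inflow(4, edges, flows))  # [-10.0, 2.5, 4.0, 3.5]
```

Because every pipe adds and subtracts the same flow, the net inflows always sum to zero. This is a useful sanity check for any emulator that predicts flows.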

arXiv:2403.18555

doi 10.5220/0011615300003411

Debiasing Sentence Embedders through Contrastive Word Pairs

Authors: Philip Kenneweg, Sarah Schröder, Alexander Schulz, Barbara Hammer

Abstract: Over the last years, various sentence embedders have been an integral part of the success of current machine learning approaches to Natural Language Processing (NLP). Unfortunately, multiple sources have shown that the bias inherent in the datasets upon which these embedding methods are trained is learned by them. A variety of approaches to remove biases in embeddings exists in the literature. Most of these approaches are applicable to word embeddings and, in fewer cases, to sentence embeddings. Problematically, most debiasing approaches are directly transferred from word embeddings; as a result, they fail to take into account the nonlinear nature of sentence embedders and the embeddings they produce. It has been shown in the literature that bias information is still present if sentence embeddings are debiased using such methods. In this contribution, we explore an approach to remove linear and nonlinear bias information for NLP solutions, without impacting downstream performance. We compare our approach to common debiasing methods on classical bias metrics and on bias metrics which take nonlinear information into account.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18547

doi 10.14428/esann/2022.ES2022-45

Neural Architecture Search for Sentence Classification with BERT

Authors: Philip Kenneweg, Sarah Schröder, Barbara Hammer

Abstract: Pretraining of language models on large text corpora is common practice in Natural Language Processing. These models are subsequently fine-tuned to achieve the best results on a variety of tasks. In this paper, we question the common practice of only adding a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP benchmarks from the GLUE dataset.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.18519

doi 10.1109/ACDSA59508.2024.10467724

Improving Line Search Methods for Large Scale Neural Network Training

Authors: Philip Kenneweg, Tristan Kenneweg, Barbara Hammer

Abstract: In recent studies, line search methods have shown significant improvements in the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from Adam in its search direction, enabling efficient large-scale training, a task that was previously prone to failure using Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.

Submitted 27 March, 2024; originally announced March 2024.
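With a preconditioned search direction d, such as the bias-corrected Adam update, the Armijo condition takes its general form f(x + η·d) ≤ f(x) + c·η·∇f(x)ᵀd, where ∇f(x)ᵀd < 0 for a descent direction. The sketch below computes one Adam direction from fresh moment estimates and backtracks along it.

It is a minimal illustration under assumptions: the function names, the constants and the full-batch toy objective are illustrative. The paper's exact way of combining Adam's momentum with the line search may differ.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def adam_direction(
    g: Vector, m: Vector, v: Vector, t: int,
    beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
) -> Tuple[Vector, Vector, Vector]:
    """Update Adam's moment estimates and return (direction, m, v) at step t >= 1."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]  # bias correction
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    d = [-mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    return d, m, v


def armijo_along(
    f: Callable[[Vector], float], x: Vector, g: Vector, d: Vector,
    eta0: float = 1.0, c: float = 0.1, beta: float = 0.5, max_backtracks: int = 30,
) -> Tuple[Vector, float]:
    """Backtrack until f(x + eta*d) <= f(x) + c*eta*<g, d>."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(g, d))  # negative if d is a descent direction
    eta = eta0
    for _ in range(max_backtracks):
        candidate = [xi + eta * di for xi, di in zip(x, d)]
        if f(candidate) <= fx + c * eta * slope:
            return candidate, eta
        eta *= beta
    return list(x), 0.0


def quad(x: Vector) -> float:
    return 0.5 * (x[0] ** 2 + 10.0 * x[1] ** 2)


x0 = [1.0, 1.0]
g0 = [x0[0], 10.0 * x0[1]]  # gradient of quad at x0
d0, m, v = adam_direction(g0, [0.0, 0.0], [0.0, 0.0], t=1)
x1, eta = armijo_along(quad, x0, g0, d0)
print(eta, quad(x1))
```

At t = 1 the bias-corrected Adam direction is roughly -sign(g). The Armijo test therefore checks sufficient decrease against the directional derivative gᵀd, not against ‖g‖² as in the plain SGD case.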

arXiv:2403.18506

doi 10.1109/IJCNN54540.2023.10192001

Faster Convergence for Transformer Fine-tuning with Line Search Methods

Authors: Philip Kenneweg, Leonardo Galli, Tristan Kenneweg, Barbara Hammer

Abstract: Recent works have shown that line search methods greatly increase the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work, we succeed in extending line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network's architecture into sensible units and performing the line search separately on these local units. Our optimization method outperforms the traditional Adam optimizer and achieves significant performance improvements for small data sets or small training budgets, while performing equally well or better in the other tested cases. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer that is compatible with arbitrary network architectures.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.00820

Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup

Authors: Tristan Kenneweg, Philip Kenneweg, Barbara Hammer

Abstract: Retrieval Augmented Generation (RAG) systems have become hugely popular for augmenting Large Language Model (LLM) outputs with domain-specific and time-sensitive data. Recently, a shift has been taking place from simple RAG setups, which query a vector database for additional information with every user input, to more sophisticated forms of RAG. However, different concrete approaches currently compete on mostly anecdotal evidence. In this paper, we present a rigorous dataset creation and evaluation workflow to quantitatively compare different RAG strategies. We use a dataset created this way for the development and evaluation of a boolean agent RAG setup: a system in which an LLM can decide whether or not to query a vector database, thus saving tokens on questions that can be answered with internal knowledge. We publish our code and generated dataset online.

Submitted 26 February, 2024; originally announced March 2024.

Comments: Submitted to IJCNN prior to publication of this preprint; neither accepted nor rejected at the time of posting.
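The boolean agent setup adds one decision before retrieval: the LLM first judges whether the question needs the vector database at all. The sketch below shows that control flow with everything else stubbed out.

- `needs_policy_docs` is a hypothetical stand-in for the LLM's yes/no judgment.
- The bag-of-words cosine similarity replaces a real embedding model and vector database.
- Neither reflects the paper's actual prompts or components.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(
    query: str, docs: List[str], needs_retrieval: Callable[[str], bool]
) -> Tuple[str, bool]:
    """Boolean-agent step: only query the document store when the agent says so."""
    if not needs_retrieval(query):
        return query, False  # answer from the model's internal knowledge
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}", True


docs = [
    "the refund policy allows returns within 30 days",
    "our office is closed on public holidays",
]


def needs_policy_docs(query: str) -> bool:
    # Hypothetical decision rule; in the described setup an LLM makes this call.
    return "policy" in query.lower()


print(build_prompt("What is the capital of France?", docs, needs_policy_docs))
print(build_prompt("What does the refund policy say?", docs, needs_policy_docs))
```

The token saving comes from the first branch: questions the model can answer from internal knowledge skip retrieval and are sent without added context.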

arXiv:2402.18338 [pdf, other]

doi 10.1063/5.0156218

Generating candidates in global optimization algorithms using complementary energy landscapes

Authors: Andreas Møller Slavensky, Mads-Peter V. Christensen, Bjørk Hammer

Abstract: Global optimization of atomistic structures relies on the generation of new candidate structures to drive the exploration of the potential energy surface (PES) in search of the global minimum energy (GM) structure. In this work, we discuss a type of structure generation that locally optimizes structures in complementary energy (CE) landscapes. These landscapes are formulated temporarily during the searches as machine-learned potentials (MLPs) using local atomistic environments sampled from collected data. The CE landscapes are deliberately incomplete MLPs that, rather than mimicking every aspect of the true PES, are designed to be much smoother, having only a few local minima. This means that local optimization in the CE landscapes may facilitate the identification of new funnels in the true PES. We discuss how to construct the CE landscapes and test their influence on global optimization of a reduced rutile SnO2(110)-(4x1) surface and an olivine (Mg2SiO4)4 cluster, for which we report a new global minimum energy structure.

Submitted 28 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures

Journal ref: J. Chem. Phys. 159, 024123 (2023)

arXiv:2402.17404 [pdf, other]

Generative diffusion model for surface structure discovery

Authors: Nikolaj Rønne, Alán Aspuru-Guzik, Bjørk Hammer

Abstract: We present a generative diffusion model specifically tailored to the discovery of surface structures. The generative model takes into account substrate registry and periodicity by including masked atoms and $z$-directional confinement. Using a rotationally equivariant neural network architecture, we design a method that trains a denoiser network for diffusion alongside a force field for guided sampling of low-energy surface phases. An effective data-augmentation scheme for training the denoiser network is proposed to scale generation far beyond the structure sizes represented in the training data. We showcase the generative model by investigating multiple surface systems and propose an atomistic structure model for a previously unknown silver-oxide domain boundary of unprecedented size.

Submitted 2 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.08290 [pdf, other]

The Effect of Data Poisoning on Counterfactual Explanations

Authors: André Artelt, Shubham Sharma, Freddy Lecué, Barbara Hammer

Abstract: Counterfactual explanations provide a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes to the input that would yield a different (i.e., more favorable) system output. However, recent work has highlighted their vulnerability to different types of manipulations. This work studies the vulnerability of counterfactual explanations to data poisoning. We formally introduce and investigate data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, for a sub-group of instances, or globally for all instances. In this context, we characterize and prove the correctness of several different data poisonings. We also empirically demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.

Submitted 21 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.
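
The "cost of recourse" that such poisoning aims to increase can be made concrete for the simplest model class, a linear classifier, where the closest counterfactual under the L2 norm has a closed form: move the input along the weight vector until the score reaches the decision threshold. The sketch below assumes a hypothetical linear model that predicts the favorable class when w·x + b >= 0, and models the effect of poisoning simply as an assumed shift of the bias term. It only illustrates the quantity involved; it is not the paper's attack or its proofs.

```python
import math
from typing import List, Sequence, Tuple


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def closest_counterfactual(
    w: Sequence[float], b: float, x: Sequence[float], target: float = 0.0
) -> Tuple[List[float], float]:
    """Smallest L2 change that raises the linear score of x to `target`.

    Returns the counterfactual and its L2 distance from x (the cost of recourse).
    """
    score = linear_score(w, b, x)
    if score >= target:
        return list(x), 0.0  # already favorable: no change needed
    w_sq = sum(wi * wi for wi in w)
    step = (target - score) / w_sq
    x_cf = [xi + step * wi for xi, wi in zip(x, w)]
    return x_cf, (target - score) / math.sqrt(w_sq)


w = [1.0, 1.0]
x = [0.0, 0.0]
x_cf, cost_clean = closest_counterfactual(w, -2.0, x)
# Hypothetical poisoned model: a shifted bias makes recourse more expensive.
_, cost_poisoned = closest_counterfactual(w, -3.0, x)
print(x_cf, cost_clean, cost_poisoned)
```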

arXiv:2401.15499 [pdf, other]

doi 10.5220/0012577200003654

Semantic Properties of cosine based bias scores for word embeddings

Authors: Sarah Schröder, Alexander Schulz, Fabian Hinder, Barbara Hammer

Abstract: Many works have drawn attention to social biases in language models and proposed methods to detect such biases. As a result, the literature contains a great many different bias tests and scores, each introduced with the premise of uncovering yet more biases that other scores fail to detect. What is severely lacking in the literature, however, are comparative studies that analyze such bias scores and help researchers understand the benefits or limitations of the existing methods. In this work, we aim to close this gap for cosine-based bias scores. Building on a geometric definition of bias, we propose requirements that bias scores must meet to be considered meaningful for quantifying biases. Furthermore, we formally analyze cosine-based scores from the literature with regard to these requirements. We underline these findings with experiments showing that the bias scores' limitations have an impact in the application case.

Submitted 12 September, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 11 pages, 3 figures. arXiv admin note: text overlap with arXiv:2111.07864
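
For readers unfamiliar with cosine-based bias scores, the sketch below computes one well-known example, the word-level association term of the Word Embedding Association Test (WEAT): the mean cosine similarity of a word vector w to an attribute set A minus its mean cosine similarity to an attribute set B. The two-dimensional vectors are hypothetical toy embeddings. The code illustrates what such a score computes; it does not implement the requirements proposed in the paper.

```python
import math
from statistics import mean
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weat_association(w: Vector, A: Sequence[Vector], B: Sequence[Vector]) -> float:
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


# Hypothetical 2-D embeddings: A and B stand for two attribute word sets.
A = [(1.0, 0.0)]
B = [(0.0, 1.0)]
print(weat_association((1.0, 0.0), A, B))  # word aligned with A
print(weat_association((1.0, 1.0), A, B))  # word equally similar to A and B
```

Swapping A and B flips the sign of the score, and a word equally similar to both sets scores zero; properties of this kind are what a formal analysis of such scores examines.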

arXiv:2401.13371 [pdf, other]

SVARM-IQ: Efficient Approximation of Any-order Shapley Interactions through Stratification

Authors: Patrick Kolpaczki, Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier

Abstract: Addressing the limitations of individual attribution scores via the Shapley value (SV), the field of explainable AI (XAI) has recently explored intricate interactions of features or data points. In particular, extensions of the SV, such as the Shapley Interaction Index (SII), have been proposed as a measure that still benefits from the axiomatic basis of the SV. However, as with the SV, their exact computation remains computationally prohibitive. Hence, we propose SVARM-IQ, a sampling-based approach to efficiently approximate Shapley-based interaction indices of any order. SVARM-IQ can be applied to a broad class of interaction indices, including the SII, by leveraging a novel stratified representation. We provide non-asymptotic theoretical guarantees on its approximation quality and empirically demonstrate that SVARM-IQ achieves state-of-the-art estimation results in practical XAI scenarios on different model classes and application domains.

Submitted 1 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.
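
To make concrete why exact computation is prohibitive, the sketch below evaluates the Shapley Interaction Index by brute force from its standard definition: for a coalition S in an n-player game v, it sums, over every coalition T outside S, the weight (n-|T|-|S|)! |T|! / (n-|S|+1)! times the discrete derivative of v with respect to S at T. That is 2^(n-|S|) coalitions, each needing 2^|S| evaluations of v, so the approach is only usable for tiny n. For a single player the SII reduces to the Shapley value. This is the naive exact computation, not SVARM-IQ's stratified sampling, and the toy game is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Iterable, Iterator

Game = Callable[[FrozenSet[int]], float]


def subsets(items: Iterable[int]) -> Iterator[FrozenSet[int]]:
    """Yield every subset of `items` as a frozenset."""
    pool = list(items)
    for r in range(len(pool) + 1):
        for combo in combinations(pool, r):
            yield frozenset(combo)


def discrete_derivative(v: Game, S: FrozenSet[int], T: FrozenSet[int]) -> float:
    """Delta_S v(T) = sum over L subset of S of (-1)^(|S|-|L|) * v(T | L)."""
    return sum((-1) ** (len(S) - len(L)) * v(T | L) for L in subsets(S))


def shapley_interaction_index(v: Game, n: int, S: Iterable[int]) -> float:
    """Exact SII of coalition S for an n-player game v (exponential cost)."""
    S = frozenset(S)
    s = len(S)
    rest = [i for i in range(n) if i not in S]
    total = 0.0
    for T in subsets(rest):
        t = len(T)
        weight = factorial(n - t - s) * factorial(t) / factorial(n - s + 1)
        total += weight * discrete_derivative(v, S, T)
    return total


# Hypothetical 3-player game: worth 1 per player, plus a bonus of 2
# whenever players 0 and 1 are both present.
def toy_game(T: FrozenSet[int]) -> float:
    return len(T) + (2.0 if {0, 1} <= T else 0.0)


phi = [shapley_interaction_index(toy_game, 3, {i}) for i in range(3)]
pair = shapley_interaction_index(toy_game, 3, {0, 1})
print(phi, pair)
```

In this game the Shapley values split the bonus evenly between players 0 and 1, while the pairwise SII attributes the full bonus to the pair {0, 1}, which is the kind of information individual attributions hide.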

arXiv:2401.12069 [pdf, other]

doi 10.1609/aaai.v38i13.29352

Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles

Authors: Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier

Abstract: While shallow decision trees may be interpretable, larger ensemble models like gradient-boosted trees, which often set the state of the art in machine learning problems involving tabular data, still remain black box models. As a remedy, the Shapley value (SV) is a well-known concept in explainable artificial intelligence (XAI) research for quantifying additive feature attributions of predictions. The model-specific TreeSHAP methodology solves the exponential complexity for retrieving exact SVs from tree-based models. Expanding beyond individual feature attribution, Shapley interactions reveal the impact of intricate feature interactions of any order. In this work, we present TreeSHAP-IQ, an efficient method to compute any-order additive Shapley interactions for predictions of tree-based models. TreeSHAP-IQ is supported by a mathematical framework that exploits polynomial arithmetic to compute the interaction scores in a single recursive traversal of the tree, akin to Linear TreeSHAP. We apply TreeSHAP-IQ on state-of-the-art tree ensembles and explore interactions on well-established benchmark datasets.

Submitted 22 January, 2024; originally announced January 2024.

arXiv:2401.01733 [pdf, other]

Investigating the Suitability of Concept Drift Detection for Detecting Leakages in Water Distribution Networks

Authors: Valerie Vaquet, Fabian Hinder, Barbara Hammer

Abstract: Leakages are a major risk in water distribution networks as they cause water loss and increase contamination risks. Leakage detection is a difficult task due to the complex dynamics of water distribution networks. In particular, small leakages are hard to detect. From a machine-learning perspective, leakages can be modeled as concept drift. Thus, a wide variety of drift detection schemes seems to be a suitable choice for detecting leakages. In this work, we explore the potential of model-loss-based and distribution-based drift detection methods to tackle leakage detection. We additionally discuss the issue of temporal dependencies in the data and propose a way to cope with it when applying distribution-based detection. We evaluate different methods systematically for leakages of different sizes and detection times. Additionally, we propose a first drift-detection-based technique for localizing leakages.

Submitted 3 January, 2024; originally announced January 2024.
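
In its simplest form, a distribution-based drift detector compares the empirical distribution of a recent window of sensor readings with that of the preceding window. The sketch below uses the two-sample Kolmogorov-Smirnov statistic with a hypothetical fixed threshold of 0.5 on a synthetic pressure signal; it is a generic illustration, not one of the detectors evaluated in the paper. Note that it treats the readings within a window as exchangeable and therefore ignores exactly the temporal dependencies the abstract says must be handled.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def ks_statistic(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic sup_t |F_x(t) - F_y(t)|."""
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    n, m = len(xs_sorted), len(ys_sorted)
    # Both ECDFs are right-continuous step functions, so the supremum
    # is attained at one of the pooled sample points.
    return max(
        abs(bisect_right(xs_sorted, t) / n - bisect_right(ys_sorted, t) / m)
        for t in xs_sorted + ys_sorted
    )


def detect_drift(stream: List[float], window: int, threshold: float) -> Optional[int]:
    """Return the first index t where stream[t:t+window] differs from the preceding
    window by a KS statistic above `threshold`, or None if no such t exists."""
    for t in range(window, len(stream) - window + 1):
        reference = stream[t - window:t]
        current = stream[t:t + window]
        if ks_statistic(reference, current) > threshold:
            return t
    return None


# Hypothetical pressure signal: a periodic pattern whose level drops at index 100,
# standing in for the pressure loss a leak might cause.
stream = [float(i % 10) for i in range(100)] + [float(i % 10) - 20.0 for i in range(100, 200)]
drift_at = detect_drift(stream, window=30, threshold=0.5)
print(drift_at)
```

The alarm fires a few steps before the change point has fully left the current window, which illustrates the trade-off between window length and detection delay that a systematic evaluation over leak sizes and detection times has to quantify.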

arXiv:2312.10212 [pdf, other]

A Remark on Concept Drift for Dependent Data

Authors: Fabian Hinder, Valerie Vaquet, Barbara Hammer

Abstract: Concept drift, i.e., the change of the data generating distribution, can render machine learning models inaccurate. Several works address the phenomenon of concept drift in the streaming context, usually assuming that consecutive data points are independent of each other. To generalize to dependent data, many authors link the notion of concept drift to time series. In this work, we show that temporal dependencies strongly influence the sampling process. Thus, the definitions in use need major modifications. In particular, we show that the notion of stationarity is not suited for this setup and discuss alternatives. We demonstrate that these alternative formal notions describe the observable learning behavior in numerical experiments.

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2312.02034 [pdf, other]

Trust, distrust, and appropriate reliance in (X)AI: a survey of empirical evaluation of user trust

Authors: Roel Visser, Tobias M. Peters, Ingrid Scharlau, Barbara Hammer

Abstract: A current concern in the field of Artificial Intelligence (AI) is to ensure the trustworthiness of AI systems. The development of explainability methods is one prominent way to address this, which has often resulted in the assumption that the use of explainability will lead to an increase in the trust of users and wider society. However, the dynamics between explainability and trust are not well established, and empirical investigations of their relation remain mixed or inconclusive. In this paper we provide a detailed description of the concepts of user trust and distrust in AI and their relation to appropriate reliance. For that, we draw from the fields of machine learning, human-computer interaction, and the social sciences. Furthermore, we have created a survey of existing empirical studies that investigate the effects of AI systems and XAI methods on user (dis)trust. By clarifying the concepts and summarizing the empirical investigations, we aim to provide researchers who examine user trust in AI with an improved starting point for developing user studies to measure and evaluate the user's attitude towards and reliance on AI systems.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2310.15830 [pdf, other]

Localizing Anomalies in Critical Infrastructure using Model-Based Drift Explanations

Authors: Valerie Vaquet, Fabian Hinder, Jonas Vaquet, Kathrin Lammers, Lars Quakernack, Barbara Hammer

Abstract: Facing climate change, the already limited availability of drinking water will decrease in the future, rendering drinking water an increasingly scarce resource. Considerable amounts of it are lost through leakages in water transportation and distribution networks. Thus, anomaly detection and localization, in particular for leakages, are crucial but challenging tasks due to the complex interactions and changing demands in water distribution networks. In this work, we analyze the effects of anomalies on the dynamics of critical infrastructure systems by modeling the networks with Bayesian networks. We then discuss how the problem is connected to, and can be considered through the lens of, concept drift. In particular, we argue that model-based explanations of concept drift are a promising tool for localizing anomalies given limited information about the network. The methodology is experimentally evaluated using realistic benchmark scenarios. To show that our methodology applies to critical infrastructure more generally, in addition to considering leakages and sensor faults in water systems, we demonstrate the suitability of the derived technique for localizing sensor faults in power systems.

Submitted 7 February, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.15826 [pdf, other]

One or Two Things We know about Concept Drift -- A Survey on Monitoring Evolving Environments

Authors: Fabian Hinder, Valerie Vaquet, Barbara Hammer

Abstract: The world surrounding us is subject to constant change. These changes, frequently described as concept drift, influence many industrial and technical processes. As they can lead to malfunctions and other anomalous behavior, which may be safety-critical in many scenarios, detecting and analyzing concept drift is crucial. In this paper, we provide a literature review focusing on concept drift in unsupervised data streams. While many surveys focus on supervised data streams, so far there is no work reviewing the unsupervised setting. However, this setting is of particular relevance for monitoring and anomaly detection, which are directly applicable to many tasks and challenges in engineering. This survey provides a taxonomy of existing work on drift detection. It also systematically covers the current state of research on drift localization. In addition to providing a systematic literature review, this work provides precise mathematical definitions of the considered problems and contains standardized experiments on parametric artificial datasets, allowing for a direct comparison of different strategies for detection and localization. Thereby, the suitability of different schemes can be analyzed systematically, and guidelines for their usage in real-world scenarios can be provided. Finally, there is a section on the emerging topic of explaining concept drift.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2307.08486 [pdf, other]

Fairness in KI-Systemen (Fairness in AI Systems)

Authors: Janine Strotherm, Alissa Müller, Barbara Hammer, Benjamin Paaßen

Abstract: The more AI-assisted decisions affect people's lives, the more important the fairness of such decisions becomes. In this chapter, we provide an introduction to research on fairness in machine learning. We explain the main fairness definitions and strategies for achieving fairness using concrete examples and place fairness research in the European context. Our contribution is aimed at an interdisciplinary audience and therefore avoids mathematical formulations, emphasizing visualizations and examples instead.

Submitted 17 July, 2023; originally announced July 2023.

Comments: in German language

arXiv:2306.07775 [pdf, other]

doi 10.1007/978-3-031-44064-9_11

iPDP: On Partial Dependence Plots in Dynamic Modeling Scenarios

Authors: Maximilian Muschalik, Fabian Fumagalli, Rohit Jagtani, Barbara Hammer, Eyke Hüllermeier

Abstract: Post-hoc explanation techniques such as the well-established partial dependence plot (PDP), which investigates feature dependencies, are used in explainable artificial intelligence (XAI) to understand black-box machine learning models. While many real-world applications require dynamic models that constantly adapt over time and react to changes in the underlying distribution, XAI has so far primarily considered static learning environments, where models are trained in batch mode and remain unchanged. We thus propose a novel model-agnostic XAI framework called incremental PDP (iPDP) that extends the PDP to extract time-dependent feature effects in non-stationary learning environments. We formally analyze iPDP and show that it approximates a time-dependent variant of the PDP that properly reacts to real and virtual concept drift. The time sensitivity of iPDP is controlled by a single smoothing parameter, which directly corresponds to the variance and the approximation error of iPDP in a static learning environment. We illustrate the efficacy of iPDP by showcasing an example application for drift detection and conducting multiple experiments on real-world and synthetic datasets and streams.

Submitted 13 June, 2023; originally announced June 2023.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections
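
As a rough illustration of how a single smoothing parameter can control time sensitivity, the sketch below maintains a partial dependence curve for one feature as an exponential moving average of ICE-style predictions: for each incoming observation, the feature of interest is set to every grid value and the current model is queried. Exponential smoothing, the parameter name alpha, the grid, and both toy models are assumptions made for illustration; this is not the iPDP reference implementation.

```python
from typing import Callable, List, Optional, Sequence

Model = Callable[[Sequence[float]], float]


class IncrementalPDP:
    """Exponentially smoothed partial dependence curve for one feature (sketch)."""

    def __init__(self, feature: int, grid: Sequence[float], alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.feature = feature
        self.grid = list(grid)
        self.alpha = alpha
        self.values: Optional[List[float]] = None

    def update(self, model: Model, x: Sequence[float]) -> List[float]:
        """Fold one new observation x, explained with the current model, into the curve."""
        preds = []
        for z in self.grid:
            x_mod = list(x)
            x_mod[self.feature] = z  # set the feature of interest to the grid value
            preds.append(model(x_mod))
        if self.values is None:
            self.values = preds
        else:
            # Larger alpha forgets old observations and old model states faster.
            self.values = [
                (1.0 - self.alpha) * old + self.alpha * new
                for old, new in zip(self.values, preds)
            ]
        return self.values


def model_before(x: Sequence[float]) -> float:
    return 2.0 * x[0] + x[1]


def model_after(x: Sequence[float]) -> float:  # hypothetical model after a drift
    return x[0]


pdp = IncrementalPDP(feature=0, grid=[0.0, 1.0, 2.0], alpha=0.1)
for _ in range(50):
    pdp.update(model_before, [5.0, 1.0])
before = list(pdp.values)
for _ in range(100):
    pdp.update(model_after, [5.0, 1.0])
after = list(pdp.values)
print(before, after)
```

A value of alpha close to 1 tracks the newest model almost immediately but, on real streams with varying inputs, yields a noisier curve; smaller values average over more observations and react more slowly.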

arXiv:2306.07637 [pdf, other]

doi 10.1007/978-3-031-44070-0_14

For Better or Worse: The Impact of Counterfactual Explanations' Directionality on User Behavior in xAI

Authors: Ulrike Kuhl, André Artelt, Barbara Hammer

Abstract: Counterfactual explanations (CFEs) are a popular approach in explainable artificial intelligence (xAI), highlighting changes to input data necessary for altering a model's output. A CFE can either describe a scenario that is better than the factual state (upward CFE) or a scenario that is worse than the factual state (downward CFE). However, the potential benefits and drawbacks of the directionality of CFEs for user behavior in xAI remain unclear. The present user study (N=161) compares the impact of CFE directionality on the behavior and experience of participants tasked with extracting new knowledge from an automated system based on model predictions and CFEs. Results suggest that upward CFEs provide a significant performance advantage over other forms of counterfactual feedback. Moreover, the study highlights potential benefits of mixed CFEs, which improve user performance compared to downward CFEs or no explanations. In line with the performance results, users' explicit knowledge of the system is statistically higher after receiving upward CFEs compared to downward comparisons. These findings imply that the alignment between explanation and task at hand, the so-called regulatory fit, may play a crucial role in determining the effectiveness of model explanations, informing future research directions in xAI. To ensure reproducible research, the entire code, underlying models, and user data of this study are openly available: https://github.com/ukuhl/DirectionalAlienZoo

Submitted 13 June, 2023; originally announced June 2023.

Comments: 22 pages, 3 figures. This work has been accepted for presentation at the 1st World Conference on eXplainable Artificial Intelligence (xAI 2023), July 26-28, 2023, Lisbon, Portugal

Journal ref: Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903

arXiv:2306.06107 [pdf, other]

Adversarial Attacks on Leakage Detectors in Water Distribution Networks

Authors: Paul Stahlhofen, André Artelt, Luca Hermes, Barbara Hammer

Abstract: Many machine learning models are vulnerable to adversarial attacks: there exist methodologies that add a small (imperceptible) perturbation to an input such that the model produces a wrong prediction. A better understanding of such attacks is crucial, in particular for models used in security-critical domains such as the monitoring of water distribution networks, in order to devise countermeasures that enhance model robustness and trustworthiness. We propose a taxonomy for adversarial attacks against machine-learning-based leakage detectors in water distribution networks. Following up on this, we focus on a particular type of attack: an adversary searching for the least sensitive point, that is, the location in the water network where the largest possible undetected leak could occur. Based on a mathematical formalization of the least sensitive point problem, we use three different algorithmic approaches to find a solution. Results are evaluated on two benchmark water distribution networks.

Submitted 25 May, 2023; originally announced June 2023.

Comments: This paper was accepted for the 17th International Work-Conference on Artificial Neural Networks (IWANN 2023). A link to the version of record will be provided upon publication of the conference proceedings

arXiv:2305.15846 [pdf, other]

doi 10.1063/5.0150379

A machine learning potential for simulating infrared spectra of nanosilicate clusters

Authors: Zeyuan Tang, Stefan T. Bromley, Bjørk Hammer

Abstract: The use of machine learning (ML) in chemical physics has enabled the construction of interatomic potentials with the accuracy of ab initio methods and a computational cost comparable to that of classical force fields. Training an ML model requires an efficient method for generating training data. Here we apply an accurate and efficient protocol to collect training data for constructing a neural-network-based ML interatomic potential for nanosilicate clusters. Initial training data are taken from normal modes and farthest point sampling. Later on, the training set is extended via an active learning strategy in which new data are identified by the disagreement between an ensemble of ML models. The whole process is further accelerated by parallel sampling over structures. We use the ML model to run molecular dynamics (MD) simulations of nanosilicate clusters of various sizes, from which infrared spectra including anharmonicity can be extracted. Such spectroscopic data are needed for understanding the properties of silicate dust grains in the interstellar medium (ISM) and in circumstellar environments.

Submitted 25 May, 2023; originally announced May 2023.

Comments: 11 pages, 8 figures, accepted by J. Chem. Phys.
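
Farthest point sampling greedily adds the candidate that is farthest from everything selected so far, which spreads a fixed budget of expensive reference calculations across the data. The sketch below assumes that each structure is already described by a fixed-length feature vector compared with Euclidean distance; the descriptor used in the paper is not specified here, and the one-dimensional points are hypothetical.

```python
import math
from typing import List, Sequence


def farthest_point_sampling(
    points: Sequence[Sequence[float]], k: int, start: int = 0
) -> List[int]:
    """Greedy farthest point sampling; returns the indices of up to k selected points."""
    k = min(k, len(points))
    if k <= 0:
        return []
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
print(farthest_point_sampling(pts, 3))
```

Each iteration costs one pass over the data, so selecting k points from N candidates takes O(kN) distance evaluations, which is cheap compared with the ab initio calculations the selected structures are then labeled with.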

arXiv:2303.09331 [pdf, other]

Model Based Explanations of Concept Drift

Authors: Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf, Barbara Hammer

Abstract: The notion of concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models can become inaccurate and need adjustment. While methods do exist to detect concept drift or to adjust models in the presence of observed drift, the question of explaining drift, i.e., describing the potentially complex and high-dimensional change of distribution in a human-understandable fashion, has hardly been considered so far. This problem is important since it enables an inspection of the most prominent characteristics of how and where drift manifests itself. Hence, it enables human understanding of the change and increases the acceptance of life-long learning models. In this paper, we present a novel technique characterizing concept drift in terms of the characteristic change of spatial features based on various explanation techniques. To do so, we propose a methodology to reduce the explanation of concept drift to an explanation of models that are trained in a suitable way to extract relevant information regarding the drift. This way, a large variety of explanation schemes becomes available. Thus, a suitable method can be selected for the drift explanation problem at hand. We outline the potential of this approach and demonstrate its usefulness in several examples.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.01181 [pdf, other]

doi 10.1007/978-3-031-43418-1_26

iSAGE: An Incremental Version of SAGE for Online Explanation on Data Streams

Authors: Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer, Eyke Hüllermeier

Abstract: Existing methods for explainable artificial intelligence (XAI), including popular feature importance measures such as SAGE, are mostly restricted to the batch learning scenario. However, machine learning is often applied in dynamic environments, where data arrives continuously and learning must be done in an online manner. Therefore, we propose iSAGE, a time- and memory-efficient incrementalization of SAGE, which is able to react to changes in the model as well as to drift in the data-generating process. We further provide efficient feature removal methods that break (interventional) and retain (observational) feature dependencies. Moreover, we formally analyze our explanation method to show that iSAGE adheres to similar theoretical properties as SAGE. Finally, we evaluate our approach in a thorough experimental analysis based on well-established data sets and data streams with concept drift.

Submitted 14 June, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

arXiv:2303.01179 [pdf, other]

SHAP-IQ: Unified Approximation of any-order Shapley Interactions

Authors: Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

Abstract: Predominantly in explainable artificial intelligence (XAI) research, the Shapley value (SV) is applied to determine feature attributions for any black box model. Shapley interaction indices extend the SV to define any-order feature interactions. Defining a unique Shapley interaction index is an open research question and, so far, three definitions have been proposed, which differ by their choice of axioms. Moreover, each definition requires a specific approximation technique. Here, we propose SHAPley Interaction Quantification (SHAP-IQ), an efficient sampling-based approximator to compute Shapley interactions for arbitrary cardinal interaction indices (CII), i.e., interaction indices that satisfy the linearity, symmetry and dummy axioms. SHAP-IQ is based on a novel representation and, in contrast to existing methods, we provide theoretical guarantees for its approximation quality, as well as estimates for the variance of the point estimates. For the special case of SV, our approach reveals a novel representation of the SV and corresponds to Unbiased KernelSHAP with a greatly simplified calculation. We illustrate the computational efficiency and effectiveness by explaining language, image classification and high-dimensional synthetic models.

Submitted 30 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.
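SHAP-IQ approximates Shapley values and Shapley interaction indices by sampling. As a point of reference only, the following Python sketch computes both quantities exactly by brute-force enumeration for a tiny three-player game. It is not SHAP-IQ: the pairwise function implements the Shapley interaction index (SII), one of the existing interaction index definitions, and the game `toy_game` with its `weights` is a hypothetical example chosen so the expected values are easy to check by hand.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List

Game = Callable[[FrozenSet[int]], float]  # value function v(S) of a cooperative game


def shapley_values(n: int, v: Game) -> List[float]:
    """Exact Shapley values by enumerating every coalition (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S not containing i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def shapley_interaction(n: int, v: Game, i: int, j: int) -> float:
    """Exact pairwise Shapley interaction index (SII) of players i and j."""
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):
        # SII weight |S|! (n - |S| - 2)! / (n - 1)! for S containing neither i nor j
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for subset in combinations(others, size):
            s = frozenset(subset)
            # discrete second-order difference of v with respect to {i, j}
            delta = v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
            total += weight * delta
    return total


# Hypothetical toy game: additive payoffs plus a bonus of 1.0 when players 0 and 1 cooperate
weights = [1.0, 2.0, 3.0]


def toy_game(s: FrozenSet[int]) -> float:
    return sum(weights[k] for k in s) + (1.0 if {0, 1} <= s else 0.0)


print(shapley_values(3, toy_game))             # approximately [1.5, 2.5, 3.0]
print(shapley_interaction(3, toy_game, 0, 1))  # 1.0
```

The Shapley values split the cooperation bonus equally between players 0 and 1, while the interaction index isolates it as a pairwise effect. The enumeration cost grows exponentially with the number of players, which is what motivates sampling-based approximators such as the one described above.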

arXiv:2302.04141 [pdf, other]

Combining self-labeling and demand based active learning for non-stationary data streams

Authors: Valerie Vaquet, Fabian Hinder, Johannes Brinkrolf, Barbara Hammer

Abstract: Learning from non-stationary data streams is a research direction that is gaining increasing interest as more data in the form of streams becomes available, for example from social media, smartphones, or industrial process monitoring. Most approaches assume that the ground truth of the samples becomes available (possibly with some delay) and perform supervised online learning in the test-then-train scheme. While this assumption might be valid in some scenarios, it does not apply to all settings. In this work, we focus on scarcely labeled data streams and explore the potential of self-labeling in gradually drifting data streams. We formalize this setup and propose a novel online $k$-nn classifier that combines self-labeling and demand-based active learning.

Submitted 8 February, 2023; originally announced February 2023.
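The general idea of combining self-labeling with demand-based querying can be illustrated with a minimal Python sketch. It is not the classifier proposed in the paper: the class name `SelfLabelingKNN`, the sliding window, the agreement `threshold`, and the `oracle` callback that returns a true label on demand are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, deque
from typing import Callable, Deque, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class SelfLabelingKNN:
    """Hypothetical online k-NN: self-labels confident samples, queries the rest."""

    def __init__(self, k: int = 3, window: int = 100, threshold: float = 1.0):
        self.k = k
        self.threshold = threshold  # minimum neighbour agreement required to trust a self-label
        # a bounded window lets old samples drop out, a simple way to follow gradual drift
        self.memory: Deque[Tuple[Point, int]] = deque(maxlen=window)
        self.queries = 0  # number of true labels requested from the oracle

    def predict(self, x: Point) -> Tuple[Optional[int], float]:
        """Majority vote of the k nearest stored samples and the fraction agreeing with it."""
        if len(self.memory) < self.k:
            return None, 0.0
        nearest = sorted(self.memory, key=lambda item: math.dist(item[0], x))[: self.k]
        label, votes = Counter(y for _, y in nearest).most_common(1)[0]
        return label, votes / self.k

    def process(self, x: Sequence[float], oracle: Callable[[Point], int]) -> Optional[int]:
        """Test-then-train step: predict first, then store a self-label or a queried label."""
        point = tuple(x)
        prediction, confidence = self.predict(point)
        if prediction is not None and confidence >= self.threshold:
            label = prediction  # self-labeling: trust a sufficiently confident vote
        else:
            label = oracle(point)  # demand-based active learning: ask for the true label
            self.queries += 1
        self.memory.append((point, label))
        return prediction


def oracle(p: Point) -> int:
    # hypothetical labeler: class 1 for points above 5
    return int(p[0] > 5.0)


stream = [(0.0,), (10.0,), (0.5,), (9.5,), (1.0,), (9.0,), (0.2,), (9.8,), (0.3,), (9.7,)]
clf = SelfLabelingKNN(k=3, window=50, threshold=1.0)
predictions = [clf.process(p, oracle) for p in stream]
print(clf.queries, "labels requested for", len(stream), "samples")
```

In this toy stream, only the early samples, whose neighbourhoods are still mixed, trigger label requests; once each cluster contains enough stored points, the unanimous votes are used as self-labels.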

arXiv:2301.01869 [pdf, other]

doi 10.1109/AERO55745.2023.10115743

Enabling Ice Core Science on Mars and Ocean Worlds

Authors: Alexander G. Chipps, Cassius B. Tunis, Nathan Chellman, Joseph R. McConnell, Bruce Hammer, Christopher E. Carr

Abstract: Ice deposits on Earth provide an extended record of volcanism, planetary climate, and life. On Mars, such a record may extend as far back as tens to hundreds of millions of years (My), compared to only a few My on Earth. Here, we propose and demonstrate a compact instrument, the Melter-Sublimator for Ice Science (MSIS), and describe its potential use cases. Similar to current use in the analysis of ice cores, linking MSIS to downstream elemental, chemical, and biological analyses could address whether Mars is, or was in the recent past, volcanically active, enable the creation of a detailed climate history of the late Amazonian, and seek evidence of subsurface life preserved in ice sheets. The sublimation feature can not only serve as a preconcentrator for in-situ analyses, but also enable the collection of rare material such as cosmogenic nuclides, which could be returned to Earth and used to confirm and expand the record of nearby supernovas and long-term trends in space weather. Missions to Ocean Worlds such as Europa or Enceladus will involve ice processing, and there MSIS would deliver liquid samples for downstream wet chemistry analyses. Our combined melter-sublimator system can thus help to address diverse questions in heliophysics, habitability, and astrobiology.

Submitted 4 January, 2023; originally announced January 2023.

Comments: 10 pages, 5 figures

arXiv:2212.01223 [pdf, other]

On the Change of Decision Boundaries and Loss in Learning with Concept Drift

Authors: Fabian Hinder, Valerie Vaquet, Johannes Brinkrolf, Barbara Hammer

Abstract: The notion of concept drift refers to the phenomenon that the distribution generating the observed data changes over time. If drift is present, machine learning models may become inaccurate and need adjustment. Many technologies for learning with drift rely on the interleaved test-train error (ITTE) as a quantity which approximates the model generalization error and triggers drift detection and model updates. In this work, we investigate to what extent this procedure is mathematically justified. More precisely, we relate a change of the ITTE to the presence of real drift, i.e., a changed posterior, and to a change of the training result under the assumption of optimality. We support our theoretical findings by empirical evidence for several learning algorithms, models, and datasets.

Submitted 2 December, 2022; originally announced December 2022.
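The interleaved test-train error can be computed prequentially: each incoming sample is first used to test the current model and only then to train it. The following Python sketch assumes a hypothetical `NearestMeanClassifier` as the online model and a sliding window of 50 errors; the synthetic stream flips the posterior halfway through, i.e., it contains real drift in the sense used above. The window size, model, and stream are illustrative assumptions, not the setup of the paper.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple


class NearestMeanClassifier:
    """Hypothetical online model: predicts the class whose running mean is closest."""

    def __init__(self) -> None:
        self.sums: Dict[int, float] = {}
        self.counts: Dict[int, int] = {}

    def predict(self, x: float) -> int:
        if not self.counts:
            return 0  # default guess before any data has been seen
        return min(self.counts, key=lambda c: abs(x - self.sums[c] / self.counts[c]))

    def learn(self, x: float, y: int) -> None:
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1


def interleaved_test_train_error(
    stream: Iterable[Tuple[float, int]], model: NearestMeanClassifier, window: int = 50
) -> List[float]:
    """Windowed ITTE after each sample: test on the sample first, then train on it."""
    errors: Deque[int] = deque(maxlen=window)
    curve: List[float] = []
    for x, y in stream:
        errors.append(int(model.predict(x) != y))  # test ...
        model.learn(x, y)                          # ... then train on the same sample
        curve.append(sum(errors) / len(errors))
    return curve


pre_drift = [(x, int(x > 0)) for x in [-1.0, 1.0] * 100]
post_drift = [(x, int(x < 0)) for x in [-1.0, 1.0] * 100]  # posterior flips: real drift
curve = interleaved_test_train_error(pre_drift + post_drift, NearestMeanClassifier())
print(curve[199], curve[-1])  # 0.0 1.0
```

Because this toy model never forgets old samples, its error stays high after the drift; the sketch only shows that a change of the posterior becomes visible as a rise of the ITTE.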

arXiv:2212.00695 [pdf, other]

doi 10.1007/978-3-031-21753-1_31

Explainable Artificial Intelligence for Improved Modeling of Processes

Authors: Riza Velioglu, Jan Philip Göpfert, André Artelt, Barbara Hammer

Abstract: In modern business processes, the amount of data collected has increased substantially in recent years. Because this data can potentially yield valuable insights, automated knowledge extraction based on process mining has been proposed, among other techniques, to provide users with intuitive access to the information contained therein. At present, the majority of technologies aim to reconstruct explicit business process models. These are directly interpretable but limited concerning the integration of diverse and real-valued information sources. On the other hand, Machine Learning (ML) benefits from the vast amount of data available and can deal with high-dimensional sources, yet it has rarely been applied to process modeling. In this contribution, we evaluate the capability of modern Transformer architectures, as well as more classical ML technologies, to model process regularities, as quantitatively evaluated by their predictive performance. In addition, we demonstrate the capability of attentional properties and feature relevance determination by highlighting features that are crucial to the processes' predictive abilities. We demonstrate the efficacy of our approach using five benchmark datasets and show that the ML models are capable of predicting critical outcomes and that the attention mechanisms or XAI components offer new insights into the underlying processes.

Submitted 1 December, 2022; originally announced December 2022.

Comments: 12 pages, 3 tables, 3 figures. Published in IDEAL 2022: https://link.springer.com/chapter/10.1007/978-3-031-21753-1_31

Journal ref: IDEAL 2022, LNCS 13756, pp. 313-325, 2022

arXiv:2211.14858 [pdf, other]

"Explain it in the Same Way!" -- Model-Agnostic Group Fairness of Counterfactual Explanations

Authors: André Artelt, Barbara Hammer

Abstract: Counterfactual explanations are a popular type of explanation for making the outcomes of a decision making system transparent to the user. Counterfactual explanations tell the user what to do in order to change the outcome of the system in a desirable way. However, it was recently discovered that the recommendations of what to do can differ significantly in their complexity between protected groups of individuals. Providing more difficult recommendations of actions to one group leads to a disadvantage of this group compared to other groups. In this work we propose a model-agnostic method for computing counterfactual explanations that do not differ significantly in their complexity between protected groups.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2211.12989 [pdf, other]

Unsupervised Unlearning of Concept Drift with Autoencoders

Authors: André Artelt, Kleanthis Malialis, Christos Panayiotou, Marios Polycarpou, Barbara Hammer

Abstract: Concept drift refers to a change in the data distribution affecting the data stream of future samples. Consequently, learning models operating on the data stream might become obsolete, and need costly and difficult adjustments such as retraining or adaptation. Existing methods usually implement a local concept drift adaptation scheme, where either incremental learning of the models is used, or the models are completely retrained when a drift detection mechanism triggers an alarm. This paper proposes an alternative approach in which an unsupervised and model-agnostic concept drift adaptation method at the global level is introduced, based on autoencoders. Specifically, the proposed method aims to "unlearn" the concept drift without having to retrain or adapt any of the learning models operating on the data. An extensive experimental evaluation is conducted in two application domains. We consider a realistic water distribution network with more than 30 models in-place, from which we create 200 simulated data sets / scenarios. We further consider an image-related task to demonstrate the effectiveness of our method.

Submitted 19 September, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

Comments: Accepted at IEEE SSCI 2023

arXiv:2211.11308 [pdf, other]

Novel transfer learning schemes based on Siamese networks and synthetic data

Authors: Dominik Stallmann, Philip Kenneweg, Barbara Hammer

Abstract: Transfer learning schemes based on deep networks which have been trained on huge image corpora offer state-of-the-art technologies in computer vision. Here, supervised and semi-supervised approaches constitute efficient technologies which work well with comparably small data sets. Yet, such applications are currently restricted to application domains where suitable deep network models are readily available. In this contribution, we address an important application area in the domain of biotechnology, the automatic analysis of CHO-K1 suspension growth in microfluidic single-cell cultivation, where data characteristics are very dissimilar to existing domains and trained deep networks cannot easily be adapted by classical transfer learning. We propose a novel transfer learning scheme which expands a recently introduced Twin-VAE architecture, which is trained on realistic and synthetic data, and we modify its specialized training procedure to the transfer learning domain. In the specific domain, often only few to no labels exist and annotations are costly. We investigate a novel transfer learning strategy, which incorporates a simultaneous retraining on natural and synthetic data using an invariant shared representation as well as suitable target variables, while it learns to handle unseen data from a different microscopy technology.
We show the superiority of the variation of our Twin-VAE architecture over the state-of-the-art transfer learning methodology in image processing as well as classical image processing technologies, which persists even with strongly shortened training times and leads to satisfactory results in this domain. The source code is available at https://github.com/dstallmann/transfer_learning_twinvae, works cross-platform, is open-source and free (MIT licensed) software. We make the data sets available at https://pub.uni-bielefeld.de/record/2960030.

Submitted 22 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

arXiv:2211.09587 [pdf, ps, other]

Spatial Graph Convolution Neural Networks for Water Distribution Systems

Authors: Inaam Ashraf, Luca Hermes, André Artelt, Barbara Hammer

Abstract: We investigate the task of missing value estimation in graphs as given by water distribution systems (WDS) based on sparse signals as a representative machine learning challenge in the domain of critical infrastructure. The underlying graphs have a comparably low node degree and high diameter, while information in the graph is globally relevant, hence graph neural networks face the challenge of long-term dependencies. We propose a specific architecture based on message passing which displays excellent results for a number of benchmark tasks in the WDS domain. Further, we investigate a multi-hop variation, which requires considerably fewer resources and opens an avenue towards big WDS graphs.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Under submission. Python code will be made available soon

arXiv:2209.01939 [pdf, other]

doi 10.1007/s10994-023-06385-y

Incremental Permutation Feature Importance (iPFI): Towards Online Explanations on Data Streams

Authors: Fabian Fumagalli, Maximilian Muschalik, Eyke Hüllermeier, Barbara Hammer

Abstract: Explainable Artificial Intelligence (XAI) has mainly focused on static learning scenarios so far. We are interested in dynamic scenarios where data is sampled progressively, and learning is done in an incremental rather than a batch mode. We seek efficient incremental algorithms for computing feature importance (FI) measures, specifically, an incremental FI measure based on feature marginalization of absent features similar to permutation feature importance (PFI). We propose an efficient, model-agnostic algorithm called iPFI to estimate this measure incrementally and under dynamic modeling conditions including concept drift. We prove theoretical guarantees on the approximation quality in terms of expectation and variance. To validate our theoretical findings and the efficacy of our approaches compared to traditional batch PFI, we conduct multiple experimental studies on benchmark data with and without concept drift.

Submitted 7 September, 2022; v1 submitted 5 September, 2022; originally announced September 2022.
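The general idea of an incremental, marginalization-based feature importance estimate can be illustrated with a short Python sketch: for each new sample, every feature is in turn replaced by its value from a randomly drawn earlier sample, and the resulting loss increase is folded into an exponential moving average. This is an illustration under stated assumptions, not the iPFI algorithm itself: the class name `IncrementalPFI`, the bounded sample buffer, and the smoothing factor `alpha` are hypothetical.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence

Model = Callable[[Sequence[float]], float]
Loss = Callable[[float, float], float]


class IncrementalPFI:
    """Hypothetical incremental, permutation-style feature importance estimator."""

    def __init__(self, n_features: int, loss: Loss, alpha: float = 0.05,
                 buffer_size: int = 200, seed: int = 0):
        self.importance = [0.0] * n_features
        self.loss = loss
        self.alpha = alpha  # smoothing factor: larger values forget faster, which helps under drift
        # recent samples that serve as donors for marginalizing a feature
        self.buffer: Deque[List[float]] = deque(maxlen=buffer_size)
        self.rng = random.Random(seed)

    def update(self, model: Model, x: Sequence[float], y: float) -> List[float]:
        x = list(x)
        if self.buffer:
            base = self.loss(model(x), y)
            for j in range(len(x)):
                donor = self.rng.choice(self.buffer)  # approximate draw from the marginal of feature j
                x_marg = list(x)
                x_marg[j] = donor[j]
                delta = self.loss(model(x_marg), y) - base  # loss increase when feature j is "absent"
                self.importance[j] = (1 - self.alpha) * self.importance[j] + self.alpha * delta
        self.buffer.append(x)
        return self.importance


def model(x: Sequence[float]) -> float:
    return 3.0 * x[0]  # toy model that depends on feature 0 only


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


data_rng = random.Random(1)
explainer = IncrementalPFI(n_features=2, loss=squared_error)
for _ in range(500):
    x = [data_rng.random(), data_rng.random()]
    explainer.update(model, x, 3.0 * x[0])
print(explainer.importance)  # feature 0 clearly positive, feature 1 exactly 0.0
```

Since the toy model ignores feature 1, replacing that feature never changes the loss and its estimate stays at zero, whereas marginalizing feature 0 consistently increases the loss.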

arXiv:2209.01358 [pdf]

Machine learning based approach for solving atomic structures of nanomaterials combining pair distribution functions with density functional theory

Authors: Magnus Kløve, Sanna Sommer, Bo B. Iversen, Bjørk Hammer, Wilke Dononelli

Abstract: Determination of crystal structures of nanocrystalline or amorphous compounds is a great challenge in solid-state chemistry and physics. Pair distribution function (PDF) analysis of X-ray or neutron total scattering data has proven to be a key element in tackling this challenge. However, in most cases a reliable structural motif is needed as starting configuration for structure refinements. Here, we present an algorithm that is able to determine the crystal structure of an unknown compound by means of an on-the-fly trained machine learning model that combines density functional theory (DFT) calculations with comparison of calculated and measured PDFs for global optimization in an artificial landscape. Due to the nature of this landscape, even metastable configurations can be determined.

Submitted 14 September, 2022; v1 submitted 3 September, 2022; originally announced September 2022.

arXiv:2208.09273 [pdf, other]

doi 10.1063/5.0121748

Atomistic structure search using local surrogate model

Authors: Nikolaj Rønne, Mads-Peter V. Christiansen, Andreas Møller Slavensky, Zeyuan Tang, Florian Brix, Mikkel Elkjær Pedersen, Malthe Kjær Bisbo, Bjørk Hammer

Abstract: We describe a local surrogate model for use in conjunction with global structure search methods. The model follows the Gaussian approximation potential (GAP) formalism and is based on the smooth overlap of atomic positions descriptor with sparsification in terms of a reduced number of local environments using mini-batch $k$-means. The model is implemented in the Atomistic Global Optimization X framework and used as a partial replacement of the local relaxations in basin hopping structure search. The approach is shown to be robust for a wide range of atomistic systems including molecules, nano-particles, surface supported clusters and surface thin films. The benefits of a local surrogate model in a structure search context are demonstrated. This includes the ability to transfer learning from smaller systems as well as the possibility to perform concurrent multi-stoichiometry searches.

Submitted 19 August, 2022; originally announced August 2022.

Comments: 12 pages, 11 figures

Journal ref: J. Chem. Phys. 157, 174115 (2022)

arXiv:2207.01898 [pdf, ps, other]

"Even if ..." -- Diverse Semifactual Explanations of Reject

Authors: André Artelt, Barbara Hammer

Abstract: Machine learning based decision making systems applied in safety critical areas require reliable high certainty predictions. For this purpose, the system can be extended by a reject option which allows the system to reject inputs where only a prediction with an unacceptably low certainty would be possible. While being able to reject uncertain samples is important, it is also important to be able to explain why a particular sample was rejected. With the ongoing rise of eXplainable AI (XAI), many explanation methodologies for machine learning based systems have been developed -- explaining reject options, however, is still a novel field where only very little prior work exists. In this work, we propose to explain rejects by semifactual explanations, an instance of example-based explanation methods, which themselves have not been widely considered in the XAI community yet. We propose a conceptual modeling of semifactual explanations for arbitrary reject options and empirically evaluate a specific implementation on a conformal prediction based reject option.

Submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.07391 [pdf, other]

"Why Here and Not There?" -- Diverse Contrasting Explanations of Dimensionality Reduction

Authors: André Artelt, Alexander Schulz, Barbara Hammer

Abstract: Dimensionality reduction is a popular preprocessing step and a widely used tool in data mining. Transparency, which is usually achieved by means of explanations, is nowadays a widely accepted and crucial requirement of machine learning based systems like classifiers and recommender systems. However, transparency of dimensionality reduction and other data mining tools has not been considered in much depth yet; still, it is crucial to understand their behavior -- in particular, practitioners might want to understand why a specific sample got mapped to a specific location. In order to (locally) understand the behavior of a given dimensionality reduction method, we introduce the abstract concept of contrasting explanations for dimensionality reduction, and apply a realization of this concept to the specific application of explaining two-dimensional data visualization.

Submitted 22 February, 2023; v1 submitted 15 June, 2022; originally announced June 2022.

Comments: Accepted and presented as a full conference paper at ICPRAM 2023

Showing 1–50 of 134 results for author: Hammer, B