Search | arXiv e-print repository

Towards Compositionality in Concept Learning

Authors: Adam Stein, Aaditya Naik, Yinjun Wu, Mayur Naik, Eric Wong

Abstract: Concept-based interpretability methods offer a lens into the internals of foundation models by decomposing their embeddings into high-level concepts. These concept representations are most useful when they are compositional, meaning that the individual concepts compose to explain the full sample. We show that existing unsupervised concept extraction methods find concepts which are not compositiona… ▽ More Concept-based interpretability methods offer a lens into the internals of foundation models by decomposing their embeddings into high-level concepts. These concept representations are most useful when they are compositional, meaning that the individual concepts compose to explain the full sample. We show that existing unsupervised concept extraction methods find concepts which are not compositional. To automatically discover compositional concept representations, we identify two salient properties of such representations, and propose Compositional Concept Extraction (CCE) for finding concepts which obey these properties. We evaluate CCE on five different datasets over image and text data. Our evaluation shows that CCE finds more compositional concept representations than baselines and yields better accuracy on four downstream classification tasks. Code and data are available at https://github.com/adaminsky/compositional_concepts . △ Less

Submitted 26 June, 2024; originally announced June 2024.

Comments: Accepted at ICML 2024. 26 pages, 10 figures

arXiv:2405.17399 [pdf, other]

Transformers Can Do Arithmetic with the Right Embeddings

Authors: Sean McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein

Abstract: The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix ena… ▽ More The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication. △ Less

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.04514 [pdf, other]

Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System

Authors: Shuwen Kan, Zefan Du, Miguel Palma, Samuel A Stein, Chenxu Liu, Wenqi Wei, Juntao Chen, Ang Li, Ying Mao

Abstract: Despite quantum computing's rapid development, current systems remain limited in practical applications due to their limited qubit count and quality. Various technologies, such as superconducting, trapped ions, and neutral atom quantum computing technologies are progressing towards a fault tolerant era, however they all face a diverse set of challenges in scalability and control. Recent efforts ha… ▽ More Despite quantum computing's rapid development, current systems remain limited in practical applications due to their limited qubit count and quality. Various technologies, such as superconducting, trapped ions, and neutral atom quantum computing technologies are progressing towards a fault tolerant era, however they all face a diverse set of challenges in scalability and control. Recent efforts have focused on multi-node quantum systems that connect multiple smaller quantum devices to execute larger circuits. Future demonstrations hope to use quantum channels to couple systems, however current demonstrations can leverage classical communication with circuit cutting techniques. This involves cutting large circuits into smaller subcircuits and reconstructing them post-execution. However, existing cutting methods are hindered by lengthy search times as the number of qubits and gates increases. Additionally, they often fail to effectively utilize the resources of various worker configurations in a multi-node system. To address these challenges, we introduce FitCut, a novel approach that transforms quantum circuits into weighted graphs and utilizes a community-based, bottom-up approach to cut circuits according to resource constraints, e.g., qubit counts, on each worker. FitCut also includes a scheduling algorithm that optimizes resource utilization across workers. Implemented with Qiskit and evaluated extensively, FitCut significantly outperforms the Qiskit Circuit Knitting Toolbox, reducing time costs by factors ranging from 3 to 2000 and improving resource utilization rates by up to 3.88 times on the worker side, achieving a system-wide improvement of 2.86 times. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2402.14020 [pdf, other]

Coercing LLMs to do and reveal (almost) anything

Authors: Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein

Abstract: It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. In this work, we argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking. We provide a broad overview of possible attack surfaces and attack goals. Based on a series of concrete examples, we discuss, categorize and syst… ▽ More It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. In this work, we argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking. We provide a broad overview of possible attack surfaces and attack goals. Based on a series of concrete examples, we discuss, categorize and systematize attacks that coerce varied unintended behaviors, such as misdirection, model control, denial-of-service, or data extraction. We analyze these attacks in controlled experiments, and find that many of them stem from the practice of pre-training LLMs with coding capabilities, as well as the continued existence of strange "glitch" tokens in common LLM vocabularies that should be removed for security reasons. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 32 pages. Implementation available at https://github.com/JonasGeiping/carving

arXiv:2401.15113 [pdf, other]

Towards Global Glacier Mapping with Deep Learning and Open Earth Observation Data

Authors: Konstantin A. Maslov, Claudio Persello, Thomas Schellenberger, Alfred Stein

Abstract: Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite i… ▽ More Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite imagery. Assessing the spatial, temporal and cross-sensor generalisation shows that our best strategy achieves intersection over union >0.85 on previously unobserved images in most cases, which drops to >0.75 for debris-rich areas such as High-Mountain Asia and increases to >0.90 for regions dominated by clean ice. A comparative validation against human expert uncertainties in terms of area and distance deviations underscores GlaViTU performance, approaching or matching expert-level delineation. Adding synthetic aperture radar data, namely, backscatter and interferometric coherence, increases the accuracy in all regions where available. The calibrated confidence for glacier extents is reported making the predictions more reliable and interpretable. We also release a benchmark dataset that covers 9% of glaciers worldwide. Our results support efforts towards automated multitemporal and global glacier mapping. △ Less

Submitted 29 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: after major revision, discussion extended, added comparison with human experts, added comparison with band ratio

arXiv:2310.03132 [pdf, ps, other]

Application-Oriented Co-Design of Motors and Motions for a 6DOF Robot Manipulator

Authors: Adrian Stein, Yebin Wang, Yusuke Sakamoto, Bingnan Wang, Huazhen Fang

Abstract: This work investigates an application-driven co-design problem where the motion and motors of a six degrees of freedom robotic manipulator are optimized simultaneously, and the application is characterized by a set of tasks. Unlike the state-of-the-art which selects motors from a product catalogue and performs co-design for a single task, this work designs the motor geometry as well as motion for… ▽ More This work investigates an application-driven co-design problem where the motion and motors of a six degrees of freedom robotic manipulator are optimized simultaneously, and the application is characterized by a set of tasks. Unlike the state-of-the-art which selects motors from a product catalogue and performs co-design for a single task, this work designs the motor geometry as well as motion for a specific application. Contributions are made towards solving the proposed co-design problem in a computationally-efficient manner. First, a two-step process is proposed, where multiple motor designs are identified by optimizing motions and motors for multiple tasks one by one, and then are reconciled to determine the final motor design. Second, magnetic equivalent circuit modeling is exploited to establish the analytic mapping from motor design parameters to dynamic models and objective functions to facilitate the subsequent differentiable simulation. Third, a direct-collocation-based differentiable simulator of motor and robotic arm dynamics is developed to balance the computational complexity and numerical stability. Simulation verifies that higher performance for a specific application can be achieved with the multi-task method, compared to several benchmark co-design methods. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2308.06686 [pdf, other]

TorchQL: A Programming Framework for Integrity Constraints in Machine Learning

Authors: Aaditya Naik, Adam Stein, Yinjun Wu, Mayur Naik, Eric Wong

Abstract: Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad-hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries to specify and check… ▽ More Finding errors in machine learning applications requires a thorough exploration of their behavior over data. Existing approaches used by practitioners are often ad-hoc and lack the abstractions needed to scale this process. We present TorchQL, a programming framework to evaluate and improve the correctness of machine learning applications. TorchQL allows users to write queries to specify and check integrity constraints over machine learning models and datasets. It seamlessly integrates relational algebra with functional programming to allow for highly expressive queries using only eight intuitive operators. We evaluate TorchQL on diverse use-cases including finding critical temporal inconsistencies in objects detected across video frames in autonomous driving, finding data imputation errors in time-series medical records, finding data labeling errors in real-world images, and evaluating biases and constraining outputs of language models. Our experiments show that TorchQL enables up to 13x faster query executions than baselines like Pandas and MongoDB, and up to 40% shorter queries than native Python. We also conduct a user study and find that TorchQL is natural enough for developers familiar with Python to specify complex integrity constraints. △ Less

Submitted 14 February, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

arXiv:2307.09835 [pdf, ps, other]

Deep Operator Network Approximation Rates for Lipschitz Operators

Authors: Christoph Schwab, Andreas Stein, Jakob Zech

Abstract: We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of… ▽ More We establish universality and expression rate bounds for a class of neural Deep Operator Networks (DON) emulating Lipschitz (or Hölder) continuous maps $\mathcal G:\mathcal X\to\mathcal Y$ between (subsets of) separable Hilbert spaces $\mathcal X$, $\mathcal Y$. The DON architecture considered uses linear encoders $\mathcal E$ and decoders $\mathcal D$ via (biorthogonal) Riesz bases of $\mathcal X$, $\mathcal Y$, and an approximator network of an infinite-dimensional, parametric coordinate map that is Lipschitz continuous on the sequence space $\ell^2(\mathbb N)$. Unlike previous works ([Herrmann, Schwab and Zech: Neural and Spectral operator surrogates: construction and expression rate bounds, SAM Report, 2022], [Marcati and Schwab: Exponential Convergence of Deep Operator Networks for Elliptic Partial Differential Equations, SAM Report, 2022]), which required for example $\mathcal G$ to be holomorphic, the present expression rate results require mere Lipschitz (or Hölder) continuity of $\mathcal G$. Key in the proof of the present expression rate bounds is the use of either super-expressive activations (e.g. [Yarotski: Elementary superexpressive activations, Int. Conf. on ML, 2021], [Shen, Yang and Zhang: Neural network approximation: Three hidden layers are enough, Neural Networks, 2021], and the references there) which are inspired by the Kolmogorov superposition theorem, or of nonstandard NN architectures with standard (ReLU) activations as recently proposed in [Zhang, Shen and Yang: Neural Network Architecture Beyond Width and Depth, Adv. in Neural Inf. Proc. Sys., 2022]. We illustrate the abstract results by approximation rate bounds for emulation of a) solution operators for parametric elliptic variational inequalities, and b) Lipschitz maps of Hilbert-Schmidt operators. △ Less

Submitted 19 July, 2023; originally announced July 2023.

Comments: 31 pages

MSC Class: 41A65; 68T15; 68Q32

arXiv:2306.00976 [pdf, other]

TopEx: Topic-based Explanations for Model Comparison

Authors: Shreya Havaldar, Adam Stein, Eric Wong, Lyle Ungar

Abstract: Meaningfully comparing language models is challenging with current explanation methods. Current explanations are overwhelming for humans due to large vocabularies or incomparable across models. We present TopEx, an explanation method that enables a level playing field for comparing language models via model-agnostic topics. We demonstrate how TopEx can identify similarities and differences between… ▽ More Meaningfully comparing language models is challenging with current explanation methods. Current explanations are overwhelming for humans due to large vocabularies or incomparable across models. We present TopEx, an explanation method that enables a level playing field for comparing language models via model-agnostic topics. We demonstrate how TopEx can identify similarities and differences between DistilRoBERTa and GPT-2 on a variety of NLP tasks. △ Less

Submitted 1 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to ICLR 2023, Tiny Papers Track

arXiv:2305.19787 [pdf, other]

DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation

Authors: Xianwei Lv, Claudio Persello, Wangbin Li, Xiao Huang, Dongping Ming, Alfred Stein

Abstract: Image segmentation aims to partition an image according to the objects in the scene and is a fundamental step in analysing very high spatial-resolution (VHR) remote sensing imagery. Current methods struggle to effectively consider land objects with diverse shapes and sizes. Additionally, the determination of segmentation scale parameters frequently adheres to a static and empirical doctrine, posin… ▽ More Image segmentation aims to partition an image according to the objects in the scene and is a fundamental step in analysing very high spatial-resolution (VHR) remote sensing imagery. Current methods struggle to effectively consider land objects with diverse shapes and sizes. Additionally, the determination of segmentation scale parameters frequently adheres to a static and empirical doctrine, posing limitations on the segmentation of large-scale remote sensing images and yielding algorithms with limited interpretability. To address the above challenges, we propose a deep-learning-based region merging method dubbed DeepMerge to handle the segmentation of complete objects in large VHR images by integrating deep learning and region adjacency graph (RAG). This is the first method to use deep learning to learn the similarity and merge similar adjacent super-pixels in RAG. We propose a modified binary tree sampling method to generate shift-scale data, serving as inputs for transformer-based deep learning networks, a shift-scale attention with 3-Dimension relative position embedding to learn features across scales, and an embedding to fuse learned features with hand-crafted features. DeepMerge can achieve high segmentation accuracy in a supervised manner from large-scale remotely sensed images and provides an interpretable optimal scale parameter, which is validated using a remote sensing image of 0.55 m resolution covering an area of 5,660 km^2. The experimental results show that DeepMerge achieves the highest F value (0.9550) and the lowest total error TE (0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods. △ Less

Submitted 5 January, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.16308 [pdf, other]

Rectifying Group Irregularities in Explanations for Distribution Shift

Authors: Adam Stein, Yinjun Wu, Eric Wong, Mayur Naik

Abstract: It is well-known that real-world changes constituting distribution shift adversely affect model performance. How to characterize those changes in an interpretable manner is poorly understood. Existing techniques to address this problem take the form of shift explanations that elucidate how to map samples from the original distribution toward the shifted one by reducing the disparity between these… ▽ More It is well-known that real-world changes constituting distribution shift adversely affect model performance. How to characterize those changes in an interpretable manner is poorly understood. Existing techniques to address this problem take the form of shift explanations that elucidate how to map samples from the original distribution toward the shifted one by reducing the disparity between these two distributions. However, these methods can introduce group irregularities, leading to explanations that are less feasible and robust. To address these issues, we propose Group-aware Shift Explanations (GSE), a method that produces interpretable explanations by leveraging worst-group optimization to rectify group irregularities. We demonstrate how GSE not only maintains group structures, such as demographic and hierarchical subpopulations, but also enhances feasibility and robustness in the resulting explanations in a wide range of tabular, language, and image settings. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: 19 pages, 5 figures

arXiv:2303.14511 [pdf, other]

Improving robustness of jet tagging algorithms with adversarial training: exploring the loss surface

Authors: Annika Stein

Abstract: In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to inv… ▽ More In the field of high-energy physics, deep learning algorithms continue to gain in relevance and provide performance improvements over traditional methods, for example when identifying rare signals or finding complex patterns. From an analyst's perspective, obtaining highest possible performance is desirable, but recently, some attention has been shifted towards studying robustness of models to investigate how well these perform under slight distortions of input features. Especially for tasks that involve many (low-level) inputs, the application of deep neural networks brings new challenges. In the context of jet flavor tagging, adversarial attacks are used to probe a typical classifier's vulnerability and can be understood as a model for systematic uncertainties. A corresponding defense strategy, adversarial training, improves robustness, while maintaining high performance. Investigating the loss surface corresponding to the inputs and models in question reveals geometric interpretations of robustness, taking correlations into account. △ Less

Submitted 25 March, 2023; originally announced March 2023.

Comments: 5 pages, 2 figures; submitted to ACAT 2022 proceedings

arXiv:2303.00116 [pdf, other]

Neural Auctions Compromise Bidder Information

Authors: Alex Stein, Avi Schwarzschild, Michael Curry, Tom Goldstein, John Dickerson

Abstract: Single-shot auctions are commonly used as a means to sell goods, for example when selling ad space or allocating radio frequencies, however devising mechanisms for auctions with multiple bidders and multiple items can be complicated. It has been shown that neural networks can be used to approximate optimal mechanisms while satisfying the constraints that an auction be strategyproof and individuall… ▽ More Single-shot auctions are commonly used as a means to sell goods, for example when selling ad space or allocating radio frequencies, however devising mechanisms for auctions with multiple bidders and multiple items can be complicated. It has been shown that neural networks can be used to approximate optimal mechanisms while satisfying the constraints that an auction be strategyproof and individually rational. We show that despite such auctions maximizing revenue, they do so at the cost of revealing private bidder information. While randomness is often used to build in privacy, in this context it comes with complications if done without care. Specifically, it can violate rationality and feasibility constraints, fundamentally change the incentive structure of the mechanism, and/or harm top-level metrics such as revenue and social welfare. We propose a method that employs stochasticity to improve privacy while meeting the requirements for auction mechanisms with only a modest sacrifice in revenue. We analyze the cost to the auction house that comes with introducing varying degrees of privacy in common auction settings. Our results show that despite current neural auctions' ability to approximate optimal mechanisms, the resulting vulnerability that comes with relying on neural networks must be accounted for. △ Less

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2302.04418 [pdf, other]

Learning to Select Pivotal Samples for Meta Re-weighting

Authors: Yinjun Wu, Adam Stein, Jacob Gardner, Mayur Naik

Abstract: Sample re-weighting strategies provide a promising mechanism to deal with imperfect training data in machine learning, such as noisily labeled or class-imbalanced data. One such strategy involves formulating a bi-level optimization problem called the meta re-weighting problem, whose goal is to optimize performance on a small set of perfect pivotal samples, called meta samples. Many approaches have… ▽ More Sample re-weighting strategies provide a promising mechanism to deal with imperfect training data in machine learning, such as noisily labeled or class-imbalanced data. One such strategy involves formulating a bi-level optimization problem called the meta re-weighting problem, whose goal is to optimize performance on a small set of perfect pivotal samples, called meta samples. Many approaches have been proposed to efficiently solve this problem. However, all of them assume that a perfect meta sample set is already provided while we observe that the selections of meta sample set is performance critical. In this paper, we study how to learn to identify such a meta sample set from a large, imperfect training set, that is subsequently cleaned and used to optimize performance in the meta re-weighting setting. We propose a learning framework which reduces the meta samples selection problem to a weighted K-means clustering problem through rigorously theoretical analysis. We propose two clustering methods within our learning framework, Representation-based clustering method (RBC) and Gradient-based clustering method (GBC), for balancing performance and computational efficiency. Empirical studies demonstrate the performance advantage of our methods over various baseline methods. △ Less

Submitted 8 February, 2023; originally announced February 2023.

Comments: Published in AAAI 2023 (oral)

arXiv:2301.13379 [pdf, other]

Faithful Chain-of-Thought Reasoning

Authors: Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apidianaki, Chris Callison-Burch

Abstract: While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (aka. faithfulness). We propose Faithful CoT, a reasoning framework involving two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving… ▽ More While Chain-of-Thought (CoT) prompting boosts Language Models' (LM) performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (aka. faithfulness). We propose Faithful CoT, a reasoning framework involving two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM and a deterministic solver respectively. This guarantees that the reasoning chain provides a faithful explanation of the final answer. Aside from interpretability, Faithful CoT also improves empirical performance: it outperforms standard CoT on 9 of 10 benchmarks from 4 diverse domains, with a relative accuracy gain of 6.3% on Math Word Problems (MWP), 3.4% on Planning, 5.5% on Multi-hop Question Answering (QA), and 21.4% on Relational Inference. Furthermore, with GPT-4 and Codex, it sets the new state-of-the-art few-shot performance on 7 datasets (with 95.0+ accuracy on 6 of them), showing a strong synergy between faithfulness and accuracy. △ Less

Submitted 20 September, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

Comments: IJCNLP-AACL 2023 camera-ready version

arXiv:2210.05443 [pdf, other]

QuCNN : A Quantum Convolutional Neural Network with Entanglement Based Backpropagation

Authors: Samuel A. Stein, Ying Mao, James Ang, Ang Li

Abstract: Quantum Machine Learning continues to be a highly active area of interest within Quantum Computing. Many of these approaches have adapted classical approaches to the quantum settings, such as QuantumFlow, etc. We push forward this trend and demonstrate an adaption of the Classical Convolutional Neural Networks to quantum systems - namely QuCNN. QuCNN is a parameterised multi-quantum-state based ne… ▽ More Quantum Machine Learning continues to be a highly active area of interest within Quantum Computing. Many of these approaches have adapted classical approaches to the quantum settings, such as QuantumFlow, etc. We push forward this trend and demonstrate an adaption of the Classical Convolutional Neural Networks to quantum systems - namely QuCNN. QuCNN is a parameterised multi-quantum-state based neural network layer computing similarities between each quantum filter state and each quantum data state. With QuCNN, back propagation can be achieved through a single-ancilla qubit quantum routine. QuCNN is validated by applying a convolutional layer with a data state and a filter state over a small subset of MNIST images, comparing the back propagated gradients, and training a filter state against an ideal target state. △ Less

Submitted 11 October, 2022; originally announced October 2022.

arXiv:2203.13890 [pdf, other]

doi 10.1007/s41781-022-00087-1

Improving Robustness of Jet Tagging Algorithms with Adversarial Training

Authors: Annika Stein, Xavier Coubez, Spandan Mondal, Andrzej Novak, Alexander Schmidt

Abstract: Deep learning is a standard tool in the field of high-energy physics, facilitating considerable sensitivity enhancements for numerous analysis strategies. In particular, in identification of physics objects, such as jet flavor tagging, complex neural network architectures play a major role. However, these methods are reliant on accurate simulations. Mismodeling can lead to non-negligible differenc… ▽ More Deep learning is a standard tool in the field of high-energy physics, facilitating considerable sensitivity enhancements for numerous analysis strategies. In particular, in identification of physics objects, such as jet flavor tagging, complex neural network architectures play a major role. However, these methods are reliant on accurate simulations. Mismodeling can lead to non-negligible differences in performance in data that need to be measured and calibrated against. We investigate the classifier response to input data with injected mismodelings and probe the vulnerability of flavor tagging algorithms via application of adversarial attacks. Subsequently, we present an adversarial training strategy that mitigates the impact of such simulated attacks and improves the classifier robustness. We examine the relationship between performance and vulnerability and show that this method constitutes a promising approach to reduce the vulnerability to poor modeling. △ Less

Submitted 16 September, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

Comments: 17 pages, 16 figures, 2 tables. Replaced with the published version. Added the journal reference and the DOI. Code accessible under https://github.com/AnnikaStein/Adversarial-Training-for-Jet-Tagging

Journal ref: Comput Softw Big Sci 6 (2022) 15

arXiv:2111.09087 [pdf, other]

doi 10.1007/s10489-021-03035-5

A Case Study of Vehicle Route Optimization

Authors: Veronika Lesch, Maximilian König, Samuel Kounev, Anthony Stein, Christian Krupitzer

Abstract: In the last decades, the classical Vehicle Routing Problem (VRP), i.e., assigning a set of orders to vehicles and planning their routes has been intensively researched. As only the assignment of order to vehicles and their routes is already an NP-complete problem, the application of these algorithms in practice often fails to take into account the constraints and restrictions that apply in real-wo… ▽ More In the last decades, the classical Vehicle Routing Problem (VRP), i.e., assigning a set of orders to vehicles and planning their routes has been intensively researched. As only the assignment of order to vehicles and their routes is already an NP-complete problem, the application of these algorithms in practice often fails to take into account the constraints and restrictions that apply in real-world applications, the so called rich VRP (rVRP) and are limited to single aspects. In this work, we incorporate the main relevant real-world constraints and requirements. We propose a two-stage strategy and a Timeline algorithm for time windows and pause times, and apply a Genetic Algorithm (GA) and Ant Colony Optimization (ACO) individually to the problem to find optimal solutions. Our evaluation of eight different problem instances against four state-of-the-art algorithms shows that our approach handles all given constraints in a reasonable time. △ Less

Submitted 17 November, 2021; originally announced November 2021.

arXiv:2102.07280 [pdf, other]

doi 10.1109/IGARSS47720.2021.9554573

3D Fully Convolutional Neural Networks with Intersection Over Union Loss for Crop Mapping from Multi-Temporal Satellite Images

Authors: Sina Mohammadi, Mariana Belgiu, Alfred Stein

Abstract: Information on cultivated crops is relevant for a large number of food security studies. Different scientific efforts are dedicated to generating this information from remote sensing images by means of machine learning methods. Unfortunately, these methods do not take account of the spatial-temporal relationships inherent in remote sensing images. In our paper, we explore the capability of a 3D Fu… ▽ More Information on cultivated crops is relevant for a large number of food security studies. Different scientific efforts are dedicated to generating this information from remote sensing images by means of machine learning methods. Unfortunately, these methods do not take account of the spatial-temporal relationships inherent in remote sensing images. In our paper, we explore the capability of a 3D Fully Convolutional Neural Network (FCN) to map crop types from multi-temporal images. In addition, we propose the Intersection Over Union (IOU) loss function for increasing the overlap between the predicted classes and ground reference data. The proposed method was applied to identify soybean and corn from a study area situated in the US corn belt using multi-temporal Landsat images. The study shows that our method outperforms related methods, obtaining a Kappa coefficient of 91.8%. We conclude that using the IOU loss function provides a superior choice to learn individual crop types. △ Less

Submitted 19 October, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Comments: Accepted by IGARSS 2021

Journal ref: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021, pp. 5834-5837

arXiv:2012.00824 [pdf, ps, other]

Quantum-Inspired Classical Algorithm for Slow Feature Analysis

Authors: Daniel Chen, Yekun Xu, Betis Baheri, Samuel A. Stein, Chuan Bi, Ying Mao, Qiang Quan, Shuai Xu

Abstract: Recently, there has been a surge of interest for quantum computation for its ability to exponentially speed up algorithms, including machine learning algorithms. However, Tang suggested that the exponential speed up can also be done on a classical computer. In this paper, we proposed an algorithm for slow feature analysis, a machine learning algorithm that extracts the slow-varying features, with… ▽ More Recently, there has been a surge of interest for quantum computation for its ability to exponentially speed up algorithms, including machine learning algorithms. However, Tang suggested that the exponential speed up can also be done on a classical computer. In this paper, we proposed an algorithm for slow feature analysis, a machine learning algorithm that extracts the slow-varying features, with a run time O(polylog(n)poly(d)). To achieve this, we assumed necessary preprocessing of the input data as well as the existence of a data structure supporting a particular sampling scheme. The analysis of algorithm borrowed results from matrix perturbation theory, which was crucial for the algorithm's correctness. This work demonstrates the possible application and extent for which quantum-inspired computation can be used. △ Less

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2002.05628 [pdf, other]

XCS Classifier System with Experience Replay

Authors: Anthony Stein, Roland Maier, Lukas Rosenbauer, Jörg Hähner

Abstract: XCS constitutes the most deeply investigated classifier system today. It bears strong potentials and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter d… ▽ More XCS constitutes the most deeply investigated classifier system today. It bears strong potentials and comes with inherent capabilities for mastering a variety of different learning tasks. Besides outstanding successes in various classification and regression tasks, XCS also proved very effective in certain multi-step environments from the domain of reinforcement learning. Especially in the latter domain, recent advances have been mainly driven by algorithms which model their policies based on deep neural networks -- among which the Deep-Q-Network (DQN) is a prominent representative. Experience Replay (ER) constitutes one of the crucial factors for the DQN's successes, since it facilitates stabilized training of the neural network-based Q-function approximators. Surprisingly, XCS barely takes advantage of similar mechanisms that leverage stored raw experiences encountered so far. To bridge this gap, this paper investigates the benefits of extending XCS with ER. On the one hand, we demonstrate that for single-step tasks ER bears massive potential for improvements in terms of sample efficiency. On the shady side, however, we reveal that the use of ER might further aggravate well-studied issues not yet solved for XCS when applied to sequential decision problems demanding for long-action-chains. △ Less

Submitted 13 February, 2020; originally announced February 2020.

arXiv:2002.01370 [pdf, other]

Bootstrapping a DQN Replay Memory with Synthetic Experiences

Authors: Wenzel Baron Pilar von Pilchau, Anthony Stein, Jörg Hähner

Abstract: An important component of many Deep Reinforcement Learning algorithms is the Experience Replay which serves as a storage mechanism or memory of made experiences. These experiences are used for training and help the agent to stably find the perfect trajectory through the problem space. The classic Experience Replay however makes only use of the experiences it actually made, but the stored samples b… ▽ More An important component of many Deep Reinforcement Learning algorithms is the Experience Replay which serves as a storage mechanism or memory of made experiences. These experiences are used for training and help the agent to stably find the perfect trajectory through the problem space. The classic Experience Replay however makes only use of the experiences it actually made, but the stored samples bear great potential in form of knowledge about the problem that can be extracted. We present an algorithm that creates synthetic experiences in a nondeterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment and we show that it can support the agent to learn faster and even better than the classic version. △ Less

Submitted 4 February, 2020; originally announced February 2020.

arXiv:1902.06061 [pdf, other]

Towards Automated Melanoma Detection with Deep Learning: Data Purification and Augmentation

Authors: Devansh Bisla, Anna Choromanska, Jennifer A. Stein, David Polsky, Russell Berman

Abstract: Melanoma is one of the ten most common cancers in the US. Early detection is crucial for survival, but often the cancer is diagnosed in the fatal stage. Deep learning has the potential to improve cancer detection rates, but its applicability to melanoma detection is compromised by the limitations of the available skin lesion databases, which are small, heavily imbalanced, and contain images with o… ▽ More Melanoma is one of the ten most common cancers in the US. Early detection is crucial for survival, but often the cancer is diagnosed in the fatal stage. Deep learning has the potential to improve cancer detection rates, but its applicability to melanoma detection is compromised by the limitations of the available skin lesion databases, which are small, heavily imbalanced, and contain images with occlusions. We build deep-learning-based tools for data purification and augmentation to counter-act these limitations. The developed tools can be utilized in a deep learning system for lesion classification and we show how to build such a system. The system heavily relies on the processing unit for removing image occlusions and the data generation unit, based on generative adversarial networks, for populating scarce lesion classes, or equivalently creating virtual patients with pre-defined types of lesions. We empirically verify our approach and show that incorporating these two units into melanoma detection system results in the superior performance over common baselines. △ Less

Submitted 14 May, 2019; v1 submitted 16 February, 2019; originally announced February 2019.

Comments: Accepted to CVPR ISIC Workshop - 2019

arXiv:1809.01771 [pdf, ps, other]

doi 10.1016/j.ins.2018.09.001

An Analysis of Hierarchical Text Classification Using Word Embeddings

Authors: Roger A. Stein, Patricia A. Jaques, Joao F. Valiati

Abstract: Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not been assessed for the hierarchical text classification (HTC) yet. This study investigates the application of those models and alg… ▽ More Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not been assessed for the hierarchical text classification (HTC) yet. This study investigates the application of those models and algorithms on this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations---fastText, XGBoost, SVM, and Keras' CNN---and noticeable word embeddings generation methods---GloVe, word2vec, and fastText---with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and its flavors is a very promising approach for HTC. △ Less

Submitted 5 September, 2018; originally announced September 2018.

Comments: Article accepted for publication in Information Sciences on Sep 1st, 2018

arXiv:1806.05793 [pdf, other]

doi 10.1109/TGRS.2018.2837357

Recurrent Multiresolution Convolutional Networks for VHR Image Classification

Authors: John Ray Bergado, Claudio Persello, Alfred Stein

Abstract: Classification of very high resolution (VHR) satellite images has three major challenges: 1) inherent low intra-class and high inter-class spectral similarities, 2) mismatching resolution of available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classifi… ▽ More Classification of very high resolution (VHR) satellite images has three major challenges: 1) inherent low intra-class and high inter-class spectral similarities, 2) mismatching resolution of available bands, and 3) the need to regularize noisy classification maps. Conventional methods have addressed these challenges by adopting separate stages of image fusion, feature extraction, and post-classification map regularization. These processing stages, however, are not jointly optimizing the classification task at hand. In this study, we propose a single-stage framework embedding the processing stages in a recurrent multiresolution convolutional network trained in an end-to-end manner. The feedforward version of the network, called FuseNet, aims to match the resolution of the panchromatic and multispectral bands in a VHR image using convolutional layers with corresponding downsampling and upsampling operations. Contextual label information is incorporated into FuseNet by means of a recurrent version called ReuseNet. We compared FuseNet and ReuseNet against the use of separate processing steps for both image fusion, e.g. pansharpening and resampling through interpolation, and map regularization such as conditional random fields. We carried out our experiments on a land cover classification task using a Worldview-03 image of Quezon City, Philippines and the ISPRS 2D semantic labeling benchmark dataset of Vaihingen, Germany. FuseNet and ReuseNet surpass the baseline approaches in both quantitative and qualitative results. △ Less

Submitted 14 June, 2018; originally announced June 2018.

arXiv:1511.03010 [pdf]

doi 10.3390/ijgi5050055.

Geospatial Big Data Handling Theory and Methods: A Review and Research Challenges

Authors: S. Li, S. Dragicevic, F. Anton, M. Sester, S. Winter, A. Coltekin, C. Pettit, B. Jiang, J. Haworth, A. Stein, T. Cheng

Abstract: Big data has now become a strong focus of global interest that is increasingly attracting the attention of academia, industry, government and other organizations. Big data can be situated in the disciplinary area of traditional geospatial data handling theory and methods. The increasing volume and varying format of collected geospatial big data presents challenges in storing, managing, processing,… ▽ More Big data has now become a strong focus of global interest that is increasingly attracting the attention of academia, industry, government and other organizations. Big data can be situated in the disciplinary area of traditional geospatial data handling theory and methods. The increasing volume and varying format of collected geospatial big data presents challenges in storing, managing, processing, analyzing, visualizing and verifying the quality of data. This has implications for the quality of decisions made with big data. Consequently, this position paper of the International Society for Photogrammetry and Remote Sensing (ISPRS) Technical Commission II (TC II) revisits the existing geospatial data handling methods and theories to determine if they are still capable of handling emerging geospatial big data. Further, the paper synthesises problems, major issues and challenges with current developments as well as recommending what needs to be developed further in the near future. Keywords: Big data, Geospatial, Data handling, Analytics, Spatial Modeling, Review △ Less

Submitted 10 November, 2015; originally announced November 2015.

Comments: 25 pages, 3 figures

Journal ref: ISPRS International Journal of Geo-Information, 5(5), 55, 2016

Showing 1–26 of 26 results for author: Stein, A