
Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness

Authors: Lars Hillebrand, Prabhupad Pradhan, Christian Bauckhage, Rafet Sifa

Abstract: We introduce "pointer-guided segment ordering" (SO), a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations in large language models. Our methodology leverages a self-attention-driven pointer network to restore the original sequence of shuffled text segments, addressing the challenge of capturing the structural coherence and contextual dependencies within documents. This pre-training approach is complemented by a fine-tuning methodology that incorporates dynamic sampling, augmenting the diversity of training instances and improving sample efficiency for various downstream applications. We evaluate our method on a diverse set of datasets, demonstrating its efficacy in tasks requiring sequential text classification across scientific literature and financial reporting domains. Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures, leading to state-of-the-art performance in downstream classification tasks.
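
To make the segment-ordering objective concrete, the following minimal Python sketch builds one training instance: it shuffles a document's segments and records, for each original position, where that segment landed in the shuffled input. The function names `make_so_example` and `restore` and the label convention are assumptions for illustration and are not taken from the paper; the pointer network that would predict these targets is not shown.

```python
import random

def make_so_example(segments, seed=None):
    """Shuffle segments and return (shuffled, targets).

    targets[i] is the index in `shuffled` of the segment that was
    originally at position i (hypothetical label convention).
    """
    rng = random.Random(seed)
    order = list(range(len(segments)))
    rng.shuffle(order)  # order[j] = original index of the segment placed at slot j
    shuffled = [segments[i] for i in order]
    targets = [0] * len(segments)
    for j, i in enumerate(order):
        targets[i] = j
    return shuffled, targets

def restore(shuffled, targets):
    """Read the shuffled segments in the order given by the targets."""
    return [shuffled[j] for j in targets]
```

Under this convention, a model that predicts `targets` correctly recovers the original document order via `restore`.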

Submitted 6 June, 2024; originally announced June 2024.

Comments: 17 pages, 3 figures, 5 tables, accepted at ECML-PKDD 2024

arXiv:2311.15679 [pdf, other]

Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Authors: Maurice Günder, Sneha Banerjee, Rafet Sifa, Christian Bauckhage

Abstract: Model-agnostic explanation methods for deep learning models are flexible regarding usability and availability. However, due to the fact that they can only manipulate input to see changes in output, they suffer from weak performance when used with complex model architectures. For models with large inputs as, for instance, in object detection, sampling-based methods like KernelSHAP are inefficient due to many computation-heavy forward passes through the model. In this work, we present a framework for using sampling-based explanation models in a computer vision context by body part relevance assessment for pedestrian detection. Furthermore, we introduce a novel sampling-based method similar to KernelSHAP that shows more robustness for lower sampling sizes and, thus, is more efficient for explainability analyses on large-scale datasets.

Submitted 1 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.03076 [pdf, other]

SugarViT -- Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet

Authors: Maurice Günder, Facundo Ramón Ispizua Yamati, Abel Andree Barreto Alcántara, Anne-Katrin Mahlein, Rafet Sifa, Christian Bauckhage

Abstract: Remote sensing and artificial intelligence are pivotal technologies of precision agriculture nowadays. The efficient retrieval of large-scale field imagery combined with machine learning techniques shows success in various tasks like phenotyping, weeding, cropping, and disease control. This work will introduce a machine learning framework for automatized large-scale plant-specific trait annotation for the use case disease severity scoring for Cercospora Leaf Spot (CLS) in sugar beet. With concepts of Deep Label Distribution Learning (DLDL), special loss functions, and a tailored model architecture, we develop an efficient Vision Transformer based model for disease severity scoring called SugarViT. One novelty in this work is the combination of remote sensing data with environmental parameters of the experimental sites for disease severity prediction. Although the model is evaluated on this special use case, it is held as generic as possible to also be applicable to various image-based classification and regression tasks. With our framework, it is even possible to learn models on multi-objective problems as we show by a pretraining on environmental metadata.

Submitted 1 February, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: submitted to Computers and Electronics in Agriculture

arXiv:2310.13526 [pdf, ps, other]

Controlled Randomness Improves the Performance of Transformer Models

Authors: Tobias Deußer, Cong Zhao, Wolfgang Krämer, David Leonhard, Christian Bauckhage, Rafet Sifa

Abstract: During the pre-training step of natural language models, the main objective is to learn a general representation of the pre-training dataset, usually requiring large amounts of textual data to capture the complexity and diversity of natural language. Contrasting this, in most cases, the size of the data available to solve the specific downstream task is often dwarfed by the aforementioned pre-training dataset, especially in domains where data is scarce. We introduce controlled randomness, i.e. noise, into the training process to improve fine-tuning language models and explore the performance of targeted noise in addition to the parameters of these models. We find that adding such noise can improve the performance in our two downstream tasks of joint named entity recognition and relation extraction and text summarization.

Submitted 20 October, 2023; originally announced October 2023.

Comments: Accepted at ICMLA 2023, 10 pages, 2 tables

arXiv:2308.07791 [pdf, other]

Informed Named Entity Recognition Decoding for Generative Language Models

Authors: Tobias Deußer, Lars Hillebrand, Christian Bauckhage, Rafet Sifa

Abstract: Ever-larger language models with ever-increasing capabilities are by now well-established text processing tools. Alas, information extraction tasks such as named entity recognition are still largely unaffected by this progress as they are primarily based on the previous generation of encoder-only transformer models. Here, we propose a simple yet effective approach, Informed Named Entity Recognition Decoding (iNERD), which treats named entity recognition as a generative process. It leverages the language understanding capabilities of recent generative models in a future-proof manner and employs an informed decoding scheme incorporating the restricted nature of information extraction into open-ended text generation, improving performance and eliminating any risk of hallucinations. We coarse-tune our model on a merged named entity corpus to strengthen its performance, evaluate five generative language models on eight named entity recognition datasets, and achieve remarkable results, especially in an environment with an unknown entity class set, demonstrating the adaptability of the approach.

Submitted 15 August, 2023; originally announced August 2023.

Comments: 12 pages, 2 figures, 4 tables

arXiv:2308.06111 [pdf, other]

Improving Zero-Shot Text Matching for Financial Auditing with Large Language Models

Authors: Lars Hillebrand, Armin Berger, Tobias Deußer, Tim Dilmaghani, Mohamed Khaled, Bernd Kliem, Rüdiger Loitz, Maren Pielka, David Leonhard, Christian Bauckhage, Rafet Sifa

Abstract: Auditing financial documents is a very tedious and time-consuming process. As of today, it can already be simplified by employing AI-based solutions to recommend relevant text passages from a report for each legal requirement of rigorous accounting standards. However, these methods need to be fine-tuned regularly, and they require abundant annotated data, which is often lacking in industrial environments. Hence, we present ZeroShotALI, a novel recommender system that leverages a state-of-the-art large language model (LLM) in conjunction with a domain-specifically optimized transformer-based text-matching solution. We find that a two-step approach of first retrieving a number of best matching document sections per legal requirement with a custom BERT-based model and second filtering these selections using an LLM yields significant performance improvements over existing approaches.

Submitted 14 August, 2023; v1 submitted 11 August, 2023; originally announced August 2023.

Comments: Accepted at DocEng 2023, 4 pages, 1 figure, 2 tables

arXiv:2306.15786 [pdf, other]

An Empirical Evaluation of the Rashomon Effect in Explainable Machine Learning

Authors: Sebastian Müller, Vanessa Toborek, Katharina Beckh, Matthias Jakobs, Christian Bauckhage, Pascal Welke

Abstract: The Rashomon Effect describes the following phenomenon: for a given dataset there may exist many models with equally good performance but with different solution strategies. The Rashomon Effect has implications for Explainable Machine Learning, especially for the comparability of explanations. We provide a unified view on three different comparison scenarios and conduct a quantitative evaluation across different datasets, models, attribution methods, and metrics. We find that hyperparameter-tuning plays a role and that metric selection matters. Our results provide empirical support for previously anecdotal evidence and exhibit challenges for both scientists and practitioners.

Submitted 29 June, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2211.06112 [pdf, other]

Towards automating Numerical Consistency Checks in Financial Reports

Authors: Lars Hillebrand, Tobias Deußer, Tim Dilmaghani, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa

Abstract: We introduce KPI-Check, a novel system that automatically identifies and cross-checks semantically equivalent key performance indicators (KPIs), e.g. "revenue" or "total costs", in real-world German financial reports. It combines a financial named entity and relation extraction module with a BERT-based filtering and text pair classification component to extract KPIs from unstructured sentences before linking them to synonymous occurrences in the balance sheet and profit & loss statement. The tool achieves a high matching performance of $73.00$% micro F$_1$ on a hold out test set and is currently being deployed for a globally operating major auditing firm to assist the auditing procedure of financial statements.

Submitted 11 November, 2022; originally announced November 2022.

Comments: Accepted at BigData 2022, 10 pages, 3 figures, 5 tables

arXiv:2210.09163 [pdf, ps, other]

doi 10.1109/ICMLA55696.2022.00254

KPI-EDGAR: A Novel Dataset and Accompanying Metric for Relation Extraction from Financial Documents

Authors: Tobias Deußer, Syed Musharraf Ali, Lars Hillebrand, Desiana Nurchalifah, Basil Jacob, Christian Bauckhage, Rafet Sifa

Abstract: We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction building on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, where the main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking potential future research. Additionally, we propose a new way of measuring the success of said extraction process by incorporating a word-level weighting scheme into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.

Submitted 17 October, 2022; originally announced October 2022.

Comments: Accepted at ICMLA 2022, 6 pages, 5 tables

arXiv:2210.01241 [pdf, other]

Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization

Authors: Rajkumar Ramamurthy, Prithviraj Ammanabrolu, Kianté Brantley, Jack Hessel, Rafet Sifa, Christian Bauckhage, Hannaneh Hajishirzi, Yejin Choi

Abstract: We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If we view text generation as a sequential decision-making problem, reinforcement learning (RL) appears to be a natural conceptual framework. However, using RL for LM-based generation faces empirical challenges, including training instability due to the combinatorial action space, as well as a lack of open-source libraries and benchmarks customized for LM alignment. Thus, a question rises in the research community: is RL a practical paradigm for NLP? To help answer this, we first introduce an open-source modular library, RL4LMs (Reinforcement Learning for Language Models), for optimizing language generators with RL. The library consists of on-policy RL algorithms that can be used to train any encoder or encoder-decoder LM in the HuggingFace library (Wolf et al. 2020) with an arbitrary reward function. Next, we present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, a set of 6 language generation tasks which are supervised not by target strings, but by reward functions which capture automated measures of human preference. GRUE is the first leaderboard-style evaluation of RL algorithms for NLP tasks. Finally, we introduce an easy-to-use, performant RL algorithm, NLPO (Natural Language Policy Optimization) that learns to effectively reduce the combinatorial action space in language generation. We show 1) that RL techniques are generally better than supervised methods at aligning LMs to human preferences; and 2) that NLPO exhibits greater stability and performance than previous policy gradient methods (e.g., PPO (Schulman et al. 2017)), based on both automatic and human evaluations.

Submitted 28 February, 2023; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: In Proceedings of ICLR 2023. Code found at https://github.com/allenai/rl4lms and Project website at https://rl4lms.apps.allenai.org/

arXiv:2209.02055 [pdf, other]

Full Kullback-Leibler-Divergence Loss for Hyperparameter-free Label Distribution Learning

Authors: Maurice Günder, Nico Piatkowski, Christian Bauckhage

Abstract: The concept of Label Distribution Learning (LDL) is a technique to stabilize classification and regression problems with ambiguous and/or imbalanced labels. A prototypical use-case of LDL is human age estimation based on profile images. Regarding this regression problem, a so called Deep Label Distribution Learning (DLDL) method has been developed. The main idea is the joint regression of the label distribution and its expectation value. However, the original DLDL method uses loss components with different mathematical motivation and, thus, different scales, which is why the use of a hyperparameter becomes necessary. In this work, we introduce a loss function for DLDL whose components are completely defined by Kullback-Leibler (KL) divergences and, thus, are directly comparable to each other without the need of additional hyperparameters. It generalizes the concept of DLDL with regard to further use-cases, in particular for multi-dimensional or multi-scale distribution learning tasks.
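
As a rough illustration of the building block this abstract refers to, the sketch below computes the KL divergence between a target label distribution and a predicted one over discrete bins. The discretized Gaussian target, the bin layout, and the function names are assumptions chosen for illustration; this shows only a single KL term and is not the paper's full loss.

```python
import math

def gaussian_label_distribution(label, bins, sigma=2.0):
    """Turn a scalar label into a normalized distribution over bins
    (assumed discretized Gaussian; sigma is a hypothetical choice)."""
    weights = [math.exp(-0.5 * ((b - label) / sigma) ** 2) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.
    eps guards against division by zero and log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )
```

For example, with ages as bins, `kl_divergence(gaussian_label_distribution(30, range(101)), prediction)` would penalize a prediction according to how far its probability mass lies from age 30.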

Submitted 5 September, 2022; originally announced September 2022.

Comments: 8 pages, 4 figures

arXiv:2209.01106 [pdf, other]

A New Aligned Simple German Corpus

Authors: Vanessa Toborek, Moritz Busch, Malte Boßert, Christian Bauckhage, Pascal Welke

Abstract: "Leichte Sprache", the German counterpart to Simple English, is a regulated language aiming to facilitate complex written language that would otherwise stay inaccessible to different groups of people. We present a new sentence-aligned monolingual corpus for Simple German -- German. It contains multiple document-aligned sources which we have aligned using automatic sentence-alignment methods. We evaluate our alignments based on a manually labelled subset of aligned documents. The quality of our sentence alignments, as measured by F1-score, surpasses previous work. We publish the dataset under CC BY-SA and the accompanying code under MIT license.

Submitted 26 May, 2023; v1 submitted 2 September, 2022; originally announced September 2022.

Comments: Accepted at ACL 2023

arXiv:2208.04365 [pdf, other]

Gradient Flows for L2 Support Vector Machine Training

Authors: Christian Bauckhage, Helen Schneider, Benjamin Wulff, Rafet Sifa

Abstract: We explore the merits of training of support vector machines for binary classification by means of solving systems of ordinary differential equations. We thus assume a continuous time perspective on a machine learning problem which may be of interest for implementations on (re)emerging hardware platforms such as analog- or quantum computers.

Submitted 8 August, 2022; originally announced August 2022.

Comments: Peer-reviewed and presented as part of the workshop on Continuous Time Methods for Machine Learning at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 2022

arXiv:2208.02140 [pdf, other]

KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports

Authors: Lars Hillebrand, Tobias Deußer, Tim Dilmaghani, Bernd Kliem, Rüdiger Loitz, Christian Bauckhage, Rafet Sifa

Abstract: We present KPI-BERT, a system which employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs), e.g. "revenue" or "interest expenses", of companies from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture that is based on Bidirectional Encoder Representations from Transformers (BERT) combining a recurrent neural network (RNN) with conditional label masking to sequentially tag entities before it classifies their relations. Our model also introduces a learnable RNN-based pooling mechanism and incorporates domain expert knowledge by explicitly filtering impossible relations. We achieve a substantially higher prediction performance on a new practical dataset of German financial reports, outperforming several strong baselines including a competing state-of-the-art span-based entity tagging approach.

Submitted 3 August, 2022; originally announced August 2022.

Comments: Accepted at ICPR 2022, 8 pages, 1 figure, 6 tables

arXiv:2206.03960 [pdf, other]

Predict better with less training data using a QNN

Authors: Barry D. Reese, Marek Kowalik, Christian Metzl, Christian Bauckhage, Eldar Sultanow

Abstract: Over the past decade, machine learning revolutionized vision-based quality assessment for which convolutional neural networks (CNNs) have now become the standard. In this paper, we consider a potential next step in this development and describe a quanvolutional neural network (QNN) algorithm that efficiently maps classical image data to quantum states and allows for reliable image analysis. We practically demonstrate how to leverage quantum devices in computer vision and how to introduce quantum convolutions into classical CNNs. Dealing with a real world use case in industrial quality control, we implement our hybrid QNN model within the PennyLane framework and empirically observe it to achieve better predictions using much fewer training data than classical CNNs. In other words, we empirically observe a genuine quantum advantage for an industrial application where the advantage is due to superior data encoding.

Submitted 8 June, 2022; originally announced June 2022.

Comments: 23 pages, 15 figures

MSC Class: 81P68 ACM Class: I.5.1

arXiv:2205.11433 [pdf, other]

Informed Pre-Training on Prior Knowledge

Authors: Laura von Rueden, Sebastian Houben, Kostadin Cvejoski, Christian Bauckhage, Nico Piatkowski

Abstract: When training data is scarce, the incorporation of additional prior knowledge can assist the learning process. While it is common to initialize neural networks with weights that have been pre-trained on other large data sets, pre-training on more concise forms of knowledge has rather been overlooked. In this paper, we propose a novel informed machine learning approach and suggest to pre-train on prior knowledge. Formal knowledge representations, e.g. graphs or equations, are first transformed into a small and condensed data set of knowledge prototypes. We show that informed pre-training on such knowledge prototypes (i) speeds up the learning processes, (ii) improves generalization capabilities in the regime where not enough training data is available, and (iii) increases model robustness. Analyzing which parts of the model are affected most by the prototypes reveals that improvements come from deeper layers that typically represent high-level features. This confirms that informed pre-training can indeed transfer semantic knowledge. This is a novel effect, which shows that knowledge-based pre-training has additional and complementary strengths to existing approaches.

Submitted 23 May, 2022; originally announced May 2022.

arXiv:2204.11133 [pdf, other]

Towards Bundle Adjustment for Satellite Imaging via Quantum Machine Learning

Authors: Nico Piatkowski, Thore Gerlach, Romain Hugues, Rafet Sifa, Christian Bauckhage, Frederic Barbaresco

Abstract: Given is a set of images, where all images show views of the same area at different points in time and from different viewpoints. The task is the alignment of all images such that relevant information, e.g., poses, changes, and terrain, can be extracted from the fused image. In this work, we focus on quantum methods for keypoint extraction and feature matching, due to the demanding computational complexity of these sub-tasks. To this end, k-medoids clustering, kernel density clustering, nearest neighbor search, and kernel methods are investigated and it is explained how these methods can be re-formulated for quantum annealers and gate-based quantum computers. Experimental results obtained on digital quantum emulation hardware, quantum annealers, and quantum gate computers show that classical systems still deliver superior results. However, the proposed methods are ready for the current and upcoming generations of quantum computing devices which have the potential to outperform classical systems in the near future.

Submitted 23 April, 2022; originally announced April 2022.

ACM Class: C.3; I.2; I.4

arXiv:2203.08815 [pdf, other]

QUBOs for Sorting Lists and Building Trees

Authors: Christian Bauckhage, Thore Gerlach, Nico Piatkowski

Abstract: We show that the fundamental tasks of sorting lists and building search trees or heaps can be modeled as quadratic unconstrained binary optimization problems (QUBOs). The idea is to understand these tasks as permutation problems and to devise QUBOs whose solutions represent appropriate permutation matrices. We discuss how to construct such QUBOs and how to solve them using Hopfield nets or (adiabatic) quantum computing. In short, we show that neurocomputing methods or quantum computers can solve problems usually associated with abstract data structures.
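
The idea of treating sorting as a permutation problem can be sketched in a few lines of Python. The example below scores every permutation of the input with the linear objective "sum of position times value" and keeps the best one; by the rearrangement inequality this objective is maximized exactly when the values appear in non-decreasing order. The objective and the brute-force search are assumptions chosen for illustration: they stand in for, and are not, the QUBO construction or the Hopfield/quantum solvers discussed in the paper.

```python
from itertools import permutations

def sort_via_permutation(values):
    """Sort by searching for the permutation that maximizes
    sum_i (i + 1) * values[perm[i]].

    Brute force over all n! permutations, so only usable for tiny inputs.
    """
    n = len(values)
    best_order, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # perm[i] is the index of the input element placed at output position i
        score = sum((i + 1) * values[perm[i]] for i in range(n))
        if score > best_score:
            best_order, best_score = perm, score
    return [values[j] for j in best_order]
```

A QUBO formulation would instead encode the permutation as a binary matrix and add penalty terms enforcing exactly one entry per row and column, which is what makes the problem amenable to annealing-style solvers.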

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2201.02885 [pdf, other]

doi 10.1093/gigascience/giac054

Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision

Authors: Maurice Günder, Facundo R. Ispizua Yamati, Jana Kierdorf, Ribana Roscher, Anne-Katrin Mahlein, Christian Bauckhage

Abstract: UAV-based image retrieval in modern agriculture enables gathering large amounts of spatially referenced crop image data. In large-scale experiments, however, UAV images suffer from containing a multitudinous amount of crops in a complex canopy architecture. Especially for the observation of temporal effects, this complicates the recognition of individual plants over several images and the extraction of relevant information tremendously. In this work, we present a hands-on workflow for the automatized temporal and spatial identification and individualization of crop images from UAVs abbreviated as "cataloging" based on comprehensible computer vision methods. We evaluate the workflow on two real-world datasets. One dataset is recorded for observation of Cercospora leaf spot - a fungal disease - in sugar beet over an entire growing cycle. The other one deals with harvest prediction of cauliflower plants. The plant catalog is utilized for the extraction of single plant images seen over multiple time points. This gathers large-scale spatio-temporal image dataset that in turn can be applied to train further machine learning models including various data layers. The presented approach improves analysis and interpretation of UAV data in agriculture significantly. By validation with some reference data, our method shows an accuracy that is similar to more complex deep learning-based recognition techniques. Our workflow is able to automatize plant cataloging and training image extraction, especially for large datasets.

Submitted 11 January, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

Comments: Preprint submitted to GigaScience

Journal ref: GigaScience, Volume 11, 2022

arXiv:2112.10712 [pdf, other]

Evolutionary Hierarchical Harvest Schedule Optimization for Food Waste Prevention

Authors: Maurice Günder, Nico Piatkowski, Laura von Rueden, Rafet Sifa, Christian Bauckhage

Abstract: In order to avoid disadvantages of monocropping for soil and environment, it is advisable to practice intercropping of various plant species whenever possible. However, intercropping is challenging as it requires a balanced planting schedule due to individual cultivation time frames. Maintaining a continuous harvest reduces logistical costs and related greenhouse gas emissions, and contributes to food waste prevention. In this work, we address these issues and propose an optimization method for a full harvest season of large crop ensembles that complies with given constraints. By using an approach based on an evolutionary algorithm combined with a novel hierarchical loss function and adaptive mutation rate, we transfer the multi-objective into a pseudo-single-objective optimization problem and obtain faster convergence and better solutions than for conventional approaches.

Submitted 20 December, 2021; originally announced December 2021.

Comments: 4 pages, AAAI-2022 Workshop AI for Agriculture and Food Systems (AIAFS)

arXiv:2110.14747 [pdf, other]

Dynamic Review-based Recommenders

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Christian Bauckhage, Cesar Ojeda

Abstract: Just as user preferences change with time, item reviews also reflect those same preference changes. In a nutshell, if one is to sequentially incorporate review content knowledge into recommender systems, one is naturally led to dynamical models of text. In the present work we leverage the known power of reviews to enhance rating predictions in a way that (i) respects the causality of review generation and (ii) includes, in a bidirectional fashion, the ability of ratings to inform language review models and vice-versa, language representations that help predict ratings end-to-end. Moreover, our representations are time-interval aware and thus yield a continuous-time representation of the dynamics. We provide experiments on real-world datasets and show that our methodology is able to outperform several state-of-the-art models. Source code for all models can be found at [1].

Submitted 22 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: 6 pages, Published at International Data Science Conference 2021 (iDSC21)

arXiv:2104.07538 [pdf, other]

Street-Map Based Validation of Semantic Segmentation in Autonomous Driving

Authors: Laura von Rueden, Tim Wirtz, Fabian Hueger, Jan David Schneider, Nico Piatkowski, Christian Bauckhage

Abstract: Artificial intelligence for autonomous driving must meet strict requirements on safety and robustness, which motivates the thorough validation of learned models. However, current validation approaches mostly require ground truth data and are thus both cost-intensive and limited in their applicability. We propose to overcome these limitations by a model agnostic validation using a-priori knowledge from street maps. In particular, we show how to validate semantic segmentation masks and demonstrate the potential of our approach using OpenStreetMap. We introduce validation metrics that indicate false positive or negative road segments. Besides the validation approach, we present a method to correct the vehicle's GPS position so that a more accurate localization can be used for the street-map based validation. Lastly, we present quantitative results on the Cityscapes dataset indicating that our validation approach can indeed uncover errors in semantic segmentation masks.

Submitted 15 April, 2021; originally announced April 2021.

Comments: Final version accepted at the International Conference on Pattern Recognition (ICPR). arXiv admin note: substantial text overlap with arXiv:2011.08008

arXiv:2012.13453 [pdf, other]

doi 10.1109/CEC55065.2022.9870269

Quantum Circuit Evolution on NISQ Devices

Authors: Lukas Franken, Bogdan Georgiev, Sascha Mücke, Moritz Wolter, Raoul Heese, Christian Bauckhage, Nico Piatkowski

Abstract: Variational quantum circuits build the foundation for various classes of quantum algorithms. In a nutshell, the weights of a parametrized quantum circuit are varied until the empirical sampling distribution of the circuit is sufficiently close to a desired outcome. Numerical first-order methods are applied frequently to fit the parameters of the circuit, but most of the time, the circuit itself, that is, the actual composition of gates, is fixed. Methods for optimizing the circuit design jointly with the weights have been proposed, but empirical results are rather scarce. Here, we consider a simple evolutionary strategy that addresses the trade-off between finding appropriate circuit architectures and parameter tuning. We evaluate our method both via simulation and on actual quantum hardware. Our benchmark problems include the transverse field Ising Hamiltonian and the Sherrington-Kirkpatrick spin model. Despite the shortcomings of current noisy intermediate-scale quantum hardware, we find only a minor slowdown on actual quantum machines compared to simulations. Moreover, we investigate which mutation operations most significantly contribute to the optimization. The results provide intuition on how randomized search heuristics behave on actual quantum hardware and lay out a path for further refinement of evolutionary quantum gate circuits.

Submitted 23 May, 2022; v1 submitted 23 December, 2020; originally announced December 2020.

Comments: 8 pages, 7 figures. To appear in the proceedings of IEEE Congress on Evolutionary Computation (CEC) 2022

Journal ref: 2022 IEEE Congress on Evolutionary Computation (CEC), pp. 1-8

arXiv:2012.05684 [pdf, other]

doi 10.1109/IJCNN48605.2020.9206768

Recurrent Point Review Models

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Bogdan Georgiev, Christian Bauckhage, Cesar Ojeda

Abstract: Deep neural network models represent the state-of-the-art methodologies for natural language processing. Here we build on top of these methodologies to incorporate temporal information and model how review data changes with time. Specifically, we use the dynamic representations of recurrent point process models, which encode the history of how business or service reviews are received in time, to generate instantaneous language models with improved prediction capabilities. Simultaneously, our methodologies enhance the predictive power of our point process models by incorporating summarized review content representations. We provide recurrent network and temporal convolution solutions for modeling the review content. We deploy our methodologies in the context of recommender systems, effectively characterizing the change in preference and taste of users as time evolves. Source code is available at [1].

Submitted 10 December, 2020; originally announced December 2020.

Comments: 8 pages, 6 figures, Published in: 2020 International Joint Conference on Neural Networks (IJCNN)

Journal ref: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, United Kingdom, 2020, pp. 1-8

arXiv:2011.08272 [pdf, other]

NLPGym -- A toolkit for evaluating RL agents on Natural Language Processing Tasks

Authors: Rajkumar Ramamurthy, Rafet Sifa, Christian Bauckhage

Abstract: Reinforcement learning (RL) has recently shown impressive performance in complex game AI and robotics tasks. To a large extent, this is thanks to the availability of simulated environments such as OpenAI Gym, Atari Learning Environment, or Malmo which allow agents to learn complex tasks through interaction with virtual environments. While RL is also increasingly applied to natural language processing (NLP), there are no simulated textual environments available for researchers to apply and consistently benchmark RL on NLP tasks. With the work reported here, we therefore release NLPGym, an open-source Python toolkit that provides interactive textual environments for standard NLP tasks such as sequence tagging, multi-label classification, and question answering. We also present experimental results for 6 tasks using different RL algorithms which serve as baselines for further research. The toolkit is published at https://github.com/rajcscw/nlp-gym

Submitted 16 November, 2020; originally announced November 2020.

Comments: Accepted at Wordplay: When Language Meets Games Workshop @ NeurIPS 2020

arXiv:2011.08008 [pdf, other]

Towards Map-Based Validation of Semantic Segmentation Masks

Authors: Laura von Rueden, Tim Wirtz, Fabian Hueger, Jan David Schneider, Christian Bauckhage

Abstract: Artificial intelligence for autonomous driving must meet strict requirements on safety and robustness. We propose to validate machine learning models for self-driving vehicles not only with given ground truth labels, but also with additional a-priori knowledge. In particular, we suggest to validate the drivable area in semantic segmentation masks using given street map data. We present first results, which indicate that prediction errors can be uncovered by map-based validation.

Submitted 26 November, 2020; v1 submitted 3 November, 2020; originally announced November 2020.

arXiv:2007.07320 [pdf, other]

Learning Syllogism with Euler Neural-Networks

Authors: Tiansi Dong, Chengjiang Li, Christian Bauckhage, Juanzi Li, Stefan Wrobel, Armin B. Cremers

Abstract: Traditional neural networks represent everything as a vector, and are able to approximate a subset of logical reasoning to a certain degree. As basic logic relations are better represented by topological relations between regions, we propose a novel neural network that represents everything as a ball and is able to learn topological configuration as an Euler diagram. Hence the name Euler Neural-Network (ENN). The central vector of a ball is a vector that can inherit the representation power of a traditional neural network. ENN distinguishes four spatial statuses between balls, namely, being disconnected, being partially overlapped, being part of, being inverse part of. Within each status, ideal values are defined for efficient reasoning. A novel back-propagation algorithm with six Rectified Spatial Units (ReSU) can optimize an Euler diagram representing logical premises, from which a logical conclusion can be deduced. In contrast to traditional neural networks, ENN can precisely represent all 24 different structures of Syllogism. Two large datasets are created: one extracted from WordNet-3.0 covers all types of Syllogism reasoning, the other extracts all family relations from DBpedia. Experimental results confirm the superior power of ENN in logical representation and reasoning. Datasets and source code are available upon request.

Submitted 20 July, 2020; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 16 pages, 6 figures

arXiv:1912.04132 [pdf, other]

Recurrent Point Processes for Dynamic Review Models

Authors: Kostadin Cvejoski, Ramses J. Sanchez, Bogdan Georgiev, Jannis Schuecker, Christian Bauckhage, Cesar Ojeda

Abstract: Recent progress in recommender system research has shown the importance of including temporal representations to improve interpretability and performance. Here, we incorporate temporal representations in continuous time via recurrent point process for a dynamical model of reviews. Our goal is to characterize how changes in perception, user interest and seasonal effects affect review text.

Submitted 15 January, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: Presented at the AAAI 2020 Workshop on Interactive and Conversational Recommendation Systems

arXiv:1911.06121 [pdf, other]

Towards Supervised Extractive Text Summarization via RNN-based Sequence Classification

Authors: Eduardo Brito, Max Lübbering, David Biesner, Lars Patrick Hillebrand, Christian Bauckhage

Abstract: This article briefly explains our submitted approach to the DocEng'19 competition on extractive summarization. We implemented a recurrent neural network based model that learns to classify whether an article's sentence belongs to the corresponding extractive summary or not. We bypass the lack of large annotated news corpora for extractive summarization by generating extractive summaries from abstractive ones, which are available from the CNN corpus.

Submitted 13 November, 2019; originally announced November 2019.

arXiv:1906.09808 [pdf, ps, other]

Recurrent Adversarial Service Times

Authors: César Ojeda, Kostadin Cvejosky, Ramsés J. Sánchez, Jannis Schuecker, Bogdan Georgiev, Christian Bauckhage

Abstract: Service system dynamics occur at the interplay between customer behaviour and a service provider's response. This kind of dynamics can effectively be modeled within the framework of queuing theory where customers' arrivals are described by point process models. However, these approaches are limited by parametric assumptions as to, for example, inter-event time distributions. In this paper, we address these limitations and propose a novel, deep neural network solution to the queuing problem. Our solution combines a recurrent neural network that models the arrival process with a recurrent generative adversarial network which models the service time distribution. We evaluate our methodology on various empirical datasets ranging from internet services (Blockchain, GitHub, Stackoverflow) to mobility service systems (New York taxi cab).

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1903.12394 [pdf, other]

doi 10.1109/TKDE.2021.3079836

Informed Machine Learning -- A Taxonomy and Survey of Integrating Knowledge into Learning Systems

Authors: Laura von Rueden, Sebastian Mayer, Katharina Beckh, Bogdan Georgiev, Sven Giesselbach, Raoul Heese, Birgit Kirsch, Julius Pfrommer, Annika Pick, Rajkumar Ramamurthy, Michal Walczak, Jochen Garcke, Christian Bauckhage, Jannis Schuecker

Abstract: Despite its great success, machine learning can have its limits when dealing with insufficient training data. A potential solution is the additional integration of prior knowledge into the training process which leads to the notion of informed machine learning. In this paper, we present a structured overview of various approaches in this field. We provide a definition and propose a concept for informed machine learning which illustrates its building blocks and distinguishes it from conventional machine learning. We introduce a taxonomy that serves as a classification framework for informed machine learning approaches. It considers the source of knowledge, its representation, and its integration into the machine learning pipeline. Based on this taxonomy, we survey related research and describe how different knowledge representations such as algebraic equations, logic rules, or simulation results can be used in learning systems. This evaluation of numerous papers on the basis of our taxonomy uncovers key methods in the field of informed machine learning.

Submitted 28 May, 2021; v1 submitted 29 March, 2019; originally announced March 2019.

Comments: Accepted at IEEE Transactions on Knowledge and Data Engineering: https://ieeexplore.ieee.org/document/9429985

arXiv:1803.04300 [pdf, other]

Neural Conditional Gradients

Authors: Patrick Schramowski, Christian Bauckhage, Kristian Kersting

Abstract: The move from hand-designed to learned optimizers in machine learning has been quite successful for gradient-based and -free optimizers. When facing a constrained problem, however, maintaining feasibility typically requires a projection step, which might be computationally expensive and not differentiable. We show how the design of projection-free convex optimization algorithms can be cast as a learning problem based on Frank-Wolfe Networks: recurrent networks implementing the Frank-Wolfe algorithm, a.k.a. conditional gradients. This allows them to learn to exploit structure when, e.g., optimizing over rank-1 matrices. Our LSTM-learned optimizers outperform hand-designed as well as learned but unconstrained ones. We demonstrate this for training support vector machines and softmax classifiers.

Submitted 30 July, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

Comments: arXiv admin note: text overlap with arXiv:1610.05120 by other authors

arXiv:1710.11395 [pdf, other]

doi 10.1145/1526709.1526809

The Slashdot Zoo: Mining a Social Network with Negative Edges

Authors: Jérôme Kunegis, Andreas Lommatzsch, Christian Bauckhage

Abstract: We analyse the corpus of user relationships of the Slashdot technology news site. The data was collected from the Slashdot Zoo feature where users of the website can tag other users as friends and foes, providing positive and negative endorsements. We adapt social network analysis techniques to the problem of negative edge weights. In particular, we consider signed variants of global network characteristics such as the clustering coefficient, node-level characteristics such as centrality and popularity measures, and link-level characteristics such as distances and similarity measures. We evaluate these measures on the task of identifying unpopular users, as well as on the task of predicting the sign of links and show that the network exhibits multiplicative transitivity which allows algebraic methods based on matrix multiplication to be used. We compare our methods to traditional methods which are only suitable for positively weighted edges.

Submitted 31 October, 2017; originally announced October 2017.

Comments: 10 pages, color, accepted at WWW 2009

ACM Class: I.2.6; H.4.0

Journal ref: Proc. WWW 2009

arXiv:1704.01046 [pdf, other]

Using Echo State Networks for Cryptography

Authors: Rajkumar Ramamurthy, Christian Bauckhage, Krisztian Buza, Stefan Wrobel

Abstract: Echo state networks are simple recurrent neural networks that are easy to implement and train. Despite their simplicity, they show a form of memory and can predict or regenerate sequences of data. We make use of this property to realize a novel neural cryptography scheme. The key idea is to assume that Alice and Bob share a copy of an echo state network. If Alice trains her copy to memorize a message, she can communicate the trained part of the network to Bob who plugs it into his copy to regenerate the message. Considering a byte-level representation of in- and output, the technique applies to arbitrary types of data (texts, images, audio files, etc.) and practical experiments reveal it to satisfy the fundamental cryptographic properties of diffusion and confusion.

Submitted 4 April, 2017; originally announced April 2017.

Comments: 8 pages, ICANN 2017

arXiv:1511.01523 [pdf, other]

SGPD Volume Maximization for Community Detection

Authors: Kasra Manshaei, Christian Bauckhage

Abstract: In this note we briefly study the feasibility of community detection in complex networks using peripheral vertices. Our method suggests a novel direction in axiomatizing the problem of clustering in graphs and complex networks by looking at the topological role each vertex plays in the community structure, regardless of the attributes. The promising strength of pseudo-peripheral vertices as a lever for analysis of complex networks is also demonstrated on real-world data.

Submitted 4 November, 2015; originally announced November 2015.

arXiv:1501.06180 [pdf, other]

doi 10.1109/TCSVT.2015.2397199

Exploring Human Vision Driven Features for Pedestrian Detection

Authors: Shanshan Zhang, Christian Bauckhage, Dominik A. Klein, Armin B. Cremers

Abstract: Motivated by the center-surround mechanism in the human visual attention system, we propose to use average contrast maps for the challenge of pedestrian detection in street scenes due to the observation that pedestrians indeed exhibit discriminative contrast texture. Our main contributions are first to design a local, statistical multi-channel descriptor in order to incorporate both color and gradient information. Second, we introduce a multi-direction and multi-scale contrast scheme based on grid-cells in order to integrate expressive local variations. Contributing to the issue of selecting most discriminative features for assessing and classification, we perform extensive comparisons w.r.t. statistical descriptors, contrast measurements, and scale structures. This way, we obtain reasonable results under various configurations. Empirical findings from applying our optimized detector on the INRIA and Caltech pedestrian datasets show that our features yield state-of-the-art performance in pedestrian detection.

Submitted 25 January, 2015; originally announced January 2015.

Comments: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

arXiv:1501.04232 [pdf, other]

Maximum Entropy Models of Shortest Path and Outbreak Distributions in Networks

Authors: Christian Bauckhage, Kristian Kersting, Fabian Hadiji

Abstract: Properties of networks are often characterized in terms of features such as node degree distributions, average path lengths, diameters, or clustering coefficients. Here, we study shortest path length distributions. On the one hand, average as well as maximum distances can be determined therefrom; on the other hand, they are closely related to the dynamics of network spreading processes. Because of the combinatorial nature of networks, we apply maximum entropy arguments to derive a general, physically plausible model. In particular, we establish the generalized Gamma distribution as a continuous characterization of shortest path length histograms of networks of arbitrary topology. Experimental evaluations corroborate our theoretical results.

Submitted 17 January, 2015; originally announced January 2015.

arXiv:1410.3314 [pdf, other]

Propagation Kernels

Authors: Marion Neumann, Roman Garnett, Christian Bauckhage, Kristian Kersting

Abstract: We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We will also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains.

Submitted 13 October, 2014; originally announced October 2014.

arXiv:1410.0642 [pdf, other]

A Note on Archetypal Analysis and the Approximation of Convex Hulls

Authors: Christian Bauckhage

Abstract: We briefly review the basic ideas behind archetypal analysis for matrix factorization and discuss its behavior in approximating the convex hull of a data sample. We then ask how good such approximations can be and consider different cases. Understanding archetypal analysis as the problem of computing a convexity constrained low-rank approximation of the identity matrix provides estimates for archetypal analysis and the SiVM heuristic.

Submitted 27 September, 2014; originally announced October 2014.

arXiv:1409.0104 [pdf, ps, other]

Marginalizing over the PageRank Damping Factor

Authors: Christian Bauckhage

Abstract: In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.

Submitted 30 August, 2014; originally announced September 2014.

arXiv:1407.3950 [pdf]

A Comparison of Methods for Player Clustering via Behavioral Telemetry

Authors: Anders Drachen, Christian Thurau, Rafet Sifa, Christian Bauckhage

Abstract: The analysis of user behavior in digital games has been aided by the introduction of user telemetry in game development, which provides unprecedented access to quantitative data on user behavior from the installed game clients of the entire population of players. Player behavior telemetry datasets can be exceptionally complex, with features recorded for a varying population of users over a temporal segment that can reach years in duration. Categorization of behaviors, whether through descriptive methods (e.g. segmentation) or unsupervised/supervised learning techniques, is valuable for finding patterns in the behavioral data, and developing profiles that are actionable to game developers. There are numerous methods for unsupervised clustering of user behavior, e.g. k-means/c-means, Non-negative Matrix Factorization, or Principal Component Analysis. Although all yield behavior categorizations, interpretation of the resulting categories in terms of actual play behavior can be difficult if not impossible. In this paper, a range of unsupervised techniques are applied together with Archetypal Analysis to develop behavioral clusters from playtime data of 70,014 World of Warcraft players, covering a five-year interval. The techniques are evaluated with respect to their ability to develop actionable behavioral profiles from the dataset.

Submitted 15 July, 2014; originally announced July 2014.

Comments: Foundations of Digital Games 2013

MSC Class: N/A ACM Class: H.2.8

arXiv:1406.6529 [pdf, other]

Strong Regularities in Growth and Decline of Popularity of Social Media Services

Authors: Christian Bauckhage, Kristian Kersting

Abstract: We analyze general trends and patterns in time series that characterize the dynamics of collective attention to social media services and Web-based businesses. Our study is based on search frequency data available from Google Trends and considers 175 different services. For each service, we collect data from 45 different countries as well as global averages. This way, we obtain more than 8,000 time series which we analyze using diffusion models from the economic sciences. We find that these models accurately characterize the empirical data and our analysis reveals that collective attention to social media grows and subsides in a highly regular and predictable manner. Regularities persist across regions, cultures, and topics and thus hint at general mechanisms that govern the adoption of Web-based services. We discuss several cases in detail to highlight interesting findings. Our methods are of economic interest as they may inform investment decisions and can help assess at what stage of the general life cycle a Web service is.

Submitted 25 June, 2014; originally announced June 2014.

ACM Class: G.3; H.3.5

arXiv:1402.3193 [pdf, ps, other]

Characterizations and Kullback-Leibler Divergence of Gompertz Distributions

Authors: Christian Bauckhage

Abstract: In this note, we characterize the Gompertz distribution in terms of extreme value distributions and point out that it implicitly models the interplay of two antagonistic growth processes. In addition, we derive a closed-form expression for the Kullback-Leibler divergence between two Gompertz distributions. Although the latter is rather easy to obtain, it seems not to have been widely reported before.

Submitted 13 February, 2014; originally announced February 2014.

arXiv:1401.6853 [pdf, ps, other]

Computing the Kullback-Leibler Divergence between two Generalized Gamma Distributions

Authors: Christian Bauckhage

Abstract: We derive a closed form solution for the Kullback-Leibler divergence between two generalized gamma distributions. These notes are meant as a reference and provide a guided tour towards a result of practical interest that is rarely explicated in the literature.

Submitted 27 January, 2014; originally announced January 2014.

arXiv:1310.7114 [pdf, other]

Efficient Information Theoretic Clustering on Discrete Lattices

Authors: Christian Bauckhage, Kristian Kersting

Abstract: We consider the problem of clustering data that reside on discrete, low dimensional lattices. Canonical examples for this setting are found in image segmentation and key point extraction. Our solution is based on a recent approach to information theoretic clustering where clusters result from an iterative procedure that minimizes a divergence measure. We replace costly processing steps in the original algorithm by means of convolutions. These allow for highly efficient implementations and thus significantly reduce runtime. This paper therefore bridges a gap between machine learning and signal processing.

Submitted 26 October, 2013; originally announced October 2013.

Comments: This paper has been presented at the workshop LWA 2012

arXiv:1310.3713 [pdf, ps, other]

Computing the Kullback-Leibler Divergence between two Weibull Distributions

Authors: Christian Bauckhage

Abstract: We derive a closed form solution for the Kullback-Leibler divergence between two Weibull distributions. These notes are meant as reference material and intended to provide a guided tour towards a result that is often mentioned but seldom made explicit in the literature.

Submitted 14 October, 2013; originally announced October 2013.

arXiv:1304.7984 [pdf, other]

GeoDBLP: Geo-Tagging DBLP for Mining the Sociology of Computer Science

Authors: Fabian Hadiji, Kristian Kersting, Christian Bauckhage, Babak Ahmadi

Abstract: Many collective human activities have been shown to exhibit universal patterns. However, the possibility of universal patterns across timing events of researcher migration has barely been explored at a global scale. Here, we show that timing events of migration within different countries exhibit remarkable similarities. Specifically, we look at the distribution governing the data of researcher migration inferred from the web. Compiling the data in itself represents a significant advance in the field of quantitative analysis of migration patterns. Official and commercial records are often access-restricted, incompatible between countries, and especially not registered across researchers. Instead, we introduce GeoDBLP where we propagate geographical seed locations retrieved from the web across the DBLP database of 1,080,958 authors and 1,894,758 papers. But perhaps more important is that we are able to find statistical patterns and create models that explain the migration of researchers. For instance, we show that the science job market can be treated as a Poisson process with individual propensities to migrate following a log-normal distribution over the researcher's career stage. That is, although jobs enter the market constantly, researchers are generally not "memoryless" but have to care greatly about their next move. The propensity to make k>1 migrations, however, follows a gamma distribution suggesting that migration at later career stages is "memoryless". This aligns well but actually goes beyond scientometric models typically postulated based on small case studies. On a very large, transnational scale, we establish the first general regularities that should have major implications on strategies for education and research worldwide.

Submitted 30 April, 2013; originally announced April 2013.

arXiv:1210.4919 [pdf]

Latent Dirichlet Allocation Uncovers Spectral Characteristics of Drought Stressed Plants

Authors: Mirwaes Wahabzada, Kristian Kersting, Christian Bauckhage, Christoph Roemer, Agim Ballvora, Francisco Pinto, Uwe Rascher, Jens Leon, Lutz Ploemer

Abstract: Understanding the adaptation process of plants to drought stress is essential in improving management practices, breeding strategies as well as engineering viable crops for a sustainable agriculture in the coming decades. Hyper-spectral imaging provides a particularly promising approach to gain such understanding since it allows the non-destructive discovery of spectral characteristics of plants governed primarily by scattering and absorption characteristics of the leaf internal structure and biochemical constituents. Several drought stress indices have been derived using hyper-spectral imaging. However, they are typically based on few hyper-spectral images only, rely on interpretations of experts, and consider few wavelengths only. In this study, we present the first data-driven approach to discovering spectral drought stress indices, treating it as an unsupervised labeling problem at massive scale. To make use of short range dependencies of spectral wavelengths, we develop an online variational Bayes algorithm for latent Dirichlet allocation with a convolved Dirichlet regularizer. This approach scales to massive datasets and, hence, provides a more objective complement to plant physiological practices. The spectral topics found conform to plant physiological knowledge and can be computed in a fraction of the time compared to existing LDA approaches.

Submitted 16 October, 2012; originally announced October 2012.

Comments: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI2012)

Report number: UAI-P-2012-PG-852-862

Showing 1–48 of 48 results for author: Bauckhage, C