-
AdaptEval: Evaluating Large Language Models on Domain Adaptation for Text Summarization
Authors:
Anum Afzal,
Ribin Chalumattu,
Florian Matthes,
Laura Mascarell Espuny
Abstract:
Despite the advances in the abstractive summarization task using Large Language Models (LLMs), there is a lack of research that assesses their ability to adapt easily to different domains. We evaluate the domain adaptation abilities of a wide range of LLMs on the summarization task across various domains in both fine-tuning and in-context learning settings. We also present AdaptEval, the first domain adaptation evaluation suite. AdaptEval includes a domain benchmark and a set of metrics to facilitate the analysis of domain adaptation. Our results demonstrate that LLMs exhibit comparable performance in the in-context learning setting, regardless of their parameter scale.
Submitted 16 July, 2024;
originally announced July 2024.
-
Towards Optimizing and Evaluating a Retrieval Augmented QA Chatbot using LLMs with Human in the Loop
Authors:
Anum Afzal,
Alexander Kowsik,
Rajna Fani,
Florian Matthes
Abstract:
Large Language Models have found application in various mundane and repetitive tasks including Human Resource (HR) support. We worked with the domain experts of SAP SE to develop an HR support chatbot as an efficient and effective tool for addressing employee inquiries. We inserted a human-in-the-loop in various parts of the development cycles such as dataset collection, prompt optimization, and evaluation of generated output. By enhancing the LLM-driven chatbot's response quality and exploring alternative retrieval methods, we have created an efficient, scalable, and flexible tool for HR professionals to address employee inquiries effectively. Our experiments and evaluation conclude that GPT-4 outperforms other models and can overcome inconsistencies in data through internal reasoning capabilities. Additionally, through expert analysis, we infer that reference-free evaluation metrics such as G-Eval and Prometheus demonstrate reliability closely aligned with that of human evaluation.
Submitted 8 July, 2024;
originally announced July 2024.
-
Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives
Authors:
Stephen Meisenbacher,
Alexandra Klymenko,
Patrick Gage Kelley,
Sai Teja Peddinti,
Kurt Thomas,
Florian Matthes
Abstract:
The rise of powerful AI models, more formally $\textit{General-Purpose AI Systems}$ (GPAIS), has led to impressive leaps in performance across a wide range of tasks. At the same time, researchers and practitioners alike have raised a number of privacy concerns, resulting in a wealth of literature covering various privacy risks and vulnerabilities of AI models. Works surveying such risks provide differing focuses, leading to disparate sets of privacy risks with no clear unifying taxonomy. We conduct a systematic review of these survey papers to provide a concise and usable overview of privacy risks in GPAIS, as well as proposed mitigation strategies. The developed privacy framework strives to unify the identified privacy risks and mitigations at a technical level that is accessible to non-experts. This serves as the basis for a practitioner-focused interview study to assess technical stakeholder perceptions of privacy risks and mitigations in GPAIS.
Submitted 2 July, 2024;
originally announced July 2024.
-
Engineering Conversational Search Systems: A Review of Applications, Architectures, and Functional Components
Authors:
Phillip Schneider,
Wessel Poelman,
Michael Rovatsos,
Florian Matthes
Abstract:
Conversational search systems enable information retrieval via natural language interactions, with the goal of maximizing users' information gain over multiple dialogue turns. The increasing prevalence of conversational interfaces adopting this search paradigm challenges traditional information retrieval approaches, stressing the importance of better understanding the engineering process of developing these systems. We undertook a systematic literature review to investigate the links between theoretical studies and technical implementations of conversational search systems. Our review identifies real-world application scenarios, system architectures, and functional components. We consolidate our results by presenting a layered architecture framework and explaining the core functions of conversational search systems. Furthermore, we reflect on our findings in light of the rapid progress in large language models, discussing their capabilities, limitations, and directions for future research.
Submitted 1 July, 2024;
originally announced July 2024.
-
A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy
Authors:
Stephen Meisenbacher,
Maulik Chevli,
Florian Matthes
Abstract:
Applications of Differential Privacy (DP) in NLP must distinguish between the syntactic level on which a proposed mechanism operates, often taking the form of $\textit{word-level}$ or $\textit{document-level}$ privatization. Recently, several word-level $\textit{Metric}$ Differential Privacy approaches have been proposed, which rely on this generalized DP notion for operating in word embedding spaces. These approaches, however, often fail to produce semantically coherent textual outputs, and their application at the sentence- or document-level is only possible by a basic composition of word perturbations. In this work, we strive to address these challenges by operating $\textit{between}$ the word and sentence levels, namely with $\textit{collocations}$. By perturbing n-grams rather than single words, we devise a method where composed privatized outputs have higher semantic coherence and variable length. This is accomplished by constructing an embedding model based on frequently occurring word groups, in which unigram words co-exist with bi- and trigram collocations. We evaluate our method in utility and privacy tests, which make a clear case for tokenization strategies beyond the word level.
Submitted 30 June, 2024;
originally announced July 2024.
-
DP-MLM: Differentially Private Text Rewriting Using Masked Language Models
Authors:
Stephen Meisenbacher,
Maulik Chevli,
Juraj Vladika,
Florian Matthes
Abstract:
The task of text privatization using Differential Privacy has recently taken the form of $\textit{text rewriting}$, in which an input text is obfuscated via the use of generative (large) language models. While these methods have shown promising results in the ability to preserve privacy, they rely on autoregressive models which lack a mechanism to contextualize the private rewriting process. In response to this, we propose $\textbf{DP-MLM}$, a new method for differentially private text rewriting based on leveraging masked language models (MLMs) to rewrite text in a semantically similar $\textit{and}$ obfuscated manner. We accomplish this with a simple contextualization technique, whereby we rewrite a text one token at a time. We find that utilizing encoder-only MLMs provides better utility preservation at lower $\varepsilon$ levels, as compared to previous methods relying on larger models with a decoder. In addition, MLMs allow for greater customization of the rewriting mechanism, as opposed to generative approaches. We make the code for $\textbf{DP-MLM}$ public and reusable, found at https://github.com/sjmeis/DPMLM.
Submitted 30 June, 2024;
originally announced July 2024.
-
NLP-KG: A System for Exploratory Search of Scientific Literature in Natural Language Processing
Authors:
Tim Schopf,
Florian Matthes
Abstract:
Scientific literature searches are often exploratory, whereby users are not yet familiar with a particular field or concept but are interested in learning more about it. However, existing systems for scientific literature search are typically tailored to keyword-based lookup searches, limiting the possibilities for exploration. We propose NLP-KG, a feature-rich system designed to support the exploration of research literature in unfamiliar natural language processing (NLP) fields. In addition to a semantic search, NLP-KG allows users to easily find survey papers that provide a quick introduction to a field of interest. Further, a Fields of Study hierarchy graph enables users to familiarize themselves with a field and its related areas. Finally, a chat interface allows users to ask questions about unfamiliar concepts or specific articles in NLP and obtain answers grounded in knowledge retrieved from scientific publications. Our system provides users with comprehensive exploration possibilities, supporting them in investigating the relationships between different fields, understanding unfamiliar concepts in NLP, and finding relevant research literature. Demo, video, and code are available at: https://github.com/NLP-Knowledge-Graph/NLP-KG-WebApp.
Submitted 4 July, 2024; v1 submitted 21 June, 2024;
originally announced June 2024.
-
AGB-DE: A Corpus for the Automated Legal Assessment of Clauses in German Consumer Contracts
Authors:
Daniel Braun,
Florian Matthes
Abstract:
Legal tasks and datasets are often used as benchmarks for the capabilities of language models. However, openly available annotated datasets are rare. In this paper, we introduce AGB-DE, a corpus of 3,764 clauses from German consumer contracts that have been annotated and legally assessed by legal experts. Together with the data, we present a first baseline for the task of detecting potentially void clauses, comparing the performance of an SVM baseline with three fine-tuned open language models and the performance of GPT-3.5. Our results show the challenging nature of the task, with no approach exceeding an F1-score of 0.54. While the fine-tuned models often performed better with regard to precision, GPT-3.5 outperformed the other approaches with regard to recall. An analysis of the errors indicates that one of the main challenges could be the correct interpretation of complex clauses, rather than the decision boundaries of what is permissible and what is not.
Submitted 10 June, 2024;
originally announced June 2024.
-
MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering
Authors:
Juraj Vladika,
Phillip Schneider,
Florian Matthes
Abstract:
In recent years, Large Language Models (LLMs) have demonstrated an impressive ability to encode knowledge during pre-training on large text corpora. They can leverage this knowledge for downstream tasks like question answering (QA), even in complex areas involving health topics. Considering their high potential for facilitating clinical work in the future, understanding the quality of encoded medical knowledge and its recall in LLMs is an important step forward. In this study, we examine the capability of LLMs to exhibit medical knowledge recall by constructing a novel dataset derived from systematic reviews -- studies synthesizing evidence-based answers for specific medical questions. Through experiments on the new MedREQAL dataset, comprising question-answer pairs extracted from rigorous systematic reviews, we assess six LLMs, such as GPT and Mixtral, analyzing their classification and generation performance. Our experimental insights into LLM performance on the novel biomedical QA dataset reveal the still challenging nature of this task.
Submitted 9 June, 2024;
originally announced June 2024.
-
Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text
Authors:
Stephen Meisenbacher,
Florian Matthes
Abstract:
The study of Differential Privacy (DP) in Natural Language Processing often views the task of text privatization as a $\textit{rewriting}$ task, in which sensitive input texts are rewritten to hide explicit or implicit private information. In order to evaluate the privacy-preserving capabilities of a DP text rewriting mechanism, $\textit{empirical privacy}$ tests are frequently employed. In these tests, an adversary is modeled, who aims to infer sensitive information (e.g., gender) about the author behind a (privatized) text. Looking to improve the empirical protections provided by DP rewriting methods, we propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts, where DP rewritten texts are rewritten $\textit{again}$. Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations. Therefore, our approach raises the bar for DP rewriting methods in their empirical privacy evaluations, providing an extra layer of protection against malicious adversaries.
Submitted 31 May, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy
Authors:
Stephen Meisenbacher,
Maulik Chevli,
Florian Matthes
Abstract:
The study of privacy-preserving Natural Language Processing (NLP) has gained rising attention in recent years. One promising avenue studies the integration of Differential Privacy in NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which work to obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, there are two major drawbacks: (1) the inevitable loss of utility due to addition of noise, and (2) the computational expensiveness of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that boasts high speedups in comparison to previous mechanisms, while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency, while still maintaining competitive utility and privacy scores across all conducted comparative tests against previous MLDP mechanisms. Our code is made available at: https://github.com/sjmeis/Diffractor.
Submitted 2 May, 2024;
originally announced May 2024.
-
Towards A Structured Overview of Use Cases for Natural Language Processing in the Legal Domain: A German Perspective
Authors:
Juraj Vladika,
Stephen Meisenbacher,
Martina Preis,
Alexandra Klymenko,
Florian Matthes
Abstract:
In recent years, the field of Legal Tech has risen in prevalence, as the Natural Language Processing (NLP) and legal disciplines have combined forces to digitalize legal processes. Amidst the steady flow of research solutions stemming from the NLP domain, the study of use cases has fallen behind, leading to a number of innovative technical methods without a place in practice. In this work, we aim to build a structured overview of Legal Tech use cases, grounded in NLP literature, but also supplemented by voices from legal practice in Germany. Based upon a Systematic Literature Review, we identify seven categories of NLP technologies for the legal domain, which are then studied in juxtaposition to 22 legal use cases. In the investigation of these use cases, we identify 15 ethical, legal, and social aspects (ELSA), shedding light on the potential concerns of digitally transforming the legal domain.
Submitted 2 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
Authors:
Juraj Vladika,
Florian Matthes
Abstract:
In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases. By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets. We modify different retrieval settings to observe their influence on the QA pipeline's performance, including the number of retrieved documents, sentence selection process, the publication year of articles, and their number of citations. Our results reveal that cutting down on the number of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score by up to 10%. We discuss the results, highlight interesting examples, and outline challenges for future research, like managing evidence disagreement and crafting user-friendly explanations.
Submitted 12 April, 2024;
originally announced April 2024.
-
A Comparative Analysis of Word-Level Metric Differential Privacy: Benchmarking The Privacy-Utility Trade-off
Authors:
Stephen Meisenbacher,
Nihildev Nandakumar,
Alexandra Klymenko,
Florian Matthes
Abstract:
The application of Differential Privacy to Natural Language Processing techniques has emerged in relevance in recent years, with an increasing number of studies published in established NLP outlets. In particular, the adaptation of Differential Privacy for use in NLP tasks has first focused on the $\textit{word-level}$, where calibrated noise is added to word embedding vectors to achieve "noisy" representations. To this end, several implementations have appeared in the literature, each presenting an alternative method of achieving word-level Differential Privacy. Although each of these includes its own evaluation, no comparative analysis has been performed to investigate the performance of such methods relative to each other. In this work, we conduct such an analysis, comparing seven different algorithms on two NLP tasks with varying hyperparameters, including the $\textit{epsilon}$ ($\varepsilon$) parameter, or privacy budget. In addition, we provide an in-depth analysis of the results with a focus on the privacy-utility trade-off, as well as open-source our implementation code for further reproduction. As a result of our analysis, we give insight into the benefits and challenges of word-level Differential Privacy, and accordingly, we suggest concrete steps forward for the research field.
Submitted 4 April, 2024;
originally announced April 2024.
-
Enterprise Use Cases Combining Knowledge Graphs and Natural Language Processing
Authors:
Phillip Schneider,
Tim Schopf,
Juraj Vladika,
Florian Matthes
Abstract:
Knowledge management is a critical challenge for enterprises in today's digital world, as the volume and complexity of data being generated and collected continue to grow incessantly. Knowledge graphs (KG) emerged as a promising solution to this problem by providing a flexible, scalable, and semantically rich way to organize and make sense of data. This paper builds upon a recent survey of the research literature on combining KGs and Natural Language Processing (NLP). Based on selected application scenarios from enterprise context, we discuss synergies that result from such a combination. We cover various approaches from the three core areas of KG construction, reasoning as well as KG-based NLP tasks. In addition to explaining innovative enterprise use cases, we assess their maturity in terms of practical applicability and conclude with an outlook on emergent application areas for the future.
Submitted 1 April, 2024;
originally announced April 2024.
-
Comparing Knowledge Sources for Open-Domain Scientific Claim Verification
Authors:
Juraj Vladika,
Florian Matthes
Abstract:
The increasing rate at which scientific knowledge is discovered and health claims shared online has highlighted the importance of developing efficient fact-checking systems for scientific claims. The usual setting for this task in the literature assumes that the documents containing the evidence for claims are already provided and annotated or contained in a limited corpus. This renders the systems unrealistic for real-world settings where knowledge sources with potentially millions of documents need to be queried to find relevant evidence. In this paper, we perform an array of experiments to test the performance of open-domain claim verification systems. We test the final verdict prediction of systems on four datasets of biomedical and health claims in different settings. While keeping the pipeline's evidence selection and verdict prediction parts constant, document retrieval is performed over three common knowledge sources (PubMed, Wikipedia, Google) and using two different information retrieval techniques. We show that PubMed works better with specialized biomedical claims, while Wikipedia is more suited for everyday health concerns. Likewise, BM25 excels in retrieval precision, while semantic search in recall of relevant evidence. We discuss the results, outline frequent retrieval patterns and challenges, and provide promising future directions.
Submitted 5 February, 2024;
originally announced February 2024.
-
A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation
Authors:
Phillip Schneider,
Manuel Klettner,
Elena Simperl,
Florian Matthes
Abstract:
Generating natural language text from graph-structured data is essential for conversational information seeking. Semantic triples derived from knowledge graphs can serve as a valuable source for grounding responses from conversational agents by providing a factual basis for the information they communicate. This is especially relevant in the context of large language models, which offer great potential for conversational interaction but are prone to hallucinating, omitting, or producing conflicting information. In this study, we conduct an empirical analysis of conversational large language models in generating natural language text from semantic triples. We compare four large language models of varying sizes with different prompting techniques. Through a series of benchmark experiments on the WebNLG dataset, we analyze the models' performance and identify the most common issues in the generated predictions. Our findings show that the capabilities of large language models in triple verbalization can be significantly improved through few-shot prompting, post-processing, and efficient fine-tuning techniques, particularly for smaller models that exhibit lower zero-shot performance.
Submitted 2 February, 2024;
originally announced February 2024.
-
A Universal System for OpenID Connect Sign-ins with Verifiable Credentials and Cross-Device Flow
Authors:
Felix Hoops,
Florian Matthes
Abstract:
Self-Sovereign Identity (SSI), as a new and promising identity management paradigm, needs mechanisms that can ease a gradual transition of existing services and developers towards it. Systems that bridge the gap between SSI and established identity and access management have been proposed but still lack adoption. We argue that they are all some combination of too complex, locked into specific ecosystems, have no source code available, or are not sufficiently documented. We propose a comparatively simple system that enables SSI-based sign-ins for services that support the widespread OpenID Connect or OAuth 2.0 protocols. Its handling of claims is highly configurable through a single policy and designed for cross-device authentication flows involving a smartphone identity wallet. For external interfaces, we solely rely on open standards, such as the recent OpenID for Verifiable Credentials standards. We provide our implementation as open-source software intended for prototyping and as a reference. Also, we contribute a detailed technical discussion of our particular sign-in flow. To prove its feasibility, we have successfully tested it with existing software and realistic hardware.
Submitted 16 January, 2024;
originally announced January 2024.
-
Playing the MEV Game on a First-Come-First-Served Blockchain
Authors:
Burak Öz,
Jonas Gebele,
Parshant Singh,
Filip Rezabek,
Florian Matthes
Abstract:
Maximal Extractable Value (MEV) searching has gained prominence on the Ethereum blockchain since the surge in Decentralized Finance activities. In Ethereum, MEV extraction primarily hinges on fee payments to block proposers. However, in First-Come-First-Served (FCFS) blockchain networks, the focus shifts to latency optimizations, akin to High-Frequency Trading in Traditional Finance. This paper illustrates the dynamics of the MEV extraction game in an FCFS network, specifically Algorand. We introduce an arbitrage detection algorithm tailored to the unique time constraints of FCFS networks and assess its effectiveness. Additionally, our experiments investigate potential optimizations in Algorand's network layer to secure optimal execution positions.
Our analysis reveals that while the states of relevant trading pools are updated approximately every six blocks on median, pursuing MEV at the block state level is not viable on Algorand, as arbitrage opportunities are typically executed within the blocks in which they appear. Our algorithm's performance under varying time constraints underscores the importance of timing in arbitrage discovery. Furthermore, our network-level experiments identify critical transaction prioritization strategies for Algorand's FCFS network. Key among these is reducing latency in connections with relays that are well-connected to high-staked proposers.
Submitted 15 January, 2024;
originally announced January 2024.
-
Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs
Authors:
Phillip Schneider,
Manuel Klettner,
Kristiina Jokinen,
Elena Simperl,
Florian Matthes
Abstract:
Conversational question answering systems often rely on semantic parsing to enable interactive information retrieval, which involves the generation of structured database queries from a natural language input. For information-seeking conversations about facts stored within a knowledge graph, dialogue utterances are transformed into graph queries in a process that is called knowledge-based conversational question answering. This paper evaluates the performance of large language models that have not been explicitly pre-trained on this task. Through a series of experiments on an extensive benchmark dataset, we compare models of varying sizes with different prompting techniques and identify common issue types in the generated output. Our results demonstrate that large language models are capable of generating graph queries from dialogues, with significant improvements achievable through few-shot prompting and fine-tuning techniques, especially for smaller models that exhibit lower zero-shot performance.
Submitted 3 January, 2024;
originally announced January 2024.
-
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs
Authors:
Juraj Vladika,
Alexander Fichtl,
Florian Matthes
Abstract:
Recent advances in natural language processing (NLP) owe their success to pre-training language models on large amounts of unstructured data. Still, there is an increasing effort to combine the unstructured nature of LMs with structured knowledge and reasoning. Particularly in the rapidly evolving field of biomedical NLP, knowledge-enhanced language models (KELMs) have emerged as promising tools to bridge the gap between large language models and domain-specific knowledge, considering the available biomedical knowledge graphs (KGs) curated by experts over the decades. In this paper, we develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models (PLMs). We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. The approach includes partitioning knowledge graphs into smaller subgraphs, fine-tuning adapter modules for each subgraph, and combining the knowledge in a fusion layer. We test the performance on three downstream tasks: document classification, question answering, and natural language inference. We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low. Finally, we provide a detailed interpretation of the results and report valuable insights for future work.
Submitted 21 December, 2023;
originally announced December 2023.
-
A Knowledge Graph Approach for Exploratory Search in Research Institutions
Authors:
Tim Schopf,
Nektrios Machner,
Florian Matthes
Abstract:
Over the past decades, research institutions have grown steadily, and so has their research output. This poses a significant challenge for researchers seeking to understand the research landscape of an institution. The process of exploring the research landscape of institutions is open-ended, with a vague information need and no precise goal. Current applications are not designed to fulfill the requirements for exploratory search in research institutions. In this paper, we analyze exploratory search in research institutions and propose a knowledge graph-based approach to enhance this process.
Submitted 27 November, 2023;
originally announced November 2023.
-
A Taxonomy of Decentralized Identifier Methods for Practitioners
Authors:
Felix Hoops,
Alexander Mühle,
Florian Matthes,
Christoph Meinel
Abstract:
A core part of the new identity management paradigm of Self-Sovereign Identity (SSI) is the W3C Decentralized Identifiers (DIDs) standard. The diversity of interoperable implementations encouraged by the paradigm is key for a less centralized future, and it is made possible by the concept of DIDs. However, this leads to a kind of dilemma of choices, where practitioners are faced with the difficult decision of which methods to choose and support in their applications. Due to the decentralized development of DID method specifications and the overwhelming number of different choices, it is hard to get an overview. In this paper, we propose a taxonomy of DID methods with the goal to empower practitioners to make informed decisions when selecting DID methods. To that end, our taxonomy is designed to provide an overview of the current landscape while providing adoption-relevant characteristics. For this purpose, we rely on the Nickerson et al. methodology for taxonomy creation, utilizing both conceptual-to-empirical and empirical-to-conceptual approaches. During the iterative process, we collect and survey an extensive and potentially exhaustive list of around 160 DID methods from various sources. The taxonomy we arrive at uses a total of 7 dimensions and 22 characteristics to span the contemporary design space of DID methods from the perspective of a practitioner. In addition to elaborating on these characteristics, we also discuss how a practitioner can use the taxonomy to select suitable DID methods for a specific use case.
Submitted 18 October, 2023;
originally announced November 2023.
-
From Data to Dialogue: Leveraging the Structure of Knowledge Graphs for Conversational Exploratory Search
Authors:
Phillip Schneider,
Nils Rehtanz,
Kristiina Jokinen,
Florian Matthes
Abstract:
Exploratory search is an open-ended information retrieval process that aims at discovering knowledge about a topic or domain rather than searching for a specific answer or piece of information. Conversational interfaces are particularly suitable for supporting exploratory search, allowing users to refine queries and examine search results through interactive dialogues. In addition to conversational search interfaces, knowledge graphs are also useful in supporting information exploration due to their rich semantic representation of data items. In this study, we demonstrate the synergistic effects of combining knowledge graphs and conversational interfaces for exploratory search, bridging the gap between structured and unstructured information retrieval. To this end, we propose a knowledge-driven dialogue system for exploring news articles by asking natural language questions and using the graph structure to navigate between related topics. Based on a user study with 54 participants, we empirically evaluate the effectiveness of the graph-based exploratory search and discuss design implications for developing such systems.
Submitted 8 October, 2023;
originally announced October 2023.
-
HealthFC: Verifying Health Claims with Evidence-Based Medical Fact-Checking
Authors:
Juraj Vladika,
Phillip Schneider,
Florian Matthes
Abstract:
In the digital age, seeking health advice on the Internet has become a common practice. At the same time, determining the trustworthiness of online medical content is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance automated Natural Language Processing (NLP) solutions for this task, in this paper we introduce a novel dataset HealthFC. It consists of 750 health-related claims in German and English, labeled for veracity by medical experts and backed with evidence from systematic reviews and clinical trials. We provide an analysis of the dataset, highlighting its characteristics and challenges. The dataset can be used for NLP tasks related to automated fact-checking, such as evidence retrieval, claim verification, or explanation generation. For testing purposes, we provide baseline systems based on different approaches, examine their performance, and discuss the findings. We show that the dataset is a challenging test bed with a high potential for future use.
Submitted 25 March, 2024; v1 submitted 15 September, 2023;
originally announced September 2023.
-
A Study of MEV Extraction Techniques on a First-Come-First-Served Blockchain
Authors:
Burak Öz,
Filip Rezabek,
Jonas Gebele,
Felix Hoops,
Florian Matthes
Abstract:
Maximal Extractable Value (MEV) has become a significant incentive on blockchain networks, referring to the value captured through the manipulation of transaction execution order and strategic issuance of profit-generation transactions. We argue that transaction ordering techniques used for MEV extraction in blockchains where fees can influence the execution order do not directly apply to blockchains where the order is determined based on transactions' arrival times. Such blockchains' First-Come-First-Served (FCFS) nature can yield different optimization strategies for entities seeking MEV, known as searchers, requiring further study. This paper explores the applicability of MEV extraction techniques observed on Ethereum, a fee-based blockchain, to Algorand, an FCFS blockchain. Our results show the prevalence of arbitrage MEV getting extracted through backruns on pending transactions in the network, uniformly distributed to block positions. However, on-chain data do not reveal latency optimizations between specific MEV searchers and Algorand block proposers. We also study network clogging attacks and argue how searchers can exploit them as a viable ordering technique for MEV extraction in FCFS networks.
Submitted 15 January, 2024; v1 submitted 12 August, 2023;
originally announced August 2023.
-
SoK: Assessing the State of Applied Federated Machine Learning
Authors:
Tobias Müller,
Maximilian Stäbler,
Hugo Gascón,
Frank Köster,
Florian Matthes
Abstract:
Machine Learning (ML) has shown significant potential in various applications; however, its adoption in privacy-critical domains has been limited due to concerns about data privacy. A promising solution to this issue is Federated Machine Learning (FedML), a model-to-data approach that prioritizes data privacy. By enabling ML algorithms to be applied directly to distributed data sources without sharing raw data, FedML offers enhanced privacy protections, making it suitable for privacy-critical environments. Despite its theoretical benefits, FedML has not seen widespread practical implementation. This study aims to explore the current state of applied FedML and identify the challenges hindering its practical adoption. Through a comprehensive systematic literature review, we assess 74 relevant papers to analyze the real-world applicability of FedML. Our analysis focuses on the characteristics and emerging trends of FedML implementations, as well as the motivational drivers and application domains. We also discuss the encountered challenges in integrating FedML into real-life settings. By shedding light on the existing landscape and potential obstacles, this research contributes to the further development and implementation of FedML in privacy-critical scenarios.
Submitted 3 August, 2023;
originally announced August 2023.
-
Exploring the Landscape of Natural Language Processing Research
Authors:
Tim Schopf,
Karim Arabi,
Florian Matthes
Abstract:
As an efficient approach to understand, generate, and process natural language texts, research in natural language processing (NLP) has exhibited a rapid spread and wide adoption in recent years. Given the increasing research work in this area, several NLP-related approaches have been surveyed in the research community. However, a comprehensive study that categorizes established topics, identifies trends, and outlines areas for future research remains absent. Contributing to closing this gap, we have systematically classified and analyzed research papers in the ACL Anthology. As a result, we present a structured overview of the research landscape, provide a taxonomy of fields of study in NLP, analyze recent developments in NLP, summarize our findings, and highlight directions for future work.
Submitted 24 September, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge
Authors:
Tim Schopf,
Emanuel Gerber,
Malte Ostendorff,
Florian Matthes
Abstract:
Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on certain predefined aspects. Thus, similarity predictions of texts are more targeted to specific requirements and more easily explainable. In this paper, we present AspectCSE, an approach for aspect-based contrastive learning of sentence embeddings. Results indicate that AspectCSE achieves an average improvement of 3.97% on information retrieval tasks across multiple aspects compared to the previous best results. We also propose using Wikidata knowledge graph properties to train models of multi-aspect sentence embeddings in which multiple specific aspects are simultaneously considered during similarity predictions. We demonstrate that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks. Finally, we examine the aspect-based sentence embedding space and demonstrate that embeddings of semantically similar aspect labels are often close, even without explicit similarity training between different aspect labels.
Submitted 24 September, 2023; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Time Moves Faster When There is Nothing You Anticipate: The Role of Time in MEV Rewards
Authors:
Burak Öz,
Benjamin Kraner,
Nicolò Vallarano,
Bingle Stegmann Kruger,
Florian Matthes,
Claudio Juan Tessone
Abstract:
This study explores the intricacies of waiting games, a novel dynamic that emerged with Ethereum's transition to a Proof-of-Stake (PoS)-based block proposer selection protocol. Within this PoS framework, validators acquire a distinct monopoly position during their assigned slots, given that block proposal rights are set deterministically, contrasting with Proof-of-Work (PoW) protocols. Consequently, validators have the power to delay block proposals, stepping outside the honest validator specs, to optimize potential returns through MEV payments. Nonetheless, this strategic behaviour introduces the risk of orphaning if attestors fail to observe and vote on the block in a timely manner. Our quantitative analysis of this waiting phenomenon and its associated risks reveals an opportunity for enhanced MEV extraction, exceeding standard protocol rewards, and providing sufficient incentives for validators to play the game. Notably, our findings indicate that delayed proposals do not always result in orphaning and orphaned blocks are not consistently proposed later than non-orphaned ones. To further examine consensus stability under varying network conditions, we adopt an agent-based simulation model tailored for PoS-Ethereum, illustrating that consensus disruption will not be observed unless significant delay strategies are adopted. Ultimately, this research offers valuable insights into the advent of waiting games on Ethereum, providing a comprehensive understanding of trade-offs and potential profits for validators within the blockchain ecosystem.
Submitted 11 July, 2023;
originally announced July 2023.
-
Efficient Domain Adaptation of Sentence Embeddings Using Adapters
Authors:
Tim Schopf,
Dennis N. Schneider,
Florian Matthes
Abstract:
Sentence embeddings enable us to capture the semantic similarity of short texts. Most sentence embedding models are trained for general semantic textual similarity tasks. Therefore, to use sentence embeddings in a particular domain, the model must be adapted to it in order to achieve good results. Usually, this is done by fine-tuning the entire sentence embedding model for the domain of interest. While this approach yields state-of-the-art results, all of the model's weights are updated during fine-tuning, making this method resource-intensive. Therefore, instead of fine-tuning entire sentence embedding models for each target domain individually, we propose to train lightweight adapters. These domain-specific adapters do not require fine-tuning all underlying sentence embedding model parameters. Instead, we only train a small number of additional parameters while keeping the weights of the underlying sentence embedding model fixed. Training domain-specific adapters allows always using the same base model and only exchanging the domain-specific adapters to adapt sentence embeddings to a specific domain. We show that using adapters for parameter-efficient domain adaptation of sentence embeddings yields competitive performance within 1% of a domain-adapted, entirely fine-tuned sentence embedding model while only training approximately 3.6% of the parameters.
Submitted 24 September, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Challenges in Domain-Specific Abstractive Summarization and How to Overcome them
Authors:
Anum Afzal,
Juraj Vladika,
Daniel Braun,
Florian Matthes
Abstract:
Large Language Models work quite well with general-purpose data and many tasks in Natural Language Processing. However, they show several limitations when used for a task such as domain-specific abstractive text summarization. This paper identifies three of those limitations as research problems in the context of abstractive text summarization: 1) Quadratic complexity of transformer-based models with respect to the input text length; 2) Model Hallucination, which is a model's tendency to generate factually incorrect text; and 3) Domain Shift, which happens when the distributions of the model's training and test corpora differ. Along with a discussion of the open research questions, this paper also provides an assessment of existing state-of-the-art techniques relevant to domain-specific text summarization to address the research gaps.
Submitted 3 July, 2023;
originally announced July 2023.
-
Identifying Practical Challenges in the Implementation of Technical Measures for Data Privacy Compliance
Authors:
Oleksandra Klymenko,
Stephen Meisenbacher,
Florian Matthes
Abstract:
Modern privacy regulations provide a strict mandate for data processing entities to implement appropriate technical measures to demonstrate compliance. In practice, determining what measures are indeed "appropriate" is not trivial, particularly in light of vague guidelines provided by privacy regulations. To exacerbate the issue, challenges arise not only in the implementation of the technical measures themselves, but also in a variety of factors involving the roles, processes, decisions, and culture surrounding the pursuit of privacy compliance. In this paper, we present 33 challenges faced in the implementation of technical measures for privacy compliance, derived from a qualitative analysis of 16 interviews with privacy professionals. In addition, we evaluate the interview findings in a survey study, which gives way to a discussion of the identified challenges and their implications.
Submitted 27 June, 2023;
originally announced June 2023.
-
Scientific Fact-Checking: A Survey of Resources and Approaches
Authors:
Juraj Vladika,
Florian Matthes
Abstract:
The task of fact-checking deals with assessing the veracity of factual claims based on credible evidence and background knowledge. In particular, scientific fact-checking is the variation of the task concerned with verifying claims rooted in scientific knowledge. This task has received significant attention due to the growing importance of scientific and health discussions on online platforms. Automated scientific fact-checking methods based on NLP can help combat the spread of misinformation, assist researchers in knowledge discovery, and help individuals understand new scientific breakthroughs. In this paper, we present a comprehensive survey of existing research in this emerging field and its related tasks. We provide a task description, discuss the construction process of existing datasets, and analyze proposed models and approaches. Based on our findings, we identify intriguing challenges and outline potential future directions to advance the field.
Submitted 26 May, 2023;
originally announced May 2023.
-
Unlocking the Potential of Collaborative AI -- On the Socio-technical Challenges of Federated Machine Learning
Authors:
Tobias Müller,
Milena Zahn,
Florian Matthes
Abstract:
The disruptive potential of AI systems is rooted in the emergence of big data. Yet, a significant portion of this data is scattered and locked in data silos, leaving its potential untapped. Federated Machine Learning is a novel AI paradigm enabling the creation of AI models from decentralized, potentially siloed data. Hence, Federated Machine Learning could technically open data silos and therefore unlock economic potential. However, this requires collaboration between multiple parties owning data silos. Setting up collaborative business models is complex and often a reason for failure. Current literature lacks guidelines on which aspects must be considered to successfully realize collaborative AI projects. This research investigates the challenges of prevailing collaborative business models and distinct aspects of Federated Machine Learning. Through a systematic literature review, focus group, and expert interviews, we provide a systemized collection of socio-technical challenges and an extended Business Model Canvas for the initial viability assessment of collaborative AI projects.
Submitted 28 April, 2023; v1 submitted 26 April, 2023;
originally announced April 2023.
-
Sebis at SemEval-2023 Task 7: A Joint System for Natural Language Inference and Evidence Retrieval from Clinical Trial Reports
Authors:
Juraj Vladika,
Florian Matthes
Abstract:
With the increasing number of clinical trial reports generated every day, it is becoming hard to keep up with novel discoveries that inform evidence-based healthcare recommendations. To help automate this process and assist medical experts, NLP solutions are being developed. This motivated the SemEval-2023 Task 7, where the goal was to develop an NLP system for two tasks: evidence retrieval and natural language inference from clinical trial data. In this paper, we describe our two developed systems. The first one is a pipeline system that models the two tasks separately, while the second one is a joint system that learns the two tasks simultaneously with a shared representation and a multi-task learning approach. The final system combines their outputs in an ensemble system. We formalize the models, present their characteristics and challenges, and provide an analysis of achieved results. Our system ranked 3rd out of 40 participants with a final submission.
Submitted 2 May, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.
-
Voice-Based Conversational Agents and Knowledge Graphs for Improving News Search in Assisted Living
Authors:
Phillip Schneider,
Nils Rehtanz,
Kristiina Jokinen,
Florian Matthes
Abstract:
As the healthcare sector is facing major challenges, such as aging populations, staff shortages, and common chronic diseases, delivering high-quality care to individuals has become very difficult. Conversational agents have been shown to be a promising technology to alleviate some of these issues. In the form of digital health assistants, they have the potential to improve the everyday life of elderly and chronically ill people. This includes, for example, medication reminders, routine checks, or social chit-chat. In addition, conversational agents can satisfy the fundamental need of having access to information about daily news or local events, which enables individuals to stay informed and connected with the world around them. However, finding relevant news sources and navigating the plethora of news articles available online can be overwhelming, particularly for those who may have limited technological literacy or health-related impairments. To address this challenge, we propose an innovative solution that combines knowledge graphs and conversational agents for news search in assisted living. By leveraging graph databases to semantically structure news data and implementing an intuitive voice-based interface, our system can help care-dependent people to easily discover relevant news articles and give personalized recommendations. We explain our design choices, provide a system architecture, share insights from an initial user test, and give an outlook on planned future work.
Submitted 24 March, 2023;
originally announced March 2023.
-
Investigating Conversational Search Behavior For Domain Exploration
Authors:
Phillip Schneider,
Anum Afzal,
Juraj Vladika,
Daniel Braun,
Florian Matthes
Abstract:
Conversational search has evolved as a new information retrieval paradigm, marking a shift from traditional search systems towards interactive dialogues with intelligent search agents. This change especially affects exploratory information-seeking contexts, where conversational search systems can guide the discovery of unfamiliar domains. In these scenarios, users often find it difficult to express their information goals due to insufficient background knowledge. Conversational interfaces can provide assistance by eliciting information needs and narrowing down the search space. However, due to the complexity of information-seeking behavior, the design of conversational interfaces for retrieving information remains a great challenge. Although prior work has employed user studies to empirically ground the system design, most existing studies are limited to well-defined search tasks or known domains, thus being less exploratory in nature. Therefore, we conducted a laboratory study to investigate open-ended search behavior for navigation through unknown information landscapes. The study comprised 26 participants who were restricted in their search to a text chat interface. Based on the collected dialogue transcripts, we applied statistical analyses and process mining techniques to uncover general information-seeking patterns across five different domains. We not only identify core dialogue acts and their interrelations that enable users to discover domain knowledge, but also derive design suggestions for conversational search systems.
Submitted 27 February, 2023; v1 submitted 10 January, 2023;
originally announced January 2023.
-
Evaluating Unsupervised Text Classification: Zero-shot and Similarity-based Approaches
Authors:
Tim Schopf,
Daniel Braun,
Florian Matthes
Abstract:
Text classification of unseen classes is a challenging Natural Language Processing task and is mainly attempted using two different types of approaches. Similarity-based approaches attempt to classify instances based on similarities between text document representations and class description representations. Zero-shot text classification approaches aim to generalize knowledge gained from a training task by assigning appropriate labels of unknown classes to text documents. Although existing studies have already investigated individual approaches in these categories, the experiments in the literature do not provide a consistent comparison. This paper addresses this gap by conducting a systematic evaluation of different similarity-based and zero-shot approaches for text classification of unseen classes. Different state-of-the-art approaches are benchmarked on four text classification datasets, including a new dataset from the medical domain. Additionally, novel SimCSE and SBERT-based baselines are proposed, as other baselines used in existing work yield weak classification results and are easily outperformed. Finally, the novel similarity-based Lbl2TransformerVec approach is presented, which outperforms previous state-of-the-art approaches in unsupervised text classification. Our experiments show that similarity-based approaches significantly outperform zero-shot approaches in most cases. Additionally, using SimCSE or SBERT embeddings instead of simpler text representations increases similarity-based classification results even further.
Submitted 31 January, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Semantic Similarity-Based Clustering of Findings From Security Testing Tools
Authors:
Phillip Schneider,
Markus Voggenreiter,
Abdullah Gulraiz,
Florian Matthes
Abstract:
In recent years, software development in domains with high security demands has transitioned from traditional methodologies to uniting modern approaches from software development and operations (DevOps). Key principles of DevOps have gained importance and are now applied to security aspects of software development, resulting in the automation of security-enhancing activities. In particular, it is common practice to use automated security testing tools that generate reports after inspecting a software artifact from multiple perspectives. However, this raises the challenge of generating duplicate security findings. To identify these duplicate findings manually, a security expert has to invest resources like time, effort, and knowledge. A partial automation of this process could reduce the analysis effort, encourage DevOps principles, and diminish the chance of human error. In this study, we investigated the potential of applying Natural Language Processing for clustering semantically similar security findings to support the identification of problem-specific duplicate findings. Towards this goal, we developed a web application for annotating and assessing security testing tool reports and published a human-annotated corpus of clustered security findings. In addition, we performed a comparison of different semantic similarity techniques for automatically grouping security findings. Finally, we assessed the resulting clusters using both quantitative and qualitative evaluation methods.
Submitted 20 November, 2022;
originally announced November 2022.
-
Lessons Learned: Surveying the Practicality of Differential Privacy in the Industry
Authors:
Gonzalo Munilla Garrido,
Xiaoyuan Liu,
Florian Matthes,
Dawn Song
Abstract:
Since its introduction in 2006, differential privacy has emerged as a predominant statistical tool for quantifying data privacy in academic works. Yet despite the plethora of research and open-source utilities that have accompanied its rise, with limited exceptions, differential privacy has failed to achieve widespread adoption in the enterprise domain. Our study aims to shed light on the fundamental causes underlying this academic-industrial utilization gap through detailed interviews of 24 privacy practitioners across 9 major companies. We analyze the results of our survey to provide key findings and suggestions for companies striving to improve privacy protection in their data workflows and highlight the necessary and missing requirements of existing differential privacy tools, with the goal of guiding researchers working towards the broader adoption of differential privacy. Our findings indicate that analysts suffer from lengthy bureaucratic processes for requesting access to sensitive data, yet once granted, only scarcely-enforced privacy policies stand between rogue practitioners and misuse of private information. We thus argue that differential privacy can significantly improve the processes of requesting and conducting data exploration across silos, and conclude that with a few of the improvements suggested herein, the practical use of differential privacy across the enterprise is within striking distance.
Submitted 7 November, 2022;
originally announced November 2022.
-
Lbl2Vec: An Embedding-Based Approach for Unsupervised Document Retrieval on Predefined Topics
Authors:
Tim Schopf,
Daniel Braun,
Florian Matthes
Abstract:
In this paper, we consider the task of retrieving documents with predefined topics from an unlabeled document dataset using an unsupervised approach. The proposed unsupervised approach requires only a small number of keywords describing the respective topics and no labeled documents. Existing approaches rely heavily either on a large amount of additionally encoded world knowledge or on term-document frequencies. In contrast, we introduce a method that learns jointly embedded document and word vectors solely from the unlabeled document dataset in order to find documents that are semantically similar to the topics described by the keywords. The proposed method requires almost no text preprocessing but is nevertheless effective at retrieving relevant documents with high probability. When successively retrieving documents on different predefined topics from publicly available and commonly used datasets, we achieved an average area under the receiver operating characteristic curve value of 0.95 on one dataset and 0.92 on another. Further, our method can be used for multiclass document classification, without the need to assign labels to the dataset in advance. Compared with an unsupervised classification baseline, we increased F1 scores from 76.6 to 82.7 and from 61.0 to 75.1 on the respective datasets. For easy replication of our approach, we make the developed Lbl2Vec code publicly available as a ready-to-use tool under the 3-Clause BSD license.
Submitted 12 October, 2022;
originally announced October 2022.
-
PatternRank: Leveraging Pretrained Language Models and Part of Speech for Unsupervised Keyphrase Extraction
Authors:
Tim Schopf,
Simon Klimek,
Florian Matthes
Abstract:
Keyphrase extraction is the process of automatically selecting a small set of most relevant phrases from a given text. Supervised keyphrase extraction approaches need large amounts of labeled training data and perform poorly outside the domain of the training data. In this paper, we present PatternRank, which leverages pretrained language models and part-of-speech for unsupervised keyphrase extraction from single documents. Our experiments show PatternRank achieves higher precision, recall and F1-scores than previous state-of-the-art approaches. In addition, we present the KeyphraseVectorizers package, which allows easy modification of part-of-speech patterns for candidate keyphrase selection, and hence adaptation of our approach to any domain.
Submitted 12 October, 2022; v1 submitted 11 October, 2022;
originally announced October 2022.
-
A Decade of Knowledge Graphs in Natural Language Processing: A Survey
Authors:
Phillip Schneider,
Tim Schopf,
Juraj Vladika,
Mikhail Galkin,
Elena Simperl,
Florian Matthes
Abstract:
In pace with developments in the research field of artificial intelligence, knowledge graphs (KGs) have attracted a surge of interest from both academia and industry. As a representation of semantic relations between entities, KGs have proven to be particularly relevant for natural language processing (NLP), experiencing a rapid spread and wide adoption within recent years. Given the increasing amount of research work in this area, several KG-related approaches have been surveyed in the NLP research community. However, a comprehensive study that categorizes established topics and reviews the maturity of individual research streams remains absent to this day. Contributing to closing this gap, we systematically analyzed 507 papers from the literature on KGs in NLP. Our survey encompasses a multifaceted review of tasks, research types, and contributions. As a result, we present a structured overview of the research landscape, provide a taxonomy of tasks, summarize our findings, and highlight directions for future work.
Submitted 30 September, 2022;
originally announced October 2022.
-
Exploring privacy-enhancing technologies in the automotive value chain
Authors:
Gonzalo Munilla Garrido,
Kaja Schmidt,
Christopher Harth-Kitzerow,
Johannes Klepsch,
Andre Luckow,
Florian Matthes
Abstract:
Privacy-enhancing technologies (PETs) are becoming increasingly crucial for addressing customer needs, security, privacy (e.g., enhancing anonymity and confidentiality), and regulatory requirements. However, applying PETs in organizations requires a precise understanding of use cases, technologies, and limitations. This paper investigates several industrial use cases, their characteristics, and the potential applicability of PETs to these. We conduct expert interviews to identify and classify use cases, perform a gray literature review of relevant open-source PET tools, and discuss how the use case characteristics can be addressed using PETs' capabilities. While we focus mainly on automotive use cases, the results also apply to other use case domains.
Submitted 12 September, 2022;
originally announced September 2022.
-
Understanding the Implementation of Technical Measures in the Process of Data Privacy Compliance: A Qualitative Study
Authors:
Oleksandra Klymenko,
Oleksandr Kosenkov,
Stephen Meisenbacher,
Parisa Elahidoost,
Daniel Mendez,
Florian Matthes
Abstract:
Modern privacy regulations, such as the General Data Protection Regulation (GDPR), address privacy in software systems in a technologically agnostic way by mentioning general "technical measures" for data privacy compliance rather than dictating how these should be implemented. An understanding of the concept of technical measures and how exactly these can be handled in practice, however, is not trivial due to its interdisciplinary nature and the necessary technical-legal interactions. We aim to investigate how the concept of technical measures for data privacy compliance is understood in practice as well as the technical-legal interaction intrinsic to the process of implementing those technical measures. We follow a research design that is 1) exploratory in nature, 2) qualitative, and 3) interview-based, with 16 selected privacy professionals in the technical and legal domains. Our results suggest that there is no clear mutual understanding and commonly accepted approach to handling technical measures. Both technical and legal roles are involved in the implementation of such measures. While they still often operate in separate spheres, a predominant opinion amongst the interviewees is to promote more interdisciplinary collaboration. Our empirical findings confirm the need for better interaction between legal and engineering teams when implementing technical measures for data privacy. We posit that interdisciplinary collaboration is paramount to a more complete understanding of technical measures, which currently lacks a mutually accepted notion. Yet, as strongly suggested by our results, there is still a lack of systematic approaches to such interaction. Therefore, the results strengthen our confidence in the need for further investigations into the technical-legal dynamic of data privacy compliance.
Submitted 18 August, 2022;
originally announced August 2022.
-
Differential Privacy in Natural Language Processing: The Story So Far
Authors:
Oleksandra Klymenko,
Stephen Meisenbacher,
Florian Matthes
Abstract:
As the tide of Big Data continues to influence the landscape of Natural Language Processing (NLP), the utilization of modern NLP methods has grounded itself in this data, in order to tackle a variety of text-based tasks. These methods without a doubt can include private or otherwise personally identifiable information. As such, the question of privacy in NLP has gained fervor in recent years, coinciding with the development of new Privacy-Enhancing Technologies (PETs). Among these PETs, Differential Privacy boasts several desirable qualities in the conversation surrounding data privacy. Naturally, the question becomes whether Differential Privacy is applicable in the largely unstructured realm of NLP. This topic has sparked novel research, which is unified in one basic goal: how can one adapt Differential Privacy to NLP methods? This paper aims to summarize the vulnerabilities addressed by Differential Privacy, the current thinking, and above all, the crucial next steps that must be considered.
Submitted 17 August, 2022;
originally announced August 2022.
-
Exponential Randomized Response: Boosting Utility in Differentially Private Selection
Authors:
Gonzalo Munilla Garrido,
Florian Matthes
Abstract:
A differentially private selection algorithm outputs from a finite set the item that approximately maximizes a data-dependent quality function. The most widely adopted mechanisms tackling this task are the pioneering exponential mechanism and permute-and-flip, which can offer utility improvements of up to a factor of two over the exponential mechanism. This work introduces a new differentially private mechanism for private selection and conducts theoretical and empirical comparisons with the above mechanisms. For reasonably common scenarios, our mechanism can provide utility improvements of factors significantly larger than two over the exponential and permute-and-flip mechanisms. Because the utility can deteriorate in niche scenarios, we recommend our mechanism to analysts who can tolerate lower utility for some datasets.
Submitted 3 August, 2022; v1 submitted 11 January, 2022;
originally announced January 2022.
-
Do I Get the Privacy I Need? Benchmarking Utility in Differential Privacy Libraries
Authors:
Gonzalo Munilla Garrido,
Joseph Near,
Aitsam Muhammad,
Warren He,
Roman Matzutt,
Florian Matthes
Abstract:
An increasing number of open-source libraries promise to bring differential privacy to practice, even for non-experts. This paper studies five libraries that offer differentially private analytics: Google DP, SmartNoise, diffprivlib, diffpriv, and Chorus. We compare these libraries qualitatively (capabilities, features, and maturity) and quantitatively (utility and scalability) across four analytics queries (count, sum, mean, and variance) executed on synthetic and real-world datasets. We conclude that these libraries provide similar utility (except in some notable scenarios). However, there are significant differences in the features provided, and we find that no single library excels in all areas. Based on our results, we provide guidance for practitioners to help in choosing a suitable library, guidance for library designers to enhance their software, and guidance for researchers on open challenges in differential privacy tools for non-experts.
Submitted 22 September, 2021;
originally announced September 2021.
-
Revealing the Landscape of Privacy-Enhancing Technologies in the Context of Data Markets for the IoT: A Systematic Literature Review
Authors:
Gonzalo Munilla Garrido,
Johannes Sedlmeir,
Ömer Uludağ,
Ilias Soto Alaoui,
Andre Luckow,
Florian Matthes
Abstract:
IoT data markets in public and private institutions have become increasingly relevant in recent years because of their potential to improve data availability and unlock new business models. However, exchanging data in markets bears considerable challenges related to disclosing sensitive information. Despite considerable research focused on different aspects of privacy-enhancing data markets for the IoT, none of the solutions proposed so far seems to have found practical adoption. Thus, this study aims to organize the state-of-the-art solutions, analyze and scope the technologies that have been suggested in this context, and structure the remaining challenges to determine areas where future research is required. To accomplish this goal, we conducted a systematic literature review on privacy enhancement in data markets for the IoT, covering 50 publications dated up to July 2020, and provided updates with 24 publications dated up to May 2022. Our results indicate that most research in this area has emerged only recently, and no IoT data market architecture has established itself as canonical. Existing solutions frequently lack the required combination of anonymization and secure computation technologies. Furthermore, there is no consensus on the appropriate use of blockchain technology for IoT data markets and a low degree of leveraging existing libraries or reusing generic data market architectures. We also identified significant challenges remaining, such as the copy problem and the recursive enforcement problem, which are often not sufficiently addressed in proposed designs even though solutions have been suggested to some extent. We conclude that privacy-enhancing technologies need further improvements to positively impact data markets so that, ultimately, the value of data is preserved through data scarcity and users' privacy and business-critical information are protected.
Submitted 12 July, 2022; v1 submitted 25 July, 2021;
originally announced July 2021.