Search | arXiv e-print repository

arXiv:2407.04472 [pdf]

EventChat: Implementation and user-centric evaluation of a large language model-driven conversational recommender system for exploring leisure events in an SME context

Authors: Hannes Kunstmann, Joseph Ollier, Joel Persson, Florian von Wangenheim

Abstract: Large language models (LLMs) present an enormous evolution in the strategic potential of conversational recommender systems (CRS). Yet to date, research has predominantly focused upon technical frameworks to implement LLM-driven CRS, rather than end-user evaluations or strategic implications for firms, particularly from the perspective of the small to medium enterprises (SMEs) that make up the bedrock of the global economy. In the current paper, we detail the design of an LLM-driven CRS in an SME setting, and its subsequent performance in the field using both objective system metrics and subjective user evaluations. While doing so, we additionally outline a short-form revised ResQue model for evaluating LLM-driven CRS, enabling replicability in a rapidly evolving field. Our results reveal good system performance from a user experience perspective (85.5% recommendation accuracy) but underscore latency, cost, and quality issues challenging business viability. Notably, with a median cost of $0.04 per interaction and a latency of 5.7s, cost-effectiveness and response time emerge as crucial areas for achieving a more user-friendly and economically viable LLM-driven CRS for SME settings. One major driver of these costs is the use of an advanced LLM as a ranker within the retrieval-augmented generation (RAG) technique. Our results additionally indicate that relying solely on approaches such as prompt-based learning with ChatGPT as the underlying LLM makes it challenging to achieve satisfying quality in a production environment. Strategic considerations for SMEs deploying an LLM-driven CRS are outlined, particularly considering trade-offs in the current technical landscape.

Submitted 9 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

Comments: 27 pages, 3 tables, 5 figures, pre-print manuscript, updated version of manuscript due to typo (in the previous version, Figure 5 was incorrectly named Figure 6)

MSC Class: 68T50 ACM Class: I.2.7; H.5.2

arXiv:2305.17553 [pdf, other]

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

Authors: Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez

Abstract: Recent model editing techniques promise to mitigate the problem of memorizing false or outdated associations during LLM training. However, we show that these techniques can introduce large unwanted side effects which are not detected by existing specificity benchmarks. We extend the existing CounterFact benchmark to include a dynamic component and dub our benchmark CounterFact+. Additionally, we extend the metrics used for measuring specificity by a principled KL divergence-based metric. We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity. Our findings highlight the need for improved specificity benchmarks that identify and prevent unwanted side effects.

Submitted 3 June, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

Comments: To be published in ACL Findings 2023; for code see https://github.com/apartresearch/specificityplus; for a homepage see https://specificityplus.apartresearch.com/; updated Figures to uniform style

ACM Class: I.2.7

arXiv:2204.07124 [pdf, other]

Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine

Authors: Theresa Blümlein, Joel Persson, Stefan Feuerriegel

Abstract: Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and are explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.

Submitted 19 June, 2023; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 24 pages, 4 figures

Journal ref: In Machine Learning for Healthcare Conference (pp. 146-171). PMLR 2022

arXiv:2106.00356 [pdf, other]

Predicting COVID-19 Spread from Large-Scale Mobility Data

Authors: Amray Schwabe, Joel Persson, Stefan Feuerriegel

Abstract: To manage the COVID-19 epidemic effectively, decision-makers in public health need accurate forecasts of case numbers. A potential near real-time predictor of future case numbers is human mobility; however, research on the predictive power of mobility is lacking. To fill this gap, we introduce a novel model for epidemic forecasting based on mobility data, called the mobility marked Hawkes model. The proposed model consists of three components: (1) A Hawkes process captures the transmission dynamics of infectious diseases. (2) A mark modulates the rate of infections, thus accounting for how the reproduction number R varies across space and time. The mark is modeled using a regularized Poisson regression based on mobility covariates. (3) A correction procedure incorporates new cases seeded by people traveling between regions. Our model was evaluated on the COVID-19 epidemic in Switzerland. Specifically, we used mobility data from February through April 2020, amounting to approximately 1.5 billion trips. Trip counts were derived from large-scale telecommunication data, i.e., cell phone pings from the Swisscom network, the largest telecommunication provider in Switzerland. We compared our model against various state-of-the-art baselines in terms of out-of-sample root mean squared error. We found that our model outperformed the baselines by 15.52%. The improvement was consistently achieved across different forecast horizons between 5 and 21 days. In addition, we assessed the predictive power of conventional point of interest data, confirming that telecommunication data is superior. To the best of our knowledge, our work is the first to predict the spread of COVID-19 from telecommunication data. Altogether, our work contributes to previous research by developing a scalable early warning system for decision-makers in public health tasked with controlling the spread of infectious diseases.

Submitted 1 June, 2021; originally announced June 2021.

Comments: 9 pages, 3 figures. Accepted for publication in KDD '21: 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

arXiv:2006.11801 [pdf]

Internet of Threats Introspection in Dynamic Intelligent Virtual Sensing

Authors: Victor R. Kebande, Joseph Bugeja, Jan A. Persson

Abstract: Continued ubiquity of communication infrastructure across Internet of Things (IoT) ecosystems has seen persistent advances of dynamic, intelligent, virtualised sensing and actuation. This has led to effective interaction across the connected ecosystem of -things. Furthermore, this has enabled the creation of smart environments, which in turn has created the need for the development of different IoT protocols that support the relaying of information across billions of electronic devices over the Internet. That notwithstanding, the phenomenon of virtual sensors that are supported by IoT technologies like Wireless Sensor Networks (WSNs), RFID, WIFI, Bluetooth, ZigBee, IEEE 802.15.4, etc., emulates physical sensors, and enables more efficient resource management through the dynamic allocation of virtual sensor resources. A distinctive example of this has been the proposition of the Dynamic Intelligent Virtual Sensors (DIVS). This DIVS concept is a novel proposition that allows sensing to be done by the use of logical instances through the use of labeled data. This allows for making accurate predictions during data fusion. However, a potential security attack on DIVS may end up providing false labels during the User Feedback Process (UFP), which may interfere with the accuracy of DIVS. This paper investigates the threat landscape in DIVS when employed in IoT ecosystems, in order to identify the extent to which the severity of these threats may hinder accurate prediction of DIVS in IoT, based on labeled data.

Submitted 21 June, 2020; originally announced June 2020.

Showing 1–5 of 5 results for author: Persson, J