Search | arXiv e-print repository

doi 10.1049/PBPC027E_ch7

IoT Monitoring with Blockchain: Generating Smart Contracts from Service Level Agreements

Authors: Adam Booth, Awatif Alqahtani, Ellis Solaiman

Abstract: A Service Level Agreement (SLA) is a commitment between a client and provider that assures the quality of service (QoS) a client can expect to receive when purchasing a service. However, evidence of SLA violations in Internet of Things (IoT) service monitoring data can be manipulated by the provider or consumer, resulting in an issue of trust between contracted parties. The following research aims to explore the use of blockchain technology in monitoring IoT systems using smart contracts so that SLA violations captured are irrefutable amongst service providers and clients. The research focuses on the development of a Java library that is capable of generating a smart contract from a given SLA. A smart contract generated by this library is validated through a mock scenario presented in the form of a Remote Patient Monitoring IoT system. In this scenario, the findings demonstrate a 100 percent success rate in capturing all emulated violations.

Submitted 23 August, 2024; originally announced August 2024.

Journal ref: Managing Internet of Things Applications across Edge and Cloud Data Centres, IET, 2024

arXiv:2402.04922 [pdf, other]

Voronoi Candidates for Bayesian Optimization

Authors: Nathan Wycoff, John W. Smith, Annie S. Booth, Robert B. Gramacy

Abstract: Bayesian optimization (BO) offers an elegant approach for efficiently optimizing black-box functions. However, acquisition criteria demand their own challenging inner-optimization, which can induce significant overhead. Many practical BO methods, particularly in high dimension, eschew a formal, continuous optimization of the acquisition function and instead search discretely over a finite set of space-filling candidates. Here, we propose to use candidates which lie on the boundary of the Voronoi tessellation of the current design points, so they are equidistant to two or more of them. We discuss strategies for efficient implementation by directly sampling the Voronoi boundary without explicitly generating the tessellation, thus accommodating large designs in high dimension. On a battery of test problems optimized via Gaussian processes with expected improvement, our proposed approach significantly improves the execution time of a multi-start continuous search without a loss in accuracy.

Submitted 7 February, 2024; originally announced February 2024.

Comments: comments very welcome

arXiv:2309.07340 [pdf, other]

Informative path planning for scalar dynamic reconstruction using coregionalized Gaussian processes and a spatiotemporal kernel

Authors: Lorenzo Booth, Stefano Carpin

Abstract: The proliferation of unmanned vehicles offers many opportunities for solving environmental sampling tasks with applications in resource monitoring and precision agriculture. Informative path planning (IPP) includes a family of methods which offer improvements over traditional surveying techniques for suggesting locations for observation collection. In this work, we present a novel solution to the IPP problem by using coregionalized Gaussian processes to estimate a dynamic scalar field that varies in space and time. Our method improves on previous approaches by using a composite kernel that accounts for spatiotemporal correlations and, at the same time, can be readily incorporated in existing IPP algorithms. Through extensive simulations, we show that our novel modeling approach leads to more accurate estimations when compared with formerly proposed methods that do not account for the temporal dimension.

Submitted 13 September, 2023; originally announced September 2023.

Comments: Accepted to IROS 2023

arXiv:2210.11917 [pdf, other]

A portable coding strategy to exploit vectorization on combustion simulations

Authors: Fabio Banchelli, Guillermo Oyarzun, Marta Garcia-Gasulla, Filippo Mantovani, Ambrus Both, Guillaume Houzeaux, Daniel Mira

Abstract: The complexity of combustion simulations demands the latest high-performance computing tools to accelerate its time-to-solution results. A current trend on HPC systems is the utilization of CPUs with SIMD or vector extensions to exploit data parallelism. Our work proposes a strategy to improve the automatic vectorization of finite element-based scientific codes. The approach applies a parametric configuration to the data structures to help the compiler detect the blocks of code that can take advantage of vector computation while keeping the code portable. The computational impact of this methodology on the different stages of a CFD solver is analyzed in detail on the PRECCINSTA burner simulation. Our parametric implementation has proven to help the compiler generate more vector instructions in the assembly operation; this reduces the total number of executed instructions by up to 9.3 times while keeping the Instructions Per Cycle and the CPU frequency constant. The proposed strategy improves the performance of the CFD case under study by up to 4.67 times on the MareNostrum 4 supercomputer.

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2202.00120 [pdf, other]

QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers

Authors: Aleksandr Perevalov, Dennis Diefenbach, Ricardo Usbeck, Andreas Both

Abstract: The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems that provide access to Semantic Web data via a natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of the questions into 8 languages provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, such that the usability and relevance of the dataset are strongly increased. Five of the languages - Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian - were, to the best of our knowledge, never considered in the KGQA research community before. The latter two of the languages are considered "endangered" by UNESCO. We call the extended dataset QALD-9-plus and have made it available online at https://github.com/Perevalov/qald_9_plus.

Submitted 7 February, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

arXiv:2201.08174 [pdf, other]

Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis

Authors: Aleksandr Perevalov, Xi Yan, Liubov Kovriguina, Longquan Jiang, Andreas Both, Ricardo Usbeck

Abstract: Data-driven systems need to be evaluated to establish trust in the scientific approach and its applicability. In particular, this is true for Knowledge Graph (KG) Question Answering (QA), where complex data structures are made accessible via natural-language interfaces. Evaluating the capabilities of these systems has been a driver for the community for more than ten years while establishing different KGQA benchmark datasets. However, comparing different approaches is cumbersome. The lack of existing and curated leaderboards leads to a missing global view over the research field and could inject mistrust into the results. In particular, the latest and most-used datasets in the KGQA community, LC-QuAD and QALD, miss providing central and up-to-date points of trust. In this paper, we survey and analyze a wide range of evaluation results with significant coverage of 100 publications and 98 systems from the last decade. We provide a new central and open leaderboard for any KGQA benchmark dataset as a focal point for the community - https://kgqa.github.io/leaderboard. Our analysis highlights existing problems during the evaluation of KGQA systems. Thus, we will point to possible improvements for future evaluations.

Submitted 20 January, 2022; originally announced January 2022.

arXiv:2112.12071 [pdf, other]

doi 10.1080/15472450.2024.2372894

Activity-based and agent-based Transport model of Melbourne (AToM): an open multi-modal transport simulation model for Greater Melbourne

Authors: Afshin Jafari, Dhirendra Singh, Alan Both, Mahsa Abdollahyar, Lucy Gunn, Steve Pemberton, Billie Giles-Corti

Abstract: Agent-based and activity-based models for simulating transportation systems have attracted significant attention in recent years. Few studies, however, include a detailed representation of active modes of transportation - such as walking and cycling - at a city-wide level, where dominating motorised modes are often of primary concern. This paper presents an open workflow for creating a multi-modal agent-based and activity-based transport simulation model, focusing on Greater Melbourne, and including the process of mode choice calibration for the four main travel modes of driving, public transport, cycling and walking. The synthetic population generated and used as an input for the simulation model represented Melbourne's population based on Census 2016, with daily activities and trips based on Victoria's 2016-18 travel survey data. The road network used in the simulation model includes all public roads accessible via the included travel modes. We compared the output of the simulation model with observations from the real world in terms of mode share, road volume, travel time, and travel distance. Through these comparisons, we showed that our model is suitable for studying mode choice and road usage behaviour of travellers.

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2112.05452 [pdf, other]

Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features

Authors: Aleksandr Gashkov, Aleksandr Perevalov, Maria Eltsova, Andreas Both

Abstract: Software with natural-language user interfaces has an ever-increasing importance. However, the quality of the included Question Answering (QA) functionality is still not sufficient regarding the number of questions that are answered correctly. In our work, we address the research problem of how the QA quality of a given system can be improved just by evaluating the natural-language input (i.e., the user's question) and output (i.e., the system's answer). Our main contribution is an approach capable of identifying wrong answers provided by a QA system. Hence, filtering incorrect answers from a list of answer candidates is leading to a highly improved QA quality. In particular, our approach has shown its potential while removing in many cases the majority of incorrect answers, which increases the QA quality significantly in comparison to the non-filtered output of a system.

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2111.10061 [pdf, other]

An Activity-Based Model of Transport Demand for Greater Melbourne

Authors: Alan Both, Dhirendra Singh, Afshin Jafari, Billie Giles-Corti, Lucy Gunn

Abstract: In this paper, we present an algorithm for creating a synthetic population for the Greater Melbourne area using a combination of machine learning, probabilistic, and gravity-based approaches. We combine these techniques in a hybrid model with three primary innovations: 1. when assigning activity patterns, we generate individual activity chains for every agent, tailored to their cohort; 2. when selecting destinations, we aim to strike a balance between the distance-decay of trip lengths and the activity-based attraction of destination locations; and 3. we take into account the number of trips remaining for an agent so as to ensure they do not select a destination that would be unreasonable to return home from. Our method is completely open and replicable, requiring only publicly available data to generate a synthetic population of agents compatible with commonly used agent-based modeling software such as MATSim. The synthetic population was found to be accurate in terms of distance distribution, mode choice, and destination choice for a variety of population sizes.

Submitted 19 November, 2021; originally announced November 2021.

Comments: 35 pages, 10 figures

arXiv:2105.12674 [pdf]

doi 10.1108/MD-10-2017-1001

Measuring information exchange and brokerage capacity of healthcare teams

Authors: F. Grippa, J. Bucuvalas, A. Booth, E. Alessandrini, A. Fronzetti Colladon, L. M. Wade

Abstract: Purpose: The purpose of this paper is to explore possible factors impacting team performance in healthcare, by focusing on information exchange within and across hospital's boundaries. Design/methodology/approach: Through a web-survey and group interviews, the authors collected data on the communication networks of 31 members of four interdisciplinary healthcare teams involved in a system redesign initiative within a large US children's hospital. The authors mapped their internal and external social networks based on management advice, technical support and knowledge dissemination within and across departments, studying interaction patterns that involved more than 700 actors. The authors then compared team performance and social network metrics such as degree, closeness and betweenness centrality, and computed cross ties and constraint levels for each team. Findings: The results indicate that highly effective teams were more inwardly focused and less connected to outside members. Moreover, highly recognized teams communicated frequently but, overall, less intensely than the others. Originality/value: Mapping knowledge flows and balancing internal focus and outward connectivity of interdisciplinary teams may help healthcare decision makers in their attempt to achieve high value for patients, families and employees.

Submitted 26 May, 2021; originally announced May 2021.

ACM Class: J.4

Journal ref: Management Decision 56(10), 2239-2251 (2018)

arXiv:2104.10108 [pdf]

Development of a dynamic type 2 diabetes risk prediction tool: a UK Biobank study

Authors: Nikola Dolezalova, Massimo Cairo, Alex Despotovic, Adam T. C. Booth, Angus B. Reed, Davide Morelli, David Plans

Abstract: Diabetes affects over 400 million people and is among the leading causes of morbidity worldwide. Identification of high-risk individuals can support early diagnosis and prevention of disease development through lifestyle changes. However, the majority of existing risk scores require information about blood-based factors which are not obtainable outside of the clinic. Here, we aimed to develop an accessible solution that could be deployed digitally and at scale. We developed a predictive 10-year type 2 diabetes risk score using 301 features derived from 472,830 participants in the UK Biobank dataset while excluding any features which are not easily obtainable by a smartphone. Using a data-driven feature selection process, 19 features were included in the final reduced model. A Cox proportional hazards model slightly outperformed a DeepSurv model trained using the same features, achieving a concordance index of 0.818 (95% CI: 0.812-0.823), compared to 0.811 (95% CI: 0.806-0.815). The final model showed good calibration. This tool can be used for clinical screening of individuals at risk of developing type 2 diabetes and to foster patient empowerment by broadening their knowledge of the factors affecting their personal risk.

Submitted 20 April, 2021; originally announced April 2021.

Comments: 12 pages

arXiv:2104.09226 [pdf]

Machine learning approach to dynamic risk modeling of mortality in COVID-19: a UK Biobank study

Authors: Mohammad A. Dabbah, Angus B. Reed, Adam T. C. Booth, Arrash Yassaee, Alex Despotovic, Benjamin Klasmer, Emily Binning, Mert Aral, David Plans, Alain B. Labrique, Diwakar Mohan

Abstract: The COVID-19 pandemic has created an urgent need for robust, scalable monitoring tools supporting stratification of high-risk patients. This research aims to develop and validate prediction models, using the UK Biobank, to estimate COVID-19 mortality risk in confirmed cases. From the 11,245 participants testing positive for COVID-19, we develop a data-driven random forest classification model with excellent performance (AUC: 0.91), using baseline characteristics, pre-existing conditions, symptoms, and vital signs, such that the score could dynamically assess mortality risk with disease deterioration. We also identify several significant novel predictors of COVID-19 mortality with equivalent or greater predictive value than established high-risk comorbidities, such as detailed anthropometrics and prior acute kidney failure, urinary tract infection, and pneumonias. The model design and feature selection enable utility in outpatient settings. Possible applications include supporting individual-level risk profiling and monitoring disease progression across patients with COVID-19 at-scale, especially in hospital-at-home settings.

Submitted 19 April, 2021; originally announced April 2021.

Comments: 20 pages, 3 figures

arXiv:2102.10966 [pdf, other]

Better Call the Plumber: Orchestrating Dynamic Information Extraction Pipelines

Authors: Mohamad Yaser Jaradeh, Kuldeep Singh, Markus Stocker, Andreas Both, Sören Auer

Abstract: In the last decade, a large number of Knowledge Graph (KG) information extraction approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG information extraction (IE) have not been studied in the literature. We propose Plumber, the first framework that brings together the research community's disjoint IE efforts. The Plumber architecture comprises 33 reusable components for various KG information extraction subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable information extraction pipelines and offers overall 264 distinct pipelines. We study the optimization problem of choosing suitable pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over two KGs: DBpedia, and Open Research Knowledge Graph (ORKG). Our results demonstrate the effectiveness of Plumber in dynamically generating KG information extraction pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components, and discuss their limitations.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted in ICWE 2021

arXiv:2007.07060 [pdf, other]

Template-Based Question Answering over Linked Geospatial Data

Authors: Dharmen Punjani, Markos Iliakis, Theodoros Stefou, Kuldeep Singh, Andreas Both, Manolis Koubarakis, Iosif Angelidis, Konstantina Bereta, Themis Beris, Dimitris Bilidas, Theofilos Ioannidis, Nikolaos Karalis, Christoph Lange, Despina-Athanasia Pantazi, Christos Papaloukas, Georgios Stamoulis

Abstract: Large amounts of geospatial data have been made available recently on the linked open data cloud and the portals of many national cartographic agencies (e.g., OpenStreetMap data, administrative geographies of various countries, or land cover/land use data sets). These datasets use various geospatial vocabularies and can be queried using SPARQL or its OGC-standardized extension GeoSPARQL. In this paper, we go beyond these approaches to offer a question-answering engine for natural language questions on top of linked geospatial data sources. Our system has been implemented as re-usable components of the Frankenstein question answering architecture. We give a detailed description of the system's architecture, its underlying algorithms, and its evaluation using a set of 201 natural language questions. The set of questions is offered to the research community as a gold standard dataset for the comparative evaluation of future geospatial question answering engines.

Submitted 29 April, 2021; v1 submitted 14 July, 2020; originally announced July 2020.

Comments: 27 pages, 2 figures

arXiv:1803.00832 [pdf, other]

Towards a Question Answering System over the Semantic Web

Authors: Dennis Diefenbach, Andreas Both, Kamal Singh, Pierre Maret

Abstract: Thanks to the development of the Semantic Web, a lot of new structured data has become available on the Web in the form of knowledge bases (KBs). Making this valuable data accessible and usable for end-users is one of the main goals of Question Answering (QA) over KBs. Most current QA systems query one KB, in one language (namely English). The existing approaches are not designed to be easily adaptable to new KBs and languages. We first introduce a new approach for translating natural language questions to SPARQL queries. It is able to query several KBs simultaneously, in different languages, and can easily be ported to other KBs and languages. In our evaluation, the impact of our approach is proven using 5 different well-known and large KBs: Wikidata, DBpedia, MusicBrainz, DBLP and Freebase as well as 5 different languages namely English, German, French, Italian and Spanish. Second, we show how we integrated our approach, to make it easily accessible by the research community and by end-users. To summarize, we provided a conceptional solution for multilingual, KB-agnostic Question Answering over the Semantic Web. The provided first approximation validates this concept.

Submitted 2 March, 2018; originally announced March 2018.

Comments: There is a Patent Pending for the presented approach. It was submitted the 18 January 2018 at the EPO and has the number EP18305035.0

arXiv:1610.09514 [pdf]

REFOCUS: Current & Future Search Interface Requirements for German-speaking Users

Authors: Maximilian Speicher, Andreas Both, Martin Gaedke

Abstract: While smartphones are widely used for web browsing, other novel devices like Smart TVs are also becoming increasingly popular. Yet, current interfaces do not cater for the newly available devices beyond touch and small screens, if at all for the latter. Particularly search engines -- today's entry points of the WWW -- must ensure their interfaces are easy to use on any web-enabled device. We report on a survey that investigated (1) users' perception and usage of current search interfaces, and (2) their expectations towards current and future search interfaces. Users are mostly satisfied with desktop and mobile search, but seem to be skeptical towards web search with novel devices and input modalities. Hence, we derive REFOCUS -- a novel set of requirements for current and future search interfaces, which shall address the demand for improvement of novel web search and has been validated by 12 dedicated experts.

Submitted 10 August, 2022; v1 submitted 29 October, 2016; originally announced October 2016.

Comments: Originally published by IADIS in the Proceedings of the 15th International Conference on WWW/Internet (ICWI '16). The final publication is available at http://iadisportal.org/digital-library/refocus-current-future-search-interface-requirements-for-german-speaking-users

arXiv:1403.6397 [pdf, other]

Evaluating topic coherence measures

Authors: Frank Rosner, Alexander Hinneburg, Michael Röder, Martin Nettling, Andreas Both

Abstract: Topic models extract representative word sets - called topics - from word counts in documents without requiring any semantic annotations. Topics are not guaranteed to be well interpretable; therefore, coherence measures have been proposed to distinguish between good and bad topics. Studies of topic coherence so far are limited to measures that score pairs of individual words. For the first time, we include coherence measures from scientific philosophy that score pairs of more complex word subsets and apply them to topic scoring.

Submitted 25 March, 2014; originally announced March 2014.

Comments: This work has been presented at the "Topic Models: Computation, Application and Evaluation" workshop at the "Neural Information Processing Systems" conference 2013

arXiv:1403.0613 [pdf, other]

doi 10.1016/j.artint.2015.03.010

On Redundant Topological Constraints

Authors: Sanjiang Li, Zhiguo Long, Weiming Liu, Matt Duckham, Alan Both

Abstract: The Region Connection Calculus (RCC) is a well-known calculus for representing part-whole and topological relations. It plays an important role in qualitative spatial reasoning, geographical information science, and ontology. The computational complexity of reasoning with RCC5 and RCC8 (two fragments of RCC) as well as other qualitative spatial/temporal calculi has been investigated in depth in the literature. Most of these works focus on the consistency of qualitative constraint networks. In this paper, we consider the important problem of redundant qualitative constraints. For a set $Γ$ of qualitative constraints, we say a constraint $(x R y)$ in $Γ$ is redundant if it is entailed by the rest of $Γ$. A prime subnetwork of $Γ$ is a subset of $Γ$ which contains no redundant constraints and has the same solution set as $Γ$. It is natural to ask how to compute such a prime subnetwork, and when it is unique. In this paper, we show that this problem is in general intractable, but becomes tractable if $Γ$ is over a tractable subalgebra $\mathcal{S}$ of a qualitative calculus. Furthermore, if $\mathcal{S}$ is a subalgebra of RCC5 or RCC8 in which weak composition distributes over nonempty intersections, then $Γ$ has a unique prime subnetwork, which can be obtained in cubic time by removing all redundant constraints simultaneously from $Γ$. As a byproduct, we show that any path-consistent network over such a distributive subalgebra is weakly globally consistent and minimal.
A thorough empirical analysis of the prime subnetwork upon real geographical data sets demonstrates the approach is able to identify significantly more redundant constraints than previously proposed algorithms, especially in constraint networks with larger proportions of partial overlap relations.

Submitted 13 February, 2015; v1 submitted 3 March, 2014; originally announced March 2014.

Comments: An extended abstract appears in Proceedings of the 14th International Conference on the Principles of Knowledge Representation and Reasoning (KR-14), Vienna, Austria, July 20-24, 2014

Journal ref: Artificial Intelligence 225 (2015) 51-76

Showing 1–18 of 18 results for author: Both, A