Search | arXiv e-print repository

CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models

Authors: Shengye Wan, Cyrus Nikolaidis, Daniel Song, David Molnar, James Crnkovich, Jayson Grace, Manish Bhatt, Sahana Chennabasappa, Spencer Whitman, Stephanie Ding, Vlad Ionescu, Yue Li, Joshua Saxe

Abstract: We are releasing a new suite of security benchmarks for LLMs, CYBERSECEVAL 3, to continue the conversation on empirically measuring LLM cybersecurity risks and capabilities. CYBERSECEVAL 3 assesses 8 different risks across two broad categories: risk to third parties, and risk to application developers and end users. Compared to previous work, we add new areas focused on offensive security capabilities: automated social engineering, scaling manual offensive cyber operations, and autonomous offensive cyber operations. In this paper we discuss applying these benchmarks to the Llama 3 models and a suite of contemporaneous state-of-the-art LLMs, enabling us to contextualize risks both with and without mitigations in place.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.21783 [pdf, other]

The Llama 3 Herd of Models

Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

arXiv:2404.13161 [pdf, other]

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Authors: Manish Bhatt, Sahana Chennabasappa, Yue Li, Cyrus Nikolaidis, Daniel Song, Shengye Wan, Faizan Ahmad, Cornelius Aschermann, Yaohui Chen, Dhaval Kapil, David Molnar, Spencer Whitman, Joshua Saxe

Abstract: Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present CyberSecEval 2, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved problem; for example, all tested models showed between 26% and 41% successful prompt injection tests. We further introduce the safety-utility tradeoff: conditioning an LLM to reject unsafe prompts can cause the LLM to falsely reject answering benign prompts, which lowers utility. We propose quantifying this tradeoff using False Refusal Rate (FRR). As an illustration, we introduce a novel test set to quantify FRR for cyberattack helpfulness risk. We find many LLMs able to successfully comply with "borderline" benign requests while still rejecting most unsafe requests. Finally, we quantify the utility of LLMs for automating a core cybersecurity task, that of exploiting software vulnerabilities. This is important because the offensive capabilities of LLMs are of intense interest; we quantify this by creating novel test sets for four representative problems. We find that models with coding capabilities perform better than those without, but that further work is needed for LLMs to become proficient at exploit generation. Our code is open source and can be used to evaluate other LLMs.

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2402.16822 [pdf, other]

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Authors: Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

Abstract: As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to adversarial attacks is of paramount importance. Existing methods for identifying adversarial prompts tend to focus on specific domains, lack diversity, or require extensive human annotations. To address these limitations, we present Rainbow Teaming, a novel black-box approach for producing a diverse collection of adversarial prompts. Rainbow Teaming casts adversarial prompt generation as a quality-diversity problem, and uses open-ended search to generate prompts that are both effective and diverse. Focusing on the safety domain, we use Rainbow Teaming to target various state-of-the-art LLMs, including the Llama 2 and Llama 3 models. Our approach reveals hundreds of effective adversarial prompts, with an attack success rate exceeding 90% across all tested models. Furthermore, we demonstrate that fine-tuning models with synthetic data generated by the Rainbow Teaming method significantly enhances their safety without sacrificing general performance or helpfulness. We additionally explore the versatility of Rainbow Teaming by applying it to question answering and cybersecurity, showcasing its potential to drive robust open-ended self-improvement in a wide range of applications.

Submitted 22 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2312.04724 [pdf, other]

Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models

Authors: Manish Bhatt, Sahana Chennabasappa, Cyrus Nikolaidis, Shengye Wan, Ivan Evtimov, Dominik Gabi, Daniel Song, Faizan Ahmad, Cornelius Aschermann, Lorenzo Fontana, Sasha Frolov, Ravi Prakash Giri, Dhaval Kapil, Yiannis Kozyrakis, David LeBlanc, James Milazzo, Aleksandar Straumann, Gabriel Synnaeve, Varun Vontimitta, Spencer Whitman, Joshua Saxe

Abstract: This paper presents CyberSecEval, a comprehensive benchmark developed to help bolster the cybersecurity of Large Language Models (LLMs) employed as coding assistants. As what we believe to be the most extensive unified cybersecurity safety benchmark to date, CyberSecEval provides a thorough evaluation of LLMs in two crucial security domains: their propensity to generate insecure code and their level of compliance when asked to assist in cyberattacks. Through a case study involving seven models from the Llama 2, Code Llama, and OpenAI GPT large language model families, CyberSecEval effectively pinpointed key cybersecurity risks. More importantly, it offered practical insights for refining these models. A significant observation from the study was the tendency of more advanced models to suggest insecure code, highlighting the critical need for integrating security considerations in the development of sophisticated LLMs. CyberSecEval, with its automated test case generation and evaluation pipeline covers a broad scope and equips LLM designers and researchers with a tool to broadly measure and enhance the cybersecurity safety properties of LLMs, contributing to the development of more secure AI systems.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2308.12950 [pdf, other]

Code Llama: Open Foundation Models for Code

Authors: Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, et al. (1 additional authors not shown)

Abstract: We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks. We provide multiple flavors to cover a wide range of applications: foundation models (Code Llama), Python specializations (Code Llama - Python), and instruction-following models (Code Llama - Instruct) with 7B, 13B, 34B and 70B parameters each. All models are trained on sequences of 16k tokens and show improvements on inputs with up to 100k tokens. 7B, 13B and 70B Code Llama and Code Llama - Instruct variants support infilling based on surrounding content. Code Llama reaches state-of-the-art performance among open models on several code benchmarks, with scores of up to 67% and 65% on HumanEval and MBPP, respectively. Notably, Code Llama - Python 7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform every other publicly available model on MultiPL-E. We release Code Llama under a permissive license that allows for both research and commercial use.

Submitted 31 January, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

arXiv:2308.05876 [pdf, other]

Strategic Decision-Making in Multi-Agent Domains: A Weighted Potential Dynamic Game Approach

Authors: Maulik Bhatt, Negar Mehr

Abstract: In interactive multi-agent settings, decision-making complexity arises from agents' interconnected objectives. Dynamic game theory offers a formal framework for analyzing such intricacies. Yet, solving dynamic games and determining Nash equilibria pose computational challenges due to the need of solving coupled optimal control problems. To address this, our key idea is to leverage potential games, which are games with a potential function that allows for the computation of Nash equilibria by optimizing the potential function. We argue that dynamic potential games, can effectively facilitate interactive decision-making in many multi-agent interactions. We will identify structures in realistic multi-agent interactive scenarios that can be transformed into weighted potential dynamic games. We will show that the open-loop Nash equilibria of the resulting weighted potential dynamic game can be obtained by solving a single optimal control problem. We will demonstrate the effectiveness of the proposed method through various simulation studies, showing close proximity to feedback Nash equilibria and significant improvements in solve time compared to state-of-the-art game solvers.

Submitted 22 August, 2023; v1 submitted 10 August, 2023; originally announced August 2023.

arXiv:2303.14461 [pdf, other]

Indian Language Summarization using Pretrained Sequence-to-Sequence Models

Authors: Ashok Urlana, Sahil Manoj Bhatt, Nirmal Surange, Manish Shrivastava

Abstract: The ILSUM shared task focuses on text summarization for two major Indian languages- Hindi and Gujarati, along with English. In this task, we experiment with various pretrained sequence-to-sequence models to find out the best model for each of the languages. We present a detailed overview of the models and our approaches in this paper. We secure the first rank across all three sub-tasks (English, Hindi and Gujarati). This paper also extensively analyzes the impact of k-fold cross-validation while experimenting with limited data size, and we also perform various experiments with a combination of the original and a filtered version of the data to determine the efficacy of the pretrained models.

Submitted 25 March, 2023; originally announced March 2023.

Comments: Accepted at FIRE-2022, Indian Language Summarization (ILSUM) track

arXiv:2206.08963 [pdf, other]

Efficient Constrained Multi-Agent Trajectory Optimization using Dynamic Potential Games

Authors: Maulik Bhatt, Yixuan Jia, Negar Mehr

Abstract: Although dynamic games provide a rich paradigm for modeling agents' interactions, solving these games for real-world applications is often challenging. Many real-world interactive settings involve general nonlinear state and input constraints that couple agents' decisions with one another. In this work, we develop an efficient and fast planner for interactive trajectory optimization in constrained setups using a constrained game-theoretical framework. Our key insight is to leverage the special structure of agents' objective and constraint functions that are common in multi-agent interactions for fast and reliable planning. More precisely, we identify the structure of agents' cost and constraint functions under which the resulting dynamic game is an instance of a constrained dynamic potential game. Constrained dynamic potential games are a class of games for which instead of solving a set of coupled constrained optimal control problems, a constrained Nash equilibrium, i.e. a Generalized Nash equilibrium, can be found by solving a single constrained optimal control problem. This simplifies constrained interactive trajectory optimization significantly. We compare the performance of our method in a navigation setup involving four planar agents and show that our method is on average 20 times faster than the state-of-the-art. We further provide experimental validation of our proposed method in a navigation setup involving two quadrotors carrying a rigid object while avoiding collisions with two humans.

Submitted 4 August, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

arXiv:2012.14359 [pdf, other]

Commonsense Visual Sensemaking for Autonomous Driving: On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics

Authors: Jakob Suchan, Mehul Bhatt, Srikrishna Varadarajan

Abstract: We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking in the backdrop of autonomous driving. A general neurosymbolic method for online visual sensemaking using answer set programming (ASP) is systematically formalised and fully implemented. The method integrates state of the art in visual computing, and is developed as a modular framework that is generally usable within hybrid architectures for realtime perception and control. We evaluate and demonstrate with community established benchmarks KITTIMOD, MOT-2017, and MOT-2020. As use-case, we focus on the significance of human-centred visual sensemaking -- e.g., involving semantic representation and explainability, question-answering, commonsense interpolation -- in safety-critical autonomous driving situations. The developed neurosymbolic framework is domain-independent, with the case of autonomous driving designed to serve as an exemplar for online visual sensemaking in diverse cognitive interaction settings in the backdrop of select human-centred AI technology design considerations. Keywords: Cognitive Vision, Deep Semantics, Declarative Spatial Reasoning, Knowledge Representation and Reasoning, Commonsense Reasoning, Visual Abduction, Answer Set Programming, Autonomous Driving, Human-Centred Computing and Design, Standardisation in Driving Technology, Spatial Cognition and AI.

Submitted 28 December, 2020; originally announced December 2020.

Comments: This is a preprint / review version of an accepted contribution to be published as part of the Artificial Intelligence Journal (AIJ). The article is an extended version of an IJCAI 2019 publication [74, arXiv:1906.00107]

arXiv:2012.10710 [pdf, other]

Visuo-Locomotive Complexity as a Component of Parametric Systems for Architecture Design

Authors: Vasiliki Kondyli, Mehul Bhatt, Evgenia Spyridonos

Abstract: A people-centred approach for designing large-scale built-up spaces necessitates systematic anticipation of user's embodied visuo-locomotive experience from the viewpoint of human-environment interaction factors pertaining to aspects such as navigation, wayfinding, usability. In this context, we develop a behaviour-based visuo-locomotive complexity model that functions as a key correlate of cognitive performance vis-a-vis internal navigation in built-up spaces. We also demonstrate the model's implementation and application as a parametric tool for the identification and manipulation of the architectural morphology along a navigation path as per the parameters of the proposed visuospatial complexity model. We present examples based on an empirical study in two healthcare buildings, and showcase the manner in which a dynamic and interactive parametric (complexity) model can promote behaviour-based decision-making throughout the design process to maintain desired levels of visuospatial complexity as part of a navigation or wayfinding experience.

Submitted 19 December, 2020; originally announced December 2020.

Comments: This is a preprint of the contribution published as part of the proceedings of ICoRD 2021: 8th International Conference on Research into Design, IDC School of Design (IIT Mumbai, India). ICoRD 2021, www.idc.iitb.ac.in/icord2021/ - The overall scientific agenda driving this research may be consulted here: The DesignSpace Group / www.design-space.org

arXiv:2010.07360 [pdf]

Deep Learning in Ultrasound Elastography Imaging

Authors: Hongliang Li, Manish Bhatt, Zhen Qu, Shiming Zhang, Martin C. Hartel, Ali Khademhosseini, Guy Cloutier

Abstract: It is known that changes in the mechanical properties of tissues are associated with the onset and progression of certain diseases. Ultrasound elastography is a technique to characterize tissue stiffness using ultrasound imaging either by measuring tissue strain using quasi-static elastography or natural organ pulsation elastography, or by tracing a propagated shear wave induced by a source or a natural vibration using dynamic elastography. In recent years, deep learning has begun to emerge in ultrasound elastography research. In this review, several common deep learning frameworks in the computer vision community, such as multilayer perceptron, convolutional neural network, and recurrent neural network are described. Then, recent advances in ultrasound elastography using such deep learning techniques are revisited in terms of algorithm development and clinical diagnosis. Finally, the current challenges and future developments of deep learning in ultrasound elastography are prospected.

Submitted 31 October, 2020; v1 submitted 14 October, 2020; originally announced October 2020.

arXiv:2008.01286 [pdf, other]

doi 10.1145/3311790.3396670

Design and Deployment of Photo2Building: A Cloud-based Procedural Modeling Tool as a Service

Authors: Manush Bhatt, Rajesh Kalyanam, Gen Nishida, Liu He, Christopher May, Dev Niyogi, Daniel Aliaga

Abstract: We present a Photo2Building tool to create a plausible 3D model of a building from only a single photograph. Our tool is based on a prior desktop version which, as described in this paper, is converted into a client-server model, with job queuing, web-page support, and support of concurrent usage. The reported cloud-based web-accessible tool can reconstruct a building in 40 seconds on average and costing only 0.60 USD with current pricing. This provides for an extremely scalable and possibly widespread tool for creating building models for use in urban design and planning applications. With the growing impact of rapid urbanization on weather and climate and resource availability, access to such a service is expected to help a wide variety of users such as city planners, urban meteorologists worldwide in the quest to improved prediction of urban weather and designing climate-resilient cities of the future.

Submitted 3 August, 2020; originally announced August 2020.

Comments: 7 pages, 7 figures, PEARC '20: Practice and Experience in Advanced Research Computing, July 26--30, 2020, Portland, OR, USA

Journal ref: ACM, PEARC 2020

arXiv:2006.00059 [pdf, other]

Towards a Human-Centred Cognitive Model of Visuospatial Complexity in Everyday Driving

Authors: Vasiliki Kondyli, Mehul Bhatt, Jakob Suchan

Abstract: We develop a human-centred, cognitive model of visuospatial complexity in everyday, naturalistic driving conditions. With a focus on visual perception, the model incorporates quantitative, structural, and dynamic attributes identifiable in the chosen context; the human-centred basis of the model lies in its behavioural evaluation with human subjects with respect to psychophysical measures pertaining to embodied visuoauditory attention. We report preliminary steps to apply the developed cognitive model of visuospatial complexity for human-factors guided dataset creation and benchmarking, and for its use as a semantic template for the (explainable) computational analysis of visuospatial complexity.

Submitted 2 June, 2020; v1 submitted 29 May, 2020; originally announced June 2020.

Comments: 9th European Starting AI Researchers Symposium (STAIRS), at ECAI 2020, the 24th European Conference on Artificial Intelligence (ECAI)., Santiago de Compostela, Spain

arXiv:1906.00107 [pdf, other]

Out of Sight But Not Out of Mind: An Answer Set Programming Based Online Abduction Framework for Visual Sensemaking in Autonomous Driving

Authors: Jakob Suchan, Mehul Bhatt, Srikrishna Varadarajan

Abstract: We demonstrate the need and potential of systematically integrated vision and semantics solutions for visual sensemaking (in the backdrop of autonomous driving). A general method for online visual sensemaking using answer set programming is systematically formalised and fully implemented. The method integrates state of the art in (deep learning based) visual computing, and is developed as a modular framework usable within hybrid architectures for perception & control. We evaluate and demo with community established benchmarks KITTIMOD and MOT. As use-case, we focus on the significance of human-centred visual sensemaking ---e.g., semantic representation and explainability, question-answering, commonsense interpolation--- in safety-critical autonomous driving situations.

Submitted 31 May, 2019; originally announced June 2019.

Comments: IJCAI 2019: the 28th International Joint Conference on Artificial Intelligence (IJCAI) 2019, August 10 - 16, Macao. (Preprint / to appear)

arXiv:1806.07376 [pdf, other]

Semantic Analysis of (Reflectional) Visual Symmetry: A Human-Centred Computational Model for Declarative Explainability

Authors: Jakob Suchan, Mehul Bhatt, Srikrishna Vardarajan, Seyed Ali Amirshahi, Stella Yu

Abstract: We present a computational model for the semantic interpretation of symmetry in naturalistic scenes. Key features include a human-centred representation, and a declarative, explainable interpretation model supporting deep semantic question-answering founded on an integration of methods in knowledge representation and deep learning based computer vision. In the backdrop of the visual arts, we showcase the framework's capability to generate human-centred, queryable, relational structures, also evaluating the framework with an empirical study on the human perception of visual symmetry. Our framework represents and is driven by the application of foundational, integrated Vision and Knowledge Representation and Reasoning methods for applications in the arts, and the psychological and social sciences.

Submitted 14 September, 2018; v1 submitted 31 May, 2018; originally announced June 2018.

Comments: Preprint of accepted article / Journal: Advances in Cognitive Systems. ( http://www.cogsys.org/journal )

Journal ref: Advances in Cognitive Systems. (http://www.cogsys.org/journal), 2018

arXiv:1805.06861 [pdf, other]

Answer Set Programming Modulo `Space-Time'

Authors: Carl Schultz, Mehul Bhatt, Jakob Suchan, Przemysław Wałęga

Abstract: We present ASP Modulo `Space-Time', a declarative representational and computational framework to perform commonsense reasoning about regions with both spatial and temporal components. Supported are capabilities for mixed qualitative-quantitative reasoning, consistency checking, and inferring compositions of space-time relations; these capabilities combine and synergise for applications in a range of AI application areas where the processing and interpretation of spatio-temporal data is crucial. The framework and resulting system is the only general KR-based method for declaratively reasoning about the dynamics of `space-time' regions as first-class objects. We present an empirical evaluation (with scalability and robustness results), and include diverse application examples involving interpretation and control tasks.

Submitted 17 May, 2018; originally announced May 2018.

arXiv:1712.00840 [pdf, other]

Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning about Moving Objects

Authors: Jakob Suchan, Mehul Bhatt, Przemysław Wałęga, Carl Schultz

Abstract: We propose a hybrid architecture for systematically computing robust visual explanation(s) encompassing hypothesis formation, belief revision, and default reasoning with video data. The architecture consists of two tightly integrated synergistic components: (1) (functional) answer set programming based abductive reasoning with space-time tracklets as native entities; and (2) a visual processing pipeline for detection based object tracking and motion analysis. We present the formal framework, its general implementation as a (declarative) method in answer set programming, and an example application and evaluation based on two diverse video datasets: the MOTChallenge benchmark developed by the vision community, and a recently developed Movie Dataset.

Submitted 3 December, 2017; originally announced December 2017.

Comments: Preprint of final publication published as part of AAAI 2018: J. Suchan., M. Bhatt, Wałęga, P., Schultz, C. (2018). Visual Explanation by High-Level Abduction: On Answer-Set Programming Driven Reasoning about Moving Objects. In AAAI 2018: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, February 2-7, 2018, New Orleans, USA

arXiv:1710.04076 [pdf, other]

Deep Semantic Abstractions of Everyday Human Activities: On Commonsense Representations of Human Interactions

Authors: Jakob Suchan, Mehul Bhatt

Abstract: We propose a deep semantic characterization of space and motion categorically from the viewpoint of grounding embodied human-object interactions. Our key focus is on an ontological model that would be adept to formalisation from the viewpoint of commonsense knowledge representation, relational learning, and qualitative reasoning about space and motion in cognitive robotics settings. We demonstrate key aspects of the space & motion ontology and its formalization as a representational framework in the backdrop of select examples from a dataset of everyday activities. Furthermore, focussing on human-object interaction data obtained from RGBD sensors, we also illustrate how declarative (spatio-temporal) reasoning in the (constraint) logic programming family may be performed with the developed deep semantic abstractions.

Submitted 10 October, 2017; originally announced October 2017.

Comments: In ROBOT 2017: Third Iberian Robotics Conference. Escuela Técnica Superior de Ingeniería, Sevilla (Spain) (November 22-24, 2017). https://grvc.us.es/robot2017/ (to appear). arXiv admin note: substantial text overlap with arXiv:1709.05293

arXiv:1709.05293 [pdf, other]

Commonsense Scene Semantics for Cognitive Robotics: Towards Grounding Embodied Visuo-Locomotive Interactions

Authors: Jakob Suchan, Mehul Bhatt

Abstract: We present a commonsense, qualitative model for the semantic grounding of embodied visuo-spatial and locomotive interactions. The key contribution is an integrative methodology combining low-level visual processing with high-level, human-centred representations of space and motion rooted in artificial intelligence. We demonstrate practical applicability with examples involving object interactions, and indoor movement.

Submitted 15 September, 2017; originally announced September 2017.

Comments: to appear in: ICCV 2017 Workshop - Vision in Practice on Autonomous Robots (ViPAR), International Conference on Computer Vision (ICCV), Venice, Italy

arXiv:1610.10041 [pdf]

doi 10.5121/csit.2016.60105

Competence building framework requirements for information technology for educational management

Authors: Rakesh Mohan Bhatt

Abstract: Progressive efforts have been evolving continuously for the betterment of the services of Information Technology for Educational Management (ITEM). These services require data-intensive and communication-intensive applications. Due to the massive growth of information, it becomes difficult to manage these services. Here the role of the Information and Communication Technology (ICT) infrastructure, particularly the data centre with its communication components, becomes important in facilitating these services. The present paper discusses the related issues, such as competent staff, appropriate ICT infrastructure, and ICT acceptance level, required for an ITEM competence building framework, considering the earlier approach for core competences for ITEM. In this connection, it is also necessary to consider the procurement of standard and appropriate ICT facilities. This will help in the integration of these facilities for future expansion. It will also make it possible to create and foresee the impact of pairing management with the information, technology, and education components, individually and combined. These efforts will establish a strong coupling between the ITEM activities and resource management for effective implementation of the framework.

Submitted 12 January, 2016; originally announced October 2016.

Comments: 7 pages in CS & IT-CSCP 2016

arXiv:1608.02693 [pdf, other]

Deeply Semantic Inductive Spatio-Temporal Learning

Authors: Jakob Suchan, Mehul Bhatt, Carl Schultz

Abstract: We present an inductive spatio-temporal learning framework rooted in inductive logic programming. With an emphasis on visuo-spatial language, logic, and cognition, the framework supports learning with relational spatio-temporal features identifiable in a range of domains involving the processing and interpretation of dynamic visuo-spatial imagery. We present a prototypical system, and an example application in the domain of computing for visual arts and computational cognitive science.

Submitted 9 August, 2016; originally announced August 2016.

Comments: Accepted for publication at ILP 2016: 26th International Conference on Inductive Logic Programming 4th - 6th September 2016, London. Keywords: Spatio-Temporal Learning; Dynamic Visuo-Spatial Imagery; Declarative Spatial Reasoning; Inductive Logic Programming; AI and Art

arXiv:1607.07565 [pdf, other]

Grounding Dynamic Spatial Relations for Embodied (Robot) Interaction

Authors: Michael Spranger, Jakob Suchan, Mehul Bhatt, Manfred Eppe

Abstract: This paper presents a computational model of the processing of dynamic spatial relations occurring in an embodied robotic interaction setup. A complete system is introduced that allows autonomous robots to produce and interpret dynamic spatial phrases (in English) given an environment of moving objects. The model unites two separate research strands: computational cognitive semantics, and commonsense spatial representation and reasoning. The model for the first time demonstrates an integration of these different strands.

Submitted 26 July, 2016; originally announced July 2016.

Comments: in: Pham, D.-N. and Park, S.-B., editors, PRICAI 2014: Trends in Artificial Intelligence, volume 8862 of Lecture Notes in Computer Science, pages 958-971. Springer

arXiv:1607.05968 [pdf, other]

Robust Natural Language Processing - Combining Reasoning, Cognitive Semantics and Construction Grammar for Spatial Language

Authors: Michael Spranger, Jakob Suchan, Mehul Bhatt

Abstract: We present a system for generating and understanding dynamic and static spatial relations in robotic interaction setups. Robots describe an environment of moving blocks using English phrases that include spatial relations such as "across" and "in front of". We evaluate the system in robot-robot interactions and show that the system can robustly deal with visual perception errors, language omissions and ungrammatical utterances.

Submitted 20 July, 2016; originally announced July 2016.

Comments: in IJCAI'16: Proceedings of the 25th international joint conference on Artificial intelligence, Palo Alto, 2016. AAAI Press

arXiv:1606.07860 [pdf, other]

Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo Theories

Authors: Przemysław Andrzej Wałęga, Carl Schultz, Mehul Bhatt

Abstract: The systematic modelling of dynamic spatial systems is a key requirement in a wide range of application areas such as commonsense cognitive robotics, computer-aided architecture design, and dynamic geographic information systems. We present ASPMT(QS), a novel approach and fully-implemented prototype for non-monotonic spatial reasoning -a crucial requirement within dynamic spatial systems- based on Answer Set Programming Modulo Theories (ASPMT). ASPMT(QS) consists of a (qualitative) spatial representation module (QS) and a method for turning tight ASPMT instances into Satisfiability Modulo Theories (SMT) instances in order to compute stable models by means of SMT solvers. We formalise and implement concepts of default spatial reasoning and spatial frame axioms. Spatial reasoning is performed by encoding spatial relations as systems of polynomial constraints, and solving via SMT with the theory of real nonlinear arithmetic. We empirically evaluate ASPMT(QS) in comparison with other contemporary spatial reasoning systems both within and outside the context of logic programming. ASPMT(QS) is currently the only existing system that is capable of reasoning about indirect spatial effects (i.e., addressing the ramification problem), and integrating geometric and qualitative spatial information within a non-monotonic spatial reasoning context. This paper is under consideration for publication in TPLP.

Submitted 28 June, 2016; v1 submitted 24 June, 2016; originally announced June 2016.

Comments: 22 pages, 6 figures, Under consideration for publication in TPLP

arXiv:1508.03276 [pdf, other]

Talking about the Moving Image: A Declarative Model for Image Schema Based Embodied Perception Grounding and Language Generation

Authors: Jakob Suchan, Mehul Bhatt, Harshita Jhavar

Abstract: We present a general theory and corresponding declarative model for the embodied grounding and natural language based analytical summarisation of dynamic visuo-spatial imagery. The declarative model ---encompassing spatio-linguistic abstractions, image schemas, and a spatio-temporal feature based language generator--- is modularly implemented within Constraint Logic Programming (CLP). The implemented model is such that primitives of the theory, e.g., pertaining to space and motion, image schemata, are available as first-class objects with `deep semantics' suited for inference and query. We demonstrate the model with select examples broadly motivated by areas such as film, design, geography, smart environments where analytical natural language based externalisations of the moving image are central from the viewpoint of human interaction, evidence-based qualitative analysis, and sensemaking. Keywords: moving image, visual semantics and embodiment, visuo-spatial cognition and computation, cognitive vision, computational models of narrative, declarative spatial reasoning

Submitted 13 August, 2015; originally announced August 2015.

Comments: 19 pages. Unpublished report

arXiv:1506.04945 [pdf, other]

Spatial Symmetry Driven Pruning Strategies for Efficient Declarative Spatial Reasoning

Authors: Carl Schultz, Mehul Bhatt

Abstract: Declarative spatial reasoning denotes the ability to (declaratively) specify and solve real-world problems related to geometric and qualitative spatial representation and reasoning within standard knowledge representation and reasoning (KR) based methods (e.g., logic programming and derivatives). One approach for encoding the semantics of spatial relations within a declarative programming framework is by systems of polynomial constraints. However, solving such constraints is computationally intractable in general (i.e. the theory of real-closed fields). We present a new algorithm, implemented within the declarative spatial reasoning system CLP(QS), that drastically improves the performance of deciding the consistency of spatial constraint graphs over conventional polynomial encodings. We develop pruning strategies founded on spatial symmetries that form equivalence classes (based on affine transformations) at the qualitative spatial level. Moreover, pruning strategies are themselves formalised as knowledge about the properties of space and spatial symmetries. We evaluate our algorithm using a range of benchmarks in the class of contact problems, and proofs in mereology and geometry. The empirical results show that CLP(QS) with knowledge-based spatial pruning outperforms conventional polynomial encodings by orders of magnitude, and can thus be applied to problems that are otherwise unsolvable in practice.

Submitted 16 June, 2015; originally announced June 2015.

Comments: 22 pages. Accepted for publication at: COSIT 2015 - Conference on Spatial Information Theory XII (COSIT), Santa Fe, New Mexico, USA, October 2015

arXiv:1506.04929 [pdf, other]

ASPMT(QS): Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo Theories

Authors: Przemysław Andrzej Wałęga, Mehul Bhatt, Carl Schultz

Abstract: The systematic modelling of \emph{dynamic spatial systems} [9] is a key requirement in a wide range of application areas such as commonsense cognitive robotics, computer-aided architecture design, and dynamic geographic information systems. We present ASPMT(QS), a novel approach and fully-implemented prototype for non-monotonic spatial reasoning ---a crucial requirement within dynamic spatial systems--- based on Answer Set Programming Modulo Theories (ASPMT). ASPMT(QS) consists of a (qualitative) spatial representation module (QS) and a method for turning tight ASPMT instances into Sat Modulo Theories (SMT) instances in order to compute stable models by means of SMT solvers. We formalise and implement concepts of default spatial reasoning and spatial frame axioms using choice formulas. Spatial reasoning is performed by encoding spatial relations as systems of polynomial constraints, and solving via SMT with the theory of real nonlinear arithmetic. We empirically evaluate ASPMT(QS) in comparison with other prominent contemporary spatial reasoning systems. Our results show that ASPMT(QS) is the only existing system that is capable of reasoning about indirect spatial effects (i.e. addressing the ramification problem), and integrating geometric and qualitative spatial information within a non-monotonic spatial reasoning context.

Submitted 16 June, 2015; originally announced June 2015.

Comments: 13 pages. Accepted for publication at: LPNMR 2015 - Logic Programming and Nonmonotonic Reasoning, 13th International Conference, LPNMR 2015, LNAI Vol. 9345, Lexington, September 27-30, 2015. Proceedings (editors: Francesco Calimeri, Giovambattista Ianni, Miroslaw Truszczynski)

arXiv:1405.6130 [pdf]

doi 10.14445/22312803/IJCTT-V7P143

A Study of Local Binary Pattern Method for Facial Expression Detection

Authors: Ms. Drashti H. Bhatt, Mr. Kirit R. Rathod, Mr. Shardul J. Agravat

Abstract: Face detection is a basic task for expression recognition. The reliability of face detection & face recognition approach has a major role on the performance and usability of the entire system. There are several ways to undergo face detection & recognition. We can use Image Processing Operations, various classifiers, filters or virtual machines for the former. Various strategies are being available for Facial Expression Detection. The field of facial expression detection can have various applications along with its importance & can be interacted between human being & computer. Many few options are available to identify a face in an image in accurate & efficient manner. Local Binary Pattern (LBP) based texture algorithms have gained popularity in these years. LBP is an effective approach to have facial expression recognition & is a feature-based approach.

Submitted 4 February, 2014; originally announced May 2014.

Comments: 3 pages, 2 images, International Journal of Computer Trends and Technology (IJCTT)

Journal ref: Ms.Drashti H. Bhatt , Mr.Kirit R. Rathod , Mr.Shardul J. Agravat. Article: A Study of Local Binary Pattern Method for Facial Expression Detection. IJCTT 7(3):151-153, January 2014. Published by Seventh Sense Research Group

arXiv:1307.3040 [pdf, ps, other]

Between Sense and Sensibility: Declarative narrativisation of mental models as a basis and benchmark for visuo-spatial cognition and computation focussed collaborative cognitive systems

Authors: Mehul Bhatt

Abstract: What lies between `\emph{sensing}' and `\emph{sensibility}'? In other words, what kind of cognitive processes mediate sensing capability, and the formation of sensible impressions ---e.g., abstractions, analogies, hypotheses and theory formation, beliefs and their revision, argument formation--- in domain-specific problem solving, or in regular activities of everyday living, working and simply going around in the environment? How can knowledge and reasoning about such capabilities, as exhibited by humans in particular problem contexts, be used as a model and benchmark for the development of collaborative cognitive (interaction) systems concerned with human assistance, assurance, and empowerment? We pose these questions in the context of a range of assistive technologies concerned with \emph{visuo-spatial perception and cognition} tasks encompassing aspects such as commonsense, creativity, and the application of specialist domain knowledge and problem-solving thought processes. Assistive technologies being considered include: (a) human activity interpretation; (b) high-level cognitive robotics; (c) people-centred creative design in domains such as architecture & digital media creation, and (d) qualitative analyses geographic information systems. Computational narratives not only provide a rich cognitive basis, but they also serve as a benchmark of functional performance in our development of computational cognitive assistance systems.
We posit that computational narrativisation pertaining to space, actions, and change provides a useful model of \emph{visual} and \emph{spatio-temporal thinking} within a wide-range of problem-solving tasks and application areas where collaborative cognitive systems could serve an assistive and empowering function.

Submitted 31 March, 2014; v1 submitted 11 July, 2013; originally announced July 2013.

Comments: 5 pages, research statement summarising recent publications

arXiv:1307.2541 [pdf, other]

Geospatial Narratives and their Spatio-Temporal Dynamics: Commonsense Reasoning for High-level Analyses in Geographic Information Systems

Authors: Mehul Bhatt, Jan Oliver Wallgruen

Abstract: The modelling, analysis, and visualisation of dynamic geospatial phenomena has been identified as a key developmental challenge for next-generation Geographic Information Systems (GIS). In this context, the envisaged paradigmatic extensions to contemporary foundational GIS technology raise fundamental questions concerning the ontological, formal representational, and (analytical) computational methods that would underlie their spatial information theoretic underpinnings. We present the conceptual overview and architecture for the development of high-level semantic and qualitative analytical capabilities for dynamic geospatial domains. Building on formal methods in the areas of commonsense reasoning, qualitative reasoning, spatial and temporal representation and reasoning, reasoning about actions and change, and computational models of narrative, we identify concrete theoretical and practical challenges that accrue in the context of formal reasoning about `space, events, actions, and change'. With this as a basis, and within the backdrop of an illustrated scenario involving the spatio-temporal dynamics of urban narratives, we address specific problems and solution techniques chiefly involving `qualitative abstraction', `data integration and spatial consistency', and `practical geospatial abduction'. From a broad topical viewpoint, we propose that next-generation dynamic GIS technology demands a transdisciplinary scientific perspective that brings together Geography, Artificial Intelligence, and Cognitive Science.
Keywords: artificial intelligence; cognitive systems; human-computer interaction; geographic information systems; spatio-temporal dynamics; computational models of narrative; geospatial analysis; geospatial modelling; ontology; qualitative spatial modelling and reasoning; spatial assistance systems

Submitted 16 December, 2013; v1 submitted 9 July, 2013; originally announced July 2013.

Comments: ISPRS International Journal of Geo-Information (ISSN 2220-9964); Special Issue on: Geospatial Monitoring and Modelling of Environmental Change. IJGI. Editor: Duccio Rocchini. (pre-print of article in press)

arXiv:1306.5308 [pdf, other]

Cognitive Interpretation of Everyday Activities: Toward Perceptual Narrative Based Visuo-Spatial Scene Interpretation

Authors: Mehul Bhatt, Jakob Suchan, Carl Schultz

Abstract: We position a narrative-centred computational model for high-level knowledge representation and reasoning in the context of a range of assistive technologies concerned with "visuo-spatial perception and cognition" tasks. Our proposed narrative model encompasses aspects such as \emph{space, events, actions, change, and interaction} from the viewpoint of commonsense reasoning and learning in large-scale cognitive systems. The broad focus of this paper is on the domain of "human-activity interpretation" in smart environments, ambient intelligence etc. In the backdrop of a "smart meeting cinematography" domain, we position the proposed narrative model, preliminary work on perceptual narrativisation, and the immediate outlook on constructing general-purpose open-source tools for perceptual narrativisation. ACM Classification: I.2 Artificial Intelligence: I.2.0 General -- Cognitive Simulation, I.2.4 Knowledge Representation Formalisms and Methods, I.2.10 Vision and Scene Understanding: Architecture and control structures, Motion, Perceptual reasoning, Shape, Video analysis General keywords: cognitive systems; human-computer interaction; spatial cognition and computation; commonsense reasoning; spatial and temporal reasoning; assistive technologies

Submitted 22 June, 2013; originally announced June 2013.

Comments: To appear at: Computational Models of Narrative (CMN) 2013., a satellite event of CogSci 2013: The 35th meeting of the Cognitive Science Society

ACM Class: I.2; I.2.0; I.2.4; I.2.10

arXiv:1306.1034 [pdf, other]

ROTUNDE - A Smart Meeting Cinematography Initiative: Tools, Datasets, and Benchmarks for Cognitive Interpretation and Control

Authors: Mehul Bhatt, Jakob Suchan, Christian Freksa

Abstract: We construe smart meeting cinematography with a focus on professional situations such as meetings and seminars, possibly conducted in a distributed manner across socio-spatially separated groups. The basic objective in smart meeting cinematography is to interpret professional interactions involving people, and automatically produce dynamic recordings of discussions, debates, presentations etc. in the presence of multiple communication modalities. Typical modalities include gestures (e.g., raising one's hand for a question, applause), voice and interruption, electronic apparatus (e.g., pressing a button), movement (e.g., standing-up, moving around) etc. ROTUNDE, an instance of the smart meeting cinematography concept, aims to: (a) develop functionality-driven benchmarks with respect to the interpretation and control capabilities of human-cinematographers, real-time video editors, surveillance personnel, and typical human performance in everyday situations; (b) develop general tools for the commonsense cognitive interpretation of dynamic scenes from the viewpoint of visuo-spatial cognition centred perceptual narrativisation. Particular emphasis is placed on declarative representations and interfacing mechanisms that seamlessly integrate within large-scale cognitive (interaction) systems and companion technologies consisting of diverse AI sub-components. For instance, the envisaged tools would provide general capabilities for high-level commonsense reasoning about space, events, actions, change, and interaction.

Submitted 5 June, 2013; originally announced June 2013.

Comments: Appears in AAAI-2013 Workshop on: Space, Time, and Ambient Intelligence (STAMI 2013)

arXiv:1306.0665 [pdf, other]

Narrative based Postdictive Reasoning for Cognitive Robotics

Authors: Manfred Eppe, Mehul Bhatt

Abstract: Making sense of incomplete and conflicting narrative knowledge in the presence of abnormalities, unobservable processes, and other real world considerations is a challenge and crucial requirement for cognitive robotics systems. An added challenge, even when suitably specialised action languages and reasoning systems exist, is practical integration and application within large-scale robot control frameworks. In the backdrop of an autonomous wheelchair robot control task, we report on application-driven work to realise postdiction triggered abnormality detection and re-planning for real-time robot control: (a) Narrative-based knowledge about the environment is obtained via a larger smart environment framework; and (b) abnormalities are postdicted from stable-models of an answer-set program corresponding to the robot's epistemic model. The overall reasoning is performed in the context of an approximate epistemic action theory based planner implemented via a translation to answer-set programming.

Submitted 4 June, 2013; originally announced June 2013.

Comments: Commonsense Reasoning Symposium, Ayia Napa, Cyprus, 2013

arXiv:1304.4925 [pdf, other]

h-approximation: History-Based Approximation of Possible World Semantics as ASP

Authors: Manfred Eppe, Mehul Bhatt, Frank Dylla

Abstract: We propose an approximation of the Possible Worlds Semantics (PWS) for action planning. A corresponding planning system is implemented by a transformation of the action specification to an Answer-Set Program. A novelty is support for postdiction wrt. (a) the plan existence problem in our framework can be solved in NP, as compared to $Σ_2^P$ for non-approximated PWS of Baral(2000); and (b) the planner generates optimal plans wrt. a minimal number of actions in $Δ_2^P$. We demo the planning system with standard problems, and illustrate its integration in a larger software framework for robot control in a smart home.

Submitted 14 June, 2013; v1 submitted 17 April, 2013; originally announced April 2013.

Comments: 12th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2013)

Showing 1–35 of 35 results for author: Bhatt, M