
Imagen 3

Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2406.10272 [pdf, other]

Connected Speech-Based Cognitive Assessment in Chinese and English

Authors: Saturnino Luz, Sofia De La Fuente Garcia, Fasih Haider, Davida Fromm, Brian MacWhinney, Alyssa Lanzi, Ya-Ning Chang, Chia-Ju Chou, Yi-Chien Liu

Abstract: We present a novel benchmark dataset and prediction tasks for investigating approaches to assess cognitive function through analysis of connected speech. The dataset consists of speech samples and clinical information for speakers of Mandarin Chinese and English with different levels of cognitive impairment as well as individuals with normal cognition. These data have been carefully matched by age and sex by propensity score analysis to ensure balance and representativity in model training. The prediction tasks encompass mild cognitive impairment diagnosis and cognitive test score prediction. This framework was designed to encourage the development of approaches to speech-based cognitive assessment which generalise across languages. We illustrate it by presenting baseline prediction models that employ language-agnostic and comparable features for diagnosis and cognitive test score prediction. The models achieved an unweighted average recall of 59.2% in diagnosis and a root mean squared error of 2.89 in score prediction.

Submitted 18 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: To appear in Proceedings of Interspeech 2024

ACM Class: J.3; I.5.4

arXiv:2406.07380 [pdf, ps, other]

Addressing Sustainability-IN Software Challenges

Authors: Coral Calero, Félix O. García, Gabriel Alberto García-Mireles, M. Ángeles Moraga, Aurora Vizcaíno

Abstract: In this position paper we address the Software Sustainability from the IN perspective, so that the Software Engineering (SE) community is aware of the need to contribute towards sustainable software companies, which need to adopt a holistic approach to sustainability considering all its dimensions (human, economic and environmental). A series of important challenges to be considered in the coming years are presented, in order that advances in involved SE communities on the subject can be harmonised and used to contribute more effectively to this field of great interest and impact on society.

Submitted 11 June, 2024; originally announced June 2024.

MSC Class: cs.SE

arXiv:2406.03138 [pdf, other]

A Frame-based Attention Interpretation Method for Relevant Acoustic Feature Extraction in Long Speech Depression Detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Speech-based depression detection tools could help early screening of depression. Here, we address two issues that may hinder the clinical practicality of such tools: segment-level labelling noise and a lack of model interpretability. We propose a speech-level Audio Spectrogram Transformer to avoid segment-level labelling. We observe that the proposed model significantly outperforms a segment-level model, providing evidence for the presence of segment-level labelling noise in audio modality and the advantage of longer-duration speech analysis for depression detection. We introduce a frame-based attention interpretation method to extract acoustic features from prediction-relevant waveform signals for interpretation by clinicians. Through interpretation, we observe that the proposed model identifies reduced loudness and F0 as relevant signals of depression, which aligns with the speech characteristics of depressed patients documented in clinical studies.

Submitted 7 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: 5 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2309.13476

arXiv:2404.10179 [pdf, other]

Scaling Instructable Agents Across Many Simulated Worlds

Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi, et al. (68 additional authors not shown)

Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructions across a diverse range of virtual 3D environments, including curated research environments as well as open-ended, commercial video games. Our goal is to develop an instructable agent that can accomplish anything a human can do in any simulated 3D environment. Our approach focuses on language-driven generality while imposing minimal assumptions. Our agents interact with environments in real-time using a generic, human-like interface: the inputs are image observations and language instructions and the outputs are keyboard-and-mouse actions. This general approach is challenging, but it allows agents to ground language across many visually complex and semantically rich environments while also allowing us to readily run agents in new environments. In this paper we describe our motivation and goal, the initial progress we have made, and promising preliminary results on several diverse research environments and a variety of commercial video games.

Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

arXiv:2402.00233 [pdf]

doi 10.26599/TST.2020.9010004

An Architecture for Software Engineering Gamification

Authors: Óscar Pedreira, Félix García, Mario Piattini, Alejandro Cortiñas, Ana Cerdeira-Pena

Abstract: Gamification has been applied in software engineering to improve quality and results by increasing people's motivation and engagement. A systematic mapping has identified research gaps in the field, one of them being the difficulty of creating an integrated gamified environment comprising all the tools of an organization, since most existing gamified tools are custom developments or prototypes. In this paper, we propose a gamification software architecture that allows us to transform the work environment of a software organization into an integrated gamified environment, i.e., the organization can maintain its tools, and the rewards obtained by the users for their actions in different tools will mount up. We developed a gamification engine based on our proposal, and we carried out a case study in which we applied it in a real software development company. The case study shows that the gamification engine has allowed the company to create a gamified workplace by integrating custom developed tools and off-the-shelf tools such as Redmine, TestLink, or JUnit, with the gamification engine. Two main advantages can be highlighted: (i) our solution allows the organization to maintain its current tools, and (ii) the rewards for actions in any tool accumulate in a centralized gamified environment.

Submitted 31 January, 2024; originally announced February 2024.

Journal ref: Tsinghua Science and Technology, 25(6):776-797, December 2020

arXiv:2401.14542 [pdf, other]

Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model

Authors: Julia Barnett, Hugo Flores Garcia, Bryan Pardo

Abstract: Every artist has a creative process that draws inspiration from previous artists and their works. Today, "inspiration" has been automated by generative music models. The black box nature of these models obscures the identity of the works that influence their creative output. As a result, users may inadvertently appropriate, misuse, or copy existing artists' works. We establish a replicable methodology to systematically identify similar pieces of music audio in a manner that is useful for understanding training data attribution. A key aspect of our approach is to harness an effective music audio similarity measure. We compare the effect of applying CLMR and CLAP embeddings to similarity measurement in a set of 5 million audio clips used to train VampNet, a recent open source generative music model. We validate this approach with a human listening study. We also explore the effect that modifications of an audio example (e.g., pitch shifting, time stretching, background noise) have on similarity measurements. This work is foundational to incorporating automated influence attribution into generative modeling, which promises to let model creators and users move from ignorant appropriation to informed creation. Audio samples that accompany this paper are available at https://tinyurl.com/exploring-musical-roots.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 14 pages + references. Under conference review

arXiv:2310.05785 [pdf, other]

Joint object detection and re-identification for 3D obstacle multi-camera systems

Authors: Irene Cortés, Jorge Beltrán, Arturo de la Escalera, Fernando García

Abstract: In recent years, the field of autonomous driving has witnessed remarkable advancements, driven by the integration of a multitude of sensors, including cameras and LiDAR systems, in different prototypes. However, with the proliferation of sensor data comes the pressing need for more sophisticated information processing techniques. This research paper introduces a novel modification to an object detection network that uses camera and lidar information, incorporating an additional branch designed for the task of re-identifying objects across adjacent cameras within the same vehicle while elevating the quality of the baseline 3D object detection outcomes. The proposed methodology employs a two-step detection pipeline: initially, an object detection network is employed, followed by a 3D box estimator that operates on the filtered point cloud generated from the network's detections. Extensive experimental evaluations encompassing both 2D and 3D domains validate the effectiveness of the proposed approach and the results underscore the superiority of this method over traditional Non-Maximum Suppression (NMS) techniques, with an improvement of more than 5% in the car category in the overlapping areas.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2309.13476 [pdf, other]

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Authors: Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Abstract: Depression is a common mental disorder. Automatic depression detection tools using speech, enabled by machine learning, help early screening of depression. This paper addresses two limitations that may hinder the clinical implementations of such tools: noise resulting from segment-level labelling and a lack of model interpretability. We propose a bi-modal speech-level transformer to avoid segment-level labelling and introduce a hierarchical interpretation approach to provide both speech-level and sentence-level interpretations, based on gradient-weighted attention maps derived from all attention layers to track interactions between input features. We show that the proposed model outperforms a model that learns at a segment level ($p$=0.854, $r$=0.947, $F1$=0.897 compared to $p$=0.732, $r$=0.808, $F1$=0.768). For model interpretation, using one true positive sample, we show which sentences within a given speech are most relevant to depression detection; and which text tokens and Mel-spectrogram regions within these sentences are most relevant to depression detection. These interpretations allow clinicians to verify the validity of predictions made by depression detection tools, promoting their clinical implementations.

Submitted 6 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: 5 pages, 3 figures, submitted to IEEE International Conference on Acoustics, Speech, and Signal Processing

ACM Class: F.2.2; I.2.7

arXiv:2307.04686 [pdf, other]

VampNet: Music Generation via Masked Acoustic Token Modeling

Authors: Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo

Abstract: We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.

Submitted 12 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

arXiv:2301.02487 [pdf, other]

Watching your call: Breaking VoLTE Privacy in LTE/5G Networks

Authors: Zishuai Cheng, Mihai Ordean, Flavio D. Garcia, Baojiang Cui, Dominik Rys

Abstract: Voice over LTE (VoLTE) and Voice over NR (VoNR) are two similar technologies that have been widely deployed by operators to provide a better calling experience in LTE and 5G networks, respectively. The VoLTE/NR protocols rely on the security features of the underlying LTE/5G network to protect users' privacy such that nobody can monitor calls and learn details about call times, duration, and direction. In this paper, we introduce a new privacy attack which enables adversaries to analyse encrypted LTE/5G traffic and recover any VoLTE/NR call details. We achieve this by implementing a novel mobile-relay adversary which is able to remain undetected by using an improved physical layer parameter guessing procedure. This adversary facilitates the recovery of encrypted configuration messages exchanged between victim devices and the mobile network. We further propose an identity mapping method which enables our mobile-relay adversary to link a victim's network identifiers to the phone number efficiently, requiring a single VoLTE protocol message. We evaluate the real-world performance of our attacks using four modern commercial off-the-shelf phones and two representative, commercial network carriers. We collect over 60 hours of traffic between the phones and the mobile networks and execute 160 VoLTE calls, which we use to successfully identify patterns in the physical layer parameter allocation and in VoLTE traffic, respectively. Our real-world experiments show that our mobile-relay works as expected in all test cases, and the VoLTE activity logs recovered describe the actual communication with 100% accuracy. Finally, we show that we can link network identifiers such as International Mobile Subscriber Identities (IMSI), Subscriber Concealed Identifiers (SUCI) and/or Globally Unique Temporary Identifiers (GUTI) to phone numbers while remaining undetected by the victim.

Submitted 6 January, 2023; originally announced January 2023.

arXiv:2208.08655 [pdf, other]

Generating Synthetic Clinical Data that Capture Class Imbalanced Distributions with Generative Adversarial Networks: Example using Antiretroviral Therapy for HIV

Authors: Nicholas I-Hsien Kuo, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Rolf Kaiser, Mark Polizzotto, Louisa Jorm, Sebastiano Barbieri

Abstract: Clinical data usually cannot be freely distributed due to their highly confidential nature and this hampers the development of machine learning in the healthcare domain. One way to mitigate this problem is by generating realistic synthetic datasets using generative adversarial networks (GANs). However, GANs are known to suffer from mode collapse thus creating outputs of low diversity. This lowers the quality of the synthetic healthcare data, and may cause it to omit patients of minority demographics or neglect less common clinical practices. In this paper, we extend the classic GAN setup with an additional variational autoencoder (VAE) and include an external memory to replay latent features observed from the real samples to the GAN generator. Using antiretroviral therapy for human immunodeficiency virus (ART for HIV) as a case study, we show that our extended setup overcomes mode collapse and generates a synthetic dataset that accurately describes severely imbalanced class distributions commonly found in real-world clinical variables. In addition, we demonstrate that our synthetic dataset is associated with a very low patient disclosure risk, and that it retains a high level of utility from the ground truth dataset to support the development of downstream machine learning algorithms.

Submitted 20 January, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: In the near future, we will make our codes and synthetic datasets publicly available to facilitate future research. Follow us on https://healthgym.ai/

arXiv:2208.03528 [pdf, other]

MetaEmu: An Architecture Agnostic Rehosting Framework for Automotive Firmware

Authors: Zitai Chen, Sam L. Thomas, Flavio D. Garcia

Abstract: In this paper we present MetaEmu, an architecture-agnostic emulator synthesizer geared towards rehosting and security analysis of automotive firmware. MetaEmu improves over existing rehosting environments in two ways: Firstly, it solves the hitherto open-problem of a lack of generic Virtual Execution Environments (VXEs) for rehosting by synthesizing processor simulators from Ghidra's language definitions. In doing so, MetaEmu can simulate any processor supported by a vast and growing library of open-source definitions. In MetaEmu, we use a specification-based approach to cover peripherals, execution models, and analyses, which allows our framework to be easily extended. Secondly, MetaEmu can rehost and analyze multiple targets, each of different architecture, simultaneously, and share analysis facts between each target's analysis environment, a technique we call inter-device analysis. We show that the flexibility afforded by our approach does not lead to a performance trade-off -- MetaEmu lifts rehosted firmware to an optimized intermediate representation, and provides performance comparable to existing emulation tools, such as Unicorn. Our evaluation spans five different architectures, bare-metal and RTOS-based firmware, and three kinds of automotive Electronic Control Unit (ECU) from four distinct vendors -- none of which can be rehosted or emulated by current tools, due to lack of processor support. Further, we show how MetaEmu enables a diverse set of analyses by implementing a fuzzer, a symbolic executor for solving peripheral access checks, a CAN ID reverse engineering tool, and an inter-device coverage tracker.

Submitted 6 August, 2022; originally announced August 2022.

arXiv:2205.12152 [pdf, other]

Performance analysis of downlink MIMO-NOMA systems over Weibull fading channels

Authors: Lenin Patricio Jiménez Jiménez, Fernando Darío Almeida García, Maria Cecilia Luna Alvarado, Gustavo Fraidenraich, Michel Daoud Yacoub, José Cândido Silveira Santos Filho, Eduardo Rodrigues de Lima

Abstract: This work analyzes the performance of a downlink multi-user multiple-input multiple-output (MU-MIMO) non-orthogonal multiple access (NOMA) communications system. To reduce hardware complexity and exploit antenna diversity, we consider a transmit antenna selection (TAS) scheme and equal-gain combining (EGC) receivers. Further, we consider Weibull-distributed fading channels to account for non-linearities of the propagation medium and to cover, as special cases, important fading scenarios such as Rayleigh and exponential models. Performance metrics such as the outage probability (OP) and the average bit error rate (ABER) are derived in an exact manner. An asymptotic analysis for the OP and for the ABER is also carried out. Moreover, we obtain exact expressions for the probability density function (PDF) and the cumulative distribution function (CDF) of the end-to-end signal-to-noise ratio (SNR). Interestingly, our results indicate that, except for the first user (nearest user), in a high-SNR regime the ABER achieves a performance floor that depends solely on the user's power allocation coefficient and on the type of modulation, and not on the channel statistics or the amount of transmit and receive antennas. To the best of the authors' knowledge, no performance analyses have been reported in the literature for the considered scenario. The validity of all our expressions is confirmed via Monte-Carlo simulations.

Submitted 8 August, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 6 pages, 5 figures, submitted to IEEE GLOBECOM 2022 - WORKSHOP

arXiv:2205.06567 [pdf, other]

Millimeter-Wave Automotive Radar Spoofing

Authors: Mihai Ordean, Flavio D. Garcia

Abstract: Millimeter-wave radar systems are one of the core components of the safety-critical Advanced Driver Assistant System (ADAS) of a modern vehicle. Due to their ability to operate efficiently despite bad weather conditions and poor visibility, they are often the only reliable sensor a car has to detect and evaluate potential dangers in the surrounding environment. In this paper, we propose several attacks against automotive radars for the purposes of assessing their reliability in real-world scenarios. Using COTS hardware, we are able to successfully interfere with automotive-grade FMCW radars operating in the commonly used 77GHz frequency band, deployed in real-world, truly wireless environments. Our strongest type of interference is able to trick the victim into detecting virtual (moving) objects. We also extend this attack with a novel method that leverages noise to remove real-world objects, thus complementing the aforementioned object spoofing attack. We evaluate the viability of our attacks in two ways. First, we establish a baseline by implementing and evaluating an unrealistically powerful adversary which requires synchronization to the victim in a limited setup that uses wire-based chirp synchronization. Later, we implement, for the first time, a truly wireless attack that evaluates a weaker but realistic adversary which is non-synchronized and does not require any adjustment feedback from the victim. Finally, we provide theoretical fundamentals for our findings, and discuss the efficiency of potential countermeasures against the proposed attacks. We plan to release our software as open-source.

Submitted 13 May, 2022; originally announced May 2022.

arXiv:2205.00763 [pdf, other]

Data-driven emotional body language generation for social robotics

Authors: Mina Marmpena, Fernando Garcia, Angelica Lim, Nikolas Hemion, Thomas Wennekers

Abstract: In social robotics, endowing humanoid robots with the ability to generate bodily expressions of affect can improve human-robot interaction and collaboration, since humans attribute, and perhaps subconsciously anticipate, such traces to perceive an agent as engaging, trustworthy, and socially present. Robotic emotional body language needs to be believable, nuanced and relevant to the context. We implemented a deep learning data-driven framework that learns from a few hand-designed robotic bodily expressions and can generate numerous new ones of similar believability and lifelikeness. The framework uses the Conditional Variational Autoencoder model and a sampling approach based on the geometric properties of the model's latent space to condition the generative process on targeted levels of valence and arousal. The evaluation study found that the anthropomorphism and animacy of the generated expressions are not perceived differently from the hand-designed ones, and the emotional conditioning was adequately differentiable between most levels except the pairs of neutral-positive valence and low-medium arousal. Furthermore, an exploratory analysis of the results reveals a possible impact of the conditioning on the perceived dominance of the robot, as well as on the participants' attention.

Submitted 2 May, 2022; originally announced May 2022.

Comments: For the associated video of the generated animations, see https://youtu.be/wmLT8FARSk0 and for a repository of the training data, see https://github.com/minamar/rebl-pepper-data

ACM Class: I.2.9

arXiv:2203.06369 [pdf, other]

The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

Authors: Nicholas I-Hsien Kuo, Mark N. Polizzotto, Simon Finfer, Federico Garcia, Anders Sönnerborg, Maurizio Zazzi, Michael Böhm, Louisa Jorm, Sebastiano Barbieri

Abstract: In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data are usually not openly available due to their highly confidential nature. This has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym - a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy in ambulatory care. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, and correlations between variables and trends over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.

Submitted 12 March, 2022; originally announced March 2022.

arXiv:2110.13323 [pdf, other]

Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit

Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo

Abstract: We present a software framework that integrates neural networks into the popular open-source audio editing software, Audacity, with a minimal amount of developer effort. In this paper, we showcase some example use cases for both end-users and neural network developers. We hope that this work fosters a new level of interactivity between deep learning practitioners and end-users.

Submitted 28 October, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

arXiv:2109.11224 [pdf, other]

A Novel Open Set Energy-based Flow Classifier for Network Intrusion Detection

Authors: Manuela M. C. Souza, Camila Pontes, Joao Gondim, Luis P. F. Garcia, Luiz DaSilva, Marcelo A. Marotta

Abstract: Network intrusion detection systems (NIDS) are one of many solutions that make up a computer security system. Several machine learning-based NIDS have been proposed in recent years, but most of them were developed and evaluated under the assumption that the training context is similar to the test context. In real networks, this assumption is false, given the emergence of new attacks and variants of known attacks. To deal with this reality, the open set recognition field, which is the most general task of recognizing classes not seen during training in any domain, began to gain importance in NIDS research. Yet, existing solutions are often bounded to high temporal complexities and performance bottlenecks. In this work, we propose an algorithm to be used in NIDS that performs open set recognition. Our proposal is an adaptation of the single-class Energy-based Flow Classifier (EFC), which proved to be an algorithm with strong generalization capability and low computational cost. The new version of EFC correctly classifies not only known attacks, but also unknown ones, and differs from other proposals from the literature by presenting a single layer with low temporal complexity. Our proposal was evaluated against well-established multi-class algorithms and as an open set classifier. It proved to be an accurate classifier in both evaluations, similar to the state of the art. As a conclusion of our work, we consider EFC a promising algorithm to be used in NIDS for its high performance and applicability in real networks.

Submitted 26 April, 2022; v1 submitted 23 September, 2021; originally announced September 2021.

arXiv:2109.10664 [pdf]

A deep neural network for multi-species fish detection using multiple acoustic cameras

Authors: Guglielmo Fernandez Garcia, François Martignac, Marie Nevoux, Laurent Beaulaton, Thomas Corpetti

Abstract: Underwater acoustic cameras are high potential devices for many applications in ecology, notably for fisheries management and monitoring. However how to extract such data into high value information without a time-consuming entire dataset reading by an operator is still a challenge. Moreover the analysis of acoustic imaging, due to its low signal-to-noise ratio, is a perfect training ground for experimenting with new approaches, especially concerning Deep Learning techniques. We present hereby a novel approach that takes advantage of both CNN (Convolutional Neural Network) and classical CV (Computer Vision) techniques, able to detect a generic class ''fish'' in acoustic video streams. The pipeline pre-treats the acoustic images to extract 2 features, in order to localise the signals and improve the detection performances. To ensure the performances from an ecological point of view, we propose also a two-step validation, one to validate the results of the trainings and one to test the method on a real-world scenario. The YOLOv3-based model was trained with data of fish from multiple species recorded by the two common acoustic cameras, DIDSON and ARIS, including species of high ecological interest, as Atlantic salmon or European eels. The model we developed provides satisfying results detecting almost 80% of fish and minimizing the false positive rate, however the model is much less efficient for eel detections on ARIS videos.
The first CNN pipeline for fish monitoring exploiting video data from two models of acoustic cameras satisfies most of the required features. Many challenges are still present, such as the automation of fish species identification through a multiclass model. However the results point a new solution for dealing with complex data, such as sonar data, which can also be reapplied in other cases where the signal-to-noise ratio is a challenge.

Submitted 22 September, 2021; originally announced September 2021.

arXiv:2107.07029 [pdf, other]

Leveraging Hierarchical Structures for Few-Shot Musical Instrument Recognition

Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Bryan Pardo

Abstract: Deep learning work on musical instrument recognition has generally focused on instrument classes for which we have abundant data. In this work, we exploit hierarchical relationships between instruments in a few-shot learning setup to enable classification of a wider set of musical instruments, given a few examples at inference. We apply a hierarchical loss function to the training of prototypical networks, combined with a method to aggregate prototypes hierarchically, mirroring the structure of a predefined musical instrument hierarchy. These extensions require no changes to the network architecture and new levels can be easily added or removed. Compared to a non-hierarchical few-shot baseline, our method leads to a significant increase in classification accuracy and a significant decrease in mistake severity on instrument classes unseen in training.

Submitted 29 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

arXiv:2106.09455 [pdf]

Conference proceedings KI4Industry AI for SMEs -- The online congress for practical entry into AI for SMEs

Authors: Michael Arnemann, Per Olof Beckemeier, Thomas Bertram, Michael Eder, Maximilian Erschig, Matthias Feiner, Francisco Javier Fernandez Garcia, Frederic Foerster, Ruediger Haas, Martin Kipfmueller, Jan Kotschenreuther, Bernd Langer, Ivan Lozada Rodriguez, Thomas Meibert, Simon Ottenhaus, Stefan Paschek, Lars Pfotzer, Michael M. Roth, Tim Schanz, Philip Scherer, Janine Schwienke, Martin Simon, Robin Tenscher-Philipp

Abstract: The Institute of Materials and Processes, IMP, of the University of Applied Sciences in Karlsruhe, Germany in cooperation with VDI Verein Deutscher Ingenieure e.V, AEN Automotive Engineering Network and their cooperation partners present their competences of AI-based solution approaches in the production engineering field. The online congress KI 4 Industry on November 12 and 13, 2020, showed what opportunities the use of artificial intelligence offers for medium-sized manufacturing companies, SMEs, and where potential fields of application lie. The main purpose of KI 4 Industry is to increase the transfer of knowledge, research and technology from universities to small and medium-sized enterprises, to demystify the term AI and to encourage companies to use AI-based solutions in their own value chain or in their products.

Submitted 5 August, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

Comments: Editors: Matthias Feiner and Manuel Schoellhorn, 72 pages, 48 figures, in German, Conference proceedings KI 4 Industry, 79 pages in total

arXiv:2104.11448 [pdf]

How to help university students to manage their interruptions and improve their attention and time management

Authors: Aurora Vizcaíno, Ignacio García-Rodríguez de Guzmán, Antonio Manjavacas, Félix García, José A. Cruz-Lemus, Manuel Ángel Serrano

Abstract: Technology has changed both our way of life and the way in which we learn. Students now attend lectures with laptops and mobile phones, and this situation is accentuated in the case of students on Computer Science degrees, since they require their computers in order to participate in both theoretical and practical lessons. Problems, however, arise when the students' social networks are opened on their computers and they receive notifications that interrupt their work. We set up a workshop regarding time, thoughts and attention management with the objective of teaching our students techniques that would allow them to manage interruptions, concentrate better and definitively make better use of their time. Those who took part in the workshop were then evaluated to discover its effects. The results obtained are quite optimistic and are described in this paper with the objective of encouraging other universities to perform similar initiatives.

Submitted 23 April, 2021; originally announced April 2021.

Comments: 15 pages + appendices + references, 10 tables, 6 figures. Exposed in https://jenui2020.uv.es/

arXiv:2104.11021 [pdf, other]

Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird's Eye View

Authors: Alejandro Barrera, Jorge Beltrán, Carlos Guindel, Jose Antonio Iglesias, Fernando García

Abstract: The performance of object detection methods based on LiDAR information is heavily impacted by the availability of training data, usually limited to certain laser devices. As a result, the use of synthetic data is becoming popular when training neural network models, as both sensor specifications and driving scenarios can be generated ad-hoc. However, bridging the gap between virtual and real environments is still an open challenge, as current simulators cannot completely mimic real LiDAR operation. To tackle this issue, domain adaptation strategies are usually applied, obtaining remarkable results on vehicle detection when applied to range view (RV) and bird's eye view (BEV) projections while failing for smaller road agents. In this paper, we present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process. The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark. The obtained results show the advantages of the proposed method over the existing alternatives.

Submitted 22 April, 2021; originally announced April 2021.

Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC2021)

arXiv:2103.15871 [pdf, other]

Industry Scale Semi-Supervised Learning for Natural Language Understanding

Authors: Luoxin Chen, Francisco Garcia, Varun Kumar, He Xie, Jianhua Lu

Abstract: This paper presents a production Semi-Supervised Learning (SSL) pipeline based on the student-teacher framework, which leverages millions of unlabeled examples to improve Natural Language Understanding (NLU) tasks. We investigate two questions related to the use of unlabeled data in production SSL context: 1) how to select samples from a huge unlabeled data pool that are beneficial for SSL training, and 2) how do the selected data affect the performance of different state-of-the-art SSL techniques. We compare four widely used SSL techniques, Pseudo-Label (PL), Knowledge Distillation (KD), Virtual Adversarial Training (VAT) and Cross-View Training (CVT) in conjunction with two data selection methods including committee-based selection and submodular optimization based selection. We further examine the benefits and drawbacks of these techniques when applied to intent classification (IC) and named entity recognition (NER) tasks, and provide guidelines specifying when each of these methods might be beneficial to improve large scale NLU systems.

Submitted 29 March, 2021; originally announced March 2021.

Comments: NAACL 2021 Industry track

arXiv:2102.09364 [pdf]

Ethics as a service: a pragmatic operationalisation of AI Ethics

Authors: Jessica Morley, Anat Elhalal, Francesca Garcia, Libby Kinsey, Jakob Mokander, Luciano Floridi

Abstract: As the range of potential uses for Artificial Intelligence (AI), in particular machine learning (ML), has increased, so has awareness of the associated ethical issues. This increased awareness has led to the realisation that existing legislation and regulation provides insufficient protection to individuals, groups, society, and the environment from AI harms. In response to this realisation, there has been a proliferation of principle-based ethics codes, guidelines and frameworks. However, it has become increasingly clear that a significant gap exists between the theory of AI ethics principles and the practical design of AI systems. In previous work, we analysed whether it is possible to close this gap between the what and the how of AI ethics through the use of tools and methods designed to help AI developers, engineers, and designers translate principles into practice. We concluded that this method of closure is currently ineffective as almost all existing translational tools and methods are either too flexible (and thus vulnerable to ethics washing) or too strict (unresponsive to context). This raised the question: if, even with technical guidance, AI ethics is challenging to embed in the process of algorithmic design, is the entire pro-ethical design endeavour rendered futile? And, if no, then how can AI ethics be made useful for AI practitioners?
This is the question we seek to address here by exploring why principles and technical translational tools are still needed even if they are limited, and how these limitations can be potentially overcome by providing theoretical grounding of a concept that has been termed Ethics as a Service.

Submitted 11 February, 2021; originally announced February 2021.

Comments: 21 pages, first draft

arXiv:2101.04431 [pdf, other]

doi 10.1109/TITS.2022.3155228

Automatic Extrinsic Calibration Method for LiDAR and Camera Sensor Setups

Authors: Jorge Beltrán, Carlos Guindel, Arturo de la Escalera, Fernando García

Abstract: Most sensor setups for onboard autonomous perception are composed of LiDARs and vision systems, as they provide complementary information that improves the reliability of the different algorithms necessary to obtain a robust scene understanding. However, the effective use of information from different sources requires an accurate calibration between the sensors involved, which usually implies a tedious and burdensome process. We present a method to calibrate the extrinsic parameters of any pair of sensors involving LiDARs, monocular or stereo cameras, of the same or different modalities. The procedure is composed of two stages: first, reference points belonging to a custom calibration target are extracted from the data provided by the sensors to be calibrated, and second, the optimal rigid transformation is found through the registration of both point sets. The proposed approach can handle devices with very different resolutions and poses, as usually found in vehicle setups. In order to assess the performance of the proposed method, a novel evaluation suite built on top of a popular simulation framework is introduced. Experiments on the synthetic environment show that our calibration algorithm significantly outperforms existing methods, whereas real data tests corroborate the results obtained in the evaluation suite. Open-source code is available at https://github.com/beltransen/velo2cam_calibration

Submitted 15 March, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

Comments: Published on IEEE Transactions on Intelligent Transportation Systems

Journal ref: IEEE Transactions on Intelligent Transportation Systems, 2022

arXiv:2101.01835 [pdf, other]

Risk markers by sex for in-hospital mortality in patients with acute coronary syndrome: a machine learning approach

Authors: Blanca Vazquez, Gibran Fuentes-Pineda, Fabian Garcia, Gabriela Borrayo, Juan Prohias

Abstract: Background: Several studies have highlighted the importance of considering sex differences in the diagnosis and treatment of Acute Coronary Syndrome (ACS). However, the identification of sex-specific risk markers in ACS sub-populations has been scarcely studied. The present study aims to explore machine learning (ML) models to identify in-hospital mortality markers for women and men in ACS sub-populations collected from a public database of electronic health records (EHR). Methods: We extracted 1,299 patients with ST-elevation myocardial infarction (STEMI) and 2,820 patients with non-ST-elevation myocardial infarction (NSTEMI) from the Medical Information Mart for Intensive Care (MIMIC)-III database. We trained and validated mortality prediction models and used an interpretability technique to identify sex-specific markers for each sub-population. Results: The models based on eXtreme Gradient Boosting (XGBoost) achieved the highest performance: area under the curve (AUC) = 0.94 (95\% CI:0.84-0.96) for STEMI and AUC = 0.94 (95\% CI:0.80-0.90) for NSTEMI. For STEMI, the top markers in women are chronic kidney failure, high heart rate, and age over 70 years. For men, the top markers are acute kidney failure, high troponin T levels, and age over 75 years. However, for NSTEMI, the top markers in women are low troponin levels, high urea levels, and age over 80 years. For men, the top markers are high heart rate, creatinine levels, and age over 70 years.
Conclusions: Our results show possible significant and coherent sex-specific risk markers of different ACS sub-populations by interpreting ML mortality models trained on EHRs. Differences are observed in the identified risk markers between women and men, highlighting the importance of considering sex-specific markers in implementing more appropriate treatment strategies and better clinical outcomes.

Submitted 17 November, 2021; v1 submitted 5 January, 2021; originally announced January 2021.

Comments: Accepted article: Informatics in Medicine Unlocked

ACM Class: I.2.1

arXiv:2012.03855 [pdf, other]

The Lab vs The Crowd: An Investigation into Data Quality for Neural Dialogue Models

Authors: José Lopes, Francisco J. Chiyah Garcia, Helen Hastie

Abstract: Challenges around collecting and processing quality data have hampered progress in data-driven dialogue models. Previous approaches are moving away from costly, resource-intensive lab settings, where collection is slow but where the data is deemed of high quality. The advent of crowd-sourcing platforms, such as Amazon Mechanical Turk, has provided researchers with an alternative cost-effective and rapid way to collect data. However, the collection of fluid, natural spoken or textual interaction can be challenging, particularly between two crowd-sourced workers. In this study, we compare the performance of dialogue models for the same interaction task but collected in two different settings: in the lab vs. crowd-sourced. We find that fewer lab dialogues are needed to reach similar accuracy, less than half the amount of lab data as crowd-sourced data. We discuss the advantages and disadvantages of each data collection method.

Submitted 7 December, 2020; originally announced December 2020.

Comments: Accepted at Human in the Loop Dialogue Systems Workshop @NeurIPS 2020

arXiv:2011.02382 [pdf, other]

doi 10.1016/j.compmedimag.2020.101816

Noise Reduction to Compute Tissue Mineral Density and Trabecular Bone Volume Fraction from Low Resolution QCT

Authors: Felix Thomsen, José M. Fuertes García, Manuel Lucena, Juan Pisula, Rodrigo de Luis García, Jan Broggrefe, Claudio Delrieux

Abstract: We propose a 3D neural network with specific loss functions for quantitative computed tomography (QCT) noise reduction to compute micro-structural parameters such as tissue mineral density (TMD) and bone volume ratio (BV/TV) with significantly higher accuracy than using no or standard noise reduction filters. The vertebra-phantom study contained high resolution peripheral and clinical CT scans with simulated in vivo CT noise and nine repetitions of three different tube currents (100, 250 and 360 mAs). Five-fold cross validation was performed on 20466 purely spongy pairs of noisy and ground-truth patches. Comparison of training and test errors revealed high robustness against over-fitting. While not showing effects for the assessment of BMD and voxel-wise densities, the filter improved thoroughly the computation of TMD and BV/TV with respect to the unfiltered data. Root-mean-square and accuracy errors of low resolution TMD and BV/TV decreased to less than 17% of the initial values. Furthermore filtered low resolution scans revealed still more TMD- and BV/TV-relevant information than high resolution CT scans, either unfiltered or filtered with two state-of-the-art standard denoising methods. The proposed architecture is threshold and rotational invariant, applicable on a wide range of image resolutions at once, and likely serves for an accurate computation of further micro-structural parameters. Furthermore, it is less prone for over-fitting than neural networks that compute structural parameters directly.
In conclusion, the method is potentially important for the diagnosis of osteoporosis and other bone diseases since it allows to assess relevant 3D micro-structural information from standard low exposure CT protocols such as 100 mAs and 120 kVp.

Submitted 4 November, 2020; originally announced November 2020.

Comments: A revised version of this manuscript was accepted for publication in Computerized Medical Imaging and Graphics

arXiv:2010.06047 [pdf, other]

Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review

Authors: Sofia de la Fuente Garcia, Craig Ritchie, Saturnino Luz

Abstract: Language is a valuable source of clinical information in Alzheimer's Disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in the context of Alzheimer's Disease, detailing current research procedures, highlighting their limitations and suggesting strategies to address them. We conducted a systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase) and Web of Science. Bibliographies of relevant papers were screened until December 2019. From 3,654 search results 51 articles were selected against the eligibility criteria. Four tables summarise their findings: study details (aim, population, interventions, comparisons, methods and outcomes), data details (size, type, modalities, annotation, balance, availability and language of study), methodology (pre-processing, feature generation, machine learning, evaluation and results) and clinical applicability (research implications, clinical potential, risk of bias and strengths/limitations). While promising results are reported across nearly all 51 studies, very few have been implemented in clinical research or practice.
We concluded that the main limitations of the field are poor standardisation, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Attempts to close these gaps should support translation of future research into clinical practice.

Submitted 12 October, 2020; originally announced October 2020.

Comments: Pre-print submitted to the Journal of Alzheimer's Disease

ACM Class: J.3; I.2.7; I.2.6; I.5.4

arXiv:2008.09672 [pdf, other]

Towards Autonomous Driving: a Multi-Modal 360$^{\circ}$ Perception Proposal

Authors: Jorge Beltrán, Carlos Guindel, Irene Cortés, Alejandro Barrera, Armando Astudillo, Jesús Urdiales, Mario Álvarez, Farid Bekka, Vicente Milanés, Fernando García

Abstract: In this paper, a multi-modal 360$^{\circ}$ framework for 3D object detection and tracking for autonomous vehicles is presented. The process is divided into four main stages. First, images are fed into a CNN network to obtain instance segmentation of the surrounding road participants. Second, LiDAR-to-image association is performed for the estimated mask proposals. Then, the isolated points of every object are processed by a PointNet ensemble to compute their corresponding 3D bounding boxes and poses. Lastly, a tracking stage based on Unscented Kalman Filter is used to track the agents along time. The solution, based on a novel sensor fusion configuration, provides accurate and reliable road environment detection. A wide variety of tests of the system, deployed in an autonomous vehicle, have successfully assessed the suitability of the proposed perception stack in a real autonomous driving application.

Submitted 21 August, 2020; originally announced August 2020.

Comments: Accepted for publication in IEEE ITSC 2020

arXiv:2006.09090 [pdf, other]

doi 10.1109/ITSC.2019.8917501

Response of Vulnerable Road Users to Visual Information from Autonomous Vehicles in Shared Spaces

Authors: Walter Morales Alvarez, Miguel Ángel de Miguel, Fernando García, Cristina Olaverri-Monreal

Abstract: Completely unmanned autonomous vehicles have been anticipated for a while. Initially, these are expected to drive only under certain conditions on some roads, and advanced functionality is required to cope with the ever-increasing challenges of safety. To enhance the public's perception of road safety and trust in new vehicular technologies, we investigate in this paper the effect of several interaction paradigms with vulnerable road users by developing and applying algorithms for the automatic analysis of pedestrian body language. We assess behavioral patterns and determine the impact of the coexistence of AVs and other road users on general road safety in a shared space for VRUs and vehicles. Results showed that the implementation of visual communication cues for interacting with VRUs is not necessarily required for a shared space in which informal traffic rules apply.

Submitted 22 July, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

Comments: Published paper in the IEEE Intelligent Transportation Systems Conference - ITSC 2019

Journal ref: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019, pp. 3714-3719

arXiv:2004.00689 [pdf, other]

doi 10.1145/3319502.3374789

Robots in the Danger Zone: Exploring Public Perception through Engagement

Authors: David A. Robb, Muneeb I. Ahmad, Carlo Tiseo, Simona Aracri, Alistair C. McConnell, Vincent Page, Christian Dondrup, Francisco J. Chiyah Garcia, Hai-Nguyen Nguyen, Èric Pairet, Paola Ardón Ramírez, Tushar Semwal, Hazel M. Taylor, Lindsay J. Wilson, David Lane, Helen Hastie, Katrin Lohan

Abstract: Public perceptions of Robotics and Artificial Intelligence (RAI) are important in the acceptance, uptake, government regulation and research funding of this technology. Recent research has shown that the public's understanding of RAI can be negative or inaccurate. We believe effective public engagement can help ensure that public opinion is better informed. In this paper, we describe our first iteration of a high throughput in-person public engagement activity. We describe the use of a light touch quiz-format survey instrument to integrate in-the-wild research participation into the engagement, allowing us to probe both the effectiveness of our engagement strategy, and public perceptions of the future roles of robots and humans working in dangerous settings, such as in the off-shore energy sector. We critique our methods and share interesting results into generational differences within the public's view of the future of Robotics and AI in hazardous environments. These findings include that older people's views about the future of robots in hazardous environments were not swayed by exposure to our exhibit, while the views of younger people were affected by our exhibit, leading us to consider carefully in future how to more effectively engage with and inform older people.

Submitted 1 April, 2020; originally announced April 2020.

Comments: Accepted in HRI 2020, Keywords: Human robot interaction, robotics, artificial intelligence, public engagement, public perceptions of robots, robotics and society

ACM Class: K.4.m; I.2.9

Journal ref: In Human-Robot Interaction HRI 2020, ACM, NY, USA, 10 pages

arXiv:2003.05995 [pdf, other]

CRWIZ: A Framework for Crowdsourcing Real-Time Wizard-of-Oz Dialogues

Authors: Francisco J. Chiyah Garcia, José Lopes, Xingkun Liu, Helen Hastie

Abstract: Large corpora of task-based and open-domain conversational dialogues are hugely valuable in the field of data-driven dialogue systems. Crowdsourcing platforms, such as Amazon Mechanical Turk, have been an effective method for collecting such large amounts of data. However, difficulties arise when task-based dialogues require expert domain knowledge or rapid access to domain-relevant information, such as databases for tourism. This will become even more prevalent as dialogue systems become increasingly ambitious, expanding into tasks with high levels of complexity that require collaboration and forward planning, such as in our domain of emergency response. In this paper, we propose CRWIZ: a framework for collecting real-time Wizard of Oz dialogues through crowdsourcing for collaborative, complex tasks. This framework uses semi-guided dialogue to avoid interactions that breach procedures and processes only known to experts, while enabling the capture of a wide variety of interactions. The framework is available at https://github.com/JChiyah/crwiz

Submitted 12 March, 2020; originally announced March 2020.

Comments: 10 pages, 5 figures. To Appear in LREC 2020

arXiv:2003.05870 [pdf, other]

Natural Language Interaction to Facilitate Mental Models of Remote Robots

Authors: Francisco J. Chiyah Garcia, José Lopes, Helen Hastie

Abstract: Increasingly complex and autonomous robots are being deployed in real-world environments with far-reaching consequences. High-stakes scenarios, such as emergency response or offshore energy platform and nuclear inspections, require robot operators to have clear mental models of what the robots can and can't do. However, operators are often not the original designers of the robots and thus, they do not necessarily have such clear mental models, especially if they are novice users. This lack of mental model clarity can slow adoption and can negatively impact human-machine teaming. We propose that interaction with a conversational assistant, who acts as a mediator, can help the user with understanding the functionality of remote robots and increase transparency through natural language explanations, as well as facilitate the evaluation of operators' mental models.

Submitted 12 March, 2020; originally announced March 2020.

Comments: In Workshop on Mental Models of Robots at HRI 2020

arXiv:2003.04188 [pdf, other]

BirdNet+: End-to-End 3D Object Detection in LiDAR Bird's Eye View

Authors: Alejandro Barrera, Carlos Guindel, Jorge Beltrán, Fernando García

Abstract: On-board 3D object detection in autonomous vehicles often relies on geometry information captured by LiDAR devices. Albeit image features are typically preferred for detection, numerous approaches take only spatial data as input. Exploiting this information in inference usually involves the use of compact representations such as the Bird's Eye View (BEV) projection, which entails a loss of information and thus hinders the joint inference of all the parameters of the objects' 3D boxes. In this paper, we present a fully end-to-end 3D object detection framework that can infer oriented 3D boxes solely from BEV images by using a two-stage object detector and ad-hoc regression branches, eliminating the need for a post-processing stage. The method outperforms its predecessor (BirdNet) by a large margin and obtains state-of-the-art results on the KITTI 3D Object Detection Benchmark for all the categories in evaluation.

Submitted 9 March, 2020; originally announced March 2020.

Comments: Submitted to IEEE International Conference on Intelligent Transportation Systems (ITSC2020)

arXiv:2002.08239 [pdf, other]

siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection

Authors: Irene Cortes, Jorge Beltran, Arturo de la Escalera, Fernando Garcia

Abstract: The rapid development of embedded hardware in autonomous vehicles broadens their computational capabilities, thus bringing the possibility to mount more complete sensor setups able to handle driving scenarios of higher complexity. As a result, new challenges such as multiple detections of the same object have to be addressed. In this work, a siamese network is integrated into the pipeline of a well-known 3D object detector approach to suppress duplicate proposals coming from different cameras via re-identification. Additionally, associations are exploited to enhance the 3D box regression of the object by aggregating their corresponding LiDAR frustums. The experimental evaluation on the nuScenes dataset shows that the proposed method outperforms traditional NMS approaches.

Submitted 19 February, 2020; originally announced February 2020.

Comments: Submitted to IEEE Intelligent Vehicles Symposium 2020 (IV2020)

arXiv:2001.01577 [pdf, other]

Learning Reusable Options for Multi-Task Reinforcement Learning

Authors: Francisco M. Garcia, Chris Nota, Philip S. Thomas

Abstract: Reinforcement learning (RL) has become an increasingly active area of research in recent years. Although there are many algorithms that allow an agent to solve tasks efficiently, they often ignore the possibility that prior experience related to the task at hand might be available. For many practical applications, it might be unfeasible for an agent to learn how to solve a task from scratch, given that it is generally a computationally expensive process; however, prior experience could be leveraged to make these problems tractable in practice. In this paper, we propose a framework for exploiting existing experience by learning reusable options. We show that after an agent learns policies for solving a small number of problems, we are able to use the trajectories generated from those policies to learn reusable options that allow an agent to quickly learn how to solve novel and related problems.

Submitted 6 January, 2020; originally announced January 2020.

Comments: 15 pages, 7 figures, pre-print

arXiv:1908.06917 [pdf, other]

Message Passing for Complex Question Answering over Knowledge Graphs

Authors: Svitlana Vakulenko, Javier David Fernandez Garcia, Axel Polleres, Maarten de Rijke, Michael Cochez

Abstract: Question answering over knowledge graphs (KGQA) has evolved from simple single-fact questions to complex questions that require graph traversal and aggregation. We propose a novel approach for complex KGQA that uses unsupervised message passing, which propagates confidence scores obtained by parsing an input question and matching terms in the knowledge graph to a set of possible answers. First, we identify entity, relationship, and class names mentioned in a natural language question, and map these to their counterparts in the graph. Then, the confidence scores of these mappings propagate through the graph structure to locate the answer entities. Finally, these are aggregated depending on the identified question type. This approach can be efficiently implemented as a series of sparse matrix multiplications mimicking joins over small local subgraphs. Our evaluation results show that the proposed approach outperforms the state-of-the-art on the LC-QuAD benchmark. Moreover, we show that the performance of the approach depends only on the quality of the question interpretation results, i.e., given a correct relevance score distribution, our approach always produces a correct answer ranking. Our error analysis reveals correct answers missing from the benchmark dataset and inconsistencies in the DBpedia knowledge graph. Finally, we provide a comprehensive evaluation of the proposed approach accompanied with an ablation study and an error analysis, which showcase the pitfalls for each of the question answering components in more detail.

Submitted 19 August, 2019; originally announced August 2019.

Comments: Accepted in CIKM 2019

arXiv:1908.04698 [pdf, other]

Towards Self-Explainable Cyber-Physical Systems

Authors: Mathias Blumreiter, Joel Greenyer, Francisco Javier Chiyah Garcia, Verena Klös, Maike Schwammberger, Christoph Sommer, Andreas Vogelsang, Andreas Wortmann

Abstract: With the increasing complexity of CPSs, their behavior and decisions become increasingly difficult to understand and comprehend for users and other stakeholders. Our vision is to build self-explainable systems that can, at run-time, answer questions about the system's past, current, and future behavior. As hitherto no design methodology or reference framework exists for building such systems, we propose the MAB-EX framework for building self-explainable systems that leverage requirements- and explainability models at run-time. The basic idea of MAB-EX is to first Monitor and Analyze a certain behavior of a system, then Build an explanation from explanation models and convey this EXplanation in a suitable way to a stakeholder. We also take into account that new explanations can be learned, by updating the explanation models, should new and yet un-explainable behavior be detected by the system.

Submitted 13 August, 2019; originally announced August 2019.

arXiv:1907.01294 [pdf, other]

Lane Detection and Classification using Cascaded CNNs

Authors: Fabio Pizzati, Marco Allodi, Alejandro Barrera, Fernando García

Abstract: Lane detection is extremely important for autonomous vehicles. For this reason, many approaches use lane boundary information to locate the vehicle inside the street, or to integrate GPS-based localization. As in many other computer vision based tasks, convolutional neural networks (CNNs) represent the state-of-the-art technology to identify lane boundaries. However, the position of the lane boundaries w.r.t. the vehicle may not suffice for a reliable positioning, as for path planning or localization information regarding lane types may also be needed. In this work, we present an end-to-end system for lane boundary identification, clustering and classification, based on two cascaded neural networks, that runs in real-time. To build the system, 14336 lane boundary instances of the TuSimple dataset for lane detection have been labelled using 8 different classes. Our dataset and the code for inference are available online.

Submitted 17 July, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: Presented at Eurocast 2019

arXiv:1905.00941 [pdf, other]

doi 10.1109/IVS.2019.8814181

Enhanced free space detection in multiple lanes based on single CNN with scene identification

Authors: Fabio Pizzati, Fernando García

Abstract: Many systems for autonomous vehicles' navigation rely on lane detection. Traditional algorithms usually estimate only the position of the lanes on the road, but an autonomous control system may also need to know if a lane marking can be crossed or not, and what portion of space inside the lane is free from obstacles, to make safer control decisions. On the other hand, free space detection algorithms only detect navigable areas, without information about lanes. State-of-the-art algorithms use CNNs for both tasks, with significant consumption of computing resources. We propose a novel approach that estimates the free space inside each lane, with a single CNN. Additionally, adding only a small requirement concerning GPU RAM, we infer the road type, that will be useful for path planning. To achieve this result, we train a multi-task CNN. Then, we further elaborate the output of the network, to extract polygons that can be effectively used in navigation control. Finally, we provide a computationally efficient implementation, based on ROS, that can be executed in real time. Our code and trained models are available online.

Submitted 6 May, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

Comments: Will appear in the 2019 IEEE Intelligent Vehicles Symposium (IV 2019)

Journal ref: 2019 IEEE Intelligent Vehicles Symposium (IV)

arXiv:1904.09116 [pdf, other]

Enabling Socially Competent navigation through incorporating HRI

Authors: Arturo Cruz-Maya, Fernando Garcia, Amit Kumar Pandey

Abstract: Over the last years, social robots have been deployed in public environments making evident the need of human-aware navigation capabilities. In this regard, the robotics community have made efforts to include proxemics or social conventions within the navigation approaches. Nevertheless, few works have tackled the problem of labelling humans as an interactive agent when blocking the robot motion trajectory. Current state of the art navigation planners will either propose an alternative path or freeze the motion until the path is free. We present the first prototype of a framework designed to enhance social competency of robots while navigating in indoor environments. The implementation is done using Navigation and Object Detection open-source software. Specifically, the Robot Operating System (ROS) navigation stack, and OpenCV with Caffe deep learning models and MobileNet Single Shot Detector (SSD), respectively.

Submitted 19 April, 2019; originally announced April 2019.

Comments: HRI '19: ACM Workshop on Social Human-Robot Interaction of Human-care Service Robots, March 11--14, 2019, Daegu, Korea

arXiv:1904.08854 [pdf, other]

Wait for me! Towards socially assistive walk companions

Authors: Fernando Garcia, Amit Kumar Pandey, Charles Fattal

Abstract: The aim of the present study involves designing a humanoid robot guide as a walking trainer for elderly and rehabilitation patients. The system is based on the humanoid robot Pepper with a compliance approach that allows matching the motion intention of the user to the robot's pace. This feasibility study is backed up by an experimental evaluation conducted in a rehabilitation centre. We hypothesize that the Pepper robot, used as an assistive partner, can also benefit elderly users by motivating them to perform physical activity.

Submitted 18 April, 2019; originally announced April 2019.

Comments: 2nd Workshop on Social Robots in Therapy and Care. 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2019)

Report number: SREC/2019/01

arXiv:1902.00843 [pdf, other]

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Authors: Francisco M. Garcia, Philip S. Thomas

Abstract: In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself and demonstrate that such strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.

Submitted 2 February, 2019; originally announced February 2019.

Comments: Accepted as Extended Abstract, AAMAS, 2019

arXiv:1812.05395 [pdf]

Conceptualizing Business Process Maps

Authors: Geert Poels, Felix Garcia, Francisco Ruiz, Mario Piattini

Abstract: Process maps provide a high-level overview of an organisation's business processes. While used for many years in different shapes and forms, there is little shared understanding of the concept and its relationship to enterprise architecture. In this report we position the concept of business process map within the domain of enterprise architecture. Based on literature, we provide a conceptualisation of the process map as a business process architecture model that can be integrated with the broader enterprise architecture model. From our conceptualisation we derive requirements for designing a meta-model of a modelling language for process maps. The design of this meta-model is the subject of a research paper, entitled Architecting Business Process Maps, for which this report acts as a complement that details the underlying process map conceptualisation.

Submitted 13 December, 2018; originally announced December 2018.

Comments: 10 pages, 6 figures, technical report that complements the research paper Architecting Business Process Maps of the same authors

arXiv:1811.03566 [pdf, other]

A Natural Language Interface with Relayed Acoustic Communications for Improved Command and Control of AUVs

Authors: David A. Robb, Jonatan Scharff Willners, Nicolas Valeyrie, Francisco J. Chiyah Garcia, Atanas Laskov, Xingkun Liu, Pedro Patron, Helen Hastie, Yvan R. Petillot

Abstract: Autonomous underwater vehicles (AUVs) are being tasked with increasingly complex missions. The acoustic communications required for AUVs are, by the nature of the medium, low bandwidth while adverse environmental conditions underwater often mean they are also intermittent. This has motivated development of highly autonomous systems, which can operate independently of their operators for considerable periods of time. These missions often involve multiple vehicles leading not only to challenges in communications but also in command and control (C2). Specifically operators face complexity in controlling multi-objective, multi-vehicle missions, whilst simultaneously facing uncertainty over the current status and safety of several remote high value assets. Additionally, it may be required to perform command and control of these complex missions in a remote control room. In this paper, we propose a combination of an intuitive, natural language operator interface combined with communications that use platforms from multiple domains to relay data over different mediums and transmission modes, improving command and control of collaborative and fully autonomous missions. In trials, we have demonstrated an integrated system combining working prototypes with established commercial C2 software that enables the use of a natural language interface to monitor an AUV survey mission in an on-shore command and control centre.

Submitted 9 November, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

Comments: The definitive version of this preprint is to be Published in AUV2018 Keywords: Conversational agent, Natural Language Understanding, Chatbot, AUV, USV, Communication Relay, Acoustic, Communication

arXiv:1809.07269 [pdf, other]

doi 10.1007/978-3-030-05204-1_23

Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots

Authors: Chandrakant Bothe, Fernando Garcia, Arturo Cruz Maya, Amit Kumar Pandey, Stefan Wermter

Abstract: Service robots need to show appropriate social behaviour in order to be deployed in social environments such as healthcare, education, retail, etc. Some of the main capabilities that robots should have are navigation and conversational skills. If the person is impatient, the person might want a robot to navigate faster and vice versa. Linguistic features that indicate politeness can provide social cues about a person's patient and impatient behaviour. The novelty presented in this paper is to dynamically incorporate politeness in robotic dialogue systems for navigation. Understanding the politeness in users' speech can be used to modulate the robot behaviour and responses. Therefore, we developed a dialogue system to navigate in an indoor environment, which produces different robot behaviours and responses based on users' intention and degree of politeness. We deploy and test our system with the Pepper robot that adapts to the changes in user's politeness.

Submitted 14 November, 2018; v1 submitted 19 September, 2018; originally announced September 2018.

Comments: Proceedings of ICSR 2018

arXiv:1808.10406 [pdf, other]

Characterizing classification datasets: a study of meta-features for meta-learning

Authors: Adriano Rivolli, Luís P. F. Garcia, Carlos Soares, Joaquin Vanschoren, André C. P. L. F. de Carvalho

Abstract: Meta-learning is increasingly used to support the recommendation of machine learning algorithms and their configurations. Such recommendations are made based on meta-data, consisting of performance evaluations of algorithms on prior datasets, as well as characterizations of these datasets. These characterizations, also called meta-features, describe properties of the data which are predictive for the performance of machine learning algorithms trained on them. Unfortunately, despite being used in a large number of studies, meta-features are not uniformly described, organized and computed, making many empirical studies irreproducible and hard to compare. This paper aims to deal with this by systematizing and standardizing data characterization measures for classification datasets used in meta-learning. Moreover, it presents MFE, a new tool for extracting meta-features from datasets and identifying more subtle reproducibility issues in the literature, proposing guidelines for data characterization that strengthen reproducible empirical research in meta-learning.

Submitted 26 August, 2019; v1 submitted 30 August, 2018; originally announced August 2018.

Showing 1–50 of 63 results for author: Garcia, F