Search | arXiv e-print repository

Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey

Authors: Gregory Palmer, Chris Parry, Daniel J. B. Harrold, Chris Willis

Abstract: The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber-defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber-operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive evaluation of the extent to which domains used for benchmarking DRL approaches are comparable to ACO; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 60 pages, 14 figures, 3 tables

arXiv:2211.01704 [pdf]

Cutting Through the Noise: An Empirical Comparison of Psychoacoustic and Envelope-based Features for Machinery Fault Detection

Authors: Peter Wißbrock, Yvonne Richter, David Pelkmann, Zhao Ren, Gregory Palmer

Abstract: Acoustic-based fault detection has a high potential to monitor the health condition of mechanical parts. However, the background noise of an industrial environment may negatively influence the performance of fault detection. Limited attention has been paid to improving the robustness of fault detection against industrial environmental noise. Therefore, we present the Lenze production background-noise (LPBN) real-world dataset and an automated and noise-robust auditory inspection (ARAI) system for the end-of-line inspection of geared motors. An acoustic array is used to acquire data from motors that are healthy or have a minor or major fault. A benchmark is provided to compare the psychoacoustic features with different types of envelope features based on expert knowledge of the gearbox. To the best of our knowledge, we are the first to apply time-varying psychoacoustic features for fault detection. We train a state-of-the-art one-class classifier on samples from healthy motors and separate the faulty ones for fault detection using a threshold. The best-performing approaches achieve an area under curve of 0.87 (logarithm envelope), 0.86 (time-varying psychoacoustics), and 0.91 (combination of both).

Submitted 13 March, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: The final published version at ICASSP 2023 includes a small amount of additional content as well as some minor revisions

arXiv:2210.17227 [pdf, other]

Modelling M/M/R-JSQ-PS sojourn time distribution for Ultra-Reliable Low Latency Communication services

Authors: Geraint I. Palmer, Jorge Martín-Pérez

Abstract: The future Internet promises to support time-sensitive services that require ultra low latencies and reliabilities of 99.99%. Recent advances in cellular and WiFi connections enhance the network to meet high reliability and ultra low latencies. However, the aforementioned services require that the server processing time ensures low latencies with high reliability, otherwise the end-to-end performance is not met. To that end, in this paper we use queuing theory to model the sojourn time distribution for Ultra-Reliable Low Latency Communication services of M/M/R-JSQ-PS systems: Markovian queues with R CPU servers following a join shortest queue processor-sharing discipline (for example Linux systems). We develop open-source simulation software, and develop and compare six analytical approximations for the sojourn time distribution. The proposed approximations yield Wasserstein distances below 2 time units, and under medium loads incur errors of less than 1.78 time units (e.g., milliseconds) for the 99.99th percentile sojourn time. Moreover, the proposed sojourn time approximations are stable regardless of the number of CPUs and stay close to the simulations.

Submitted 22 December, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: 14 Pages, 10 figures, submitted to Elsevier European Journal of Operational Research

arXiv:2207.04515 [pdf, other]

Developing an AI-enabled IIoT platform -- Lessons learned from early use case validation

Authors: Holger Eichelberger, Gregory Palmer, Svenja Reimer, Tat Trong Vu, Hieu Do, Sofiane Laridi, Alexander Weber, Claudia Niederée, Thomas Hildebrandt

Abstract: For a broader adoption of AI in industrial production, adequate infrastructure capabilities are crucial. This includes easing the integration of AI with industrial devices, support for distributed deployment, monitoring, and consistent system configuration. Existing IIoT platforms still lack required capabilities to flexibly integrate reusable AI services and relevant standards such as Asset Administration Shells or OPC UA in an open, ecosystem-based manner. This is exactly what our next level Intelligent Industrial Production Ecosphere (IIP-Ecosphere) platform addresses, employing a highly configurable low-code based approach. In this paper, we introduce the design of this platform and discuss an early evaluation in terms of a demonstrator for AI-enabled visual quality inspection. This is complemented by insights and lessons learned during this early evaluation activity.

Submitted 10 July, 2022; originally announced July 2022.

Comments: 16 pages, 5 figures

arXiv:2207.03352 [pdf, other]

Market Making with Scaled Beta Policies

Authors: Joseph Jerome, Gregory Palmer, Rahul Savani

Abstract: This paper introduces a new representation for the actions of a market maker in an order-driven market. This representation uses scaled beta distributions, and generalises three approaches taken in the artificial intelligence for market making literature: single price-level selection, ladder strategies and "market making at the touch". Ladder strategies place uniform volume across an interval of contiguous prices. Scaled beta distribution based policies generalise these, allowing volume to be skewed across the price interval. We demonstrate that this flexibility is useful for inventory management, one of the key challenges faced by a market maker. In this paper, we conduct three main experiments: first, we compare our more flexible beta-based actions with the special case of ladder strategies; then, we investigate the performance of simple fixed distributions; and finally, we devise and evaluate a simple and intuitive dynamic control policy that adjusts actions in a continuous manner depending on the signed inventory that the market maker has acquired. All empirical evaluations use a high-fidelity limit order book simulator based on historical data with 50 levels on each side.

Submitted 27 September, 2022; v1 submitted 7 July, 2022; originally announced July 2022.
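
Illustrative sketch (not taken from the paper): the abstract above describes ladder strategies as uniform volume over a price interval and scaled beta policies as a generalisation that lets volume be skewed. The Python snippet below shows one simple way to see this relationship. The function names, the midpoint discretisation, the number of levels, and the parameter values are all assumptions made for illustration; the authors' actual discretisation may differ.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm


def allocate_volume(total_volume, n_levels, a, b):
    """Spread total_volume over n_levels contiguous price levels.

    Each level is mapped to an equal-width sub-interval of [0, 1] and
    weighted by the beta density at the sub-interval's midpoint
    (a hypothetical discretisation chosen for simplicity).
    """
    weights = [beta_pdf((i + 0.5) / n_levels, a, b) for i in range(n_levels)]
    scale = total_volume / sum(weights)
    return [w * scale for w in weights]


# Beta(1, 1) is the uniform distribution, so this reproduces a ladder strategy.
ladder = allocate_volume(100.0, 5, 1.0, 1.0)

# Beta(1, 3) has a decreasing density, so volume is skewed towards the first level.
skewed = allocate_volume(100.0, 5, 1.0, 3.0)

print(ladder)
print(skewed)
```

With shape parameters (1, 1) every level receives the same volume, while unequal shape parameters shift volume towards one end of the interval, which is the kind of flexibility the abstract relates to inventory management.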

arXiv:2205.09550 [pdf, other]

Data Valuation for Offline Reinforcement Learning

Authors: Amir Abolfazli, Gregory Palmer, Daniel Kudenko

Abstract: The success of deep reinforcement learning (DRL) hinges on the availability of training data, which is typically obtained via a large number of environment interactions. In many real-world scenarios, costs and risks are associated with gathering these data. The field of offline reinforcement learning addresses these issues through outsourcing the collection of data to a domain expert or a carefully monitored program and subsequently searching for a batch-constrained optimal policy. With the emergence of data markets, an alternative to constructing a dataset in-house is to purchase external data. However, while state-of-the-art offline reinforcement learning approaches have shown a lot of promise, they currently rely on carefully constructed datasets that are well aligned with the intended target domains. This raises questions regarding the transferability and robustness of an offline reinforcement learning agent trained on externally acquired data. In this paper, we empirically evaluate the ability of the current state-of-the-art offline reinforcement learning approaches to cope with the source-target domain mismatch within two MuJoCo environments, finding that current state-of-the-art offline reinforcement learning algorithms underperform in the target domain. To address this, we propose data valuation for offline reinforcement learning (DVORL), which allows us to identify relevant and high-quality transitions, improving the performance and transferability of policies learned by offline reinforcement learning algorithms. The results show that our method outperforms offline reinforcement learning baselines on two MuJoCo environments.

Submitted 19 May, 2022; originally announced May 2022.

Comments: 9 pages, 3 figures, 2 tables

arXiv:2205.02827 [pdf, other]

Identifying Cause-and-Effect Relationships of Manufacturing Errors using Sequence-to-Sequence Learning

Authors: Jeff Reimer, Yandong Wang, Sofiane Laridi, Juergen Urdich, Sören Wilmsmeier, Gregory Palmer

Abstract: In car-body production the pre-formed sheet metal parts of the body are assembled on fully-automated production lines. The body passes through multiple stations in succession, and is processed according to the order requirements. The timely completion of orders depends on the individual station-based operations concluding within their scheduled cycle times. If an error occurs in one station, it can have a knock-on effect, resulting in delays on the downstream stations. To the best of our knowledge, there exist no methods for automatically distinguishing between source and knock-on errors in this setting, as well as establishing a causal relation between them. Utilizing real-time information about conditions collected by a production data acquisition system, we propose a novel vehicle manufacturing analysis system, which uses deep learning to establish a link between source and knock-on errors. We benchmark three sequence-to-sequence models, and introduce a novel composite time-weighted action metric for evaluating models in this context. We evaluate our framework on a real-world car production dataset recorded by Volkswagen Commercial Vehicles. Surprisingly we find that 71.68% of sequences contain either a source or knock-on error. With respect to seq2seq model training, we find that the Transformer demonstrates a better performance compared to LSTM and GRU in this domain, in particular when the prediction range with respect to the durations of future actions is increased.

Submitted 5 May, 2022; originally announced May 2022.

Comments: 11 pages, 5 figures, 2 tables

arXiv:2203.04696 [pdf, other]

Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition

Authors: Yi Chang, Sofiane Laridi, Zhao Ren, Gregory Palmer, Björn W. Schuller, Marco Fisichella

Abstract: Due to the development of machine learning and speech processing, speech emotion recognition has been a popular research topic in recent years. However, the speech data cannot be protected when it is uploaded and processed on servers in the internet-of-things applications of speech emotion recognition. Furthermore, deep neural networks have proven to be vulnerable to human-indistinguishable adversarial perturbations. The adversarial attacks generated from the perturbations may result in deep neural networks wrongly predicting the emotional states. We propose a novel federated adversarial learning framework for protecting both data and deep neural networks. The proposed framework consists of i) federated learning for data privacy, and ii) adversarial training at the training stage and randomisation at the testing stage for model robustness. The experiments show that our proposed framework can effectively protect the speech data locally and improve the model robustness against a series of adversarial attacks.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 11 pages, 6 figures, 3 tables

arXiv:2007.04611 [pdf, other]

A deep learning approach to identify unhealthy advertisements in street view images

Authors: Gregory Palmer, Mark Green, Emma Boyland, Yales Stefano Rios Vasconcelos, Rahul Savani, Alex Singleton

Abstract: While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements encouraging their consumption. Understanding who is exposed and evaluating potential policy restrictions requires a substantial manual data collection effort. To address this problem we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360 Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 25,349 360-degree street-level images collected via cycling with a GoPro Fusion camera, recorded Jan 14th - 18th 2020. 10,106 advertisements were identified and classified as food (1335), alcohol (217), gambling (149) and other (8405) (e.g., cars and broadband). We find evidence of social inequalities with a larger proportion of food advertisements located within deprived areas and those frequented by students. Our project presents a novel implementation for the incidental classification of street view images for identifying unhealthy advertisements, providing a means through which to identify areas that can benefit from tougher advertisement restriction policies for tackling social inequalities.

Submitted 7 February, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

Comments: 13 pages, 5 figures, 3 tables. To appear in Nature Scientific Reports

arXiv:2002.09406 [pdf, other]

The Automated Inspection of Opaque Liquid Vaccines

Authors: Gregory Palmer, Benjamin Schnieders, Rahul Savani, Karl Tuyls, Joscha-David Fossel, Harry Flore

Abstract: In the pharmaceutical industry the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial's surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our data set by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results we get when we train only on the unaugmented dataset.

Submitted 21 February, 2020; originally announced February 2020.

Comments: 8 pages, 5 Figures, 3 Tables, ECAI 2020 Conference Proceedings

arXiv:1903.00683 [pdf, other]

Fully Convolutional One-Shot Object Segmentation for Industrial Robotics

Authors: Benjamin Schnieders, Shan Luo, Gregory Palmer, Karl Tuyls

Abstract: The ability to identify and localize new objects robustly and effectively is vital for robotic grasping and manipulation in warehouses or smart factories. Deep convolutional neural networks (DCNNs) have achieved the state-of-the-art performance on established image datasets for object detection and segmentation. However, applying DCNNs in dynamic industrial scenarios, e.g., warehouses and autonomous production, remains a challenging problem. DCNNs quickly become ineffective when tasked with detecting objects that they have not been trained on. Given that re-training using the latest data is time consuming, DCNNs cannot meet the requirement of the Factory of the Future (FoF) regarding rapid development and production cycles. To address this problem, we propose a novel one-shot object segmentation framework, using a fully convolutional Siamese network architecture, to detect previously unknown objects based on a single prototype image. We turn to multi-task learning to reduce training time and improve classification accuracy. Furthermore, we introduce a novel approach to automatically cluster the learnt feature space representation in a weakly supervised manner. We test the proposed framework on the RoboCup@Work dataset, simulating requirements for the FoF. Results show that the trained network on average identifies 73% of previously unseen objects correctly from a single example image. Correctly identified objects are estimated to have an 87.53% successful pick-up rate. Finally, multi-task learning lowers the convergence time by up to 33%, and increases accuracy by 2.99%.

Submitted 2 March, 2019; originally announced March 2019.

Comments: International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), 9 pages

arXiv:1809.05096 [pdf, other]

Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

Authors: Gregory Palmer, Rahul Savani, Karl Tuyls

Abstract: In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies to learn optimal joint policies. Addressing one pathology often leaves approaches vulnerable towards others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable towards misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominantly been studied within the context of strategic form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, the alter-exploration and moving target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in our environment, overcoming the outlined pathologies.

Submitted 7 May, 2019; v1 submitted 13 September, 2018; originally announced September 2018.

Comments: 11 Pages, 6 Figures, AAMAS2019 Conference Proceedings
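
Illustrative sketch (not the paper's NUI-DDQN): the abstract above refers to hysteretic Q-learning, which in its standard tabular form uses a larger learning rate for positive temporal-difference errors than for negative ones. The Python snippet below sketches that single update step. The function name, argument names, and default values are assumptions chosen for illustration.

```python
def hysteretic_update(q, reward, next_max_q, alpha=0.1, beta=0.01, gamma=0.95):
    """Return the updated Q-value for one state-action pair.

    q          : current estimate Q(s, a)
    reward     : reward observed after taking a in s
    next_max_q : max over a' of Q(s', a')
    alpha      : learning rate for non-negative TD errors
    beta       : smaller learning rate for negative TD errors (beta < alpha)
    gamma      : discount factor
    """
    delta = reward + gamma * next_max_q - q
    # Optimistic asymmetry: bad outcomes, which may be caused by a
    # teammate's exploration, are learned from more slowly.
    rate = alpha if delta >= 0 else beta
    return q + rate * delta


after_good = hysteretic_update(0.0, 1.0, 0.0)   # positive TD error, uses alpha
after_bad = hysteretic_update(0.0, -1.0, 0.0)   # negative TD error, uses beta
print(after_good, after_bad)
```

The asymmetry helps against miscoordination, but as the abstract notes, the same optimism can leave agents exposed to misleading stochastic rewards, because an occasional high reward is absorbed quickly while the frequent low rewards are discounted.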

arXiv:1710.03561 [pdf, other]

Ciw: An open source discrete event simulation library

Authors: Geraint I. Palmer, Vincent A. Knight, Paul R. Harper, Asyl L. Hawa

Abstract: This paper introduces Ciw, an open source library for conducting discrete event simulations that has been developed in Python. The strengths of the library are illustrated in terms of best practice and reproducibility for computational research. An analysis of Ciw's performance and comparison to several alternative discrete event simulation frameworks is presented.

Submitted 27 September, 2017; originally announced October 2017.

arXiv:1707.04402 [pdf, other]

Lenient Multi-Agent Deep Reinforcement Learning

Authors: Gregory Palmer, Karl Tuyls, Daan Bloembergen, Rahul Savani

Abstract: Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22] as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8] which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.

Submitted 27 February, 2018; v1 submitted 14 July, 2017; originally announced July 2017.

Comments: 9 pages, 6 figures, AAMAS2018 Conference Proceedings

arXiv:1604.00896 [pdf, other]

doi 10.5334/jors.125

An open reproducible framework for the study of the iterated prisoner's dilemma

Authors: Vincent Knight, Owen Campbell, Marc Harper, Karol Langner, James Campbell, Thomas Campbell, Alex Carney, Martin Chorley, Cameron Davidson-Pilon, Kristian Glass, Nikoleta Glynatsi, Tomáš Ehrlich, Martin Jones, Georgios Koutsovoulos, Holly Tibble, Müller Jochen, Geraint Palmer, Piotr Petunov, Paul Slavin, Timothy Standen, Luis Visintini, Karl Molden

Abstract: The Axelrod library is an open source Python package that allows for reproducible game theoretic research into the Iterated Prisoner's Dilemma. This area of research began in the 1980s but suffers from a lack of documentation and test code. The goal of the library is to provide such a resource, with facilities for the design of new strategies and interactions between them, as well as conducting tournaments and ecological simulations for populations of strategies. With a growing collection of 139 strategies, the library is also a platform for an original tournament that, in itself, is of interest to the game theoretic community. This paper describes the Iterated Prisoner's Dilemma, the Axelrod library and its development, and insights gained from some novel research.

Submitted 20 December, 2016; v1 submitted 4 April, 2016; originally announced April 2016.

Comments: 11 pages, Journal of Open Research Software 4.1 (2016)

Showing 1–15 of 15 results for author: Palmer, G