Search | arXiv e-print repository

Reward Certification for Policy Smoothed Reinforcement Learning

Authors: Ronghui Mu, Leandro Soriano Marcolino, Tianle Zhang, Yanghao Zhang, Xiaowei Huang, Wenjie Ruan

Abstract: Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks. Recent studies have introduced "smoothed policies" in order to enhance its robustness. Yet, it is still challenging to establish a provable guarantee to certify the bound of its total reward. Prior methods relied primarily on computing bounds using Lipschitz continui… ▽ More Reinforcement Learning (RL) has achieved remarkable success in safety-critical areas, but it can be weakened by adversarial attacks. Recent studies have introduced "smoothed policies" in order to enhance its robustness. Yet, it is still challenging to establish a provable guarantee to certify the bound of its total reward. Prior methods relied primarily on computing bounds using Lipschitz continuity or calculating the probability of cumulative reward above specific thresholds. However, these techniques are only suited for continuous perturbations on the RL agent's observations and are restricted to perturbations bounded by the $l_2$-norm. To address these limitations, this paper proposes a general black-box certification method capable of directly certifying the cumulative reward of the smoothed policy under various $l_p$-norm bounded perturbations. Furthermore, we extend our methodology to certify perturbations on action spaces. Our approach leverages f-divergence to measure the distinction between the original distribution and the perturbed distribution, subsequently determining the certification bound by solving a convex optimisation problem. We provide a comprehensive theoretical analysis and run sufficient experiments in multiple environments. Our results show that our method not only improves the certified lower bound of mean cumulative reward but also demonstrates better efficiency than state-of-the-art techniques. △ Less

Submitted 12 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: This paper will be presented in AAAI2024

arXiv:2212.11746 [pdf, other]

Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning

Authors: Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Gaojie Jin, Qiang Ni

Abstract: Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, thus the analysis of robustness for c-MARL models is profoundly important. However, robustness certification for c-MARLs has not yet been explored in the community. In this paper, we propose a novel certification method, which is the first work to leverage a scalable approach for c-MARLs to dete… ▽ More Cooperative multi-agent reinforcement learning (c-MARL) is widely applied in safety-critical scenarios, thus the analysis of robustness for c-MARL models is profoundly important. However, robustness certification for c-MARLs has not yet been explored in the community. In this paper, we propose a novel certification method, which is the first work to leverage a scalable approach for c-MARLs to determine actions with guaranteed certified bounds. c-MARL certification poses two key challenges compared with single-agent systems: (i) the accumulated uncertainty as the number of agents increases; (ii) the potential lack of impact when changing the action of a single agent into a global team reward. These challenges prevent us from directly using existing algorithms. Hence, we employ the false discovery rate (FDR) controlling procedure considering the importance of each agent to certify per-state robustness and propose a tree-search-based algorithm to find a lower bound of the global reward under the minimal certified perturbation. As our method is general, it can also be applied in single-agent environments. We empirically show that our certification bounds are much tighter than state-of-the-art RL certification solutions. We also run experiments on two popular c-MARL algorithms: QMIX and VDN, in two different environments, with two and four agents. The experimental results show that our method produces meaningful guaranteed robustness for all models and environments. Our tool CertifyCMARL is available at https://github.com/TrustAI/CertifyCMA △ Less

Submitted 22 December, 2022; originally announced December 2022.

Comments: This paper will appear in AAAI2023

arXiv:2210.05626 [pdf, other]

Semantic Segmentation under Adverse Conditions: A Weather and Nighttime-aware Synthetic Data-based Approach

Authors: Abdulrahman Kerim, Felipe Chamone, Washington Ramos, Leandro Soriano Marcolino, Erickson R. Nascimento, Richard Jiang

Abstract: Recent semantic segmentation models perform well under standard weather conditions and sufficient illumination but struggle with adverse weather conditions and nighttime. Collecting and annotating training data under these conditions is expensive, time-consuming, error-prone, and not always practical. Usually, synthetic data is used as a feasible data source to increase the amount of training data… ▽ More Recent semantic segmentation models perform well under standard weather conditions and sufficient illumination but struggle with adverse weather conditions and nighttime. Collecting and annotating training data under these conditions is expensive, time-consuming, error-prone, and not always practical. Usually, synthetic data is used as a feasible data source to increase the amount of training data. However, just directly using synthetic data may actually harm the model's performance under normal weather conditions while getting only small gains in adverse situations. Therefore, we present a novel architecture specifically designed for using synthetic training data for domain adaptation. We propose a simple yet powerful addition to DeepLabV3+ by using weather and time-of-the-day supervisors trained with multi-task learning, making it both weather and nighttime aware, which improves its mIoU accuracy by $14$ percentage points on the ACDC dataset while maintaining a score of $75\%$ mIoU on the Cityscapes dataset. Our code is available at https://github.com/lsmcolab/Semantic-Segmentation-under-Adverse-Conditions. △ Less

Submitted 11 October, 2022; originally announced October 2022.

Comments: This paper is accepted by BMVC 2022

arXiv:2208.12763 [pdf, other]

Leveraging Synthetic Data to Learn Video Stabilization Under Adverse Conditions

Authors: Abdulrahman Kerim, Washington L. S. Ramos, Leandro Soriano Marcolino, Erickson R. Nascimento, Richard Jiang

Abstract: Video stabilization plays a central role to improve videos quality. However, despite the substantial progress made by these methods, they were, mainly, tested under standard weather and lighting conditions, and may perform poorly under adverse conditions. In this paper, we propose a synthetic-aware adverse weather robust algorithm for video stabilization that does not require real data and can be… ▽ More Video stabilization plays a central role to improve videos quality. However, despite the substantial progress made by these methods, they were, mainly, tested under standard weather and lighting conditions, and may perform poorly under adverse conditions. In this paper, we propose a synthetic-aware adverse weather robust algorithm for video stabilization that does not require real data and can be trained only on synthetic data. We also present Silver, a novel rendering engine to generate the required training data with an automatic ground-truth extraction procedure. Our approach uses our specially generated synthetic data for training an affine transformation matrix estimator avoiding the feature extraction issues faced by current methods. Additionally, since no video stabilization datasets under adverse conditions are available, we propose the novel VSAC105Real dataset for evaluation. We compare our method to five state-of-the-art video stabilization algorithms using two benchmarks. Our results show that current approaches perform poorly in at least one weather condition, and that, even training in a small dataset with synthetic data only, we achieve the best performance in terms of stability average score, distortion score, success rate, and average cropping ratio when considering all weather conditions. Hence, our video stabilization model generalizes well on real-world videos and does not require large-scale synthetic training data to converge. △ Less

Submitted 26 August, 2022; originally announced August 2022.

ACM Class: I.4.0; I.4.1; I.6.0

arXiv:2207.07539 [pdf, other]

3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models

Authors: Ronghui Mu, Wenjie Ruan, Leandro S. Marcolino, Qiang Ni

Abstract: 3D point cloud models are widely applied in safety-critical scenes, which delivers an urgent need to obtain more solid proofs to verify the robustness of models. Existing verification method for point cloud model is time-expensive and computationally unattainable on large networks. Additionally, they cannot handle the complete PointNet model with joint alignment network (JANet) that contains multi… ▽ More 3D point cloud models are widely applied in safety-critical scenes, which delivers an urgent need to obtain more solid proofs to verify the robustness of models. Existing verification method for point cloud model is time-expensive and computationally unattainable on large networks. Additionally, they cannot handle the complete PointNet model with joint alignment network (JANet) that contains multiplication layers, which effectively boosts the performance of 3D models. This motivates us to design a more efficient and general framework to verify various architectures of point cloud models. The key challenges in verifying the large-scale complete PointNet models are addressed as dealing with the cross-non-linearity operations in the multiplication layers and the high computational complexity of high-dimensional point cloud inputs and added layers. Thus, we propose an efficient verification framework, 3DVerifier, to tackle both challenges by adopting a linear relaxation function to bound the multiplication layer and combining forward and backward propagation to compute the certified bounds of the outputs of the point cloud models. Our comprehensive experiments demonstrate that 3DVerifier outperforms existing verification algorithms for 3D models in terms of both efficiency and accuracy. Notably, our approach achieves an orders-of-magnitude improvement in verification efficiency for the large network, and the obtained certified bounds are also significantly tighter than the state-of-the-art verifiers. We release our tool 3DVerifier via https://github.com/TrustAI/3DVerifier for use by the community. △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2203.15778 [pdf, other]

doi 10.1109/TPAMI.2022.3157198

Text-Driven Video Acceleration: A Weakly-Supervised Reinforcement Learning Method

Authors: Washington Ramos, Michel Silva, Edson Araujo, Victor Moura, Keller Oliveira, Leandro Soriano Marcolino, Erickson R. Nascimento

Abstract: The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised me… ▽ More The growth of videos in our digital age and the users' limited time raise the demand for processing untrimmed videos to produce shorter versions conveying the same information. Despite the remarkable progress that summarization methods have made, most of them can only select a few frames or skims, creating visual gaps and breaking the video context. This paper presents a novel weakly-supervised methodology based on a reinforcement learning formulation to accelerate instructional videos using text. A novel joint reward function guides our agent to select which frames to remove and reduce the input video to a target length without creating gaps in the final video. We also propose the Extended Visually-guided Document Attention Network (VDAN+), which can generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in Precision, Recall, and F1 Score against the baselines while effectively controlling the video's output length. Visit https://www.verlab.dcc.ufmg.br/semantic-hyperlapse/tpami2022/ for code and extra results. △ Less

Submitted 29 March, 2022; originally announced March 2022.

Comments: Accepted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2022. arXiv admin note: text overlap with arXiv:2003.14229

arXiv:2201.09337 [pdf, other]

Congestion control algorithms for robotic swarms with a common target based on the throughput of the target area

Authors: Yuri Tavares dos Passos, Xavier Duquesne, Leandro Soriano Marcolino

Abstract: When a large number of robots try to reach a common area, congestions happen, causing severe delays. To minimise congestion in a robotic swarm system, traffic control algorithms must be employed in a decentralised manner. Based on strategies aimed to maximise the throughput of the common target area, we developed two novel algorithms for robots using artificial potential fields for obstacle avoida… ▽ More When a large number of robots try to reach a common area, congestions happen, causing severe delays. To minimise congestion in a robotic swarm system, traffic control algorithms must be employed in a decentralised manner. Based on strategies aimed to maximise the throughput of the common target area, we developed two novel algorithms for robots using artificial potential fields for obstacle avoidance and navigation. One algorithm is inspired by creating a queue to get to the target area (Single Queue Former -- SQF), while the other makes the robots touch the boundary of the circular area by using vector fields (Touch and Run Vector Fields -- TRVF). We performed simulation experiments to show that the proposed algorithms are bounded by the throughput of their inspired theoretical strategies and compare the two novel algorithms with state-of-art algorithms for the same problem (PCC, EE and PCC-EE). The SQF algorithm significantly outperforms all other algorithms for a large number of robots or when the circular target region radius is small. TRVF, on the other hand, is better than SQF only for a limited number of robots and outperforms only PCC for numerous robots. However, it allows us to analyse the potential impacts on the throughput when transferring an idea from a theoretical strategy to a concrete algorithm that considers changing linear speeds and distances between robots. △ Less

Submitted 23 August, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: Corrections were made to the TRVF algorithm and the text, and new references were added

arXiv:2201.09335 [pdf, other]

On the throughput of the common target area for robotic swarm strategies -- extended version

Authors: Yuri Tavares dos Passos, Xavier Duquesne, Leandro Soriano Marcolino

Abstract: A robotic swarm may encounter traffic congestion when many robots simultaneously attempt to reach the same area. For solving that efficiently, robots must execute decentralised traffic control algorithms. In this work, we propose a measure for evaluating the access efficiency of a common target area as the number of robots in the swarm rises: the common target area throughput. We also employ here… ▽ More A robotic swarm may encounter traffic congestion when many robots simultaneously attempt to reach the same area. For solving that efficiently, robots must execute decentralised traffic control algorithms. In this work, we propose a measure for evaluating the access efficiency of a common target area as the number of robots in the swarm rises: the common target area throughput. We also employ here the target area asymptotic throughput -- that is, the throughput of a target region with a limited area as the time tends to infinity -- because it is always finite as the number of robots grows, opposed to the relation arrival time at the target per number of robots that tends to infinity. Using this measure, we can analytically compare the effectiveness of different algorithms. In particular, we propose and formally evaluate three different theoretical strategies for getting to a circular target area: (i) forming parallel queues towards the target area, (ii) forming a hexagonal packing through a corridor going to the target, and (iii) making multiple curved trajectories towards the boundary of the target area. We calculate the throughput for a fixed time and the asymptotic throughput for these strategies. Additionally, we corroborate these results by simulations, showing that when a strategy has higher throughput, its arrival time per number of robots is lower. Thus, we conclude that using throughput is well suited for comparing congestion algorithms for a common target area in robotic swarms even if we do not have their closed asymptotic equation. △ Less

Submitted 30 March, 2022; v1 submitted 23 January, 2022; originally announced January 2022.

Comments: The proof for finite throughput was removed

arXiv:2111.05468 [pdf, other]

Sparse Adversarial Video Attacks with Spatial Transformations

Authors: Ronghui Mu, Wenjie Ruan, Leandro Soriano Marcolino, Qiang Ni

Abstract: In recent years, a significant amount of research efforts concentrated on adversarial attacks on images, while adversarial video attacks have seldom been explored. We propose an adversarial attack strategy on videos, called DeepSAVA. Our model includes both additive perturbation and spatial transformation by a unified optimisation framework, where the structural similarity index (SSIM) measure is… ▽ More In recent years, a significant amount of research efforts concentrated on adversarial attacks on images, while adversarial video attacks have seldom been explored. We propose an adversarial attack strategy on videos, called DeepSAVA. Our model includes both additive perturbation and spatial transformation by a unified optimisation framework, where the structural similarity index (SSIM) measure is adopted to measure the adversarial distance. We design an effective and novel optimisation scheme which alternatively utilizes Bayesian optimisation to identify the most influential frame in a video and Stochastic gradient descent (SGD) based optimisation to produce both additive and spatial-transformed perturbations. Doing so enables DeepSAVA to perform a very sparse attack on videos for maintaining human imperceptibility while still achieving state-of-the-art performance in terms of both attack success rate and adversarial transferability. Our intensive experiments on various types of deep neural networks and video datasets confirm the superiority of DeepSAVA. △ Less

Submitted 9 November, 2021; originally announced November 2021.

Comments: The short version of this work will appear in the BMVC 2021 conference

arXiv:2008.09461 [pdf, other]

doi 10.1140/epjb/s10051-020-00016-4

The paradox of productivity during quarantine: an agent-based simulation

Authors: Peter Hardy, Leandro Soriano Marcolino, José F. Fontanari

Abstract: Economies across the globe were brought to their knees due to lockdowns and social restriction measures to contain the spread of the SARS-CoV-2, despite the quick switch to remote working. This downfall may be partially explained by the "water cooler effect", which holds that higher levels of social interaction lead to higher productivity due to a boost in people's mood. Somewhat paradoxically, ho… ▽ More Economies across the globe were brought to their knees due to lockdowns and social restriction measures to contain the spread of the SARS-CoV-2, despite the quick switch to remote working. This downfall may be partially explained by the "water cooler effect", which holds that higher levels of social interaction lead to higher productivity due to a boost in people's mood. Somewhat paradoxically, however, there are reports of increased productivity in the remote working scenario. Here we address quantitatively this issue using a variety of experimental findings of social psychology that address the interplay between mood, social interaction and productivity to set forth an agent-based model for a workplace composed of extrovert and introvert agent stereotypes that differ solely on their propensities to initiate a social interaction. We find that the effects of curtailing social interactions depend on the proportion of the stereotypes in the working group: while the social restriction measures always have a negative impact on the productivity of groups composed predominantly of introverts, they may actually improve the productivity of groups composed predominantly of extroverts. Our results offer a proof of concept that the paradox of productivity during quarantine can be explained by taking into account the distinct effects of the social distancing measures on extroverts and introverts. △ Less

Submitted 19 November, 2020; v1 submitted 21 August, 2020; originally announced August 2020.

Journal ref: Eur. Phys. J. B (2021) 94: 40

arXiv:2003.14229 [pdf, other]

Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data

Authors: Washington Ramos, Michel Silva, Edson Araujo, Leandro Soriano Marcolino, Erickson Nascimento

Abstract: The rapid increase in the amount of published visual data and the limited time of users bring the demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress that has been made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, w… ▽ More The rapid increase in the amount of published visual data and the limited time of users bring the demand for processing untrimmed videos to produce shorter versions that convey the same information. Despite the remarkable progress that has been made by summarization methods, most of them can only select a few frames or skims, which creates visual gaps and breaks the video context. In this paper, we present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos. Our approach can adaptively select frames that are not relevant to convey the information without creating gaps in the final video. Our agent is textually and visually oriented to select which frames to remove to shrink the input video. Additionally, we propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space to represent both textual and visual data. Our experiments show that our method achieves the best performance in terms of F1 Score and coverage at the video segment level. △ Less

Submitted 31 March, 2020; originally announced March 2020.

Comments: CVPR 2020

Showing 1–11 of 11 results for author: Marcolino, L S