Search | arXiv e-print repository

Syft 0.5: A Platform for Universally Deployable Structured Transparency

Authors: Adam James Hall, Madhava Jay, Tudor Cebere, Bogdan Cebere, Koen Lennart van der Veen, George Muraru, Tongye Xu, Patrick Cason, William Abramson, Ayoub Benaissa, Chinmay Shah, Alan Aboudib, Théo Ryffel, Kritika Prakash, Tom Titcombe, Varun Kumar Khare, Maddie Shang, Ionesio Junior, Animesh Gupta, Jason Paumier, Nahua Kang, Vova Manannikov, Andrew Trask

Abstract: We present Syft 0.5, a general-purpose framework that combines a core group of privacy-enhancing technologies that facilitate a universal set of structured transparency systems. This framework is demonstrated through the design and implementation of a novel privacy-preserving inference information flow where we pass homomorphically encrypted activation signals through a split neural network for in… ▽ More We present Syft 0.5, a general-purpose framework that combines a core group of privacy-enhancing technologies that facilitate a universal set of structured transparency systems. This framework is demonstrated through the design and implementation of a novel privacy-preserving inference information flow where we pass homomorphically encrypted activation signals through a split neural network for inference. We show that splitting the model further up the computation chain significantly reduces the computation time of inference and the payload size of activation signals at the cost of model secrecy. We evaluate our proposed flow with respect to its provision of the core structural transparency principles. △ Less

Submitted 27 April, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

Comments: ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML 2021)

arXiv:2104.05743 [pdf, other]

Practical Defences Against Model Inversion Attacks for Split Neural Networks

Authors: Tom Titcombe, Adam J. Hall, Pavlos Papadopoulos, Daniele Romanini

Abstract: We describe a threat model under which a split network-based federated learning system is susceptible to a model inversion attack by a malicious computational server. We demonstrate that the attack can be successfully performed with limited knowledge of the data distribution by the attacker. We propose a simple additive noise method to defend against model inversion, finding that the method can si… ▽ More We describe a threat model under which a split network-based federated learning system is susceptible to a model inversion attack by a malicious computational server. We demonstrate that the attack can be successfully performed with limited knowledge of the data distribution by the attacker. We propose a simple additive noise method to defend against model inversion, finding that the method can significantly reduce attack efficacy at an acceptable accuracy trade-off on MNIST. Furthermore, we show that NoPeekNN, an existing defensive method, protects different information from exposure, suggesting that a combined defence is necessary to fully protect private user data. △ Less

Submitted 21 April, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

Comments: ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML 2021)

arXiv:2104.00489 [pdf, other]

PyVertical: A Vertical Federated Learning Framework for Multi-headed SplitNN

Authors: Daniele Romanini, Adam James Hall, Pavlos Papadopoulos, Tom Titcombe, Abbas Ismail, Tudor Cebere, Robert Sandmann, Robin Roehm, Michael A. Hoeh

Abstract: We introduce PyVertical, a framework supporting vertical federated learning using split neural networks. The proposed framework allows a data scientist to train neural networks on data features vertically partitioned across multiple owners while keeping raw data on an owner's device. To link entities shared across different datasets' partitions, we use Private Set Intersection on IDs associated wi… ▽ More We introduce PyVertical, a framework supporting vertical federated learning using split neural networks. The proposed framework allows a data scientist to train neural networks on data features vertically partitioned across multiple owners while keeping raw data on an owner's device. To link entities shared across different datasets' partitions, we use Private Set Intersection on IDs associated with data points. To demonstrate the validity of the proposed framework, we present the training of a simple dual-headed split neural network for a MNIST classification task, with data samples vertically distributed across two data owners and a data scientist. △ Less

Submitted 14 April, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: ICLR 2021 Workshop on Distributed and Private Machine Learning (DPML 2021)

arXiv:2103.15753 [pdf, other]

doi 10.3390/make3020017

Privacy and Trust Redefined in Federated Machine Learning

Authors: Pavlos Papadopoulos, Will Abramson, Adam J. Hall, Nikolaos Pitropakis, William J. Buchanan

Abstract: A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring th… ▽ More A common privacy issue in traditional machine learning is that data needs to be disclosed for the training procedures. In situations with highly sensitive data such as healthcare records, accessing this information is challenging and often prohibited. Luckily, privacy-preserving technologies have been developed to overcome this hurdle by distributing the computation of the training and ensuring the data privacy to their owners. The distribution of the computation to multiple participating entities introduces new privacy complications and risks. In this paper, we present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants. Our proof-of-concept defines a trust framework instantiated using decentralised identity technologies being developed under Hyperledger projects Aries/Indy/Ursa. Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels authorised to participate in a federated learning workflow related to mental health data. △ Less

Submitted 30 March, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

Comments: MDPI Mach. Learn. Knowl. Extr. 2021, 3(2), 333-356; https://doi.org/10.3390/make3020017

Journal ref: Mach. Learn. Knowl. Extr. 2021, 3(2), 333-356

arXiv:2011.09350 [pdf, other]

Asymmetric Private Set Intersection with Applications to Contact Tracing and Private Vertical Federated Machine Learning

Authors: Nick Angelou, Ayoub Benaissa, Bogdan Cebere, William Clark, Adam James Hall, Michael A. Hoeh, Daniel Liu, Pavlos Papadopoulos, Robin Roehm, Robert Sandmann, Phillipp Schoppmann, Tom Titcombe

Abstract: We present a multi-language, cross-platform, open-source library for asymmetric private set intersection (PSI) and PSI-Cardinality (PSI-C). Our protocol combines traditional DDH-based PSI and PSI-C protocols with compression based on Bloom filters that helps reduce communication in the asymmetric setting. Currently, our library supports C++, C, Go, WebAssembly, JavaScript, Python, and Rust, and ru… ▽ More We present a multi-language, cross-platform, open-source library for asymmetric private set intersection (PSI) and PSI-Cardinality (PSI-C). Our protocol combines traditional DDH-based PSI and PSI-C protocols with compression based on Bloom filters that helps reduce communication in the asymmetric setting. Currently, our library supports C++, C, Go, WebAssembly, JavaScript, Python, and Rust, and runs on both traditional hardware (x86) and browser targets. We further apply our library to two use cases: (i) a privacy-preserving contact tracing protocol that is compatible with existing approaches, but improves their privacy guarantees, and (ii) privacy-preserving machine learning on vertically partitioned data. △ Less

Submitted 18 November, 2020; originally announced November 2020.

Comments: NeurIPS 2020 Workshop on Privacy Preserving Machine Learning (PPML 2020)

arXiv:2006.02456 [pdf, other]

doi 10.1007/978-3-030-58986-8_14

A Distributed Trust Framework for Privacy-Preserving Machine Learning

Authors: Will Abramson, Adam James Hall, Pavlos Papadopoulos, Nikolaos Pitropakis, William J Buchanan

Abstract: When training a machine learning model, it is standard procedure for the researcher to have full knowledge of both the data and model. However, this engenders a lack of trust between data owners and data scientists. Data owners are justifiably reluctant to relinquish control of private information to third parties. Privacy-preserving techniques distribute computation in order to ensure that data r… ▽ More When training a machine learning model, it is standard procedure for the researcher to have full knowledge of both the data and model. However, this engenders a lack of trust between data owners and data scientists. Data owners are justifiably reluctant to relinquish control of private information to third parties. Privacy-preserving techniques distribute computation in order to ensure that data remains in the control of the owner while learning takes place. However, architectures distributed amongst multiple agents introduce an entirely new set of security and trust complications. These include data poisoning and model theft. This paper outlines a distributed infrastructure which is used to facilitate peer-to-peer trust between distributed agents; collaboratively performing a privacy-preserving workflow. Our outlined prototype sets industry gatekeepers and governance bodies as credential issuers. Before participating in the distributed learning workflow, malicious actors must first negotiate valid credentials. We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs) to establish a distributed trust architecture during a privacy-preserving machine learning experiment. Specifically, we utilise secure and authenticated DID communication channels in order to facilitate a federated learning workflow related to mental health care data. △ Less

Submitted 3 June, 2020; originally announced June 2020.

Comments: To be published in the proceedings of the 17th International Conference on Trust, Privacy and Security in Digital Business - TrustBus2020

Report number: TrustBus 2020, LNCS 12395, pp. 205--220, 2020 MSC Class: 68M25 ACM Class: C.2.0

Journal ref: 17th International Conference TrustBus 2020

arXiv:1907.10272 [pdf, other]

Predicting Malicious Insider Threat Scenarios Using Organizational Data and a Heterogeneous Stack-Classifier

Authors: Adam James Hall, Nikolaos Pitropakis, William J Buchanan, Naghmeh Moradpoor

Abstract: Insider threats continue to present a major challenge for the information security community. Despite constant research taking place in this area; a substantial gap still exists between the requirements of this community and the solutions that are currently available. This paper uses the CERT dataset r4.2 along with a series of machine learning classifiers to predict the occurrence of a particular… ▽ More Insider threats continue to present a major challenge for the information security community. Despite constant research taking place in this area; a substantial gap still exists between the requirements of this community and the solutions that are currently available. This paper uses the CERT dataset r4.2 along with a series of machine learning classifiers to predict the occurrence of a particular malicious insider threat scenario - the uploading sensitive information to wiki leaks before leaving the organization. These algorithms are aggregated into a meta-classifier which has a stronger predictive performance than its constituent models. It also defines a methodology for performing pre-processing on organizational log data into daily user summaries for classification, and is used to train multiple classifiers. Boosting is also applied to optimise classifier accuracy. Overall the models are evaluated through analysis of their associated confusion matrix and Receiver Operating Characteristic (ROC) curve, and the best performing classifiers are aggregated into an ensemble classifier. This meta-classifier has an accuracy of \textbf{96.2\%} with an area under the ROC curve of \textbf{0.988}. △ Less

Submitted 24 July, 2019; originally announced July 2019.

Journal ref: 2018 IEEE International Conference on Big Data (Big Data). IEEE, 2018

Showing 1–7 of 7 results for author: Hall, A J