Search | arXiv e-print repository

The Tracking Machine Learning challenge : Throughput phase

Authors: Sabrina Amrouche, Laurent Basara, Paolo Calafiura, Dmitry Emeliyanov, Victor Estrade, Steven Farrell, Cécile Germain, Vladimir Vava Gligorov, Tobias Golling, Sergey Gorbunov, Heather Gray, Isabelle Guyon, Mikhail Hushchyn, Vincenzo Innocente, Moritz Kiehn, Marcel Kunze, Edward Moyse, David Rousseau, Andreas Salzburger, Andrey Ustyuzhanin, Jean-Roch Vlimant

Abstract: This paper reports on the second "Throughput" phase of the Tracking Machine Learning (TrackML) challenge on the Codalab platform. As in the first "Accuracy" phase, the participants had to solve a difficult experimental problem linked to tracking accurately the trajectory of particles as e.g. created at the Large Hadron Collider (LHC): given O($10^5$) points, the participants had to connect them in… ▽ More This paper reports on the second "Throughput" phase of the Tracking Machine Learning (TrackML) challenge on the Codalab platform. As in the first "Accuracy" phase, the participants had to solve a difficult experimental problem linked to tracking accurately the trajectory of particles as e.g. created at the Large Hadron Collider (LHC): given O($10^5$) points, the participants had to connect them into O($10^4$) individual groups that represent the particle trajectories which are approximated helical. While in the first phase only the accuracy mattered, the goal of this second phase was a compromise between the accuracy and the speed of inference. Both were measured on the Codalab platform where the participants had to upload their software. The best three participants had solutions with good accuracy and speed an order of magnitude faster than the state of the art when the challenge was designed. Although the core algorithms were less diverse than in the first phase, a diversity of techniques have been used and are described in this paper. The performance of the algorithms are analysed in depth and lessons derived. △ Less

Submitted 14 May, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: submitted to Computing and Software for Big Science

arXiv:1810.09300 [pdf, other]

RCanopus: Making Canopus Resilient to Failures and Byzantine Faults

Authors: S. Keshav, W. Golab, B. Wong, S. Rizvi, S. Gorbunov

Abstract: Distributed consensus is a key enabler for many distributed systems including distributed databases and blockchains. Canopus is a scalable distributed consensus protocol that ensures that live nodes in a system agree on an ordered sequence of operations (called transactions). Unlike most prior consensus protocols, Canopus does not rely on a single leader. Instead, it uses a virtual tree overlay fo… ▽ More Distributed consensus is a key enabler for many distributed systems including distributed databases and blockchains. Canopus is a scalable distributed consensus protocol that ensures that live nodes in a system agree on an ordered sequence of operations (called transactions). Unlike most prior consensus protocols, Canopus does not rely on a single leader. Instead, it uses a virtual tree overlay for message dissemination to limit network traffic across oversubscribed links. It leverages hardware redundancies, both within a rack and inside the network fabric, to reduce both protocol complexity and communication overhead. These design decisions enable Canopus to support large deployments without significant performance degradation. The existing Canopus protocol is resilient in the face of node and communication failures, but its focus is primarily on performance, so does not respond well to other types of failures. For example, the failure of a single rack of servers causes all live nodes to stall. The protocol is also open to attack by Byzantine nodes, which can cause different live nodes to conclude the protocol with different transaction orders. In this paper, we describe RCanopus (`resilent Canopus') which extends Canopus to add liveness, that is, allowing live nodes to make progress, when possible, despite many types of failures. This requires RCanopus to accurately detect and recover from failure despite using unreliable failure detectors, and tolerance of Byzantine attacks. Second, RCanopus guarantees safety, that is, agreement amongst live nodes of transaction order, in the presence of Byzantine attacks and network partitioning. △ Less

Submitted 16 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: Pre-print

arXiv:1711.02279 [pdf, other]

StealthDB: a Scalable Encrypted Database with Full SQL Query Support

Authors: Alexey Gribov, Dhinakaran Vinayagamurthy, Sergey Gorbunov

Abstract: Encrypted database systems provide a great method for protecting sensitive data in untrusted infrastructures. These systems are built using either special-purpose cryptographic algorithms that support operations over encrypted data, or by leveraging trusted computing co-processors. Strong cryptographic algorithms (e.g., public-key encryptions, garbled circuits) usually result in high performance o… ▽ More Encrypted database systems provide a great method for protecting sensitive data in untrusted infrastructures. These systems are built using either special-purpose cryptographic algorithms that support operations over encrypted data, or by leveraging trusted computing co-processors. Strong cryptographic algorithms (e.g., public-key encryptions, garbled circuits) usually result in high performance overheads, while weaker algorithms (e.g., order-preserving encryption) result in large leakage profiles. On the other hand, some encrypted database systems (e.g., Cipherbase, TrustedDB) leverage non-standard trusted computing devices, and are designed to work around the architectural limitations of the specific devices used. In this work we build StealthDB - an encrypted database system from Intel SGX. Our system can run on any newer generation Intel CPU. StealthDB has a very small trusted computing base, scales to large transactional workloads, requires minor DBMS changes, and provides a relatively strong security guarantees at steady state and during query execution. Our prototype on top of Postgres supports the full TPC-C benchmark with a 30% decrease in the average throughput over an unmodified version of Postgres operating on a 2GB unencrypted dataset. △ Less

Submitted 21 April, 2019; v1 submitted 6 November, 2017; originally announced November 2017.

Showing 1–3 of 3 results for author: Gorbunov, S