-
Sequential Binary Classification for Intrusion Detection in Software Defined Networks
Authors:
Ishan Chokshi,
Shrihari Vasudevan,
Nachiappan Sundaram,
Raaghul Ranganathan
Abstract:
Software-Defined Networks (SDN) are the standard architecture for network deployment. Intrusion Detection Systems (IDS) are a pivotal part of this technology as networks become more vulnerable to new and sophisticated attacks. Machine Learning (ML)-based IDS are increasingly seen as the most effective approach to handle this issue. However, IDS datasets suffer from high class imbalance, which impa…
▽ More
Software-Defined Networks (SDN) are the standard architecture for network deployment. Intrusion Detection Systems (IDS) are a pivotal part of this technology as networks become more vulnerable to new and sophisticated attacks. Machine Learning (ML)-based IDS are increasingly seen as the most effective approach to handle this issue. However, IDS datasets suffer from high class imbalance, which impacts the performance of standard ML models. We propose Sequential Binary Classification (SBC) - an algorithm for multi-class classification to address this issue. SBC is a hierarchical cascade of base classifiers, each of which can be modelled on any general binary classifier. Extensive experiments are reported on benchmark datasets that evaluate the performance of SBC under different scenarios.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
First-Generation Inference Accelerator Deployment at Facebook
Authors:
Michael Anderson,
Benny Chen,
Stephen Chen,
Summer Deng,
Jordan Fix,
Michael Gschwind,
Aravind Kalaiah,
Changkyu Kim,
Jaewon Lee,
Jason Liang,
Haixin Liu,
Yinghai Lu,
Jack Montgomery,
Arun Moorthy,
Satish Nadathur,
Sam Naghshineh,
Avinash Nayak,
Jongsoo Park,
Chris Petersen,
Martin Schatz,
Narayanan Sundaram,
Bangsheng Tang,
Peter Tang,
Amy Yang,
Jiecao Yu
, et al. (90 additional authors not shown)
Abstract:
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the in…
▽ More
In this paper, we provide a deep dive into the deployment of inference accelerators at Facebook. Many of our ML workloads have unique characteristics, such as sparse memory accesses, large model sizes, as well as high compute, memory and network bandwidth requirements. We co-designed a high-performance, energy-efficient inference accelerator platform based on these requirements. We describe the inference accelerator platform ecosystem we developed and deployed at Facebook: both hardware, through Open Compute Platform (OCP), and software framework and tooling, through Pytorch/Caffe2/Glow. A characteristic of this ecosystem from the start is its openness to enable a variety of AI accelerators from different vendors. This platform, with six low-power accelerator cards alongside a single-socket host CPU, allows us to serve models of high complexity that cannot be easily or efficiently run on CPUs. We describe various performance optimizations, at both platform and accelerator level, which enables this platform to serve production traffic at Facebook. We also share deployment challenges, lessons learned during performance optimization, as well as provide guidance for future inference hardware co-design.
△ Less
Submitted 4 August, 2021; v1 submitted 8 July, 2021;
originally announced July 2021.
-
Online Fashion Commerce: Modelling Customer Promise Date
Authors:
Preethi V,
Nachiappan Sundaram,
Ravindra Babu Tallamraju
Abstract:
In the e-commerce space, accurate prediction of delivery dates plays a major role in customer experience as well as in optimizing the supply chain operations. Predicting a date later than the actual delivery date might sometimes result in the customer not placing the order (lost sales) while promising a date earlier than the actual delivery date would lead to a bad customer experience and conseque…
▽ More
In the e-commerce space, accurate prediction of delivery dates plays a major role in customer experience as well as in optimizing the supply chain operations. Predicting a date later than the actual delivery date might sometimes result in the customer not placing the order (lost sales) while promising a date earlier than the actual delivery date would lead to a bad customer experience and consequent customer churn. In this paper, we present a machine learning-based approach for penalizing incorrect predictions differently using non-conventional loss functions, while working under various uncertainties involved in making successful deliveries such as traffic disruptions, weather conditions, supply chain, and logistics. We examine statistical, deep learning, and conventional machine learning approaches, and we propose an approach that outperformed the pre-existing rule-based models. The proposed model is deployed internally for Fashion e-Commerce and is operational.
△ Less
Submitted 1 May, 2021;
originally announced May 2021.
-
The LDBC Graphalytics Benchmark
Authors:
Alexandru Iosup,
Ahmed Musaafir,
Alexandru Uta,
Arnau Prat Pérez,
Gábor Szárnyas,
Hassan Chafi,
Ilie Gabriel Tănase,
Lifeng Nai,
Michael Anderson,
Mihai Capotă,
Narayanan Sundaram,
Peter Boncz,
Siegfried Depner,
Stijn Heldens,
Thomas Manhardt,
Tim Hegeman,
Wing Lung Ngai,
Yinglong Xia
Abstract:
In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, s…
▽ More
In this document, we describe LDBC Graphalytics, an industrial-grade benchmark for graph analysis platforms. The main goal of Graphalytics is to enable the fair and objective comparison of graph analysis platforms. Due to the diversity of bottlenecks and performance issues such platforms need to address, Graphalytics consists of a set of selected deterministic algorithms for full-graph analysis, standard graph datasets, synthetic dataset generators, and reference output for validation purposes. Its test harness produces deep metrics that quantify multiple kinds of systems scalability, weak and strong, and robustness, such as failures and performance variability. The benchmark also balances comprehensiveness with runtime necessary to obtain the deep metrics. The benchmark comes with open-source software for generating performance data, for validating algorithm results, for monitoring and sharing performance data, and for obtaining the final benchmark result as a standard performance report.
△ Less
Submitted 6 April, 2023; v1 submitted 30 November, 2020;
originally announced November 2020.
-
An Application of Newsboy Problem in Supply Chain Optimisation of Online Fashion E-Commerce
Authors:
Chandramouli Kamanchi,
Gopinath Ashok Kumar,
Nachiappan Sundaram,
Ravindra Babu T,
Chaithanya Bandi
Abstract:
We describe a supply chain optimization model deployed in an online fashion e-commerce company in India called Myntra. Our model is simple, elegant and easy to put into service. The model utilizes historic data and predicts the quantity of Stock Keeping Units (SKUs) to hold so that the metrics "Fulfilment Index" and "Utilization Index" are optimized. We present the mathematics central to our model…
▽ More
We describe a supply chain optimization model deployed in an online fashion e-commerce company in India called Myntra. Our model is simple, elegant and easy to put into service. The model utilizes historic data and predicts the quantity of Stock Keeping Units (SKUs) to hold so that the metrics "Fulfilment Index" and "Utilization Index" are optimized. We present the mathematics central to our model as well as compare the performance of our model with baseline regression based solutions.
△ Less
Submitted 5 July, 2020;
originally announced July 2020.
-
A Faster DiSH: Hardware Implementation of a Discrete Cell Signaling Network Simulator
Authors:
Kevin Gilboy,
Khaled Sayed,
Niteesh Sundaram,
Kara Bocan,
Natasa Miskov-Zivanov
Abstract:
Development of fast methods to conduct in silico experiments using computational models of cellular signaling is a promising approach toward advances in personalized medicine. However, software-based cellular network simulation has run-times plagued by wasted CPU cycles and unnecessary processes. Hardware-based simulation affords substantial speedup, but prior attempts at hardware-based biological…
▽ More
Development of fast methods to conduct in silico experiments using computational models of cellular signaling is a promising approach toward advances in personalized medicine. However, software-based cellular network simulation has run-times plagued by wasted CPU cycles and unnecessary processes. Hardware-based simulation affords substantial speedup, but prior attempts at hardware-based biological simulation have been limited in scope and have suffered from inaccuracies due to poor random number generation. In this work, we propose several hardware-based simulation schemes utilizing novel random update index generation techniques for step-based and round-based stochastic simulations of cellular networks. Our results show improved runtimes while maintaining simulation accuracy compared to software implementations.
△ Less
Submitted 19 November, 2018;
originally announced November 2018.
-
Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies
Authors:
Brian Friesen,
Md. Mostofa Ali Patwary,
Brian Austin,
Nadathur Satish,
Zachary Slepian,
Narayanan Sundaram,
Deborah Bard,
Daniel J Eisenstein,
Jack Deslippe,
Pradeep Dubey,
Prabhat
Abstract:
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to bill…
▽ More
The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to billions of galaxies. We present Galactos, a high-performance implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree and spherical harmonic expansions to compute the anisotropic 3PCF. Our implementation is optimized for the Intel Xeon Phi architecture, exploiting SIMD parallelism, instruction and thread concurrency, and significant L1 and L2 cache reuse, reaching 39% of peak performance on a single node. Galactos scales to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across 9636 nodes, making the 3PCF easily computable for all galaxies in the observable universe.
△ Less
Submitted 31 August, 2017;
originally announced September 2017.
-
Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Authors:
Thorsten Kurth,
Jian Zhang,
Nadathur Satish,
Ioannis Mitliagkas,
Evan Racah,
Mostofa Ali Patwary,
Tareq Malas,
Narayanan Sundaram,
Wahid Bhimji,
Mikhail Smorkalov,
Jack Deslippe,
Mikhail Shiryaev,
Srinivas Sridharan,
Prabhat,
Pradeep Dubey
Abstract:
This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation…
▽ More
This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation obtains $\sim$2TFLOP/s on a single Cori Phase-II Xeon-Phi node. We use a hybrid strategy employing synchronous node-groups, while using asynchronous communication across groups. We use this strategy to scale training of a single model to $\sim$9600 Xeon-Phi nodes; obtaining peak performance of 11.73-15.07 PFLOP/s and sustained performance of 11.41-13.27 PFLOP/s. At scale, our HEP architecture produces state-of-the-art classification accuracy on a dataset with 10M images, exceeding that achieved by selections on high-level physics-motivated features. Our semi-supervised architecture successfully extracts weather patterns in a 15TB climate dataset. Our results demonstrate that Deep Learning can be optimized and scaled effectively on many-core, HPC systems.
△ Less
Submitted 17 August, 2017;
originally announced August 2017.
-
PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures
Authors:
Md. Mostofa Ali Patwary,
Nadathur Rajagopalan Satish,
Narayanan Sundaram,
Jialin Liu,
Peter Sadowski,
Evan Racah,
Suren Byna,
Craig Tull,
Wahid Bhimji,
Prabhat,
Pradeep Dubey
Abstract:
Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited s…
▽ More
Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited scalability for Big Data analytics challenges in the scientific domain. In this paper, we present parallel and highly optimized kd-tree based KNN algorithms (both construction and querying) suitable for distributed architectures. Our algorithm includes novel approaches for pruning search space and improving load balancing and partitioning among nodes and threads. Using TB-sized datasets from three science applications: astrophysics, plasma physics, and particle physics, we show that our implementation can construct kd-tree of 189 billion particles in 48 seconds on utilizing $\sim$50,000 cores. We also demonstrate computation of KNN of 19 billion queries in 12 seconds. We demonstrate almost linear speedup both for shared and distributed memory computers. Our algorithms outperforms earlier implementations by more than order of magnitude; thereby radically improving the applicability of our implementation to state-of-the-art Big Data analytics problems. In addition, we showcase performance and scalability on the recently released Intel Xeon Phi processor showing that our algorithm scales well even on massively parallel architectures.
△ Less
Submitted 27 July, 2016;
originally announced July 2016.
-
GraphMat: High performance graph analytics made productive
Authors:
Narayanan Sundaram,
Nadathur Rajagopalan Satish,
Md Mostofa Ali Patwary,
Subramanya R Dulloor,
Satya Gautam Vadlamudi,
Dipankar Das,
Pradeep Dubey
Abstract:
Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operat…
▽ More
Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operations in the backend. We get the productivity benefits of a vertex programming framework without sacrificing performance. GraphMat is in C++, and we have been able to write a diverse set of graph algorithms in this framework with the same effort compared to other vertex programming frameworks. GraphMat performs 1.2-7X faster than high performance frameworks such as GraphLab, CombBLAS and Galois. It achieves better multicore scalability (13-15X on 24 cores) than other frameworks and is 1.2X off native, hand-optimized code on a variety of different graph algorithms. Since GraphMat performance depends mainly on a few scalable and well-understood sparse matrix operations, GraphMatcan naturally benefit from the trend of increasing parallelism on future hardware.
△ Less
Submitted 24 March, 2015;
originally announced March 2015.