Showing 1–10 of 10 results for author: Satish, N

Searching in archive cs.
  1. arXiv:1811.09886  [pdf, other]

    cs.LG stat.ML

    Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

    Authors: Jongsoo Park, Maxim Naumov, Protonu Basu, Summer Deng, Aravind Kalaiah, Daya Khudia, James Law, Parth Malani, Andrey Malevich, Satish Nadathur, Juan Pino, Martin Schatz, Alexander Sidorov, Viswanath Sivakumar, Andrew Tulloch, Xiaodong Wang, Yiming Wu, Hector Yuen, Utku Diril, Dmytro Dzhulgakov, Kim Hazelwood, Bill Jia, Yangqing Jia, Lin Qiao, Vijay Rao, et al. (3 additional authors not shown)

    Abstract: The application of deep learning techniques resulted in remarkable improvement of machine learning models. This paper provides detailed characterizations of deep learning models used in many Facebook social network services. We present computational characteristics of our models, describe high performance optimizations targeting existing systems, point out their limitations and make suggestions…

    Submitted 29 November, 2018; v1 submitted 24 November, 2018; originally announced November 2018.

  2. arXiv:1709.00086  [pdf, other]

    astro-ph.CO cs.CE cs.PF

    Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies

    Authors: Brian Friesen, Md. Mostofa Ali Patwary, Brian Austin, Nadathur Satish, Zachary Slepian, Narayanan Sundaram, Deborah Bard, Daniel J Eisenstein, Jack Deslippe, Pradeep Dubey, Prabhat

    Abstract: The nature of dark energy and the complete theory of gravity are two central questions currently facing cosmology. A vital tool for addressing them is the 3-point correlation function (3PCF), which probes deviations from a spatially random distribution of galaxies. However, the 3PCF's formidable computational expense has prevented its application to astronomical surveys comprising millions to bill…

    Submitted 31 August, 2017; originally announced September 2017.

    Comments: 11 pages, 7 figures, accepted to SuperComputing 2017

  3. arXiv:1708.05256  [pdf, other]

    cs.PF cs.CV cs.LG

    Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data

    Authors: Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey

    Abstract: This paper presents the first, 15-PetaFLOP Deep Learning system for solving scientific pattern classification problems on contemporary HPC architectures. We develop supervised convolutional architectures for discriminating signals in high-energy physics data as well as semi-supervised architectures for localizing and classifying extreme weather in climate data. Our Intelcaffe-based implementation…

    Submitted 17 August, 2017; originally announced August 2017.

    Comments: 12 pages, 9 figures

  4. arXiv:1704.02677  [pdf, other]

    cs.AR

    Banshee: Bandwidth-Efficient DRAM Caching Via Software/Hardware Cooperation

    Authors: Xiangyao Yu, Christopher J. Hughes, Nadathur Satish, Onur Mutlu, Srinivas Devadas

    Abstract: Putting the DRAM on the same package with a processor enables several times higher memory bandwidth than conventional off-package DRAM. Yet, the latency of in-package DRAM is not appreciably lower than that of off-package DRAM. A promising use of in-package DRAM is as a large cache. Unfortunately, most previous DRAM cache designs mainly optimize for hit latency and do not consider off-chip bandwid…

    Submitted 9 April, 2017; originally announced April 2017.

    Comments: 12 pages

  5. arXiv:1611.06172  [pdf, other]

    cs.DC stat.ML

    Parallelizing Word2Vec in Multi-Core and Many-Core Architectures

    Authors: Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

    Abstract: Word2vec is a widely used algorithm for extracting low-dimensional vector representations of words. State-of-the-art algorithms including those by Mikolov et al. have been parallelized for multi-core CPU architectures, but are based on vector-vector operations with "Hogwild" updates that are memory-bandwidth intensive and do not efficiently use computational resources. In this paper, we propose "H…

    Submitted 23 December, 2016; v1 submitted 18 November, 2016; originally announced November 2016.

    Comments: NIPS Workshop on Efficient Methods for Deep Neural Networks (2016)

  6. PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures

    Authors: Md. Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Narayanan Sundaram, Jialin Liu, Peter Sadowski, Evan Racah, Suren Byna, Craig Tull, Wahid Bhimji, Prabhat, Pradeep Dubey

    Abstract: Computing $k$-Nearest Neighbors (KNN) is one of the core kernels used in many machine learning, data mining and scientific computing applications. Although kd-tree based $O(\log n)$ algorithms have been proposed for computing KNN, due to its inherent sequentiality, linear algorithms are being used in practice. This limits the applicability of such methods to millions of data points, with limited s…

    Submitted 27 July, 2016; originally announced July 2016.

    Comments: 11 pages in PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures, Md. Mostofa Ali Patwary et al., IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2016

  7. arXiv:1604.04661  [pdf, other]

    cs.DC cs.CL stat.ML

    Parallelizing Word2Vec in Shared and Distributed Memory

    Authors: Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

    Abstract: Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It generated considerable excitement in the machine learning and natural language processing (NLP) communities recently due to its exceptional performance in many NLP applications such as named entity recognition, sentiment analysis, machine translation and question answering. State-of-the-art algor…

    Submitted 8 August, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

    Comments: Added more results

  8. arXiv:1511.06909  [pdf, other]

    cs.LG cs.CL cs.NE stat.ML

    BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies

    Authors: Shihao Ji, S. V. N. Vishwanathan, Nadathur Satish, Michael J. Anderson, Pradeep Dubey

    Abstract: We propose BlackOut, an approximation algorithm to efficiently train massive recurrent neural network language models (RNNLMs) with million word vocabularies. BlackOut is motivated by using a discriminative loss, and we describe a new sampling strategy which significantly reduces computation while improving stability, sample efficiency, and rate of convergence. One way to understand BlackOut is to…

    Submitted 31 March, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016

  9. arXiv:1503.07241  [pdf, other]

    cs.PF cs.DB cs.DC

    GraphMat: High performance graph analytics made productive

    Authors: Narayanan Sundaram, Nadathur Rajagopalan Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Satya Gautam Vadlamudi, Dipankar Das, Pradeep Dubey

    Abstract: Given the growing importance of large-scale graph analytics, there is a need to improve the performance of graph analysis frameworks without compromising on productivity. GraphMat is our solution to bridge this gap between a user-friendly graph analytics framework and native, hand-optimized code. GraphMat functions by taking vertex programs and mapping them to high performance sparse matrix operat…

    Submitted 24 March, 2015; originally announced March 2015.

  10. arXiv:1109.6885  [pdf, other]

    cs.DB

    Fast Updates on Read-Optimized Databases Using Multi-Core CPUs

    Authors: Jens Krueger, Changkyu Kim, Martin Grund, Nadathur Satish, David Schwalb, Jatin Chhugani, Hasso Plattner, Pradeep Dubey, Alexander Zeier

    Abstract: Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process introduces significant overheads and unacceptable downtimes in update intensive systems, aspiring to combine transactional and analytical workloads into one system…

    Submitted 30 September, 2011; originally announced September 2011.

    Comments: VLDB2012

    Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 1, pp. 61-72 (2011)