Showing 1–2 of 2 results for author: Chetlur, S

  1. Wafer-Scale Fast Fourier Transforms

    Authors: Marcelo Orenes-Vera, Ilya Sharapov, Robert Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur

    Abstract: We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a two-dimensional mesh of roughly 850,000 processing elements (PEs) with fast local memory and equally fast nearest-neighbor interconnections. Our wafer-scale FFT (wsFF…

    Submitted 29 September, 2022; originally announced September 2022.

    Journal ref: Proceedings of the 37th International Conference on Supercomputing 2023

  2. arXiv:1410.0759

    cs.NE cs.LG cs.MS

    cuDNN: Efficient Primitives for Deep Learning

    Authors: Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer

    Abstract: We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels must be reoptimized, which makes maintaining codebases difficult over time. Similar issues have long been addressed in the HPC community by libraries such as the…

    Submitted 17 December, 2014; v1 submitted 3 October, 2014; originally announced October 2014.