Showing 1–6 of 6 results for author: Dukhan, M

  1. arXiv:2001.04438  [pdf, other]

    cs.PF cs.LG

    The Two-Pass Softmax Algorithm

    Authors: Marat Dukhan, Artsiom Ablavatski

    Abstract: The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass to compute the normalization constant, and two other passes to compute outputs from normalized inputs. We analyze two variants of…

    Submitted 13 January, 2020; originally announced January 2020.
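
    The conventional three-pass scheme that the abstract mentions can be sketched as follows. This is a minimal illustration and is not taken from the paper: the function name `softmax_three_pass` is hypothetical, and the sketch assumes the normalization constant is the maximum input, which is the usual way to keep `exp` from overflowing.

```python
import math


def softmax_three_pass(scores):
    """Numerically safe softmax over a non-empty list of floats (illustrative sketch)."""
    # Pass 1: read all inputs to find the normalization constant (here, the maximum).
    largest = max(scores)

    # Pass 2: read all inputs again to accumulate the sum of shifted exponentials.
    total = 0.0
    for x in scores:
        total += math.exp(x - largest)

    # Pass 3: read the inputs a third time to produce the normalized outputs.
    return [math.exp(x - largest) / total for x in scores]
```

    Because every exponent is shifted by the maximum, the largest argument passed to `exp` is 0, so inputs such as 1000.0 do not overflow.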

  2. arXiv:1911.09723  [pdf, other]

    cs.CV

    Fast Sparse ConvNets

    Authors: Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

    Abstract: Historically, the pursuit of efficient inference has been one of the driving forces behind research into new deep learning architectures and building blocks. Some recent examples include: the squeeze-and-excitation module, depthwise separable convolutions in Xception, and the inverted bottleneck in MobileNet v2. Notably, in all of these cases, the resulting building blocks enabled not only higher…

    Submitted 21 November, 2019; originally announced November 2019.

  3. arXiv:1907.02129  [pdf, other]

    cs.CV cs.LG cs.NE

    The Indirect Convolution Algorithm

    Authors: Marat Dukhan

    Abstract: Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS libraries. Convolutions with 1x1 kernels can be directly represented as a GEMM call, but convolutions with larger kernels require a special memory layout transformatio…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: Presented at the Efficient Deep Learning for Computer Vision workshop at CVPR 2019
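
    The abstract's remark that a 1x1 convolution maps directly onto a GEMM call can be illustrated with a small sketch. It is not the paper's implementation: the helper names `matmul` and `conv1x1_as_gemm` are hypothetical, the image is assumed to be stored as height x width x input-channels nested lists, and a plain Python matrix product stands in for an optimized BLAS GEMM.

```python
def matmul(a, b):
    """Plain matrix product of a (rows x inner) and b (inner x cols); stand-in for GEMM."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def conv1x1_as_gemm(image, weights):
    """1x1 convolution of an H x W x C_in image with a C_in x C_out weight matrix."""
    height, width = len(image), len(image[0])
    # Each pixel's channel vector becomes one row: an (H*W) x C_in matrix.
    # No patch gathering is needed because the kernel covers a single pixel.
    pixels = [image[y][x] for y in range(height) for x in range(width)]
    flat = matmul(pixels, weights)  # (H*W) x C_out
    # Fold the rows back into the H x W spatial layout.
    return [flat[y * width:(y + 1) * width] for y in range(height)]
```

    For kernels larger than 1x1, each output pixel depends on a neighborhood of input pixels, so the rows of the left-hand matrix can no longer be read straight from the image.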

  4. arXiv:1812.08934  [pdf, other]

    cs.CV cs.NE

    ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

    Authors: Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

    Abstract: This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement learning algorithms, our approach leverages existing efficient network building blocks and focuses on exploiting hardware traits and adapting computation resources to…

    Submitted 20 December, 2018; originally announced December 2018.

  5. arXiv:1603.00491  [pdf, other]

    math.NA cs.PF

    Wanted: Floating-Point Add Round-off Error instruction

    Authors: Marat Dukhan, Richard Vuduc, Jason Riedy

    Abstract: We propose a new instruction (FPADDRE) that computes the round-off error in floating-point addition. We explain how this instruction benefits high-precision arithmetic operations in applications where double precision is not sufficient. Performance estimates on Intel Haswell, Intel Skylake, and AMD Steamroller processors, as well as Intel Knights Corner co-processor, demonstrate that such an instr…

    Submitted 1 March, 2016; originally announced March 2016.
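
    The quantity the proposed instruction would return, the round-off error of a floating-point addition, can be recovered in software today with the well-known TwoSum sequence attributed to Knuth. The sketch below is an illustration of that software technique, not of FPADDRE itself or of the paper's code; the function name `two_sum` is hypothetical, and the result is exact only when no overflow occurs.

```python
def two_sum(a, b):
    """Return (s, e) where s is the rounded sum a + b and e is its round-off error.

    For IEEE 754 doubles without overflow, a + b == s + e holds exactly.
    """
    s = a + b                     # rounded sum
    b_virtual = s - a             # the part of b that actually made it into s
    a_virtual = s - b_virtual     # the part of a that actually made it into s
    b_error = b - b_virtual       # what was lost from b
    a_error = a - a_virtual       # what was lost from a
    return s, a_error + b_error
```

    The sequence costs six floating-point operations per addition, which is the kind of overhead a single dedicated instruction would remove.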

  6. arXiv:1411.1460  [pdf, other]

    cs.DC cs.PF

    Branch-Avoiding Graph Algorithms

    Authors: Oded Green, Marat Dukhan, Richard Vuduc

    Abstract: This paper quantifies the impact of branches and branch mispredictions on the single-core performance for two classes of graph problems. Specifically, we consider classical algorithms for computing connected components and breadth-first search (BFS). We show that branch mispredictions are costly and can reduce performance by as much as 30%-50%. This insight suggests that one should seek graph algo…

    Submitted 5 November, 2014; originally announced November 2014.

    ACM Class: C.0; C.4; E.1