Showing 1–28 of 28 results for author: Ballard, G

  1. arXiv:2402.08134  [pdf, other]

    cs.LG math.NA math.OC

    Randomized Algorithms for Symmetric Nonnegative Matrix Factorization

    Authors: Koby Hayashi, Sinan G. Aksoy, Grey Ballard, Haesun Park

    Abstract: Symmetric Nonnegative Matrix Factorization (SymNMF) is a technique in data analysis and machine learning that approximates a symmetric matrix with a product of a nonnegative, low-rank matrix and its transpose. To design faster and more scalable algorithms for SymNMF we develop two randomized algorithms for its computation. The first algorithm uses randomized matrix sketching to compute an initial…

    Submitted 12 February, 2024; originally announced February 2024.

    MSC Class: 65F55; 65F20

  2. arXiv:2307.16652  [pdf, other]

    cs.DC cs.LG stat.ML

    Sequential and Shared-Memory Parallel Algorithms for Partitioned Local Depths

    Authors: Aditya Devarakonda, Grey Ballard

    Abstract: In this work, we design, analyze, and optimize sequential and shared-memory parallel algorithms for partitioned local depths (PaLD). Given a set of data points and pairwise distances, PaLD is a method for identifying strength of pairwise relationships based on relative distances, enabling the identification of strong ties within dense and sparse communities even if their sizes and within-community…

    Submitted 31 July, 2023; originally announced July 2023.

    MSC Class: 68W10

    ACM Class: D.1.3

  3. arXiv:2207.10437  [pdf, other]

    cs.DC

    Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

    Abstract: Multiple Tensor-Times-Matrix (Multi-TTM) is a key computation in algorithms for computing and operating with the Tucker tensor decomposition, which is frequently used in multidimensional data analysis. We establish communication lower bounds that determine how much data movement is required to perform the Multi-TTM computation in parallel. The crux of the proof relies on analytically solving a con…

    Submitted 2 February, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

  4. arXiv:2205.13407  [pdf, ps, other]

    cs.DC

    Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds

    Authors: Hussam Al Daas, Grey Ballard, Laura Grigori, Suraj Kumar, Kathryn Rouse

    Abstract: Communication lower bounds have long been established for matrix multiplication algorithms. However, most methods of asymptotic analysis have either ignored the constant factors or not obtained the tightest possible values. Recent work has demonstrated that more careful analysis improves the best known constants for some classical matrix multiplication lower bounds and helps to identify more effic…

    Submitted 26 May, 2022; originally announced May 2022.

  5. arXiv:1909.06524  [pdf, ps, other]

    math.NA cs.DS

    A Generalized Randomized Rank-Revealing Factorization

    Authors: Grey Ballard, James Demmel, Ioana Dumitriu, Alexander Rusciano

    Abstract: We introduce a Generalized Randomized QR-decomposition that may be applied to arbitrary products of matrices and their inverses, without needing to explicitly compute the products or inverses. This factorization is a critical part of a communication-optimal spectral divide-and-conquer algorithm for the nonsymmetric eigenvalue problem. In this paper, we establish that this randomized QR-factorizati…

    Submitted 13 September, 2019; originally announced September 2019.

    MSC Class: 15-04

  6. arXiv:1909.01149  [pdf, other]

    math.NA cs.DC cs.MS

    PLANC: Parallel Low Rank Approximation with Non-negativity Constraints

    Authors: Srinivas Eswar, Koby Hayashi, Grey Ballard, Ramakrishnan Kannan, Michael A. Matheson, Haesun Park

    Abstract: We consider the problem of low-rank approximation of massive dense non-negative tensor data, for example to discover latent patterns in video and imaging applications. As the size of data sets grows, single workstations are hitting bottlenecks in both computation time and available memory. We propose a distributed-memory parallel computing solution to handle massive data sets, loading the input da…

    Submitted 30 August, 2019; originally announced September 2019.

    Comments: arXiv admin note: text overlap with arXiv:1806.07985

  7. arXiv:1906.04749  [pdf, other]

    eess.IV cs.CV math.NA

    Joint 3D Localization and Classification of Space Debris using a Multispectral Rotating Point Spread Function

    Authors: Chao Wang, Grey Ballard, Robert Plemmons, Sudhakar Prasad

    Abstract: We consider the problem of joint three-dimensional (3D) localization and material classification of unresolved space debris using a multispectral rotating point spread function (RPSF). The use of RPSF allows one to estimate the 3D locations of point sources from their rotated images acquired by a single 2D sensor array, since the amount of rotation of each source image about its x, y location depe…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: 25 pages

  8. TuckerMPI: A Parallel C++/MPI Software Package for Large-scale Data Compression via the Tucker Tensor Decomposition

    Authors: Grey Ballard, Alicia Klinvex, Tamara G. Kolda

    Abstract: Our goal is compression of massive-scale grid-structured data, such as the multi-terabyte output of a high-fidelity computational simulation. For such data sets, we have developed a new software package called TuckerMPI, a parallel C++/MPI software package for compressing distributed data. The approach is based on treating the data as a tensor, i.e., a multidimensional array, and computing its tru…

    Submitted 21 August, 2019; v1 submitted 17 January, 2019; originally announced January 2019.

    Journal ref: ACM Transactions on Mathematical Software, Vol. 46, No. 2, Article 13, June 2020

  9. arXiv:1806.07985  [pdf, other]

    math.NA cs.DC cs.MS

    Parallel Nonnegative CP Decomposition of Dense Tensors

    Authors: Grey Ballard, Koby Hayashi, Ramakrishnan Kannan

    Abstract: The CP tensor decomposition is a low-rank approximation of a tensor. We present a distributed-memory parallel algorithm and implementation of an alternating optimization method for computing a CP decomposition of dense tensor data that can enforce nonnegativity of the computed low-rank factors. The principal task is to parallelize the matricized-tensor times Khatri-Rao product (MTTKRP) bottleneck…

    Submitted 19 June, 2018; originally announced June 2018.

  10. arXiv:1805.05278  [pdf, ps, other]

    cs.DC

    A 3D Parallel Algorithm for QR Decomposition

    Authors: Grey Ballard, James Demmel, Laura Grigori, Mathias Jacquelin, Nicholas Knight

    Abstract: Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different c…

    Submitted 14 May, 2018; originally announced May 2018.

  11. arXiv:1801.00843  [pdf, ps, other]

    cs.CC

    The geometry of rank decompositions of matrix multiplication II: $3\times 3$ matrices

    Authors: Grey Ballard, Christian Ikenmeyer, J. M. Landsberg, Nick Ryder

    Abstract: This is the second in a series of papers on rank decompositions of the matrix multiplication tensor. We present new rank $23$ decompositions for the $3\times 3$ matrix multiplication tensor $M_{\langle 3\rangle}$. All our decompositions have symmetry groups that include the standard cyclic permutation of factors but otherwise exhibit a range of behavior. One of them has 11 cubes as summands and ad…

    Submitted 2 January, 2018; originally announced January 2018.

    MSC Class: 68Q17; 14L30; 15A69

  12. arXiv:1708.08976  [pdf, other]

    cs.DC

    Shared Memory Parallelization of MTTKRP for Dense Tensors

    Authors: Koby Hayashi, Grey Ballard, Jeffrey Jiang, Michael Tobia

    Abstract: The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in me…

    Submitted 29 August, 2017; originally announced August 2017.

    Comments: 10 pages, 27 figures

  13. arXiv:1708.07401  [pdf, ps, other]

    cs.DC

    Communication Lower Bounds for Matricized Tensor Times Khatri-Rao Product

    Authors: Grey Ballard, Nicholas Knight, Kathryn Rouse

    Abstract: The matricized-tensor times Khatri-Rao product computation is the typical bottleneck in algorithms for computing a CP decomposition of a tensor. In order to develop high performance sequential and parallel algorithms, we establish communication lower bounds that identify how much data movement is required for this computation in the case of dense tensors. We also present sequential and parallel al…

    Submitted 22 October, 2017; v1 submitted 24 August, 2017; originally announced August 2017.

  14. arXiv:1609.09154  [pdf, other]

    cs.DC math.NA stat.ML

    MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization

    Authors: Ramakrishnan Kannan, Grey Ballard, Haesun Park

    Abstract: Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $W$ and $H$, for the given input matrix $A$, such that $A \approx W H$. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data m…

    Submitted 28 September, 2016; originally announced September 2016.

    Comments: arXiv admin note: text overlap with arXiv:1509.09313

  15. arXiv:1604.03703  [pdf, other]

    cs.DC math.NA

    A communication-avoiding parallel algorithm for the symmetric eigenvalue problem

    Authors: Edgar Solomonik, Grey Ballard, James Demmel, Torsten Hoefler

    Abstract: Many large-scale scientific computations require eigenvalue solvers in a scaling regime where efficiency is limited by data movement. We introduce a parallel algorithm for computing the eigenvalues of a dense symmetric matrix, which performs asymptotically less communication than previously known approaches. We provide analysis in the Bulk Synchronous Parallel (BSP) model with additional considera…

    Submitted 13 April, 2016; originally announced April 2016.

  16. arXiv:1603.05627  [pdf, ps, other]

    cs.DC

    Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication

    Authors: Grey Ballard, Alex Druinsky, Nicholas Knight, Oded Schwartz

    Abstract: We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the interprocessor communication volume along a critical path in a parallel computation and also the volume of data moving through the memory hierarchy in…

    Submitted 17 March, 2016; originally announced March 2016.

  17. Parallel Tensor Compression for Large-Scale Scientific Data

    Authors: Woody Austin, Grey Ballard, Tamara G. Kolda

    Abstract: As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data, assuming double precision. By viewing the data as a dense five-way tensor, we can comput…

    Submitted 23 February, 2016; v1 submitted 22 October, 2015; originally announced October 2015.

    Journal ref: IPDPS'16: Proceedings of the 30th IEEE International Parallel and Distributed Processing Symposium, pp. 912-922, May 2016

  18. arXiv:1510.00844  [pdf, other]

    cs.DC math.NA

    Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

    Authors: Ariful Azad, Grey Ballard, Aydin Buluc, James Demmel, Laura Grigori, Oded Schwartz, Sivan Toledo, Samuel Williams

    Abstract: Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdos-Renyi matrices, th…

    Submitted 16 November, 2016; v1 submitted 3 October, 2015; originally announced October 2015.

    Journal ref: SIAM Journal on Scientific Computing, Volume 38, Number 6, pp. C624-C651, 2016

  19. arXiv:1509.09313  [pdf, other]

    cs.DC

    A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization

    Authors: Ramakrishnan Kannan, Grey Ballard, Haesun Park

    Abstract: Non-negative matrix factorization (NMF) is the problem of determining two non-negative low rank factors $W$ and $H$, for the given input matrix $A$, such that $A \approx W H$. NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data m…

    Submitted 30 September, 2015; originally announced September 2015.

  20. Diamond Sampling for Approximate Maximum All-pairs Dot-product (MAD) Search

    Authors: Grey Ballard, Ali Pinar, Tamara G. Kolda, C. Seshadhri

    Abstract: Given two sets of vectors, $A = \{{a_1}, \dots, {a_m}\}$ and $B=\{{b_1},\dots,{b_n}\}$, our problem is to find the top-$t$ dot products, i.e., the largest $|{a_i}\cdot{b_j}|$ among all possible pairs. This is a fundamental mathematical problem that appears in numerous data applications involving similarity search, link prediction, and collaborative filtering. We propose a sampling-based approach t…

    Submitted 18 June, 2015; v1 submitted 11 June, 2015; originally announced June 2015.

    Journal ref: ICDM 2015: Proceedings of the 2015 IEEE International Conference on Data Mining, pp. 11-20, November 2015

  21. arXiv:1409.2908  [pdf, ps, other]

    cs.DC cs.MS math.NA

    A Framework for Practical Parallel Fast Matrix Multiplication

    Authors: Austin R. Benson, Grey Ballard

    Abstract: Matrix multiplication is a fundamental computation in many scientific disciplines. In this paper, we show that novel fast matrix multiplication algorithms can significantly outperform vendor implementations of the classical algorithm and Strassen's fast algorithm on modest problem sizes and shapes. Furthermore, we show that the best choice of fast algorithm depends not only on the size of the matr…

    Submitted 9 September, 2014; originally announced September 2014.

    ACM Class: G.4

    Journal ref: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2015

  22. arXiv:1209.2184  [pdf, other]

    cs.DS cs.CC math.NA

    Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication

    Authors: Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

    Abstract: Graph expansion analysis of computational DAGs is useful for obtaining communication cost lower bounds where previous methods, such as geometric embedding, are not applicable. This has recently been demonstrated for Strassen's and Strassen-like fast square matrix multiplication algorithms. Here we extend the expansion analysis approach to fast algorithms for rectangular matrix multiplication, obta…

    Submitted 10 September, 2012; originally announced September 2012.

    Journal ref: Design and Analysis of Algorithms Volume 7659, 2012, pp 13-36

  23. arXiv:1202.3177  [pdf, other]

    cs.DS cs.CC cs.DC math.CO math.NA

    Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds

    Authors: Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

    Abstract: A parallel algorithm has perfect strong scaling if its running time on P processors is linear in 1/P, including all communication costs. Distributed-memory parallel algorithms for matrix multiplication with perfect strong scaling have only recently been found. One is based on classical matrix multiplication (Solomonik and Demmel, 2011), and one is based on Strassen's fast matrix multiplication (Ba…

    Submitted 14 February, 2012; originally announced February 2012.

    Comments: 4 pages, 1 figure

    MSC Class: 68W10; 68W40

    ACM Class: F.2.1

  24. arXiv:1202.3173  [pdf, other]

    cs.DS cs.CC cs.DC math.CO math.NA

    Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication

    Authors: Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz, Oded Schwartz

    Abstract: Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A criti…

    Submitted 14 February, 2012; originally announced February 2012.

    Comments: 13 pages, 3 figures

    MSC Class: 68W40; 68W10

    ACM Class: F.2.1

  25. arXiv:1109.1693  [pdf, ps, other]

    cs.DS cs.CC cs.DC math.CO math.NA

    Graph Expansion and Communication Costs of Fast Matrix Multiplication

    Authors: Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

    Abstract: The communication cost of algorithms (also known as I/O-complexity) is shown to be closely related to the expansion properties of the corresponding computation graphs. We demonstrate this on Strassen's and other fast matrix multiplication algorithms, and obtain first lower bounds on their communication costs. In the sequential case, where the processor has a fast memory of size $M$, too small to…

    Submitted 8 September, 2011; originally announced September 2011.

    Report number: UCB/EECS-2011-40

    ACM Class: F.2.1

    Journal ref: Proceedings of the 23rd annual symposium on parallelism in algorithms and architectures. ACM, 1-12. 2011 (a shorter conference version)

  26. arXiv:1011.3077  [pdf, other]

    math.NA cs.DC cs.MS

    Minimizing Communication for Eigenproblems and the Singular Value Decomposition

    Authors: Grey Ballard, James Demmel, Ioana Dumitriu

    Abstract: Algorithms have two costs: arithmetic and communication. The latter represents the cost of moving data, either between levels of a memory hierarchy, or between processors over a network. Communication often dominates arithmetic and represents a rapidly increasing proportion of the total cost, so we seek algorithms that minimize communication. In \cite{BDHS10} lower bounds were presented on the amo…

    Submitted 12 November, 2010; originally announced November 2010.

    Comments: 43 pages, 11 figures

    MSC Class: 65F15

  27. arXiv:0905.2485  [pdf, ps, other]

    cs.CC cs.DS math.NA

    Minimizing Communication in Linear Algebra

    Authors: Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

    Abstract: In 1981 Hong and Kung proved a lower bound on the amount of communication needed to perform dense matrix multiplication using the conventional $O(n^3)$ algorithm, where the input matrices were too large to fit in the small, fast memory. In 2004 Irony, Toledo and Tiskin gave a new proof of this result and extended it to the parallel case. In both cases the lower bound may be expressed as $Ω$(#ar…

    Submitted 15 May, 2009; originally announced May 2009.

    Comments: 27 pages, 2 tables

    Journal ref: SIAM J. Matrix Anal. & Appl. 32 (2011), no. 3, 866-901

  28. arXiv:0902.2537  [pdf, other]

    math.NA cs.CC cs.DS

    Communication-optimal Parallel and Sequential Cholesky Decomposition

    Authors: Grey Ballard, James Demmel, Olga Holtz, Oded Schwartz

    Abstract: Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lower…

    Submitted 12 April, 2010; v1 submitted 15 February, 2009; originally announced February 2009.

    Comments: 29 pages, 2 tables, 6 figures

    ACM Class: F.2.1

    Journal ref: SIAM J. Sci. Comput. 32, (2010) pp. 3495-3523