Showing 1–10 of 10 results for author: D'Alberto, P

Searching in archive cs.

  1. arXiv:2407.09453  [pdf, other]

    cs.LG cs.AR cs.CL

    Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

    Authors: Paolo D'Alberto, Taehee Jeong, Akshai Jain, Shreyas Manjunath, Mrinal Sarmah, Samuel Hsu, Yaswanth Raparti, Nitesh Pipralia

    Abstract: Nowadays, increasingly large Deep Neural Networks (DNNs) are being developed, trained, and utilized. These networks require significant computational resources, putting a strain on both advanced and limited devices. Our solution is to implement weight block sparsity, a structured sparsity that is friendly to hardware. By zeroing certain sections of the convolution and fully connect…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 12 pages, 10 figures, 1 table

    ACM Class: C.5; D.3.4

  2. arXiv:2312.12732  [pdf, other]

    cs.MS cs.PF

    Strassen's Matrix Multiplication Algorithm Is Still Faster

    Authors: Paolo D'Alberto

    Abstract: Recently, reinforcement algorithms discovered new algorithms that jump-started a wave of excitement and a flourishing of publications. However, there is little on implementations and applications and, especially, no absolute performance; we show here that they are not yet ready to replace Strassen's original fast matrix multiplication. We present Matrix Flow, a simple Python project fo…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 8 pages, 2 images, mathematical software

    MSC Class: 97N80 ACM Class: G.4

  3. arXiv:2308.00106  [pdf, other]

    cs.DC

    Entropy Maximization in Sparse Matrix by Vector Multiplication ($\max_E SpMV$)

    Authors: Paolo D'Alberto, Abhishek Jain, Ismail Bustany, Henri Fraisse, Mansimran Benipal

    Abstract: The peak performance of any SpMV depends primarily on the available memory bandwidth and its effective use. GPUs, ASICs, and new FPGAs have higher and higher bandwidth; however, for large scale and highly sparse matrices, SpMV is still a hard problem because of its random access pattern and workload imbalance. Here, we show how to turn randomness to our advantage. We propose a matrix permutation p…

    Submitted 24 July, 2023; originally announced August 2023.

    Comments: 26 pages

  4. arXiv:2307.12875  [pdf, other]

    stat.AP cs.HC cs.PF

    Digital Advertising: the Measure of Mobile Visits Lifts

    Authors: Paolo D'Alberto, Veronica Milenkiy, Fairiz Fi Azizi

    Abstract: Mobile-phone advertising enables marketers to reach customers at a personal level, and it enables measuring customers' reactions with novel approaches, in real time, and at scale. By keeping a device anonymous, we can deliver custom adverts and we can check when the device owner visits a specific brick-and-mortar location. This is the first step in a sale. By measuring visits and sales, the or…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 27 pages, 18 figures

    ACM Class: G.3; A.3; B.3; C.3

  5. arXiv:2110.04327  [pdf, other]

    cs.CL

    DPUV3INT8: A Compiler View to programmable FPGA Inference Engines

    Authors: Paolo D'Alberto, Jiangsha Ma, Jintao Li, Yiming Hu, Manasa Bollavaram, Shaoxia Fang

    Abstract: We have an FPGA design; we make it fast, efficient, and tested for a few important examples. Now we must infer a general solution to deploy in the data center. Here, we describe the FPGA DPUV3INT8 design and our compiler effort. The hand-tuned SW-HW solution for Resnet50_v1 has (close to) 2 times better images per second (throughput) than our best FPGA implementation; the compiler generalizes the…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: 11 pages

  6. arXiv:1805.07941  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Quantizing Convolutional Neural Networks for Low-Power High-Throughput Inference Engines

    Authors: Sean O. Settle, Manasa Bollavaram, Paolo D'Alberto, Elliott Delaye, Oscar Fernandez, Nicholas Fraser, Aaron Ng, Ashish Sirasao, Michael Wu

    Abstract: Deep learning as a means to inferencing has proliferated thanks to its versatility and ability to approach or exceed human-level accuracy. These computational models have seemingly insatiable appetites for computational resources not only while training, but also when deployed at scales ranging from data centers all the way down to embedded devices. As such, increasing consideration is being made…

    Submitted 21 May, 2018; originally announced May 2018.

  7. arXiv:1501.02185  [pdf, other]

    stat.AP cs.OH

    Multiple-Campaign Ad-Targeting Deployment: Parallel Response Modeling, Calibration and Scoring Without Personal User Information

    Authors: Paolo D'Alberto

    Abstract: We present a vertical introduction to campaign optimization; that is, the ability to predict the user response to an ad campaign without any users' profiles, on average and for each exposed ad. In practice, we present an approach to build a polytomous (multi-response) model composed of several hundred binary models using generalized linear models. The theory was introduced twenty years ago and…

    Submitted 10 May, 2015; v1 submitted 2 January, 2015; originally announced January 2015.

  8. arXiv:1501.00491  [pdf, other]

    cs.OH

    Mapping and Matching Algorithms: Data Mining by Adaptive Graphs

    Authors: Paolo D'Alberto, Veronica Milenkly

    Abstract: Assume we have two bijective functions $U(x)$ and $M(x)$ with $M(x)\neq U(x)$ for all $x$ and $U,M: \mathbb{N} \rightarrow \mathbb{N}$. Every day and in different locations, we see the different results of $U$ and $M$ without seeing $x$. We are assured of neither the time stamp nor the order within the day, but at least the location is fully defined. We want to find the matching between $U(x)$ and $M(x)$ (i.e., we…

    Submitted 2 January, 2015; originally announced January 2015.

  9. arXiv:1205.2927  [pdf, other]

    cs.MS

    A Heterogeneous Accelerated Matrix Multiplication: OpenCL + APU + GPU+ Fast Matrix Multiply

    Authors: Paolo D'Alberto

    Abstract: As users and developers, we are witnessing the opening of a new computing scenario: the introduction of hybrid processors into a single die, such as an accelerated processing unit (APU) processor, and the plug-and-play of additional graphics processing units (GPUs) onto a single motherboard. These APU processors provide multiple symmetric cores with their memory hierarchies and an integrated GPU.…

    Submitted 13 May, 2012; originally announced May 2012.

    Comments: 15 pages, 6 figures, AMD Fusion Developer Summit 2012

    ACM Class: G.4

  10. arXiv:1107.2691  [pdf, other]

    stat.CO cs.IR

    On the Weakenesses of Correlation Measures used for Search Engines' Results (Unsupervised Comparison of Search Engine Rankings)

    Authors: Paolo D'Alberto, Ali Dasdan

    Abstract: The correlation of the result lists provided by search engines is fundamental and it has deep and multidisciplinary ramifications. Here, we present automatic and unsupervised methods to assess whether or not search engines provide results that are comparable or correlated. We have two main contributions: First, we provide evidence that for more than 80% of the input queries - independently of thei…

    Submitted 13 July, 2011; originally announced July 2011.

    Comments: 16 pages, 19 figures