Showing 1–10 of 10 results for author: Severo, D

  1. arXiv:2408.08837

    cs.LG cs.DS cs.IT

    Entropy Coding of Unordered Data Structures

    Authors: Julius Kunze, Daniel Severo, Giulio Zani, Jan-Willem van de Meent, James Townsend

    Abstract: We present shuffle coding, a general method for optimal compression of sequences of unordered objects using bits-back coding. Data structures that can be compressed using shuffle coding include multisets, graphs, hypergraphs, and others. We release an implementation that can easily be adapted to different data types and statistical models, and demonstrate that our implementation achieves state-of-…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Published at ICLR 2024

  2. arXiv:2310.05986

    cs.CV

    The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric

    Authors: Daniel Severo, Lucas Theis, Johannes Ballé

    Abstract: We show how perceptual embeddings of the visual system can be constructed at inference-time with no training data or deep neural network features. Our perceptual embeddings are solutions to a weighted least squares (WLS) problem, defined at the pixel-level, and solved at inference-time, that can capture global and local image characteristics. The distance in embedding space is used to define a per…

    Submitted 6 October, 2023; originally announced October 2023.

  3. arXiv:2305.09705

    cs.LG cs.IT

    Random Edge Coding: One-Shot Bits-Back Coding of Large Labeled Graphs

    Authors: Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani

    Abstract: We present a one-shot method for compressing large labeled graphs called Random Edge Coding. When paired with a parameter-free model based on Pólya's Urn, the worst-case computational and memory complexities scale quasi-linearly and linearly with the number of observed edges, making it efficient on sparse graphs, and requires only integer arithmetic. Key to our method is bits-back coding, which is…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Published at ICML 2023

  4. arXiv:2210.06662

    cs.LG

    Action Matching: Learning Stochastic Dynamics from Samples

    Authors: Kirill Neklyudov, Rob Brekelmans, Daniel Severo, Alireza Makhzani

    Abstract: Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to bet…

    Submitted 8 June, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

    Comments: Published in ICML 2023

  5. arXiv:2112.13687

    cs.LG

    Predição de Incidência de Lesão por Pressão em Pacientes de UTI usando Aprendizado de Máquina [English: Predicting Pressure Injury Incidence in ICU Patients Using Machine Learning]

    Authors: Henrique P. Silva, Arthur D. Reys, Daniel S. Severo, Dominique H. Ruther, Flávio A. O. B. Silva, Maria C. S. S. Guimarães, Roberto Z. A. Pinto, Saulo D. S. Pedro, Túlio P. Navarro, Danilo Silva

    Abstract: Pressure ulcers have high prevalence in ICU patients but are preventable if identified in initial stages. In practice, the Braden scale is used to classify high-risk patients. This paper investigates the use of machine learning in electronic health records data for this task, by using data available in MIMIC-III v1.4. Two main contributions are made: a new approach for evaluating models that consi…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: 3 pages, 1 figure, in Portuguese, accepted at XVIII Congresso Brasileiro de Informática em Saúde (CBIS 2021)

  6. arXiv:2107.09716

    cs.LG eess.SP

    Regularized Classification-Aware Quantization

    Authors: Daniel Severo, Elad Domanovitz, Ashish Khisti

    Abstract: Traditionally, quantization is designed to minimize the reconstruction error of a data source. When considering downstream classification tasks, other measures of distortion can be of interest; such as the 0-1 classification loss. Furthermore, it is desirable that the performance of these quantizers not deteriorate once they are deployed into production, as relearning the scheme online is not alwa…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: Accepted to the 30th Biennial Symposium on Communications (BSC) 2021

  7. arXiv:2107.09202

    cs.IT cs.LG eess.SP

    Compressing Multisets with Large Alphabets using Bits-Back Coding

    Authors: Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani, Karen Ullrich

    Abstract: Current methods which compress multisets at an optimal rate have computational complexity that scales linearly with alphabet size, making them too slow to be practical in many real-world settings. We show how to convert a compression algorithm for sequences into one for multisets, in exchange for an additional complexity term that is quasi-linear in sequence length. This allows us to compress mult…

    Submitted 27 February, 2023; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: IEEE Journal on Selected Areas in Information Theory, 2023

  8. arXiv:2102.11086

    cs.LG cs.AI cs.IT stat.CO

    Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

    Authors: Yangjun Ruan, Karen Ullrich, Daniel Severo, James Townsend, Ashish Khisti, Arnaud Doucet, Alireza Makhzani, Chris J. Maddison

    Abstract: Latent variable models have been successfully applied in lossless compression with the bits-back coding algorithm. However, bits-back suffers from an increase in the bitrate equal to the KL divergence between the approximate posterior and the true posterior. In this paper, we show how to remove this gap asymptotically by deriving bits-back coding algorithms from tighter variational bounds. The key…

    Submitted 14 June, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

  9. Predicting Multiple ICD-10 Codes from Brazilian-Portuguese Clinical Notes

    Authors: Arthur D. Reys, Danilo Silva, Daniel Severo, Saulo Pedro, Marcia M. de Souza e Sá, Guilherme A. C. Salgado

    Abstract: ICD coding from electronic clinical records is a manual, time-consuming and expensive process. Code assignment is, however, an important task for billing purposes and database organization. While many works have studied the problem of automated ICD coding from free text using machine learning techniques, most use records in the English language, especially from the MIMIC-III public dataset. This w…

    Submitted 29 July, 2020; originally announced August 2020.

    Comments: Accepted at BRACIS 2020

  10. arXiv:1910.00752

    cs.LG cs.CR stat.ML

    Ward2ICU: A Vital Signs Dataset of Inpatients from the General Ward

    Authors: Daniel Severo, Flávio Amaro, Estevam R. Hruschka Jr, André Soares de Moura Costa

    Abstract: We present a proxy dataset of vital signs with class labels indicating patient transitions from the ward to intensive care units called Ward2ICU. Patient privacy is protected using a Wasserstein Generative Adversarial Network to implicitly learn an approximation of the data distribution, allowing us to sample synthetic data. The quality of data generation is assessed directly on the binary classif…

    Submitted 1 October, 2019; originally announced October 2019.