
Out-of-Distribution Detection using Maximum Entropy Coding

Authors: Mojtaba Abolfazli, Mohammad Zaeri Amirani, Anders Høst-Madsen, June Zhang, Andras Bratincsak

Abstract: Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper seeks to answer the question of whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and, with this, a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties. For real-world data, $P$ is usually unknown. We transform the data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.

Submitted 25 April, 2024; originally announced April 2024.
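
The comparison between a maximum entropy codelength and $-\log P(x^M)$ can be illustrated with a single statistic. The sketch below is a drastically simplified illustration, not the paper's coder: it assumes the default $P$ is the standard normal $N(0,1)$, uses the one statistic $T(x^M) = $ (mean, variance), whose maximum entropy distribution is a Gaussian, and charges an assumed MDL-style parameter cost of $(k/2)\ln M$ nats for $k = 2$ parameters. All function names are hypothetical. Codelengths are in nats; the discretization term for continuous data cancels because both codes are compared at the same precision.

```python
import math


def neg_log_default(xs):
    """-log P(x^M) in nats under the assumed default P = N(0, 1)."""
    return sum(0.5 * math.log(2 * math.pi) + 0.5 * x * x for x in xs)


def maxent_codelength(xs):
    """Two-part codelength in nats for the statistics T = (mean, variance).

    The maximum entropy density with a given mean and variance is the
    Gaussian, so the data part is -log of the Gaussian fitted by maximum
    likelihood. The parameter part is an assumed MDL-style cost of
    (k/2) log M nats with k = 2 parameters.
    """
    m = len(xs)
    mu = sum(xs) / m
    var = sum((x - mu) ** 2 for x in xs) / m
    data_cost = 0.5 * m * (math.log(2 * math.pi * var) + 1.0)
    param_cost = 0.5 * 2 * math.log(m)
    return data_cost + param_cost


def is_out_of_distribution(xs):
    """Flag x^M when the max-entropy code is shorter than the code from P."""
    return maxent_codelength(xs) < neg_log_default(xs)
```

For example, `[0.0, 2.0] * 100` (mean 1, variance 1) is flagged, while `[-1.0, 1.0] * 100` (mean 0, variance 1) is not: its data part equals $-\log P(x^M)$ exactly, so the max-entropy code is longer by the parameter cost $\ln 200$.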

arXiv:2301.03128

Compress-and-Forward via Multilevel Coding and Trellis Coded Quantization

Authors: Heping Wan, Anders Host-Madsen, Aria Nosratinia

Abstract: Compress-forward (CF) relays can improve communication rates even when the relay cannot decode the source signal. Efficient implementation of CF is a topic of contemporary interest, in part because of its potential impact on wireless technologies such as cloud-RAN. There exists a gap between the performance of CF implementations in the high spectral efficiency regime and the corresponding information-theoretic achievable rates. We begin by re-framing a dilemma causing this gap, and propose an approach for its mitigation. We utilize trellis coded quantization (TCQ) at the relay together with multi-level coding at the source and relay, in a manner that facilitates the calculation of bit log-likelihood ratios (LLRs) at the destination for joint decoding. The contributions of this work include designing TCQ for end-to-end relay performance, since a distortion-minimizing TCQ is suboptimal. The reported improvements include a 1 dB gain over prior results for PSK modulation.

Submitted 8 January, 2023; originally announced January 2023.

arXiv:2206.01851

Out-of-Distribution Detection using BiGAN and MDL

Authors: Mojtaba Abolfazli, Mohammad Zaeri Arimani, Anders Host-Madsen, June Zhang, Andras Bratincsak

Abstract: We consider the following problem: a large dataset of normal data is available, and we are given a new, possibly quite small, set of data and must decide whether these are normal data or indicate a new phenomenon. This is a novelty detection or out-of-distribution detection problem. An example is in medicine, where the normal data are from people with no known disease and the new dataset is from people with symptoms. Other examples could arise in security. We solve this problem by training a bidirectional generative adversarial network (BiGAN) on the normal data and using a Gaussian graphical model to model the output. We then use universal source coding, or minimum description length (MDL), on the output to decide whether it comes from a new distribution, in an implementation of Kolmogorov and Martin-Löf randomness. We apply the methodology to both MNIST data and a real-world electrocardiogram (ECG) dataset of healthy subjects and patients with Kawasaki disease, and show better performance in terms of the ROC curve than similar methods.

Submitted 3 June, 2022; originally announced June 2022.

arXiv:2110.00701

Graph Compression with Application to Model Selection

Authors: Mojtaba Abolfazli, Anders Host-Madsen, June Zhang, Andras Bratincsak

Abstract: Many multivariate datasets, such as social and biological data, exhibit complex dependencies that are best characterized by graphs. Unlike sequential data, graphs are, in general, unordered structures. This means we can no longer use classic, sequential compression methods on graph-based data. Therefore, it is necessary to develop new methods for graph compression. In this paper, we present universal source coding methods for the lossless compression of unweighted, undirected, unlabelled graphs. We encode in two steps: 1) transforming the graph into a rooted binary tree, and 2) encoding the rooted binary tree using graph statistics. Our coders showed better compression performance than other source coding methods on both synthetic and real-world graphs. We then applied our graph coding methods to model selection of Gaussian graphical models using the minimum description length (MDL) principle, by finding the description length of the conditional independence graph. Experiments on synthetic data show that our approach gives better performance than common model selection methods. We also applied our approach to electrocardiogram (ECG) data in order to explore the differences between graph models of two groups of subjects.

Submitted 1 October, 2021; originally announced October 2021.

Comments: Submitted to IEEE Transactions on Signal Processing

arXiv:2102.02431

doi 10.1109/ISIT45174.2021.9518002

Graph Coding for Model Selection and Anomaly Detection in Gaussian Graphical Models

Authors: Mojtaba Abolfazli, Anders Host-Madsen, June Zhang, Andras Bratincsak

Abstract: A classic application of description length is model selection with the minimum description length (MDL) principle. The focus of this paper is to extend description length for data analysis beyond simple model selection and sequences of scalars. More specifically, we extend description length to data analysis in Gaussian graphical models. These are powerful tools for modeling interactions among variables in a sequence of i.i.d. Gaussian data in the form of a graph. Our method uses universal graph coding methods to accurately account for model complexity, and therefore provides a more rigorous approach to graph model selection. The developed method is tested with synthetic and electrocardiogram (ECG) data to find the graph model and anomalies in Gaussian graphical models. The experiments show that our method performs better than commonly used methods.

Submitted 4 February, 2021; originally announced February 2021.

Comments: Submitted to ISIT 2021

arXiv:2009.08562

Bounds for Learning Lossless Source Coding

Authors: Anders Host-Madsen

Abstract: This paper asks a basic question: how much training is required to beat a universal source coder? Traditionally, there have been two types of source coders: fixed, optimum coders such as Huffman coders; and universal source coders, such as Lempel-Ziv. The paper considers a third type of source coder: learned coders. These are coders that are trained on data of a particular type, and then used to encode new data of that type. This type of coder has recently become very popular for (lossy) image and video coding. The paper considers two criteria for the performance of learned coders: the average performance over training data, and a guaranteed performance over all training data except with some error probability $P_e$. In both cases the coders are evaluated with respect to redundancy. The paper considers the IID binary case and binary Markov chains. In both cases it is shown that the amount of training data required is very moderate: to code sequences of length $l$, the amount of training data required to beat a universal source coder is $m=K\frac{l}{\log l}$, where the constant $K$ depends on the case considered.

Submitted 17 September, 2020; originally announced September 2020.

Comments: Submitted to IEEE Transactions on Information Theory
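
The two kinds of coder being compared can be made concrete for the IID binary case. In this minimal sketch, which is an illustration rather than the paper's construction, the universal coder is assumed to be the Krichevsky-Trofimov (KT) sequential estimator, and a hypothetical "learned" coder estimates $P(1)$ from $m$ training bits with add-1/2 smoothing and then codes a length-$l$ sequence with that fixed Bernoulli model. Function names are illustrative; codelengths are ideal (arithmetic-coding) lengths in bits.

```python
import math


def kt_codelength(bits):
    """Ideal codelength in bits of the KT universal coder, computed
    sequentially with P(next = b) = (count_b + 1/2) / (n + 1)."""
    counts = [0, 0]
    length = 0.0
    for b in bits:
        p = (counts[b] + 0.5) / (counts[0] + counts[1] + 1.0)
        length -= math.log2(p)
        counts[b] += 1
    return length


def learned_codelength(bits, training):
    """Ideal codelength in bits of a hypothetical learned coder: estimate
    P(1) from the training bits (add-1/2 smoothing), then code `bits`
    with that fixed Bernoulli model."""
    p1 = (sum(training) + 0.5) / (len(training) + 1.0)
    n1 = sum(bits)
    n0 = len(bits) - n1
    return -(n1 * math.log2(p1) + n0 * math.log2(1.0 - p1))
```

With 1000 all-ones training bits, the learned coder spends well under one bit on a sequence of 64 ones, while the KT coder spends a few bits. The paper's bounds concern how large $m$ must be for such a gain to hold in terms of redundancy, which this sketch does not compute.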

arXiv:1902.04699

Differential Description Length for Hyperparameter Selection in Machine Learning

Authors: Mojtaba Abolfazli, Anders Host-Madsen, June Zhang

Abstract: This paper introduces a new method for model selection and, more generally, hyperparameter selection in machine learning. Minimum description length (MDL) is an established method for model selection; however, it is not directly aimed at minimizing generalization error, which is often the primary goal in machine learning. The paper demonstrates a relationship between generalization error and a difference of description lengths of the training data; we call this difference differential description length (DDL). This allows prediction of generalization error from the training data alone, by encoding the training data. DDL can then be used for model selection by choosing the model with the smallest predicted generalization error. We show how this method can be used for linear regression, neural networks, and deep learning. Experimental results show that DDL leads to smaller generalization error than cross-validation and traditional MDL and Bayes methods.

Submitted 22 May, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

Comments: Submitted to NeurIPS 2019

arXiv:1804.02469

Coding of Graphs with Application to Graph Anomaly Detection

Authors: Anders Host-Madsen, June Zhang

Abstract: This paper has dual aims. The first is to develop practical universal coding methods for unlabeled graphs. The second is to use these for graph anomaly detection. The paper develops two coding methods for unlabeled graphs: one based on the degree distribution, the second based on the triangle distribution. It is shown that these are efficient for different types of random graphs and for real-world graphs. These coding methods are then used to detect anomalous graphs based on structure alone. It is shown that anomalous graphs can be detected with high probability.

Submitted 6 April, 2018; originally announced April 2018.

Comments: To be presented at ISIT'18

arXiv:1710.07319

Atypicality for Heart Rate Variability Using a Pattern-Tree Weighting Method

Authors: Elyas Sabeti, Anders Høst-Madsen

Abstract: Heart rate variability (HRV) is a vital measure of autonomic nervous system function and a key indicator of cardiovascular condition. This paper proposes a novel method, called the pattern tree, which extends Willems' context tree to real-valued data, to investigate HRV via an atypicality framework. In a previous paper, atypicality was developed as a method for mining and discovery in "Big Data," which requires a universal approach. Using the proposed pattern tree as a universal source coder in this framework led to the discovery of arrhythmias and unknown patterns in HRV Holter monitoring.

Submitted 11 October, 2017; originally announced October 2017.

Comments: 5 pages
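
The pattern tree operates on real-valued data and is not reproduced here. For reference, the binary context-tree weighting (CTW) recursion of Willems, Shtarkov and Tjalkens, which it extends, can be sketched as follows: every node $s$ holds a KT estimate $P_e^s$, leaves at depth $D$ use $P_w^s = P_e^s$, and internal nodes use $P_w^s = \tfrac12 P_e^s + \tfrac12 P_w^{0s} P_w^{1s}$. Because the KT probability depends only on the counts, the whole-sequence codelength can be computed from per-context counts. The zero-padding of the initial context and all function names are assumptions of this sketch.

```python
import math
from collections import defaultdict


def log2_kt(a, b):
    """log2 of the KT block probability of a zeros and b ones."""
    return (math.lgamma(a + 0.5) + math.lgamma(b + 0.5)
            - math.log(math.pi) - math.lgamma(a + b + 1)) / math.log(2)


def log2_add(x, y):
    """Return log2(2**x + 2**y) computed in a numerically stable way."""
    m = max(x, y)
    return m + math.log2(2.0 ** (x - m) + 2.0 ** (y - m))


def ctw_codelength(bits, depth):
    """Ideal codelength in bits of binary CTW with maximum context depth
    `depth`. Assumption: the sequence is preceded by `depth` zeros so
    every symbol has a full-length context."""
    padded = [0] * depth + list(bits)
    counts = defaultdict(lambda: [0, 0])  # context tuple -> [#zeros, #ones]
    for t in range(depth, len(padded)):
        # Update every node on the path (), (x[t-1],), (x[t-1], x[t-2]), ...
        for d in range(depth + 1):
            ctx = tuple(padded[t - 1 - i] for i in range(d))
            counts[ctx][padded[t]] += 1

    def log2_pw(ctx):
        a, b = counts.get(ctx, (0, 0))
        if a + b == 0:
            return 0.0  # unvisited subtree has probability 1
        pe = log2_kt(a, b)
        if len(ctx) == depth:
            return pe  # leaf: KT estimate only
        children = log2_pw(ctx + (0,)) + log2_pw(ctx + (1,))
        # P_w = 1/2 * P_e + 1/2 * P_w(child 0) * P_w(child 1)
        return log2_add(pe, children) - 1.0

    return -log2_pw(())
```

On the alternating sequence `[0, 1] * 50`, depth 1 captures the pattern and needs under 20 bits, whereas depth 0 (a plain KT coder) needs at least 100 bits.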

arXiv:1709.03191

Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods

Authors: Elyas Sabeti, Anders Høst-Madsen

Abstract: The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such "interesting" parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We show that this shares a number of theoretical properties with the discrete-valued case. We develop the methodology for a number of "universal" signal processing models, and finally apply them to recorded hydrophone data.

Submitted 10 September, 2017; originally announced September 2017.

Comments: 13 pages, two columns

arXiv:1709.03189

Data Discovery and Anomaly Detection Using Atypicality: Theory

Authors: Anders Høst-Madsen, Elyas Sabeti, Chad Walton

Abstract: A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or to classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that in some applications most of the value in the information lies in the parts that deviate from the average, that is, the unusual, atypical parts. We define 'atypical' axiomatically as data that can be encoded with fewer bits in itself than with the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply it to a number of real-world data sets.

Submitted 10 September, 2017; originally announced September 2017.

Comments: 40 pages
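
The definition of atypicality can be illustrated on binary data. In this minimal sketch, the typical data are assumed to follow a known i.i.d. Bernoulli($p_1$) model, the code "in itself" is a KT universal coder, and a fixed overhead of one bit (a hypothetical choice) is charged for signalling that the alternative coder is used. A sequence is flagged as atypical when the universal codelength plus overhead is shorter than the typical codelength. Function names are illustrative.

```python
import math


def typical_codelength(bits, p1):
    """Bits to encode `bits` with the code for typical data, assumed here
    to be an i.i.d. Bernoulli(p1) model known in advance."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    return -(n1 * math.log2(p1) + n0 * math.log2(1.0 - p1))


def self_codelength(bits):
    """Bits to encode `bits` 'in itself' with a universal KT coder."""
    counts = [0, 0]
    length = 0.0
    for b in bits:
        length -= math.log2((counts[b] + 0.5) / (sum(counts) + 1.0))
        counts[b] += 1
    return length


def is_atypical(bits, p1, overhead_bits=1.0):
    """Atypical if the universal code plus an assumed signalling overhead
    is shorter than the typical code."""
    return self_codelength(bits) + overhead_bits < typical_codelength(bits, p1)
```

With $p_1 = 0.5$, a run of 32 ones is flagged, since the KT coder needs fewer than 4 bits against 32 for the typical code. The balanced sequence `[0, 1] * 16` is not flagged: the KT probability is a mixture over Bernoulli parameters and so cannot exceed the maximum-likelihood probability $2^{-32}$.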

arXiv:1706.03436

Repair of Multiple Descriptions on Distributed Storage

Authors: Anders Host-Madsen, Heechoel Yang, Minchul Kim, Jungwoo Lee

Abstract: In multiple descriptions on distributed storage, a source is stored in a shared fashion on multiple servers. When a subset of servers is contacted, the source should be estimated with a certain maximum distortion depending on the number of servers. The problem considered in this paper is how to restore system operation when one of the servers fails and a new server replaces it, that is, repair. The requirement is that the distortions in the restored system should be no more than in the original system. The question is how many extra bits are needed for repair. We find the optimum solution for a two-server problem in the Gaussian case, and an achievable rate for general $n$ nodes. One conclusion is that it is necessary to design the multiple description codes with repair in mind; just using an existing multiple description code results in unnecessarily high repair rates.

Submitted 9 January, 2018; v1 submitted 11 June, 2017; originally announced June 2017.

Comments: Preliminary journal version of the ISIT'18 submission. Includes formal proofs.

arXiv:1301.1061

On the Minimum Energy of Sending Gaussian Multiterminal Sources over the Gaussian MAC

Authors: Nan Jiang, Yang Yang, Anders Høst-Madsen, Zixiang Xiong

Abstract: In this work, we investigate the minimum energy of transmitting correlated sources over the Gaussian multiple-access channel (MAC). Compared to other works on joint source-channel coding, we consider the general scenario where the source and channel bandwidths are not naturally matched. In particular, we propose the use of hybrid digital-analog coding to improve the transmission energy efficiency. Different models of correlated sources are studied. We first consider lossless transmission of binary sources over the MAC. We then treat lossy transmission of Gaussian sources over the Gaussian MAC, including CEO sources and multiterminal sources. In all cases, we show that hybrid transmission achieves the best known energy efficiency.

Submitted 18 January, 2013; v1 submitted 6 January, 2013; originally announced January 2013.

Comments: Under revision

arXiv:1207.4252

The Wideband Slope of Interference Channels: The Small Bandwidth Case

Authors: Minqi Shen, Anders Høst-Madsen

Abstract: This paper studies the low-SNR regime performance of a scalar complex $K$-user interference channel with Gaussian noise. The finite bandwidth case is considered, where the low-SNR regime is approached by letting the input power go to zero while the bandwidth is small and fixed. We show that for all $\delta>0$ there exists a set with non-zero measure (probability) in which the wideband slope per user satisfies $\mathrm{Slope}<2/K+\delta$. This is quite contrary to the large bandwidth case [ShenAHM11IT], where a slope of 1 per user is achievable with probability 1. We also develop an interference alignment scheme for the finite bandwidth case that shows some gain.

Submitted 17 July, 2012; originally announced July 2012.

Comments: Submitted to IEEE Transactions on Information Theory

arXiv:1010.5661

The Wideband Slope of Interference Channels: The Large Bandwidth Case

Authors: Minqi Shen, Anders Høst-Madsen

Abstract: It is well known that the minimum received energy per bit in the interference channel is $-1.59$ dB, as if there were no interference. Thus, the best way to mitigate interference is to operate the interference channel in the low-SNR regime. However, when the SNR is small but non-zero, minimum energy per bit alone does not characterize performance. Verdú introduced the wideband slope $S_0$ to characterize the performance in this regime. We show that a wideband slope of $S_0/S_{0,\text{no interference}}=1/2$ is achievable. This result is similar to recent results on degrees of freedom in the high-SNR regime, and we use a type of interference alignment using delays to obtain the result. We also show that in many cases the wideband slope is upper bounded by $S_0/S_{0,\text{no interference}}\le 1/2$ for a large number of users $K$.

Submitted 11 November, 2011; v1 submitted 27 October, 2010; originally announced October 2010.
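
Both low-SNR quantities can be checked numerically in the interference-free, single-user case. Assuming a complex AWGN channel with spectral efficiency $C = \log_2(1+\mathrm{SNR})$ bits/s/Hz, the energy per bit is $E_b/N_0 = \mathrm{SNR}/C$, which tends to $\ln 2$ ($-1.59$ dB) as $\mathrm{SNR}\to 0$. Verdú's wideband slope is the slope of $C$ versus $E_b/N_0$ in dB at that minimum, in bits/s/Hz per 3 dB, and equals 2 for this channel. This sketch estimates it with a finite difference at a small SNR; it does not model the $K$-user interference channel, and the function names are illustrative.

```python
import math


def capacity_bits(snr):
    """Spectral efficiency log2(1 + SNR) of a complex AWGN channel, bits/s/Hz."""
    return math.log1p(snr) / math.log(2)


def ebn0(snr):
    """Energy per bit relative to N0 at a given SNR: SNR / C(SNR)."""
    return snr / capacity_bits(snr)


def wideband_slope(snr):
    """Finite-difference estimate of the slope of spectral efficiency versus
    Eb/N0 in dB, measured from the minimum ln 2, in bits/s/Hz per 3 dB."""
    excess_db = 10 * math.log10(ebn0(snr) / math.log(2))  # dB above ln 2
    return capacity_bits(snr) / excess_db * 10 * math.log10(2)


min_ebn0_db = 10 * math.log10(ebn0(1e-6))  # approaches 10*log10(ln 2)
slope = wideband_slope(1e-4)               # approaches 2 as SNR -> 0
```

Here `min_ebn0_db` evaluates to about $-1.59$ and `slope` to about 2, matching the single-user values that serve as the no-interference reference.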

arXiv:0904.0037

doi 10.1109/ACSSC.2008.5074667

Deterministic Capacity of MIMO Relay Networks

Authors: Anders Host-Madsen

Abstract: The deterministic capacity of a relay network is the capacity of a network when relays are restricted to transmitting reliable information, that is, an (asymptotically) deterministic function of the source message. In this paper it is shown that the deterministic capacity of a number of MIMO relay networks can be found in the low-power regime where $\mathrm{SNR}\to 0$. This is accomplished by deriving single-letter upper bounds and finding their limit as $\mathrm{SNR}\to 0$. The advantage of this technique is that it overcomes the difficulty of finding optimum distributions for mutual information.

Submitted 31 March, 2009; originally announced April 2009.

Comments: Submitted to IEEE Transactions on Information Theory

Showing 1–16 of 16 results for author: Host-Madsen, A