-
Out-of-Distribution Detection using Maximum Entropy Coding
Authors:
Mojtaba Abolfazli,
Mohammad Zaeri Amirani,
Anders Høst-Madsen,
June Zhang,
Andras Bratincsak
Abstract:
Given a default distribution $P$ and a set of test data $x^M=\{x_1,x_2,\ldots,x_M\}$, this paper asks whether it is likely that $x^M$ was generated by $P$. For discrete distributions, the definitive answer is in principle given by Kolmogorov-Martin-Löf randomness. In this paper we seek to generalize this to continuous distributions. We consider a set of statistics $T_1(x^M),T_2(x^M),\ldots$. To each statistic we associate its maximum entropy distribution and with this a universal source coder. The maximum entropy distributions are subsequently combined to give a total codelength, which is compared with $-\log P(x^M)$. We show that this approach satisfies a number of theoretical properties.
For real-world data, $P$ is usually unknown. We transform data into a standard distribution in the latent space using a bidirectional generative network and use maximum entropy coding there. We compare the resulting method to other methods that also use generative neural networks to detect anomalies. In most cases, our results show better performance.
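The codelength comparison described in this abstract can be illustrated with a toy case. The following is a minimal sketch under stated assumptions, not the authors' method: the default $P$ is assumed to be a standard normal, only one statistic is used (the mean of $x_i^2$, whose maximum entropy density is a zero-mean Gaussian), and the parameter cost of $\frac{1}{2}\log M$ is a common two-part approximation. All function names are hypothetical.

```python
import math

def default_codelength(xs):
    """-log2 P(x^M) in bits, assuming the default P is N(0, 1)."""
    nats = sum(0.5 * math.log(2 * math.pi) + 0.5 * x * x for x in xs)
    return nats / math.log(2)

def maxent_codelength(xs):
    """Approximate two-part codelength in bits for the statistic
    T(x^M) = mean(x_i^2). The maximum entropy density with that
    second moment is N(0, T)."""
    m = len(xs)
    t = sum(x * x for x in xs) / m  # must be > 0
    # -log of the likelihood maximized over the variance
    nats = 0.5 * m * (math.log(2 * math.pi * t) + 1.0)
    # assumed cost of describing the estimated parameter
    nats += 0.5 * math.log(m)
    return nats / math.log(2)

def is_out_of_distribution(xs, threshold_bits=0.0):
    """Flag x^M when the maximum entropy code beats the default code."""
    return maxent_codelength(xs) + threshold_bits < default_codelength(xs)
```

With unit-variance data the parameter cost makes the maximum entropy code longer, so the data are not flagged; data with a much larger spread are coded far more cheaply by the fitted Gaussian and are flagged.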
Submitted 25 April, 2024;
originally announced April 2024.
-
Compress-and-Forward via Multilevel Coding and Trellis Coded Quantization
Authors:
Heping Wan,
Anders Host-Madsen,
Aria Nosratinia
Abstract:
Compress-forward (CF) relays can improve communication rates even when the relay cannot decode the source signal. Efficient implementation of CF is a topic of contemporary interest, in part because of its potential impact on wireless technologies such as cloud-RAN. There exists a gap between the performance of CF implementations in the high spectral efficiency regime and the corresponding information theoretic achievable rates. We begin by re-framing a dilemma causing this gap, and propose an approach for its mitigation. We utilize trellis coded quantization (TCQ) at the relay together with multi-level coding at the source and relay, in a manner that facilitates the calculation of bit LLRs at the destination for joint decoding. The contributions of this work include designing TCQ for end-to-end relay performance, since a distortion-minimizing TCQ is suboptimum. The reported improvements include a 1dB gain over prior results for PSK modulation.
Submitted 8 January, 2023;
originally announced January 2023.
-
Out-of-Distribution Detection using BiGAN and MDL
Authors:
Mojtaba Abolfazli,
Mohammad Zaeri Amirani,
Anders Host-Madsen,
June Zhang,
Andras Bratincsak
Abstract:
We consider the following problem: we have a large dataset of normal data available. We are now given a new, possibly quite small, set of data, and we are to decide if these are normal data, or if they indicate a new phenomenon. This is a novelty detection or out-of-distribution detection problem. An example is in medicine, where the normal data is for people with no known disease, and the new dataset is for people with symptoms. Other examples could be in security. We solve this problem by training a bidirectional generative adversarial network (BiGAN) on the normal data and using a Gaussian graphical model to model the output. We then use universal source coding, or minimum description length (MDL), on the output to decide if it is a new distribution, in an implementation of Kolmogorov and Martin-Löf randomness. We apply the methodology to both MNIST data and a real-world electrocardiogram (ECG) dataset of healthy subjects and patients with Kawasaki disease, and show better performance in terms of the ROC curve than similar methods.
Submitted 3 June, 2022;
originally announced June 2022.
-
Graph Compression with Application to Model Selection
Authors:
Mojtaba Abolfazli,
Anders Host-Madsen,
June Zhang,
Andras Bratincsak
Abstract:
Many multivariate data such as social and biological data exhibit complex dependencies that are best characterized by graphs. Unlike sequential data, graphs are, in general, unordered structures. This means we can no longer use classic, sequential-based compression methods on these graph-based data. Therefore, it is necessary to develop new methods for graph compression. In this paper, we present universal source coding methods for the lossless compression of unweighted, undirected, unlabelled graphs. We encode in two steps: 1) transforming the graph into a rooted binary tree, 2) encoding the rooted binary tree using graph statistics. Our coders showed better compression performance than other source coding methods on both synthetic and real-world graphs.
We then applied our graph coding methods to model selection for Gaussian graphical models using the minimum description length (MDL) principle, finding the description length of the conditional independence graph. Experiments on synthetic data show that our approach gives better performance compared to common model selection methods. We also applied our approach to electrocardiogram (ECG) data in order to explore the differences between graph models of two groups of subjects.
Submitted 1 October, 2021;
originally announced October 2021.
-
Graph Coding for Model Selection and Anomaly Detection in Gaussian Graphical Models
Authors:
Mojtaba Abolfazli,
Anders Host-Madsen,
June Zhang,
Andras Bratincsak
Abstract:
A classic application of description length is for model selection with the minimum description length (MDL) principle. The focus of this paper is to extend description length for data analysis beyond simple model selection and sequences of scalars. More specifically, we extend the description length for data analysis in Gaussian graphical models. These are powerful tools to model interactions among variables in a sequence of i.i.d. Gaussian data in the form of a graph. Our method uses universal graph coding methods to accurately account for model complexity, and therefore provides a more rigorous approach for graph model selection. The developed method is tested with synthetic and electrocardiogram (ECG) data to find the graph model and anomalies in Gaussian graphical models. The experiments show that our method gives better performance compared to commonly used methods.
Submitted 4 February, 2021;
originally announced February 2021.
-
Bounds for Learning Lossless Source Coding
Authors:
Anders Host-Madsen
Abstract:
This paper asks a basic question: how much training is required to beat a universal source coder? Traditionally, there have been two types of source coders: fixed, optimum coders such as Huffman coders; and universal source coders, such as Lempel-Ziv. The paper considers a third type of source coder: learned coders. These are coders that are trained on data of a particular type, and then used to encode new data of that type. This is a type of coder that has recently become very popular for (lossy) image and video coding.
The paper considers two criteria for performance of learned coders: the average performance over training data, and a guaranteed performance over all training except for some error probability $P_e$. In both cases the coders are evaluated with respect to redundancy.
The paper considers the IID binary case and binary Markov chains. In both cases it is shown that the amount of training data required is very moderate: to code sequences of length $l$ the amount of training data required to beat a universal source coder is $m=K\frac{l}{\log l}$, where the constant $K$ depends on the case considered.
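To make the scaling $m=K\frac{l}{\log l}$ concrete, here is a small numeric sketch. The abstract does not give the value of $K$ or the base of the logarithm, so `K=1.0` and the natural logarithm below are placeholder assumptions, and `training_size` is a hypothetical helper name.

```python
import math

def training_size(l, K=1.0):
    """Evaluate m = K * l / log(l).
    K = 1.0 and the natural logarithm are assumptions; the abstract
    only says that K depends on the case considered."""
    if l <= 1:
        raise ValueError("sequence length l must exceed 1")
    return K * l / math.log(l)

# The ratio m / l shrinks as l grows, so the training set is
# sublinear in the length of the sequences to be coded.
for l in (10**2, 10**4, 10**6):
    m = training_size(l)
    print(f"l={l:>8}  m={m:12.1f}  m/l={m / l:.4f}")
```

Whatever the actual constant, the ratio $m/l = K/\log l$ decreases with $l$, which is the sense in which the required training is moderate.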
Submitted 17 September, 2020;
originally announced September 2020.
-
Differential Description Length for Hyperparameter Selection in Machine Learning
Authors:
Mojtaba Abolfazli,
Anders Host-Madsen,
June Zhang
Abstract:
This paper introduces a new method for model selection and more generally hyperparameter selection in machine learning. Minimum description length (MDL) is an established method for model selection, which is however not directly aimed at minimizing generalization error, which is often the primary goal in machine learning. The paper demonstrates a relationship between generalization error and a difference of description lengths of the training data; we call this difference differential description length (DDL). This allows prediction of generalization error from the training data alone by performing encoding of the training data. DDL can then be used for model selection by choosing the model with the smallest predicted generalization error. We show how this method can be used for linear regression and neural networks and deep learning. Experimental results show that DDL leads to smaller generalization error than cross-validation and traditional MDL and Bayes methods.
Submitted 22 May, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
Coding of Graphs with Application to Graph Anomaly Detection
Authors:
Anders Host-Madsen,
June Zhang
Abstract:
This paper has dual aims. The first is to develop practical universal coding methods for unlabeled graphs. The second is to use these for graph anomaly detection. The paper develops two coding methods for unlabeled graphs: one based on the degree distribution, the second based on the triangle distribution. It is shown that these are efficient for different types of random graphs, and on real-world graphs. These coding methods are then used for detecting anomalous graphs, based on structure alone. It is shown that anomalous graphs can be detected with high probability.
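As a rough illustration of the kind of statistic a degree-based coder can exploit, the sketch below computes the degree sequence of an undirected graph and the empirical entropy of its degree distribution, scaled by the number of nodes. This is only a proxy chosen for illustration and is not the coder developed in the paper; the function names and the edge-list input format are assumptions.

```python
import math
from collections import Counter

def degree_sequence(n, edges):
    """Degrees of nodes 0..n-1 for an undirected edge list."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_entropy_bits(n, edges):
    """n times the empirical entropy (in bits) of the degree
    distribution: an estimate of the bits needed to list the
    degrees with a code matched to their histogram."""
    counts = Counter(degree_sequence(n, edges))
    return -sum(c * math.log2(c / n) for c in counts.values())

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # every node has degree 2
star = [(0, 1), (0, 2), (0, 3)]          # one hub, three leaves
print(degree_entropy_bits(4, ring), degree_entropy_bits(4, star))
```

A regular graph such as the ring has a degenerate degree distribution and costs nothing under this proxy, while the star does not; a graph whose structure deviates from the expected degree profile shows up as a change in codelength.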
Submitted 6 April, 2018;
originally announced April 2018.
-
Atypicality for Heart Rate Variability Using a Pattern-Tree Weighting Method
Authors:
Elyas Sabeti,
Anders Høst-Madsen
Abstract:
Heart rate variability (HRV) is a vital measure of the autonomic nervous system functionality and a key indicator of cardiovascular condition. This paper proposes a novel method, called a pattern tree, which is an extension of Willems' context tree to real-valued data, to investigate HRV via an atypicality framework. In a previous paper atypicality was developed as a method for mining and discovery in "Big Data," which requires a universal approach. Using the proposed pattern tree as a universal source coder in this framework led to the discovery of arrhythmias and unknown patterns in HRV Holter monitoring.
Submitted 11 October, 2017;
originally announced October 2017.
-
Data Discovery and Anomaly Detection Using Atypicality: Signal Processing Methods
Authors:
Elyas Sabeti,
Anders Høst-Madsen
Abstract:
The aim of atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into data. In order to find such "interesting" parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data. This is done by using minimum description length (MDL). We show that this shares a number of theoretical properties with the discrete-valued case. We develop the methodology for a number of "universal" signal processing models, and finally apply them to recorded hydrophone data.
Submitted 10 September, 2017;
originally announced September 2017.
-
Data Discovery and Anomaly Detection Using Atypicality: Theory
Authors:
Anders Høst-Madsen,
Elyas Sabeti,
Chad Walton
Abstract:
A central question in the era of 'big data' is what to do with the enormous amount of information. One possibility is to characterize it through statistics, e.g., averages, or classify it using machine learning, in order to understand the general structure of the overall data. The perspective in this paper is the opposite, namely that most of the value in the information in some applications is in the parts that deviate from the average, that are unusual, atypical. We define what we mean by 'atypical' in an axiomatic way as data that can be encoded with fewer bits in itself rather than using the code for the typical data. We show that this definition has good theoretical properties. We then develop an implementation based on universal source coding, and apply this to a number of real world data sets.
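The definition of 'atypical' given in this abstract (fewer bits when the data is encoded in itself than with the code for typical data) can be sketched for binary sequences. The following is a minimal illustration under assumptions, not the paper's implementation: the typical model is taken to be Bernoulli with a known `p`, the universal code is a simple enumerative code (the count of ones, then an index among sequences with that count), and `header_bits` is a hypothetical penalty for signalling that the atypical code is in use.

```python
import math

def typical_codelength(bits, p=0.5):
    """Bits needed under the assumed typical model, Bernoulli(p)."""
    n, k = len(bits), sum(bits)
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p))

def universal_codelength(bits):
    """Enumerative code: log2(n + 1) bits for the number of ones,
    then log2 C(n, k) bits for the index of the sequence."""
    n, k = len(bits), sum(bits)
    return math.log2(n + 1) + math.log2(math.comb(n, k))

def is_atypical(bits, p=0.5, header_bits=1.0):
    """Atypical if coding the data 'in itself' is strictly shorter."""
    return universal_codelength(bits) + header_bits < typical_codelength(bits, p)

balanced = [0, 1] * 32  # looks like fair coin flips in its counts
all_ones = [1] * 64     # highly unlikely under p = 0.5
print(is_atypical(balanced), is_atypical(all_ones))
```

The balanced sequence is cheaper under the typical code, so it is not flagged, while the all-ones sequence needs only a handful of bits under the universal code and is flagged.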
Submitted 10 September, 2017;
originally announced September 2017.
-
Repair of Multiple Descriptions on Distributed Storage
Authors:
Anders Host-Madsen,
Heechoel Yang,
Minchul Kim,
Jungwoo Lee
Abstract:
In multiple descriptions on distributed storage, a source is stored in a shared fashion on multiple servers. When a subset of servers is contacted, the source should be estimated with a certain maximum distortion depending on the number of servers. The problem considered in this paper is how to restore the system operation when one of the servers fails and a new server replaces it, that is, repair. The requirement is that the distortions in the restored system should be no more than in the original system. The question is how many extra bits are needed for repair. We find the optimum solution for a two-server problem in the Gaussian case, and an achievable rate for general $n$ nodes. One conclusion is that it is necessary to design the multiple description codes with repair in mind; just using an existing multiple description code results in unnecessarily high repair rates.
Submitted 9 January, 2018; v1 submitted 11 June, 2017;
originally announced June 2017.
-
On the Minimum Energy of Sending Gaussian Multiterminal Sources over the Gaussian MAC
Authors:
Nan Jiang,
Yang Yang,
Anders Høst-Madsen,
Zixiang Xiong
Abstract:
In this work, we investigate the minimum energy of transmitting correlated sources over the Gaussian multiple-access channel (MAC). Compared to other works on joint source-channel coding, we consider the general scenario where the source and channel bandwidths are not naturally matched. In particular, we propose the use of hybrid digital-analog coding to improve the transmission energy efficiency. Different models of correlated sources are studied. We first consider lossless transmission of binary sources over the MAC. We then treat lossy transmission of Gaussian sources over the Gaussian MAC, including CEO sources and multiterminal sources. In all cases, we show that hybrid transmission achieves the best known energy efficiency.
Submitted 18 January, 2013; v1 submitted 6 January, 2013;
originally announced January 2013.
-
The Wideband Slope of Interference Channels: The Small Bandwidth Case
Authors:
Minqi Shen,
Anders Høst-Madsen
Abstract:
This paper studies the low-SNR regime performance of a scalar complex K-user interference channel with Gaussian noise. The finite bandwidth case is considered, where the low-SNR regime is approached by letting the input power go to zero while bandwidth is small and fixed. We show that for all δ>0 there exists a set with non-zero measure (probability) in which the wideband slope per user satisfies Slope<2/K+δ. This is quite contrary to the large bandwidth case [ShenAHM11IT], where a slope of 1 per user is achievable with probability 1. We also develop an interference alignment scheme for the finite bandwidth case that shows some gain.
Submitted 17 July, 2012;
originally announced July 2012.
-
The Wideband Slope of Interference Channels: The Large Bandwidth Case
Authors:
Minqi Shen,
Anders Høst-Madsen
Abstract:
It is well known that the minimum received energy per bit in the interference channel is -1.59 dB, as if there were no interference. Thus, the best way to mitigate interference is to operate the interference channel in the low-SNR regime. However, when the SNR is small but non-zero, minimum energy per bit alone does not characterize performance. Verdú introduced the wideband slope S_0 to characterize the performance in this regime. We show that a wideband slope of S_0/S_{0,no interference}=1/2 is achievable. This result is similar to recent results on degrees of freedom in the high SNR regime, and we use a type of interference alignment using delays to obtain the result. We also show that in many cases the wideband slope is upper bounded by S_0/S_{0,no interference}<=1/2 for a large number of users K.
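The -1.59 dB figure and the interference-free reference slope can be checked numerically. The sketch below covers only the single-user complex AWGN channel with capacity $\log_2(1+\mathrm{SNR})$ bits/s/Hz, which is assumed here as the "no interference" baseline; it does not reproduce the interference channel results of the paper, and the function names are hypothetical.

```python
import math

def min_energy_per_bit_db():
    """Minimum E_b/N_0 in dB: 10 * log10(ln 2)."""
    return 10 * math.log10(math.log(2))

def ebn0_db(c):
    """E_b/N_0 in dB needed for spectral efficiency c bits/s/Hz on
    a complex AWGN channel, where E_b/N_0 = (2^c - 1) / c."""
    return 10 * math.log10((2 ** c - 1) / c)

def wideband_slope_estimate(c):
    """Spectral efficiency gained per 3 dB (a factor of 2) of
    E_b/N_0 above the minimum, evaluated at a small c."""
    three_db = 10 * math.log10(2)
    return c / ((ebn0_db(c) - min_energy_per_bit_db()) / three_db)

print(round(min_energy_per_bit_db(), 2))
print(wideband_slope_estimate(1e-3))
```

The first value is the minimum energy per bit quoted in the abstract, and the slope estimate approaches 2 bits/s/Hz per 3 dB as the spectral efficiency goes to zero, which is the single-user baseline against which the ratio S_0/S_{0,no interference} is measured.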
Submitted 11 November, 2011; v1 submitted 27 October, 2010;
originally announced October 2010.
-
Deterministic Capacity of MIMO Relay Networks
Authors:
Anders Host-Madsen
Abstract:
The deterministic capacity of a relay network is the capacity of a network when relays are restricted to transmitting \emph{reliable} information, that is, an (asymptotically) deterministic function of the source message. In this paper it is shown that the deterministic capacity of a number of MIMO relay networks can be found in the low power regime where $\mathrm{SNR}\to 0$. This is accomplished through deriving single letter upper bounds and finding the limit of these as $\mathrm{SNR}\to 0$. The advantage of this technique is that it overcomes the difficulty of finding optimum distributions for mutual information.
Submitted 31 March, 2009;
originally announced April 2009.