Search | arXiv e-print repository

Unified Auto-Encoding with Masked Diffusion

Authors: Philippe Hansen-Estruch, Sriram Vishwanath, Amy Zhang, Manan Tomar

Abstract: At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarit… ▽ More At the core of both successful generative and self-supervised representation learning models there is a reconstruction objective that incorporates some form of image corruption. Diffusion models implement this approach through a scheduled Gaussian corruption process, while masked auto-encoder models do so by masking patches of the image. Despite their different approaches, the underlying similarity in their methodologies suggests a promising avenue for an auto-encoder capable of both de-noising tasks. We propose a unified self-supervised objective, dubbed Unified Masked Diffusion (UMD), that combines patch-based and noise-based corruption techniques within a single auto-encoding framework. Specifically, UMD modifies the diffusion transformer (DiT) training process by introducing an additional noise-free, high masking representation step in the diffusion noising schedule, and utilizes a mixed masked and noised image for subsequent timesteps. By integrating features useful for diffusion modeling and for predicting masked patch tokens, UMD achieves strong performance in downstream generative and representation learning tasks, including linear probing and class-conditional generation. This is achieved without the need for heavy data augmentations, multiple views, or additional encoders. Furthermore, UMD improves over the computational efficiency of prior diffusion based methods in total training time. We release our code at https://github.com/philippe-eecs/small-vision. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: 19 Pages, 8 Figures, 3Tables

ACM Class: I.2.10

arXiv:2406.14657 [pdf, other]

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

Authors: Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Shwartz-Ziv

Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable r… ▽ More We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: https://huggingface.co/datasets/Yusuf5/OpenCaselist △ Less

Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for Publication to ARGMIN 2024 at ACL2024

arXiv:2402.05132 [pdf, other]

TexShape: Information Theoretic Sentence Embedding for Language Models

Authors: Kaan Kale, Homa Esfahanizadeh, Noel Elias, Oguzhan Baser, Muriel Medard, Sriram Vishwanath

Abstract: With the exponential growth in data volume and the emergence of data-intensive applications, particularly in the field of machine learning, concerns related to resource utilization, privacy, and fairness have become paramount. This paper focuses on the textual domain of data and addresses challenges regarding encoding sentences to their optimized representations through the lens of information-the… ▽ More With the exponential growth in data volume and the emergence of data-intensive applications, particularly in the field of machine learning, concerns related to resource utilization, privacy, and fairness have become paramount. This paper focuses on the textual domain of data and addresses challenges regarding encoding sentences to their optimized representations through the lens of information-theory. In particular, we use empirical estimates of mutual information, using the Donsker-Varadhan definition of Kullback-Leibler divergence. Our approach leverages this estimation to train an information-theoretic sentence embedding, called TexShape, for (task-based) data compression or for filtering out sensitive information, enhancing privacy and fairness. In this study, we employ a benchmark language model for initial text representation, complemented by neural networks for information-theoretic compression and mutual information estimations. Our experiments demonstrate significant advancements in preserving maximal targeted information and minimal sensitive information over adverse compression ratios, in terms of predictive accuracy of downstream models that are trained using the compressed data. △ Less

Submitted 11 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Submitted to the 2024 IEEE International Symposium on Information Theory

arXiv:2312.07709 [pdf, other]

Majority is Not Required: A Rational Analysis of the Private Double-Spend Attack from a Sub-Majority Adversary

Authors: Yanni Georghiades, Rajesh Mishra, Karl Kreder, Sriram Vishwanath

Abstract: We study the incentives behind double-spend attacks on Nakamoto-style Proof-of-Work cryptocurrencies. In these systems, miners are allowed to choose which transactions to reference with their block, and a common strategy for selecting transactions is to simply choose those with the highest fees. This can be problematic if these transactions originate from an adversary with substantial (but less th… ▽ More We study the incentives behind double-spend attacks on Nakamoto-style Proof-of-Work cryptocurrencies. In these systems, miners are allowed to choose which transactions to reference with their block, and a common strategy for selecting transactions is to simply choose those with the highest fees. This can be problematic if these transactions originate from an adversary with substantial (but less than 50\%) computational power, as high-value transactions can present an incentive for a rational adversary to attempt a double-spend attack if they expect to profit. The most common mechanism for deterring double-spend attacks is for the recipients of large transactions to wait for additional block confirmations (i.e., to increase the attack cost). We argue that this defense mechanism is not satisfactory, as the security of the system is contingent on the actions of its users. Instead, we propose that defending against double-spend attacks should be the responsibility of the miners; specifically, miners should limit the amount of transaction value they include in a block (i.e., reduce the attack reward). To this end, we model cryptocurrency mining as a mean-field game in which we augment the standard mining reward function to simulate the presence of a rational, double-spending adversary. We design and implement an algorithm which characterizes the behavior of miners at equilibrium, and we show that miners who use the adversary-aware reward function accumulate more wealth than those who do not. We show that the optimal strategy for honest miners is to limit the amount of value transferred by each block such that the adversary's expected profit is 0. Additionally, we examine Bitcoin's resilience to double-spend attacks. Assuming a 6 block confirmation time, we find that an attacker with at least 25% of the network mining power can expect to profit from a double-spend attack. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.10900 [pdf, other]

Stability of Sequential Lateration and of Stress Minimization in the Presence of Noise

Authors: Ery Arias-Castro, Siddharth Vishwanath

Abstract: Sequential lateration is a class of methods for multidimensional scaling where a suitable subset of nodes is first embedded by some method, e.g., a clique embedded by classical scaling, and then the remaining nodes are recursively embedded by lateration. A graph is a lateration graph when it can be embedded by such a procedure. We provide a stability result for a particular variant of sequential l… ▽ More Sequential lateration is a class of methods for multidimensional scaling where a suitable subset of nodes is first embedded by some method, e.g., a clique embedded by classical scaling, and then the remaining nodes are recursively embedded by lateration. A graph is a lateration graph when it can be embedded by such a procedure. We provide a stability result for a particular variant of sequential lateration. We do so in a setting where the dissimilarities represent noisy Euclidean distances between nodes in a geometric lateration graph. We then deduce, as a corollary, a perturbation bound for stress minimization. To argue that our setting applies broadly, we show that a (large) random geometric graph is a lateration graph with high probability under mild conditions, extending a previous result of Aspnes et al (2006). △ Less

Submitted 26 March, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2207.07218

arXiv:2309.16878 [pdf, other]

Investigating Human-Identifiable Features Hidden in Adversarial Perturbations

Authors: Dennis Y. Menn, Tzu-hsun Feng, Sriram Vishwanath, Hung-yi Lee

Abstract: Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations. This vulnerability has implications for real-world applications. While much research has been conducted, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algori… ▽ More Neural networks perform exceedingly well across various machine learning tasks but are not immune to adversarial perturbations. This vulnerability has implications for real-world applications. While much research has been conducted, the underlying reasons why neural networks fall prey to adversarial attacks are not yet fully understood. Central to our study, which explores up to five attack algorithms across three datasets, is the identification of human-identifiable features in adversarial perturbations. Additionally, we uncover two distinct effects manifesting within human-identifiable features. Specifically, the masking effect is prominent in untargeted attacks, while the generation effect is more common in targeted attacks. Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models. In addition, our findings indicate a notable extent of similarity in perturbations across different attack algorithms when averaged over multiple models. This work also provides insights into phenomena associated with adversarial perturbations, such as transferability and model interpretability. Our study contributes to a deeper understanding of the underlying mechanisms behind adversarial attacks and offers insights for the development of more resilient defense strategies for neural networks. △ Less

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2304.05354 [pdf, other]

iDML: Incentivized Decentralized Machine Learning

Authors: Haoxiang Yu, Hsiao-Yuan Chen, Sangsu Lee, Sriram Vishwanath, Xi Zheng, Christine Julien

Abstract: With the rising emergence of decentralized and opportunistic approaches to machine learning, end devices are increasingly tasked with training deep learning models on-devices using crowd-sourced data that they collect themselves. These approaches are desirable from a resource consumption perspective and also from a privacy preservation perspective. When the devices benefit directly from the traine… ▽ More With the rising emergence of decentralized and opportunistic approaches to machine learning, end devices are increasingly tasked with training deep learning models on-devices using crowd-sourced data that they collect themselves. These approaches are desirable from a resource consumption perspective and also from a privacy preservation perspective. When the devices benefit directly from the trained models, the incentives are implicit - contributing devices' resources are incentivized by the availability of the higher-accuracy model that results from collaboration. However, explicit incentive mechanisms must be provided when end-user devices are asked to contribute their resources (e.g., computation, communication, and data) to a task performed primarily for the benefit of others, e.g., training a model for a task that a neighbor device needs but the device owner is uninterested in. In this project, we propose a novel blockchain-based incentive mechanism for completely decentralized and opportunistic learning architectures. We leverage a smart contract not only for providing explicit incentives to end devices to participate in decentralized learning but also to create a fully decentralized mechanism to inspect and reflect on the behavior of the learning architecture. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2210.12881 [pdf, other]

A Control Theoretic Approach to Infrastructure-Centric Blockchain Tokenomics

Authors: Oguzhan Akcin, Robert P. Streit, Benjamin Oommen, Sriram Vishwanath, Sandeep Chinchali

Abstract: There are a multitude of Blockchain-based physical infrastructure systems, operating on a crypto-currency enabled token economy, where infrastructure suppliers are rewarded with tokens for enabling, validating, managing and/or securing the system. However, today's token economies are largely designed without infrastructure systems in mind, and often operate with a fixed token supply (e.g., Bitcoin… ▽ More There are a multitude of Blockchain-based physical infrastructure systems, operating on a crypto-currency enabled token economy, where infrastructure suppliers are rewarded with tokens for enabling, validating, managing and/or securing the system. However, today's token economies are largely designed without infrastructure systems in mind, and often operate with a fixed token supply (e.g., Bitcoin). This paper argues that token economies for infrastructure networks should be structured differently - they should continually incentivize new suppliers to join the network to provide services and support to the ecosystem. As such, the associated token rewards should gracefully scale with the size of the decentralized system, but should be carefully balanced with consumer demand to manage inflation and be designed to ultimately reach an equilibrium. To achieve such an equilibrium, the decentralized token economy should be adaptable and controllable so that it maximizes the total utility of all users, such as achieving stable (overall non-inflationary) token economies. Our main contribution is to model infrastructure token economies as dynamical systems - the circulating token supply, price, and consumer demand change as a function of the payment to nodes and costs to consumers for infrastructure services. Crucially, this dynamical systems view enables us to leverage tools from mathematical control theory to optimize the overall decentralized network's performance. Moreover, our model extends easily to a Stackelberg game between the controller and the nodes, which we use for robust, strategic pricing. In short, we develop predictive, optimization-based controllers that outperform traditional algorithmic stablecoin heuristics by up to $2.4 \times$ in simulations based on real demand data from existing decentralized wireless networks. △ Less

Submitted 23 October, 2022; originally announced October 2022.

arXiv:2207.05869 [pdf, other]

Achieving Almost All Blockchain Functionalities with Polylogarithmic Storage

Authors: Parikshit Hegde, Robert Streit, Yanni Georghiades, Chaya Ganesh, Sriram Vishwanath

Abstract: In current blockchain systems, full nodes that perform all of the available functionalities need to store the entire blockchain. In addition to the blockchain, full nodes also store a blockchain-summary, called the \emph{state}, which is used to efficiently verify transactions. With the size of popular blockchains and their states growing rapidly, full nodes require massive storage resources in or… ▽ More In current blockchain systems, full nodes that perform all of the available functionalities need to store the entire blockchain. In addition to the blockchain, full nodes also store a blockchain-summary, called the \emph{state}, which is used to efficiently verify transactions. With the size of popular blockchains and their states growing rapidly, full nodes require massive storage resources in order to keep up with the scaling. This leads to a tug-of-war between scaling and decentralization since fewer entities can afford expensive resources. We present \emph{hybrid nodes} for proof-of-work (PoW) cryptocurrencies which can validate transactions, validate blocks, validate states, mine, select the main chain, bootstrap new hybrid nodes, and verify payment proofs. With the use of a protocol called \emph{trimming}, hybrid nodes only retain polylogarithmic number of blocks in the chain length in order to represent the proof-of-work of the blockchain. Hybrid nodes are also optimized for the storage of the state with the use of \emph{stateless blockchain} protocols. The lowered storage requirements should enable more entities to join as hybrid nodes and improve the decentralization of the system. We define novel theoretical security models for hybrid nodes and show that they are provably secure. We also show that the storage requirement of hybrid nodes is near-optimal with respect to our security definitions. △ Less

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2206.10137 [pdf, other]

Few-Max: Few-Shot Domain Adaptation for Unsupervised Contrastive Representation Learning

Authors: Ali Lotfi Rezaabad, Sidharth Kumar, Sriram Vishwanath, Jonathan I. Tamir

Abstract: Contrastive self-supervised learning methods learn to map data points such as images into non-parametric representation space without requiring labels. While highly successful, current methods require a large amount of data in the training phase. In situations where the target training set is limited in size, generalization is known to be poor. Pretraining on a large source data set and fine-tunin… ▽ More Contrastive self-supervised learning methods learn to map data points such as images into non-parametric representation space without requiring labels. While highly successful, current methods require a large amount of data in the training phase. In situations where the target training set is limited in size, generalization is known to be poor. Pretraining on a large source data set and fine-tuning on the target samples is prone to overfitting in the few-shot regime, where only a small number of target samples are available. Motivated by this, we propose a domain adaption method for self-supervised contrastive learning, termed Few-Max, to address the issue of adaptation to a target distribution under few-shot learning. To quantify the representation quality, we evaluate Few-Max on a range of source and target datasets, including ImageNet, VisDA, and fastMRI, on which Few-Max consistently outperforms other approaches. △ Less

Submitted 22 June, 2022; v1 submitted 21 June, 2022; originally announced June 2022.

arXiv:2206.01795 [pdf, other]

Robust Topological Inference in the Presence of Outliers

Authors: Siddharth Vishwanath, Bharath K. Sriperumbudur, Kenji Fukumizu, Satoshi Kuriki

Abstract: The distance function to a compact set plays a crucial role in the paradigm of topological data analysis. In particular, the sublevel sets of the distance function are used in the computation of persistent homology -- a backbone of the topological data analysis pipeline. Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers. In this w… ▽ More The distance function to a compact set plays a crucial role in the paradigm of topological data analysis. In particular, the sublevel sets of the distance function are used in the computation of persistent homology -- a backbone of the topological data analysis pipeline. Despite its stability to perturbations in the Hausdorff distance, persistent homology is highly sensitive to outliers. In this work, we develop a framework of statistical inference for persistent homology in the presence of outliers. Drawing inspiration from recent developments in robust statistics, we propose a $\textit{median-of-means}$ variant of the distance function ($\textsf{MoM Dist}$), and establish its statistical properties. In particular, we show that, even in the presence of outliers, the sublevel filtrations and weighted filtrations induced by $\textsf{MoM Dist}$ are both consistent estimators of the true underlying population counterpart, and their rates of convergence in the bottleneck metric are controlled by the fraction of outliers in the data. Finally, we demonstrate the advantages of the proposed methodology through simulations and applications. △ Less

Submitted 3 June, 2022; originally announced June 2022.

Comments: 50 pages, 10 figures

MSC Class: 62R40; 55N31; 68T09

arXiv:2201.01283 [pdf, other]

Self-supervised Learning from 100 Million Medical Images

Authors: Florin C. Ghesu, Bogdan Georgescu, Awais Mansoor, Youngjin Yoo, Dominik Neumann, Pragneshkumar Patel, R. S. Vishwanath, James M. Balter, Yue Cao, Sasa Grbic, Dorin Comaniciu

Abstract: Building accurate and robust artificial intelligence systems for medical image assessment requires not only the research and design of advanced deep learning models but also the creation of large and curated sets of annotated training examples. Constructing such datasets, however, is often very costly -- due to the complex nature of annotation tasks and the high level of expertise required for the… ▽ More Building accurate and robust artificial intelligence systems for medical image assessment requires not only the research and design of advanced deep learning models but also the creation of large and curated sets of annotated training examples. Constructing such datasets, however, is often very costly -- due to the complex nature of annotation tasks and the high level of expertise required for the interpretation of medical images (e.g., expert radiologists). To counter this limitation, we propose a method for self-supervised learning of rich image features based on contrastive learning and online feature clustering. For this purpose we leverage large training datasets of over 100,000,000 medical images of various modalities, including radiography, computed tomography (CT), magnetic resonance (MR) imaging and ultrasonography. We propose to use these features to guide model training in supervised and hybrid self-supervised/supervised regime on various downstream tasks. We highlight a number of advantages of this strategy on challenging image assessment problems in radiography, CT and MR: 1) Significant increase in accuracy compared to the state-of-the-art (e.g., AUC boost of 3-7% for detection of abnormalities from chest radiography scans and hemorrhage detection on brain CT); 2) Acceleration of model convergence during training by up to 85% compared to using no pretraining (e.g., 83% when training a model for detection of brain metastases in MR scans); 3) Increase in robustness to various image augmentations, such as intensity variations, rotations or scaling reflective of data variation seen in the field. △ Less

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2112.11072 [pdf, other]

doi 10.1109/BLOCKCHAIN55522.2022.00072

Scalable Multi-Chain Coordination via the Hierarchical Longest Chain Rule

Authors: Yanni Georghiades, Karl Kreder, Jonathan Downing, Alan Orwick, Sriram Vishwanath

Abstract: This paper introduces BlockReduce, a Proof-of-Work (PoW) based blockchain system which achieves high transaction throughput through a hierarchy of merged mined blockchains, each operating in parallel on a partition the overall application state. Most notably, the full PoW available within the network is applied to all blockchains in BlockReduce, and cross-blockchain state transitions are enabled s… ▽ More This paper introduces BlockReduce, a Proof-of-Work (PoW) based blockchain system which achieves high transaction throughput through a hierarchy of merged mined blockchains, each operating in parallel on a partition the overall application state. Most notably, the full PoW available within the network is applied to all blockchains in BlockReduce, and cross-blockchain state transitions are enabled seamlessly within the core protocol. This paper shows that, given a hierarchy of blockchains and its associated security model, the protocol scales superlinearly in transaction throughput with the number of blockchains operated by the protocol. △ Less

Submitted 27 December, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

Journal ref: 2022 IEEE International Conference on Blockchain (Blockchain)

arXiv:2111.12002 [pdf, other]

Armada: A Robust Latency-Sensitive Edge Cloud in Heterogeneous Edge-Dense Environments

Authors: Lei Huang, Zhiying Liang, Nikhil Sreekumar, Sumanth Kaushik Vishwanath, Cody Perakslis, Abhishek Chandra, Jon Weissman

Abstract: Edge computing has enabled a large set of emerging edge applications by exploiting data proximity and offloading latency-sensitive and computation-intensive workloads to nearby edge servers. However, supporting edge application users at scale in wide-area environments poses challenges due to limited point-of-presence edge sites and constrained elasticity. In this paper, we introduce Armada: a dens… ▽ More Edge computing has enabled a large set of emerging edge applications by exploiting data proximity and offloading latency-sensitive and computation-intensive workloads to nearby edge servers. However, supporting edge application users at scale in wide-area environments poses challenges due to limited point-of-presence edge sites and constrained elasticity. In this paper, we introduce Armada: a densely-distributed edge cloud infrastructure that explores the use of dedicated and volunteer resources to serve geo-distributed users in heterogeneous environments. We describe the lightweight Armada architecture and optimization techniques including performance-aware edge selection, auto-scaling and load balancing on the edge, fault tolerance, and in-situ data access. We evaluate Armada in both real-world volunteer environments and emulated platforms to show how common edge applications, namely real-time object detection and face recognition, can be easily deployed on Armada serving distributed users at scale with low latency. △ Less

Submitted 23 November, 2021; originally announced November 2021.

Comments: 13 pages, 13 figures

ACM Class: C.2.4; D.4.5; D.4.7

arXiv:2105.08892 [pdf, other]

A Phase Transition in Large Network Games

Authors: Abhishek Shende, Deepanshu Vasal, Sriram Vishwanath

Abstract: In this paper, we use a model of large random network game where the agents plays selfishly and are affected by their neighbors, to explore the conditions under which the Nash equilibrium (NE) of the game is affected by a perturbation in the network. We use a phase transition phenomenon observed in finite rank deformations of large random matrices, to study how the NE changes on crossing critical… ▽ More In this paper, we use a model of large random network game where the agents plays selfishly and are affected by their neighbors, to explore the conditions under which the Nash equilibrium (NE) of the game is affected by a perturbation in the network. We use a phase transition phenomenon observed in finite rank deformations of large random matrices, to study how the NE changes on crossing critical threshold points. Our main contribution is as follows: when the perturbation strength is greater than a critical point, it impacts the NE of the game, whereas when this perturbation is below this critical point, the NE remains independent of the perturbation parameter. This demonstrates a phase transition in NE which alludes that perturbations can affect the behavior of the society only if their strength is above a critical threshold. We provide numerical examples for this result and present scenarios under which this phenomenon could potentially occur in real world applications. △ Less

Submitted 18 May, 2021; originally announced May 2021.

arXiv:2103.02087 [pdf, other]

Deep J-Sense: Accelerated MRI Reconstruction via Unrolled Alternating Optimization

Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik, Jonathan I. Tamir

Abstract: Accelerated multi-coil magnetic resonance imaging reconstruction has seen a substantial recent improvement combining compressed sensing with deep learning. However, most of these methods rely on estimates of the coil sensitivity profiles, or on calibration data for estimating model parameters. Prior work has shown that these methods degrade in performance when the quality of these estimators are p… ▽ More Accelerated multi-coil magnetic resonance imaging reconstruction has seen a substantial recent improvement combining compressed sensing with deep learning. However, most of these methods rely on estimates of the coil sensitivity profiles, or on calibration data for estimating model parameters. Prior work has shown that these methods degrade in performance when the quality of these estimators are poor or when the scan parameters differ from the training conditions. Here we introduce Deep J-Sense as a deep learning approach that builds on unrolled alternating minimization and increases robustness: our algorithm refines both the magnetization (image) kernel and the coil sensitivity maps. Experimental results on a subset of the knee fastMRI dataset show that this increases reconstruction performance and provides a significant degree of robustness to varying acceleration factors and calibration region sizes. △ Less

Submitted 11 April, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2012.14296 [pdf, other]

Network Design for Social Welfare

Authors: Abhishek Shende, Deepanshu Vasal, Sriram Vishwanath

Abstract: In this paper, we consider the problem of network design on network games. We study the conditions on the adjacency matrix of the underlying network to design a game such that the Nash equilibrium coincides with the social optimum. We provide the examples for linear quadratic games that satisfy this condition. Furthermore, we identify conditions on properties of adjacency matrix that provide a uni… ▽ More In this paper, we consider the problem of network design on network games. We study the conditions on the adjacency matrix of the underlying network to design a game such that the Nash equilibrium coincides with the social optimum. We provide the examples for linear quadratic games that satisfy this condition. Furthermore, we identify conditions on properties of adjacency matrix that provide a unique solution using variational inequality formulation, and verify the robustness and continuity of the social cost under perturbations of the network. Finally we comment on individual rationality and extension of our results to large random networked games. △ Less

Submitted 10 September, 2022; v1 submitted 23 December, 2020; originally announced December 2020.

arXiv:2012.12843 [pdf, other]

EQ-Net: A Unified Deep Learning Framework for Log-Likelihood Ratio Estimation and Quantization

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: In this work, we introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method. We motivate our approach with theoretical insights on two practical estimation algorithms at the ends of the complexity spectrum and reveal a connection between the complexity of an algorithm and the information bottleneck… ▽ More In this work, we introduce EQ-Net: the first holistic framework that solves both the tasks of log-likelihood ratio (LLR) estimation and quantization using a data-driven method. We motivate our approach with theoretical insights on two practical estimation algorithms at the ends of the complexity spectrum and reveal a connection between the complexity of an algorithm and the information bottleneck method: simpler algorithms admit smaller bottlenecks when representing their solution. This motivates us to propose a two-stage algorithm that uses LLR compression as a pretext task for estimation and is focused on low-latency, high-performance implementations via deep neural networks. We carry out extensive experimental evaluation and demonstrate that our single architecture achieves state-of-the-art results on both tasks when compared to previous methods, with gains in quantization efficiency as high as $20\%$ and reduced estimation latency by up to $60\%$ when measured on general purpose and graphical processing units (GPU). In particular, our approach reduces the GPU inference latency by more than two times in several multiple-input multiple-output (MIMO) configurations. Finally, we demonstrate that our scheme is robust to distributional shifts and retains a significant part of its performance when evaluated on 5G channel models, as well as channel estimation errors. △ Less

Submitted 3 May, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

arXiv:2011.00194 [pdf, other]

Hyperbolic Graph Embedding with Enhanced Semi-Implicit Variational Inference

Authors: Ali Lotfi Rezaabad, Rahi Kalantari, Sriram Vishwanath, Mingyuan Zhou, Jonathan Tamir

Abstract: Efficient modeling of relational data arising in physical, social, and information sciences is challenging due to complicated dependencies within the data. In this work, we build off of semi-implicit graph variational auto-encoders to capture higher-order statistics in a low-dimensional graph latent representation. We incorporate hyperbolic geometry in the latent space through a Poincare embedding… ▽ More Efficient modeling of relational data arising in physical, social, and information sciences is challenging due to complicated dependencies within the data. In this work, we build off of semi-implicit graph variational auto-encoders to capture higher-order statistics in a low-dimensional graph latent representation. We incorporate hyperbolic geometry in the latent space through a Poincare embedding to efficiently represent graphs exhibiting hierarchical structure. To address the naive posterior latent distribution assumptions in classical variational inference, we use semi-implicit hierarchical variational Bayes to implicitly capture posteriors of given graph data, which may exhibit heavy tails, multiple modes, skewness, and highly correlated latent structures. We show that the existing semi-implicit variational inference objective provably reduces information in the observed graph. Based on this observation, we estimate and add an additional mutual information term to the semi-implicit variational inference learning objective to capture rich correlations arising between the input and latent spaces. We show that the inclusion of this regularization term in conjunction with the Poincare embedding boosts the quality of learned high-level representations and enables more flexible and faithful graphical modeling. We experimentally demonstrate that our approach outperforms existing graph variational auto-encoders both in Euclidean and in hyperbolic spaces for edge link prediction and node classification. △ Less

Submitted 10 March, 2021; v1 submitted 31 October, 2020; originally announced November 2020.

arXiv:2007.04779 [pdf, other]

Long Short-Term Memory Spiking Networks and Their Applications

Authors: Ali Lotfi Rezaabad, Sriram Vishwanath

Abstract: Recent advances in event-based neuromorphic systems have resulted in significant interest in the use and development of spiking neural networks (SNNs). However, the non-differentiable nature of spiking neurons makes SNNs incompatible with conventional backpropagation techniques. In spite of the significant progress made in training conventional deep neural networks (DNNs), training methods for SNN… ▽ More Recent advances in event-based neuromorphic systems have resulted in significant interest in the use and development of spiking neural networks (SNNs). However, the non-differentiable nature of spiking neurons makes SNNs incompatible with conventional backpropagation techniques. In spite of the significant progress made in training conventional deep neural networks (DNNs), training methods for SNNs still remain relatively poorly understood. In this paper, we present a novel framework for training recurrent SNNs. Analogous to the benefits presented by recurrent neural networks (RNNs) in learning time series models within DNNs, we develop SNNs based on long short-term memory (LSTM) networks. We show that LSTM spiking networks learn the timing of the spikes and temporal dependencies. We also develop a methodology for error backpropagation within LSTM-based SNNs. The developed architecture and method for backpropagation within LSTM-based SNNs enable them to learn long-term dependencies with comparable results to conventional LSTMs. △ Less

Submitted 9 July, 2020; originally announced July 2020.

arXiv:2007.04258 [pdf, other]

Quantifying and Leveraging Predictive Uncertainty for Medical Image Assessment

Authors: Florin C. Ghesu, Bogdan Georgescu, Awais Mansoor, Youngjin Yoo, Eli Gibson, R. S. Vishwanath, Abishek Balachandran, James M. Balter, Yue Cao, Ramandeep Singh, Subba R. Digumarthy, Mannudeep K. Kalra, Sasa Grbic, Dorin Comaniciu

Abstract: The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast and more. Most notable is the case of chest radiography, where there is a high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance.… ▽ More The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast and more. Most notable is the case of chest radiography, where there is a high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance. An additional example is the classification of anatomical views based on 2D Ultrasound images. Often, the anatomical context captured in a frame is not sufficient to recognize the underlying anatomy. Current machine learning solutions for these problems are typically limited to providing probabilistic predictions, relying on the capacity of underlying models to adapt to limited information and the high degree of label noise. In practice, however, this leads to overconfident systems with poor generalization on unseen data. To account for this, we propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure which captures the confidence of the system in the predicted output. We argue that this approach is essential to account for the inherent ambiguity characteristic of medical images from different radiologic exams including computed radiography, ultrasonography and magnetic resonance imaging. In our experiments we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks, e.g., by 8% to 0.91 with an expected rejection rate of under 25% for the classification of different abnormalities in chest radiographs. In addition, we show that using uncertainty-driven bootstrapping to filter the training data, one can achieve a significant increase in robustness and accuracy. △ Less

Submitted 8 July, 2020; originally announced July 2020.

Comments: Under review at Medical Image Analysis

arXiv:2006.15189 [pdf, other]

Interpretable Factorization for Neural Network ECG Models

Authors: Christopher Snyder, Sriram Vishwanath

Abstract: The ability of deep learning (DL) to improve the practice of medicine and its clinical outcomes faces a looming obstacle: model interpretation. Without description of how outputs are generated, a collaborating physician can neither resolve when the model's conclusions are in conflict with his or her own, nor learn to anticipate model behavior. Current research aims to interpret networks that diagn… ▽ More The ability of deep learning (DL) to improve the practice of medicine and its clinical outcomes faces a looming obstacle: model interpretation. Without description of how outputs are generated, a collaborating physician can neither resolve when the model's conclusions are in conflict with his or her own, nor learn to anticipate model behavior. Current research aims to interpret networks that diagnose ECG recordings, which has great potential impact as recordings become more personalized and widely deployed. A generalizable impact beyond ECGs lies in the ability to provide a rich test-bed for the development of interpretive techniques in medicine. Interpretive techniques for Deep Neural Networks (DNNs), however, tend to be heuristic and observational in nature, lacking the mathematical rigor one might expect in the analysis of math equations. The motivation of this paper is to offer a third option, a scientific approach. We treat the model output itself as a phenomenon to be explained through component parts and equations governing their behavior. We argue that these component parts should also be "black boxes" --additional targets to interpret heuristically with clear functional connection to the original. We show how to rigorously factor a DNN into a hierarchical equation consisting of black box variables. This is not a subdivision into physical parts, like an organism into its cells; it is but one choice of an equation into a collection of abstract functions. Yet, for DNNs trained to identify normal ECG waveforms on PhysioNet 2017 Challenge data, we demonstrate this choice yields interpretable component models identified with visual composite sketches of ECG samples in corresponding input regions. Moreover, the recursion distills this interpretation: additional factorization of component black boxes corresponds to ECG partitions that are more morphologically pure. △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:2006.10012 [pdf, other]

Robust Persistence Diagrams using Reproducing Kernels

Authors: Siddharth Vishwanath, Kenji Fukumizu, Satoshi Kuriki, Bharath Sriperumbudur

Abstract: Persistent homology has become an important tool for extracting geometric and topological features from data, whose multi-scale features are summarized in a persistence diagram. From a statistical perspective, however, persistence diagrams are very sensitive to perturbations in the input space. In this work, we develop a framework for constructing robust persistence diagrams from superlevel filtra… ▽ More Persistent homology has become an important tool for extracting geometric and topological features from data, whose multi-scale features are summarized in a persistence diagram. From a statistical perspective, however, persistence diagrams are very sensitive to perturbations in the input space. In this work, we develop a framework for constructing robust persistence diagrams from superlevel filtrations of robust density estimators constructed using reproducing kernels. Using an analogue of the influence function on the space of persistence diagrams, we establish the proposed framework to be less sensitive to outliers. The robust persistence diagrams are shown to be consistent estimators in bottleneck distance, with the convergence rate controlled by the smoothness of the kernel. This, in turn, allows us to construct uniform confidence bands in the space of persistence diagrams. Finally, we demonstrate the superiority of the proposed approach on benchmark datasets. △ Less

Submitted 3 June, 2022; v1 submitted 17 June, 2020; originally announced June 2020.

MSC Class: 55N31; 62R40; 62G07; 46E22

arXiv:2006.03638 [pdf, other]

Robust Face Verification via Disentangled Representations

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: We introduce a robust algorithm for face verification, i.e., deciding whether twoimages are of the same person or not. Our approach is a novel take on the idea ofusing deep generative networks for adversarial robustness. We use the generativemodel during training as an online augmentation method instead of a test-timepurifier that removes adversarial noise. Our architecture uses a contrastive loss… ▽ More We introduce a robust algorithm for face verification, i.e., deciding whether twoimages are of the same person or not. Our approach is a novel take on the idea ofusing deep generative networks for adversarial robustness. We use the generativemodel during training as an online augmentation method instead of a test-timepurifier that removes adversarial noise. Our architecture uses a contrastive loss termand a disentangled generative model to sample negative pairs. Instead of randomlypairing two real images, we pair an image with its class-modified counterpart whilekeeping its content (pose, head tilt, hair, etc.) intact. This enables us to efficientlysample hard negative pairs for the contrastive loss. We experimentally show that, when coupled with adversarial training, the proposed scheme converges with aweak inner solver and has a higher clean and robust accuracy than state-of-the-art-methods when evaluated against white-box physical attacks. △ Less

Submitted 23 June, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

Comments: Preprint

arXiv:2005.11853 [pdf, other]

Model-free Reinforcement Learning for Stochastic Stackelberg Security Games

Authors: Rajesh K Mishra, Deepanshu Vasal, Sriram Vishwanath

Abstract: In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to… ▽ More In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy which maximizes its own returns given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently,~[1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games which allow for the computation of Markovian equilibrium policies in linear time as opposed to double exponential, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, to propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm. by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm. △ Less

Submitted 24 May, 2020; originally announced May 2020.

arXiv:2003.11619 [pdf, other]

Deep Networks as Logical Circuits: Generalization and Interpretation

Authors: Christopher Snyder, Sriram Vishwanath

Abstract: Not only are Deep Neural Networks (DNNs) black box models, but also we frequently conceptualize them as such. We lack good interpretations of the mechanisms linking inputs to outputs. Therefore, we find it difficult to analyze in human-meaningful terms (1) what the network learned and (2) whether the network learned. We present a hierarchical decomposition of the DNN discrete classification map in… ▽ More Not only are Deep Neural Networks (DNNs) black box models, but also we frequently conceptualize them as such. We lack good interpretations of the mechanisms linking inputs to outputs. Therefore, we find it difficult to analyze in human-meaningful terms (1) what the network learned and (2) whether the network learned. We present a hierarchical decomposition of the DNN discrete classification map into logical (AND/OR) combinations of intermediate (True/False) classifiers of the input. Those classifiers that can not be further decomposed, called atoms, are (interpretable) linear classifiers. Taken together, we obtain a logical circuit with linear classifier inputs that computes the same label as the DNN. This circuit does not structurally resemble the network architecture, and it may require many fewer parameters, depending on the configuration of weights. In these cases, we obtain simultaneously an interpretation and generalization bound (for the original DNN), connecting two fronts which have historically been investigated separately. Unlike compression techniques, our representation is. We motivate the utility of this perspective by studying DNNs in simple, controlled settings, where we obtain superior generalization bounds despite using only combinatorial information (e.g. no margin information). We demonstrate how to "open the black box" on the MNIST dataset. We show that the learned, internal, logical computations correspond to semantically meaningful (unlabeled) categories that allow DNN descriptions in plain English. We improve the generalization of an already trained network by interpreting, diagnosing, and replacing components the logical circuit that is the DNN. △ Less

Submitted 26 June, 2020; v1 submitted 25 March, 2020; originally announced March 2020.

arXiv:2002.12504 [pdf, other]

Detecting Patch Adversarial Attacks with Image Residuals

Authors: Marius Arvinte, Ahmed Tewfik, Sriram Vishwanath

Abstract: We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images a… ▽ More We introduce an adversarial sample detection algorithm based on image residuals, specifically designed to guard against patch-based attacks. The image residual is obtained as the difference between an input image and a denoised version of it, and a discriminator is trained to distinguish between clean and adversarial samples. More precisely, we use a wavelet domain algorithm for denoising images and demonstrate that the obtained residuals act as a digital fingerprint for adversarial attacks. To emulate the limitations of a physical adversary, we evaluate the performance of our approach against localized (patch-based) adversarial attacks, including in settings where the adversary has complete knowledge about the detection scheme. Our results show that the proposed detection method generalizes to previously unseen, stronger attacks and that it is able to reduce the success rate (conversely, increase the computational effort) of an adaptive attacker. △ Less

Submitted 2 March, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

arXiv:2002.02567 [pdf, other]

doi 10.1145/3392153

Stability and Scalability of Blockchain Systems

Authors: Aditya Gopalan, Abishek Sankararaman, Anwar Walid, Sriram Vishwanath

Abstract: The blockchain paradigm provides a mechanism for content dissemination and distributed consensus on Peer-to-Peer (P2P) networks. While this paradigm has been widely adopted in industry, it has not been carefully analyzed in terms of its network scaling with respect to the number of peers. Applications for blockchain systems, such as cryptocurrencies and IoT, require this form of network scaling.… ▽ More The blockchain paradigm provides a mechanism for content dissemination and distributed consensus on Peer-to-Peer (P2P) networks. While this paradigm has been widely adopted in industry, it has not been carefully analyzed in terms of its network scaling with respect to the number of peers. Applications for blockchain systems, such as cryptocurrencies and IoT, require this form of network scaling. In this paper, we propose a new stochastic network model for a blockchain system. We identify a structural property called \emph{one-endedness}, which we show to be desirable in any blockchain system as it is directly related to distributed consensus among the peers. We show that the stochastic stability of the network is sufficient for the one-endedness of a blockchain. We further establish that our model belongs to a class of network models, called monotone separable models. This allows us to establish upper and lower bounds on the stability region. The bounds on stability depend on the connectivity of the P2P network through its conductance and allow us to analyze the scalability of blockchain systems on large P2P networks. We verify our theoretical insights using both synthetic data and real data from the Bitcoin network. △ Less

Submitted 18 December, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: This is the revised version of the paper

MSC Class: 94A06 (Primary); 60H06 (Secondary) ACM Class: C.4; G.3; H.4.3

Journal ref: Proc. ACM Meas. Anal. Comput. Syst. Vol. 4 No. 2 (2020) Article 35, pages 1-35

arXiv:2001.09599 [pdf, other]

Achieving Multi-Port Memory Performance on Single-Port Memory with Coding Techniques

Authors: Hardik Jain, Matthew Edwards, Ethan Elenberg, Ankit Singh Rawat, Sriram Vishwanath

Abstract: Many performance critical systems today must rely on performance enhancements, such as multi-port memories, to keep up with the increasing demand of memory-access capacity. However, the large area footprints and complexity of existing multi-port memory designs limit their applicability. This paper explores a coding theoretic framework to address this problem. In particular, this paper introduces a… ▽ More Many performance critical systems today must rely on performance enhancements, such as multi-port memories, to keep up with the increasing demand of memory-access capacity. However, the large area footprints and complexity of existing multi-port memory designs limit their applicability. This paper explores a coding theoretic framework to address this problem. In particular, this paper introduces a framework to encode data across multiple single-port memory banks in order to {\em algorithmically} realize the functionality of multi-port memory. This paper proposes three code designs with significantly less storage overhead compared to the existing replication based emulations of multi-port memories. To further improve performance, we also demonstrate a memory controller design that utilizes redundancy across coded memory banks to more efficiently schedule read and write requests sent across multiple cores. Furthermore, guided by DRAM traces, the paper explores {\em dynamic coding} techniques to improve the efficiency of the coding based memory design. We then show significant performance improvements in critical word read and write latency in the proposed coded-memory design when compared to a traditional uncoded-memory design. △ Less

Submitted 27 January, 2020; originally announced January 2020.

Comments: 10 pages, 20 figures, ICICT 2020 conference

arXiv:2001.05633 [pdf, ps, other]

Master equation of discrete time graphon mean field games and teams

Authors: Deepanshu Vasal, Rajesh K Mishra, Sriram Vishwanath

Abstract: In this paper, we present a sequential decomposition algorithm equivalent of Master equation to compute GMFE of GMFG and graphon optimal Markovian policies (GOMPs) of graphon mean field teams (GMFTs). We consider a large population of players sequentially making strategic decisions where the actions of each player affect their neighbors which is captured in a graph, generated by a known graphon. E… ▽ More In this paper, we present a sequential decomposition algorithm equivalent of Master equation to compute GMFE of GMFG and graphon optimal Markovian policies (GOMPs) of graphon mean field teams (GMFTs). We consider a large population of players sequentially making strategic decisions where the actions of each player affect their neighbors which is captured in a graph, generated by a known graphon. Each player observes a private state and also a common information as a graphon mean-field population state which represents the empirical networked distribution of other players' types. We consider non-stationary population state dynamics and present a novel backward recursive algorithm to compute both GMFE and GOMP that depend on both, a player's private type, and the current (dynamic) population state determined through the graphon. Each step in computing GMFE consists of solving a fixed-point equation, while computing GOMP involves solving for an optimization problem. We provide conditions on model parameters for which there exists such a GMFE. Using this algorithm, we obtain the GMFE and GOMP for a specific security setup in cyber physical systems for different graphons that capture the interactions between the nodes in the system. △ Less

Submitted 7 June, 2022; v1 submitted 15 January, 2020; originally announced January 2020.

Comments: 26 pages, 6 figures. arXiv admin note: text overlap with arXiv:1905.04154

arXiv:1912.13361 [pdf, other]

Learning Representations by Maximizing Mutual Information in Variational Autoencoders

Authors: Ali Lotfi Rezaabad, Sriram Vishwanath

Abstract: Variational autoencoders (VAEs) have ushered in a new era of unsupervised learning methods for complex distributions. Although these techniques are elegant in their approach, they are typically not useful for representation learning. In this work, we propose a simple yet powerful class of VAEs that simultaneously result in meaningful learned representations. Our solution is to combine traditional… ▽ More Variational autoencoders (VAEs) have ushered in a new era of unsupervised learning methods for complex distributions. Although these techniques are elegant in their approach, they are typically not useful for representation learning. In this work, we propose a simple yet powerful class of VAEs that simultaneously result in meaningful learned representations. Our solution is to combine traditional VAEs with mutual information maximization, with the goal to enhance amortized inference in VAEs using Information Theoretic techniques. We call this approach InfoMax-VAE, and such an approach can significantly boost the quality of learned high-level representations. We realize this through the explicit maximization of information measures associated with the representation. Using extensive experiments on varied datasets and setups, we show that InfoMax-VAE outperforms contemporary popular approaches, including Info-VAE and $β$-VAE. △ Less

Submitted 7 January, 2020; v1 submitted 21 December, 2019; originally announced December 2019.

arXiv:1906.07849 [pdf, other]

Deep Learning-Based Quantization of L-Values for Gray-Coded Modulation

Authors: Marius Arvinte, Sriram Vishwanath, Ahmed H. Tewfik

Abstract: In this work, a deep learning-based quantization scheme for log-likelihood ratio (L-value) storage is introduced. We analyze the dependency between the average magnitude of different L-values from the same quadrature amplitude modulation (QAM) symbol and show they follow a consistent ordering. Based on this we design a deep autoencoder that jointly compresses and separately reconstructs each L-val… ▽ More In this work, a deep learning-based quantization scheme for log-likelihood ratio (L-value) storage is introduced. We analyze the dependency between the average magnitude of different L-values from the same quadrature amplitude modulation (QAM) symbol and show they follow a consistent ordering. Based on this we design a deep autoencoder that jointly compresses and separately reconstructs each L-value, allowing the use of a weighted loss function that aims to more accurately reconstructs low magnitude inputs. Our method is shown to be competitive with state-of-the-art maximum mutual information quantization schemes, reducing the required memory footprint by a ratio of up to two and a loss of performance smaller than 0.1 dB with less than two effective bits per L-value or smaller than 0.04 dB with 2.25 effective bits. We experimentally show that our proposed method is a universal compression scheme in the sense that after training on an LDPC-coded Rayleigh fading scenario we can reuse the same network without further training on other channel models and codes while preserving the same performance benefits. △ Less

Submitted 9 May, 2021; v1 submitted 18 June, 2019; originally announced June 2019.

Comments: Submitted to IEEE Globecom 2019

arXiv:1903.04656 [pdf, other]

Deep Log-Likelihood Ratio Quantization

Authors: Marius Arvinte, Ahmed H. Tewfik, Sriram Vishwanath

Abstract: In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a l… ▽ More In this work, a deep learning-based method for log-likelihood ratio (LLR) lossy compression and quantization is proposed, with emphasis on a single-input single-output uncorrelated fading communication setting. A deep autoencoder network is trained to compress, quantize and reconstruct the bit log-likelihood ratios corresponding to a single transmitted symbol. Specifically, the encoder maps to a latent space with dimension equal to the number of sufficient statistics required to recover the inputs - equal to three in this case - while the decoder aims to reconstruct a noisy version of the latent representation with the purpose of modeling quantization effects in a differentiable way. Simulation results show that, when applied to a standard rate-1/2 low-density parity-check (LDPC) code, a finite precision compression factor of nearly three times is achieved when storing an entire codeword, with an incurred loss of performance lower than 0.1 dB compared to straightforward scalar quantization of the log-likelihood ratios. △ Less

Submitted 9 May, 2021; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: Accepted for publication at EUSIPCO 2019. Camera-ready version

arXiv:1902.00112 [pdf, other]

HashCore: Proof-of-Work Functions for General Purpose Processors

Authors: Yanni Georghiades, Steven Flolid, Sriram Vishwanath

Abstract: Over the past five years, the rewards associated with mining Proof-of-Work blockchains have increased substantially. As a result, miners are heavily incentivized to design and utilize Application Specific Integrated Circuits (ASICs) that can compute hashes far more efficiently than existing general purpose hardware. Currently, it is difficult for most users to purchase and operate ASICs due to pri… ▽ More Over the past five years, the rewards associated with mining Proof-of-Work blockchains have increased substantially. As a result, miners are heavily incentivized to design and utilize Application Specific Integrated Circuits (ASICs) that can compute hashes far more efficiently than existing general purpose hardware. Currently, it is difficult for most users to purchase and operate ASICs due to pricing and availability constraints, resulting in a relatively small number of miners with respect to total user base for most popular cryptocurrencies. In this work, we aim to invert the problem of ASIC development by constructing a Proof-of-Work function for which an existing general purpose processor (GPP, such as an x86 IC) is already an optimized ASIC. In doing so, we will ensure that any would-be miner either already owns an ASIC for the Proof-of-Work system they wish to participate in or can attain one at a competitive price with relative ease. In order to achieve this, we present HashCore, a Proof-of-Work function composed of "widgets" generated pseudo-randomly at runtime that each execute a sequence of general purpose processor instructions designed to stress the computational resources of such a GPP. The widgets will be modeled after workloads that GPPs have been optimized for, for example, the SPEC CPU 2017 benchmark suite for x86 ICs, in a technique we refer to as inverted benchmarking. We provide a proof that HashCore is collision-resistant regardless of how the widgets are implemented. We observe that GPP designers/developers essentially create an ASIC for benchmarks such as SPEC CPU 2017. By modeling HashCore after such benchmarks, we create a Proof-of-Work function that can be run most efficiently on a GPP, resulting in a more accessible, competitive, and balanced mining market. △ Less

Submitted 15 April, 2019; v1 submitted 31 January, 2019; originally announced February 2019.

arXiv:1811.02067 [pdf, other]

Sample Compression, Support Vectors, and Generalization in Deep Learning

Authors: Christopher Snyder, Sriram Vishwanath

Abstract: Even though Deep Neural Networks (DNNs) are widely celebrated for their practical performance, they possess many intriguing properties related to depth that are difficult to explain both theoretically and intuitively. Understanding how weights in deep networks coordinate together across layers to form useful learners has proven challenging, in part because the repeated composition of nonlinearitie… ▽ More Even though Deep Neural Networks (DNNs) are widely celebrated for their practical performance, they possess many intriguing properties related to depth that are difficult to explain both theoretically and intuitively. Understanding how weights in deep networks coordinate together across layers to form useful learners has proven challenging, in part because the repeated composition of nonlinearities has proved intractable. This paper presents a reparameterization of DNNs as a linear function of a feature map that is locally independent of the weights. This feature map transforms depth-dependencies into simple tensor products and maps each input to a discrete subset of the feature space. Then, using a max-margin assumption, the paper develops a sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s ``support vectors". The paper shows that the number of support vectors s relates with learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/epsilon) for networks with n neurons. Finally, the number of support vectors s is found to monotonically increase with width and label noise but decrease with depth. △ Less

Submitted 17 March, 2020; v1 submitted 5 November, 2018; originally announced November 2018.

Comments: 15 pages, 10 figures

arXiv:1810.11867 [pdf, other]

Experimental Design for Cost-Aware Learning of Causal Graphs

Authors: Erik M. Lindgren, Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath

Abstract: We consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant factor approximation to this problem with a gree… ▽ More We consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant factor approximation to this problem with a greedy algorithm. We then constrain the sparsity of each intervention. We develop an algorithm that returns an intervention design that is nearly optimal in terms of size for sparse graphs with sparse interventions and we discuss how to use it when there are costs on the vertices. △ Less

Submitted 28 October, 2018; originally announced October 2018.

Comments: In NIPS 2018

arXiv:1807.10399 [pdf, other]

Applications of Common Entropy for Causal Inference

Authors: Murat Kocaoglu, Sanjay Shakkottai, Alexandros G. Dimakis, Constantine Caramanis, Sriram Vishwanath

Abstract: We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Renyi common entropy by minimizing the Renyi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative… ▽ More We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Renyi common entropy by minimizing the Renyi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative algorithm that can be used to discover the trade-off between the entropy of the latent variable and the conditional mutual information of the observed variables. We show two applications of common entropy in causal inference: First, under the assumption that there are no low-entropy mediators, it can be used to distinguish causation from spurious correlation among almost all joint distributions on simple causal graphs with two observed variables. Second, common entropy can be used to improve constraint-based methods such as PC or FCI algorithms in the small-sample regime, where these methods are known to struggle. We propose a modification to these constraint-based methods to assess if a separating set found by these algorithms is valid using common entropy. We finally evaluate our algorithms on synthetic and real data to establish their performance. △ Less

Submitted 5 December, 2020; v1 submitted 26 July, 2018; originally announced July 2018.

Comments: In Proceedings of NeurIPS 2020

arXiv:1806.06438 [pdf, other]

Compressed Sensing with Deep Image Prior and Learned Regularization

Authors: Dave Van Veen, Ajil Jalal, Mahdi Soltanolkotabi, Eric Price, Sriram Vishwanath, Alexandros G. Dimakis

Abstract: We propose a novel method for compressed sensing recovery using untrained deep generative models. Our method is based on the recently proposed Deep Image Prior (DIP), wherein the convolutional weights of the network are optimized to match the observed measurements. We show that this approach can be applied to solve any differentiable linear inverse problem, outperforming previous unlearned methods… ▽ More We propose a novel method for compressed sensing recovery using untrained deep generative models. Our method is based on the recently proposed Deep Image Prior (DIP), wherein the convolutional weights of the network are optimized to match the observed measurements. We show that this approach can be applied to solve any differentiable linear inverse problem, outperforming previous unlearned methods. Unlike various learned approaches based on generative models, our method does not require pre-training over large datasets. We further introduce a novel learned regularization technique, which incorporates prior information on the network weights. This reduces reconstruction error, especially for noisy measurements. Finally, we prove that, using the DIP optimization approach, moderately overparameterized single-layer networks can perfectly fit any signal despite the non-convex nature of the fitting problem. This theoretical result provides justification for early stopping. △ Less

Submitted 29 October, 2020; v1 submitted 17 June, 2018; originally announced June 2018.

arXiv:1801.07733 [pdf, other]

On the Key Generation Rate of Physically Unclonable Functions

Authors: Yitao Chen, Muryong Kim, Sriram Vishwanath

Abstract: In this paper, an algebraic binning based coding scheme and its associated achievable rate for key generation using physically unclonable functions (PUFs) is determined. This achievable rate is shown to be optimal under the generated-secret (GS) model for PUFs. Furthermore, a polar code based polynomial-time encoding and decoding scheme that achieves this rate is also presented. In this paper, an algebraic binning based coding scheme and its associated achievable rate for key generation using physically unclonable functions (PUFs) is determined. This achievable rate is shown to be optimal under the generated-secret (GS) model for PUFs. Furthermore, a polar code based polynomial-time encoding and decoding scheme that achieves this rate is also presented. △ Less

Submitted 6 February, 2018; v1 submitted 23 January, 2018; originally announced January 2018.

arXiv:1711.00881 [pdf, other]

On the Steady State of Continuous Time Stochastic Opinion Dynamics with Power Law Confidence

Authors: Jae Oh Woo, François Baccelli, Sriram Vishwanath

Abstract: This paper introduces a class of non-linear and continuous-time opinion dynamics model with additive noise and state dependent interaction rates between agents. The model features interaction rates which are proportional to a negative power of opinion distances. We establish a non-local partial differential equation for the distribution of opinion distances and use Mellin transforms to provide an… ▽ More This paper introduces a class of non-linear and continuous-time opinion dynamics model with additive noise and state dependent interaction rates between agents. The model features interaction rates which are proportional to a negative power of opinion distances. We establish a non-local partial differential equation for the distribution of opinion distances and use Mellin transforms to provide an explicit formula for the stationary solution of the latter, when it exists. Our approach leads to new qualitative and quantitative results on this type of dynamics. To the best of our knowledge these Mellin transform results are the first quantitative results on the equilibria of opinion dynamics with distance-dependent interaction rates. The closed form expressions for this class of dynamics are obtained for the two agent case. However the results can be used in mean-field models featuring several agents whose interaction rates depend on the empirical average of their opinions. The technique also applies to linear dynamics, namely with a constant interaction rate, on an interaction graph. △ Less

Submitted 12 December, 2020; v1 submitted 2 November, 2017; originally announced November 2017.

arXiv:1709.02023 [pdf, other]

CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training

Authors: Murat Kocaoglu, Christopher Snyder, Alexandros G. Dimakis, Sriram Vishwanath

Abstract: We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels… ▽ More We propose an adversarial training procedure for learning a causal implicit generative model for a given causal graph. We show that adversarial training can be used to learn a generative model with true observational and interventional distributions if the generator architecture is consistent with the given causal graph. We consider the application of generating faces based on given binary labels where the dependency structure between the labels is preserved with a causal graph. This problem can be seen as learning a causal implicit generative model for the image and labels. We devise a two-stage procedure for this problem. First we train a causal implicit generative model over binary labels using a neural network consistent with a causal graph as the generator. We empirically show that WassersteinGAN can be used to output discrete labels. Later, we propose two new conditional GAN architectures, which we call CausalGAN and CausalBEGAN. We show that the optimal generator of the CausalGAN, given the labels, samples from the image distributions conditioned on these labels. The conditional GAN combined with a trained causal implicit generative model for the labels is then a causal implicit generative model over the labels and the generated image. We show that the proposed architectures can be used to sample from observational and interventional image distributions, even for interventions which do not naturally occur in the dataset. △ Less

Submitted 14 September, 2017; v1 submitted 6 September, 2017; originally announced September 2017.

arXiv:1703.02645 [pdf, other]

Cost-Optimal Learning of Causal Graphs

Authors: Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath

Abstract: We consider the problem of learning a causal graph over a set of variables with interventions. We study the cost-optimal causal graph learning problem: For a given skeleton (undirected version of the causal graph), design the set of interventions with minimum total cost, that can uniquely identify any causal graph with the given skeleton. We show that this problem is solvable in polynomial time. L… ▽ More We consider the problem of learning a causal graph over a set of variables with interventions. We study the cost-optimal causal graph learning problem: For a given skeleton (undirected version of the causal graph), design the set of interventions with minimum total cost, that can uniquely identify any causal graph with the given skeleton. We show that this problem is solvable in polynomial time. Later, we consider the case when the number of interventions is limited. For this case, we provide polynomial time algorithms when the skeleton is a tree or a clique tree. For a general chordal skeleton, we develop an efficient greedy algorithm, which can be improved when the causal graph skeleton is an interval graph. △ Less

Submitted 7 March, 2017; originally announced March 2017.

arXiv:1701.08254 [pdf, ps, other]

Entropic Causality and Greedy Minimum Entropy Coupling

Authors: Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi

Abstract: We study the problem of identifying the causal relationship between two discrete random variables from observational data. We recently proposed a novel framework called entropic causality that works in a very general functional model but makes the assumption that the unobserved exogenous variable has small entropy in the true causal direction. This framework requires the solution of a minimum en… ▽ More We study the problem of identifying the causal relationship between two discrete random variables from observational data. We recently proposed a novel framework called entropic causality that works in a very general functional model but makes the assumption that the unobserved exogenous variable has small entropy in the true causal direction. This framework requires the solution of a minimum entropy coupling problem: Given marginal distributions of m discrete random variables, each on n states, find the joint distribution with minimum entropy, that respects the given marginals. This corresponds to minimizing a concave function of nm variables over a convex polytope defined by nm linear constraints, called a transportation polytope. Unfortunately, it was recently shown that this minimum entropy coupling problem is NP-hard, even for 2 variables with n states. Even representing points (joint distributions) over this space can require exponential complexity (in n, m) if done naively. In our recent work we introduced an efficient greedy algorithm to find an approximate solution for this problem. In this paper we analyze this algorithm and establish two results: that our algorithm always finds a local minimum and also is within an additive approximation error from the unknown global optimum. △ Less

Submitted 28 January, 2017; originally announced January 2017.

Comments: Submitted to ISIT 2017

arXiv:1701.07576 [pdf, other]

Approximate Capacity of a Class of Partially Connected Interference Channels

Authors: Muryong Kim, Yitao Chen, Sriram Vishwanath

Abstract: We derive inner and outer bounds on the capacity region for a class of three-user partially connected interference channels. We focus on the impact of topology, interference alignment, and interplay between interference and noise. The representative channels we consider are the ones that have clear interference alignment gain. For these channels, Z-channel type outer bounds are tight to within a c… ▽ More We derive inner and outer bounds on the capacity region for a class of three-user partially connected interference channels. We focus on the impact of topology, interference alignment, and interplay between interference and noise. The representative channels we consider are the ones that have clear interference alignment gain. For these channels, Z-channel type outer bounds are tight to within a constant gap from capacity. We present near-optimal achievable schemes based on rate-splitting and lattice alignment. △ Less

Submitted 13 February, 2017; v1 submitted 25 January, 2017; originally announced January 2017.

Comments: Submitted to IEEE Transactions on Information Theory

arXiv:1611.04035 [pdf, ps, other]

Entropic Causal Inference

Authors: Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath, Babak Hassibi

Abstract: We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Ou… ▽ More We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam's razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low $H_0$ entropy (cardinality) in the true direction, it must have high $H_0$ entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum entropy exogenous variable. We show that the problem of finding the exogenous variable with minimum entropy is equivalent to the problem of finding minimum joint entropy given $n$ marginal distributions, also known as minimum entropy coupling problem. We propose an efficient greedy algorithm for the minimum entropy coupling problem, that for $n=2$ provably finds a local optimum. This gives a greedy algorithm for finding the exogenous variable with minimum $H_1$ (Shannon Entropy). Our greedy entropy-based causal inference algorithm has similar performance to the state of the art additive noise models in real datasets. One advantage of our approach is that we make no use of the values of random variables but only their distributions. Our method can therefore be used for causal inference for both ordinal and also categorical data, unlike additive noise models. △ Less

Submitted 14 November, 2016; v1 submitted 12 November, 2016; originally announced November 2016.

Comments: To appear in AAAI 2017

arXiv:1603.04822 [pdf, other]

Centralized Repair of Multiple Node Failures with Applications to Communication Efficient Secret Sharing

Authors: Ankit Singh Rawat, O. Ozan Koyluoglu, Sriram Vishwanath

Abstract: This paper considers a distributed storage system, where multiple storage nodes can be reconstructed simultaneously at a centralized location. This centralized multi-node repair (CMR) model is a generalization of regenerating codes that allow for bandwidth-efficient repair of a single failed node. This work focuses on the trade-off between the amount of data stored and repair bandwidth in this CMR… ▽ More This paper considers a distributed storage system, where multiple storage nodes can be reconstructed simultaneously at a centralized location. This centralized multi-node repair (CMR) model is a generalization of regenerating codes that allow for bandwidth-efficient repair of a single failed node. This work focuses on the trade-off between the amount of data stored and repair bandwidth in this CMR model. In particular, repair bandwidth bounds are derived for the minimum storage multi-node repair (MSMR) and the minimum bandwidth multi-node repair (MBMR) operating points. The tightness of these bounds are analyzed via code constructions. The MSMR point is characterized through codes achieving this point under functional repair for general set of CMR parameters, as well as with codes enabling exact repair for certain CMR parameters. The MBMR point, on the other hand, is characterized with exact repair codes for all CMR parameters for systems that satisfy a certain entropy accumulation property. Finally, the model proposed here is utilized for the secret sharing problem, where the codes for the multi-node repair problem is used to construct communication efficient secret sharing schemes with the property of bandwidth efficient share repair. △ Less

Submitted 15 March, 2016; originally announced March 2016.

arXiv:1601.06362 [pdf, ps, other]

Progress on High-rate MSR Codes: Enabling Arbitrary Number of Helper Nodes

Authors: Ankit Singh Rawat, O. Ozan Koyluoglu, Sriram Vishwanath

Abstract: This paper presents a construction for high-rate MDS codes that enable bandwidth-efficient repair of a single node. Such MDS codes are also referred to as the minimum storage regenerating (MSR) codes in the distributed storage literature. The construction presented in this paper generates MSR codes for all possible number of helper nodes $d$ as $d$ is a design parameter in the construction. Furthe… ▽ More This paper presents a construction for high-rate MDS codes that enable bandwidth-efficient repair of a single node. Such MDS codes are also referred to as the minimum storage regenerating (MSR) codes in the distributed storage literature. The construction presented in this paper generates MSR codes for all possible number of helper nodes $d$ as $d$ is a design parameter in the construction. Furthermore, the obtained MSR codes have polynomial sub-packetization (a.k.a. node size) $α$. The construction is built on the recent code proposed by Sasidharan et al. [1], which works only for $d = n-1$, i.e., where all the remaining nodes serve as the helper nodes for the bandwidth-efficient repair of a single node. The results of this paper broaden the set of parameters where the constructions of MSR codes were known earlier. △ Less

Submitted 24 January, 2016; originally announced January 2016.

arXiv:1601.04094 [pdf, other]

Efficient and Flexible Crowdsourcing of Specialized Tasks with Precedence Constraints

Authors: Avhishek Chatterjee, Michael Borokhovich, Lav R. Varshney, Sriram Vishwanath

Abstract: Many companies now use crowdsourcing to leverage external (as well as internal) crowds to perform specialized work, and so methods of improving efficiency are critical. Tasks in crowdsourcing systems with specialized work have multiple steps and each step requires multiple skills. Steps may have different flexibilities in terms of obtaining service from one or multiple agents, due to varying level… ▽ More Many companies now use crowdsourcing to leverage external (as well as internal) crowds to perform specialized work, and so methods of improving efficiency are critical. Tasks in crowdsourcing systems with specialized work have multiple steps and each step requires multiple skills. Steps may have different flexibilities in terms of obtaining service from one or multiple agents, due to varying levels of dependency among parts of steps. Steps of a task may have precedence constraints among them. Moreover, there are variations in loads of different types of tasks requiring different skill-sets and availabilities of different types of agents with different skill-sets. Considering these constraints together necessitates the design of novel schemes to allocate steps to agents. In addition, large crowdsourcing systems require allocation schemes that are simple, fast, decentralized and offer customers (task requesters) the freedom to choose agents. In this work we study the performance limits of such crowdsourcing systems and propose efficient allocation schemes that provably meet the performance limits under these additional requirements. We demonstrate our algorithms on data from a crowdsourcing platform run by a non-profit company and show significant improvements over current practice. △ Less

Submitted 15 January, 2016; originally announced January 2016.

arXiv:1511.01212 [pdf, ps, other]

Hierarchical Polar Coding for Achieving Secrecy over Fading Wiretap Channels without any Instantaneous CSI

Authors: Hongbo Si, O. Ozan Koyluoglu, Sriram Vishwanath

Abstract: This paper presents a polar coding scheme to achieve secrecy in block fading binary symmetric wiretap channels without the knowledge of instantaneous channel state information (CSI) at the transmitter. For this model, a coding scheme that hierarchically utilizes polar codes is presented. In particular, on polarization of different binary symmetric channels over different fading blocks, each channe… ▽ More This paper presents a polar coding scheme to achieve secrecy in block fading binary symmetric wiretap channels without the knowledge of instantaneous channel state information (CSI) at the transmitter. For this model, a coding scheme that hierarchically utilizes polar codes is presented. In particular, on polarization of different binary symmetric channels over different fading blocks, each channel use is modeled as an appropriate binary erasure channel over fading blocks. Polar codes are constructed for both coding over channel uses for each fading block and coding over fading blocks for certain channel uses. In order to guarantee security, random bits are introduced at appropriate places to exhaust the observations of the eavesdropper. It is shown that this coding scheme, without instantaneous CSI at the transmitter, is secrecy capacity achieving for the simultaneous fading scenario. For the independent fading case, the capacity is achieved when the fading realizations for the eavesdropper channel is always degraded with respect to the receiver. For the remaining cases, the gap is analyzed by comparing lower and upper bounds. Remarkably, for the scenarios where the secrecy capacity is achieved, the results imply that instantaneous CSI does not increase the secrecy capacity. △ Less

Submitted 4 November, 2015; originally announced November 2015.

arXiv:1511.00041 [pdf, ps, other]

Learning Causal Graphs with Small Interventions

Authors: Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, Sriram Vishwanath

Abstract: We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this… ▽ More We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this task. We prove that any deterministic adaptive algorithm needs to be a separating system in order to learn complete graphs in the worst case. In addition, we present a novel separating system construction, whose size is close to optimal and is arguably simpler than previous work in combinatorics. We also develop a novel information theoretic lower bound on the number of interventions that applies in full generality, including for randomized adaptive learning algorithms. For general chordal graphs, we derive worst case lower bounds on the number of interventions. Building on observations about induced trees, we give a new deterministic adaptive algorithm to learn directions on any chordal skeleton completely. In the worst case, our achievable scheme is an $α$-approximation algorithm where $α$ is the independence number of the graph. We also show that there exist graph classes for which the sufficient number of experiments is close to the lower bound. In the other extreme, there are graph classes for which the required number of experiments is multiplicatively $α$ away from our lower bound. In simulations, our algorithm almost always performs very close to the lower bound, while the approach based on separating systems for complete graphs is significantly worse for random chordal graphs. △ Less

Submitted 30 October, 2015; originally announced November 2015.

Comments: Accepted to NIPS 2015

Showing 1–50 of 126 results for author: Vishwanath, S