-
AI prediction of cardiovascular events using opportunistic epicardial adipose tissue assessments from CT calcium score
Authors:
Tao Hu,
Joshua Freeze,
Prerna Singh,
Justin Kim,
Yingnan Song,
Hao Wu,
Juhwan Lee,
Sadeer Al-Kindi,
Sanjay Rajagopalan,
David L. Wilson,
Ammar Hoori
Abstract:
Background: Recent studies have used basic epicardial adipose tissue (EAT) assessments (e.g., volume and mean HU) to predict risk of atherosclerosis-related, major adverse cardiovascular events (MACE). Objectives: Create novel, hand-crafted EAT features, 'fat-omics', to capture the pathophysiology of EAT and improve MACE prediction. Methods: We segmented EAT using a previously-validated deep learning method with optional manual correction. We extracted 148 radiomic features (morphological, spatial, and intensity) and used Cox elastic-net for feature reduction and prediction of MACE. Results: Traditional fat features gave marginal prediction (EAT-volume/EAT-mean-HU/BMI gave C-index 0.53/0.55/0.57, respectively). Significant improvement was obtained with 15 fat-omics features (C-index=0.69, test set). High-risk features included volume-of-voxels-having-elevated-HU-[-50, -30-HU] and HU-negative-skewness, both of which assess high HU, which has been implicated in fat inflammation. Other high-risk features include kurtosis-of-EAT-thickness, reflecting the heterogeneity of thicknesses, and EAT-volume-in-the-top-25%-of-the-heart, emphasizing adipose near the proximal coronary arteries. Kaplan-Meier plots of Cox-identified, high- and low-risk patients were well separated when split at the median of the fat-omics risk, with the high-risk group having an HR 2.4 times that of the low-risk group (P<0.001). Conclusion: Preliminary findings indicate an opportunity to use more finely tuned, explainable assessments on EAT for improved cardiovascular risk prediction.
Submitted 29 January, 2024;
originally announced January 2024.
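The C-index values quoted in the abstract above measure how well a risk score ranks patients by time to event. The following is a minimal sketch of Harrell's concordance index in plain Python, not the authors' evaluation code; the function name `concordance_index` and the toy inputs are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs that the risk score orders correctly.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter time corresponds to an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
c = concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why C-indices of 0.53 to 0.57 are described as marginal.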
-
Pericoronary adipose tissue feature analysis in CT calcium score images with comparison to coronary CTA
Authors:
Yingnan Song,
Hao Wu,
Juhwan Lee,
Justin Kim,
Ammar Hoori,
Tao Hu,
Vladislav Zimin,
Mohamed Makhlouf,
Sadeer Al-Kindi,
Sanjay Rajagopalan,
Chun-Ho Yun,
Chung-Lieh Hung,
David L. Wilson
Abstract:
We investigated the feasibility and advantages of using non-contrast CT calcium score (CTCS) images to assess pericoronary adipose tissue (PCAT) and its association with major adverse cardiovascular events (MACE). PCAT features from coronary CTA (CCTA) have been shown to be associated with cardiovascular risk but are potentially confounded by iodine. If PCAT in CTCS images can be similarly analyzed, it would avoid this issue and enable its inclusion in formal risk assessment from readily available, low-cost CTCS images. To identify coronaries in CTCS images that have subtle visual evidence of vessels, we registered CTCS with paired CCTA images having coronary labels. We developed a novel axial-disk method giving regions for analyzing PCAT features in three main coronary arteries. We analyzed novel hand-crafted and radiomic features using univariate and multivariate logistic regression prediction of MACE and compared results against those from CCTA. Registration accuracy was sufficient to enable the identification of PCAT regions in CTCS images. Motion or beam hardening artifacts were often present in high-contrast CCTA but not CTCS. Mean HU and volume were increased in both CTCS and CCTA for the MACE group. There were significant positive correlations between some CTCS and CCTA features, suggesting that similar characteristics were obtained. Using hand-crafted/radiomics from CTCS and CCTA, AUCs were 0.82/0.79 and 0.83/0.77, respectively, while Agatston gave AUC=0.73. Preliminarily, PCAT features can be assessed from three main coronary arteries in non-contrast CTCS images with performance characteristics that are at the very least comparable to CCTA.
Submitted 27 January, 2024;
originally announced January 2024.
-
Hiding in Plain Sight: Towards the Science of Linguistic Steganography
Authors:
Leela Raj-Sankar,
S. Raj Rajagopalan
Abstract:
Covert communication (also known as steganography) is the practice of concealing a secret inside an innocuous-looking public object (cover) so that the modified public object (covert code) makes sense to everyone but only someone who knows the code can extract the secret (message). Linguistic steganography is the practice of encoding a secret message in natural language text such as spoken conversation or short public communications such as tweets. While ad hoc methods for covert communications in specific domains exist (JPEG images, Chinese poetry, etc.), there is no general model for linguistic steganography specifically. We present a novel mathematical formalism for creating linguistic steganographic codes, with three parameters: decodability (probability that the receiver of the coded message will decode the cover correctly), density (frequency of code words in a cover code), and detectability (probability that an attacker can tell the difference between an untampered cover and its steganized version). Verbal or linguistic steganography is most challenging because of its lack of artifacts to hide the secret message in. We detail a practical construction in Python of a steganographic code for Tweets using inserted words to encode hidden digits while using n-gram frequency distortion as the measure of detectability of the insertions. Using the publicly accessible Stanford Sentiment Analysis dataset, we implemented the tweet steganization scheme: a codeword (an existing word in the data set) is inserted in random positions in random existing tweets to find the tweet that has the least possible n-gram distortion. We argue that this approximates KL distance in a localized manner at low cost and thus we get a linguistic steganography scheme that is both formal and practical and permits a tradeoff between codeword density and detectability of the covert message.
Submitted 28 December, 2023;
originally announced December 2023.
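The codeword-insertion idea described in the abstract above can be sketched as follows. This is an illustrative toy, not the authors' implementation: it scores an insertion by the add-one-smoothed surprisal of the two bigrams it creates (an assumed stand-in for their n-gram distortion measure), and it searches a small candidate list exhaustively rather than sampling random tweets and positions. All function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def bigram_counts(corpus):
    """Count bigrams over whitespace-tokenized sentences with boundary markers."""
    counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts


def insertion_cost(tokens, pos, word, counts):
    """Smoothed surprisal of the two bigrams created by inserting `word` at `pos`."""
    padded = ["<s>"] + tokens + ["</s>"]
    left, right = padded[pos], padded[pos + 1]  # neighbours of the inserted word
    total = sum(counts.values()) + 1
    new_bigrams = [(left, word), (word, right)]
    return -sum(math.log((counts[b] + 1) / total) for b in new_bigrams)


def best_insertion(candidates, word, counts):
    """Return (cost, text) for the lowest-cost insertion over all candidates and positions."""
    best = None
    for text in candidates:
        tokens = text.split()
        for pos in range(len(tokens) + 1):
            cost = insertion_cost(tokens, pos, word, counts)
            if best is None or cost < best[0]:
                best = (cost, " ".join(tokens[:pos] + [word] + tokens[pos:]))
    return best


# Hypothetical reference corpus and cover candidates.
counts = bigram_counts([
    "the movie was really good",
    "it was really bad",
    "a really good film",
])
cost, stego = best_insertion(["the film was good", "it was bad"], "really", counts)
```

Here the codeword lands where both neighbouring bigrams are already frequent in the reference corpus, which is the intuition behind choosing the least-distorting cover.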
-
Enhancing cardiovascular risk prediction through AI-enabled calcium-omics
Authors:
Ammar Hoori,
Sadeer Al-Kindi,
Tao Hu,
Yingnan Song,
Hao Wu,
Juhwan Lee,
Nour Tashtish,
Pingfu Fu,
Robert Gilkeson,
Sanjay Rajagopalan,
David L. Wilson
Abstract:
Background. Coronary artery calcium (CAC) is a powerful predictor of major adverse cardiovascular events (MACE). Traditional Agatston score simply sums the calcium, albeit in a non-linear way, leaving room for improved calcification assessments that will more fully capture the extent of disease.
Objective. To determine if AI methods using detailed calcification features (i.e., calcium-omics) can improve MACE prediction.
Methods. We investigated additional features of calcification including assessment of mass, volume, density, spatial distribution, territory, etc. We used a Cox model with elastic-net regularization on 2457 CT calcium score (CTCS) scans enriched for MACE events obtained from a large no-cost CLARIFY program (ClinicalTrials.gov Identifier: NCT04075162). We employed sampling techniques to enhance model training. We also investigated Cox models with selected features to identify explainable high-risk characteristics.
Results. Our proposed calcium-omics model with modified synthetic down sampling and up sampling gave C-index (80.5%/71.6%) and two-year AUC (82.4%/74.8%) for (80:20, training/testing), respectively (sampling was applied to the training set only). Results compared favorably to Agatston which gave C-index (71.3%/70.3%) and AUC (71.8%/68.8%), respectively. Among calcium-omics features, numbers of calcifications, LAD mass, and diffusivity (a measure of spatial distribution) were important determinants of increased risk, with dense calcification (>1000HU) associated with lower risk. The calcium-omics model reclassified 63% of MACE patients to the high risk group in a held-out test. The categorical net-reclassification index was NRI=0.153.
Conclusions. AI analysis of coronary calcification can lead to improved results as compared to Agatston scoring. Our findings suggest the utility of calcium-omics in improved prediction of risk.
Submitted 23 August, 2023;
originally announced August 2023.
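The categorical net-reclassification index (NRI) reported in the abstract above summarizes how a new model moves patients between risk categories relative to a baseline. A minimal two-category sketch follows, assuming the standard definition (net upward moves among events plus net downward moves among non-events); the function name and toy data are hypothetical and unrelated to the study cohort.

```python
def categorical_nri(old_high, new_high, events):
    """Two-category NRI.

    old_high, new_high -- booleans: patient is in the high-risk group under
                          the baseline / new model
    events             -- booleans: patient experienced the event
    """
    up_e = down_e = n_e = 0  # counts among patients with events
    up_n = down_n = n_n = 0  # counts among patients without events
    for old, new, event in zip(old_high, new_high, events):
        moved_up = int(bool(new) and not old)
        moved_down = int(bool(old) and not new)
        if event:
            n_e += 1
            up_e += moved_up
            down_e += moved_down
        else:
            n_n += 1
            up_n += moved_up
            down_n += moved_down
    if n_e == 0 or n_n == 0:
        raise ValueError("need at least one event and one non-event")
    # Upward moves are good for events, downward moves are good for non-events.
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n


# Toy example: four patients with events followed by four without.
old = [False, False, True, True, True, True, False, False]
new = [True, True, True, False, False, False, False, True]
evt = [True, True, True, True, False, False, False, False]
nri = categorical_nri(old, new, evt)
```

A positive NRI means the new model, on balance, moves events up and non-events down.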
-
Cardiac CT perfusion imaging of pericoronary adipose tissue (PCAT) highlights potential confounds in coronary CTA
Authors:
Hao Wu,
Yingnan Song,
Ammar Hoori,
Ananya Subramaniam,
Juhwan Lee,
Justin Kim,
Tao Hu,
Sadeer Al-Kindi,
Wei-Ming Huang,
Chun-Ho Yun,
Chung-Lieh Hung,
Sanjay Rajagopalan,
David L. Wilson
Abstract:
Features of pericoronary adipose tissue (PCAT) assessed from coronary computed tomography angiography (CCTA) are associated with inflammation and cardiovascular risk. As PCAT is vascularly connected with coronary vasculature, the presence of iodine is a potential confounding factor on PCAT HU and textures that has not been adequately investigated. We used dynamic cardiac CT perfusion (CCTP) to inform contrast determinants of PCAT assessment. From CCTP, we analyzed HU dynamics of territory-specific PCAT, myocardium, and other adipose depots in patients with coronary artery disease. HU, blood flow, and radiomics were assessed over time. Changes from peak aorta time, Pa, chosen to model the time of CCTA, were obtained. HU in PCAT increased more than in other adipose depots. The estimated blood flow in PCAT was ~23% of that in the contiguous myocardium. Comparing PCAT distal and proximal to a significant stenosis, we found less enhancement and longer time-to-peak distally. Two-second offsets [before, after] Pa resulted in [4-HU, 3-HU] differences in PCAT. Due to changes in HU, the apparent PCAT volume reduced ~15% from the first scan (P1) to Pa using a conventional fat window. Comparing radiomic features over time, 78% of features changed >10% relative to P1. CCTP elucidates blood flow in PCAT and enables analysis of PCAT features over time. PCAT assessments (HU, apparent volume, and radiomics) are sensitive to acquisition timing and the presence of obstructive stenosis, which may confound the interpretation of PCAT in CCTA images. Data normalization may be in order.
Submitted 27 June, 2023;
originally announced June 2023.
-
Remote Task-oriented Grasp Area Teaching By Non-Experts through Interactive Segmentation and Few-Shot Learning
Authors:
Furkan Kaynar,
Sudarshan Rajagopalan,
Shaobo Zhou,
Eckehard Steinbach
Abstract:
A robot operating in unstructured environments must be able to discriminate between different grasping styles depending on the prospective manipulation task. Having a system that allows learning from remote non-expert demonstrations can very feasibly extend the cognitive skills of a robot for task-oriented grasping. We propose a novel two-step framework towards this aim. The first step involves grasp area estimation by segmentation. We receive grasp area demonstrations for a new task via interactive segmentation, and learn from these few demonstrations to estimate the required grasp area on an unseen scene for the given task. The second step is autonomous grasp estimation in the segmented region. To train the segmentation network for few-shot learning, we built a grasp area segmentation (GAS) dataset with 10089 images grouped into 1121 segmentation tasks. We benefit from an efficient meta learning algorithm for training for few-shot adaptation. Experimental evaluation showed that our method successfully detects the correct grasp area on the respective objects in unseen test scenes and effectively allows remote teaching of new grasp strategies by non-experts.
Submitted 17 March, 2023;
originally announced March 2023.
-
Spectrum-inspired Low-light Image Translation for Saliency Detection
Authors:
Kitty Varghese,
Sudarshan Rajagopalan,
Mohit Lamba,
Kaushik Mitra
Abstract:
Saliency detection methods are central to several real-world applications such as robot navigation and satellite imagery. However, the performance of existing methods deteriorates under low-light conditions because training datasets mostly consist of well-lit images. One possible solution is to collect a new dataset for low-light conditions. This involves pixel-level annotation, which is not only tedious and time-consuming but also infeasible if a huge training corpus is required. We propose a technique that performs classical band-pass filtering in the Fourier space to transform well-lit images to low-light images and use them as a proxy for real low-light images. Unlike popular deep learning approaches, which require learning thousands of parameters and enormous amounts of training data, the proposed transformation is fast, simple, and easy to extend to other tasks such as low-light depth estimation. Our experiments show that the state-of-the-art saliency detection and depth estimation networks trained on our proxy low-light images perform significantly better on real low-light images than networks trained using existing strategies.
Submitted 17 March, 2023;
originally announced March 2023.
-
DynaMaR: Dynamic Prompt with Mask Token Representation
Authors:
Xiaodi Sun,
Sunny Rajagopalan,
Priyanka Nigam,
Weiyi Lu,
Yi Xu,
Belinda Zeng,
Trishul Chilimbi
Abstract:
Recent research has shown that large language models pretrained using unsupervised approaches can achieve significant performance improvement on many downstream tasks. Typically when adapting these language models to downstream tasks, like a classification or regression task, we employ a fine-tuning paradigm in which the sentence representation from the language model is input to a task-specific head; the model is then fine-tuned end-to-end. However, with the emergence of models like GPT-3, prompt-based fine-tuning has been proven to be a successful approach for few-shot tasks. Inspired by this work, we study discrete prompt technologies in practice. There are two issues that arise with the standard prompt approach. First, it can overfit on the prompt template. Second, it requires manual effort to formulate the downstream task as a language model problem. In this paper, we propose an improvement to prompt-based fine-tuning that addresses these two issues. We refer to our approach as DynaMaR -- Dynamic Prompt with Mask Token Representation. Results show that DynaMaR can achieve an average improvement of 10% in few-shot settings and improvement of 3.7% in data-rich settings over the standard fine-tuning approach on four e-commerce applications.
Submitted 6 June, 2022;
originally announced June 2022.
-
Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Authors:
Xuanli He,
Iman Keivanloo,
Yi Xu,
Xiang He,
Belinda Zeng,
Santosh Rajagopalan,
Trishul Chilimbi
Abstract:
Pre-training and then fine-tuning large language models is commonly used to achieve state-of-the-art performance in natural language processing (NLP) tasks. However, most pre-trained models suffer from low inference speed. Deploying such large models to applications with latency constraints is challenging. In this work, we focus on accelerating the inference via conditional computations. To achieve this, we propose a novel idea, Magic Pyramid (MP), to reduce both width-wise and depth-wise computation via token pruning and early exiting for Transformer-based models, particularly BERT. The former saves computation by removing non-salient tokens, while the latter reduces computation by terminating the inference before reaching the final layer if the exiting condition is met. Our empirical studies demonstrate that, compared to the previous state of the art, MP not only achieves speed-adjustable inference but also surpasses token pruning and early exiting, reducing giga floating point operations (GFLOPs) by up to 70% with less than 0.5% accuracy drop. Token pruning and early exiting show distinct preferences for sequences of different lengths. However, MP is capable of achieving an average of 8.06x speedup on two popular text classification tasks, regardless of the sizes of the inputs.
Submitted 30 October, 2021;
originally announced November 2021.
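The two mechanisms named in the abstract above, token pruning (width-wise) and early exiting (depth-wise), can be illustrated with a small toy. This sketch is not the Magic Pyramid implementation: the abstract does not specify the saliency measure or the exiting condition, so the top-k saliency rule and the confidence threshold used here are assumptions, as are all function names and numbers.

```python
def prune_tokens(tokens, saliency, keep_ratio=0.5):
    """Keep the most salient tokens, preserving their original order."""
    keep = max(1, round(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: saliency[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def early_exit_predict(layer_probs, threshold=0.9):
    """Stop at the first layer whose top class probability reaches `threshold`.

    layer_probs -- one list of class probabilities per layer, shallowest first
    Returns (exit_layer, predicted_class); falls back to the last layer.
    """
    if not layer_probs:
        raise ValueError("need at least one layer output")
    for depth, probs in enumerate(layer_probs, start=1):
        confidence = max(probs)
        if confidence >= threshold:
            break  # assumed exiting condition: confident enough to stop
    return depth, probs.index(confidence)


# Hypothetical saliency scores and per-layer classifier outputs.
kept = prune_tokens(["the", "plot", "was", "great"], [0.1, 0.6, 0.2, 0.9])
layer, label = early_exit_predict([[0.6, 0.4], [0.95, 0.05], [0.99, 0.01]])
```

Pruning shortens the sequence each layer must process, while early exiting skips the remaining layers, so the savings from the two compound.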
-
Trait of Gait: A Survey on Gait Biometrics
Authors:
Ebenezer R. H. P. Isaac,
Susan Elias,
Srinivasan Rajagopalan,
K. S. Easwarakumar
Abstract:
Gait analysis is the study of the systematic methods that assess and quantify animal locomotion. The research on gait analysis has considerably evolved through time. It was an ancient art, and it still finds its application today in modern science and medicine. This paper describes how one's gait can be used as a biometric. It covers a broad range of salient research within the field and explains the nuances and advances in each type of gait analysis. The prominent methods of gait recognition from the early era to the state of the art are covered. This survey also reviews the various gait datasets. The overall aim of this study is to provide a concise roadmap for anyone who wishes to do research in the field of gait biometrics.
Submitted 26 March, 2019;
originally announced March 2019.
-
View-invariant Gait Recognition through Genetic Template Segmentation
Authors:
Ebenezer Isaac,
Susan Elias,
Srinivasan Rajagopalan,
K. S. Easwarakumar
Abstract:
The template-based model-free approach provides by far the most successful solution to the gait recognition problem in the literature. Recent work discusses how isolating the head and leg portions of the template increases the performance of a gait recognition system, making it robust against covariates like clothing and carrying conditions. However, most such methods involve a manual definition of the boundaries. The method we propose, genetic template segmentation (GTS), employs a genetic algorithm to automate the boundary selection process. This method was tested on the GEI, GEnI and AEI templates. GEI seems to exhibit the best result when segmented with our approach. Experimental results show that our approach significantly outperforms the existing implementations of view-invariant gait recognition.
Submitted 3 July, 2017; v1 submitted 15 May, 2017;
originally announced May 2017.
-
Ontology Guided Information Extraction from Unstructured Text
Authors:
Raghu Anantharangachar,
Srinivasan Ramani,
S Rajagopalan
Abstract:
In this paper, we describe an approach to populate an existing ontology with instance information present in the natural language text provided as input. An ontology is defined as an explicit conceptualization of a shared domain. This approach starts with a list of relevant domain ontologies created by human experts, and techniques for identifying the most appropriate ontology to be extended with information from a given text. Then we demonstrate heuristics to extract information from the unstructured text and for adding it as structured information to the selected ontology. This identification of the relevant ontology is critical, as it is used in identifying relevant information in the text. We extract information in the form of semantic triples from the text, guided by the concepts in the ontology. We then convert the extracted information about the semantic class instances into Resource Description Framework (RDF) and append it to the existing domain ontology. This enables us to perform more precise semantic queries over the semantic triple store thus created. We have achieved 95% accuracy of information extraction in our implementation.
Submitted 6 February, 2013;
originally announced February 2013.
-
Application of Gist SVM in Cancer Detection
Authors:
S. Aruna,
S. P. Rajagopalan,
L. V. Nandakishore
Abstract:
In this paper, we study the application of GIST SVM in disease prediction (detection of cancer). Pattern classification problems can be effectively solved by support vector machines. Here we propose a classifier which can differentiate patients having benign and malignant cancer cells. To improve the accuracy of classification, we propose to determine the optimal size of the training set and perform feature selection. To find the optimal size of the training set, we experiment with training sets of different sizes and select the one with the highest classification rate. The optimal features are selected through their F-Scores.
Submitted 6 March, 2012; v1 submitted 1 March, 2012;
originally announced March 2012.
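Feature selection by F-score, as mentioned in the abstract above, can be sketched in a few lines. The abstract does not give the formula, so this sketch assumes the F-score commonly used for SVM feature selection: the squared deviations of the two class means from the overall mean, divided by the sum of the two within-class sample variances. Function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    overall = mean(list(pos) + list(neg))
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    return float("inf") if within == 0 else between / within


def rank_features(X, y):
    """Return feature indices sorted from most to least discriminative."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 is close to noise.
X = [[1, 5], [2, 6], [3, 5], [7, 6], [8, 5], [9, 6]]
y = [1, 1, 1, 0, 0, 0]
order = rank_features(X, y)
```

Features with larger F-scores separate the two classes better, so a selection step would keep the top-ranked indices and retrain the classifier on them.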
-
The Expert System Designed to Improve Customer Satisfaction
Authors:
P. Isakki alias Devi,
S. P. Rajagopalan
Abstract:
Customer Relationship Management has become a leading business strategy in a highly competitive business environment. It aims to enhance the performance of businesses by improving customer satisfaction and loyalty. The objective of this paper is to improve customer satisfaction with a product's colors and design with the help of an expert system developed using Artificial Neural Networks. The expert system's role is to capture the knowledge of the experts and the data from the customer requirements, and then process the collected data and form the appropriate rules for choosing the product's colors and design. In order to identify the hidden pattern of the customer's needs, the Artificial Neural Networks technique has been applied to classify the colors and design based upon a list of selected information. Moreover, the expert system has the capability to make decisions in ranking the scores of the colors and design presented in the selection. In addition, the expert system has been validated with different customer types.
Submitted 9 December, 2011;
originally announced December 2011.
-
Smart Meter Privacy: A Utility-Privacy Framework
Authors:
S. Raj Rajagopalan,
Lalitha Sankar,
Soheil Mohajer,
H. Vincent Poor
Abstract:
End-user privacy in smart meter measurements is a well-known challenge in the smart grid. The solutions offered thus far have been tied to specific technologies such as batteries or assumptions on data usage. Existing solutions have also not quantified the loss of benefit (utility) that results from any such privacy-preserving approach. Using tools from information theory, a new framework is presented that abstracts both the privacy and the utility requirements of smart meter data. This leads to a novel privacy-utility tradeoff problem with minimal assumptions that is tractable. Specifically, for a stationary Gaussian Markov model of the electricity load, it is shown that the optimal utility-and-privacy preserving solution requires filtering out frequency components that are low in power, and this approach appears to encompass most of the proposed privacy approaches.
Submitted 10 August, 2011;
originally announced August 2011.
-
Utility-Privacy Tradeoff in Databases: An Information-theoretic Approach
Authors:
Lalitha Sankar,
S. Raj Rajagopalan,
H. Vincent Poor
Abstract:
Ensuring the usefulness of electronic data sources while providing necessary privacy guarantees is an important unsolved problem. This problem drives the need for an analytical framework that can quantify the safety of personally identifiable information (privacy) while still providing a quantifiable benefit (utility) to multiple legitimate information consumers. This paper presents an information-theoretic framework that promises an analytical model guaranteeing tight bounds of how much utility is possible for a given level of privacy and vice-versa. Specific contributions include: i) stochastic data models for both categorical and numerical data; ii) utility-privacy tradeoff regions and the encoding (sanitization) schemes achieving them for both classes and their practical relevance; and iii) modeling of prior knowledge at the user and/or data source and optimal encoding schemes for both cases.
Submitted 21 January, 2013; v1 submitted 17 February, 2011;
originally announced February 2011.
-
An Information-theoretic Approach to Privacy
Authors:
Lalitha Sankar,
S. Raj Rajagopalan,
H. Vincent Poor
Abstract:
Ensuring the usefulness of electronic data sources while providing necessary privacy guarantees is an important unsolved problem. This problem drives the need for an overarching analytical framework that can quantify the safety of personally identifiable information (privacy) while still providing a quantifiable benefit (utility) to multiple legitimate information consumers. State of the art approaches have predominantly focused on privacy. This paper presents the first information-theoretic approach that promises an analytical model guaranteeing tight bounds of how much utility is possible for a given level of privacy and vice-versa.
Submitted 1 October, 2010;
originally announced October 2010.
-
Utility and Privacy of Data Sources: Can Shannon Help Conceal and Reveal Information?
Authors:
Lalitha Sankar,
S. Raj Rajagopalan,
H. Vincent Poor
Abstract:
The problem of private information "leakage" (inadvertently or by malicious design) from the myriad large centralized searchable data repositories drives the need for an analytical framework that quantifies unequivocally how safe private data can be (privacy) while still providing useful benefit (utility) to multiple legitimate information consumers. Rate distortion theory is shown to be a natural choice to develop such a framework which includes the following: modeling of data sources, developing application independent utility and privacy metrics, quantifying utility-privacy tradeoffs irrespective of the type of data sources or the methods of providing privacy, developing a side-information model for dealing with questions of external knowledge, and studying a successive disclosure problem for multiple query data sources.
Submitted 5 February, 2010;
originally announced February 2010.
-
The StoreGate: a Data Model for the Atlas Software Architecture
Authors:
P. Calafiura,
C. G. Leggett,
D. R. Quarrie,
H. Ma,
S. Rajagopalan
Abstract:
The Atlas collaboration at CERN has adopted the Gaudi software architecture which belongs to the blackboard family: data objects produced by knowledge sources (e.g. reconstruction modules) are posted to a common in-memory data base from where other modules can access them and produce new data objects. The StoreGate has been designed, based on the Atlas requirements and the experience of other HENP systems such as Babar, CDF, CLEO, D0 and LHCB, to identify in a simple and efficient fashion (collections of) data objects based on their type and/or the modules which posted them to the Transient Data Store (the blackboard). The developer also has the freedom to use her preferred key class to uniquely identify a data object according to any other criterion. Besides this core functionality, the StoreGate provides the developers with a powerful interface to handle in a coherent fashion persistable references, object lifetimes, memory management and access control policy for the data objects in the Store. It also provides a Handle/Proxy mechanism to define and hide the cache fault mechanism: upon request, a missing Data Object can be transparently created and added to the Transient Store presumably retrieving it from a persistent data-base, or even reconstructing it on demand.
Submitted 14 June, 2003;
originally announced June 2003.
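The blackboard pattern described in the abstract above (objects identified by type and key, with missing objects created on demand) can be illustrated with a small toy store. This is a conceptual sketch in Python only, not the StoreGate interface: the class `TransientStore` and its methods `record`, `register_loader`, and `retrieve` are hypothetical names chosen for this example.

```python
class TransientStore:
    """Toy blackboard: objects are identified by (type, key)."""

    def __init__(self):
        self._objects = {}  # (type, key) -> object already in memory
        self._loaders = {}  # (type, key) -> callable that creates the object

    def record(self, obj, key):
        """Post an object to the store under its type and a user-chosen key."""
        self._objects[(type(obj), key)] = obj

    def register_loader(self, cls, key, loader):
        """Register a callable used to create the object on first access."""
        self._loaders[(cls, key)] = loader

    def retrieve(self, cls, key):
        """Return the object, creating it through its loader on a cache miss."""
        slot = (cls, key)
        if slot not in self._objects:
            loader = self._loaders.get(slot)
            if loader is None:
                raise KeyError(f"no {cls.__name__} recorded under {key!r}")
            self._objects[slot] = loader()  # transparent on-demand creation
        return self._objects[slot]


class TrackCollection(list):
    """Hypothetical data object produced by a reconstruction module."""


store = TransientStore()
store.record(TrackCollection([1, 2, 3]), "InnerDetector")

calls = []
def load_muon_tracks():
    calls.append("loaded")  # stands in for reading from persistent storage
    return TrackCollection([7, 8])

store.register_loader(TrackCollection, "Muons", load_muon_tracks)
muons = store.retrieve(TrackCollection, "Muons")
```

Keying on both type and key lets two modules post objects under the same key without collision as long as the types differ, and the loader indirection hides whether an object was already in memory or had to be fetched.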