-
Removing Radio Frequency Interference from Auroral Kilometric Radiation with Stacked Autoencoders
Authors:
Allen Chang,
Mary Knapp,
James LaBelle,
John Swoboda,
Ryan Volz,
Philip J. Erickson
Abstract:
Radio frequency data in astronomy enable scientists to analyze astrophysical phenomena. However, these data can be corrupted by radio frequency interference (RFI) that limits the observation of underlying natural processes. In this study, we extend recent developments in deep learning algorithms to astronomy data. We remove RFI from time-frequency spectrograms containing auroral kilometric radiation (AKR), a coherent radio emission originating from the Earth's auroral zones that is used to study astrophysical plasmas. We propose a Denoising Autoencoder for Auroral Radio Emissions (DAARE) trained with synthetic spectrograms to denoise AKR signals collected at the South Pole Station. DAARE achieves 42.2 peak signal-to-noise ratio (PSNR) and 0.981 structural similarity (SSIM) on synthesized AKR observations, improving PSNR by 3.9 and SSIM by 0.064 compared to state-of-the-art filtering and denoising networks. Qualitative comparisons demonstrate DAARE's capability to effectively remove RFI from real AKR observations, despite being trained completely on a dataset of simulated AKR. The framework for simulating AKR, training DAARE, and employing DAARE can be accessed at github.com/Cylumn/daare.
Submitted 10 March, 2023; v1 submitted 23 October, 2022;
originally announced October 2022.
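For readers unfamiliar with the PSNR figures quoted in the abstract above, the metric can be sketched as follows. This is a minimal illustration of the standard definition, not code from the DAARE repository; the function name `psnr`, the `max_value` parameter, and the toy arrays are assumptions made here, and spectrograms are assumed to be normalized to the range [0, 1].

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio (in dB) between two equally sized 2-D arrays.

    `reference` and `estimate` are lists of rows; `max_value` is the largest
    possible pixel value (assumed to be 1.0 for normalized spectrograms).
    """
    if len(reference) != len(estimate) or any(
        len(r) != len(e) for r, e in zip(reference, estimate)
    ):
        raise ValueError("inputs must have the same shape")
    n = sum(len(row) for row in reference)
    # Mean squared error over all pixels.
    mse = sum(
        (a - b) ** 2
        for r, e in zip(reference, estimate)
        for a, b in zip(r, e)
    ) / n
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical toy data: a "clean" patch and a copy offset by 0.1 everywhere.
clean = [[0.0, 0.5], [1.0, 0.25]]
noisy = [[0.1, 0.6], [1.1, 0.35]]
score = psnr(clean, noisy)
```

A higher value means the denoised output is closer to the clean reference; a uniform error of 0.1 on a unit-range image gives a mean squared error of 0.01.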
-
Transcriptional Response of SK-N-AS Cells to Methamidophos
Authors:
Akos Vertes,
Albert-Baskar Arul,
Peter Avar,
Andrew R. Korte,
Lida Parvin,
Ziad J. Sahab,
Deborah I. Bunin,
Merrill Knapp,
Denise Nishita,
Andrew Poggio,
Mark-Oliver Stehr,
Carolyn L. Talcott,
Brian M. Davis,
Christine A. Morton,
Christopher J. Sevinsky,
Maria I. Zavodszky
Abstract:
The transcriptomic response of SK-N-AS cells to methamidophos (an acetylcholine esterase inhibitor) exposure was measured at 10 time points between 0.5 and 48 h. The data were analyzed using a combination of traditional statistical methods and novel machine learning algorithms for detecting anomalous behavior and inferring causal relations between time profiles. We identified several processes that appeared to be upregulated in cells treated with methamidophos, including unfolded protein response, response to cAMP, calcium ion response, and cell-cell signaling. The data confirmed the expected consequence of acetylcholine buildup. In addition, transcripts with potentially key roles were identified, and causal networks relating these transcripts were inferred using two different computational methods: Siamese convolutional networks and time warp causal inference. Two types of anomaly detection algorithms, one based on autoencoders and the other based on Generative Adversarial Networks (GANs), were applied to narrow down the set of relevant transcripts.
Submitted 10 August, 2019;
originally announced August 2019.
-
Probabilistic Approximate Logic and its Implementation in the Logical Imagination Engine
Authors:
Mark-Oliver Stehr,
Minyoung Kim,
Carolyn L. Talcott,
Merrill Knapp,
Akos Vertes
Abstract:
In spite of the rapidly increasing number of applications of machine learning in various domains, a principled and systematic approach to the incorporation of domain knowledge in the engineering process is still lacking and ad hoc solutions that are difficult to validate are still the norm in practice, which is of growing concern not only in mission-critical applications.
In this note, we introduce Probabilistic Approximate Logic (PALO) as a logic based on the notion of mean approximate probability to overcome conceptual and computational difficulties inherent to strictly probabilistic logics. The logic is approximate in several dimensions. Logical independence assumptions are used to obtain approximate probabilities, but by averaging over many instances of formulas a useful estimate of mean probability with known confidence can usually be obtained. To enable efficient computational inference, the logic has a continuous semantics that reflects only a subset of the structural properties of classical logic, but this imprecision can be partly compensated by richer theories obtained by classical inference or other means. Computational inference, which refers to the construction of models and validation of logical properties, is based on Stochastic Gradient Descent (SGD) and Markov Chain Monte Carlo (MCMC) techniques and is hence another dimension where approximations are involved.
We also present the Logical Imagination Engine (LIME), a prototypical implementation of PALO based on TensorFlow. Albeit not limited to the biological domain, we illustrate its operation in a quite substantial bioinformatics machine learning application concerned with network synthesis and analysis in a recent DARPA project.
Submitted 25 July, 2019;
originally announced July 2019.
-
Learning Causality: Synthesis of Large-Scale Causal Networks from High-Dimensional Time Series Data
Authors:
Mark-Oliver Stehr,
Peter Avar,
Andrew R. Korte,
Lida Parvin,
Ziad J. Sahab,
Deborah I. Bunin,
Merrill Knapp,
Denise Nishita,
Andrew Poggio,
Carolyn L. Talcott,
Brian M. Davis,
Christine A. Morton,
Christopher J. Sevinsky,
Maria I. Zavodszky,
Akos Vertes
Abstract:
There is an abundance of complex dynamic systems that are critical to our daily lives and our society but that are hardly understood, and even with today's possibilities to sense and collect large amounts of experimental data, they are so complex and continuously evolving that it is unlikely that their dynamics will ever be understood in full detail. Nevertheless, through computational tools we can try to make the best possible use of the current technologies and available data. We believe that the most useful models will have to take into account the imbalance between system complexity and available data in the context of limited knowledge or multiple hypotheses. The complex system of biological cells is a prime example of such a system that is studied in systems biology and has motivated the methods presented in this paper. They were developed as part of the DARPA Rapid Threat Assessment (RTA) program, which is concerned with understanding the mechanism of action (MoA) of toxins or drugs affecting human cells. Using a combination of Gaussian processes and abstract network modeling, we present three fundamentally different machine-learning-based approaches to learn causal relations and synthesize causal networks from high-dimensional time series data. While other types of data are available and have been analyzed and integrated in our RTA work, in this paper we focus on transcriptomics (that is, gene expression) data obtained from high-throughput microarray experiments to illustrate the capabilities and limitations of our algorithms. Our algorithms make different but overall relatively few biological assumptions, so that they are applicable to other types of biological data and potentially even to other complex systems that exhibit high dimensionality but are not of biological nature.
Submitted 6 May, 2019;
originally announced May 2019.
-
A Novel Microdata Privacy Disclosure Risk Measure
Authors:
Marmar Orooji,
Gerald M. Knapp
Abstract:
A tremendous amount of individual-level data is generated each day, of use to marketing, decision makers, and machine learning applications. These data often contain private and sensitive information about individuals, which can be disclosed by adversaries. An adversary can recognize the underlying individual's identity for a data record by looking at the values of quasi-identifier attributes, known as identity disclosure, or can uncover sensitive information about an individual through attribute disclosure. In Statistical Disclosure Control, multiple disclosure risk measures have been proposed. These share two drawbacks: they do not consider identity and attribute disclosure concurrently in the risk measure, and they make restrictive assumptions about an adversary's knowledge by assuming that certain attributes are quasi-identifiers and that there is a clear boundary between quasi-identifiers and sensitive information. In this paper, we present a novel disclosure risk measure that addresses these limitations by providing a single combined metric of identity and attribute disclosure risk and by allowing flexibility in modeling an adversary's knowledge. We have developed an efficient algorithm for computing the proposed risk measure and evaluated the feasibility and performance of our approach on a real-world data set from the domain of social work.
Submitted 2 January, 2019;
originally announced January 2019.
-
Improving Suppression to Reduce Disclosure Risk and Enhance Data Utility
Authors:
Marmar Orooji,
Gerald M. Knapp
Abstract:
In Privacy Preserving Data Publishing, various privacy models have been developed for applying anonymization operations to sensitive individual-level datasets, in order to publish the data for public access while preserving the privacy of individuals in the dataset. However, there is always a trade-off between preserving privacy and data utility; the more changes we make to the confidential dataset to reduce disclosure risk, the more information the data loses and the less data utility it preserves. The optimum privacy technique is the one that results in a dataset with minimum disclosure risk and maximum data utility. In this paper, we propose an improved suppression method, which reduces the disclosure risk and enhances the data utility by targeting the highest-risk records and keeping other records intact. We have shown the effectiveness of our approach through an experiment on a real-world confidential dataset.
Submitted 7 January, 2019; v1 submitted 2 January, 2019;
originally announced January 2019.
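The general idea of suppressing only the riskiest records, as described in the abstract above, might be sketched like this. The abstract does not specify the risk measure or the suppression rule, so everything below is an assumption made for illustration: risk is taken to be the reciprocal of the size of a record's quasi-identifier equivalence class, and the names `suppress_high_risk`, `max_risk`, and `mask` are hypothetical. It is not the authors' method.

```python
from collections import Counter


def suppress_high_risk(records, quasi_identifiers, max_risk=0.5, mask="*"):
    """Mask quasi-identifiers only in records whose assumed risk exceeds max_risk.

    Assumed risk model (not taken from the paper): 1 / (number of records
    sharing the same quasi-identifier values).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    result = []
    for record, key in zip(records, keys):
        risk = 1.0 / class_sizes[key]
        if risk > max_risk:
            # Copy the record and mask its quasi-identifiers; other fields stay.
            record = {**record, **{q: mask for q in quasi_identifiers}}
        result.append(record)
    return result


# Hypothetical toy data: two records share quasi-identifiers, one is unique.
records = [
    {"age": 30, "zip": "70803", "dx": "A"},
    {"age": 30, "zip": "70803", "dx": "B"},
    {"age": 52, "zip": "70810", "dx": "C"},
]
published = suppress_high_risk(records, ["age", "zip"])
```

In this toy run only the unique third record is masked, while the two records that share quasi-identifier values are published unchanged, which mirrors the stated goal of keeping low-risk records intact.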
-
Multimodal Affect Analysis for Product Feedback Assessment
Authors:
Amol S Patwardhan,
Gerald M Knapp
Abstract:
Consumers often react expressively to products such as food samples, perfume, jewelry, sunglasses, and clothing accessories. This research discusses a multimodal affect recognition system developed to classify whether a consumer likes or dislikes a product tested at a counter or kiosk, by analyzing the consumer's facial expression, body posture, hand gestures, and voice after testing the product. A depth-capable camera and microphone system, Kinect for Windows, is used. An emotion identification engine has been developed to analyze the images and voice to determine the affective state of the customer. The image is segmented using skin color and adaptive thresholding. Face, body, and hands are detected using the Haar cascade classifier. Canny edges are identified, and the lip, body, and hand contours are extracted using spatial filtering. Edge count and orientation around the mouth, cheeks, eyes, shoulders, and fingers, together with the location of the edges, are used as features. Classification is done by an emotion template mapping algorithm and by a classifier trained using support vector machines. The real-time performance, accuracy, and feasibility of multimodal affect recognition in feedback assessment are evaluated.
Submitted 7 May, 2017;
originally announced May 2017.
-
Automated Prediction of Temporal Relations
Authors:
Amol S Patwardhan,
Jacob Badeaux,
Siavash,
Gerald M Knapp
Abstract:
Background: There has been growing research interest in automated question answering and in generating summaries of free-form text such as news articles. To implement these tasks, the computer should be able to identify the sequence of events, the duration of events, the time at which each event occurred, and the relationship type between event pairs, time pairs, or event-time pairs. Specific Problem: It is important to accurately identify the relationship type between combinations of event and time before the temporal ordering of events can be defined. The machine learning approach taken in Mani et al. (2006) provides an accuracy of only 62.5 on the baseline data from TimeBank. The researchers used a maximum entropy classifier in their methodology. TimeML uses the TLINK annotation to tag a relationship type between events and time. The time complexity is quadratic when it comes to tagging documents with TLINK using human annotation. This research proposes using decision trees and parsing to improve relationship type tagging. It attempts to close the gaps in human annotation by automating relationship type tagging, with the aim of improving the accuracy of event and time relationships in annotated documents. Scope information: Documents from the news domain will be used. The tagging will be performed within the same document and not across documents. The relationship types will be identified only for a pair of event and time and not a chain of events. The research focuses on documents tagged using the TimeML specification, which contains tags such as EVENT, TLINK, and TIMEX. Each tag has attributes such as identifier, relation, POS, and time.
Submitted 22 July, 2016;
originally announced July 2016.