-
ETL for the integration of remote sensing data
Authors:
Paula V. Romero Jure,
Juan Bautista Cabral,
Sergio Masuelli
Abstract:
Modern in-orbit satellites and other remote sensing tools have made a huge volume of public data available, in different formats and hosted on different servers, waiting to be exploited. In this context, the ETL formalism becomes relevant for the integration and analysis of the combined information from all these sources. Throughout this work, we present the theoretical and practical foundations to build a modular analysis infrastructure that allows the creation of ETLs to download, transform and integrate data coming from different instruments in different formats. Part of this work is already implemented in a Python library, which is intended to be integrated into existing workflow management tools based on directed acyclic graphs; these tools also provide adapters to load the combined data into different warehouses.
Submitted 19 June, 2023;
originally announced June 2023.
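The extract, transform and load stages named in the abstract above can be illustrated with a minimal sketch. This is not the authors' library: the function names, the record layout and the in-memory "warehouse" are all hypothetical, chosen only to show how the three stages compose.

```python
from typing import Iterable

Record = dict  # hypothetical record layout: {"value": float, "unit": str}


def extract(sources: dict[str, list[Record]]) -> Iterable[Record]:
    """Yield raw records from every source, tagging each with its origin."""
    for name, rows in sources.items():
        for row in rows:
            yield {**row, "source": name}


def transform(records: Iterable[Record]) -> Iterable[Record]:
    """Bring all records to a common unit (kelvin) so they can be combined."""
    for rec in records:
        if rec["unit"] == "C":
            yield {**rec, "value": rec["value"] + 273.15, "unit": "K"}
        else:
            yield rec


def load(records: Iterable[Record], warehouse: list[Record]) -> int:
    """Append records to an in-memory warehouse; return how many were stored."""
    before = len(warehouse)
    warehouse.extend(records)
    return len(warehouse) - before


def run_etl(sources: dict[str, list[Record]], warehouse: list[Record]) -> int:
    """Chain the three stages lazily: extract -> transform -> load."""
    return load(transform(extract(sources)), warehouse)


# Hypothetical data from two instruments that report in different units.
sources = {
    "instrument_a": [{"value": 20.0, "unit": "C"}],
    "instrument_b": [{"value": 290.0, "unit": "K"}, {"value": 0.0, "unit": "C"}],
}
warehouse: list[Record] = []
stored = run_etl(sources, warehouse)
```

Because each stage is a generator over records, stages can be swapped or reordered without touching the others, which is the kind of modularity the abstract describes.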
-
A labeled dataset of cloud types using data from GOES-16 and CloudSat
Authors:
Paula V. Romero Jure,
Sergio Masuelli,
Juan Bautista Cabral
Abstract:
In this paper we present the development of a dataset consisting of 91 Multi-band Cloud and Moisture Product Full-Disk (MCMIPF) from the Advanced Baseline Imager (ABI) on board the GOES-16 geostationary satellite with 91 temporally and spatially corresponding CLDCLASS products from the CloudSat polar satellite. The products are diurnal, corresponding to the months of January and February 2019, and were chosen such that the products from both satellites can be co-located over South America. The CLDCLASS product provides the cloud type observed for each of the orbit's steps, and the GOES-16 multiband images contain pixels that can be co-located with these data. We develop an algorithm that returns a product in the form of a table that provides pixels from multiband images labelled with the type of cloud observed in them. Labelled data arranged in this structure are well suited to supervised learning. This was corroborated by training a simple linear artificial neural network based on the work of Gorooh et al. (2020), which gave good results, especially for the classification of deep convective clouds.
Submitted 19 June, 2023;
originally announced June 2023.
-
Drifting Features: Detection and evaluation in the context of automatic RRLs identification in VVV
Authors:
J. B. Cabral,
M. Lares,
S. Gurovich,
D. Minniti,
P. M. Granitto
Abstract:
As most of the modern astronomical sky surveys produce data faster than humans can analyze it, Machine Learning (ML) has become a central tool in Astronomy. Modern ML methods can be characterized as highly resistant to some experimental errors. However, small changes in the data over long distances or long periods of time, which cannot be easily detected by statistical methods, can be harmful to these methods. We develop a new strategy to cope with this problem, also using ML methods in an innovative way, to identify these potentially harmful features. We introduce and discuss the notion of Drifting Features, related to small changes in the properties as measured in the data features. We use the identification of RRLs in VVV based on an earlier work and introduce a method for detecting Drifting Features. Our method forces a classifier to learn the tile of origin of diverse sources (mostly stellar 'point sources'), and then selects the features most relevant to that task as candidates for Drifting Features. We show that this method can efficiently identify a reduced set of features that contains useful information about the tile of origin of the sources. For our particular example of detecting RRLs in VVV, we find that Drifting Features are mostly related to color indices. On the other hand, we show that, even if we have a clear set of Drifting Features in our problem, the identification of RRLs is mostly insensitive to them. Drifting Features can be efficiently identified using ML methods. However, in our example, removing Drifting Features does not improve the identification of RRLs.
Submitted 22 May, 2021; v1 submitted 4 May, 2021;
originally announced May 2021.
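The idea in the abstract above, making a classifier learn the tile of origin and then ranking features by how much they help, can be sketched in a few lines. The sketch below deliberately swaps in the simplest possible classifier, a single-threshold decision stump fitted to each feature separately, which is not the classifier used in the paper. The feature names and values are hypothetical and exist only to make the example run.

```python
def best_stump_accuracy(values_a, values_b):
    """Best accuracy of a one-threshold rule that separates tile A from tile B."""
    labelled = [(v, 0) for v in values_a] + [(v, 1) for v in values_b]
    n = len(labelled)
    best = 0.0
    for threshold, _ in labelled:
        # Rule: predict tile B (label 1) when the value exceeds the threshold.
        correct = sum((v > threshold) == bool(tile) for v, tile in labelled)
        # The mirrored rule (predict A above the threshold) scores 1 - accuracy.
        best = max(best, correct / n, 1 - correct / n)
    return best


# Hypothetical measurements of two features for sources in two tiles.
tile_a = {"period": [0.50, 0.55, 0.60, 0.65], "color": [0.10, 0.12, 0.11, 0.13]}
tile_b = {"period": [0.52, 0.57, 0.62, 0.63], "color": [0.20, 0.22, 0.21, 0.23]}

# A feature that predicts the tile of origin well is a Drifting Feature candidate.
scores = {
    name: best_stump_accuracy(tile_a[name], tile_b[name]) for name in tile_a
}
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

In this toy data the `color` feature separates the two tiles perfectly while `period` barely beats chance, so `color` would be flagged as the candidate. A score near 0.5 means the feature carries little information about the tile.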
-
Computational models in Electroencephalography
Authors:
Katharina Glomb,
Joana Cabral,
Anna Cattani,
Alberto Mazzoni,
Ashish Raj,
Benedetta Franceschiello
Abstract:
Computational models lie at the intersection of basic neuroscience and healthcare applications because they allow researchers to test hypotheses in silico and predict the outcome of experiments and interactions that are very hard to test in reality. Yet, what is meant by "computational model" is understood in many different ways by researchers in different fields of neuroscience and psychology, hindering communication and collaboration. In this review, we point out the state of the art of computational modeling in Electroencephalography (EEG) and outline how these models can be used to integrate findings from electrophysiology, network-level models, and behavior. On the one hand, computational models serve to investigate the mechanisms that generate brain activity, for example measured with EEG, such as the transient emergence of oscillations at different frequency bands and/or with different spatial topographies. On the other hand, computational models serve to design experiments and test hypotheses in silico. The final purpose of computational models of EEG is to obtain a comprehensive understanding of the mechanisms that underlie the EEG signal. This is crucial for an accurate interpretation of EEG measurements that may ultimately serve in the development of novel clinical applications.
Submitted 17 September, 2020;
originally announced September 2020.
-
Automatic Catalog of RRLyrae from ~14 million VVV Light Curves: How far can we go with traditional machine-learning?
Authors:
Juan B. Cabral,
Felipe Ramos,
Sebastián Gurovich,
Pablo Granitto
Abstract:
The creation of a 3D map of the bulge using RRLyrae (RRL) is one of the main goals of the VVV(X) surveys. The overwhelming number of sources under analysis requires the use of automatic procedures. In this context, previous works introduced the use of Machine Learning (ML) methods for variable star classification. Our goal is the development and analysis of an automatic procedure, based on ML, for the identification of RRLs in the VVV Survey. This procedure will be used to generate reliable catalogs integrated over several tiles in the survey. After the reconstruction of light-curves, we extract a set of period and intensity-based features. We use for the first time a new subset of pseudo color features. We discuss all the appropriate steps needed to define our automatic pipeline: selection of quality measures; sampling procedures; classifier setup and model selection. As a final result, we construct an ensemble classifier with an average Recall of 0.48 and average Precision of 0.86 over 15 tiles. We also make available our processed datasets and a catalog of candidate RRLs. Perhaps most interestingly, from a classification perspective based on photometric broad-band data, our results indicate that Color is an informative feature type of the RRL that should be considered for automatic classification methods via ML. We also argue that Recall and Precision in both tables and curves are high quality metrics for this highly imbalanced problem. Furthermore, we show for our VVV data-set that to have good estimates it is important to use the original distribution rather than reduced samples with an artificial balance. Finally, we show that the use of ensemble classifiers helps resolve the crucial model selection step, and that most errors in the identification of RRLs are related to low quality observations of some sources or to the difficulty of resolving the RRL-C type given the date.
Submitted 4 May, 2021; v1 submitted 1 May, 2020;
originally announced May 2020.
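The abstract's point about evaluating on the original class distribution, rather than on an artificially balanced sample, follows from how Precision depends on class balance. The sketch below holds a classifier's true-positive and false-positive rates fixed and changes only the number of negatives. The rates and sample sizes are hypothetical and are not taken from the paper.

```python
def precision_recall(tp: float, fp: float, fn: float) -> tuple[float, float]:
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)


def expected_counts(n_pos: int, n_neg: int, tpr: float, fpr: float):
    """Expected confusion counts for a classifier with fixed error rates."""
    tp = tpr * n_pos   # positives correctly recovered
    fn = n_pos - tp    # positives missed
    fp = fpr * n_neg   # negatives wrongly flagged as positive
    return tp, fp, fn


# Hypothetical classifier: finds half of the positives and wrongly flags
# one negative in a thousand.
TPR, FPR = 0.5, 0.001

# Artificially balanced sample: as many negatives as positives.
balanced = precision_recall(*expected_counts(1_000, 1_000, TPR, FPR))

# Original-like distribution: a thousand negatives for every positive.
imbalanced = precision_recall(*expected_counts(1_000, 1_000_000, TPR, FPR))
```

Recall is 0.5 in both cases because it only involves the positive class. Precision, however, is about 0.998 on the balanced sample and about 0.333 on the imbalanced one, so a balanced test set would badly overstate how clean the resulting catalog is.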
-
Astroalign: A Python module for astronomical image registration
Authors:
Martin Beroiz,
Juan B. Cabral,
Bruno Sanchez
Abstract:
We present an algorithm implemented in the astroalign Python module for image registration in astronomy. Our module does not rely on WCS information and instead matches 3-point asterisms (triangles) on the images to find the most accurate linear transformation between the two. It is especially useful in the context of aligning images prior to stacking or performing difference image analysis. Astroalign can match images of different point-spread functions, seeing, and atmospheric conditions.
Submitted 22 May, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
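Matching triangles between two images without WCS information, as the Astroalign abstract above describes, relies on quantities that do not change when a triangle is rotated, scaled or shifted. The sketch below uses ratios of sorted side lengths as such quantities. This is one common choice and is assumed here for illustration; it is a standalone example, not code from the astroalign module, and the helper names are hypothetical.

```python
import math


def side_lengths(triangle):
    """Return the three side lengths of a triangle, shortest first."""
    a, b, c = triangle
    return sorted([math.dist(a, b), math.dist(b, c), math.dist(c, a)])


def invariants(triangle):
    """Side-length ratios; unchanged by rotation, uniform scaling and shifts."""
    shortest, middle, longest = side_lengths(triangle)
    return (longest / middle, middle / shortest)


def similarity(point, scale, angle, shift):
    """Rotate a point by `angle` (radians), scale it, then translate it."""
    x, y = point
    c, s = math.cos(angle), math.sin(angle)
    return (
        scale * (c * x - s * y) + shift[0],
        scale * (s * x + c * y) + shift[1],
    )


# A 3-4-5 triangle of "stars" and the same triangle seen in a second image
# that is rotated, zoomed and offset (hypothetical transform parameters).
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = [similarity(p, scale=2.5, angle=0.7, shift=(10.0, -3.0)) for p in triangle]

inv_a = invariants(triangle)
inv_b = invariants(moved)
```

Both `inv_a` and `inv_b` equal (1.25, 1.333...) up to floating-point error, so the two triangles can be paired by comparing invariants. The point correspondences from matched triangles are what a registration routine would then use to estimate the transformation between the images.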
-
What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue
Authors:
Benjamin R. Cowan,
Philip Doyle,
Justin Edwards,
Diego Garaialde,
Ali Hayes-Brady,
Holly P. Branigan,
João Cabral,
Leigh Clark
Abstract:
The assumptions we make about a dialogue partner's knowledge and communicative ability (i.e. our partner models) can influence our language choices. Although similar processes may operate in human-machine dialogue, the role of design in shaping these models, and their subsequent effects on interaction, are not clearly understood. Focusing on synthesis design, we conduct a referential communication experiment to identify the impact of accented speech on lexical choice. In particular, we focus on whether accented speech may encourage the use of lexical alternatives that are relevant to a partner's accent, and how this may vary when in dialogue with a human or machine. We find that people are more likely to use American English terms when speaking with a US-accented partner than an Irish-accented partner in both human and machine conditions. This lends support to the proposal that synthesis design can influence partner perception of lexical knowledge, which in turn guides users' lexical choices. We discuss the findings with relation to the nature and dynamics of partner models in human-machine dialogue.
Submitted 25 July, 2019;
originally announced July 2019.
-
The State of Speech in HCI: Trends, Themes and Challenges
Authors:
Leigh Clark,
Phillip Doyle,
Diego Garaialde,
Emer Gilmartin,
Stephan Schlögl,
Jens Edlund,
Matthew Aylett,
João Cabral,
Cosmin Munteanu,
Benjamin Cowan
Abstract:
Speech interfaces are growing in popularity. Through a review of 68 research papers this work maps the trends, themes, findings and methods of empirical research on speech interfaces in HCI. We find that most studies are usability/theory-focused or explore wider system experiences, evaluating Wizard of Oz, prototypes, or developed systems by using self-report questionnaires to measure concepts like usability and user attitudes. A thematic analysis of the research found that speech HCI work focuses on nine key topics: system speech production, modality comparison, user speech production, assistive technology & accessibility, design insight, experiences with interactive voice response (IVR) systems, using speech technology for development, people's experiences with intelligent personal assistants (IPAs) and how user memory affects speech interface interaction. From these insights we identify gaps and challenges in speech research, notably the need to develop theories of speech interface interaction, grow critical mass in this domain, increase design work, and expand research from single to multiple user interaction contexts so as to reflect current use contexts. We also highlight the need to improve measure reliability, validity and consistency, in the wild deployment and reduce barriers to building fully functional speech interfaces for research.
Submitted 16 October, 2018;
originally announced October 2018.
-
From FATS to feets: Further improvements to an astronomical feature extraction tool based on machine learning
Authors:
J. B. Cabral,
B. Sánchez,
F. Ramos,
S. Gurovich,
P. Granitto,
J. Vanderplas
Abstract:
Machine learning algorithms are highly useful for the classification of time series data in astronomy in this era of peta-scale public survey data releases. These methods can facilitate the discovery of new unknown events in most astrophysical areas, as well as improving the analysis of samples of known phenomena. Machine learning algorithms use features extracted from collected data as input predictive variables. A public tool called Feature Analysis for Time Series (FATS) has proved an excellent workhorse for feature extraction, particularly light curve classification for variable objects. In this study, we present a major improvement to FATS, which corrects inconvenient design choices, minor details, and documentation for the re-engineering process. This improvement comprises a new Python package called "feets", which is important for future code-refactoring for astronomical software tools.
Submitted 6 September, 2018;
originally announced September 2018.
-
Corral Framework: Trustworthy and Fully Functional Data Intensive Parallel Astronomical Pipelines
Authors:
Juan B. Cabral,
Bruno Sánchez,
Martín Beroiz,
Mariano Domínguez,
Marcelo Lares,
Sebastián Gurovich,
Pablo Granitto
Abstract:
Data processing pipelines represent an important slice of the astronomical software library that include chains of processes that transform raw data into valuable information via data reduction and analysis. In this work we present Corral, a Python framework for astronomical pipeline generation. Corral features a Model-View-Controller design pattern on top of an SQL Relational Database capable of handling custom data models, processing stages, and communication alerts, and also provides automatic quality and structural metrics based on unit testing. The Model-View-Controller provides concept separation between the user logic and the data models, delivering at the same time multi-processing and distributed computing capabilities. Corral represents an improvement over commonly found data processing pipelines in Astronomy since the design pattern frees the programmer from dealing with processing flow and parallelization issues, allowing them to focus on the specific algorithms needed for the successive data transformations, and at the same time provides a broad measure of quality over the created pipeline. Corral and working examples of pipelines that use it are available to the community at https://github.com/toros-astro.
Submitted 7 August, 2017; v1 submitted 19 January, 2017;
originally announced January 2017.