Search | arXiv e-print repository

arXiv:2407.08166 [pdf, other]

Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Authors: Mikhail Kulyabin, Paul A. Constable, Aleksei Zhdanov, Irene O. Lee, David H. Skuse, Dorothy A. Thompson, Andreas Maier

Abstract: The electroretinogram (ERG) is a clinical test that records the retina's electrical response to light. The ERG is a promising way to study different neurodevelopmental and neurodegenerative disorders, including autism spectrum disorder (ASD) - a neurodevelopmental condition that impacts language, communication, and reciprocal social interactions. However, in heterogeneous populations, such as ASD,… ▽ More The electroretinogram (ERG) is a clinical test that records the retina's electrical response to light. The ERG is a promising way to study different neurodevelopmental and neurodegenerative disorders, including autism spectrum disorder (ASD) - a neurodevelopmental condition that impacts language, communication, and reciprocal social interactions. However, in heterogeneous populations, such as ASD, where the ability to collect large datasets is limited, the application of artificial intelligence (AI) is complicated. Synthetic ERG signals generated from real ERG recordings carry similar information as natural ERGs and, therefore, could be used as an extension for natural data to increase datasets so that AI applications can be fully utilized. As proof of principle, this study presents a Generative Adversarial Network capable of generating synthetic ERG signals of children with ASD and typically developing control individuals. We applied a Time Series Transformer and Visual Transformer with Continuous Wavelet Transform to enhance classification results on the extended synthetic signals dataset. This approach may support classification models in related psychiatric conditions where the ERG may help classify disorders. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2401.00060 [pdf, ps, other]

Assessing your Observatory's Impact: Best Practices in Establishing and Maintaining Observatory Bibliographies

Authors: Observatory Bibliographers Collaboration, Raffaele D'Abrusco, Monique Gomez, Uta Grothkopf, Sharon Hunt, Ruth Kneale, Mika Konuma, Jenny Novacescu, Luisa Rebull, Elena Scire, Erin Scott, Donna Thompson, Lance Utley, Christopher Wilkinson, Sherry Winkelman

Abstract: Observatories need to measure and evaluate the scientific output and overall impact of their facilities. An observatory bibliography consists of the papers published using that observatory's data, typically gathered by searching the major journals for relevant keywords. Recently, the volume of literature and methods by which the publications pool is evaluated has increased. Efficient and standardi… ▽ More Observatories need to measure and evaluate the scientific output and overall impact of their facilities. An observatory bibliography consists of the papers published using that observatory's data, typically gathered by searching the major journals for relevant keywords. Recently, the volume of literature and methods by which the publications pool is evaluated has increased. Efficient and standardized procedures are necessary to assign meaningful metadata; enable user-friendly retrieval; and provide the opportunity to derive reports, statistics, and visualizations to impart a deeper understanding of the research output. In 2021, a group of observatory bibliographers from around the world convened online to continue the discussions presented in Lagerstrom (2015). We worked to extract general guidelines from our experiences, techniques, and lessons learnt. The paper explores the development, application, and current status of telescope bibliographies and future trends. This paper briefly describes the methodologies employed in constructing databases, along with the various bibliometric techniques used to analyze and interpret them. We explain reasons for non-standardization and why it is essential for each observatory to identify metadata and metrics that are meaningful for them; caution the (over-)use of comparisons among facilities that are, ultimately, not comparable through bibliometrics; and highlight the benefits of telescope bibliographies, both for researchers within the astronomical community and for stakeholders beyond the specific observatories. There is tremendous diversity in the ways bibliographers track publications and maintain databases, due to parameters such as resources, type of observatory, historical practices, and reporting requirements to funders and outside agencies. However, there are also common sets of Best Practices. △ Less

Submitted 28 July, 2024; v1 submitted 29 December, 2023; originally announced January 2024.

Comments: 20 pages, 3 appendices

arXiv:2312.14211 [pdf, ps, other]

Experimenting with Large Language Models and vector embeddings in NASA SciX

Authors: Sergi Blanco-Cuaresma, Ioana Ciucă, Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Kelly E. Lockhart, Felix Grezes, Thomas Allen, Golnaz Shapurian, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Daniel Chivvis, Fernanda de Macedo Alves, Jean-Claude Paquin, Jennifer Bartlett, Mugdha Polimera, Stephanie Jarmak

Abstract: Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed a… ▽ More Open-source Large Language Models enable projects such as NASA SciX (i.e., NASA ADS) to think out of the box and try alternative approaches for information retrieval and data augmentation, while respecting data copyright and users' privacy. However, when large language models are directly prompted with questions without any context, they are prone to hallucination. At NASA SciX we have developed an experiment where we created semantic vectors for our large collection of abstracts and full-text content, and we designed a prompt system to ask questions using contextual chunks from our system. Based on a non-systematic human evaluation, the experiment shows a lower degree of hallucination and better responses when using Retrieval Augmented Generation. Further exploration is required to design new features and data augmentation processes at NASA SciX that leverages this technology while respecting the high level of trust and quality that the project holds. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: To appear in the proceedings of the 33th annual international Astronomical Data Analysis Software & Systems (ADASS XXXIII)

arXiv:2212.00744 [pdf, ps, other]

Improving astroBERT using Semantic Textual Similarity

Authors: Felix Grezes, Thomas Allen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Shinyi Chen, Jennifer Koch, Taylor Jacovich, Pavlos Protopapas

Abstract: The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: - announce the first… ▽ More The NASA Astrophysics Data System (ADS) is an essential tool for researchers that allows them to explore the astronomy and astrophysics scientific literature, but it has yet to exploit recent advances in natural language processing. At ADASS 2021, we introduced astroBERT, a machine learning language model tailored to the text used in astronomy papers in ADS. In this work we: - announce the first public release of the astroBERT language model; - show how astroBERT improves over existing public language models on astrophysics specific tasks; - and detail how ADS plans to harness the unique structure of scientific papers, the citation graph and citation context, to further improve astroBERT. △ Less

Submitted 29 November, 2022; originally announced December 2022.

arXiv:2208.07904 [pdf, ps, other]

Sturm's Theorem with Endpoints

Authors: Philippe Pébay, J. Maurice Rojas, David C. Thompson

Abstract: Sturm's Theorem is a fundamental 19th century result relating the number of real roots of a polynomial $f$ in an interval to the number of sign alternations in a sequence of polynomial division-like calculations. We provide a short direct proof of Sturm's Theorem, including the numerically vexing case (ignored in many published accounts) where an interval endpoint is a root of $f$. Sturm's Theorem is a fundamental 19th century result relating the number of real roots of a polynomial $f$ in an interval to the number of sign alternations in a sequence of polynomial division-like calculations. We provide a short direct proof of Sturm's Theorem, including the numerically vexing case (ignored in many published accounts) where an interval endpoint is a root of $f$. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: 4 pages. A software implementation can be found in algorithm vtkPolynomialSolversUnivariate , within the VTK (Visualization Toolkit) software package

arXiv:2208.07261 [pdf, other]

Bias amplification in experimental social networks is reduced by resampling

Authors: Mathew D. Hardy, Bill D. Thompson, P. M. Krafft, Thomas L. Griffiths

Abstract: Large-scale social networks are thought to contribute to polarization by amplifying people's biases. However, the complexity of these technologies makes it difficult to identify the mechanisms responsible and to evaluate mitigation strategies. Here we show under controlled laboratory conditions that information transmission through social networks amplifies motivational biases on a simple perceptu… ▽ More Large-scale social networks are thought to contribute to polarization by amplifying people's biases. However, the complexity of these technologies makes it difficult to identify the mechanisms responsible and to evaluate mitigation strategies. Here we show under controlled laboratory conditions that information transmission through social networks amplifies motivational biases on a simple perceptual decision-making task. Participants in a large behavioral experiment showed increased rates of biased decision-making when part of a social network relative to asocial participants, across 40 independently evolving populations. Drawing on techniques from machine learning and Bayesian statistics, we identify a simple adjustment to content-selection algorithms that is predicted to mitigate bias amplification. This algorithm generates a sample of perspectives from within an individual's network that is more representative of the population as a whole. In a second large experiment, this strategy reduced bias amplification while maintaining the benefits of information sharing. △ Less

Submitted 5 October, 2022; v1 submitted 15 August, 2022; originally announced August 2022.

arXiv:2204.12993 [pdf, other]

Counterfactual harm

Authors: Jonathan G. Richens, Rory Beard, Daniel H. Thompson

Abstract: To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in ce… ▽ More To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. However, to date there is no statistical method for measuring harm and factoring it into algorithmic decisions. In this paper we propose the first formal definition of harm and benefit using causal models. We show that any factual definition of harm must violate basic intuitions in certain scenarios, and show that standard machine learning algorithms that cannot perform counterfactual reasoning are guaranteed to pursue harmful policies following distributional shifts. We use our definition of harm to devise a framework for harm-averse decision making using counterfactual objective functions. We demonstrate this framework on the problem of identifying optimal drug doses using a dose-response model learned from randomized control trial data. We find that the standard method of selecting doses using treatment effects results in unnecessarily harmful doses, while our counterfactual approach allows us to identify doses that are significantly less harmful without sacrificing efficacy. △ Less

Submitted 2 November, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: Accepted at NeurIPS 2022. Included appendix comparing to Beckers et. al. arXiv:2210.05327. Typos corrected

arXiv:2202.00777 [pdf, ps, other]

Web accessibility trends and implementation in dynamic web applications

Authors: Timothy W. Hostetler, Shinyi Chen, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Edwin Henneken, Donna M. Thompson, Roman Chyla, Golnaz Shapurian, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Stephen McDonald, Felix Grezes

Abstract: The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including those with disabilities that would benefit from alternative ways to pre… ▽ More The NASA Astrophysics Data System (ADS), a critical research service for the astrophysics community, strives to provide the most accessible and inclusive environment for the discovery and exploration of the astronomical literature. Part of this goal involves creating a digital platform that can accommodate everybody, including those with disabilities that would benefit from alternative ways to present the information provided by the website. NASA ADS follows the official Web Content Accessibility Guidelines (WCAG) standard for ensuring accessibility of all its applications, striving to exceed this standard where possible. Through the use of both internal audits and external expert review based on these guidelines, we have identified many areas for improving accessibility in our current web application, and have implemented a number of updates to the UI as a result of this. We present an overview of some current web accessibility trends, discuss our experience incorporating these trends in our web application, and discuss the lessons learned and recommendations for future projects. △ Less

Submitted 1 February, 2022; originally announced February 2022.

Comments: Submitted to ADASS XXXI (2021)

arXiv:2112.00590 [pdf, ps, other]

Building astroBERT, a language model for Astronomy & Astrophysics

Authors: Felix Grezes, Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Golnaz Shapurian, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Nemanja Martinovic, Shinyi Chen, Chris Tanner, Pavlos Protopapas

Abstract: The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and… ▽ More The existing search tools for exploring the NASA Astrophysics Data System (ADS) can be quite rich and empowering (e.g., similar and trending operators), but researchers are not yet allowed to fully leverage semantic search. For example, a query for "results from the Planck mission" should be able to distinguish between all the various meanings of Planck (person, mission, constant, institutions and more) without further clarification from the user. At ADS, we are applying modern machine learning and natural language processing techniques to our dataset of recent astronomy publications to train astroBERT, a deeply contextual language model based on research at Google. Using astroBERT, we aim to enrich the ADS dataset and improve its discoverability, and in particular we are developing our own named entity recognition tool. We present here our preliminary results and lessons learned. △ Less

Submitted 1 December, 2021; originally announced December 2021.

arXiv:2110.07910 [pdf, other]

SaLinA: Sequential Learning of Agents

Authors: Ludovic Denoyer, Alfredo de la Fuente, Song Duong, Jean-Baptiste Gaya, Pierre-Alexandre Kamienny, Daniel H. Thompson

Abstract: SaLinA is a simple library that makes implementing complex sequential learning models easy, including reinforcement learning algorithms. It is built as an extension of PyTorch: algorithms coded with \SALINA{} can be understood in few minutes by PyTorch users and modified easily. Moreover, SaLinA naturally works with multiple CPUs and GPUs at train and test time, thus being a good fit for the large… ▽ More SaLinA is a simple library that makes implementing complex sequential learning models easy, including reinforcement learning algorithms. It is built as an extension of PyTorch: algorithms coded with \SALINA{} can be understood in few minutes by PyTorch users and modified easily. Moreover, SaLinA naturally works with multiple CPUs and GPUs at train and test time, thus being a good fit for the large-scale training use cases. In comparison to existing RL libraries, SaLinA has a very low adoption cost and capture a large variety of settings (model-based RL, batch RL, hierarchical RL, multi-agent RL, etc.). But SaLinA does not only target RL practitioners, it aims at providing sequential learning capabilities to any deep learning programmer. △ Less

Submitted 15 October, 2021; originally announced October 2021.

arXiv:2101.07220 [pdf, other]

A Tensor-Based Formulation of Hetero-functional Graph Theory

Authors: Amro M. Farid, Dakota Thompson, Wester Schoonenberg

Abstract: Recently, hetero-functional graph theory (HFGT) has developed as a means to mathematically model the structure of large-scale complex flexible engineering systems. It does so by fusing concepts from network science and model-based systems engineering (MBSE). For the former, it utilizes multiple graph-based data structures to support a matrix-based quantitative analysis. For the latter, HFGT inheri… ▽ More Recently, hetero-functional graph theory (HFGT) has developed as a means to mathematically model the structure of large-scale complex flexible engineering systems. It does so by fusing concepts from network science and model-based systems engineering (MBSE). For the former, it utilizes multiple graph-based data structures to support a matrix-based quantitative analysis. For the latter, HFGT inherits the heterogeneity of conceptual and ontological constructs found in model-based systems engineering including system form, system function, and system concept. These diverse conceptual constructs indicate multi-dimensional rather than two-dimensional relationships. This paper provides the first tensor-based treatment of hetero-functional graph theory. In particular, it addresses the ``system concept" and the hetero-functional adjacency matrix from the perspective of tensors and introduces the hetero-functional incidence tensor as a new data structure. The tensor-based formulation described in this work makes a stronger tie between HFGT and its ontological foundations in MBSE. Finally, the tensor-based formulation facilitates several analytical results that provide an understanding of the relationships between HFGT and multi-layer networks. △ Less

Submitted 12 October, 2022; v1 submitted 14 January, 2021; originally announced January 2021.

arXiv:2009.05048 [pdf, ps, other]

Agile methodologies in teams with highly creative and autonomous members

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi

Abstract: The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally proclaimed that whenever a Scrum project does not succeed, the reason is because Scrum was not implemented c… ▽ More The Agile manifesto encourages us to value individuals and interactions over processes and tools, while Scrum, the most adopted Agile development methodology, is essentially based on roles, events, artifacts, and the rules that bind them together (i.e., processes). Moreover, it is generally proclaimed that whenever a Scrum project does not succeed, the reason is because Scrum was not implemented correctly and not because Scrum may have its own flaws. This grants irrefutability to the methodology, discouraging deviations to fit the actual needs and peculiarities of the developers. In particular, the members of the NASA ADS team are highly creative and autonomous whose motivation can be affected if their freedom is too strongly constrained. We present our experience following Agile principles, reusing certain Scrum elements and seeking the satisfaction of the team members, while rapidly reacting/keeping the project in line with our stakeholders expectations. △ Less

Submitted 10 September, 2020; originally announced September 2020.

Comments: To appear in the proceedings of the 29th annual international Astronomical Data Analysis Software & Systems (ADASS XXIX)

arXiv:2005.10006 [pdf, other]

The Hetero-functional Graph Theory Toolbox

Authors: Dakota Thompson, Prabhat Hegde, Wester C. H. Schoonenberg, Inas Khayal, Amro M. Farid

Abstract: In the 20th century, newly invented technical artifacts were connected to form large-scale complex engineering systems. Furthermore, the interactions found within these networked systems has grown in both degree as well as heterogeneity. Consequently, these already complex engineering systems have converged in what is now called systems-of-systems. The analysis, design, planning, and operation of… ▽ More In the 20th century, newly invented technical artifacts were connected to form large-scale complex engineering systems. Furthermore, the interactions found within these networked systems has grown in both degree as well as heterogeneity. Consequently, these already complex engineering systems have converged in what is now called systems-of-systems. The analysis, design, planning, and operation of these engineering systems from a holistic perspective has necessitated ever-more sophisticated modeling techniques. Despite significant advancements in model-based systems engineering and network science, these seemingly disparate fields have experienced similar limitations in addressing the complexity of engineering systems. Hetero-Functional Graph Theory (HFGT) has emerged as a means to address some of these limitations. This paper serves as a user guide to a recently developed Hetero-functional Graph Theory Toolbox which facilitates the computation of HFGT mathematical models. It is written in the MATLAB language and has been tested with v9.6 (R2019a). It is openly available on GitHub together with a sample input file for straightforward re-use. The paper details the syntax and semantics of the input file, the principal data structure of the toolbox, and the functions used to construct and populate this data structure. The toolbox has been fully validated against several peer-review HFGT publications. △ Less

Submitted 2 October, 2020; v1 submitted 8 May, 2020; originally announced May 2020.

Comments: 12 pages, 6 figures

arXiv:2003.12828 [pdf, other]

Learning medical triage from clinicians using Deep Q-Learning

Authors: Albert Buchard, Baptiste Bouvier, Giulia Prando, Rory Beard, Michail Livieratos, Dan Busbridge, Daniel Thompson, Jonathan Richens, Yuanzhao Zhang, Adam Baker, Yura Perov, Kostis Gourgoulias, Saurabh Johri

Abstract: Medical Triage is of paramount importance to healthcare systems, allowing for the correct orientation of patients and allocation of the necessary resources to treat them adequately. While reliable decision-tree methods exist to triage patients based on their presentation, those trees implicitly require human inference and are not immediately applicable in a fully automated setting. On the other ha… ▽ More Medical Triage is of paramount importance to healthcare systems, allowing for the correct orientation of patients and allocation of the necessary resources to treat them adequately. While reliable decision-tree methods exist to triage patients based on their presentation, those trees implicitly require human inference and are not immediately applicable in a fully automated setting. On the other hand, learning triage policies directly from experts may correct for some of the limitations of hard-coded decision-trees. In this work, we present a Deep Reinforcement Learning approach (a variant of DeepQ-Learning) to triage patients using curated clinical vignettes. The dataset, consisting of 1374 clinical vignettes, was created by medical doctors to represent real-life cases. Each vignette is associated with an average of 3.8 expert triage decisions given by medical doctors relying solely on medical history. We show that this approach is on a par with human performance, yielding safe triage decisions in 94% of cases, and matching expert decisions in 85% of cases. The trained agent learns when to stop asking questions, acquires optimized decision policies requiring less evidence than supervised approaches, and adapts to the novelty of a situation by asking for more information. Overall, we demonstrate that a Deep Reinforcement Learning approach can learn effective medical triage policies directly from expert decisions, without requiring expert knowledge engineering. This approach is scalable and can be deployed in healthcare settings or geographical regions with distinct triage specifications, or where trained experts are scarce, to improve decision making in the early stage of care. △ Less

Submitted 24 June, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

Comments: 17 pages, 4 figures, 3 tables, preprint, in press

MSC Class: 93E35

arXiv:2003.02978 [pdf, other]

doi 10.1109/TGRS.2020.2976888

Fast and Accurate Retrieval of Methane Concentration from Imaging Spectrometer Data Using Sparsity Prior

Authors: Markus D. Foote, Philip E. Dennison, Andrew K. Thorpe, David R. Thompson, Siraput Jongaramrungruang, Christian Frankenberg, Sarang C. Joshi

Abstract: The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Air… ▽ More The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Airborne Visible InfraRed Imaging Spectrometer Next Generation (AVIRIS-NG). Current quantitative methods have tradeoffs between computational requirements and retrieval accuracy, creating obstacles for processing real-time data or large datasets from flight campaigns. We present a new computationally efficient algorithm that applies sparsity and an albedo correction to matched filter retrieval of trace gas concentration-pathlength. The new algorithm was tested using AVIRIS-NG data acquired over several point source plumes in Ahmedabad, India. The algorithm was validated using simulated AVIRIS-NG data including synthetic plumes of known methane concentration. Sparsity and albedo correction together reduced the root mean squared error of retrieved methane concentration-pathlength enhancement by 60.7% compared with a previous robust matched filter method. Background noise was reduced by a factor of 2.64. The new algorithm was able to process the entire 300 flightline 2016 AVIRIS-NG India campaign in just over 8 hours on a desktop computer with GPU acceleration. △ Less

Submitted 5 March, 2020; originally announced March 2020.

Comments: 13 pages, 11 figures

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-13

arXiv:2001.05895 [pdf, other]

Masking schemes for universal marginalisers

Authors: Divya Gautam, Maria Lomeli, Kostis Gourgoulias, Daniel H. Thompson, Saurabh Johri

Abstract: We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser (arXiv:1711.00695) in order to learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$, where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some arbitrary subset of all random variables of the generative model of interest. In other words,… ▽ More We consider the effect of structure-agnostic and structure-dependent masking schemes when training a universal marginaliser (arXiv:1711.00695) in order to learn conditional distributions of the form $P(x_i |\mathbf x_{\mathbf b})$, where $x_i$ is a given random variable and $\mathbf x_{\mathbf b}$ is some arbitrary subset of all random variables of the generative model of interest. In other words, we mimic the self-supervised training of a denoising autoencoder, where a dataset of unlabelled data is used as partially observed input and the neural approximator is optimised to minimise reconstruction loss. We focus on studying the underlying process of the partially observed data---how good is the neural approximator at learning all conditional distributions when the observation process at prediction time differs from the masking process during training? We compare networks trained with different masking schemes in terms of their predictive performance and generalisation properties. △ Less

Submitted 16 January, 2020; originally announced January 2020.

Comments: To be published in Proceedings of the 2nd Symposium on Advances in Approximate Bayesian Inference, 2019

arXiv:1906.03479 [pdf, other]

Learning Radiative Transfer Models for Climate Change Applications in Imaging Spectroscopy

Authors: Shubhankar Deshpande, Brian D. Bue, David R. Thompson, Vijay Natraj, Mario Parente

Abstract: According to a recent investigation, an estimated 33-50% of the world's coral reefs have undergone degradation, believed to be as a result of climate change. A strong driver of climate change and the subsequent environmental impact are greenhouse gases such as methane. However, the exact relation climate change has to the environmental condition cannot be easily established. Remote sensing methods… ▽ More According to a recent investigation, an estimated 33-50% of the world's coral reefs have undergone degradation, believed to be as a result of climate change. A strong driver of climate change and the subsequent environmental impact are greenhouse gases such as methane. However, the exact relation climate change has to the environmental condition cannot be easily established. Remote sensing methods are increasingly being used to quantify and draw connections between rapidly changing climatic conditions and environmental impact. A crucial part of this analysis is processing spectroscopy data using radiative transfer models (RTMs) which is a computationally expensive process and limits their use with high volume imaging spectrometers. This work presents an algorithm that can efficiently emulate RTMs using neural networks leading to a multifold speedup in processing time, and yielding multiple downstream benefits. △ Less

Submitted 8 June, 2019; originally announced June 2019.

Comments: Accepted to International Conference on Machine Learning (ICML) 2019 Workshop: Climate Change: How Can AI Help?

arXiv:1901.05463 [pdf, ps, other]

Fundamentals of effective cloud management for the new NASA Astrophysics Data System

Authors: Sergi Blanco-Cuaresma, Alberto Accomazzi, Michael J. Kurtz, Edwin Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Stephen McDonald, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton, Kelly E. Lockhart, Kris Bukovi, Nathan Rapport

Abstract: The new NASA Astrophysics Data System (ADS) is designed with a serviceoriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient s… ▽ More The new NASA Astrophysics Data System (ADS) is designed with a serviceoriented architecture (SOA) that consists of multiple customized Apache Solr search engine instances plus a collection of microservices, containerized using Docker, and deployed in Amazon Web Services (AWS). For complex systems, like the ADS, this loosely coupled architecture can lead to a more scalable, reliable and resilient system if some fundamental questions are addressed. After having experimented with different AWS environments and deployment methods, we decided in December 2017 to go with Kubernetes as our container orchestration. Defining the best strategy to properly setup Kubernetes has shown to be challenging: automatic scaling services and load balancing traffic can lead to errors whose origin is difficult to identify, monitoring and logging the activity that happens across multiple layers for a single request needs to be carefully addressed, and the best workflow for a Continuous Integration and Delivery (CI/CD) system is not self-evident. We present here how we tackle these challenges and our plans for the future. △ Less

Submitted 16 January, 2019; originally announced January 2019.

Comments: To appear in the proceedings of the 28th annual international Astronomical Data Analysis Software & Systems (ADASS XXVIII)

arXiv:1710.08505 [pdf, ps, other]

doi 10.1051/epjconf/201818608001

New ADS Functionality for the Curator

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Steven McDonald, Taylor J. Shaulis, Sergi Blanco-Cuaresma, Golnaz Shapurian, Timothy W. Hostetler, Matthew R. Templeton

Abstract: In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community… ▽ More In this paper we provide an update concerning the operations of the NASA Astrophysics Data System (ADS), its services and user interface, and the content currently indexed in its database. As the primary information system used by researchers in Astronomy, the ADS aims to provide a comprehensive index of all scholarly resources appearing in the literature. With the current effort in our community to support data and software citations, we discuss what steps the ADS is taking to provide the needed infrastructure in collaboration with publishers and data providers. A new API provides access to the ADS search interface, metrics, and libraries allowing users to programmatically automate discovery and curation tasks. The new ADS interface supports a greater integration of content and services with a variety of partners, including ORCID claiming, indexing of SIMBAD objects, and article graphics from a variety of publishers. Finally, we highlight how librarians can facilitate the ingest of gray literature that they curate into our system. △ Less

Submitted 23 October, 2017; originally announced October 2017.

Comments: Submitted to the Proceedings of Library and Information Services in Astronomy VIII, Strasbourg, France

arXiv:1601.07858 [pdf, ps, other]

Aggregation and Linking of Observational Metadata in the ADS

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Carolyn S. Grant, Donna M. Thompson, Roman Chyla, Alexandra Holachek, Jonathan Elliott

Abstract: We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched b… ▽ More We discuss current efforts behind the curation of observing proposals, archive bibliographies, and data links in the NASA Astrophysics Data System (ADS). The primary data in the ADS is the bibliographic content from scholarly articles in Astronomy and Physics, which ADS aggregates from publishers, arXiv and conference proceeding sites. This core bibliographic information is then further enriched by ADS via the generation of citations and usage data, and through the aggregation of external resources from astronomy data archives and libraries. Important sources of such additional information are the metadata describing observing proposals and high level data products, which, once ingested in ADS, become easily discoverable and citeable by the science community. Bibliographic studies have shown that the integration of links between data archives and the ADS provides greater visibility to data products and increased citations to the literature associated with them. △ Less

Submitted 28 January, 2016; originally announced January 2016.

Comments: 4 pages, Proceedings of the ADASS XXV conference

arXiv:1507.03681 [pdf, other]

Teaching natural deduction in the right order with Natural Deduction Planner

Authors: Jeremy Seligman, Declan Thompson

Abstract: We describe a strategy-based approach to teaching natural deduction using a notation that emphasises the order in which deductions are constructed, together with a {\LaTeX} package and Java app to aid in the production of teaching resources and classroom demonstrations. Our approach is aimed at students with little exposure to mathematical method and has been developed while teaching undergraduate… ▽ More We describe a strategy-based approach to teaching natural deduction using a notation that emphasises the order in which deductions are constructed, together with a {\LaTeX} package and Java app to aid in the production of teaching resources and classroom demonstrations. Our approach is aimed at students with little exposure to mathematical method and has been developed while teaching undergraduate classes for philosophy students over the last ten years. △ Less

Submitted 13 July, 2015; originally announced July 2015.

Comments: Proceedings of the Fourth International Conference on Tools for Teaching Logic (TTL2015), Rennes, France, June 9-12, 2015. Editors: M. Antonia Huertas, João Marcos, María Manzano, Sophie Pinchinat, François Schwarzentruber

ACM Class: K.3.1

arXiv:1503.05881 [pdf, other]

ADS 2.0: new architecture, API and services

Authors: Roman Chyla, Alberto Accomazzi, Alexandra Holachek, Carolyn S. Grant, Jonathan Elliott, Edwin A. Henneken, Donna M. Thompson, Michael J. Kurtz, Stephen S. Murray, Vladimir Sudilovsky

Abstract: The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using… ▽ More The ADS platform is undergoing the biggest rewrite of its 20-year history. While several components have been added to its architecture over the past couple of years, this talk will concentrate on the underpinnings of ADS's search layer and its API. To illustrate the design of the components in the new system, we will show how the new ADS user interface is built exclusively on top of the API using RESTful web services. Taking one step further, we will discuss how we plan to expose the treasure trove of information hosted by ADS (10 million records and fulltext for much of the Astronomy and Physics refereed literature) to partners interested in using this API. This will provide you (and your intelligent applications) with access to ADS's underlying data to enable the extraction of new knowledge and the ingestion of these results back into the ADS. Using this framework, researchers could run controlled experiments with content extraction, machine learning, natural language processing, etc. In this talk, we will discuss what is already implemented, what will be available soon, and where we are going next. △ Less

Submitted 19 March, 2015; originally announced March 2015.

Comments: ADASS Conference 2014

arXiv:1503.04194 [pdf, other]

ADS: The Next Generation Search Platform

Authors: Alberto Accomazzi, Michael J. Kurtz, Edwin A. Henneken, Roman Chyla, James Luker, Carolyn S. Grant, Donna M. Thompson, Alexandra Holachek, Rahul Dave, Stephen S. Murray

Abstract: Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS started to systematically collect, parse… ▽ More Four years after the last LISA meeting, the NASA Astrophysics Data System (ADS) finds itself in the middle of major changes to the infrastructure and contents of its database. In this paper we highlight a number of features of great importance to librarians and discuss the additional functionality that we are currently developing. Starting in 2011, the ADS started to systematically collect, parse and index full-text documents for all the major publications in Physics and Astronomy as well as many smaller Astronomy journals and arXiv e-prints, for a total of over 3.5 million papers. Our citation coverage has doubled since 2010 and now consists of over 70 million citations. We are normalizing the affiliation information in our records and, in collaboration with the CfA library and NASA, we have started collecting and linking funding sources with papers in our system. At the same time, we are undergoing major technology changes in the ADS platform which affect all aspects of the system and its operations. We have rolled out and are now enhancing a new high-performance search engine capable of performing full-text as well as metadata searches using an intuitive query language which supports fielded, unfielded and functional searches. We are currently able to index acknowledgments, affiliations, citations, funding sources, and to the extent that these metadata are available to us they are now searchable under our new platform. The ADS private library system is being enhanced to support reading groups, collaborative editing of lists of papers, tagging, and a variety of privacy settings when managing one's paper collection. While this effort is still ongoing, some of its benefits are already available through the ADS Labs user interface and API at http://adslabs.org/adsabs/ △ Less

Submitted 13 March, 2015; originally announced March 2015.

Comments: Submitted to Library and Information Services in Astronomy VII, Naples, Italy

arXiv:1408.0703 [pdf, other]

Computational Analysis of Perfect-Information Position Auctions

Authors: David R. M Thompson, Kevin Leyton-Brown

Abstract: After experimentation with other designs, the major search engines converged on the weighted, generalized second-price auction (wGSP) for selling keyword advertisements. Notably, this convergence occurred before position auctions were well understood (or, indeed, widely studied) theoretically. While much progress has been made since, theoretical analysis is still not able to settle the question of… ▽ More After experimentation with other designs, the major search engines converged on the weighted, generalized second-price auction (wGSP) for selling keyword advertisements. Notably, this convergence occurred before position auctions were well understood (or, indeed, widely studied) theoretically. While much progress has been made since, theoretical analysis is still not able to settle the question of why search engines found wGSP preferable to other position auctions. We approach this question in a new way, adopting a new analytical paradigm we dub "computational mechanism analysis." By sampling position auction games from a given distribution, encoding them in a computationally efficient representation language, computing their Nash equilibria, and then calculating economic quantities of interest, we can quantitatively answer questions that theoretical methods have not. We considered seven widely studied valuation models from the literature and three position auction variants (generalized first price, unweighted generalized second price, and wGSP). We found that wGSP consistently showed the best ads of any position auction, measured both by social welfare and by relevance (expected number of clicks). Even in models where wGSP was already known to have bad worse-case efficiency, we found that it almost always performed well on average. In contrast, we found that revenue was extremely variable across auction mechanisms, and was highly sensitive to equilibrium selection, the preference model, and the valuation distribution. △ Less

Submitted 4 August, 2014; originally announced August 2014.

arXiv:1406.4542 [pdf, ps, other]

Computing and Using Metrics in the ADS

Authors: Edwin A. Henneken, Alberto Accomazzi, Michael J. Kurtz, Carolyn S. Grant, Donna Thompson, Jay Luker, Roman Chyla, Alexandra Holachek, Stephen S. Murray

Abstract: Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication… ▽ More Finding measures for research impact, be it for individuals, institutions, instruments or projects, has gained a lot of popularity. More papers than ever are being written on new impact measures, and problems with existing measures are being pointed out on a regular basis. Funding agencies require impact statistics in their reports, job candidates incorporate them in their resumes, and publication metrics have even been used in at least one recent court case. To support this need for research impact indicators, the SAO/NASA Astrophysics Data System (ADS) has developed a service which provides a broad overview of various impact measures. In this presentation we discuss how the ADS can be used to quench the thirst for impact measures. We will also discuss a couple of the lesser known indicators in the metrics overview and the main issues to be aware of when compiling publication-based metrics in the ADS, namely author name ambiguity and citation incompleteness. △ Less

Submitted 17 June, 2014; originally announced June 2014.

Comments: to appear in proceedings of LISA VII conference, Naples, Italy

arXiv:1210.0840 [pdf, ps, other]

ADS Labs - Supporting Information Discovery in Science Education

Authors: Edwin A. Henneken, Donna Thompson

Abstract: The SAO/NASA Astrophysics Data System (ADS) is an open access digital library portal for researchers in astronomy and physics, operated by the Smithsonian Astrophysical Observatory (SAO) under a NASA grant, successfully serving the professional science community for two decades. Currently there are about 55,000 frequent users (100+ queries per year), and up to 10 million infrequent users per year.… ▽ More The SAO/NASA Astrophysics Data System (ADS) is an open access digital library portal for researchers in astronomy and physics, operated by the Smithsonian Astrophysical Observatory (SAO) under a NASA grant, successfully serving the professional science community for two decades. Currently there are about 55,000 frequent users (100+ queries per year), and up to 10 million infrequent users per year. Access by the general public now accounts for about half of all ADS use, demonstrating the vast reach of the content in our databases. The visibility and use of content in the ADS can be measured by the fact that there are over 17,000 links from Wikipedia pages to ADS content, a figure comparable to the number of links that Wikipedia has to OCLCs WorldCat catalog. The ADS, through its holdings and innovative techniques available in ADS Labs (http://adslabs.org), offers an environment for information discovery that is unlike any other service currently available to the astrophysics community. Literature discovery and review are important components of science education, aiding the process of preparing for a class, project, or presentation. The ADS has been recognized as a rich source of information for the science education community in astronomy, thanks to its collaborations within the astronomy community, publishers and projects like Com- PADRE. One element that makes the ADS uniquely relevant for the science education community is the availability of powerful tools to explore aspects of the astronomy literature as well as the relationship between topics, people, observations and scientific papers. The other element is the extensive repository of scanned literature, a significant fraction of which consists of historical literature. △ Less

Submitted 2 October, 2012; originally announced October 2012.

Comments: 6 pages, 3 figures, to appear in the proceedings of the 2012 ASP Meeting "Communicating Science", held August 4-8, in Tucson, AZ

arXiv:1005.2308 [pdf, ps, other]

doi 10.1007/978-1-4419-8369-5_14

Finding Your Literature Match -- A Recommender System

Authors: Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi, Carolyn Grant, Donna Thompson, Elizabeth Bohlen, Giovanni Di Milia, Jay Luker, Stephen S. Murray

Abstract: The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still… ▽ More The universe of potentially interesting, searchable literature is expanding continuously. Besides the normal expansion, there is an additional influx of literature because of interdisciplinary boundaries becoming more and more diffuse. Hence, the need for accurate, efficient and intelligent search tools is bigger than ever. Even with a sophisticated search engine, looking for information can still result in overwhelming results. An overload of information has the intrinsic danger of scaring visitors away, and any organization, for-profit or not-for-profit, in the business of providing scholarly information wants to capture and keep the attention of its target audience. Publishers and search engine engineers alike will benefit from a service that is able to provide visitors with recommendations that closely meet their interests. Providing visitors with special deals, new options and highlights may be interesting to a certain degree, but what makes more sense (especially from a commercial point of view) than to let visitors do most of the work by the mere action of making choices? Hiring psychics is not an option, so a technological solution is needed to recommend items that a visitor is likely to be looking for. In this presentation we will introduce such a solution and argue that it is practically feasible to incorporate this approach into a useful addition to any information retrieval system with enough usage. △ Less

Submitted 13 May, 2010; originally announced May 2010.

Comments: Contribution to the proceedings of the colloquium Future Professional Communication in Astronomy II, 13-14 April 2010, Cambridge, Massachusetts. 11 pages, 4 figures.

arXiv:0808.0103 [pdf, ps, other]

doi 10.1016/j.joi.2008.10.001

Use of Astronomical Literature - A Report on Usage Patterns

Authors: Edwin A. Henneken, Michael J. Kurtz, Alberto Accomazzi, Carolyn S. Grant, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways… ▽ More In this paper we present a number of metrics for usage of the SAO/NASA Astrophysics Data System (ADS). Since the ADS is used by the entire astronomical community, these are indicative of how the astronomical literature is used. We will show how the use of the ADS has changed both quantitatively and qualitatively. We will also show that different types of users access the system in different ways. Finally, we show how use of the ADS has evolved over the years in various regions of the world. The ADS is funded by NASA Grant NNG06GG68G. △ Less

Submitted 3 October, 2008; v1 submitted 1 August, 2008; originally announced August 2008.

Comments: 12 pages, 8 figures, 2 tables. Accepted by Journal of Informetrics

arXiv:cs/0701035 [pdf, ps, other]

Finding Astronomical Communities Through Co-readership Analysis

Authors: Edwin A. Henneken, Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: Whenever a large group of people are engaged in an activity, communities will form. The nature of these communities depends on the relationship considered. In the group of people who regularly use scholarly literature, a relationship like ``person i and person j have cited the same paper'' might reveal communities of people working in a particular field. On this poster, we will investigate the r… ▽ More Whenever a large group of people are engaged in an activity, communities will form. The nature of these communities depends on the relationship considered. In the group of people who regularly use scholarly literature, a relationship like ``person i and person j have cited the same paper'' might reveal communities of people working in a particular field. On this poster, we will investigate the relationship ``person i and person j have read the same paper''. Using the data logs of the NASA/Smithsonian Astrophysics Data System (ADS), we first determine the population that will participate by requiring that a user queries the ADS at a certain rate. Next, we apply the relationship to this population. The result of this will be an abstract ``relationship space'', which we will describe in terms of various ``representations''. Examples of such ``representations'' are the projection of co-read vectors onto Principal Components and the spectral density of the co-read network. We will show that the co-read relationship results in structure, we will describe this structure and we will provide a first attempt in the classification of this structure in terms of astronomical communities. The ADS is funded by NASA Grant NNG06GG68G. △ Less

Submitted 5 January, 2007; originally announced January 2007.

Comments: poster presented at the 209th AAS Meeting, 7 pages, 4 figures

arXiv:cs/0610030 [pdf, ps, other]

Paper to Screen: Processing Historical Scans in the ADS

Authors: Donna M. Thompson, Alberto Accomazzi, Guenther Eichhorn, Carolyn Grant, Edwin Henneken, Michael J. Kurtz, Elizabeth Bohlen, Stephen S. Murray

Abstract: The NASA Astrophysics Data System in conjunction with the Wolbach Library at the Harvard-Smithsonian Center for Astrophysics is working on a project to microfilm historical observatory publications. The microfilm is then scanned for inclusion in the ADS. The ADS currently contains over 700,000 scanned pages of volumes of historical literature. Many of these volumes lack clear pagination or other… ▽ More The NASA Astrophysics Data System in conjunction with the Wolbach Library at the Harvard-Smithsonian Center for Astrophysics is working on a project to microfilm historical observatory publications. The microfilm is then scanned for inclusion in the ADS. The ADS currently contains over 700,000 scanned pages of volumes of historical literature. Many of these volumes lack clear pagination or other bibliographic data that are necessary to take advantage of the searching capabilities of the ADS. This paper will address some of the interesting challenges that needed to be resolved during the processing of the Observatory Reports included in the ADS. △ Less

Submitted 5 October, 2006; originally announced October 2006.

Comments: 4 pages; submitted to the proceedings of Library and Information Services in Astronomy; to be published in the ASP Conference Series

arXiv:cs/0610029 [pdf, ps, other]

Data in the ADS -- Understanding How to Use it Better

Authors: Carolyn S. Grant, Alberto Accomazzi, Donna Thompson, Edwin Henneken, Guenther Eichhorn, Michael J. Kurtz, Stephen S. Murray

Abstract: The Smithsonian/NASA ADS Abstract Service contains a wealth of data for astronomers and librarians alike, yet the vast majority of usage consists of rudimentary searches. Hints on how to obtain more focused search results by using more of the various capabilities of the ADS are presented, including searching by affiliation. We also discuss the classification of articles by content and by referee… ▽ More The Smithsonian/NASA ADS Abstract Service contains a wealth of data for astronomers and librarians alike, yet the vast majority of usage consists of rudimentary searches. Hints on how to obtain more focused search results by using more of the various capabilities of the ADS are presented, including searching by affiliation. We also discuss the classification of articles by content and by referee status. The ADS is funded by NASA Grant NNG06GG68G-16613687. △ Less

Submitted 5 October, 2006; originally announced October 2006.

Comments: 4 pages; submitted to the proceedings of the Library and Information Services in Astronomy V; to be published by ASP Conference Proceedings

arXiv:cs/0610011 [pdf, ps, other]

Creation and use of Citations in the ADS

Authors: Alberto Accomazzi, Gunther Eichhorn, Michael J. Kurtz, Carolyn S. Grant, Edwin Henneken, Markus Demleitner, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data u… ▽ More With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data used in the creation of citations, the policies and procedures that we follow to avoid double-counting and to eliminate contributions which may not be scholarly in nature. Finally, we describe how users and institutions can easily obtain quantitative citation data from the ADS, both interactively and via web-based programming tools. The ADS is available at http://ads.harvard.edu. △ Less

Submitted 3 October, 2006; originally announced October 2006.

Comments: 9 pages; to be published in the proceedings of the conference "Library and Information Services V," June 2006, Cambridge, MA, USA

arXiv:cs/0610008 [pdf, ps, other]

Connectivity in the Astronomy Digital Library

Authors: Günther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Edwin A. Henneken, Donna M. Thompson, Michael J. Kurtz, Stephen S. Murray

Abstract: The Astrophysics Data System (ADS) provides an extensive system of links between the literature and other on-line information. Recently, the journals of the American Astronomical Society (AAS) and a group of NASA data centers have collaborated to provide more links between on-line data obtained by space missions and the on-line journals. Authors can now specify which data sets they have used in… ▽ More The Astrophysics Data System (ADS) provides an extensive system of links between the literature and other on-line information. Recently, the journals of the American Astronomical Society (AAS) and a group of NASA data centers have collaborated to provide more links between on-line data obtained by space missions and the on-line journals. Authors can now specify which data sets they have used in their article. This information is used by the participants to provide the links between the literature and the data. The ADS is available at: http://ads.harvard.edu △ Less

Submitted 2 October, 2006; originally announced October 2006.

Comments: To appear in Library and Information Systems in Astronomy V

arXiv:cs/0610007 [pdf, ps, other]

Full Text Searching in the Astrophysics Data System

Authors: Günther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Edwin A. Henneken, Donna M. Thompson, Michael J. Kurtz, Stephen S. Murray

Abstract: The Smithsonian/NASA Astrophysics Data System (ADS) provides a search system for the astronomy and physics scholarly literature. All major and many smaller astronomy journals that were published on paper have been scanned back to volume 1 and are available through the ADS free of charge. All scanned pages have been converted to text and can be searched through the ADS Full Text Search System. In… ▽ More The Smithsonian/NASA Astrophysics Data System (ADS) provides a search system for the astronomy and physics scholarly literature. All major and many smaller astronomy journals that were published on paper have been scanned back to volume 1 and are available through the ADS free of charge. All scanned pages have been converted to text and can be searched through the ADS Full Text Search System. In addition, searches can be fanned out to several external search systems to include the literature published in electronic form. Results from the different search systems are combined into one results list. The ADS Full Text Search System is available at: http://adsabs.harvard.edu/fulltext_service.html △ Less

Submitted 5 October, 2006; v1 submitted 2 October, 2006; originally announced October 2006.

Comments: To appear in Library and Information Systems in Astronomy V

arXiv:cs/0609126 [pdf, ps, other]

doi 10.1087/095315107779490661

E-prints and Journal Articles in Astronomy: a Productive Co-existence

Authors: Edwin A. Henneken, Michael J. Kurtz, Simeon Warner, Paul Ginsparg, Guenther Eichhorn, Alberto Accomazzi, Carolyn S. Grant, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article and the use of e-prints through the NASA Astrophysics Data Sy… ▽ More Are the e-prints (electronic preprints) from the arXiv repository being used instead of the journal articles? In this paper we show that the e-prints have not undermined the usage of journal papers in the astrophysics community. As soon as the journal article is published, the astronomical community prefers to read the journal article and the use of e-prints through the NASA Astrophysics Data System drops to zero. This suggests that the majority of astronomers have access to institutional subscriptions and that they choose to read the journal article when given the choice. Within the NASA Astrophysics Data System they are given this choice, because the e-print and the journal article are treated equally, since both are just one click away. In other words, the e-prints have not undermined journal use in the astrophysics community and thus currently do not pose a financial threat to the publishers. We present readership data for the arXiv category "astro-ph" and the 4 core journals in astronomy (Astrophysical Journal, Astronomical Journal, Monthly Notices of the Royal Astronomical Society and Astronomy & Astrophysics). Furthermore, we show that the half-life (the point where the use of an article drops to half the use of a newly published article) for an e-print is shorter than for a journal paper. The ADS is funded by NASA Grant NNG06GG68G. arXiv receives funding from NSF award #0404553 △ Less

Submitted 22 September, 2006; originally announced September 2006.

Comments: 8 pages, 4 figures, submitted to Learned Publishing

Journal ref: Learn.Publ.20:16-22,2007

arXiv:astro-ph/0609794 [pdf, ps, other]

The Future of Technical Libraries

Authors: Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn Grant, Edwin Henneken, Donna Thompson, Elizabeth Bohlen, Stephen S. Murray

Abstract: Technical libraries are currently experiencing very rapid change. In the near future their mission will change, their physical nature will change, and the skills of their employees will change. While some will not be able to make these changes, and will fail, others will lead us into a new era. Technical libraries are currently experiencing very rapid change. In the near future their mission will change, their physical nature will change, and the skills of their employees will change. While some will not be able to make these changes, and will fail, others will lead us into a new era. △ Less

Submitted 28 September, 2006; originally announced September 2006.

Comments: To appear in Library and Information Systems in Astronomy V

arXiv:cs/0608027 [pdf, ps, other]

myADS-arXiv - a Tailor-Made, Open Access, Virtual Journal

Authors: E. Henneken, M. J. Kurtz, G. Eichhorn, A. Accomazzi, C. S. Grant, D. Thompson, E. Bohlen, S. S. Murray

Abstract: The myADS-arXiv service provides the scientific community with a one stop shop for staying up-to-date with a researcher's field of interest. The service provides a powerful and unique filter on the enormous amount of bibliographic information added to the ADS on a daily basis. It also provides a complete view with the most relevant papers available in the subscriber's field of interest. With thi… ▽ More The myADS-arXiv service provides the scientific community with a one stop shop for staying up-to-date with a researcher's field of interest. The service provides a powerful and unique filter on the enormous amount of bibliographic information added to the ADS on a daily basis. It also provides a complete view with the most relevant papers available in the subscriber's field of interest. With this service, the subscriber will get to know the lastest developments, popular trends and the most important papers. This makes the service not only unique from a technical point of view, but also from a content point of view. On this poster we will argue why myADS-arXiv is a tailor-made, open access, virtual journal and we will illustrate its unique character. △ Less

Submitted 4 August, 2006; originally announced August 2006.

Comments: 4 pages, 2 figures, poster paper to appear in the proceedings of the LISA V conference

arXiv:cs/0604061 [pdf]

Effect of E-printing on Citation Rates in Astronomy and Physics

Authors: Edwin A. Henneken, Michael J. Kurtz, Guenther Eichhorn, Alberto Accomazzi, Carolyn Grant, Donna Thompson, Stephen S. Murray

Abstract: In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a, Metcalfe, 2005). Using the citation statistics from the NASA-Smithsoni… ▽ More In this report we examine the change in citation behavior since the introduction of the arXiv e-print repository (Ginsparg, 2001). It has been observed that papers that initially appear as arXiv e-prints get cited more than papers that do not (Lawrence, 2001; Brody et al., 2004; Schwarz & Kennicutt, 2004; Kurtz et al., 2005a, Metcalfe, 2005). Using the citation statistics from the NASA-Smithsonian Astrophysics Data System (ADS; Kurtz et al., 1993, 2000), we confirm the findings from other studies, we examine the average citation rate to e-printed papers in the Astrophysical Journal, and we show that for a number of major astronomy and physics journals the most important papers are submitted to the arXiv e-print repository first. △ Less

Submitted 5 June, 2006; v1 submitted 13 April, 2006; originally announced April 2006.

Comments: Submitted to the Journal of Electronic Publishing. 11 pages with 5 figures

Showing 1–38 of 38 results for author: Thompson, D