-
Virtual Observatory Publishing with DaCHS
Authors:
Markus Demleitner,
Margarida Castro Neves,
Florian Rothmaier,
Joachim Wambsganss
Abstract:
The Data Center Helper Suite DaCHS is an integrated publication package for building Virtual Observatory (VO) and Web services, supporting the entire workflow from ingestion to data mapping to service definition. It implements all major data discovery, data access, and registry protocols defined by the VO. DaCHS in this sense works as glue between data produced by the data providers and the standard protocols and formats defined by the VO. This paper discusses central elements of the design of the package and gives two case studies of how VO protocols are implemented using DaCHS' concepts.
Submitted 25 August, 2014;
originally announced August 2014.
-
The Virtual Observatory Registry
Authors:
Markus Demleitner,
Gretchen Greene,
Pierre Le Sidaner,
Raymond L. Plante
Abstract:
In the Virtual Observatory (VO), the Registry provides the mechanism with which users and applications discover and select resources -- typically, data and services -- that are relevant for a particular scientific problem. Even though the VO adopted technologies in particular from the bibliographic community where available, building the Registry system involved a major standardisation effort, involving about a dozen interdependent standard texts. This paper discusses the server-side aspects of the standards and their application, as regards the functional components (registries), the resource records in both format and content, the exchange of resource records between registries (harvesting), as well as the creation and management of the identifiers used in the system based on the notion of authorities. Registry record authors, registry operators or even advanced users thus receive a big picture serving as a guideline through the body of relevant standard texts. To complete this picture, we also mention common usage patterns and open issues as appropriate.
Submitted 11 July, 2014;
originally announced July 2014.
-
IVOA Recommendation: DALI: Data Access Layer Interface Version 1.0
Authors:
Patrick Dowler,
Markus Demleitner,
Mark Taylor,
Doug Tody
Abstract:
This document describes the Data Access Layer Interface (DALI). DALI defines the base web service interface common to all Data Access Layer (DAL) services. This standard defines the behaviour of common resources, the meaning and use of common parameters, success and error responses, and DAL service registration. The goal of this specification is to define the common elements that are shared across DAL services in order to foster consistency across concrete DAL service specifications and to enable standard re-usable client and service implementations and libraries to be written and widely adopted.
Submitted 19 February, 2014;
originally announced February 2014.
-
IVOA Recommendation: TAPRegExt: a VOResource Schema Extension for Describing TAP Services
Authors:
Markus Demleitner,
Patrick Dowler,
Ray Plante,
Guy Rixon,
Mark Taylor
Abstract:
This document describes an XML encoding standard for metadata about services implementing the table access protocol TAP [TAP], referred to as TAPRegExt. Instance documents are part of the service's registry record or can be obtained from the service itself. They deliver information to both humans and software on the languages, output formats, and upload methods supported by the service, as well as data models implemented by the exposed tables, optional language features, and certain limits enforced by the service.
Submitted 19 February, 2014;
originally announced February 2014.
-
The Bibliometric Properties of Article Readership Information
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray,
Nathalie Martimbeau,
Barbara Elwell
Abstract:
The NASA Astrophysics Data System (ADS), along with astronomy's journals and data centers (a collaboration dubbed URANIA), has developed a distributed on-line digital library which has become the dominant means by which astronomers search, access and read their technical literature. Digital libraries such as the NASA Astrophysics Data System permit the easy accumulation of a new type of bibliometric measure, the number of electronic accesses ("reads") of individual articles. We explore various aspects of this new measure. We examine the obsolescence function as measured by actual reads, and show that it can be well fit by the sum of four exponentials with very different time constants. We compare the obsolescence function as measured by readership with the obsolescence function as measured by citations. We find that the citation function is proportional to the sum of two of the components of the readership function. This proves that the normative theory of citation is true in the mean. We further examine in detail the similarities and differences between the citation rate, the readership rate and the total citations for individual articles, and discuss some of the causes. Using the number of reads as a bibliometric measure for individuals, we introduce the read-cite diagram to provide a two-dimensional view of an individual's scientific productivity. We develop a simple model to account for an individual's reads and cites and use it to show that the position of a person in the read-cite diagram is a function of age, innate productivity, and work history. We show the age biases of both reads and cites, and develop two new bibliometric measures which have substantially less age bias than citations.
Submitted 25 September, 2009;
originally announced September 2009.
-
Worldwide Use and Impact of the NASA Astrophysics Data System Digital Library
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
By combining data from the text, citation, and reference databases with data from the ADS readership logs we have been able to create Second Order Bibliometric Operators, a customizable class of collaborative filters which permits substantially improved accuracy in literature queries.
Using the ADS usage logs along with membership statistics from the International Astronomical Union and data on the population and gross domestic product (GDP) we develop an accurate model for world-wide basic research where the number of scientists in a country is proportional to the GDP of that country, and the amount of basic research done by a country is proportional to the number of scientists in that country times that country's per capita GDP.
We introduce the concept of utility time to measure the impact of the ADS/URANIA and the electronic astronomical library on astronomical research. We find that in 2002 it amounted to the equivalent of 736 FTE researchers, or $250 Million, or the astronomical research done in France.
Subject headings: digital libraries; bibliometrics; sociology of science; information retrieval
Submitted 25 September, 2009;
originally announced September 2009.
-
Creation and use of Citations in the ADS
Authors:
Alberto Accomazzi,
Gunther Eichhorn,
Michael J. Kurtz,
Carolyn S. Grant,
Edwin Henneken,
Markus Demleitner,
Donna Thompson,
Elizabeth Bohlen,
Stephen S. Murray
Abstract:
With over 20 million records, the ADS citation database is regularly used by researchers and librarians to measure the scientific impact of individuals, groups, and institutions. In addition to the traditional sources of citations, the ADS has recently added references extracted from the arXiv e-prints on a nightly basis. We review the procedures used to harvest and identify the reference data used in the creation of citations, the policies and procedures that we follow to avoid double-counting and to eliminate contributions which may not be scholarly in nature. Finally, we describe how users and institutions can easily obtain quantitative citation data from the ADS, both interactively and via web-based programming tools.
The ADS is available at http://ads.harvard.edu.
Submitted 3 October, 2006;
originally announced October 2006.
-
Bibliographic Classification using the ADS Databases
Authors:
Alberto Accomazzi,
Michael J. Kurtz,
Guenther Eichhorn,
Edwin Henneken,
Carolyn S. Grant,
Markus Demleitner,
Stephen S. Murray
Abstract:
We discuss two techniques used to characterize bibliographic records based on their similarity to and relationship with the contents of the NASA Astrophysics Data System (ADS) databases. The first method has been used to classify input text as being relevant to one or more subject areas based on an analysis of the frequency distribution of its individual words. The second method has been used to classify existing records as being relevant to one or more databases based on the distribution of the papers citing them. Both techniques have proven to be valuable tools in assigning new and existing bibliographic records to different disciplines within the ADS databases.
Submitted 31 October, 2005;
originally announced November 2005.
-
The Effect of Use and Access on Citations
Authors:
Michael J. Kurtz,
Guenther Eichhorn,
Alberto Accomazzi,
Carolyn Grant,
Markus Demleitner,
Edwin Henneken,
Stephen S. Murray
Abstract:
It has been shown (S. Lawrence, 2001, Nature, 411, 521) that journal articles which have been posted without charge on the internet are more heavily cited than those which have not been. Using data from the NASA Astrophysics Data System (ads.harvard.edu) and from the ArXiv e-print archive at Cornell University (arXiv.org) we examine the causes of this effect.
Submitted 14 March, 2005;
originally announced March 2005.
-
Automated Resolution of Noisy Bibliographic References
Authors:
Markus Demleitner,
Michael Kurtz,
Alberto Accomazzi,
Günther Eichhorn,
Carolyn S. Grant,
Stephen S. Murray
Abstract:
We describe a system used by the NASA Astrophysics Data System to identify bibliographic references obtained from scanned article pages by OCR methods with records in a bibliographic database. We analyze the process generating the noisy references and conclude that the three-step procedure of correcting the OCR results, parsing the corrected string and matching it against the database provides unsatisfactory results. Instead, we propose a method that allows a controlled merging of correction, parsing and matching, inspired by dependency grammars. We also report on the effectiveness of various heuristics that we have employed to improve recall.
Submitted 27 January, 2004;
originally announced January 2004.