The NYU Data Catalog: a modular, flexible infrastructure for data discovery

J Am Med Inform Assoc. 2023 Sep 25;30(10):1693-1700. doi: 10.1093/jamia/ocad125.

Abstract

Objective: Researchers at New York University (NYU) Grossman School of Medicine asked the Health Sciences Library for help locating large datasets for reuse. In response, the library developed and has since maintained the NYU Data Catalog, a public-facing data catalog that has supported not only faculty acquisition of data but also the dissemination of the products of their research in various ways.

Materials and methods: The current NYU Data Catalog is built upon the Symfony framework with a tailored metadata schema reflecting the scope of faculty research areas. The project team curates new resources, including datasets and supporting software code, and conducts quarterly and annual evaluations to assess user interactions with the NYU Data Catalog and opportunities for growth.

Results: Since its launch in 2015, the NYU Data Catalog has undergone a number of changes prompted by growth in the range of disciplines represented by faculty contributors. The catalog has also used faculty feedback to better support data reuse and researcher collaboration through changes to its schema, its layout, and the visibility of its records.

Discussion: These findings demonstrate the flexibility of data catalogs as a platform for enabling the discovery of disparate sources of data. While not a repository, the NYU Data Catalog is well-positioned to support mandates for data sharing from study sponsors and publishers.

Conclusion: The NYU Data Catalog makes the most of the data that researchers share and can be harnessed as a modular and adaptable platform to promote data sharing as a cultural practice.

Keywords: data discovery; data management; data sharing; information storage and retrieval; medical informatics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Humans
  • Medicine*
  • New York
  • Software*
  • Universities