Showing 1–20 of 20 results for author: Grossman, R L

  1. arXiv:2404.15475  [pdf, ps, other]

    cs.IR

    An Annotated Glossary for Data Commons, Data Meshes, and Other Data Platforms

    Authors: Robert L. Grossman

    Abstract: Cloud-based data commons, data meshes, data hubs, and other data platforms are important ways to manage, analyze and share data to accelerate research and to support reproducible research. This is an annotated glossary of some of the more common terms used in articles and discussions about these platforms.

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 6 pages

  2. arXiv:2311.05659  [pdf, other]

    cs.LG cs.AI

    Enhancing Instance-Level Image Classification with Set-Level Labels

    Authors: Renyu Zhang, Aly A. Khan, Yuxin Chen, Robert L. Grossman

    Abstract: Instance-level image classification tasks have traditionally relied on single-instance labels to train models, e.g., few-shot learning and transfer learning. However, set-level coarse-grained labels that capture relationships among instances can provide richer information in real-world scenarios. In this paper, we present a novel approach to enhance instance-level image classification by leveragin…

    Submitted 17 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  3. arXiv:2302.02425  [pdf, ps, other]

    q-bio.OT

    Principles and Guidelines for Sharing Biomedical Data for Secondary Use: The University of Chicago Perspective

    Authors: Robert L. Grossman, Maryellen L. Giger, Julie A. Johnson, Jeremy D. Marks, Jessica P. Ridgway, Julian Solway, Walter M. Stadler

    Abstract: Academic medical centers are generating an increasing amount of biomedical data and there is an increasing demand for biomedical data for research purposes by research projects, research consortia, companies, and other third parties. At the same time, as the number of patients grows and the amount of data per patient grows, there is an increasing possibility that some information about some patien…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: 6 pages

  4. arXiv:2207.11167  [pdf, ps, other]

    cs.DC

    Ten Lessons for Data Sharing With a Data Commons

    Authors: Robert L. Grossman

    Abstract: A data commons is a cloud-based data platform with a governance structure that allows a community to manage, analyze and share its data. Data commons provide a research community with the ability to manage and analyze large datasets using the elastic scalability provided by cloud computing and to share data securely and compliantly, and, in this way, accelerate the pace of research. Over the past…

    Submitted 22 July, 2022; originally announced July 2022.

  5. arXiv:2203.05097  [pdf]

    cs.DC

    A Framework for the Interoperability of Cloud Platforms: Towards FAIR Data in SAFE Environments

    Authors: Robert L. Grossman, Rebecca R. Boyles, Brandi N. Davis-Dusenbery, Amanda Haddock, Allison P. Heath, Brian D. O'Connor, Adam C. Resnick, Deanne M. Taylor, Stan Ahalt

    Abstract: As the number of cloud platforms supporting scientific research grows, there is an increasing need to support interoperability between two or more cloud platforms, as a growing amount of data is being hosted in cloud-based platforms. A well-accepted core concept is to make data in cloud platforms Findable, Accessible, Interoperable and Reusable (FAIR). We introduce a companion concept that applies…

    Submitted 15 February, 2024; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 16 pages with 2 figures

    ACM Class: D.2.11; D.2.12; E.0

  6. arXiv:2112.13737  [pdf, other]

    cs.LG cs.AI

    Scalable Batch-Mode Deep Bayesian Active Learning via Equivalence Class Annealing

    Authors: Renyu Zhang, Aly A. Khan, Robert L. Grossman, Yuxin Chen

    Abstract: Active learning has demonstrated data efficiency in many fields. Existing active learning algorithms, especially in the context of batch-mode deep Bayesian active models, rely heavily on the quality of uncertainty estimations of the model, and are often challenging to scale to large batches. In this paper, we propose Batch-BALanCe, a scalable batch-mode active learning algorithm, which combines in…

    Submitted 20 February, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

  7. arXiv:2007.09526  [pdf, ps, other]

    math.RA math.DS

    The realization of input-output maps using bialgebras

    Authors: Robert L. Grossman, Richard G. Larson

    Abstract: We use the theory of bialgebras to provide the algebraic background for state space realization theorems for input-output maps of control systems. This allows us to consider from a common viewpoint classical results about formal state space realizations of nonlinear systems and more recent results involving analysis related to families of trees. If $H$ is a bialgebra, we say that $p \in H^*$ is di…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 16 pages

    MSC Class: 93B15 (Primary) 16T10; 93C10; 05C05 (Secondary)

  8. arXiv:1809.01699  [pdf]

    q-bio.GN cs.CY

    Data Lakes, Clouds and Commons: A Review of Platforms for Analyzing and Sharing Genomic Data

    Authors: Robert L. Grossman

    Abstract: Data commons collate data with cloud computing infrastructure and commonly used software services, tools and applications to create biomedical resources for the large-scale management, analysis, harmonization, and sharing of biomedical data. Over the past few years, data commons have been used to analyze, harmonize and share large-scale genomics datasets. Data ecosystems can be built by interopera…

    Submitted 24 December, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

    Comments: 28 pages, 4 figures

  9. arXiv:1703.01692  [pdf, other]

    stat.ME

    Detecting Spatial Patterns of Disease in Large Collections of Electronic Medical Records Using Neighbor-Based Bootstrapping (NB2)

    Authors: Maria T Patterson, Robert L Grossman

    Abstract: We introduce a method called neighbor-based bootstrapping (NB2) that can be used to quantify the geospatial variation of a variable. We applied this method to an analysis of the incidence rates of disease from electronic medical record data (ICD-9 codes) for approximately 100 million individuals in the US over a period of 8 years. We considered the incidence rate of disease in each county and its…

    Submitted 5 March, 2017; originally announced March 2017.

  10. arXiv:1604.02608  [pdf, other]

    cs.CY cs.DC

    A Case for Data Commons: Towards Data Science as a Service

    Authors: Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson, Walt Wells

    Abstract: As the amount of scientific data continues to grow at ever faster rates, the research community is increasingly in need of flexible computational infrastructure that can support the entirety of the data science lifecycle, including long-term data storage, data exploration and discovery services, and compute capabilities to support data analysis and re-analysis, as new data are added and as scienti…

    Submitted 9 April, 2016; originally announced April 2016.

  11. arXiv:1601.00323  [pdf, other]

    cs.CE

    The Design of a Community Science Cloud: The Open Science Data Cloud Perspective

    Authors: Robert L. Grossman, Matthew Greenway, Allison P. Heath, Ray Powell, Rafael D. Suarez, Walt Wells, Kevin White, Malcolm Atkinson, Iraklis Klampanos, Heidi L. Alvarez, Christine Harvey, Joe J. Mambretti

    Abstract: In this paper, we describe the design and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers connected by 10G networks. We discuss som…

    Submitted 3 January, 2016; originally announced January 2016.

    Comments: 12 pages, 3 figures

  12. arXiv:1007.1261  [pdf, other]

    cs.DC

    MalStone: Towards A Benchmark for Analytics on Large Data Clouds

    Authors: Collin Bennett, Robert L. Grossman, David Locke, Jonathan Seidman, Steve Vejcik

    Abstract: Developing data mining algorithms that are suitable for cloud computing platforms is currently an active area of research, as is developing cloud computing platforms appropriate for data mining. Currently, the most common benchmarks for cloud computing are the Terasort (and related) benchmarks. Although the Terasort Benchmark is quite useful, it was not designed for data mining per se. In this paper…

    Submitted 7 July, 2010; originally announced July 2010.

  13. arXiv:0901.2735  [pdf, ps, other]

    stat.ML math.RA math.ST

    State Space Realization Theorems For Data Mining

    Authors: Robert L Grossman, Richard G Larson

    Abstract: In this paper, we consider formal series associated with events, profiles derived from events, and statistical models that make predictions about events. We prove theorems about realizations for these formal series using the language and tools of Hopf algebras.

    Submitted 18 January, 2009; originally announced January 2009.

    MSC Class: 62A01; 16W30

  14. arXiv:0809.1181  [pdf]

    cs.DC

    Sector and Sphere: Towards Simplified Storage and Processing of Large Scale Distributed Data

    Authors: Yunhong Gu, Robert L Grossman

    Abstract: Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. In contrast to existing storage and compute clouds, Sector can manage data not only within a data center, but also ac…

    Submitted 16 January, 2009; v1 submitted 6 September, 2008; originally announced September 2008.

  15. arXiv:0808.3019  [pdf, other]

    cs.DC

    Data Mining Using High Performance Data Clouds: Experimental Studies Using Sector and Sphere

    Authors: Robert L Grossman, Yunhong Gu

    Abstract: We describe the design and implementation of a high-performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastructure that provides resources and/or services over the Internet. A storage cloud provides storage services, while a compute cloud provides compute services. We describe the design of the Sector storage cloud and how it p…

    Submitted 21 August, 2008; originally announced August 2008.

  16. arXiv:0808.1802  [pdf, other]

    cs.DC

    Compute and Storage Clouds Using Wide Area High Performance Networks

    Authors: Robert L. Grossman, Yunhong Gu, Michael Sabala, Wanzhi Zhang

    Abstract: We describe a cloud-based infrastructure that we have developed that is optimized for wide-area, high-performance networks and designed to support data mining applications. The infrastructure consists of a storage cloud called Sector and a compute cloud called Sphere. We describe two applications that we have built using the cloud and some experimental studies.

    Submitted 13 August, 2008; originally announced August 2008.

  17. arXiv:0711.3877  [pdf, ps, other]

    math.RA math.CO

    Hopf-algebraic structures of families of trees

    Authors: R. L. Grossman, R. G. Larson

    Abstract: Description of cocommutative Hopf algebras associated with families of trees. Applications include Cayley's theorem on the number of rooted trees with n nodes, and Catalan's theorem on the number of rooted ordered trees with n nodes.

    Submitted 24 November, 2007; originally announced November 2007.

    Comments: 29 pages

    MSC Class: 16W30

    Journal ref: J. Algebra, 126 (1989), 184-210

  18. arXiv:0711.3875  [pdf, ps, other]

    math.RA math.CO

    An Overview of Hopf Algebras of Trees and Their Actions on Functions

    Authors: Robert L. Grossman, Richard G. Larson

    Abstract: We provide an expository account of some of the Hopf algebras that can be defined using trees, labeled trees, ordered trees and heap ordered trees. We also describe some actions of these Hopf algebras on algebras of functions.

    Submitted 24 November, 2007; originally announced November 2007.

    MSC Class: 16W30; 05C05

  19. arXiv:0706.1327  [pdf, ps, other]

    math.RA

    Hopf Algebras of Heap Ordered Trees and Permutations

    Authors: R. L. Grossman, R. G. Larson

    Abstract: It is known that there is a Hopf algebra structure on the vector space with basis all heap-ordered trees. We give a new bialgebra structure on the space with basis all permutations and show that there is a direct bialgebra isomorphism between the Hopf algebra of heap-ordered trees and the bialgebra of permutations.

    Submitted 14 November, 2007; v1 submitted 9 June, 2007; originally announced June 2007.

    Comments: 10 pages LaTeX, minor revision

    MSC Class: 16W30

  20. arXiv:math/0409006  [pdf, ps, other]

    math.QA

    Differential Algebra Structures on Families of Trees

    Authors: Robert L Grossman, Richard G Larson

    Abstract: It is known that the vector space spanned by labeled rooted trees forms a Hopf algebra. Let k be a field and let R be a commutative k-algebra. Let H denote the Hopf algebra of rooted trees labeled using derivations D in Der(R). In this paper, we introduce a construction which gives R an H-module algebra structure and show this induces a differential algebra structure of H acting on R. The work he…

    Submitted 31 August, 2004; originally announced September 2004.

    Comments: 31 pages, 8 figures
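The counting results named in entries 17 and 19 (Cayley's and Catalan's theorems on rooted and rooted ordered trees, and the correspondence between heap-ordered trees and permutations) can be checked numerically. The sketch below is illustrative only and is not taken from any of the listed papers. It assumes that "Cayley's theorem" refers to the count of unlabeled rooted trees (OEIS A000081), and it reads heap-ordered trees as rooted trees whose n non-root nodes are labeled 1..n so that labels increase away from the root. All function names are hypothetical, and `tree_to_permutation` is a standard textbook insertion bijection, not the bialgebra isomorphism of arXiv:0706.1327.

```python
from itertools import permutations, product
from math import comb


def rooted_trees(n_max):
    """Counts of unlabeled rooted trees with 1..n_max nodes (OEIS A000081),
    using the classical recurrence
    a(n+1) = (1/n) * sum_{k=1..n} (sum_{d | k} d * a(d)) * a(n-k+1)."""
    a = [0, 1]  # a[0] is unused; a[1] = 1 (the single-node tree)
    for n in range(1, n_max):
        total = 0
        for k in range(1, n + 1):
            s = sum(d * a[d] for d in range(1, k + 1) if k % d == 0)
            total += s * a[n - k + 1]
        a.append(total // n)  # the sum is always divisible by n
    return a[1:]


def ordered_trees(n_max):
    """Counts of rooted ordered (plane) trees with 1..n_max nodes: a tree on
    n nodes is a root plus an ordered forest on n - 1 nodes."""
    forest = [1]  # forest[m]: ordered forests with m nodes (empty forest = 1)
    trees = [0]   # trees[n]: ordered trees with n nodes
    for n in range(1, n_max + 1):
        trees.append(forest[n - 1])
        # The first tree of a forest on n nodes has k nodes; the rest is a forest.
        forest.append(sum(trees[k] * forest[n - k] for k in range(1, n + 1)))
    return trees[1:]


def heap_ordered_trees(n):
    """Heap-ordered trees on root 0 and nodes 1..n (assumed definition),
    encoded as parent tuples: node i picks a parent in {0, ..., i-1}, so
    labels increase away from the root. There are exactly n! such tuples."""
    return list(product(*(range(i) for i in range(1, n + 1))))


def tree_to_permutation(parents):
    """Insertion encoding (a textbook bijection, not the paper's isomorphism):
    insert label i at index parents[i-1] of a growing list."""
    perm = []
    for i, p in enumerate(parents, start=1):
        perm.insert(p, i)  # 0 <= p <= i - 1 == len(perm), so p is always valid
    return tuple(perm)


if __name__ == "__main__":
    print(rooted_trees(8))   # [1, 1, 2, 4, 9, 20, 48, 115]
    print(ordered_trees(6))  # [1, 1, 2, 5, 14, 42]
    print([comb(2 * m, m) // (m + 1) for m in range(6)])  # Catalan: same list
    trees = heap_ordered_trees(4)
    images = {tree_to_permutation(t) for t in trees}
    print(len(trees), images == set(permutations(range(1, 5))))  # 24 True
```

Under the assumed definition, the n! count is immediate from the parent encoding (node i has i choices of parent), which is why a dimension-preserving correspondence with permutations is possible at all. The isomorphism in entry 19 additionally respects the bialgebra structure, which this sketch does not attempt.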