Storing sparse and repeated data in multivariate Markovian models of tuberculosis spread

Comput Biomed Res. 1996 Apr;29(2):85-92. doi: 10.1006/cbmr.1996.0009.

Abstract

Through the use of appropriate sparse storage techniques, we were able to reduce memory usage in a multivariate Markovian model for the spread of tuberculosis in the United States through the year 2010. A straightforward software implementation of the model would have required approximately 2.5 × 10^9 bytes of storage for the population of each year being modeled and approximately 1.3 × 10^14 bytes of storage for each year-to-year set of transition probabilities. We were able to reduce memory usage in the model by 96% for cross-sectional population data and over 99.9% for transition probability data. Data structure initialization time for population data was increased by a factor of 16.48 and lookup time for population data was increased by a factor of 11.3 over times required for an array implementation. For transition data the initialization and lookup times were increased by negligible factors. This work was done under contract from the Centers for Disease Control and the Association of Teachers of Preventive Medicine.
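
The abstract does not say which sparse storage techniques were used, so the following Python sketch is only an illustration of two generic ideas that fit the title: storing only the non-zero cells of a multi-dimensional population table, and storing each distinct row of transition probabilities once when many states share it. The class names, the three-dimensional shape, and the state labels are hypothetical and are not taken from the paper.

```python
import math


class SparsePopulation:
    """Keeps only the non-zero cells of a multi-dimensional population table."""

    def __init__(self, shape):
        self.shape = tuple(shape)   # sizes of the (hypothetical) model dimensions
        self._cells = {}            # index tuple -> count, non-zero cells only

    def _check(self, key):
        if len(key) != len(self.shape) or any(
            not 0 <= k < n for k, n in zip(key, self.shape)
        ):
            raise IndexError(f"index {key} is outside shape {self.shape}")

    def __setitem__(self, key, count):
        self._check(key)
        if count == 0:
            self._cells.pop(key, None)   # zero cells are never stored
        else:
            self._cells[key] = count

    def __getitem__(self, key):
        self._check(key)
        return self._cells.get(key, 0.0)  # a missing cell reads as zero

    def stored_cells(self):
        return len(self._cells)

    def dense_cells(self):
        return math.prod(self.shape)      # cells a plain array would allocate


class SharedTransitionRows:
    """Stores each distinct row of transition probabilities only once."""

    def __init__(self):
        self._unique = {}   # canonical row -> the single stored copy
        self._row_of = {}   # source state -> shared canonical row

    def set_row(self, state, row):
        # A row is a dict {target_state: probability} holding non-zero entries.
        key = tuple(sorted(row.items()))
        self._row_of[state] = self._unique.setdefault(key, key)

    def prob(self, src, dst):
        for target, p in self._row_of.get(src, ()):
            if target == dst:
                return p
        return 0.0

    def unique_rows(self):
        return len(self._unique)


# Hypothetical usage: a 3 x 2 x 4 table with one occupied cell.
pop = SparsePopulation((3, 2, 4))
pop[1, 0, 2] = 120.0

# Two source states that share the same outgoing probabilities.
rows = SharedTransitionRows()
rows.set_row("a", {"a": 0.9, "b": 0.1})
rows.set_row("c", {"a": 0.9, "b": 0.1})
```

A dictionary lookup costs more than direct array indexing, which is consistent in direction with the slower initialization and lookup times the abstract reports for population data, although the factors quoted there belong to the authors' implementation and not to this sketch. Taking the abstract's figures at face value, a 96% reduction of 2.5 × 10^9 bytes leaves roughly 1 × 10^8 bytes per modeled year, and a reduction of more than 99.9% of 1.3 × 10^14 bytes leaves less than 1.3 × 10^11 bytes per set of transition probabilities.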

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Centers for Disease Control and Prevention, U.S.
  • Cross-Sectional Studies
  • Humans
  • Information Storage and Retrieval*
  • Markov Chains*
  • Models, Statistical
  • Multivariate Analysis
  • Population Surveillance
  • Preventive Medicine
  • Probability
  • Software
  • Tuberculosis, Pulmonary / epidemiology*
  • United States / epidemiology