Fractal-like distributions over the rational numbers in high-throughput biological and clinical data

Sci Rep. 2011;1:191. doi: 10.1038/srep00191. Epub 2011 Dec 13.

Abstract

Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment. Here, the distributions appear as part of the evaluation of the error rate of the sequencing and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and comorbidity as these appear in EHR. The distributions are also relevant to the identification of subclonal populations in tumors and the study of quasi-species and intrahost diversity of viral populations.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Comorbidity
  • Computational Biology*
  • Data Interpretation, Statistical*
  • Fractals
  • Gene Expression Profiling / methods*
  • Gene Expression Regulation
  • Genetic Variation
  • Genome, Human
  • Genomics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Influenza, Human / epidemiology
  • Lymphoma, Large B-Cell, Diffuse / metabolism
  • Medical Records Systems, Computerized
  • Models, Statistical
  • Prevalence
  • Risk Factors
  • Sequence Analysis, DNA / methods