Incorporating molecular and functional context into the analysis and prioritization of human variants associated with cancer

Thomas A Peterson; Nathan L Nehrt; Dohwan Park; Maricel G Kann

doi:10.1136/amiajnl-2011-000655

J Am Med Inform Assoc. 2012 Mar-Apr;19(2):275-83. doi: 10.1136/amiajnl-2011-000655.

Authors

Thomas A Peterson¹, Nathan L Nehrt, Dohwan Park, Maricel G Kann

Affiliation

¹ University of Maryland, Baltimore County, Baltimore, Maryland 21250, USA.

Abstract

Background and objective: With recent breakthroughs in high-throughput sequencing, identifying deleterious mutations is one of the key challenges for personalized medicine. At the gene and protein level, it has proven difficult to determine the impact of previously unknown variants. A statistical method has been developed to assess the significance of disease mutation clusters on protein domains by incorporating domain functional annotations to assist in the functional characterization of novel variants.

Methods: Disease mutations aggregated from multiple databases were mapped to domains, and were classified as either cancer- or non-cancer-related. The statistical method for identifying significantly disease-associated domain positions was applied to both sets of mutations and to randomly generated mutation sets for comparison. To leverage the known function of protein domain regions, the method optionally distributes significant scores to associated functional feature positions.
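The idea of testing whether mutations cluster at individual domain positions can be illustrated with a minimal sketch. The abstract does not specify the statistic behind the DS-score, so the example below assumes a simple stand-in: an upper-tail binomial test against a null in which mutations fall uniformly across the positions of a domain. The function names, the uniform null, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def position_pvalues(mutated_positions, domain_length):
    """Map each mutated domain position to an upper-tail p-value.

    Assumption (not from the paper): under the null, each of the n
    mutations lands on any of the domain_length positions with equal
    probability 1 / domain_length.
    """
    counts = Counter(mutated_positions)
    n = len(mutated_positions)
    p = 1.0 / domain_length
    return {pos: binomial_tail(k, n, p) for pos, k in counts.items()}


# Hypothetical toy data: ten mutations mapped onto a 100-position domain,
# six of them at position 12.
positions = [12, 12, 12, 12, 12, 3, 47, 80, 12, 25]
pvals = position_pvalues(positions, domain_length=100)

# The position with the smallest p-value is the strongest hotspot candidate.
hotspot = min(pvals, key=pvals.get)
```

In practice the p-values would need correction for multiple testing across positions, and the comparison against randomly generated mutation sets described above would serve as an empirical check on the null.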

Results: Most disease mutations are localized within protein domains and display a tendency to cluster at individual domain positions. The method identified significant disease mutation hotspots in both the cancer and non-cancer datasets. The domain significance scores (DS-scores) for cancer form a bimodal distribution: hotspots in oncogenes form a second peak at higher DS-scores than non-cancer hotspots, whereas hotspots in tumor suppressors have scores more similar to those of non-cancer hotspots. In addition, on an independent mutation benchmarking set, the DS-score method identified mutations known to alter protein function with very high precision.

Conclusion: By aggregating mutations with known disease association at the domain level, the method was able to discover domain positions enriched with multiple occurrences of deleterious mutations while incorporating relevant functional annotations. The method can be incorporated into translational bioinformatics tools to characterize rare and novel variants within large-scale sequencing studies.

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Databases, Protein
Disease / genetics
Humans
Mutation*
Neoplasms / genetics*
Protein Structure, Tertiary / genetics*
Proteins / chemistry
Proteins / genetics*

Substances

Proteins
