Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome

Daniall Masood; Luyao Ren; Cu Nguyen; Francesco G Brundu; Lily Zheng; Yongmei Zhao; Erich Jaeger; Yong Li; Seong Won Cha; Aaron Halpern; Sean Truong; Michael Virata; Chunhua Yan; Qingrong Chen; Andy Pang; Reyes Alberto; Chunlin Xiao; Zhaowei Yang; Wanqiu Chen; Charles Wang; Frank Cross Jr; Severine Catreux; Leming Shi; Julia A Beaver; Wenming Xiao; Daoud M Meerzaman

doi:10.1186/s13059-024-03294-8

Genome Biol. 2024 Jun 20;25(1):163. doi: 10.1186/s13059-024-03294-8.

Authors

Daniall Masood^#¹, Luyao Ren^#², Cu Nguyen^#³, Francesco G Brundu^#⁴, Lily Zheng^#¹, Yongmei Zhao⁵, Erich Jaeger⁴, Yong Li⁴, Seong Won Cha⁴, Aaron Halpern⁴, Sean Truong⁴, Michael Virata⁴, Chunhua Yan³, Qingrong Chen³, Andy Pang⁶, Reyes Alberto⁶, Chunlin Xiao⁷, Zhaowei Yang⁸, Wanqiu Chen⁸, Charles Wang⁸, Frank Cross Jr¹, Severine Catreux⁴, Leming Shi², Julia A Beaver¹,⁹, Wenming Xiao¹⁰, Daoud M Meerzaman¹¹

Affiliations

¹ Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA.
² State Key Laboratory of Genetic Engineering, Human Phenome Institute, School of Life Sciences and Shanghai Cancer Center, Fudan University, Shanghai, China.
³ Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA.
⁴ Illumina Inc., San Diego, CA, USA.
⁵ Sequencing Facility Bioinformatics Group, Biomedical Informatics and Data Science Directorate, Frederick National Laboratory for Cancer Research, Frederick, MD, USA.
⁶ Bionano Genomics, San Diego, CA, 20892, USA.
⁷ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA.
⁸ Center for Genomics, Loma Linda University School of Medicine, Loma Linda, CA, USA.
⁹ Oncology Center of Excellence, Food and Drug Administration, Silver Spring, MD, USA.
¹⁰ Office of Oncologic Diseases, Office of New Drug, Center for Drug Evaluation and Research, Food and Drug Administration, 10903 New Hampshire Avenue, Silver Spring, 20993, USA.
¹¹ Computational Genomics and Bioinformatics Branch, Center for Biomedical Informatics and Information Technology (CBIIT), National Cancer Institute, Rockville, MD, USA.

^# Contributed equally.

Abstract

Background: Copy number variation (CNV) is a key genetic characteristic for cancer diagnostics and can be used as a biomarker for the selection of therapeutic treatments. Using data sets established in our previous study, we benchmark the detection accuracy, sensitivity, and reproducibility of cancer CNV calling by six of the most recent and commonly used software tools. By comparison with orthogonal methods, such as microarray and Bionano, we also explore the consistency of CNV calling across different technologies on a challenging genome.

Results: While consistent results are observed for copy gain, loss, and loss of heterozygosity (LOH) calls across sequencing centers, CNV callers, and different technologies, variation in CNV calls is mostly affected by the determination of genome ploidy. Using consensus results from six CNV callers and confirmation from three orthogonal methods, we establish a high-confidence CNV call set for the reference cancer cell line (HCC1395).
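
To illustrate the general idea of a consensus across callers, the following is a minimal sketch and not the procedure used in the study. It assumes each caller reports one state per genomic bin and keeps a state only where a minimum number of callers agree. The caller names, the per-bin representation, and the `min_agree` threshold are hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_agree):
    """Return the per-bin consensus state, or None where too few callers agree.

    calls_by_caller maps a (hypothetical) caller name to a list of states,
    one per genomic bin; all lists are assumed to have the same length.
    """
    n_bins = len(next(iter(calls_by_caller.values())))
    consensus = []
    for i in range(n_bins):
        # Count how many callers report each state for bin i.
        votes = Counter(calls[i] for calls in calls_by_caller.values())
        state, count = votes.most_common(1)[0]
        consensus.append(state if count >= min_agree else None)
    return consensus


# Illustrative input only; these are not results from the study.
example = {
    "caller_a": ["gain", "gain", "loss", "neutral"],
    "caller_b": ["gain", "neutral", "loss", "neutral"],
    "caller_c": ["gain", "loss", "loss", "gain"],
}
result = consensus_calls(example, min_agree=2)
print(result)
```

Bins where no state reaches the threshold are left as `None`, which mirrors the idea of excluding regions without agreement from a high-confidence set.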

Conclusions: NGS technologies and current bioinformatics tools can offer reliable results for detection of copy gain, loss, and LOH. However, when working with a hyper-diploid genome, some software tools can call excessive copy gain or loss due to inaccurate assessment of genome ploidy. With performance matrices on various experimental conditions, this study raises awareness within the cancer research community for the selection of sequencing platforms, sample preparation, sequencing coverage, and the choice of CNV detection tools.
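
The dependence of gain and loss calls on the assumed ploidy can be shown with a small sketch. It assumes that a segment is labeled by comparing its absolute copy number with a single baseline value; the function name, the segment values, and the baselines of 2 and 3 are illustrative assumptions, not values or methods taken from the study.

```python
def classify_segments(copy_numbers, baseline):
    """Label each segment as gain, loss, or neutral relative to a baseline ploidy."""
    labels = []
    for cn in copy_numbers:
        if cn > baseline:
            labels.append("gain")
        elif cn < baseline:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Hypothetical absolute copy numbers for four segments.
segments = [2, 3, 4, 1]

# The same segments classified against a diploid baseline and a higher baseline.
diploid_labels = classify_segments(segments, baseline=2)
higher_labels = classify_segments(segments, baseline=3)
print(diploid_labels)
print(higher_labels)
```

With identical copy numbers, a segment at two copies is neutral against a baseline of 2 but a loss against a baseline of 3, so a caller that misjudges ploidy shifts many labels at once.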

Keywords: Accuracy; Bioinformatics tools; Cancer genome; Consistency; Copy number variation; Detection sensitivity; Genome ploidy; Next-generation sequencing; Reproducibility.

Publication types

Evaluation Study
Research Support, N.I.H., Intramural
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Cell Line, Tumor
Computational Biology* / methods
DNA Copy Number Variations*
Diploidy
Genome, Human
High-Throughput Nucleotide Sequencing* / methods
Humans
Loss of Heterozygosity*
Neoplasms* / genetics
Reproducibility of Results
Sequence Analysis, DNA / methods
Software*
