Genome-wide identification of the essential protein-coding genes and long non-coding RNAs for human pan-cancer

Bioinformatics. 2019 Nov 1;35(21):4344-4349. doi: 10.1093/bioinformatics/btz230.

Abstract

Motivation: The genome-scale CRISPR/Cas9 system has become a democratized gene-editing technique and is widely used to investigate gene functions in various biological processes and diseases, especially cancers. Aiming to characterize gene aberrations and assess their effects on cancer, we designed a pipeline to identify pan-cancer essential genes.

Methods: CRISPR screening data collected from published studies were integrated with the Robust Rank Aggregation (RRA) algorithm to identify essential genes. Then, the hypergeometric test and random walk with restart (RWR) were used to predict additional essential genes on a broader scale. Finally, the expression status and potential roles of these genes were explored using the TCGA portal and regulatory network analysis.
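The RRA integration step can be illustrated with a minimal sketch. It is not the authors' code, and the abstract does not state their ranking metric or parameters, so the following are assumptions: each screen is a list of genes ordered from most to least essential (e.g. most depleted), a gene missing from a screen is treated as ranked last, and the hypothetical helper `rra_scores` computes an RRA-style rho score, i.e. the smallest probability, over k = 1..n, that the k-th best normalized rank would be at least as good under a uniform null, with a Bonferroni-style correction capped at 1.

```python
from math import comb


def order_stat_cdf(x, k, n):
    """P(k-th smallest of n i.i.d. Uniform(0, 1) values <= x).

    This is the Beta(k, n - k + 1) CDF, written as a binomial tail:
    at least k of the n uniforms must fall at or below x.
    """
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_scores(ranked_lists):
    """Hypothetical RRA helper: returns {gene: corrected rho score}.

    Each list orders genes from most to least essential. Smaller scores mean
    a gene sits near the top more consistently than expected by chance.
    """
    n = len(ranked_lists)
    # 1-based position of each gene within each list
    positions = [{g: i + 1 for i, g in enumerate(lst)} for lst in ranked_lists]
    genes = set().union(*positions)
    scores = {}
    for gene in genes:
        # Normalized rank in (0, 1]; a gene missing from a screen is treated
        # as ranked last (1.0), a simplifying assumption.
        norm = sorted(
            pos[gene] / len(lst) if gene in pos else 1.0
            for pos, lst in zip(positions, ranked_lists)
        )
        # rho: smallest order-statistic probability across k = 1..n
        rho = min(order_stat_cdf(r, k, n) for k, r in enumerate(norm, start=1))
        # Bonferroni-style correction over the n order statistics, capped at 1
        scores[gene] = min(1.0, rho * n)
    return scores


# Three toy screens (hypothetical data), genes ordered by depletion
screens = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
]
scores = rra_scores(screens)
for gene in sorted(scores, key=scores.get):
    print(gene, round(scores[gene], 4))
```

Gene A, ranked first or second in every toy screen, receives the lowest score, while genes whose corrected score reaches 1 show no consistent enrichment at the top. The binomial-tail form avoids a SciPy dependency; `scipy.stats.beta.cdf(x, k, n - k + 1)` would give the same probability.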

Results: We collected 926 samples from 10 CRISPR-based screening studies covering 33 cancer types and identified cancer-essential genes comprising 799 protein-coding genes (PCGs) and 97 long non-coding RNAs (lncRNAs). We then constructed a 'bi-colored' network containing both PCGs and lncRNAs and used it, together with the hypergeometric test and RWR, to predict additional essential genes on a broader scale, including 495 PCGs and 280 lncRNAs. After obtaining all essential genes, we further investigated their potential roles in cancer and found that these genes have higher and more stable expression levels and are associated with multiple cancer-associated biological processes and with survival time. Regulatory network analysis detected two intriguing modules of essential genes involved in regulating the cell cycle and ribosome biogenesis in cancer.
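The network-based prediction step can be sketched in the same spirit. The abstract names the hypergeometric test and RWR but not how they were combined or parameterized, so this is an illustration under explicit assumptions: the 'bi-colored' network is a toy undirected graph in which node names prefixed `P` stand for PCGs and `L` for lncRNAs; the hypergeometric test is applied, as one plausible use, to ask whether a candidate's neighbours are enriched for known essential genes; and RWR uses a restart probability of 0.7 chosen only for illustration. All function names and data are hypothetical.

```python
from math import comb

# Toy 'bi-colored' network (hypothetical): P* nodes play the role of PCGs,
# L* nodes the role of lncRNAs. Edges are undirected.
EDGES = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"),
         ("L1", "P1"), ("L1", "P2"), ("L2", "P4")]
SEEDS = {"P1", "P2"}  # known essential genes (toy example)


def build_adjacency(edges):
    """Return {node: set(neighbours)} for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N nodes, K essential, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def neighbour_enrichment_p(gene, adj, essential):
    """Assumed use of the test: are gene's neighbours over-represented
    among known essential genes, relative to the whole network?"""
    nbrs = adj[gene]
    return hypergeom_sf(len(nbrs & essential), len(adj), len(essential), len(nbrs))


def rwr(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Random walk with restart: p <- (1 - restart) * W p + restart * p0,
    with W the column-normalized adjacency and p0 uniform over the seeds.
    Every node from build_adjacency has at least one neighbour."""
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {v: restart * p0[v] for v in adj}
        for u, nbrs in adj.items():
            share = (1 - restart) * p[u] / len(nbrs)  # u spreads its mass evenly
            for v in nbrs:
                nxt[v] += share
        converged = sum(abs(nxt[v] - p[v]) for v in adj) < tol
        p = nxt
        if converged:
            break
    return p


adj = build_adjacency(EDGES)
scores = rwr(adj, SEEDS)
candidates = sorted((v for v in adj if v not in SEEDS), key=scores.get, reverse=True)
for v in candidates:
    print(v, round(scores[v], 4), round(neighbour_enrichment_p(v, adj, SEEDS), 4))
```

On this toy graph L1, linked to both seed genes, is ranked first by RWR and has the smallest enrichment p-value, whereas L2 is reached only through a non-essential PCG. Plain iteration to convergence keeps the sketch dependency-free; a sparse-matrix implementation would be the usual choice for genome-scale networks.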

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Genome-Wide Association Study
  • Humans
  • Neoplasms* / genetics
  • Oncogenes
  • RNA, Long Noncoding

Substances

  • RNA, Long Noncoding