Computational Analysis of RNA-Seq Data from Airway Epithelial Cells for Studying Lung Disease

Methods Mol Biol. 2018:1809:203-235. doi: 10.1007/978-1-4939-8570-8_15.

Abstract

Airway epithelial cells (AECs) play a central role in the pathogenesis of many lung diseases. Consequently, advancements in our understanding of the underlying causes of lung diseases, and the development of novel treatments, depend on continued detailed study of these cells. Generation and analysis of high-throughput gene expression data provide an indispensable tool for carrying out the kind of broad-scale investigations needed to identify the key genes and molecular pathways that regulate, distinguish, and predict distinct pulmonary pathologies. Of the available technologies for generating genome-wide expression data, RNA sequencing (RNA-seq) has emerged as the most powerful. Hence, many researchers are turning to this approach in their studies of lung disease. For the relatively uninitiated, computational analysis of RNA-seq data can be daunting, given the large number of methods and software packages currently available. The aim of this chapter is to provide a broad overview of the major steps involved in processing and analyzing RNA-seq data, with a special focus on methods optimized for data generated from AECs. We take the reader from the point of obtaining sequence reads from the lab to the point of making biological inferences with expression data. Along the way, we discuss the statistical and computational considerations one typically confronts during different phases of analysis and point to key methods, software packages, papers, online guides, and other resources that can facilitate successful RNA-seq analysis.

Keywords: Clustering; Data normalization; Differential expression; Functional enrichment; Gene mapping; Gene quantification; Pathway analysis; Transcript quantification; Transcriptome alignment; WGCNA.

MeSH terms

  • Alveolar Epithelial Cells / cytology*
  • Alveolar Epithelial Cells / metabolism*
  • Computational Biology / methods
  • Data Interpretation, Statistical
  • Gene Expression Profiling* / methods
  • Genetic Variation
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Lung Diseases / genetics*
  • Lung Diseases / metabolism
  • Molecular Sequence Annotation
  • Respiratory Mucosa / cytology*
  • Sequence Analysis, DNA
  • Transcriptome*