Introductory Analysis and Validation of CUT&RUN Sequencing Data

Junwoo Lee; Biji Chatterjee; Nakyung Oh; Dhurjhoti Saha; Yue Lu; Blaine Bartholomew; Charles A Ishak

doi:10.3791/67359

J Vis Exp. 2024 Dec 13:(214). doi: 10.3791/67359.

Authors

Junwoo Lee¹, Biji Chatterjee², Nakyung Oh¹, Dhurjhoti Saha¹, Yue Lu¹, Blaine Bartholomew¹, Charles A Ishak³

Affiliations

¹ Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center.
² Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center; Department of Genomic Medicine, University of Texas MD Anderson Cancer Center.
³ Department of Epigenetics and Molecular Carcinogenesis, University of Texas MD Anderson Cancer Center; Department of Gynecologic Oncology and Reproductive Medicine, University of Texas MD Anderson Cancer Center.

PMID: 39760380
DOI: 10.3791/67359

Abstract

The CUT&RUN technique detects protein-DNA interactions across the genome. Typical applications include profiling changes in histone tail modifications and mapping transcription factor chromatin occupancy. Widespread adoption of CUT&RUN is driven, in part, by technical advantages over conventional ChIP-seq: lower cell input requirements, lower sequencing depth requirements, and increased sensitivity with reduced background signal due to the absence of cross-linking agents that otherwise mask antibody epitopes. Adoption has also been aided by the generous sharing of reagents by the Henikoff lab and by the development of commercial kits that help beginners get started. As technical adoption of CUT&RUN increases, sequencing analysis and validation become critical bottlenecks that must be overcome before predominantly wet lab teams can adopt the technique fully. CUT&RUN analysis typically begins with quality control checks on raw sequencing reads to assess sequencing depth, read quality, and potential biases. Reads are then aligned to a reference genome assembly, and several bioinformatics tools are subsequently used to annotate genomic regions of protein enrichment, confirm data interpretability, and draw biological conclusions. Although multiple in silico analysis pipelines have been developed to support CUT&RUN data analysis, their complex multi-module structure and reliance on multiple programming languages make them difficult for bioinformatics beginners who wish to understand the CUT&RUN analysis procedure and customize their own pipelines. Here, we provide a single-language, step-by-step CUT&RUN analysis protocol designed for users with any level of bioinformatics experience. This protocol includes critical quality checks to validate that the sequencing data are suitable for biological interpretation. We expect that following this introductory protocol, combined with downstream peak annotation, will allow users to draw biological insights from their own CUT&RUN datasets.
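To make the first analysis step concrete, the quality control check on raw reads described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the protocol's own code: the abstract does not name the language or tools the authors use, and the function names (`fastq_records`, `summarize_fastq`), the Phred+33 quality encoding, and the four-lines-per-record FASTQ layout are all assumptions made for this example. It reports read count (a proxy for sequencing depth), mean base quality, and GC fraction (one simple indicator of composition bias).

```python
import io


def fastq_records(handle):
    """Yield (sequence, quality_string) pairs from a FASTQ stream.

    Assumes the common 4-lines-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline()
        if not header:            # end of file
            break
        if not header.strip():    # tolerate stray blank lines
            continue
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line, ignored
        qual = handle.readline().strip()
        yield seq, qual


def summarize_fastq(handle, phred_offset=33):
    """Return read count, base count, mean Phred quality, and GC fraction.

    phred_offset=33 assumes Phred+33 encoding (hypothetical default here).
    """
    n_reads = n_bases = gc_bases = qual_sum = 0
    for seq, qual in fastq_records(handle):
        n_reads += 1
        n_bases += len(seq)
        gc_bases += sum(base in "GCgc" for base in seq)
        qual_sum += sum(ord(ch) - phred_offset for ch in qual)
    if n_bases == 0:
        return {"reads": 0, "bases": 0, "mean_quality": None, "gc_fraction": None}
    return {
        "reads": n_reads,
        "bases": n_bases,
        "mean_quality": qual_sum / n_bases,
        "gc_fraction": gc_bases / n_bases,
    }


# Toy two-read input standing in for a real FASTQ file.
toy = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!!!\n")
summary = summarize_fastq(toy)
print(summary)
```

For a real dataset, the same function could be given a file handle from `open(...)` (or `gzip.open(..., "rt")` for compressed reads). Dedicated QC tools report far more detail, so this sketch is meant only to show what "depth, read quality, and potential biases" refer to at the level of the raw file.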

Publication types

Video-Audio Media

MeSH terms

Chromatin Immunoprecipitation Sequencing / methods
DNA / chemistry
DNA / genetics
High-Throughput Nucleotide Sequencing / methods
Humans
Sequence Analysis, DNA* / methods

Substances

DNA