High-throughput complement component 4 genomic sequence analysis with C4Investigator

HLA. 2024 Jan;103(1):e15273. doi: 10.1111/tan.15273. Epub 2023 Oct 29.

Abstract

The complement component 4 gene loci, composed of the C4A and C4B genes and located on chromosome 6, encodes for complement component 4 (C4) proteins, a key intermediate in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. C4A and C4B gene loci exhibit copy number variation, with each composite gene varying between 0 and 5 copies per haplotype. C4A and C4B genes also vary in size depending on the presence of the human endogenous retrovirus (HERV) in intron 9, denoted by C4(L) for long-form and C4(S) for short-form, which affects expression and is found in both C4A and C4B. Additionally, human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodger epitope generally found on C4A protein, and the Chido epitope generally found on C4B protein. C4A and C4B copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high-throughput genomic sequence analysis of C4A and C4B variants has been impeded by the high degree of sequence similarity and complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed a novel bioinformatic pipeline for comprehensive, high-throughput characterization of human C4A and C4B sequences from short-read sequencing data, named C4Investigator. Using paired-end targeted or whole genome sequence data as input, C4Investigator determines the overall gene copy numbers, as well as C4A, C4B, C4(Rodger), C4(Ch), C4(L), and C4(S). Additionally, C4Ivestigator reports the full overall C4A and C4B aligned sequence, enabling nucleotide level analysis. To demonstrate the utility of this workflow we have analyzed C4A and C4B variation in the 1000 Genomes Project Data set, showing that these genes are highly poly-allelic with many variants that have the potential to impact C4 protein function.

Keywords: C4; bioinformatics pipeline; complement component; copy number; genotyping; immunogenetics.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Alleles
  • Complement C4 / genetics
  • Complement C4b* / genetics
  • DNA Copy Number Variations*
  • Epitopes
  • Genomics
  • Humans
  • Sequence Analysis

Substances

  • Complement C4b
  • Complement C4
  • Epitopes