Somatic embryogenesis (SE) is the developmental reprogramming of somatic cells toward the embryogenesis pathway and is a notable illustration of cell totipotency. To identify genes involved in SE, subtractive polymerase chain reaction (PCR) was performed to generate transcripts highly enriched for SE-related genes, using cDNA prepared from a mixture of embryogenic callus and pre-globular somatic embryos as the tester and cDNA from non-embryogenic callus as the driver. After differential screening and subsequent confirmation by reverse Northern blot analysis, a total of 671 differentially expressed cDNA fragments were identified, and 242 unigenes significantly up-regulated during cotton SE were recovered, including most previously published SE-related genes in plants. Up-regulation was confirmed for representative cases by Northern blot and reverse-transcription PCR analysis. More than half of these genes had not previously been identified as SE-related, including dominant, crucial genes involved in transcription, post-transcriptional processes, and transport, and about one-third either had no previously reported match in GenBank or were predicted to be unknown or newly identified genes. We used cDNA arrays to further investigate the expression patterns of these genes across a differentiation gradient of cultures, ranging from pro-embryogenic masses to somatic embryos at every stage. The cDNA collection comprises a broad repertoire of SE genes and is an important resource for understanding the genetic interactions underlying SE signaling and regulation. Our results suggest that a complicated and concerted mechanism involving multiple cellular pathways is responsible for cotton SE. This report represents a systematic and comprehensive analysis of genes involved in the process of somatic embryogenesis.