The goal of this research is to computationally identify candidate modifiers of retinitis pigmentosa (RP), a group of rare genetic disorders that cause degeneration of retinal tissue. The phenotypic variation seen in RP complicates diagnosis and treatment of the disease. In a previous study, modifiers of RP were identified through an association between DNA sequence variation and variation in eye size in a well-characterized Drosophila model of RP. This study instead uses RNA expression data to identify candidate modifier genes whose expression is correlated with phenotypic variation in eye size. The proposed approach uses the K-Means algorithm to cluster 171 Drosophila strains based on their expression profiles for 18,140 genes in adult females. The clustering is used to investigate the correlation between Drosophila eye size and gene expression and to gather suspect genes from clusters with abnormally large or small eyes. The clustering algorithm was implemented in the R scripting language and identified 10 candidate modifiers of RP. This analysis was followed by a validation study that tested seven candidate modifiers and found that the loss of five of them significantly altered the degeneration phenotype; these five can thus be labeled bona fide modifiers of disease.
Keywords: Endoplasmic reticulum (ER) stress; K-Means clustering; degenerative models; gene expression; modifier genes; phenotypic variation; retinal apoptosis.
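
The clustering step described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration rather than the R used in the study, and it is not the authors' implementation. The function names, the choice of two clusters, and the six toy strains with three genes each (standing in for 171 strains and 18,140 genes) are all assumptions made for this example; the expression and eye-size values are invented.

```python
import random
from statistics import mean


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, n_iter=100, seed=0):
    """Plain K-Means (Lloyd's algorithm); returns one cluster label per strain."""
    rng = random.Random(seed)
    # Initialize centroids from k randomly chosen strains.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [None] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: each strain joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def mean_eye_size_by_cluster(labels, eye_sizes):
    """Average the eye-size phenotype over the strains in each cluster."""
    groups = {}
    for lab, size in zip(labels, eye_sizes):
        groups.setdefault(lab, []).append(size)
    return {lab: mean(sizes) for lab, sizes in groups.items()}


# Hypothetical toy data: rows are strains, columns are genes (invented values).
profiles = [
    [1.0, 5.0, 2.0],
    [1.1, 5.2, 2.1],
    [0.9, 4.8, 1.9],
    [4.0, 1.0, 2.0],
    [4.2, 0.9, 2.1],
    [3.9, 1.1, 1.9],
]
# Hypothetical eye sizes, one per strain, in arbitrary units.
eye_sizes = [21000.0, 20500.0, 21500.0, 14000.0, 13500.0, 14500.0]

labels = kmeans(profiles, k=2)
cluster_means = mean_eye_size_by_cluster(labels, eye_sizes)

# Clusters with unusually small (or large) mean eye size are the ones whose
# distinguishing genes would be examined as candidate modifiers.
smallest_cluster = min(cluster_means, key=cluster_means.get)
strains_in_smallest = [i for i, lab in enumerate(labels) if lab == smallest_cluster]
```

In this toy case the strains separate into two groups by expression alone, and the eye-size phenotype is only consulted afterwards to rank the clusters. How the study chose the number of clusters, normalized expression values, or extracted suspect genes from the extreme clusters is not stated in the abstract, so none of that is modeled here.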