Structural features of proteins capture underlying information about protein evolution and function, which enhances the analysis of proteomic and transcriptomic data. Here, we develop Structural Analysis of Gene and protein Expression Signatures (SAGES), a method that describes expression data using features calculated from sequence-based prediction methods and 3D structural models. We used SAGES, along with machine learning, to characterize tissues from healthy individuals and those with breast cancer. We analyzed gene expression data from 23 breast cancer patients and genetic mutation data from the Catalog of Somatic Mutations In Cancer database as well as 17 breast tumor protein expression profiles. We identified prominent expression of intrinsically disordered regions in breast cancer proteins as well as relationships between drug perturbation signatures and breast cancer disease signatures. Our results suggest that SAGES is generally applicable to describe diverse biological phenomena including disease states and drug effects.
Keywords: Bioinformatics; Biological sciences; Computer science.
© 2024 The Authors.