Large scale proteomic studies create novel privacy considerations

Andrew C Hill; Claire Guo; Elizabeth M Litkowski; Ani W Manichaikul; Bing Yu; Iain R Konigsberg; Betty A Gorbet; Leslie A Lange; Katherine A Pratte; Katerina J Kechris; Matthew DeCamp; Marilyn Coors; Victor E Ortega; Stephen S Rich; Jerome I Rotter; Robert E Gerszten; Clary B Clish; Jeffrey L Curtis; Xiaowei Hu; Ma-En Obeidat; Melody Morris; Joseph Loureiro; Debby Ngo; Wanda K O'Neal; Deborah A Meyers; Eugene R Bleecker; Brian D Hobbs; Michael H Cho; Farnoush Banaei-Kashani; Russell P Bowler

doi:10.1038/s41598-023-34866-6

Sci Rep. 2023 Jun 7;13(1):9254. doi: 10.1038/s41598-023-34866-6.

Authors

Andrew C Hill¹, Claire Guo¹, Elizabeth M Litkowski², Ani W Manichaikul³, Bing Yu⁴, Iain R Konigsberg⁵, Betty A Gorbet⁴, Leslie A Lange⁵, Katherine A Pratte¹, Katerina J Kechris⁵, Matthew DeCamp⁵, Marilyn Coors⁵, Victor E Ortega⁶, Stephen S Rich³, Jerome I Rotter⁷, Robert E Gerszten⁸, Clary B Clish⁹, Jeffrey L Curtis¹⁰, Xiaowei Hu³, Ma-En Obeidat¹¹, Melody Morris¹¹, Joseph Loureiro¹¹, Debby Ngo¹¹, Wanda K O'Neal¹², Deborah A Meyers¹³, Eugene R Bleecker¹³, Brian D Hobbs¹⁴⁻¹⁶, Michael H Cho¹⁴⁻¹⁶, Farnoush Banaei-Kashani¹⁷, Russell P Bowler¹⁸

Affiliations

¹ National Jewish Health, Denver, CO, USA.
² Colorado School of Public Health, Fort Collins, CO, USA.
³ Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA.
⁴ Department of Epidemiology and Human Genetics Center, UTHealth School of Public Health, Houston, TX, USA.
⁵ University of Colorado - Anschutz Medical Campus, Aurora, CO, USA.
⁶ Mayo Clinic, Rochester, MN, USA.
⁷ Department of Pediatrics, The Institute for Translational Genomics and Population Sciences, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA.
⁸ Division of Cardiovascular Medicine, Cardiovascular Research Center, Beth Israel Deaconess Medical Center, Boston, MA, USA.
⁹ Metabolomics Platform, Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA.
¹⁰ University of Michigan, Ann Arbor, MI, USA.
¹¹ Novartis, Basel, Switzerland.
¹² University of North Carolina at Chapel Hill, Chapel Hill, NC, USA.
¹³ University of Arizona, Tucson, AZ, USA.
¹⁴ Harvard Medical School, Boston, MA, USA.
¹⁵ Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA.
¹⁶ Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA.
¹⁷ University of Colorado Denver, Denver, CO, USA.
¹⁸ National Jewish Health, Denver, CO, USA.

Abstract

Privacy protection is a core principle of genomic research but not of proteomic research. We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTLs) from COPDGene and the Jackson Heart Study (JHS), calculated genotype probabilities conditional on continuous protein levels, and then applied a naïve Bayesian approach to link SomaScan 1.3K proteomes to genomes for 2812 independent subjects from COPDGene, JHS, the SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS), and the Multi-Ethnic Study of Atherosclerosis (MESA). We linked 90-95% of proteomes to their correct genome, and for 95-99% of proteomes the correct genome was among the 1% most likely links. Linking accuracy in subjects with African ancestry was lower (~60%) unless training included diverse subjects. With larger-scale profiling (SomaScan 5K) in the Atherosclerosis Risk in Communities (ARIC) study, correct identification exceeded 99% even in mixed-ancestry populations. We also linked proteomes to proteomes and used proteomes alone to determine features such as sex, ancestry, and first-degree relatives. When serial proteomes are available, the linking algorithm can be used to identify and correct mislabeled samples. This work also demonstrates the importance of including diverse populations in omics research and shows that large proteomic datasets (> 1000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered unidentifiable.
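The linking idea summarized above can be illustrated with a minimal Python sketch. It assumes each pQTL is modeled as a Gaussian protein level for each genotype (0, 1, or 2 effect alleles) with a Hardy-Weinberg genotype prior, and it combines pQTLs under the naïve Bayes independence assumption by summing per-pQTL log evidence. The names `PQTLModel`, `link_score`, `rank_genomes`, and `in_top_fraction`, the protein and SNP identifiers, and all numbers are hypothetical; this is not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PQTLModel:
    """Hypothetical pQTL model: a SNP whose dosage (0, 1, 2) shifts a protein level.

    Assumes protein level ~ Normal(means[g], sd) given genotype g, and a
    Hardy-Weinberg genotype prior from the effect-allele frequency.
    """
    protein: str
    snp: str
    means: tuple        # mean protein level for 0, 1, 2 effect alleles
    sd: float           # shared standard deviation across genotypes
    allele_freq: float  # effect-allele frequency, strictly between 0 and 1

    def _log_priors(self):
        p = self.allele_freq
        return [math.log((1 - p) ** 2), math.log(2 * p * (1 - p)), math.log(p ** 2)]

    def _log_likelihoods(self, level):
        # Gaussian log-density, dropping the constant shared by all genotypes
        return [-0.5 * ((level - m) / self.sd) ** 2 for m in self.means]

    def log_posterior_ratio(self, level, genotype):
        """log P(genotype | level) - log P(genotype), via a stable log-sum-exp."""
        log_priors = self._log_priors()
        log_likes = self._log_likelihoods(level)
        joint = [a + b for a, b in zip(log_priors, log_likes)]
        top = max(joint)
        log_norm = top + math.log(sum(math.exp(j - top) for j in joint))
        return log_likes[genotype] - log_norm


def link_score(models, proteome, genome):
    """Naive Bayes: treat pQTLs as independent and sum their log evidence."""
    total = 0.0
    for m in models:
        if m.protein in proteome and m.snp in genome:
            total += m.log_posterior_ratio(proteome[m.protein], genome[m.snp])
    return total


def rank_genomes(models, proteome, genomes):
    """Return genome IDs ordered from most to least likely match."""
    scores = {gid: link_score(models, proteome, g) for gid, g in genomes.items()}
    return sorted(scores, key=scores.get, reverse=True)


def in_top_fraction(ranking, true_id, fraction=0.01):
    """True if the correct genome is among the top `fraction` of candidates."""
    k = max(1, math.ceil(len(ranking) * fraction))
    return true_id in ranking[:k]


# Toy usage with made-up pQTLs, genomes, and protein levels
models = [
    PQTLModel("P1", "rs1", (0.0, 1.0, 2.0), 0.3, 0.5),
    PQTLModel("P2", "rs2", (0.0, -1.0, -2.0), 0.3, 0.5),
]
genomes = {
    "g1": {"rs1": 0, "rs2": 2},
    "g2": {"rs1": 2, "rs2": 0},
    "g3": {"rs1": 1, "rs2": 1},
}
proteome = {"P1": 1.9, "P2": 0.1}
ranking = rank_genomes(models, proteome, genomes)
print(ranking)  # ['g2', 'g3', 'g1']
```

Ranking by the posterior-to-prior ratio rather than the raw posterior is a design choice in this sketch: for a fixed proteome it orders genomes by how well their genotypes explain the observed protein levels, so a genome is not favored merely because it carries common genotypes.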

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Atherosclerosis* / genetics
Bayes Theorem
Genome-Wide Association Study
Humans
Polymorphism, Single Nucleotide
Privacy
Proteome* / genetics

Substances

Proteome
