Supporting Regularized Logistic Regression Privately and Efficiently

Wenfa Li; Hongzhe Liu; Peng Yang; Wei Xie

doi:10.1371/journal.pone.0156479

PLoS One. 2016 Jun 6;11(6):e0156479. doi: 10.1371/journal.pone.0156479. eCollection 2016.

Authors

Wenfa Li¹, Hongzhe Liu¹, Peng Yang¹, Wei Xie²

Affiliations

¹ Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, 100101, China.
² Department of Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN 37232, United States of America.

Abstract

As one of the most popular statistical and machine learning models, logistic regression with regularization has found wide adoption in biomedicine, the social sciences, information technology, and other fields. These domains often involve human-subject data that are subject to strict privacy regulations. Concerns over data privacy make it increasingly difficult to coordinate and conduct large-scale collaborative studies, which typically rely on cross-institution data sharing and joint analysis. Our work focuses on safeguarding regularized logistic regression, a widely used statistical model that has nevertheless not been investigated from a data security and privacy perspective. We consider a common use scenario of multi-institution collaborative studies, such as the research consortia or networks widely seen in genetics, epidemiology, and the social sciences. To make our privacy-enhancing solution practical, we demonstrate a non-conventional and computationally efficient method that leverages distributed computing and strong cryptography to provide comprehensive protection over individual-level and summary data. Extensive empirical evaluations on several studies validate the privacy guarantee, efficiency, and scalability of our proposal. We also discuss the practical implications of our solution for large-scale studies and applications from various disciplines, including genetic and biomedical studies, smart grid, and network analysis.
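
To make the multi-institution setting concrete, the following is a minimal sketch, not the authors' protocol, of how L2-regularized logistic regression could be fitted when each institution shares only a summary gradient and never its individual-level rows. The function names, the plain gradient-descent solver, the toy data, and the parameters lam, lr, and steps are all illustrative assumptions. The cryptographic protection of the exchanged summaries that the abstract describes is omitted here.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_gradient(beta, X, y):
    # Gradient of the unregularized negative log-likelihood on one site's data.
    grad = [0.0] * len(beta)
    for xi, yi in zip(X, y):
        p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        for j, v in enumerate(xi):
            grad[j] += (p - yi) * v
    return grad

def fit_distributed(sites, lam=0.1, lr=0.1, steps=500):
    # sites: list of (X, y) pairs, one per institution (hypothetical layout).
    d = len(sites[0][0][0])
    n = sum(len(y) for _, y in sites)
    beta = [0.0] * d
    for _ in range(steps):
        # Each site contributes only a d-dimensional gradient, not its rows.
        grads = [local_gradient(beta, X, y) for X, y in sites]
        total = [sum(g[j] for g in grads) for j in range(d)]
        # The coordinator adds the L2 penalty gradient and takes one step.
        # Simplification: the intercept column is penalized like any other.
        beta = [b - lr * (t / n + lam * b) for b, t in zip(beta, total)]
    return beta

# Toy data: first column is a constant 1.0 acting as the intercept.
site_a = ([[1.0, 0.5], [1.0, 1.5], [1.0, -1.0]], [0, 1, 0])
site_b = ([[1.0, 2.0], [1.0, -0.5], [1.0, 1.0]], [1, 0, 1])

beta_distributed = fit_distributed([site_a, site_b])
# Reference fit on the pooled data, treated as a single site.
beta_pooled = fit_distributed([(site_a[0] + site_b[0], site_a[1] + site_b[1])])
```

Because the log-likelihood gradient is a sum over records, the distributed fit matches the pooled fit up to floating-point rounding. This is why sharing aggregate gradients can stand in for sharing raw data, and also why those aggregates are themselves summary data that may need protection.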

MeSH terms

Computer Communication Networks / organization & administration
Computer Communication Networks / standards
Computer Communication Networks / statistics & numerical data
Computer Security*
Confidentiality
Cooperative Behavior
Humans
Information Dissemination / methods*
Logistic Models*
Machine Learning* / standards
Models, Statistical
Privacy*

Grants and funding

This work was partly supported by the National Natural Science Foundation of China (No. 61300078), the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD20130320, CIT&TCD201504039), and the Funding Project for Academic Human Resources Development in Beijing Union University (Zk80201403, Rk100201510). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.