This paper presents a secure outsourcing solution that assesses logistic regression models for binary traits to test their association with genotypes. We adapt the semi-parallel training method of Sikorska et al., which first fits a logistic regression model on the covariates alone and then performs one-step, parallelizable regressions on each individual single nucleotide polymorphism (SNP). In addition, we modify the underlying approximate homomorphic encryption scheme to improve performance.
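The semi-parallel idea can be illustrated in plaintext. The sketch below is a minimal example under these assumptions: a binary (0/1) phenotype `y`, a covariate matrix `X` whose first column is an intercept, and SNP dosages coded 0/1/2. It follows our reading of Sikorska et al.'s one-step update and does not model the encryption layer or the modified approximate homomorphic encryption scheme. The function names (`fit_null_model`, `semi_parallel_snp_scan`) are hypothetical. The covariates-only model is fitted once by Newton-Raphson. Each SNP is then residualised against the covariates under the fitted weights `w = p(1 - p)`, and its effect is estimated in a single step. That step equals one Newton update for the SNP coefficient starting from the null fit.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(t):
    """Logistic function 1 / (1 + e^-t)."""
    return 1.0 / (1.0 + math.exp(-t))


def weighted_gram(X, w):
    """Return X^T diag(w) X for a row-major design matrix X."""
    k = len(X[0])
    return [[sum(wi * row[a] * row[c] for wi, row in zip(w, X)) for c in range(k)]
            for a in range(k)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_null_model(X, y, iters=25):
    """Fit the covariates-only logistic model by Newton-Raphson (IRLS).

    Returns the coefficients, fitted probabilities p and weights w = p(1 - p).
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        p = [sigmoid(dot(row, beta)) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        grad = [sum(X[i][a] * (y[i] - p[i]) for i in range(n)) for a in range(k)]
        step = solve(weighted_gram(X, w), grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
    p = [sigmoid(dot(row, beta)) for row in X]
    w = [pi * (1.0 - pi) for pi in p]
    return beta, p, w


def semi_parallel_snp_scan(X, y, snps):
    """One-step (semi-parallel) estimate of each SNP effect on top of the null model."""
    _, p, w = fit_null_model(X, y)  # fitted once, shared by every SNP
    n = len(X)
    xtwx = weighted_gram(X, w)
    resid = [yi - pi for yi, pi in zip(y, p)]
    results = []
    for s in snps:  # iterations are independent, hence parallelizable
        # Remove the w-weighted projection of the SNP onto the covariate space.
        xtws = [sum(w[i] * X[i][a] * s[i] for i in range(n)) for a in range(len(xtwx))]
        coef = solve(xtwx, xtws)
        s_star = [s[i] - dot(X[i], coef) for i in range(n)]
        denom = sum(w[i] * s_star[i] ** 2 for i in range(n))
        # At convergence X^T (y - p) = 0, so the residualised working response is
        # (y - p) / w and the weighted slope sum(w s* z*) / sum(w s*^2) simplifies to:
        b = dot(s_star, resid) / denom
        results.append((b, 1.0 / math.sqrt(denom)))  # (effect, approx. standard error)
    return results
```

Fitting the covariate model once means each SNP needs only inner products against the fixed weights `w` and residuals `y - p`. This is why the per-SNP stage can be evaluated for all SNPs in parallel.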
We evaluated the performance of our solution through experiments on a real-world dataset. It achieves the best performance among homomorphic encryption systems for GWAS analysis in terms of both computational cost and accuracy. For example, on a dataset of 245 samples, each with 10643 SNPs and 3 covariates, our algorithm takes about 43 seconds to perform a logistic-regression-based genome-wide association analysis on encrypted data. These results demonstrate the feasibility and scalability of our solution.
Category / Keywords: applications / Homomorphic encryption, Genome-wide association studies, Logistic regression
Original Publication (in the same form): BMC Medical Genomics
Date: received 13 Mar 2019, last revised 22 May 2020
Contact author: miran kim at uth tmc edu
Version: 20200523:055334
Short URL: ia.cr/2019/294