Paper 2018/662

Efficient Logistic Regression on Large Encrypted Data

Kyoohyung Han, Seungwan Hong, Jung Hee Cheon, and Daejun Park

Abstract

Machine learning on encrypted data is a cryptographic method for analyzing private and/or sensitive data while preserving privacy. In the training phase, it takes encrypted training data as input and outputs an encrypted model without using the decryption key. In the prediction phase, it uses the encrypted model to predict results on new encrypted data. Neither phase requires the decryption key, so the privacy of the data is guaranteed as long as the underlying encryption is secure. It has many applications in areas that handle sensitive private data, such as finance, education, genomics, and medicine. While several studies have been reported on the prediction phase, few have addressed the training phase because of the inefficiency of homomorphic encryption (HE), leaving machine learning training on encrypted data only as a long-term goal. In this paper, we propose an efficient algorithm for logistic regression on encrypted data, and evaluate it on real financial data consisting of 422,108 samples over 200 features. Our experiment shows that an encrypted model with a sufficient Kolmogorov-Smirnov statistic value can be obtained in $\sim$17 hours on a single machine. We also evaluate our algorithm on the public MNIST dataset, where it takes $\sim$2 hours to learn an encrypted model with 96.4% accuracy. Considering the inefficiency of HE, our result is encouraging and demonstrates, for the first time to the best of our knowledge, the practical feasibility of logistic regression training on large encrypted data.
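
The abstract reports model quality on the financial data as a Kolmogorov-Smirnov (KS) statistic. For readers unfamiliar with that metric, the following is a minimal plaintext sketch of how a two-sample KS statistic is commonly computed for a binary classifier: the largest gap between the empirical score distributions of the two classes. It is not taken from the paper and involves no encryption; the function name `ks_statistic` and the toy inputs are illustrative assumptions.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Two-sample KS statistic between the scores of class 1 and class 0.

    Hypothetical helper for illustration; assumes labels are 0 or 1.
    """
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")

    best = 0.0
    # Compare the two empirical CDFs at every observed score value.
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy usage: perfectly separated scores versus identically distributed scores.
separated = ks_statistic([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
overlapping = ks_statistic([0.1, 0.1, 0.5, 0.5], [0, 1, 0, 1])
```

A value near 1 means the model's scores separate the two classes almost completely, while a value near 0 means the score distributions are indistinguishable.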

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint. MINOR revision.
Keywords
implementation, machine learning, homomorphic encryption
Contact author(s)
swanhong @ snu ac kr
History
2018-07-10: revised
2018-07-10: received
Short URL
https://ia.cr/2018/662
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2018/662,
      author = {Kyoohyung Han and Seungwan Hong and Jung Hee Cheon and Daejun Park},
      title = {Efficient Logistic Regression on Large Encrypted Data},
      howpublished = {Cryptology {ePrint} Archive, Paper 2018/662},
      year = {2018},
      url = {https://eprint.iacr.org/2018/662}
}