Paper 2019/1113

Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector

Oliver Masters, Hamish Hunt, Enrico Steffinlongo, Jack Crawford, Flavio Bergamaschi, Maria E. Dela Rosa, Caio C. Quini, Camila T. Alves, Feranda de Souza, and Deise G. Ferreira

Abstract

Machine learning (ML) is today commonly employed in the Financial Services Sector (FSS) to create various models to predict a variety of conditions ranging from financial transactions fraud to outcomes of investments and also targeted marketing campaigns. The common ML technique used for the modeling is supervised learning using regression algorithms and usually involves large amounts of data that needs to be shared and prepared before the actual learning phase. Compliance with privacy laws and confidentiality regulations requires that most, if not all, of the data must be kept in a secure environment, usually in-house, and not outsourced to cloud or multi-tenant shared environments. This paper presents the results of a research collaboration between IBM Research and Banco Bradesco SA to investigate approaches to homomorphically secure a typical ML pipeline commonly employed in the FSS industry. We investigated and de-constructed a typical ML pipeline used by Banco Bradesco and applied Homomorphic Encryption (HE) to two of the important ML tasks, namely the variable selection phase of the model generation task and the prediction task. Variable selection, which usually precedes the training phase, is very important when working with data sets for which no prior knowledge of the covariate set exists. Our work provides a way to define an initial covariate set for the training phase while preserving the privacy and confidentiality of the input data sets. Quality metrics, using real financial data, comprising quantitative, qualitative and categorical features, demonstrated that our HE based pipeline can yield results comparable to state of the art variable selection techniques and the performance results demonstrated that HE technology has reached the inflection point where it can be useful in batch processing in a financial business setting.

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Published elsewhere. Minor revision. RWC 2020
Keywords
homomorphic encryption, variable reduction, variable selection, feature selection, prediction
Contact author(s)
flavio @ uk ibm com
hamishun @ uk ibm com
History
2019-12-22: last of 2 revisions
2019-10-01: received
Short URL
https://ia.cr/2019/1113
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2019/1113,
      author = {Oliver Masters and Hamish Hunt and Enrico Steffinlongo and Jack Crawford and Flavio Bergamaschi and Maria E.  Dela Rosa and Caio C.  Quini and Camila T.  Alves and Feranda de Souza and Deise G.  Ferreira},
      title = {Towards a Homomorphic Machine Learning Big Data Pipeline for the Financial Services Sector},
      howpublished = {Cryptology {ePrint} Archive, Paper 2019/1113},
      year = {2019},
      url = {https://eprint.iacr.org/2019/1113}
}