Paper 2021/069

Fast Privacy-Preserving Text Classification based on Secure Multiparty Computation

Amanda Resende, Davis Railsback, Rafael Dowsley, Anderson C. A. Nascimento, and Diego F. Aranha

Abstract

We propose a privacy-preserving Naive Bayes classifier and apply it to the problem of private text classification. In this setting, one party (Alice) holds a text message, while another party (Bob) holds a classifier. At the end of the protocol, Alice learns only the result of the classifier applied to her text input, and Bob learns nothing. Our solution is based on Secure Multiparty Computation (SMC). Our Rust implementation provides a fast and secure solution for the classification of unstructured text. We apply our solution to spam detection, although it is generic and can be used in any other scenario in which a Naive Bayes classifier can be employed. We can classify an SMS as spam or ham in less than 340ms when the dictionary of Bob's model includes all words ($n = 5200$) and Alice's SMS has at most $m = 160$ unigrams. With $n = 369$ and $m = 8$ (the average for a spam SMS in the database), our solution takes only 21ms.
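For orientation, the cleartext decision that such a classifier makes can be sketched as below. This is a minimal illustration of the standard log-space Naive Bayes rule, not the authors' protocol or code. The names `Model`, `is_spam`, and `toy_model`, the choice to ignore out-of-dictionary words, and all probabilities are assumptions made for the example. In the private setting described above, the same comparison would be evaluated under SMC, so that Alice sees only the resulting label and Bob sees nothing.

```rust
use std::collections::HashMap;

/// Hypothetical cleartext model held by Bob: class log-priors plus, for each
/// dictionary word, the pair (log P(word | spam), log P(word | ham)).
pub struct Model {
    pub log_prior_spam: f64,
    pub log_prior_ham: f64,
    pub log_likelihoods: HashMap<String, (f64, f64)>,
}

/// Returns true when the spam score is strictly larger than the ham score.
/// Words outside the dictionary are ignored (an assumption of this sketch).
pub fn is_spam(model: &Model, unigrams: &[&str]) -> bool {
    let mut spam = model.log_prior_spam;
    let mut ham = model.log_prior_ham;
    for w in unigrams {
        if let Some(&(log_p_spam, log_p_ham)) = model.log_likelihoods.get(*w) {
            // Summing logarithms avoids underflow from multiplying
            // many small probabilities.
            spam += log_p_spam;
            ham += log_p_ham;
        }
    }
    spam > ham
}

/// Builds a toy three-word model; every probability here is made up.
pub fn toy_model() -> Model {
    let mut ll = HashMap::new();
    ll.insert("free".to_string(), (0.30_f64.ln(), 0.02_f64.ln()));
    ll.insert("prize".to_string(), (0.20_f64.ln(), 0.01_f64.ln()));
    ll.insert("lunch".to_string(), (0.01_f64.ln(), 0.15_f64.ln()));
    Model {
        log_prior_spam: 0.2_f64.ln(),
        log_prior_ham: 0.8_f64.ln(),
        log_likelihoods: ll,
    }
}
```

With this toy model, `is_spam(&toy_model(), &["free", "prize"])` evaluates to `true` and `is_spam(&toy_model(), &["lunch"])` to `false`. The loop performs one lookup per unigram in Alice's message, which gives some intuition for why the message length $m$ and the dictionary size $n$ are the parameters quoted for the running times.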

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Preprint. MINOR revision.
Contact author(s)
amanda resende @ ic unicamp br
drail @ uw edu
rafael dowsley @ monash edu
andclay @ uw edu
dfaranha @ cs au dk
History
2021-06-08: revised
2021-01-22: received
Short URL
https://ia.cr/2021/069
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2021/069,
      author = {Amanda Resende and Davis Railsback and Rafael Dowsley and Anderson C. A. Nascimento and Diego F. Aranha},
      title = {Fast Privacy-Preserving Text Classification based on Secure Multiparty Computation},
      howpublished = {Cryptology {ePrint} Archive, Paper 2021/069},
      year = {2021},
      url = {https://eprint.iacr.org/2021/069}
}