Cryptology ePrint Archive: Report 2017/148

Pattern Matching on Encrypted Streams: Applications to DPI and searches on genomic data

Olivier Sanders and Cristina Onete and Pierre-Alain Fouque

Abstract: Pattern matching is essential to applications such as filtering content in Data streams, searching on genomic data, and searching for correlation in medical data. However, increasing concerns of user and data privacy, exacerbated by threats of mass surveillance, have made the use of encryption practically standard for personal data. Hence, entities performing pattern-matching on data they do not own must now be able to provide the functionality of keyword-search on \emph{encrypted} data.

Existent solutions in searchable encryption suffer from one of two main disadvantages: either an exhausted list of keywords needs to be hardcoded in the input ciphertexts, or the input must be \emph{tokenized}, massively increasing the size of the ciphertext. In both cases, the symmetric-key approach provides faster encryption, but also induces a token-re-generation step at each instantiation (\ie, essentially, for each user). Such approaches are not well-suited when either the data owner is unable to choose all the relevant keywords, or when a single searcher (\eg, an IDS, a firewall, or an independent medical researcher) must screen ciphertexts from many different ownerships. Fast symmetric searchable encryption alternatives (SSE) also come with an extensive leakage, which is not well understood and has recently been under attack.

In this work, we introduce Searchable Encryption with Shiftable Trapdoors (SEST), a new primitive, which allows for pattern matching with universal tokens, \ie, trapdoors which can function on ciphertexts produced by multiple entities, and which allow to match keywords of arbitrary lengths to arbitrary ciphertexts. Our approach relies on a public-key encryption scheme and on bilinear pairings. We essentially project the plaintext bit by bit on a multiplicative basis consisting of powers of a secret key. The keyword is also projected on the same basis, with the order of its bits encoded as a polynomial of degree equal to the keyword length. The searching entity receives unforgeable trapdoors for requested keywords, and can match these against the input ciphertexts, thus finding out whether the pattern matched, and at what position of the plaintext the keyword can be found. In addition, very minor modifications to our solution enable it to take into account regular expressions, such as fully- or partly-unknown characters in a keyword (namely wildcards or interval/subset searches).

Our scheme is a variation of Rabin-Karp and has many potential applications in deep-packet inspection on encrypted streams, searching on genomic data, as well as searching on encrypted structured data. Compared to other alternatives in the literature, our trapdoor size is only linear in the keyword length (and independent of the plaintext size), and we prove that the leakage to the searcher is only the trivial one, namely the ability to distinguish based on different search results of a single trapdoor on two different plaintexts. Although our proofs use a (marginally) interactive assumption, we argue that this is a relatively small price to pay for the flexibility and privacy that we are able to attain.

Category / Keywords: cryptographic protocols / searchable encryption, pattern-matching, cloud computing, pairings

Date: received 16 Feb 2017

Contact author: olivier sanders at orange com

Available format(s): PDF | BibTeX Citation

Version: 20170222:153436 (All versions of this report)

Short URL: ia.cr/2017/148

Discussion forum: Show discussion | Start new discussion


[ Cryptology ePrint archive ]