Paper 2018/1086

Two Party Distribution Testing: Communication and Security

Alexandr Andoni, Tal Malkin, and Negev Shekel Nosatzki

Abstract

We study the problem of discrete distribution testing in the two-party setting. For example, in the standard closeness testing problem, Alice and Bob each have $t$ samples from, respectively, distributions $a$ and $b$ over $[n]$, and they need to test whether $a=b$ or $a,b$ are $\varepsilon$-far (in the $\ell_1$ distance) for some fixed $\varepsilon>0$. This is in contrast to the well-studied one-party case, where the tester has unrestricted access to samples of both distributions, for which optimal bounds are known for a number of variations. Despite being a natural constraint in applications, the two-party setting has evaded attention so far. We address two fundamental aspects of the two-party setting: 1) what is the communication complexity, and 2) can it be accomplished securely, without Alice and Bob learning extra information about each other's input. Besides closeness testing, we also study the independence testing problem, where Alice and Bob have $t$ samples from distributions $a$ and $b$ respectively, which may be correlated; the question is whether $a,b$ are independent of $\epsilon$-far from being independent. Our contribution is three-fold: $\bullet$ Communication: we show how to gain communication efficiency as we have more samples, beyond the information-theoretic bound on $t$. Furthermore, the gain is polynomially better than what one may obtain by adapting one-party algorithms. For the closeness testing, our protocol has communication $s = \tilde{\Theta}_{\varepsilon}\left(n^2/t^2\right)$ as long as $t$ is at least the information-theoretic minimum number of samples. For the independence testing over domain $[n] \times [m]$, where $n\ge m$, we obtain $s = \tilde{O}_{\varepsilon}(n^2 m/t^2 + n m/t + \sqrt{m})$. $\bullet$ Lower bounds: we prove tightness of our trade-off for the closeness testing, as well as that the independence testing requires tight $\Omega(\sqrt{m})$ communication for unbounded number of samples. These lower bounds are of independent interest as, to the best of our knowledge, these are the first 2-party communication lower bounds for testing problems, where the inputs represent a set of i.i.d. samples. $\bullet$ Security: we define the concept of secure distribution testing and argue that it must leak at least some minimal information when the promise is not satisfied. We then provide secure versions of the above protocols with an overhead that is only polynomial in the security parameter.

Metadata
Available format(s)
PDF
Category
Cryptographic protocols
Publication info
Published elsewhere. MAJOR revision.Privacy Preserving Machine Learning, NIPS 2018 workshop
Keywords
foundationssecure computationdistribution testingsublinear algorithms
Contact author(s)
tal @ cs columbia edu
History
2018-11-09: received
Short URL
https://ia.cr/2018/1086
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2018/1086,
      author = {Alexandr Andoni and Tal Malkin and Negev Shekel Nosatzki},
      title = {Two Party Distribution Testing: Communication and Security},
      howpublished = {Cryptology ePrint Archive, Paper 2018/1086},
      year = {2018},
      note = {\url{https://eprint.iacr.org/2018/1086}},
      url = {https://eprint.iacr.org/2018/1086}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.