Cryptology ePrint Archive: Report 2014/417

Using Random Error Correcting Codes in Near-Collision Attacks on Generic Hash-Functions

Inna Polak, Adi Shamir

Abstract: In this paper we consider the problem of finding a near-collision with Hamming distance bounded by $r$ in a generic cryptographic hash function $h$ whose outputs can be modeled as random $n$-bit strings. In 2011, Lamberger suggested a modified version of Pollard's rho method which computes a chain of values by alternately applying the hash function $h$ and an error-correcting code $e$ to a random starting value $x_{0}$ until the chain cycles. This turns some (but not all) of the near-collisions in $h$ into full collisions in $f=e\circ h$, which are easy to find. In 2012, Leurent improved Lamberger's memoryless algorithm by using any available amount of memory to store the endpoints of multiple chains of $f$ values, and by using van Oorschot and Wiener's parallel collision search to find many full collisions in $f$, in the hope that one of them is also an $r$-near-collision in $h$. This is currently the best known time/memory tradeoff algorithm for the problem.
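The chain construction described above can be illustrated at toy scale. The following Python sketch is our own illustration, not code from the report: it models $h$ as SHA-256 truncated to 32 bits, substitutes a trivial projection (zeroing the low bits) for a real error-correcting code $e$, and finds a collision in $f = e\circ h$ with Floyd's memoryless rho method. The resulting colliding pair is then a near-collision in $h$. All names and parameters here are our assumptions.

```python
import hashlib

N = 32          # toy output size in bits (real attacks use n = 160 or more)
MASK_BITS = 8   # e() zeroes the low 8 bits: a trivial projection "code"

def h(x: int) -> int:
    """Toy 'hash': SHA-256 of x, truncated to N = 32 bits."""
    d = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(d[:4], "big")

def e(y: int) -> int:
    """Toy 'error correction': map y to the nearest codeword of the
    projection code, i.e. zero its low MASK_BITS bits."""
    return y & ~((1 << MASK_BITS) - 1)

def f(x: int) -> int:
    return e(h(x))

def floyd_collision(x0: int):
    """Find x != y with f(x) == f(y) using Floyd's cycle finding (memoryless).
    Assumes the tail of the rho shape is non-empty (the generic case)."""
    tort, hare = f(x0), f(f(x0))
    while tort != hare:                      # phase 1: detect the cycle
        tort, hare = f(tort), f(f(hare))
    # Phase 2: walk from the start and from the meeting point in lockstep;
    # the walkers collide one step before the cycle's entry point.
    tort = x0
    while f(tort) != f(hare):
        tort, hare = f(tort), f(hare)
    return tort, hare

x, y = floyd_collision(12345)
assert x != y and f(x) == f(y)
# h(x) and h(y) agree on the top N - MASK_BITS bits: a near-collision in h.
assert (h(x) ^ h(y)) < (1 << MASK_BITS)
```

As in the report's framing, only collisions in $f$ that stem from a near-collision in $h$ are useful; the projection used here guarantees the Hamming distance is at most MASK_BITS, whereas a good covering code gives a much tighter bound for the same amount of "rounding".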

The efficiency of both Lamberger's and Leurent's algorithms depends on the quality of their error-correcting code. Since they have to apply error correction to \emph{any} bit string, they want to use perfect codes, but all the known constructions of such codes can correct only $1$ or $3$ errors. To handle a larger number of errors, they recommend concatenating many Hamming codes, each capable of correcting a single error in a particular subset of the bits, along with some projections. As we show in this paper, this is a suboptimal choice, which can be considerably improved by using randomly chosen linear codes instead of Hamming codes, together with a precomputed lookup table that makes the error-correction process efficient. We show both theoretically and experimentally that this is a better use of the available memory than devoting all of it to the storage of chain endpoints. Compared to Leurent's algorithm, we demonstrate an improvement ratio which grows with the size of the problem. In particular, we experimentally verified an improvement ratio of about $3$ in a small example with $n=160$ and $r=33$, which we implemented on a single PC, and mathematically predicted an improvement ratio of about $730$ in a large example with $n=1024$ and $r=100$ using $2^{40}$ memory.
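The combination of a randomly chosen linear code with a precomputed decoding table can also be sketched at toy scale. The Python sketch below is our own illustration with made-up parameters, not the paper's construction: it draws a random $[16,8]$ binary code from a random parity-check matrix, precomputes the standard syndrome-to-coset-leader table (size $2^{n-k}$, here $256$ entries), and then corrects an arbitrary $16$-bit word to a nearest codeword with a single table lookup.

```python
import itertools
import random

random.seed(1)
N, K = 16, 8        # toy [16, 8] random linear code; the paper uses much larger n
R = N - K           # number of parity checks = number of syndrome bits

# Random parity-check matrix H, stored as one R-bit column per code position.
H = [random.getrandbits(R) for _ in range(N)]

def syndrome(word: int) -> int:
    """Syndrome of a word: XOR of the H-columns where the word has a 1-bit."""
    s = 0
    for i in range(N):
        if (word >> i) & 1:
            s ^= H[i]
    return s

# Precompute the lookup table syndrome -> minimum-weight error pattern
# (coset leader) by enumerating error patterns in order of increasing weight.
leader = {0: 0}
for w in range(1, N + 1):
    if len(leader) == 1 << R:
        break
    for positions in itertools.combinations(range(N), w):
        pattern = sum(1 << i for i in positions)
        s = syndrome(pattern)
        if s not in leader:
            leader[s] = pattern

def correct(word: int) -> int:
    """Map an arbitrary N-bit word to a nearest codeword: one table lookup."""
    return word ^ leader[syndrome(word)]

c = correct(0b1011001110001111)
assert syndrome(c) == 0      # the corrected word is a codeword
assert correct(c) == c       # codewords are fixed points of the correction
```

The table costs $2^{n-k}$ memory but makes each correction a single lookup, which is what lets the attack spend part of its memory on decoding rather than devoting all of it to chain endpoints.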

Category / Keywords: hash function, near-collision, random-code, time-memory trade-off, generic attack

Date: received 2 Jun 2014

Contact author: innapolak at gmail com

Available format(s): PDF | BibTeX Citation

Version: 20140605:203755
