**Using Random Error Correcting Codes in Near-Collision Attacks on Generic Hash-Functions**

*Inna Polak, Adi Shamir*

**Abstract: **In this paper we consider the problem of finding a near-collision
with Hamming distance bounded by $r$ in a generic cryptographic hash
function $h$ whose outputs can be modeled as random $n$-bit strings.
In 2011, Lamberger suggested a modified version of Pollard's rho method
which computes a chain of values by alternately applying the hash
function $h$ and an error correcting code $e$ to a random starting
value $x_{0}$ until it cycles. This turns some (but not all) of the
near-collisions in $h$ into full collisions in $f=e\circ h$, which
are easy to find. In 2012, Leurent improved Lamberger's memoryless
algorithm by using any available amount of memory to store the endpoints
of multiple chains of $f$ values, and using Van Oorschot and Wiener's
algorithm to find many full collisions in $f$, hoping that one of
them will be an $r$-near-collision in $h$. This is currently the
best known time/memory tradeoff algorithm for the problem.

The efficiency of both Lamberger's and Leurent's algorithms depend on the quality of their error correction code. Since they have to apply error correction to \emph{any} bit string, they want to use perfect codes, but all the known constructions of such codes can correct only $1$ or $3$ errors. To deal with a larger number of errors, they recommend using a concatenation of many Hamming codes, each capable of correcting a single error in a particular subset of the bits, along with some projections. As we show in this paper, this is a suboptimal choice, which can be considerably improved by using randomly chosen linear codes instead of Hamming codes and storing a precomputed lookup table to make the error correction process efficient. We show both theoretically and experimentally that this is a better way to utilize the available memory, instead of devoting all the memory to the storage of chain endpoints. Compared to Leurent's algorithm, we demonstrate an improvement ratio which grows with the size of the problem. In particular, we experimentally verified an improvement ratio of about $3$ in a small example with $n=160$ and $r=33$ which we implemented on a single PC, and mathematically predicted an improvement ratio of about $730$ in a large example with $n=1024$ and $r=100$, using $2^{40}$ memory.

**Category / Keywords: **hash function, near-collision, random-code, time-memory trade-off, generic attack

**Date: **received 2 Jun 2014

**Contact author: **innapolak at gmail com

**Available format(s): **PDF | BibTeX Citation

**Version: **20140605:203755 (All versions of this report)

**Short URL: **ia.cr/2014/417

[ Cryptology ePrint archive ]