Quantum indistinguishability of random sponges


Introduction
Originally introduced in the context of cryptographic hash functions, the sponge construction [2] became one of the most widely used constructions in symmetric cryptography. Consequently, sponges get used in keyed constructions, including message authentication codes (MAC), stream ciphers, and authenticated encryption (AE), see e.g. [5,4,7,15,18,1,11]. For all these applications it is either necessary or at least sufficient for security if a secretly keyed sponge is indistinguishable from a random function. That this is indeed the case was already shown in the original security proof for the sponge construction [3], where cryptographic sponges were shown to be indifferentiable from random functions. This result is widely applicable and was consequently followed up with several improved bounds for specific applications. Recent works [15,1,11] improved the bound for the setting of indistinguishability of secretly keyed sponges.
While these results show the applicability of the sponge construction in today's computing environment, they leave open the question of its applicability in a future post-quantum setting where adversaries have access to quantum computers. Such an attacker can for example run Shor's algorithm [20] to break the security of constructions based on the RSA or discrete-logarithm problem. While such constructions are hardly ever considered for practical symmetric cryptography due to their slow operations, the impact of quantum adversaries goes beyond Shor's algorithm. Conventional security proofs, especially in idealized models, might break down in the light of quantum attackers who are allowed to ask queries in superposition [8]. Going even further, allowing adversaries superposition access to secretly keyed primitives, it was shown that several well-known MACs and encryption schemes, including CBC-MAC and the Even-Mansour block cipher, become insecure [14,12,19]. While these latter attacks are not applicable in the post-quantum setting, they are indications that secret-key cryptography does not trivially withstand quantum adversaries and that it is necessary to study the security of symmetric cryptography in the post-quantum setting.
In this work we do exactly this: We study the security of secretly keyed sponges against quantum adversaries.
Sponges. The sponge construction [2] is an eXtendable Output Function (XOF) that maps arbitrary-length inputs to outputs of a length specified by an additional input. The construction operates on an (r + c)-bit state. The parameter r is called the rate and the parameter c is called the capacity. The first r bits of the state are called the outer part or outer state, the remaining c bits are called the inner part or inner state. The sponge uses an internal function f mapping (r + c)-bit strings to (r + c)-bit strings. To process a message consisting of several r-bit blocks, the sponge alternates between mixing a new message block into the outer state and applying f, as shown in Figure 1. When all message blocks are processed (i.e., absorbed into the internal state) the sponge can be squeezed to produce outputs by alternating between applying f and outputting the outer state. We write $\mathrm{Sponge}_f$ for the sponge using f as internal function.
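The absorb/squeeze alternation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the state is held as an integer whose top r bits form the outer part, the initial state is all zeros, padding is omitted, and `toy_f` is a hypothetical stand-in for the internal function f (built from SHA-256 purely for convenience), not the function used in any concrete sponge.

```python
import hashlib

def toy_f(state: int, width: int) -> int:
    """Hypothetical internal function on width-bit states (width <= 256)."""
    data = state.to_bytes((width + 7) // 8, "big")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") % (1 << width)

def sponge(blocks, out_blocks, r, c, f=toy_f):
    """Absorb r-bit blocks, then squeeze out_blocks (>= 1) r-bit blocks."""
    width = r + c
    state = 0  # assumed all-zero initial state
    # Absorbing phase: XOR each block into the outer part, then apply f.
    for block in blocks:
        assert 0 <= block < (1 << r)
        state ^= block << c
        state = f(state, width)
    # Squeezing phase: output the outer part, applying f between outputs.
    out = []
    for i in range(out_blocks):
        out.append(state >> c)
        if i + 1 < out_blocks:
            state = f(state, width)
    return out
```

Note that the inner c bits are never touched directly by the message and never output, which is the property the capacity-based analysis relies on.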

[Figure 1: the sponge construction, with an absorbing phase followed by a squeezing phase.]
Sponges can be keyed in several ways. For example, the state can be initialized with the key, referred to as root-keyed sponge in [1]. Another option is to just apply the sponge on the concatenation of key and message. This was called the keyed sponge in [4] and the outer-keyed sponge in [1]. The last and for us most relevant concept is keying the sponge by replacing f with a keyed function $f_K$. For the special case of $f_K$ being a single-key Even-Mansour construction this was called E-M keyed sponge construction in [10] and later the inner-keyed sponge in [1]. We refer to the general case for any keyed function $f_K$ as keyed-internal-function sponge.
Our results. As our main result, we prove that the sponge construction using a random function or permutation is quantumly indistinguishable from a random function (see Theorems 8 and 16). This result can be used to obtain a quantum version of Theorem 1 from [1] (see Theorem 12), which states that the indistinguishability of keyed-internal-function sponges can be derived from the quantum-PRF-security (or quantum-PRP-security in case of a block cipher) of the keyed internal function. Thereby we not only provide a proof for the security of keyed-internal-function sponges in the post-quantum setting, but even in the stronger quantum setting where the adversary gets full quantum access to the keyed-internal-function sponge, i.e., we prove that keyed-internal-function sponges are quantum PRFs.
Another implication of our result is that the quantum attacks against CBC-MAC mentioned above can be prevented using a state with a non-trivial inner part. The authors of the attack already noted that their attack does not work in this case. More specifically, CBC-MAC can be viewed as a full-width sponge (where the state has no inner part, i.e., the capacity is 0). On the other hand, a CBC-MAC where all message blocks are padded with $0^c$ and the output is truncated to the first r bits can be viewed as a keyed-internal-function sponge. Hence, our result applies and shows that the quantum attacks by Kaplan, Leurent, Leverrier, and Naya-Plasencia [12] and Santoli and Schaffner [19] using Simon's algorithm are no longer applicable. Even more, our result proves that this little tweak of CBC-MAC indeed results in a quantum-secure MAC.
In Appendix A we show a direct proof of indistinguishability for f being a random permutation. In this proof we state and prove Lemma 19, which generalizes the average-case polynomial method to allow for functions that are not necessarily polynomials but are close to one; this result is not necessary to achieve the main goal of the paper but might be useful in other works using similar techniques.
A limitation. The authors of [1] use their Theorem 1 to show security of inner-keyed sponges using the PRP-security of single-key Even-Mansour. Their result does not carry over to the quantum setting as Even-Mansour is vulnerable in the quantum setting [14]. This does not lead to an actual attack on inner-keyed sponges in the quantum setting. The attack needs access to the full input to the Even-Mansour cipher, which is never the case for inner-keyed sponges as long as a non-trivial inner state is used. However, the attack on Even-Mansour does render the modular proof strategy inapplicable for inner-keyed sponges.
Our approach. The main technical contribution of our work is a proof that the probability for any given input-output behavior of $\mathrm{Sponge}_f$ is a polynomial in $2^{-c}$, the inverse of the number of inner states, where c is the capacity of the sponge. This observation then allows us to apply the average-case polynomial method of [22] (see Theorem 4 below).
In more detail, recall that the capacity of $\mathrm{Sponge}_f$ is the size of the inner state (there are $2^c$ possible inner states for a sponge as in Figure 1). If the capacity of a sponge increases, it becomes less and less likely that there are collisions in the inner state. Hence for infinite capacity, the inner states are unique, so the internal function is called on unique inputs and therefore the sponge behaves like a random function. Our proof formalizes this intuition by carefully analyzing the probabilities for q given input-output values of the sponge in terms of the capacity. We show that these probabilities are in fact polynomials in $2^{-c}$ of degree at most q times the length of the input-output values. We refer to Lemma 9 for the formal statement.
By establishing the capacity as this crucial parameter, we fit directly into the proof technique from [22] that uses approximating polynomials of low degree to show closeness of distributions and in turn small quantum distinguishing advantage. By the PRF/PRP switching lemma from [23], quantum indistinguishability also holds for the case of f being a random permutation. In the appendix, we provide an alternative proof for this case by generalizing the proof technique of [22] to the case of permutations.
Organization. Section 2 introduces the definition of quantum indistinguishability and other notions used throughout this work. In Section 3 we extend the above informal discussion of the sponge construction with a more formal description. At the end of the section we show that $\mathrm{Sponge}_f$ is indistinguishable from a random oracle in the conventional-access setting (in contrast to the quantum-access model). In Section 4 we state the main result of our paper as well as several derived results. Section 5 contains an example proof valid for limited distinguishers but giving sufficient details to understand our approach and verify correctness without all the particulars of the full proof. Section 6 contains the proof of Lemma 9, the main technical result of this work. The case of random permutations is covered in Section 7. We conclude the paper with Section 8, discussing some open problems related to the problem we analyze and related work.

Preliminaries and Tools
In the Symbol Index 9 we list the most important notation used in our paper. We use small caps for algorithms, CAPITAL letters for strings, and CAPITAL boldface letters for arrays of strings. Lower case italic letters denote parameters and counters. Functions are denoted by lower case boldface letters. Sets are denoted with CAPITAL calligraphic letters. Finally, distributions are denoted with CAPITAL letters using fraktur font. The general guideline for denoting elements of different sets is that we use a different letter together with indices. So if $\mathcal{A}$ is some set then $A_i$ is the i-th element of that set. If we write $\mathcal{A}_i$, that means that the set $\mathcal{A}_i$ is a member of some family.

Quantum threat model
The quantum threat model we consider allows the adversary to query oracles in superposition. Oracles are modeled as unitary operators $U_h$ acting on computational basis states as follows:
$$U_h |X, Y\rangle = |X, Y \oplus h(X)\rangle.$$
The adversary is considered to have access to a fault-tolerant (perfect) quantum computer. We do not provide more details on quantum computing as we do not directly require it here; we refer to [17] instead.
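The action of such an oracle on basis states can be illustrated classically. The sketch below assumes the common convention $U_h|X, Y\rangle = |X, Y \oplus h(X)\rangle$ and represents $U_h$ only by the permutation it induces on basis-state labels; the function name `oracle_permutation` and the example h are hypothetical.

```python
def oracle_permutation(h, n_in, n_out):
    """Map each basis label (x, y) to (x, y XOR h(x)).

    h must return values in range(2**n_out). The result describes how
    U_h permutes computational basis states; no amplitudes are modeled.
    """
    return {
        (x, y): (x, y ^ h(x))
        for x in range(1 << n_in)
        for y in range(1 << n_out)
    }

# Example with a hypothetical 2-bit to 2-bit function.
perm = oracle_permutation(lambda x: (3 * x + 1) % 4, 2, 2)
```

Because XOR-ing the same value twice cancels, the map is its own inverse, which is why this way of embedding an arbitrary (possibly non-injective) h is reversible.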

Distributions
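A distribution $\mathfrak{D}$ on a set $\mathcal{X}$ is a function $\mathfrak{D} : \mathcal{X} \to [0, 1]$ such that $\sum_{X \in \mathcal{X}} \mathfrak{D}(X) = 1$. We denote sampling $X$ from $\mathcal{X}$ according to $\mathfrak{D}$ by $X \leftarrow \mathfrak{D}$. $\mathcal{Y}^{\mathcal{X}}$ denotes the set of functions $\{f : \mathcal{X} \to \mathcal{Y}\}$. If $\mathfrak{D}$ is a distribution on $\mathcal{Y}$ then $\mathfrak{D}^{\mathcal{X}}$ denotes a distribution on $\mathcal{Y}^{\mathcal{X}}$ where the output for each input is chosen independently according to $\mathfrak{D}$. By $X \xleftarrow{\$} \mathcal{X}$ we denote sampling $X$ uniformly at random from the set $\mathcal{X}$.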

Classical and Quantum Indistinguishability
By classical indistinguishability we mean a feature of two distributions that are hard to distinguish if only polynomially many classical queries are allowed. The mentioned polynomial is evaluated on the security parameter. We leave the security parameter implicit for now; it will be specified for the particular construction we are going to analyze. In the following we use functions $\mathbb{N} \to \mathbb{R}$ that for a big enough argument are smaller than any inverse polynomial; they are called negligible functions.
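Definition 1 (Classical Indistinguishability). Two distributions $\mathfrak{D}_1$ and $\mathfrak{D}_2$ over a set $\mathcal{Y}^{\mathcal{X}}$ are computationally classically indistinguishable if no quantum algorithm A can distinguish $\mathfrak{D}_1$ from $\mathfrak{D}_2$ using a polynomial number of classical queries. That is, for all A, there is a negligible function $\varepsilon$ such that
$$\left| \mathbb{P}_{g \leftarrow \mathfrak{D}_1}\left[A^{g}(\cdot) = 1\right] - \mathbb{P}_{g \leftarrow \mathfrak{D}_2}\left[A^{g}(\cdot) = 1\right] \right| \le \varepsilon.$$
We write $A^{g}$ to denote that adversary A has classical oracle access to g. We will use the following generalization of the above definition to specify our goal.
Definition 2 (Quantum Indistinguishability [22]). Two distributions $\mathfrak{D}_1$ and $\mathfrak{D}_2$ over a set $\mathcal{Y}^{\mathcal{X}}$ are computationally quantumly indistinguishable if no quantum algorithm A can distinguish $\mathfrak{D}_1$ from $\mathfrak{D}_2$ using a polynomial number of quantum queries. That is, for all A, there is a negligible function $\varepsilon$ such that
$$\left| \mathbb{P}_{g \leftarrow \mathfrak{D}_1}\left[A^{|g\rangle}(\cdot) = 1\right] - \mathbb{P}_{g \leftarrow \mathfrak{D}_2}\left[A^{|g\rangle}(\cdot) = 1\right] \right| \le \varepsilon.$$
We write $A^{|g\rangle}$ to denote that adversary A has quantum oracle access to g, i.e., she can query g on a superposition of inputs.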
In what follows the setting that we focus on is indistinguishability from a random oracle. The first distribution is the one analyzed and the other is the uniform distribution over the set of all functions from $\mathcal{X}$ to $\mathcal{Y}$, i.e., $\mathcal{Y}^{\mathcal{X}}$. Sampling a uniformly random function is denoted by $h \xleftarrow{\$} \mathcal{Y}^{\mathcal{X}}$.

Main tools
In this section we describe the proof technique, based on approximating polynomials, that proves useful when dealing with notions like quantum indistinguishability. In the following $[q] := \{1, 2, \ldots, q\}$.
Theorem 3 (Theorem 3.1 in [24]). Let A be a quantum algorithm making q quantum queries to an oracle $h : \mathcal{X} \to \mathcal{Y}$. If we draw h from some distribution $\mathfrak{D}$, then for every Z the quantity $\mathbb{P}_{h \leftarrow \mathfrak{D}}[A^{|h\rangle}(\cdot) = Z]$ is a linear combination of the quantities $\mathbb{P}_{h \leftarrow \mathfrak{D}}[\forall i \in [2q] : h(X_i) = Y_i]$ for all possible settings of the $X_i$ and $Y_i$.
The intuition behind the above theorem is that with q queries the amplitudes of the quantum state of the algorithm depend on at most q input-output pairs. The probability of any outcome is a linear combination of squares of amplitudes, which is why we have 2q input-output pairs in the probability function. Finally, as the probability of any measurement depends on just 2q input-output pairs, the same holds for the algorithm's output probability. All the information about h comes from the queries A made.
We use the above theorem together with statements about approximating polynomials to connect the probability of some input-output behavior of a function from a given distribution with the probability of the adversary distinguishing two distributions.
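Theorem 4 (Theorem 7.3 in [22]). Fix q, and let $\mathfrak{F}_t$ be a family of distributions on $\mathcal{Y}^{\mathcal{X}}$ indexed by $t \in \mathbb{Z}^+ \cup \{\infty\}$. Suppose there is an integer d such that for every 2q pairs $\forall i \in [2q] : (X_i, Y_i) \in \mathcal{X} \times \mathcal{Y}$, the function $p(1/t) := \mathbb{P}_{h \leftarrow \mathfrak{F}_t}[\forall i \in [2q] : h(X_i) = Y_i]$ is a polynomial of degree at most d in $1/t$. Then for any quantum algorithm A making at most q quantum queries, the output distributions under $\mathfrak{F}_t$ and $\mathfrak{F}_\infty$ are $\pi^2 d^3/(3t)$-close:
$$\left| \mathbb{P}_{h \leftarrow \mathfrak{F}_t}\left[A^{|h\rangle}(\cdot) = 1\right] - \mathbb{P}_{h \leftarrow \mathfrak{F}_\infty}\left[A^{|h\rangle}(\cdot) = 1\right] \right| \le \frac{\pi^2 d^3}{3t}.$$
This theorem is an average-case version of the polynomial method often used in complexity theory. If the polynomial approximating the ideal behavior of $h \leftarrow \mathfrak{F}_\infty$ is of low degree, the distance between polynomials must be small.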

Definition of Sponges
While an informal explanation of sponges was given in the introduction, we now give a more formal definition.
We define a sponge-compliant padding as:
Definition 5 (Definition 1 in [6]). A padding rule is sponge-compliant if it never results in the empty string and if it satisfies the following criterion:
$$\forall n \ge 0,\ \forall M, M' \in \{0,1\}^* : M \ne M' \Rightarrow M \,\|\, \mathrm{pad}[r](|M|) \ne M' \,\|\, \mathrm{pad}[r](|M'|) \,\|\, 0^{nr},$$
where $\|$ denotes concatenation of bit strings.
A formal definition of the construction is provided as Algorithm 1. Note that $\oplus$ denotes the bitwise XOR, $|P|_r$ denotes the number of blocks of length r in P, $P_i$ is the i-th block of P, and $\lfloor Z \rfloor_\ell$ are the first $\ell$ bits of Z.

Classical indistinguishability of random Sponges
In the following we state the indistinguishability result in the classical domain. We use the following notation for a set of arbitrary finite-length bit strings: $\{0,1\}^* := \bigcup_{l \ge 0} \{0,1\}^l$; we usually denote this set by $\mathcal{M}$. Before we proceed let us define what we mean by a random oracle.
Definition 6 (Random Oracle). A random oracle is sampled from a distribution $\mathfrak{R}$ on functions from $\mathcal{M} \times \mathbb{N}$ to $\mathcal{M}$, where $\mathcal{M} := \{0, 1\}^*$. We define $h \leftarrow \mathfrak{R}$ as follows:
• Choose g uniformly at random from $\{g : \mathcal{M} \to \{0, 1\}^\infty\}$, where by $\{0, 1\}^\infty$ we denote the set of infinitely long bit strings.
• For each $(X, \ell) \in \mathcal{M} \times \mathbb{N}$ set $h(X, \ell) := \lfloor g(X) \rfloor_\ell$, that is, output the first $\ell$ bits of the output of g.
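An infinitely long output cannot be stored, but the oracle of Definition 6 can be simulated by sampling bits lazily. The following is a minimal sketch under that assumption; the class name `RandomOracle` and the use of Python's `secrets` module as the randomness source are illustrative choices, not part of the definition.

```python
import secrets

class RandomOracle:
    """Lazily sampled oracle: h(X, ell) = first ell bits of g(X)."""

    def __init__(self):
        self._bits = {}  # message -> bits of g(message) sampled so far

    def query(self, message: bytes, ell: int) -> str:
        bits = self._bits.setdefault(message, [])
        # Extend the stored prefix of g(message) only as far as needed.
        while len(bits) < ell:
            bits.append(secrets.randbits(1))
        return "".join(str(b) for b in bits[:ell])
```

Repeated queries on the same message with different lengths return prefixes of one another, which is the consistency requirement that the truncation $\lfloor g(X) \rfloor_\ell$ encodes.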
Theorem 7 (Classical indistinguishability of Sponge). If f is a random transformation or a random permutation then $\mathrm{Sponge}_f$ defined in Alg. 1 is classically indistinguishable from a random oracle. Namely, for all quantum algorithms A making polynomially many classical queries there is a negligible function $\varepsilon$ such that
$$\left| \mathbb{P}_{f \xleftarrow{\$} \mathcal{S}^{\mathcal{S}}}\left[A^{\mathrm{Sponge}_f}(\cdot) = 1\right] - \mathbb{P}_{h \leftarrow \mathfrak{R}}\left[A^{h}(\cdot) = 1\right] \right| \le \varepsilon,$$
where $\mathcal{S} = \{0, 1\}^{r+c}$, and $\mathfrak{R}$ is defined according to Definition 6.
Proof. The proof follows closely the proof of Theorem 2 of [2]. Even though we give more power to the adversary by giving her access to a quantum computer, the queries are considered to be classical. All arguments in the proof of Bertoni et al. depend only on the queries made by the adversary and not on her computing power. For that reason we can use the result of [16], which states that a query-based classical result easily translates to the quantum case if we do not change the query model.

Random Sponges are quantumly indistinguishable from random oracles
We want to show that the distribution corresponding to random sponges is quantumly indistinguishable from a random oracle. We can define a family of distributions indexed by the security parameter that intuitively gets closer to a random oracle with increasing parameter. For that reason Theorem 4 is a perfect theoretical tool to be used. The relevant tasks that remain are to identify the family of distributions that correspond to our figure of merit, to show that in fact the most secure member of the family with $t = \infty$ is a random oracle, and to prove that the assumptions of Theorem 4 are fulfilled.
The security parameter in Sponge is the capacity; we parametrize the family of random sponges by the size of the inner state space $t = 2^c$. Intuitively speaking, for $c \to \infty$ each evaluation of the internal function is done with a different inner state. In this case, irrespective of the input, the output is a completely random string, which is the definition of a random oracle (RO). Hence we conclude that we identified a family of distributions that is well suited to be used with Theorem 4. If we show that indeed for $t = \infty$ the member of the family is the random oracle we have that: We are left with the task to prove the left-hand side of the above statement. The assumption of Theorem 4 is that the probability of witnessing any input-output behavior on q queries is a polynomial in $1/2^c$. At this point we stumble upon a problem with the set of indices. If we want to use the statement about closeness of polynomials we have to show that p is a polynomial for any inverse integer and not only for $2^{-c}$. This difficulty brings us to the definition of the generalized sponge construction SpGen. The only difference between SpGen and Sponge is the space of inner states: we change it from $\{0, 1\}^c$ to any finite-size set $\mathcal{C}$. This modification solves the problem of defining distributions for any integer, not only powers of 2. It remains to prove that $p(|\mathcal{C}|^{-1})$ is in fact a polynomial in $|\mathcal{C}|^{-1}$, where by $|\mathcal{C}|$ we denote the cardinality of the set. With that statement proven we fulfill the assumptions of Theorem 4 and show quantum indistinguishability of SpGen, which implies the same for Sponge.
In Algorithm 2 we present a generalization of Sponge. The set of inner states is denoted by $\mathcal{C}$ and can be any finite set, to be specified by the user. The internal function is generalized to any map $\varphi_f : \{0, 1\}^r \times \mathcal{C} \to \{0, 1\}^r \times \mathcal{C}$. In the following we denote the part of the entire state $S$ in $\{0, 1\}^r$ by $\bar{S}$ and call it the outer part, and the part in $\mathcal{C}$ by $\hat{S}$; we will refer to it as the inner part of a state.
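To make the generalization concrete, here is a hedged Python sketch of a sponge whose inner part lives in an arbitrary finite set. Algorithm 2 itself is not reproduced in this section, so the details are assumptions: the initial state is taken to be the all-zero outer part together with a caller-supplied inner element `inner_init`, padding is omitted, outputs are produced in whole r-bit blocks, and the random internal function is sampled lazily. The names `make_random_phi` and `spgen` are hypothetical.

```python
import random

def make_random_phi(r, inner_states, rng):
    """Lazily sample a uniformly random map on {0,1}^r x C."""
    table = {}
    def phi(state):
        if state not in table:
            table[state] = (rng.randrange(1 << r), rng.choice(inner_states))
        return table[state]
    return phi

def spgen(phi, blocks, out_blocks, r, inner_init):
    """Generalized sponge: outer part is an r-bit int, inner part is in C."""
    outer, inner = 0, inner_init
    for block in blocks:                     # absorbing phase
        assert 0 <= block < (1 << r)
        outer, inner = phi((outer ^ block, inner))
    out = [outer]                            # squeezing phase (out_blocks >= 1)
    for _ in range(out_blocks - 1):
        outer, inner = phi((outer, inner))
        out.append(outer)
    return out

# Usage with a three-element inner set, a size that is not a power of two.
rng = random.Random(0)
phi = make_random_phi(4, ["a", "b", "c"], rng)
digest = spgen(phi, [3, 5], 2, 4, "a")
```

The inner set here has three elements, which is exactly the kind of size the binary sponge cannot express and the reason for moving from $\{0,1\}^c$ to a general set.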
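Let us now formally state the main claim of this paper. We are going to focus on the internal function being modeled as a random function; in Section 7, though, we are going to cover the case of random permutations.
Theorem 8. $\mathrm{SpGen}_{\varphi_f}$ for random $\varphi_f$ is quantumly indistinguishable from a random oracle. More concretely, for all quantum algorithms A making at most q quantum queries to SpGen, such that the input length is at most $m \cdot r$ bits long and the output length is at most $z \cdot r$ bits long,
$$\left| \mathbb{P}_{\varphi_f \xleftarrow{\$} \mathcal{S}^{\mathcal{S}}}\left[A^{|\mathrm{SpGen}_{\varphi_f}\rangle}(\cdot) = 1\right] - \mathbb{P}_{h \leftarrow \mathfrak{R}}\left[A^{|h\rangle}(\cdot) = 1\right] \right| \le \frac{\pi^2}{3} \eta^3 |\mathcal{C}|^{-1},$$
where $\eta := 2q(m + z - 2)$ and $\mathfrak{R}$ is defined according to Definition 6. The domain is defined as $\mathcal{S} = \{0, 1\}^r \times \mathcal{C}$ for some non-empty finite set $\mathcal{C}$.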
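To get a feeling for the magnitudes involved, the closeness bound of Theorem 4 can be evaluated with degree $d = \eta = 2q(m + z - 2)$ and $t = |\mathcal{C}|$. The sketch below assumes the resulting bound has the form $\pi^2 \eta^3 / (3|\mathcal{C}|)$ and that $|\mathcal{C}| = 2^c$; the function names are hypothetical.

```python
import math

def eta(q, m, z):
    """Degree parameter eta = 2q(m + z - 2)."""
    return 2 * q * (m + z - 2)

def closeness_bound(q, m, z, c):
    """pi^2 * eta^3 / (3 * 2^c), i.e. Theorem 4 with d = eta and t = 2^c."""
    d = eta(q, m, z)
    return math.pi ** 2 * d ** 3 / (3 * 2 ** c)

# Example: 2^20 queries, inputs and outputs of at most two blocks, c = 256.
bound = closeness_bound(2 ** 20, 2, 2, 256)
```

Since the bound grows with the cube of $\eta$, it is only meaningful when $|\mathcal{C}|$ is much larger than $\eta^3$; for small capacities the value exceeds 1 and says nothing.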
Before we prove the above theorem we state the main technical lemma.For a fixed q and for every (M, Z) (ii) and the coefficient All coefficients a j are real, and the degree of the polynomial equals η := 2q(m + z − 2).In the equation describing a 0 we use δ(M, Z, i) to denote a Boolean function that is 0 if M i is input more than once and Z i is not the longest output of SpGen on M i or is inconsistent with other outputs (inputting the same message for the second time should yield the same output) and is 1 otherwise.
The full proof is presented in Section 6.
Proof idea. Our goal is to explicitly evaluate $\mathbb{P}_{\varphi_f}[\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i]$. We base all of our discussion on two facts: SpGen has a structure that we know and it involves multiple evaluations of the internal function $\varphi_f$; and $\varphi_f$ is a random function with a well-specified probability of yielding some output on a given input. The main idea of our approach is to extract terms like $\mathbb{P}[\varphi_f(S_1) = S_2]$ for some states $S_1, S_2$ from the overall probability expression and evaluate them.
Let us go through a more detailed plan of the proof. Fix $(\mathbf{M}, \mathbf{Z})$ and set $\ell_i := |Z_i|$.
In the first step we include all intermediate states in the probabilistic event $\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i$. We write explicitly all inner states and outer states not specified by the input-output pairs $(\mathbf{M}, \mathbf{Z})$. Next we rewrite the full probability expression in the form $\sum \prod \mathbb{P}[\varphi_f(S_1) = S_2 \mid \ldots]$. The sum comes from the fact that there are many possible intermediate states that yield the given input-output behavior. The product is the result of using Bayes' rule to isolate a single evaluation of $\varphi_f$ in the probability. To correctly evaluate the summands we need to analyze all states in $\mathbb{P}[\varphi_f(S_1) = S_2 \mid \ldots]$ from the perspective of uniqueness; we say a state is unique if it is input to $\varphi_f$ just a single time. Given a specific setup of unique states in all 2q evaluations of SpGen we can easily evaluate the probabilities, as the only thing we need to know is that $\varphi_f$ is random. The final step of the proof is to calculate the number of states in the sum. We sum over all values of states that fulfill the constraints of $\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i$ and $\varphi_f$ being a function. The previous analysis of uniqueness of states makes it easier to include the latter constraint; non-unique states have predetermined outputs under $\varphi_f$, decreasing the number of possible states. After those steps we end up with an explicit expression for $\mathbb{P}[\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i]$, which allows us to show that p is a polynomial of the claimed degree and that its limit for $t \to \infty$, i.e., the coefficient $a_0$, is the probability of uniformly random outputs.
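Proof of Theorem 8. Let us define a family $\mathfrak{F}_t$ indexed by $t \in \mathbb{N} \cup \{\infty\}$, $t > 0$. $\mathfrak{F}_t$ is a distribution on functions from $\mathcal{M} \times \mathbb{N}$ to $\mathcal{M}$, where $\mathcal{M} := \{0, 1\}^*$. The family is additionally parametrized by the choice of $r \in \mathbb{N}$ and a sponge-compliant padding function pad. We define $h \leftarrow \mathfrak{F}_t$ as follows:
• Choose $\varphi_f$ uniformly at random from $\mathcal{S}^{\mathcal{S}}$, where $\mathcal{S} := \{0, 1\}^r \times \mathcal{C}$ and $\mathcal{C}$ is any finite set of size $t > 0$.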
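• Use $\varphi_f$, $\mathcal{C}$, the fixed r, and pad to construct $h := \mathrm{SpGen}_{\varphi_f}$.
To show that we defined $\mathfrak{F}_t$ in the right way, let us analyze Eq. (8) from the point of view of the newly defined distribution. On the one hand, from our definition it follows that
$$\mathbb{P}_{h \leftarrow \mathfrak{F}_t}\left[\forall i \in [2q] : h(M_i, \ell_i) = Z_i\right] = \mathbb{P}_{h \leftarrow \mathfrak{F}_t}\left[\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i\right] = \mathbb{P}_{\varphi_f \xleftarrow{\$} \mathcal{S}^{\mathcal{S}}}\left[\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i\right],$$
where the first equality follows from our definition of h and the second from the fact that all randomness in $\mathfrak{F}_t$ comes from choosing a random function $\varphi_f$. On the other hand, if we take $t \to \infty$ the internal function is going to be injective on its inner part. Namely, $\hat{\varphi}_f$ (the internal function with its output restricted to the inner part) is injective. That implies a different inner state in every evaluation of $\varphi_f$ in SpGen, which in turn implies a random and independent outer part in every step of generating the output, formally
$$\lim_{t \to \infty} \mathbb{P}_{h \leftarrow \mathfrak{F}_t}\left[\forall i \in [2q] : h(M_i, \ell_i) = Z_i\right] = \mathbb{P}_{h \leftarrow \mathfrak{R}}\left[\forall i \in [2q] : h(M_i, \ell_i) = Z_i\right].$$
This intuition is formally captured by Statement (ii) of Lemma 9, where we state that in the limit of $|\mathcal{C}| \to \infty$ the probability of getting particular outputs of SpGen is the same as for a random oracle.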
From the above discussion we get that $\mathfrak{F}_\infty = \mathfrak{R}$, which is the crucial equality for using Theorem 4 to prove our statement. The last element of the proof is the assumption about p being a polynomial, and that is exactly the statement of Lemma 9.
Quantum indistinguishability of commonly used sponges with binary state follows directly from the general result.
Corollary 10.If f is a random function or a random permutation, then Sponge f is quantumly indistinguishable from a random oracle.
Proof. For a random function we use Theorem 8, and for a random permutation Theorem 16, and set $\mathcal{C} = \{0, 1\}^c$.

Application to keyed-internal-function sponges
We show that Theorem 8 implies that keyed-internal-function sponges are indistinguishable from a random oracle under quantum access if the used internal function is a quantum-secure PRF (or, if the internal function is a permutation, a quantum-secure PRP). This means that in the case f is a quantum-secure pseudorandom function or permutation the sponge construction is a quantum-secure pseudorandom function. For keyed primitives, indistinguishability from a random oracle/permutation is exactly what we call pseudorandomness.
Definition 11 (Quantum-secure PRF/PRP). Say $f : \mathcal{K} \times \mathcal{S} \to \mathcal{S}$ is a keyed function (permutation); then we say that f is a quantum-secure pseudorandom function (permutation) if for every quantum algorithm A running in polynomial time, there is a negligible function $\varepsilon_{\mathrm{PR}}$ such that
$$\left| \mathbb{P}_{K \xleftarrow{\$} \mathcal{K}}\left[A^{|f_K\rangle}(\cdot) = 1\right] - \mathbb{P}_{g}\left[A^{|g\rangle}(\cdot) = 1\right] \right| \le \varepsilon_{\mathrm{PR}}(n),$$
where $n := \log |\mathcal{K}|$ and g is sampled uniformly from the set of functions (permutations) from $\mathcal{S}$ to $\mathcal{S}$. Below, we refer to $\varepsilon_{\mathrm{PR}}$ as advantage.
Now we state and prove a quantum version of Theorem 1 of [1], which formalizes the above statement about quantum security of keyed-internal-function sponges. Note that we state the theorem for the general sponge construction, but thanks to Corollary 10 it holds for the regular construction as well.
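Theorem 12. If the internal function f used in $\mathrm{SpGen}_f$ is a quantum-secure PRF/PRP with advantage $\varepsilon_{\mathrm{PR}}$, then the resulting keyed-internal-function sponge is a quantum-secure PRF with advantage
$$\left| \mathbb{P}_{K \xleftarrow{\$} \mathcal{K}}\left[A^{|\mathrm{SpGen}_{f_K}\rangle}(\cdot) = 1\right] - \mathbb{P}_{h \leftarrow \mathfrak{R}}\left[A^{|h\rangle}(\cdot) = 1\right] \right| \le \varepsilon_{\mathrm{PR}}(n) + \frac{\pi^2}{3} \eta^3 |\mathcal{C}|^{-1},$$
where $\eta := 2q(m + z - 2)$, q is the number of queries A makes to its oracle, m and z are as defined in the statement of Thm. 8, and $\mathfrak{R}$ is defined according to Definition 6.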
Proof. We give the proof for f being a keyed function. The proof when f is a keyed permutation is obtained by using Theorem 16 in place of Theorem 8 and restricting the sets from which g and $\varphi_f$ are drawn below to permutations. We show that the advantage of any quantum adversary in distinguishing the keyed-internal-function sponge from a random oracle is bounded by its ability to distinguish f from a random oracle (permutation, respectively) plus its ability to distinguish a random sponge from a random oracle. In the following calculation we use the triangle inequality and the result of Theorem 8.
Quantum Indistinguishability, Thm. 8 or 16 where B is an adversary that uses A as a subroutine, simulating A's oracle using its own oracle and the sponge construction.B outputs the same output as A.

Example proof of Lemma 9
In this section we prove Lemma 9 in a setting limited enough that every step can be done in full detail. The main difficulty of our technique is of combinatorial nature, namely counting the possible values of intermediate states in multiple evaluations of SpGen. In the full proof we provide an algorithmic explanation of some steps, but here we can execute these algorithms and explicitly write down their outputs. We want to show that the probability function describing the input-output behavior of SpGen is a polynomial of bounded degree in $|\mathcal{C}|^{-1}$. By that we mean that the expression for $p(|\mathcal{C}|^{-1})$ can be written as $\sum_i a_i |\mathcal{C}|^{-i}$. The proof goes as follows. Firstly, we expand the event that on some inputs SpGen gives some outputs; this allows us to pinpoint the individual evaluations of $\varphi_f$. Secondly, we impose an order on the evaluations of the internal function, which in turn allows us to exclude state values that would require $\varphi_f$ to output different values on the same input, calculate the probability of $\varphi_f$ having particular input-output behavior, and divide the set of state values in a way that allows us to calculate its size. Finally, we obtain a closed expression for $p(|\mathcal{C}|^{-1})$.
The limitation we make in the example proof is to consider only single-query algorithms (q = 1).We also restrict ourselves to a limited SpGen that allows only 2-block inputs and always outputs a single r-bit block.As q = 1 the number of input-output pairs we need to consider is 2. The array of inputs and outputs is (M, Z) = ((M 1 , Z 1 ), (M 2 , Z 2 )) and for i ∈ {1, 2} : In the following example ϕ f : S → S, where S := {0, 1} r × C. The probabilistic event we analyze throughout this section is ∀i ∈ [2] : where by φf : S → {0, 1} r and φf : S → C we denote the first part and the second part of the output of ϕ f respectively.In the following paragraph we are going to make explicit all inputs to ϕ f .Throughout this section we will discuss two evaluations of SpGen which are depicted in Fig. 2. In Figure 2 we show the two evaluations of SpGen we analyze.The values of not-boxed-states are fixed by the requirement of inputs being M and outputs Z.
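For very small parameters the probability in this limited setting can also be computed by exhaustive enumeration, which gives a useful sanity check on the polynomial claim. The sketch below is an independent illustration, not the counting argument of the proof: it assumes r = 1, an inner set $\{0, \ldots, t-1\}$, an initial state with zero outer part and inner element 0, two-block inputs and one-block outputs, and it enumerates every function on the state space. The name `exact_probability` and the chosen input-output pairs are hypothetical.

```python
from fractions import Fraction
from itertools import product

def exact_probability(t, pairs, r=1):
    """P over all phi on {0,1}^r x {0..t-1} that every (M, Z) pair is matched."""
    states = [(o, i) for o in range(1 << r) for i in range(t)]
    hits = total = 0
    for images in product(states, repeat=len(states)):
        phi = dict(zip(states, images))      # one concrete function phi
        total += 1
        ok = True
        for (m1, m2), z in pairs:
            outer, inner = phi[(m1, 0)]              # absorb first block
            outer, inner = phi[(outer ^ m2, inner)]  # absorb second block
            if outer != z:                           # single output block
                ok = False
                break
        hits += ok
    return Fraction(hits, total)

# Two inputs differing in the first block, both required to output 0.
pairs = [((0, 0), 0), ((1, 0), 0)]
values = {t: exact_probability(t, pairs) for t in (1, 2, 3)}
```

For these particular pairs a case analysis by hand gives $\frac{1}{4} + \frac{1}{8t} - \frac{1}{8t^2}$, which the enumeration reproduces for $t = 1, 2, 3$. This is a polynomial of degree 2 in $1/t$, in line with $\eta = 2q(m + z - 2) = 2$ for q = 1, m = 2, z = 1, and its constant term $1/4$ is the probability of two independent uniformly random one-bit outputs. The enumeration is only feasible for tiny t, since the number of functions is $(2t)^{2t}$.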
By M i j we denote the j-th block of M i , and similarly by Z i j the j-th block of Z i .Note that by including intermediate states we can further expand the above event.By intermediate states we mean the value of the state of SpGen during calculation of SpGen(M).Namely We are using the upper index to count the number of the evaluation of SpGen.Note that there is one inner state that we have not made explicit, the one being output by the first ϕ f .Following the above reasoning we get ∀i : where S i 2 = ( Si 2 , Ŝi 2 ), we denote the above as ∀i : where ∀i : By adding the subscript "⊕" we highlight that the state output by ϕ f has been updated by XORing the appropriate block of M. Up to this point we have expanded the initial event from Eq. ( 19) to a form with all inputs and outputs of ϕ f being explicit, namely ϕ f (S 1 ) = S 2 .From now on we are going to denote the set of the states by ∇-c (read as "nabla configuration", where the ∇ is suggested by the three values that can be seen as vertices of a triangle), which we define as a matrix where ∀i : ; the constraints we impose fix the input-output behavior of the two evaluations of SpGen.Nabla-configurations ∇-c are matrices of triples but when we want to refer to a part of the triple in row i and column j of ∇-c we are going to write S i j ∈ ∇-c.More formally where by ∇-c i j we denote the element of ∇-c in row i and column j, similarly for the second part of the state S i j⊕ = ( Si j⊕ , Ŝi j ), of the corresponding triple.We introduce this notation of ∇-c to capture possible values of the states in SpGen(M i ) that are consistent with (M, Z).
The set of all possible values of states in ∇-c is denoted by ∇-C(M, Z) (the set of nabla configurations).The size of ∇-C(M, Z) is the number of different ∇-c for a particular (M, Z), In Figure 2 values of not-boxed-states are fixed by the requirement of inputs being M and outputs Z.In what follows we analyze this set to find out how many possible values of states correspond to each value of probability of seeing (∀i ∈ [2] : To better understand our approach we should clarify the implicit equivalence between ∇-c-so values of the internal states of SpGen-and ϕ f -the function taken at random from S S .Note that for every ϕ f ∈ S S we have at most a single ∇-c, we say at most because some ϕ f are not consistent with the input-output pairs (M, Z).On the other hand for a single ∇-c we have plenty of functions: all those that have input-output pairs consistent with values of states in ∇-c and any outputs on all inputs not present in ∇-c.Also note that there are many ∇-c that will result in (∀i ∈ [2] : What we do is basically counting the number of functions ϕ f that will result in SpGen evaluating to Z on M and dividing it by the number of all functions.The only difference is that we immediately simplify the result by not counting the functions with behavior outside of our scope-limited to few (in this section four) evaluations.This simplification is made easier by focusing on relevant values of inputs and outputs; on a few rows of the evaluation tables of ϕ f .The events we take the OR of are disjoint, so in terms of probabilities we get where the sum is taken over S 1 2 , S 2 2 ∈ S and Ŝ1 3 , Ŝ2 3 ∈ C and Si 1⊕ , Si 2⊕ , Si 3⊕ are constrained by (M, Z).Now that we have exposed the individual evaluations of ϕ f we can use the chain rule to specify the order in which we analyze the evaluations of the internal function.This order is only a tool for the analysis of the probability, not the actual time evolution.Note that the probability on the right hand side of Eq. 
(25) is taken over a conjunction of events depending only on a single evaluation of ϕ f. The next step is to extract events with a single evaluation of ϕ f. We can do it simply by using Bayes' formula and the chain rule, where in the last equation we have omitted ∀i, j : S i j , S i j⊕ ∈ ∇-c in each probability function. We denote the order specified above by "≺".
We still cannot evaluate the above expression because we do not know if ϕ f is queried on a "fresh" input or not.First of all, note that thanks to conditioning on one event, we can treat 2 ) from the second factor in Eq. ( 27) as being prior to (ϕ f (S 1 2⊕ ) = S 1 3 ).Prior in that case means that ϕ f is sampled on S 1 1⊕ before it is sampled on S 2 1⊕ .That implies, e.g., that if S 2 1⊕ = S 1 1⊕ then the outputs have to be the same, otherwise the probability is 0. This is what we mean when saying that an input is fresh or not.To separate a particular ∇-c with different numbers of fresh states, we perform a procedure on each ∇-c that assigns flags to the states.Flags mark whether the value of the state was previously input to ϕ f or not.By performing this procedure we want to divide ∇-C(M, Z) into subsets with the same probability-i.e.having the same probability of sampling ϕ f that yields a particular input-output behavior.Let us call this procedure Flag-Assign.Running it also identifies impossible values of internal states, calculates the probabilities of each transition, and divides ∇-C(M, Z) into sets of cardinalities we can compute.
Algorithm Flag-Assign, see Alg. 4 in the next section, takes as input ∇-c and goes through each state, starting from the first column and going down, then down from the top of the second column, and so on. The order in which Flag-Assign operates is depicted by arrows in Fig. 4. If the value of the "⊕" part of the state which is input to ϕ f appears in ∇-c just once, the algorithm assigns the flag "u" to it; we call such states unique. If the value is not unique, i.e. it appears in ∇-c more than once, the state that is encountered first is assigned the flag "f" and the rest of the states get the flag "n". We call states with the flag "n" non-unique. Flag-Assign also appends to each non-unique "⊕"-state the output it should yield, i.e. the output of the corresponding "f" state. If the state in ∇-c that follows the considered state is different from the claimed output, we discard the whole configuration. We denote the set ∇-C(M, Z) without states that conflict with ϕ f being a proper map by p-∇-CF(M, Z) (the set of p-nabla configurations with flags; p emphasizes the fact that we have restricted ϕ f to proper transformations). By a proper map we mean that it does not output different states on the same input. Elements of p-∇-CF(M, Z) are denoted by p-∇-cf (p-nabla configurations).
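The flag assignment described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the listing of Alg. 4: the names `flag_assign`, `inputs` and `outputs` are hypothetical, a configuration is represented by two ragged lists holding the "⊕" inputs to ϕ f and the states that follow them, and states are arbitrary hashable values.

```python
from collections import Counter

def flag_assign(inputs, outputs):
    """Assign flags "u", "f", "n" to the inputs of the internal function.

    inputs[i][j]  : state fed to the j-th evaluation in row i (hypothetical layout)
    outputs[i][j] : state that the configuration claims this evaluation returns
    Returns a dict {(i, j): flag}, or None if the configuration would force
    the internal function to return two different outputs on the same input.
    """
    width = max(len(row) for row in inputs)
    # Column by column, top to bottom within each column.
    order = [(i, j) for j in range(width)
             for i in range(len(inputs)) if j < len(inputs[i])]
    counts = Counter(inputs[i][j] for i, j in order)
    first_image = {}  # input value -> output of its "f" occurrence
    flags = {}
    for i, j in order:
        value = inputs[i][j]
        if counts[value] == 1:
            flags[(i, j)] = "u"           # value appears exactly once
        elif value not in first_image:
            flags[(i, j)] = "f"           # first occurrence of a repeated value
            first_image[value] = outputs[i][j]
        else:
            if outputs[i][j] != first_image[value]:
                return None               # not a proper map: discard
            flags[(i, j)] = "n"           # later occurrence of a repeated value
    return flags
```

For two rows with two evaluations each, where both rows start from the same input and agree on its output, the sketch flags the first column as "f" and "n" and the second column as "u" and "u"; changing one of the two claimed outputs of the repeated input makes it return `None`.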
After running Flag-Assign on every ∇-c ∈ ∇-C(M, Z) and discarding the configurations with bad output states we still need to add more details to our picture.The procedure we describe below is depicted in Figure 3. Firstly we discriminate between p-∇-cf with different numbers of unique states.The total number of flags is 4, the final states are not inputs to ϕ f and are not assigned a flag.We denote the number of unique states by u, the number of states that are non-unique but appear for the first time is f , and the number of non-unique states is n.Note that u+f +n = 4.In general there are 5 possible sets of those numbers in the case of q = 1 and lengths of the input and output strings we specified.These are as follows: (u = 4, f = 0), (u = 2, f = 1), (u = 1, f = 1), (u = 0, f = 2), and (u = 0, f = 1).Secondly we discriminate between different placements of flags.For each setup there are several possible placements of flags.For (u = 4, f = 0) and (u = 0, f = 1), flags can be set in only one configuration.If we have 2 unique states and the setup is (u = 2, f = 1) then there are 6 possible configurations of flags.For (u = 1, f = 1) there are 4 and for (u = 0, f = 2) only 2. While calculating the number of configurations it is important to remember that the flag of the first state S 1 1⊕ is either u or f.There are some details of how to find the placements but they are made explicit only in the full proof of the lemma.All possible placements are depicted in Figure 3.
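The numbers of setups and placements quoted above can be cross-checked by brute force. The sketch below is an illustration only: it assumes four flagged states visited in the order used by Flag-Assign, it enumerates every pattern of equalities among their values, and it ignores any restriction coming from a particular (M, Z). The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

def flags_for(values):
    """Flags for a sequence of input values listed in traversal order."""
    counts = Counter(values)
    seen = set()
    flags = []
    for v in values:
        if counts[v] == 1:
            flags.append("u")
        elif v not in seen:
            flags.append("f")
            seen.add(v)
        else:
            flags.append("n")
    return tuple(flags)

def placements(num_states=4):
    """Group the distinct flag placements by their setup (u, f)."""
    by_setup = defaultdict(set)
    # num_states labels are enough to realise every equality pattern.
    for values in product(range(num_states), repeat=num_states):
        flags = flags_for(values)
        by_setup[(flags.count("u"), flags.count("f"))].add(flags)
    return by_setup

if __name__ == "__main__":
    for setup, found in sorted(placements().items(), reverse=True):
        print(setup, len(found))
```

Under these assumptions the enumeration yields exactly five setups, with 1, 6, 4, 2 and 1 placements for (u = 4, f = 0), (u = 2, f = 1), (u = 1, f = 1), (u = 0, f = 2) and (u = 0, f = 1) respectively, and the first state is always flagged u or f.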
In most steps we perform using Flag-Assign, the distinction between u and f seems unimportant. We will need it to properly identify different placements of flags in p-∇-CF(M, Z), but in all other tasks one can treat them as a single "unique" flag.
The next step is calculating the number of values that can be assigned to states in a given setup and for a given placement of flags.We calculated those numbers assuming it is possible to have such placement.This assumption is not always true as particular messages and outputs exclude some options.For example, if both messages start with the same block then all positions where the two first states are unique are impossible.By Calc we denote the algorithm calculating the cardinality of subsets of p-∇-CF(M, Z), the details of Calc are specified below in Alg. 5.It goes through a single placement of flags.The basic rules of its operation are: for every unique flag that maps to a unique flag multiply the result by 2 r |C|, for every unique flag that maps to a non-unique flag multiply the result by 1, for every non-unique flag that maps to any state multiply the result by 1, and for every unique flag in the last column of flagged states multiply the result by |C|.The first two rules are adjusted a bit to keep track of non-repeating unique states and allow for multiple values of non-unique states respectively.The last column in Fig. 3 lists the results of Calc for placements in the respective rows.If the squeezing phases were longer we would have to account for the fact that outer part of the state can be either unique or not which slightly changes the final outcome.Now we want to show that the probability function p is a polynomial in |C| −1 .Up to this point we have shown that where the order "≺" is the same as in Eq. ( 27).In the above equation we have discarded those ∇-c that require ϕ f to output different states on the same input, because the probability is then 0. The sum in Eq. 
( 28) can be expanded to where by p-∇-cf(u, f, P ) we denote the configuration with the number of unique and nonunique states and placement fixed.P(u, f ) is the set of all placements in which the flags can be arranged given the number of unique and non-unique states.We omitted the input (M, Z) to P for brevity.Making use of information from Fig. 3 we can now evaluate expression (28).Note that setting u and f to some particular values allows us to evaluate the probabilities.Denoting the total number of unique states by ū we get that for p-∇-cf with u + f = ū.Finally we arrive at where the sets are denoted as in Fig. 3.The function appearing in the above equation is defined as where in Eq. (31) we omitted the input of (M, Z) for readability.The degree of p(|C| −1 ) is at most 2, as claimed for messages of length 2r-bits.The coefficient for |C| 0 is Let us recapitulate the results of this section.First we characterized the possible internal functions by the outputs of their consecutive evaluations, Eq. ( 26).Secondly we captured the features of the intermediate states that determine the probability of seeing a particular inputoutput behavior, Eq. (30).Finally we calculated an explicit formula for the probability function, Eq. ( 31), (33).

Proof of Lemma 9
In this section we give the complete proof of Lemma 9 for the general case of q ≥ 1 queries made by the adversary and message lengths bounded by some m, not fixed to 2 as in the previous section. In Subsection 6.1 we expand the probability expression to encompass all intermediate states of ∀i ∈ [2q] : SpGen ϕ f (M i , i ) = Z i and individual evaluations of ϕ f. In Subsection 6.2 we introduce the concept of unique states to evaluate the probabilities of P[ϕ f (S 1 ) = S 2 ]. In Subsection 6.3 we define the algorithm that calculates the cardinality of the set of intermediate states (and equivalently inner functions) consistent with given characteristics. In Subsection 6.4 we conclude the proof and provide the final expression for the probability of an input-output pair under a random SpGen ϕ f. We omit the padding function of the sponge construction and assume that the length of all M i is a multiple of r. This is done without loss of generality, since we can just say that all the considered messages are in fact messages after padding and we do not use any properties of the padding in the proof. Also, we focus on q evaluations of SpGen instead of 2q to improve readability.

Expansion of the probability function
In this section we expand the probability function to the point that all intermediate states are accounted for. We follow exactly the same reasoning as at the beginning of Section 5. We consider the event ∀i ∈ [q] : SpGen ϕ f (M i , i ) = Z i and then include the states that appear between consecutive evaluations of ϕ f, similarly to the steps in Equations (19)-(21).
To keep track of the states we introduce the following notation.By the upper-index we denote the number of evaluations of SpGen, going from 1 to q.The lower index corresponds to the number of evaluations of ϕ f in the i-th calculation of SpGen.A state occurring during the calculation on M i that is the input to the j-th evaluation of ϕ f is denoted by S i j⊕ .The output of that evaluation is S i j+1 .All states traversed in q evaluations of SpGen can be depicted in a similar way as in Figure 2 but in an array with q rows with |M i | r + |Z i | r columns each.
We call an array like that presented in Figure 2 with values assigned to every state a nabla configuration ∇-c.∇ symbolizes the triangle shape in which we put states between evaluations of ϕ f , each corner being an outer or inner part of the state.Note that in the figure we assume the initial state to be equal to (0 r , I C ) which is not included in our definition of ∇-c.By array we mean a 2-dimensional matrix with unequal length of rows.Now we define ∇-c relative to input-output pairs (M, Z).The size of the array is determined by the number of blocks in M i and Z i .
To refer to the element of ∇-c that lies in row i and column j we write ∇-c i j .To refer to parts of the triple that lies in row i and column j we write Let us define the number of evaluations of ϕ f in ∇-c for (M, Z) as To make good use of the newly introduced concept of nabla configurations ∇-c we want to restrict the set of arrays we discuss.Similarly to Equation (22) we want to put constraints on the set of ∇-c to make explicit the requirement that states correspond to a correct input-output behavior of SpGen.The set of ∇-c for (M, Z) is defined as follows.
Definition 14 (∇-C(M, Z)).The set of nabla configurations ∇-c for (M, Z) is a set of arrays of size specified by (M, Z), ∇-C(M, Z) ⊂ {0, 1} 2r × C κ+q .We define ∇-C(M, Z) by the following constraints ∀i ∈ [q] : Ŝi The formal definition reads In the following we assume that rows of all ∇-c ∈ ∇-C(M, Z) are initially sorted according to the following relation.We arrange (M i , Z i ) in non-decreasing order in terms of length, so ∀i < j : k i ≤ k j , this also means that rows of ∇-c are ordered in this way.
Having established the notation we move on to realizing the goal of this section: rewriting the probability function in a suitable way for further analysis.In the following when we consider ϕ f (S i j⊕ ) = S i j+1 for some ∇-c we leave implicit that S i j⊕ , S i j+1 ∈ ∇-c.We have that In the above equations we first include the intermediate states and then combine all evaluations of ϕ f .In the following we make use of the fact that the events we take the disjunction of are disjoint and the logical disjunction turns into a sum of the probability.
To further extract an expression involving the probability of a single ϕ f (S i j⊕ ) = S i j+1 we use Bayes' rule. By a chain of conditions we want to arrive at a function we can evaluate in the end. At this point we want to choose a particular order of ϕ f (S i j⊕ ) = S i j+1 events. Let us define the order ≺ as $(i', j') \prec (i, j) \iff j' < j \lor (j' = j \land i' < i)$. The above rule imposes an order that begins with the top-left corner of Figure 2 and proceeds downwards to the end of the column, then continues from the top of the second column from the left.
In the case there is no state (q − 1, k q − 1) we just take the next state preceding (q, k q − 1) in the order given by Equation (42).
Up to this point we have performed some transformations of the event ∀i ∈ [q] : SpGen ϕ f (M i , i ) = Z i , but we did not address the issue of correctness. Is it correct to consider state values in evaluations of SpGen instead of different ϕ f ? Are we in fact discussing the probability over the random choice of the internal function? The answer to this question is "yes", because of the equivalence of every ∇-c with some set of ϕ f. We can treat the input-output pairs for ϕ f assigned in ∇-c as values in the function table of ϕ f. By picking a single ∇-c we fix at most κ rows of this table. As we sample ϕ f uniformly at random, we are interested in the fraction of functions that are consistent with the input-output pairs (M, Z) among all functions. Note, however, that we only care about κ evaluations of ϕ f, and all the details of any further evaluations cancel in the fraction. This allows us to focus only on the part of the function table corresponding to those few evaluations, and that is exactly ∇-c. Summing over nabla configurations ∇-c corresponds to different values of the function table that are still consistent with (M, Z).
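The counting argument above can be checked directly on a toy instance by enumerating every internal function. The sketch below is an illustration only and rests on assumptions not fixed by the text: the rate is r = 1, the capacity set is {0, ..., |C| − 1} with initial inner state 0, the sponge XORs each message block into the outer part before applying ϕ f, and the names `sp_gen` and `probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def sp_gen(phi, initial, message, out_blocks):
    """Toy sponge: absorb the message blocks, then squeeze out_blocks blocks."""
    state = initial
    for block in message:                 # absorbing phase
        outer, inner = state
        state = phi[(outer ^ block, inner)]
    output = [state[0]]                   # first output block is the outer part
    for _ in range(out_blocks - 1):       # squeezing phase
        state = phi[state]
        output.append(state[0])
    return tuple(output)

def probability(pairs, capacity_size, rate_bits=1):
    """Fraction of all functions S -> S consistent with every (message, output) pair."""
    states = [(o, c) for o in range(2 ** rate_bits) for c in range(capacity_size)]
    initial = (0, 0)                      # (0^r, I_C) with I_C = 0 (assumption)
    consistent = total = 0
    for images in product(states, repeat=len(states)):
        phi = dict(zip(states, images))   # one row of the function table per state
        total += 1
        if all(sp_gen(phi, initial, m, len(z)) == z for m, z in pairs):
            consistent += 1
    return Fraction(consistent, total)

if __name__ == "__main__":
    for size in (1, 2, 3):
        print(size, probability([((1, 1), (0,))], size))
```

For the single query M = (1, 1), Z = (0), the enumeration agrees with 1/2 + 1/(4|C|) for |C| = 1, 2, 3, i.e. a polynomial of degree one in |C|^{-1}: the constant term is the random-oracle probability and the correction comes from the one configuration in which the second input repeats the first.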
The probability of a single evaluation event is either the uniform probability of one output value, or 1, or 0. If the internal function is queried on a "fresh" input, it outputs any value with uniform probability. If on the other hand it is queried on the same input for the second time, it outputs the value it has output before with probability 1. One might think that the proof is finished, with $p(\lambda) = \sum_i w_i(\lambda)$, where w i are monomials in λ of degree up to κ + q. There is one problem with that reasoning, namely that the sum limits depend on the variable λ. Up until now we have shown that $p(\lambda) = \sum_{i=1}^{v(1/\lambda)} w_i(\lambda)$, where v is another polynomial. Even for v = id (the identity function) the degree of p is different from the maximal degree of w i. This means that we have to analyze the expression derived in Equation (43) in more detail. To this end, we add more structure to ∇-C(M, Z), which will make it easier to count the number of values that the intermediate states can assume, i.e. the number of nabla configurations ∇-c in ∇-C(M, Z).

Algorithm 4: Flag-Assign
Input: ∇-c
Output: ∇-cf
[listing incomplete; recoverable steps follow]
Set counter i := 1 // the number of states with the same value
... // (state with the same value and a flag, indices, image)
16: Make a 2-dimensional array out of ∇-cf according to the second entry in a standard left-to-right order ((i, j))

[...] 0 otherwise. (45)

Cardinality of ∇-C(M, Z)
In this section we evaluate the number of intermediate states that give ∀i ∈ [q] : SpGen ϕ f (M i , i ) = Z i. First we impose the constraint of ϕ f being a function. Then we want to calculate the product of probabilities in Eq. (43). It depends on the number of unique states in ∇-c, so we divide the set of possible states into subsets with the same number of states with the flag u or f. The next steps involve further divisions of ∇-C(M, Z).
In the process of calculating the conditional probabilities in Eq. (43) we included in each state in ∇-c the image it should have under ϕ f. The set ∇-C(M, Z) does, however, contain states that would violate the constraint of ϕ f being a function. The first step to calculate the cardinality of ∇-C(M, Z) is to exclude ∇-c that do not fulfill this requirement. The set of states that should be taken into consideration is defined below; we denote this set by p-∇-CF(M, Z) (p emphasizes the fact that ϕ f is a proper function).
Definition 15 (p-∇-CF(M, Z)).The set of nabla configurations ∇-c for (M, Z) with flags and a proper function ϕ f is a set of arrays of size specified by κ+q , the set is defined in two steps, first we define the set of ∇-cf that are output by Flag-Assign, We define p-∇-CF(M, Z) by the following constraints on ∇-CF(M, Z): The formal definition reads One may think about p-∇-CF(M, Z) as follows, first we consider ∇-c: an array of states.The collection of all those arrays-with the exception of those that do not fulfill constraints (37)-is denoted by ∇-C(M, Z).On each ∇-c ∈ ∇-C(M, Z) we run the algorithm Flag-Assign, getting a collection of ∇-cf-denoted by ∇-CF(M, Z).Now we discard all those ∇-cf that do no fulfill constraints (47).The collection we are left with is denoted by p-∇-CF(M, Z).We have the following relations between sets: omitting the flags ∇-C(M, Z) Each p-∇-cf ∈ p-∇-CF(M, Z) has some number of unique states: with flag u or f.Let us denote this number by ū.Eq. ( 45) implies that no matter in what configurations the unique states are, the product of probabilities in Eq. ( 43) is the same.Hence the first division of p-∇-CF(M, Z) is in terms of the total number of unique states.We denote the state with a fixed number ū by p-∇-CF(M, Z, ū), we have that The product in Eq. ( 43) for p-∇-cf ∈ p-∇-CF(M, Z, ū) evaluates to (q,kq−1) where all states p-∇-cf are in p-∇-CF(M, Z, ū).
We have to work a bit more to calculate the total number of states. The number of possibilities in which a single transition event can be realized depends both on the input and the output. For that reason we need to specify the configuration of flags in more detail, not just by the total number of unique states. Let us denote a transition event from a unique state to a unique state by ϕ f ( (u ∨ f ) S ⊕ ) = (u ∨ f ) S and similarly for other flags. The flag of the output is defined by the XORed message block or the output block. Before we go into details of the analysis of the structure of p-∇-CF(M, Z), we list the intuitive principles of counting the output states depending on the input and output states:
1. ϕ f ( (u ∨ f ) S ⊕ ) = (u ∨ f ) S - the only constraint is that the output cannot be the same as any of the previous unique states; the number of possible output values is at most 2 r • |C| or |C| and can be smaller by at most κ (the bound is 2 r • |C| if the transition is in the absorbing phase and |C| if it is in the squeezing phase),
2. ϕ f ( (u ∨ f ) S ⊕ ) = n S - the output has to be in the set of outputs of states with the flag f; the number of possible output values is at most κ,
3. ϕ f ( n S ⊕ ) = S (output with any flag) - the output is defined by the image memorized in the second entry of the state; the number of possible output values is 1.
The actual numbers in the above guidelines can be calculated precisely but they depend on the actual case we deal with.
To properly treat the transition events we need to keep track of not only the total number of unique states but also the number of truly unique u states.We denote the latter by u and the set with those numbers fixed by p-∇-CF(M, Z, ū, u).In the above paragraph we also noticed that we should include in our considerations the number of unique states in different phases of SpGen.The number of states with the flag u in the absorbing phase is denoted by u abs .Note that we are addressing all q absorbing phases so we take into account flags of all states with indices (i, j) ∈ {(i , j )} i ∈{1,...,q},j ∈{1,...,|M i |r} .The number of states with the flag u in the squeezing phase is denoted by u squ and we take into account states with indices (i, j) ∈ {(i , j )} i ∈{1,...,q},j ∈{|M i |r+1,...,k i −1} .Similarly the total number of unique states is denoted by ūabs and ūsqu .
Next we fix particular placements of flags in the arrays p-∇-cf ∈ p-∇-CF(M, Z, ūabs , u abs , ūsqu , u squ ).We no longer need to keep u and ū explicit as u = u abs +u squ and ū = ūabs + ūsqu .Let us define a placement P for (M, Z) as an array of flags F ∈ {u, f, n} with its dimensions determined by (M, Z) in the same way as for nabla configurations ∇-c.The set of placements P(M, Z, ūabs , u abs , ūsqu , u squ ) is defined as the set of all placements P encountered in elements of p-∇-CF(M, Z, ūabs , u abs , ūsqu , u squ ).We are going to write Flag(P i j ) to determine the flag in the position (i, j) in placement P .For each P we are able to calculate the size of p-∇-CF(M, Z, P ), we no longer add ūabs and other parameters as they are already included in P .Before we define the algorithm performing this calculation we need to bound the number of different placements.
Let us assume for a moment that (M, Z) restrains only the size of p-∇-cf and not the values of the states.If there were no constraints coming from the workings of Flag-Assign then unique states would be distributed in all combinations of picking ūabs elements among states in absorbing phases.Additionally, we also want to take into account combinations of u abs elements among the ūabs flags.Let us recapitulate: first we distribute ūabs flags (without specifying whether they are u or f) and then assign them concrete values (u or f).The total number of state-triples in the absorbing phases of p-∇-cf is µ := q i=1 |M i | r .The number of possibilities for the first step is The two calculations above bring us to the conclusion that our analysis is sufficiently detailed; we have identified and taken into account all parts of ∀i ∈ [q] : SpGen ϕ f (M i , i ) = Z i that depend on |C|.In summary we divided p-∇-CF(M, Z) into a small (relatively to |C|) number of subsets whose size we can actually calculate.The last result assures that even though we do not formally describe the structure of the last level of division of p-∇-CF(M, Z), the number of possibilities of next divisions does not depend on |C|.So we have that Our assumption is that κ is fixed so the number of placements is independent of |C|.Note that we can compute |P(M, Z, ūabs , u abs , ūsqu , u squ )| for fixed parameters and the above inequality just shows that irrespective of the exact value of the calculation the number of placements does not depend on |C| and is relatively small.Let us define a function that helps us accommodate for the fact that some subsets of p-∇-CF(M, Z) are empty for some specific (M, Z): In what follows we leave out the input to δ, as it can be inferred from context.For example δ evaluates to 0 if the input includes ūabs = µ and the first block of the input messages is not always different.
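The two counting steps for the absorbing phases described above (first choosing which positions carry a unique flag, then choosing which of those are u rather than f) give a product of binomial coefficients when all constraints from Flag-Assign and from (M, Z) are ignored. The sketch below only illustrates that such a bound does not involve |C|; the name `max_placements` is hypothetical and the value is an upper bound, not the exact size of the set of placements.

```python
from math import comb

def max_placements(mu, u_bar_abs, u_abs):
    """Upper bound on flag placements in the absorbing phases.

    mu        : number of state-triples in the absorbing phases
    u_bar_abs : number of states flagged u or f
    u_abs     : number of those flagged u
    The remaining mu - u_bar_abs positions receive the flag n.
    """
    positions = comb(mu, u_bar_abs)    # where the unique flags sit
    kinds = comb(u_bar_abs, u_abs)     # which of them are u rather than f
    return positions * kinds

if __name__ == "__main__":
    print(max_placements(4, 3, 2))
```

For the example of Section 5, with four flagged states and the setup (u = 2, f = 1), this gives 12, which is consistent with the 6 placements that actually occur there.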
The last division we make is done by characterizing uniqueness of outer and inner parts of states. This step is done to get the precise and correct result, but the high level explanation and an approximation of the output of Calc is already captured in 1. We have not captured this situation in detail in our example proof because it becomes important only if longer outputs are present. Here we explain the procedure of including the necessary details.
The main detail we add is assigning flags to outer and inner parts of states individually. We introduce those flags only now to keep the proof as clear as possible; technically, to include the additional flags we modify the algorithm Flag-Assign in such a way that it runs over a configuration ∇-c two additional times, acting solely on outer states and on inner states, respectively. Those two additional runs assign the same flags as the original one but corresponding to just one of the parts of S ⊕ states. The rest of the discussion after applying Flag-Assign is unchanged and depends only on flags of the full states.
When discussing placements note that a unique state (u or f) can consist of a unique outer state and a unique inner state but also out of a non-unique outer state and a unique inner state or vice versa.After we assign a particular placement P ∈ P(M, Z, ūabs , u abs , ūsqu , u squ ) there are still many possibilities of arranging outer and inner states flags.There are exactly three possibilities every unique state can be arranged in: where we symbolize a state S ⊕ by a column vector with flags assigned to its outer state in the first row and inner state in the second row.Hence, for every placement P we have 3 ūabs +ūsqu placements of the outer and inner states flags.We are going to mark the fact that we have included those additional details into placements by adding a star to the set of placements P ∈ P * (M, Z, ūabs , u abs , ūsqu , u squ ).We have that We also write Flag( P i j ) and Flag( P i j ) to access the flag of the outer and inner part of P i j respectively.Alg. 5 below shows the algorithm Calc that outputs the number of different p-∇-cf ∈ p-∇-CF(M, Z, ūabs , u abs , ūsqu , u squ ) for some given placement P ∈ P * (M, Z, ūabs , u abs , ūsqu , u squ ).To capture the fact that the number of possible values a unique state can have depends on the number of unique states with already assigned values we define the following sets.For unique outer states we have Ūprev (P, i, j) := P i j : (i , j ) ≺ (i, j) ∧ Flag( Ūf prev (P, i, j) := P i j : (i , j ) ≺ (i, j) ∧ Flag( For unique inner states we have Ûprev (P, i, j) := P i j : (i , j ) ≺ (i, j) ∧ Flag( P i j ) ∈ {u, f} , Ûf prev (P, i, j) := P i j : (i , j ) ≺ (i, j) ∧ Flag( Note that all of the above quantities (57, 58, 59, 60) are bounded by 1 ≤ Ūprev (P, i, j), Ûprev (P, i, j), Ūf prev (P, i, j), Ûf prev (P, i, j) ≤ ūabs + ūsqu ≤ κ.
In the algorithm we also use N-Possibilities, defined in Eq. (87), which is the number of possibilities in which one can assign values to non-unique states in a nabla configuration. N-Possibilities is bounded by $\kappa^{\kappa}$, which is an upper bound for Eq. (88). Thanks to the additional details we get the precise form of the expression p, but note that when we sum over placements of outer and inner states flags we have the same δ(M, Z, P ) for all cases in the absorbing phases, so we sum the expressions listed in Calc and get the same result as in Section 5. In the squeezing phases the outer states are fixed by the outputs Z and we can do the sum over placements with the same flag of the outer state in Calc.

Final expression
In the previous subsections we formalized algorithms that help us analyze the expression in Eq. (43). First we introduced Flag-Assign, which analyzes ∇-c from the perspective of having the same input to ϕ f multiple times. Then we defined Calc, which counts the arrays of states that fulfill a given set of constraints, the number and arrangement of unique states. The final part of the proof of Lemma 9 is to use those algorithms to show that p(|C| −1 ) is of the claimed form. We start by formally writing down the expression in terms of divisions of p-∇-CF(M, Z) we introduced and the outputs of Calc. Next we identify crucial elements of the sum that lead to the claim of the lemma, showing the maximal degree of |C| −1 in the expression p(λ).
In the previous sections we showed that Eq. ( 64),(65) (q,kq−1) (i,j)=(1,1) Eq. ( 52) where the second equality comes from the fact that constraints ( 47 To calculate a 0 and the maximal degree of p let us focus on p(|C| −1 ) for all unique (with the flag u in both outer and inner part) states: (67) In the above expression if we take all messages of maximal length m and outputs of maximal length z we get a polynomial of degree κ − q = q(m + z − 2). This is necessarily the maximal degree as every evaluation of ϕ f increases the degree by one, except for the last, but this cannot be changed: the last column does not matter at all for the overall probability. Hence the maximal degree of p is as claimed. In the case all states are unique, i.e. |C| → ∞, p(|C| −1 ) evaluates to ∼ 2 − i i . This expression corresponds to the output probability of a random oracle, exactly as expected of a sponge with all different inner states. If we only take the terms 2 r |C| and |C| and the probability we arrive at 2 − i i . This result is only one of the terms in a 0 , but note that all other terms will correspond to different placements and will include δ(M, Z, P ) with different inputs, being non-zero for different (M, Z). Hence for any given input-output pairs (M, Z) for |C| → ∞ the probability function approaches the probability of a random oracle outputting Z on M. To get the power of |C| equal to zero we need to have the same number of unique states (probability terms decreasing the degree by one) as pairs of unique states (increasing the degree by one). Configurations that satisfy those conditions come from inputs and outputs that are either fully unique or exactly the same as at least one other input or output, respectively. One special case occurs if the output is just a single block long: then messages can differ by just the last block and still have different outputs (like in our example proof in Section 5).
In our proof we have focused on the case of ϕ f being a random transformation.In Appendix 7 we provide the details that should be considered to show that Theorem 8 holds also for random permutations.

Open Question
One of the most desirable security notions for hash functions is indifferentiability from a random oracle, which is defined with respect to a possible simulator that fools a distinguisher into believing that it interacts with the internal function instead of a simulation of it. Proving indifferentiability is more challenging than indistinguishability. It is not clear whether the natural translation of the classical notion of indifferentiability to the quantum setting is achievable. Only recently, two articles [9,25] opened the discussion, but so far, the results remain inconclusive.
In our work, we provide a quantum security guarantee more suitable for keyed primitives where an attacker does not have access to the internal building block. On the one hand, we increase the trust that hash functions based on the sponge construction are quantum safe, and on the other hand, we formally prove that the sponge is a quantum secure pseudorandom function when used with a keyed internal function, as it is used in the hash-based signature scheme SPHINCS+ [21] in the instantiation using the Haraka hash function [13].
Symbol index

Y^X: The set of functions {f : X → Y}.
[q]: The q-element set [q] := {1, 2, . . ., q}.
κ: The total number of evaluations of ϕ f in q evaluations of SpGen on M i with outputs Z i.
k i: The number of internal states in evaluation of SpGen(M i ) outputting Z i.
∇-C(M, Z): The set of all ∇-c constrained by (M, Z).
Flag-Assign: The algorithm for assigning u, f, and n flags in ∇-c.
∇-CF(M, Z): The set of all ∇-c constrained by (M, Z) with Flag-Assign run on it.
p-∇-cf: A particular assignment of states in Sponge evaluations of (M, Z) with additional constraints for ϕ f being a valid function.
Calc: The algorithm that outputs the cardinality of p-∇-CF(M, Z, P ).
ζ: The number of states with the next step being in the squeezing phase of SpGen.
µ: The number of states with the next state being in the absorbing phase, κ = µ + ζ.
T (S): The set of permutations from S to S, {ϕ f : S → S | ϕ f is a bijection}.

A Direct proof of indistinguishability with permutations
Here we prove Theorem 16 by direct application of Theorem 4 instead of relying on the PRF/PRP switching lemma.For this proof we need to generalize the average-case polynomial method.We show how to use it if the probability of a certain input-output behavior is not a polynomial but is close to a polynomial.This small generalization might prove useful in other applications of the polynomial method.The following is a restatement of Theorem 16, with a slightly worse bound.
Theorem 17. SpGen ϕ f for a random permutation ϕ f is quantumly indistinguishable from a random oracle.More concretely, for all quantum algorithms A making at most q quantum queries to SpGen, such that the input length is at most m • r bits long and the output length is at most z • r bits long, where η := 2q(m + z − 2), R is defined according to Definition 6, and the set of permutations is denoted by T (S) := {ϕ f : S → S | ϕ f is a bijection}.The domain is defined as S = {0, 1} r × C for some non-empty finite set C.
Proof sketch. The proof follows the same reasoning as the proof of Theorem 8 with small differences explained in the following. We define the family of distributions F t with random permutations ϕ f from T (S). When we get to Eq. (14) though, we need an argument different from Lemma 9 because it does not hold for permutations in SpGen. We perform the same analysis of the probability function as in the proof of Lemma 9. Only the final argument is missing as we cannot use Theorem 4: P[∀i ∈ [2q] : SpGen ϕ f (M i , i ) = Z i ] is not a polynomial in |C| −1 if ϕ f is a permutation. Instead we formulate a generalization of Theorem 4 in Lemma 19 below that leads to the claimed bound.
Let us now highlight the differences we encounter when analyzing the case of permutations while following the reasoning of the proof of Lemma 9. The first and main difference is that the expression for the probability of a single evaluation of $\varphi_f$ (Eq. (45)) changes to: where $U_{\mathrm{prev}}(i,j) := |\{P_{i'j'} : (i',j') \prec (i,j) \wedge \mathrm{Flag}(P_{i'j'}) \in \{\mathtt{u}, \mathtt{f}\}\}|$ is the number of unique states preceding the position (i, j). Note that we assume we have done all steps leading to Eq. (45). The product in Eq. (43) for p-∇-cf ∈ p-∇-CF(M, Z, ū) and for $\varphi_f$ being a random permutation evaluates not to Eq. (52), but instead to $\prod_{(i,j)=(1,1)}^{(q,k_q-1)}$

assumptions of Theorem 18 above for all $t \in \mathbb{Z}_+$. As $g_j$, $p_j$, $p_j$, $p$ all take on the same value for $t \to \infty$, we obtain that $p(0) = g_j(0)$, and hence the claim follows.
Now we just need to show that $P[\forall i \in [2q] : \mathrm{SpGen}_{\varphi_f}(M_i, \ell_i) = Z_i]$ is bounded by polynomials and find their degree. We are going to show that there are two polynomials (indexed with j in the statement of the lemma), bounding this probability from below and from above, that fulfill the assumptions of Lemma 19 for a = 1. For each set of pairs of inputs and outputs we consider g to be a sum like in Eq. (78).
To do that we need to distinguish between placements P that involve at least two consecutive unique states and those that do not. Let us deal with the former case first. We can bound the probability part of Eq. (78) as follows: Note that we have skipped the first term in the product, which is supposed to range from 0 to ū − 1. We have done so because, in the case we discuss, the first term that is output by CalcPer necessarily involves |C| (not |C| − 1). Thanks to how we divide the product, the final expression is a polynomial in $\frac{1}{|C|-1}$. In the latter case, with no consecutive pairs of unique states, we bound every element of the product as in the above inequalities.
As for CalcPer, we just treat |C| − 1 as the new variable. Note that both bounding polynomials are now polynomials in $(|C| - 1)^{-1}$. The first bounding polynomial has the same degree as the polynomial corresponding to g in the proof for functions; following the derivation of Eq. (68), this degree equals $\eta = 2q(m + z - 2)$. The lower bound above, however, yields a polynomial of twice that degree.
The last assumption we need to check is that the two bounding polynomials lie between 0 and 1 for |C| > 1. It is enough to show that the lower polynomial is at least 0 and the upper polynomial is at most 1, since we already know that they bound g. The lower polynomial is non-negative because all coefficients are positive (they equal CalcPer(P)) and so is $(|C| - 1)^{-1}$. For the upper bound, following Alg. 5 we can see that CalcPer(P) is bounded by $2^{r(\bar{u}-q)} |C| (|C| - 1)^{\bar{u}-1}$, so as long as the number of terms in Eq. (78) is smaller than $2^{qr}$, which is our implicit assumption, the upper polynomial is at most 1.
The above discussion, together with Lemma 19, proves Theorem 17.

B Additional details

B.1 Auxiliary algorithms, functions
Let us consider the problem of assigning values to the non-unique states given some placement P. Let us denote the number of non-unique flags $\mathtt{n}$ by n and the number of "first" non-unique states with flags $\mathtt{f}$ by f. We are going to analyze the combinatorial problem of assigning n objects to f classes in such a way that each class is assigned at least one object. Objects within a single class are indistinguishable, but objects in different classes are distinguishable. For example, three objects divided among two classes, with one object in the first class and two in the second, can be assigned in three ways: the first class receives the first object, the second object, or the third. We do not count the order of the two objects in the second class. In the general case of n objects and f classes with occupation numbers $n_1, n_2, \dots, n_f$, the number of permutations in such a problem is given by

$$P(n; n_1, n_2, \dots, n_f) := \frac{n!}{n_1!\, n_2! \cdots n_f!}.$$

Note that the above formula requires that we specify the occupation of classes. These occupation numbers are not fixed by the placement P, so we also need to analyze them. The occupations of the classes are defined by the possible distributions of objects among the different classes. Let us define the set of possible distributions:

There is one more detail we need to add to properly count all possible assignments of non-unique states: repeated values of each different $\mathtt{f}$ appear in P only after the initial unique state. Let us denote by $\Pi_{\text{class-ind}}(n; n_1, n_2, \dots, n_f)$ the set of permutations with classes of indistinguishable objects, enumerated by $P(n; n_1, n_2, \dots, n_f)$. We need to implement the requirement coming from the way Flag-Assign works. The set of permutations after including this constraint is

$$\Pi(P, n; n_1, n_2, \dots, n_f) := \{\pi \in \Pi_{\text{class-ind}}(n; n_1, n_2, \dots, n_f) \mid \text{there are no objects in class } i \text{ prior to the } i\text{-th state according to } P\}. \quad (86)$$

Finally, we count the number of possible assignments of values of non-unique states:

N-Possibilities(n, f, P) :=

The most crucial observation of this subsection is that the number of possible assignments does not depend on |C| and that N-Possibilities(n, f, P) ≤ $f^n$.
The above is a trivial bound, obtained by ignoring all structure and only counting the total number of ways to put one of f values in each of the n places.
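The counting argument of this subsection can be checked on small instances. The sketch below is an illustration under stated assumptions, not the paper's N-Possibilities algorithm: it computes the multinomial count for a fixed occupation tuple, enumerates all occupation tuples with every class non-empty, and compares the resulting total with the trivial bound f^n. It deliberately ignores the placement constraint of Eq. (86), which can only remove permutations, so the unconstrained total is itself an upper bound on the constrained count. All function names are hypothetical.

```python
from itertools import product
from math import factorial


def multinomial(occupations):
    """Number of ways to split sum(occupations) labeled objects into classes of the given sizes."""
    count = factorial(sum(occupations))
    for n_i in occupations:
        count //= factorial(n_i)  # exact at every step
    return count


def compositions(n, f):
    """Yield all tuples (n_1, ..., n_f) of positive integers that sum to n."""
    if f < 1:
        return
    if f == 1:
        if n >= 1:
            yield (n,)
        return
    for first in range(1, n - f + 2):
        for rest in compositions(n - first, f - 1):
            yield (first,) + rest


def count_unconstrained(n, f):
    """Sum of multinomial counts over all occupation tuples (placement constraint ignored)."""
    return sum(multinomial(occ) for occ in compositions(n, f))


def brute_force(n, f):
    """Directly count assignments of n objects to f classes with no class left empty."""
    return sum(1 for labels in product(range(f), repeat=n) if len(set(labels)) == f)


if __name__ == "__main__":
    # The example from the text: three objects, occupations (1, 2).
    print(multinomial((1, 2)))
    for n in range(1, 7):
        for f in range(1, n + 1):
            total = count_unconstrained(n, f)
            assert total == brute_force(n, f)
            assert total <= f**n  # the trivial bound
```

The brute-force comparison is only feasible for small n and f; it is there to make the role of the occupation numbers concrete.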

B.2 Auxiliary algorithms, permutations
In the case of permutations we need to add one constraint to N-Possibilities

Figure 1: A scheme illustrating the sponge construction.

Figure 2: Table showing the intermediate states of the limited SpGen. Boxed states are the elements of ∇-c that do not have a fixed value across different ∇-c ∈ ∇-C(M, Z); the arrows indicate the order in which Flag-Assign assigns flags.

Figure 3: Possible placements for the limited SpGen. . . . .denotes the flags assigned to the four states.ū denotes the total number of unique states: ū := u + f .

Definition 13 (∇-c). The nabla configuration ∇-c for (M, Z) is an array of triples $(S, S^{\oplus}, \hat{S}) \in \{0,1\}^{2r} \times C$, where C is an arbitrary non-empty finite set. The array ∇-c consists of q rows; for every i, row i has $k_i$ columns, where $k_i := |M_i|_r + |Z_i|_r$ ($|M_i|_r$ denotes the number of r-bit blocks in $M_i$). Formally we have

$\binom{\mu}{\bar{u}_{\mathrm{abs}}}$ and the second step is $\binom{\bar{u}_{\mathrm{abs}}}{u_{\mathrm{abs}}}$. The total number of possibilities of placing the unique flags in absorbing phases is $\binom{\mu}{\bar{u}_{\mathrm{abs}}} \cdot \binom{\bar{u}_{\mathrm{abs}}}{u_{\mathrm{abs}}}$. The problem of distributing unique states in squeezing phases is the same as in absorbing phases. The total number of state-triples with flags in the squeezing phases of p-∇-cf is $\zeta := \sum_{i=1}^{q} (|Z_i|_r - 1)$. The number of placements is $\binom{\zeta}{\bar{u}_{\mathrm{squ}}}$. We also need to multiply this result by the number of placements of states with flag $\mathtt{u}$ among all unique states.
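The two-step count for the absorbing phases can be verified by direct enumeration on small parameters. This is a minimal sketch that assumes the reading of the text given above: first choose which of the µ positions carry a unique flag, then choose which of those carry the flag u. The function and parameter names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_absorbing_placements(mu, u_bar_abs, u_abs):
    """Closed form: choose the unique positions, then the flag-u positions among them."""
    return comb(mu, u_bar_abs) * comb(u_bar_abs, u_abs)


def enumerate_absorbing_placements(mu, u_bar_abs, u_abs):
    """Brute-force enumeration of the same two-step choice."""
    total = 0
    for unique_positions in combinations(range(mu), u_bar_abs):
        for _flag_u_positions in combinations(unique_positions, u_abs):
            total += 1
    return total


if __name__ == "__main__":
    for mu in range(0, 7):
        for u_bar_abs in range(0, mu + 1):
            for u_abs in range(0, u_bar_abs + 1):
                assert count_absorbing_placements(mu, u_bar_abs, u_abs) == \
                    enumerate_absorbing_placements(mu, u_bar_abs, u_abs)
    print(count_absorbing_placements(4, 2, 1))
```

The same function applies to the squeezing phases with ζ in place of µ, since the text states that the two distribution problems are identical.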

p-∇-CF(M, Z)
The set of all p-∇-cf constrained by (M, Z) and $\varphi_f$.

P
The set representing a particular placement of unique and non-unique flags in p-∇-cf.

p-∇-CF(M, Z, *)
p-∇-CF(M, Z) with specified parameters *.

Calc